School of Mathematics and Statistics - Research Publications

  • Item
    The impact of low-cost, genome-wide resequencing on association studies
    Balding, D (Springer Science and Business Media LLC, 2005-06)
  • Item
    Population Structure and Cryptic Relatedness in Genetic Association Studies
    Astle, W ; Balding, DJ (INST MATHEMATICAL STATISTICS, 2009-11)
    We review the problem of confounding in genetic association studies, which arises principally because of population structure and cryptic relatedness. Many treatments of the problem consider only a simple "island" model of population structure. We take a broader approach, which views population structure and cryptic relatedness as different aspects of a single confounder: the unobserved pedigree defining the (often distant) relationships among the study subjects. Kinship is therefore a central concept, and we review methods of defining and estimating kinship coefficients, both pedigree-based and marker-based. In this unified framework we review solutions to the problem of population structure, including family-based study designs, genomic control, structured association, regression control, principal components adjustment and linear mixed models. The last solution makes the most explicit use of the kinships among the study subjects, and has an established role in the analysis of animal and plant breeding studies. Recent computational developments mean that analyses of human genetic association data are beginning to benefit from its powerful tests for association, which protect against population structure and cryptic kinship, as well as intermediate levels of confounding by the pedigree.
  • Item
    Limit theorems for sequences of random trees
    Balding, D ; Ferrari, PA ; Fraiman, R ; Sued, M (SPRINGER, 2009-08)
    We consider a random tree and introduce a metric in the space of trees to define the "mean tree" as the tree minimizing the average distance to the random tree. When the resulting metric space is compact, we have laws of large numbers and central limit theorems for sequences of independent identically distributed random trees. As an application, we propose tests to check whether two samples of random trees have the same law.
  • Item
    Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation
    Cornuet, J-M ; Santos, F ; Beaumont, MA ; Robert, CP ; Marin, J-M ; Balding, DJ ; Guillemaud, T ; Estoup, A (OXFORD UNIV PRESS, 2008-12-01)
    Genetic data obtained on population samples convey information about their evolutionary history. Inference methods can extract part of this information, but they require sophisticated statistical techniques that have been made available to the biologist community (through computer programs) only for simple and standard situations, typically involving a small number of samples. We propose here a computer program (DIY ABC) for inference based on approximate Bayesian computation (ABC), in which scenarios can be customized by the user to fit many complex situations involving any number of populations and samples. Such scenarios involve any combination of population divergences, admixtures and population size changes. DIY ABC can be used to compare competing scenarios, estimate parameters for one or more scenarios, and compute bias and precision measures for a given scenario and known values of parameters (the current version applies to unlinked microsatellite data). This article describes key methods used in the program and presents its main features. The analysis of one simulated and one real dataset, both with complex evolutionary scenarios, illustrates the main possibilities of DIY ABC. AVAILABILITY: The software DIY ABC is freely available at http://www.montpellier.inra.fr/CBGP/diyabc.
  • Item
    Inference of haplotypic phase and missing genotypes in polyploid organisms and variable copy number genomic regions
    Su, S-Y ; White, J ; Balding, DJ ; Coin, LJM (BMC, 2008-12-01)
    BACKGROUND: The power of haplotype-based methods for association studies, identification of regions under selection, and ancestral inference is well established for diploid organisms. For polyploids, however, the difficulty of determining phase has limited such approaches. Polyploidy is common in plants and is also observed in animals. Partial polyploidy is sometimes observed in humans (e.g. trisomy 21; Down's syndrome), and it arises more frequently in some human tissues. Local changes in ploidy, known as copy number variations (CNV), arise throughout the genome. Here we present a method, implemented in the software polyHap, for the inference of haplotype phase and missing observations from polyploid genotypes. PolyHap allows each individual to have a different ploidy, but ploidy cannot vary over the genomic region analysed. It employs a hidden Markov model (HMM) and a sampling algorithm to infer haplotypes jointly in multiple individuals and to obtain a measure of uncertainty in its inferences. RESULTS: In the simulation study, we combine real haplotype data to create artificial diploid, triploid, and tetraploid genotypes, and use these to demonstrate that polyHap performs well, in terms of both switch error rate in recovering phase and imputation error rate for missing genotypes. To our knowledge, there is no comparable software for phasing a large, densely genotyped region of chromosome from triploids and tetraploids, while for diploids we found polyHap to be more accurate than fastPHASE. We also compare the results of polyHap to SATlotyper on an experimentally haplotyped tetraploid dataset of 12 SNPs, and show that polyHap is more accurate. CONCLUSION: With large SNP datasets becoming available for polyploids and CNV regions, we believe that polyHap, our proposed method for inferring haplotypic phase from genotype data, will be useful in enabling researchers analysing such data to exploit the power of haplotype-based analyses.
  • Item
    A Genome-Wide Association Study of the Metabolic Syndrome in Indian Asian Men
    Zabaneh, D ; Balding, DJ ; Ruiz, JR (PUBLIC LIBRARY SCIENCE, 2010-08-04)
    We conducted a two-stage genome-wide association study to identify common genetic variation altering risk of the metabolic syndrome and related phenotypes in Indian Asian men, who have a high prevalence of these conditions. In Stage 1, approximately 317,000 single nucleotide polymorphisms (SNPs) were genotyped in 2700 individuals, from which 1500 SNPs were selected to be genotyped in a further 2300 individuals. Selection of SNPs for Stage 2 was based on four metabolic syndrome component traits: HDL-cholesterol, plasma glucose and type 2 diabetes, abdominal obesity measured by waist-to-hip ratio, and diastolic blood pressure. Association was tested with these four traits and a composite metabolic syndrome phenotype. Four SNPs reaching significance level p < 5 × 10⁻⁷ and with posterior probability of association > 0.8 were found in the genes CETP and LPL, associated with HDL-cholesterol. These associations have already been reported in Indian Asians and in Europeans. Five additional loci harboured SNPs significant at p < 10⁻⁶ with posterior probability > 0.5 for HDL-cholesterol, type 2 diabetes or diastolic blood pressure. Our results suggest that the primary genetic determinants of metabolic syndrome are the same in Indian Asians as in other populations, despite the higher prevalence. Further, we found little evidence of a common genetic basis for metabolic syndrome traits in our sample of Indian Asian men.
  • Item
    Common Genetic Variation Near Melatonin Receptor MTNR1B Contributes to Raised Plasma Glucose and Increased Risk of Type 2 Diabetes Among Indian Asians and European Caucasians
    Chambers, JC ; Zhang, W ; Zabaneh, D ; Sehmi, J ; Jain, P ; McCarthy, MI ; Froguel, P ; Ruokonen, A ; Balding, D ; Jarvelin, M-R ; Scott, J ; Elliott, P ; Kooner, JS (AMER DIABETES ASSOC, 2009-11)
    OBJECTIVE: Fasting plasma glucose and risk of type 2 diabetes are higher among Indian Asians than among European and North American Caucasians. Few studies have investigated genetic factors influencing glucose metabolism among Indian Asians. RESEARCH DESIGN AND METHODS: We carried out genome-wide association studies for fasting glucose in 5,089 nondiabetic Indian Asians genotyped with the Illumina Hap610 BeadChip and 2,385 Indian Asians (698 with type 2 diabetes) genotyped with the Illumina 300 BeadChip. Results were compared with findings in 4,462 European Caucasians. RESULTS: We identified three single nucleotide polymorphisms (SNPs) associated with glucose among Indian Asians at P < 5 × 10⁻⁸, all near melatonin receptor MTNR1B. The most closely associated was rs2166706 (combined P = 2.1 × 10⁻⁹), which is in moderate linkage disequilibrium with rs1387153 (r² = 0.60) and rs10830963 (r² = 0.45), both previously associated with glucose in European Caucasians. Risk allele frequency and effect sizes for rs2166706 were similar among Indian Asians and European Caucasians: frequency 46.2% versus 45.0%, respectively (P = 0.44); effect 0.05 mmol/l (95% CI 0.01-0.08) versus 0.05 mmol/l (0.03-0.07) higher glucose per copy of the risk allele, respectively (P = 0.84). SNP rs2166706 was associated with type 2 diabetes in Indian Asians (odds ratio 1.21 [95% CI 1.06-1.38] per copy of the risk allele; P = 0.006). SNPs at the GCK, GCKR, and G6PC2 loci were also associated with glucose among Indian Asians. Risk allele frequencies of rs1260326 (GCKR) and rs560887 (G6PC2) were higher among Indian Asians than among European Caucasians. CONCLUSIONS: Common genetic variation near MTNR1B influences blood glucose and risk of type 2 diabetes in Indian Asians. Genetic variation at the MTNR1B, GCK, GCKR, and G6PC2 loci may contribute to abnormal glucose metabolism and related metabolic disturbances among Indian Asians.
  • Item
    A Genome-Wide Association Study of Neuroticism in a Population-Based Sample
    Calboli, FCF ; Tozzi, F ; Galwey, NW ; Antoniades, A ; Mooser, V ; Preisig, M ; Vollenweider, P ; Waterworth, D ; Waeber, G ; Johnson, MR ; Muglia, P ; Balding, DJ ; Domschke, K (PUBLIC LIBRARY SCIENCE, 2010-07-09)
    Neuroticism is a moderately heritable personality trait considered to be a risk factor for developing major depression, anxiety disorders and dementia. We performed a genome-wide association study in 2,235 participants drawn from a population-based study of neuroticism, making this the largest association study for neuroticism to date. Neuroticism was measured by the Eysenck Personality Questionnaire. After quality control, we analysed 430,000 autosomal SNPs together with an additional 1.2 million SNPs imputed with high quality from the HapMap CEU samples. We found a very small effect of population stratification, corrected using one principal component, and some cryptic kinship that required no correction. NKAIN2 showed suggestive evidence of association with neuroticism as a main effect (p < 10⁻⁶) and GPC6 showed suggestive evidence for interaction with age (p ≈ 10⁻⁷). We found support for one previously reported association (PDE4D), but failed to replicate other recent reports. These results suggest that common SNP variation does not strongly influence neuroticism. Our study was powered to detect almost all SNPs explaining at least 2% of heritability, and so our results effectively exclude the existence of loci having a major effect on neuroticism.
  • Item
    Pathway Analysis of GWAS Provides New Insights into Genetic Susceptibility to 3 Inflammatory Diseases
    Eleftherohorinou, H ; Wright, V ; Hoggart, C ; Hartikainen, A-L ; Jarvelin, M-R ; Balding, D ; Coin, L ; Levin, M ; Weedon, MN (PUBLIC LIBRARY SCIENCE, 2009-11-30)
    Although the introduction of genome-wide association studies (GWAS) has greatly increased the number of genes associated with common diseases, only a small proportion of the predicted genetic contribution has so far been elucidated. Studying the cumulative variation of polymorphisms in multiple genes acting in functional pathways may provide a complementary approach to the more common single-SNP association approach in understanding the genetic determinants of common disease. We developed a novel pathway-based method to assess the combined contribution of multiple genetic variants acting within canonical biological pathways and applied it to data from 14,000 UK individuals with 7 common diseases. We tested inflammatory pathways for association with Crohn's disease (CD), rheumatoid arthritis (RA) and type 1 diabetes (T1D), with 4 non-inflammatory diseases as controls. Using a variable selection algorithm, we identified variants responsible for the pathway association and evaluated their use for disease prediction within a 10-fold cross-validation framework, in order to calculate the out-of-sample area under the receiver operating characteristic curve (AUC). The generalisability of these predictive models was tested on an independent birth cohort from Northern Finland. Multiple canonical inflammatory pathways showed highly significant associations (p from 10⁻³ to 10⁻²⁰) with CD, T1D and RA. Variable selection identified on average a set of 205 SNPs (149 genes) for T1D, 350 SNPs (189 genes) for RA and 493 SNPs (277 genes) for CD. The pattern of polymorphisms at these SNPs was found to be highly predictive of T1D (91% AUC) and RA (85% AUC), and weakly predictive of CD (60% AUC). Without any parameter refitting, the T1D model also had good predictive ability (79% AUC) in the Finnish cohort. Our analysis suggests that the genetic contribution to common inflammatory diseases operates through multiple genes interacting in functional pathways.
  • Item
    Fregene: Simulation of realistic sequence-level data in populations and ascertained samples
    Chadeau-Hyam, M ; Hoggart, CJ ; O'Reilly, PF ; Whittaker, JC ; De Iorio, M ; Balding, DJ (BIOMED CENTRAL LTD, 2008-09-08)
    BACKGROUND: FREGENE simulates sequence-level data over large genomic regions in large populations. Because, unlike coalescent simulators, it works forwards through time, it allows complex scenarios of selection, demography, and recombination to be modelled simultaneously. Detailed tracking of sites under selection is implemented in FREGENE and provides the opportunity to test theoretical predictions and gain new insights into mechanisms of selection. We describe here the main functionalities of both FREGENE and SAMPLE, a companion program that can replicate association study datasets. RESULTS: We report detailed analyses of six large simulated datasets that we have made publicly available. Three demographic scenarios are modelled: one panmictic, one substructured with migration, and one complex scenario that mimics the principal features of genetic variation in major worldwide human populations. For each scenario there is one neutral simulation and one with a complex pattern of selection. CONCLUSION: FREGENE and the simulated datasets will be valuable for assessing the validity of models for selection, demography and population genetic parameters, as well as the efficacy of association studies. FREGENE's principal advantages are modelling flexibility and computational efficiency. It is open source and object-oriented; as such, it can be customised and the range of models extended.
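Genomic control, one of the solutions to population structure reviewed in Astle and Balding's "Population Structure and Cryptic Relatedness in Genetic Association Studies", rescales per-SNP association statistics by an inflation factor. The following is a minimal sketch, assuming 1-degree-of-freedom chi-square statistics have already been computed for each SNP; the function name `genomic_control` and the toy statistics are illustrative, not taken from the paper.

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom
# (the square of the standard normal 75th percentile, about 0.4549).
CHI2_1DF_MEDIAN = 0.4549364231195724


def genomic_control(chi2_stats):
    """Return (lambda_gc, adjusted_stats) for a list of 1-df chi-square statistics.

    lambda_gc is the median observed statistic divided by its expected median
    under no association. It is floored at 1, so statistics are only ever
    deflated, never inflated.
    """
    if not chi2_stats:
        raise ValueError("need at least one statistic")
    lam = max(1.0, median(chi2_stats) / CHI2_1DF_MEDIAN)
    return lam, [s / lam for s in chi2_stats]


if __name__ == "__main__":
    observed = [0.2, 0.5, 0.9, 1.1, 3.8, 0.7, 0.6]  # toy statistics, not real data
    lam, adjusted = genomic_control(observed)
    print(f"lambda_gc = {lam:.3f}")
    print([round(s, 3) for s in adjusted])
```

Using the median rather than the mean keeps the estimate of the inflation factor robust to the few SNPs that are genuinely associated.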
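DIY ABC is built on approximate Bayesian computation, which replaces likelihood evaluation with simulation from the model. The simplest form of that idea is rejection sampling, sketched below on a toy model. This is an illustration of basic rejection ABC only, not DIY ABC's implementation; the model (50 draws from a normal distribution with unknown mean and unit variance, summarized by the sample mean), the uniform prior, the tolerance and the function name `abc_rejection` are all assumptions made for the example.

```python
import random
from statistics import mean


def abc_rejection(observed, simulate, prior_sample, summary, n_draws, tolerance, rng):
    """Basic ABC rejection sampler.

    Draws a parameter from the prior, simulates a dataset with it, and keeps
    the parameter when the simulated summary statistic lies within `tolerance`
    of the observed summary. Accepted values approximate the posterior.
    """
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        s_sim = summary(simulate(theta, rng))
        if abs(s_sim - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy "observed" data: 50 draws from Normal(2, 1).
    observed = [rng.gauss(2.0, 1.0) for _ in range(50)]
    posterior = abc_rejection(
        observed,
        simulate=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
        prior_sample=lambda r: r.uniform(-5.0, 5.0),
        summary=mean,
        n_draws=5000,
        tolerance=0.1,
        rng=rng,
    )
    print(len(posterior), "accepted; posterior mean", round(mean(posterior), 3))
```

Shrinking the tolerance makes the approximation closer to the true posterior at the cost of fewer accepted draws; comparing scenarios, as DIY ABC does, additionally requires simulating under each competing model.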
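The polyHap abstract assesses phase recovery by switch error rate. For the diploid case, one common definition counts changes of haplotype orientation between consecutive heterozygous sites. A sketch under that assumption follows; it expects the inferred haplotypes to imply the same genotypes as the true ones, does not cover the polyploid generalization, and the function name `switch_error_rate` is hypothetical.

```python
def switch_error_rate(true_hap1, true_hap2, inferred_hap1, inferred_hap2):
    """Diploid switch error rate between a true and an inferred haplotype pair.

    Only heterozygous sites are informative. At each one we record whether the
    first inferred haplotype carries the same allele as the first true
    haplotype; a switch is a change in that orientation between consecutive
    heterozygous sites. Rate = switches / (heterozygous sites - 1).
    """
    het = [i for i, (a, b) in enumerate(zip(true_hap1, true_hap2)) if a != b]
    if len(het) < 2:
        return 0.0  # phase is trivially correct with fewer than two het sites
    orientation = [inferred_hap1[i] == true_hap1[i] for i in het]
    switches = sum(1 for x, y in zip(orientation, orientation[1:]) if x != y)
    return switches / (len(het) - 1)


if __name__ == "__main__":
    # One switch among four heterozygous sites: rate 1/3.
    print(switch_error_rate("0101", "1010", "0110", "1001"))
```

Measuring orientation changes rather than per-site mismatches means that a single phasing mistake in the middle of a region counts once, instead of penalizing every downstream site.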
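The MTNR1B abstract reports linkage disequilibrium between SNPs as r². Assuming phased haplotypes at two biallelic SNPs coded 0/1, the standard r² statistic can be computed as follows; the function name `ld_r2` and the toy haplotypes are illustrative only.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    `haplotypes` is a list of (allele_at_snp1, allele_at_snp2) pairs coded 0/1.
    With A and B denoting allele 1 at each SNP:
        D   = p_AB - p_A * p_B
        r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B))
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    d = p_ab - p_a * p_b
    return d * d / denom


if __name__ == "__main__":
    print(ld_r2([(0, 0), (1, 1), (0, 0), (1, 1)]))  # perfect LD: 1.0
    print(ld_r2([(1, 1), (1, 1), (1, 0), (0, 0)]))  # partial LD: 1/3
```

Because r² is normalized by both allele frequencies, values such as 0.60 and 0.45 indicate how well one SNP can stand in for the other as a proxy, independently of how common each allele is.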
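The pathway-analysis abstract reports predictive performance as out-of-sample AUC. A minimal sketch of AUC through its Mann-Whitney interpretation follows, assuming `scores` are held-out predictions (for instance pooled over cross-validation folds) and `labels` mark cases as 1 and controls as 0; the function name `auc` is illustrative. The pairwise loop is quadratic and chosen for clarity; a rank-based computation gives the same value faster on large samples.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    AUC is the probability that a randomly chosen case (label 1) receives a
    higher score than a randomly chosen control (label 0); ties count 1/2.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Cases scored 0.9 and 0.4, controls 0.6 and 0.2: 3 of 4 pairs ordered correctly.
    print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
```

Read this way, an AUC of 0.5 is chance-level ranking and 1.0 is perfect separation, which frames the reported 60% (CD) and 91% (T1D) figures.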
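FREGENE differs from coalescent simulators by working forwards through time. The forward-time idea in its simplest form is a single-locus Wright-Fisher model with selection, sketched below. This toy is not FREGENE's model: it has no recombination, demography or sequence-level tracking, and it assumes genic selection in which each copy of the selected allele has relative fitness 1 + s (with s > -1). The function name `wright_fisher` is hypothetical.

```python
import random


def wright_fisher(n_diploid, p0, s, generations, rng):
    """Forward-time Wright-Fisher simulation of one biallelic locus.

    Each generation, selection first shifts the allele frequency
    deterministically to p(1 + s) / (1 + s p); drift then draws the 2N gene
    copies of the next generation binomially. Returns the frequency trajectory,
    including the starting value.
    """
    two_n = 2 * n_diploid
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Selection: expected frequency after weighting copies by fitness.
        p_sel = p * (1 + s) / (1 + s * p)
        # Drift: binomial sampling of 2N gene copies for the next generation.
        count = sum(1 for _ in range(two_n) if rng.random() < p_sel)
        p = count / two_n
        trajectory.append(p)
    return trajectory


if __name__ == "__main__":
    traj = wright_fisher(n_diploid=500, p0=0.1, s=0.05, generations=200, rng=random.Random(42))
    print("final frequency:", traj[-1])
```

Stepping generation by generation is what lets a forward simulator combine selection with arbitrary demographic events, at the price of simulating the whole population rather than only the sampled lineages.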