School of Mathematics and Statistics - Research Publications

Search Results

  • Item
    Optimizing genomic medicine in epilepsy through a gene-customized approach to missense variant interpretation
    Traynelis, J ; Silk, M ; Wang, Q ; Berkovic, SF ; Liu, L ; Ascher, DB ; Balding, DJ ; Petrovski, S (COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT, 2017-10)
    Gene panel and exome sequencing have revealed a high rate of molecular diagnoses among diseases where the genetic architecture has proven suitable for sequencing approaches, with a large number of distinct and highly penetrant causal variants identified among a growing list of disease genes. The challenge is, given the DNA sequence of a new patient, to distinguish disease-causing from benign variants. Large samples of human standing variation data highlight regional variation in the tolerance to missense variation within the protein-coding sequence of genes. This information is not well captured by existing bioinformatic tools, but is effective in improving variant interpretation. To address this limitation in existing tools, we introduce the missense tolerance ratio (MTR), which summarizes available human standing variation data within genes to encapsulate population level genetic variation. We find that patient-ascertained pathogenic variants preferentially cluster in low MTR regions (P < 0.005) of well-informed genes. By evaluating 20 publicly available predictive tools across genes linked to epilepsy, we also highlight the importance of understanding the empirical null distribution of existing prediction tools, as these vary across genes. Subsequently integrating the MTR with the empirically selected bioinformatic tools in a gene-specific approach demonstrates a clear improvement in the ability to predict pathogenic missense variants from background missense variation in disease genes. Among an independent test sample of case and control missense variants, case variants (0.83 median score) consistently achieve higher pathogenicity prediction probabilities than control variants (0.02 median score; Mann-Whitney U test, P < 1 × 10⁻¹⁶). We focus on the application to epilepsy genes; however, the framework is applicable to disease genes beyond epilepsy.
  • Item
    Summary statistic analyses can mistake confounding bias for heritability
    Holmes, JB ; Speed, D ; Balding, DJ (WILEY, 2019-12)
    Linkage disequilibrium SCore regression (LDSC) has become a popular approach to estimate confounding bias, heritability, and genetic correlation using only genome-wide association study (GWAS) test statistics. SumHer is a newly introduced alternative with similar aims. We show using theory and simulations that both approaches fail to adequately account for confounding bias, even when the assumed heritability model is correct. Consequently, these methods may estimate heritability poorly if there was an inadequate adjustment for confounding in the original GWAS analysis. We also show that the choice of a summary statistic for use in LDSC or SumHer can have a large impact on resulting inferences. Further, covariate adjustments in the original GWAS can alter the target of heritability estimation, which can be problematic for test statistics from a meta-analysis of GWAS with different covariate adjustments.
  • Item
    Reevaluation of SNP heritability in complex human traits
    Speed, D ; Cai, N ; Johnson, MR ; Nejentsev, S ; Balding, DJ (NATURE PUBLISHING GROUP, 2017-07)
    SNP heritability, the proportion of phenotypic variance explained by SNPs, has been reported for many hundreds of traits. Its estimation requires strong prior assumptions about the distribution of heritability across the genome, but current assumptions have not been thoroughly tested. By analyzing imputed data for a large number of human traits, we empirically derive a model that more accurately describes how heritability varies with minor allele frequency (MAF), linkage disequilibrium (LD) and genotype certainty. Across 19 traits, our improved model leads to estimates of common SNP heritability on average 43% (s.d. 3%) higher than those obtained from the widely used software GCTA and 25% (s.d. 2%) higher than those from the recently proposed extension GCTA-LDMS. Previously, DNase I hypersensitivity sites were reported to explain 79% of SNP heritability; using our improved heritability model, their estimated contribution is only 24%.
  • Item
    Worldwide FST Estimates Relative to Five Continental-Scale Populations
    Steele, CD ; Court, DS ; Balding, DJ (WILEY, 2014-11)
    We estimate the population genetics parameter FST (also referred to as the fixation index) from short tandem repeat (STR) allele frequencies, comparing many worldwide human subpopulations at approximately the national level with continental-scale populations. FST is commonly used to measure population differentiation, and is important in forensic DNA analysis to account for remote shared ancestry between a suspect and an alternative source of the DNA. We estimate FST comparing subpopulations with a hypothetical ancestral population, which is the approach most widely used in population genetics, and also compare a subpopulation with a sampled reference population, which is more appropriate for forensic applications. Both estimation methods are likelihood-based, in which FST is related to the variance of the multinomial-Dirichlet distribution for allele counts. Overall, we find low FST values, with posterior 97.5 percentiles < 3% when comparing a subpopulation with the most appropriate population, and even for inter-population comparisons we find FST < 5%. These are much smaller than single nucleotide polymorphism-based inter-continental FST estimates, and are also about half the magnitude of STR-based estimates from population genetics surveys that focus on distinct ethnic groups rather than a general population. Our findings support the use of FST up to 3% in forensic calculations, which corresponds to some current practice.
  • Item
    A genome-wide association scan implicates DCHS2, RUNX2, GLI3, PAX1 and EDAR in human facial variation
    Adhikari, K ; Fuentes-Guajardo, M ; Quinto-Sanchez, M ; Mendoza-Revilla, J ; Chacon-Duque, JC ; Acuna-Alonzo, V ; Jaramillo, C ; Arias, W ; Barquera Lozano, R ; Macin Perez, G ; Gomez-Valdes, J ; Villamil-Ramirez, H ; Hunemeier, T ; Ramallo, V ; Silva de Cerqueira, CC ; Hurtado, M ; Villegas, V ; Granja, V ; Gallo, C ; Poletti, G ; Schuler-Faccini, L ; Salzano, FM ; Bortolini, M-C ; Canizales-Quinteros, S ; Cheeseman, M ; Rosique, J ; Bedoya, G ; Rothhammer, F ; Headon, D ; Gonzalez-Jose, R ; Balding, D ; Ruiz-Linares, A (NATURE PUBLISHING GROUP, 2016-05)
    We report a genome-wide association scan for facial features in ∼6,000 Latin Americans. We evaluated 14 traits on an ordinal scale and found significant association (P values < 5 × 10⁻⁸) at single-nucleotide polymorphisms (SNPs) in four genomic regions for three nose-related traits: columella inclination (4q31), nose bridge breadth (6p21) and nose wing breadth (7p13 and 20p11). In a subsample of ∼3,000 individuals we obtained quantitative traits related to 9 of the ordinal phenotypes and, also, a measure of nasion position. Quantitative analyses confirmed the ordinal-based associations, identified SNPs in 2q12 associated with chin protrusion, and replicated the reported association of nasion position with SNPs in PAX3. Strongest association in 2q12, 4q31, 6p21 and 7p13 was observed for SNPs in the EDAR, DCHS2, RUNX2 and GLI3 genes, respectively. Associated SNPs in 20p11 extend to PAX1. Consistent with the effect of EDAR on chin protrusion, we documented alterations of mandible length in mice with modified Edar function.
  • Item
    Integrating dynamic mixed-effect modelling and penalized regression to explore genetic association with pharmacokinetics
    Bertrand, J ; De Iorio, M ; Balding, DJ (LIPPINCOTT WILLIAMS & WILKINS, 2015-05)
    CONTEXT: In a previous work, we have shown that penalized regression approaches can allow many genetic variants to be incorporated into sophisticated pharmacokinetic (PK) models in a way that is both computationally and statistically efficient. The phenotypes were the individual model parameter estimates, obtained a posteriori of the model fit and known to be sensitive to the study design. OBJECTIVE: The aim of this study was to propose an integrated approach in which genetic effect sizes are estimated simultaneously with the PK model parameters, which should improve the estimate precision and reduce sensitivity to study design. METHODS: A total of 200 data sets were simulated under the null and each of the following three alternative scenarios: (i) a phase II study with N=300 participants and n=6 sampling times, wherein six unobserved causal variants affect the drug elimination clearance; (ii) the addition of participants with a residual concentration collected in clinical routine (N=300, n=6 plus N=700, n=1); and (iii) a phase II study (N=300, n=6) in which four unobserved causal variants affect two different model parameters. RESULTS: In all scenarios the integrated approach detected fewer false positives. In scenario (i), true-positive rates were low and the stepwise procedure outperformed the integrated approach. In scenario (ii), approaches performed similarly and rates were higher. In scenario (iii), the integrated approach outperformed the stepwise procedure. CONCLUSION: A PK phase II study with N=300 lacks the power to detect genetic effects on PK using genetic arrays. Our approach can simultaneously analyse phase II and clinical routine data and identify when genetic variants affect multiple PK parameters.
  • Item
    Evidence for a Common Origin of Blacksmiths and Cultivators in the Ethiopian Ari within the Last 4500 Years: Lessons for Clustering-Based Inference
    van Dorp, L ; Balding, D ; Myers, S ; Pagani, L ; Tyler-Smith, C ; Bekele, E ; Tarekegn, A ; Thomas, MG ; Bradman, N ; Hellenthal, G ; Di Rienzo, A (PUBLIC LIBRARY SCIENCE, 2015-08)
    The Ari peoples of Ethiopia are comprised of different occupational groups that can be distinguished genetically, with Ari Cultivators and the socially marginalised Ari Blacksmiths recently shown to have a similar level of genetic differentiation between them (FST ≈ 0.023 - 0.04) as that observed among multiple ethnic groups sampled throughout Ethiopia. Anthropologists have proposed two competing theories to explain the origins of the Ari Blacksmiths as (i) remnants of a population that inhabited Ethiopia prior to the arrival of agriculturists (e.g. Cultivators), or (ii) relatively recently related to the Cultivators but presently marginalized in the community due to their trade. Two recent studies by different groups analysed genome-wide DNA from samples of Ari Blacksmiths and Cultivators and suggested that genetic patterns between the two groups were more consistent with model (i) and subsequent assimilation of the indigenous peoples into the expanding agriculturalist community. We analysed the same samples using approaches designed to attenuate signals of genetic differentiation that are attributable to allelic drift within a population. By doing so, we provide evidence that the genetic differences between Ari Blacksmiths and Cultivators can be entirely explained by bottleneck effects consistent with hypothesis (ii). This finding both serves as a cautionary tale about interpreting results from unsupervised clustering algorithms and suggests that social constructions are contributing directly to genetic differentiation over a relatively short time period among previously genetically similar groups.
  • Item
    A genome-wide association study identifies multiple loci for variation in human ear morphology
    Adhikari, K ; Reales, G ; Smith, AJP ; Konka, E ; Palmen, J ; Quinto-Sanchez, M ; Acuna-Alonzo, V ; Jaramillo, C ; Arias, W ; Fuentes, M ; Pizarro, M ; Barquera Lozano, R ; Macin Perez, G ; Gomez-Valdes, J ; Villamil-Ramirez, H ; Hunemeier, T ; Ramallo, V ; Silva de Cerqueira, CC ; Hurtado, M ; Villegas, V ; Granja, V ; Gallo, C ; Poletti, G ; Schuler-Faccini, L ; Salzano, FM ; Bortolini, M-C ; Canizales-Quinteros, S ; Rothhammer, F ; Bedoya, G ; Calderon, R ; Rosique, J ; Cheeseman, M ; Bhutta, MF ; Humphries, SE ; Gonzalez-Jose, R ; Headon, D ; Balding, D ; Ruiz-Linares, A (NATURE PUBLISHING GROUP, 2015-06)
    Here we report a genome-wide association study for non-pathological pinna morphology in over 5,000 Latin Americans. We find genome-wide significant association at seven genomic regions affecting: lobe size and attachment, folding of antihelix, helix rolling, ear protrusion and antitragus size (linear regression P values 2 × 10⁻⁸ to 3 × 10⁻¹⁴). Four traits are associated with a functional variant in the Ectodysplasin A receptor (EDAR) gene, a key regulator of embryonic skin appendage development. We confirm expression of Edar in the developing mouse ear and that Edar-deficient mice have an abnormally shaped pinna. Two traits are associated with SNPs in a region overlapping the T-Box Protein 15 (TBX15) gene, a major determinant of mouse skeletal development. Strongest association in this region is observed for SNP rs17023457 located in an evolutionarily conserved binding site for the transcription factor Cartilage paired-class homeoprotein 1 (CART1), and we confirm that rs17023457 alters in vitro binding of CART1.
  • Item
    Increased Population Risk of AIP-Related Acromegaly and Gigantism in Ireland
    Radian, S ; Diekmann, Y ; Gabrovska, P ; Holland, B ; Bradley, L ; Wallace, H ; Stals, K ; Bussell, A-M ; McGurren, K ; Cuesta, M ; Ryan, AW ; Herincs, M ; Hernandez-Ramirez, LC ; Holland, A ; Samuels, J ; Aflorei, ED ; Barry, S ; Denes, J ; Pernicova, I ; Stiles, CE ; Trivellin, G ; McCloskey, R ; Ajzensztejn, M ; Abid, N ; Akker, SA ; Mercado, M ; Cohen, M ; Thakker, RV ; Baldeweg, S ; Barkan, A ; Musat, M ; Levy, M ; Orme, SM ; Unterlaender, M ; Burger, J ; Kumar, AV ; Ellard, S ; McPartlin, J ; McManus, R ; Linden, GJ ; Atkinson, B ; Balding, DJ ; Agha, A ; Thompson, CJ ; Hunter, SJ ; Thomas, MG ; Morrison, PJ ; Korbonits, M (WILEY-BLACKWELL, 2017-01)
    The aryl hydrocarbon receptor interacting protein (AIP) founder mutation R304* (or p.R304* ; NM_003977.3:c.910C>T, p.Arg304Ter) identified in Northern Ireland (NI) predisposes to acromegaly/gigantism; its population health impact remains unexplored. We measured R304* carrier frequency in 936 Mid Ulster, 1,000 Greater Belfast (both in NI) and 2,094 Republic of Ireland (ROI) volunteers and in 116 NI or ROI acromegaly/gigantism patients. Carrier frequencies were 0.0064 in Mid Ulster (95%CI = 0.0027-0.013; P = 0.0005 vs. ROI), 0.001 in Greater Belfast (0.00011-0.0047) and zero in ROI (0-0.0014). R304* prevalence was elevated in acromegaly/gigantism patients in NI (11/87, 12.6%, P < 0.05), but not in ROI (2/29, 6.8%) versus non-Irish patients (0-2.41%). Haploblock conservation supported a common ancestor for all the 18 identified Irish pedigrees (81 carriers, 30 affected). Time to most recent common ancestor (tMRCA) was 2550 (1,275-5,000) years. tMRCA-based simulations predicted 432 (90-5,175) current carriers, including 86 affected (18-1,035) for 20% penetrance. In conclusion, R304* is frequent in Mid Ulster, resulting in numerous acromegaly/gigantism cases. tMRCA is consistent with historical/folklore accounts of Irish giants. Forward simulations predict many undetected carriers; geographically targeted population screening improves asymptomatic carrier identification, complementing clinical testing of patients/relatives. We generated disease awareness locally, necessary for early diagnosis and improved outcomes of AIP-related disease.
  • Item
    A genome-wide association scan in admixed Latin Americans identifies loci influencing facial and scalp hair features
    Adhikari, K ; Fontanil, T ; Cal, S ; Mendoza-Revilla, J ; Fuentes-Guajardo, M ; Chacon-Duque, J-C ; Al-Saadi, F ; Johansson, JA ; Quinto-Sanchez, M ; Acuna-Alonzo, V ; Jaramillo, C ; Arias, W ; Barquera Lozano, R ; Macin Perez, G ; Gomez-Valdes, J ; Villamil-Ramirez, H ; Hunemeier, T ; Ramallo, V ; Silva de Cerqueira, CC ; Hurtado, M ; Villegas, V ; Granja, V ; Gallo, C ; Poletti, G ; Schuler-Faccini, L ; Salzano, FM ; Bortolini, M-C ; Canizales-Quinteros, S ; Rothhammer, F ; Bedoya, G ; Gonzalez-Jose, R ; Headon, D ; Lopez-Otin, C ; Tobin, DJ ; Balding, D ; Ruiz-Linares, A (NATURE PORTFOLIO, 2016-03)
    We report a genome-wide association scan in over 6,000 Latin Americans for features of scalp hair (shape, colour, greying, balding) and facial hair (beard thickness, monobrow, eyebrow thickness). We found 18 signals of association reaching genome-wide significance (P values 5 × 10⁻⁸ to 3 × 10⁻¹¹⁹), including 10 novel associations. These include novel loci for scalp hair shape and balding, and the first reported loci for hair greying, monobrow, eyebrow and beard thickness. A newly identified locus influencing hair shape includes a Q30R substitution in the Protease Serine S1 family member 53 (PRSS53). We demonstrate that this enzyme is highly expressed in the hair follicle, especially the inner root sheath, and that the Q30R substitution affects enzyme processing and secretion. The genome regions associated with hair features are enriched for signals of selection, consistent with proposals regarding the evolution of human hair.