School of Mathematics and Statistics - Research Publications

  • Prediction of eye, hair and skin colour in Latin Americans
    Palmal, S ; Adhikari, K ; Mendoza-Revilla, J ; Fuentes-Guajardo, M ; de Cerqueira, CCS ; Bonfante, B ; Chacon-Duque, JC ; Sohail, A ; Hurtado, M ; Villegas, V ; Granja, V ; Jaramillo, C ; Arias, W ; Barquera Lozano, R ; Everardo-Martinez, P ; Gomez-Valdes, J ; Villamil-Ramirez, H ; Hunemeier, T ; Ramallo, V ; Parolin, M-L ; Gonzalez-Jose, R ; Schuler-Faccini, L ; Bortolini, M-C ; Acuna-Alonzo, V ; Canizales-Quinteros, S ; Gallo, C ; Poletti, G ; Bedoya, G ; Rothhammer, F ; Balding, D ; Faux, P ; Ruiz-Linares, A (ELSEVIER IRELAND LTD, 2021-07)
    Here we evaluate the accuracy of prediction for eye, hair and skin pigmentation in a dataset of more than 6,500 individuals from Mexico, Colombia, Peru, Chile and Brazil (including genome-wide SNP data and quantitative/categorical pigmentation phenotypes; the CANDELA dataset, hereafter CAN). We evaluated accuracy in relation to different analytical methods and various phenotypic predictors. As expected from statistical principles, we observe that quantitative traits are more sensitive to changes in the prediction models than categorical traits. We find that Random Forest and Linear Regression are generally the best-performing methods. We also compare the prediction accuracy of SNP sets defined in the CAN dataset (including 56, 101 and 120 SNPs for eye, hair and skin colour prediction, respectively) to the well-established HIrisPlex-S SNP set (including 6, 22 and 36 SNPs for eye, hair and skin colour prediction, respectively). When training prediction models on the CAN data, we observe remarkably similar performances for HIrisPlex-S and the larger CAN SNP sets for the prediction of hair colour (categorical) and eye colour (both categorical and quantitative), while the CAN sets outperform HIrisPlex-S for quantitative, but not for categorical, skin pigmentation prediction. The performance of HIrisPlex-S when models are trained on a worldwide sample (albeit one that is 80% European; https://hirisplex.erasmusmc.nl) is lower than when they are trained on the CAN data (particularly for hair and skin colour). Altogether, our observations are consistent with common variation in eye and hair colour having a relatively simple genetic architecture, which is well captured by HIrisPlex-S, even in admixed Latin Americans (with partial European ancestry). By contrast, since skin pigmentation is a more polygenic trait, accuracy is more sensitive to the size of the prediction SNP set, although here this effect was only apparent for a quantitative measure of skin pigmentation.
Our results support the use of HIrisPlex-S in the prediction of categorical pigmentation traits for forensic purposes in Latin America, while illustrating the impact of training datasets on its accuracy.
  • Retraction of a peer reviewed article suggests ongoing problems with Australian forensic science.
    Brook, C ; Lynøe, N ; Eriksson, A ; Balding, D (Elsevier BV, 2021)
    We describe events arising from the case of Joby Rowe, convicted of the homicide of his three-month-old daughter, and explore what they illustrate about systemic problems in the forensic science community in Australia. A peer-reviewed journal article that scrutinized the forensic evidence presented in the Rowe case was retracted by a forensic science journal for reasons unrelated to quality or accuracy, under pressure from forensic medical experts criticized in the article. Details of the retraction obtained through freedom of information mechanisms reveal improper pressure and subversion of publishing processes in order to avoid scrutiny. The retraction was supported by the editorial board and two Australian forensic science societies, which is indicative of serious deficiencies in the leadership of forensic science in Australia. We propose paths forward, including blind peer review, publication of expert reports, and a criminal cases review authority, that would help stimulate a culture that encourages scrutiny and relies on evidence-based rather than eminence-based knowledge.
  • Optimizing genomic medicine in epilepsy through a gene-customized approach to missense variant interpretation
    Traynelis, J ; Silk, M ; Wang, Q ; Berkovic, SF ; Liu, L ; Ascher, DB ; Balding, DJ ; Petrovski, S (COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT, 2017-10)
    Gene panel and exome sequencing have revealed a high rate of molecular diagnoses among diseases where the genetic architecture has proven suitable for sequencing approaches, with a large number of distinct and highly penetrant causal variants identified among a growing list of disease genes. The challenge is, given the DNA sequence of a new patient, to distinguish disease-causing from benign variants. Large samples of human standing variation data highlight regional variation in the tolerance to missense variation within the protein-coding sequence of genes. This information is not well captured by existing bioinformatic tools, but is effective in improving variant interpretation. To address this limitation in existing tools, we introduce the missense tolerance ratio (MTR), which summarizes available human standing variation data within genes to encapsulate population level genetic variation. We find that patient-ascertained pathogenic variants preferentially cluster in low MTR regions (P < 0.005) of well-informed genes. By evaluating 20 publicly available predictive tools across genes linked to epilepsy, we also highlight the importance of understanding the empirical null distribution of existing prediction tools, as these vary across genes. Subsequently integrating the MTR with the empirically selected bioinformatic tools in a gene-specific approach demonstrates a clear improvement in the ability to predict pathogenic missense variants from background missense variation in disease genes. Among an independent test sample of case and control missense variants, case variants (0.83 median score) consistently achieve higher pathogenicity prediction probabilities than control variants (0.02 median score; Mann-Whitney U test, P < 1 × 10⁻¹⁶). We focus on the application to epilepsy genes; however, the framework is applicable to disease genes beyond epilepsy.
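The abstract does not define how the MTR is computed. Below is a minimal sketch that assumes the commonly described form. In that form, the observed proportion of missense variants among all (missense plus synonymous) variants in a sliding window of codons is divided by the proportion expected from the number of possible missense and synonymous changes in the same window. The function name `missense_tolerance_ratio`, the 31-codon default window and the input layout are illustrative assumptions, not the published implementation.

```python
from typing import List, Optional, Sequence


def missense_tolerance_ratio(
    obs_missense: Sequence[int],
    obs_synonymous: Sequence[int],
    exp_missense: Sequence[float],
    exp_synonymous: Sequence[float],
    window: int = 31,
) -> List[Optional[float]]:
    """Sliding-window MTR per codon position (hypothetical helper).

    Each input holds one value per codon: observed variant counts from a
    population sample, and the number of possible missense/synonymous
    changes at that codon. MTR < 1 means fewer missense variants than
    expected, i.e. the window looks intolerant of missense change.
    """
    n = len(obs_missense)
    half = window // 2
    result: List[Optional[float]] = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        om = sum(obs_missense[lo:hi])
        os_ = sum(obs_synonymous[lo:hi])
        em = sum(exp_missense[lo:hi])
        es = sum(exp_synonymous[lo:hi])
        if om + os_ == 0 or em == 0:
            # No observed variation (or no possible missense change): undefined.
            result.append(None)
            continue
        observed_fraction = om / (om + os_)
        expected_fraction = em / (em + es)
        result.append(observed_fraction / expected_fraction)
    return result


# Toy gene of four codons: missense variation is seen only in the first half.
print(missense_tolerance_ratio([2, 2, 0, 0], [1, 1, 1, 1], [2.0] * 4, [1.0] * 4, window=1))
```

Windows with MTR well below 1, like the last two codons here, are the kind of missense-depleted ("low MTR") regions in which the abstract reports that pathogenic variants cluster.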
  • Assessing the Forensic Value of DNA Evidence from Y Chromosomes and Mitogenomes
    Andersen, MM ; Balding, DJ (MDPI, 2021-08)
    Y chromosome and mitochondrial DNA profiles have been used as evidence in courts for decades, yet the problem of evaluating the weight of evidence has not been adequately resolved. Both are lineage markers (inherited from just one parent), which presents different interpretation challenges compared with standard autosomal DNA profiles (inherited from both parents). We review approaches to the evaluation of lineage marker profiles for forensic identification, focussing on the key roles of profile mutation rate and relatedness (extending beyond known relatives). Higher mutation rates imply fewer individuals matching the profile of an alleged contributor, but those who do match will tend to be more closely related to the alleged contributor. This makes it challenging to evaluate the possibility that one of these matching individuals could be the true source, because relatives may be plausible alternative contributors, and may not be well mixed in the population. These issues reduce the usefulness of profile databases drawn from a broad population: larger populations can have a lower profile relative frequency because of lower relatedness with the alleged contributor. Many evaluation methods do not adequately take account of distant relatedness, but its effects have become more pronounced with the latest generation of high-mutation-rate Y profiles.
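The link between mutation rate and relatedness can be made concrete with a deliberately simplified model that is not taken from the review. Assume that each of L loci mutates independently with the same per-transmission rate μ, and that a relative separated by m meioses along the lineage matches only if no mutation occurred. The match probability is then (1 - μ)^(mL). When μ is large, this shrinks quickly with m, so matching individuals are concentrated among close paternal or maternal relatives. The helper name and the rates used below are illustrative assumptions. The sketch ignores back-mutation and the population-structure and database issues that the review discusses.

```python
def lineage_match_probability(meioses: int, n_loci: int, mu: float) -> float:
    """P(no mutation at any locus across `meioses` transmissions).

    Simplified, hypothetical model: loci mutate independently, each with
    per-transmission rate `mu`, and back-mutation is ignored.
    """
    if meioses < 0 or n_loci < 0 or not 0.0 <= mu <= 1.0:
        raise ValueError("invalid arguments")
    return (1.0 - mu) ** (meioses * n_loci)


# Illustrative per-locus rates (not estimates from the article) for a
# 20-locus profile, comparing relatives 1 to 10 meioses apart.
for mu in (0.002, 0.01):
    probs = [lineage_match_probability(m, n_loci=20, mu=mu) for m in range(1, 11)]
    print(mu, [round(p, 3) for p in probs])
```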
  • Summary statistic analyses can mistake confounding bias for heritability
    Holmes, JB ; Speed, D ; Balding, DJ (WILEY, 2019-12)
    Linkage disequilibrium SCore regression (LDSC) has become a popular approach to estimate confounding bias, heritability, and genetic correlation using only genome-wide association study (GWAS) test statistics. SumHer is a newly introduced alternative with similar aims. We show using theory and simulations that both approaches fail to adequately account for confounding bias, even when the assumed heritability model is correct. Consequently, these methods may estimate heritability poorly if there was an inadequate adjustment for confounding in the original GWAS analysis. We also show that the choice of a summary statistic for use in LDSC or SumHer can have a large impact on resulting inferences. Further, covariate adjustments in the original GWAS can alter the target of heritability estimation, which can be problematic for test statistics from a meta-analysis of GWAS with different covariate adjustments.
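LDSC regresses each SNP's association chi-squared statistic on its LD score. The standard LDSC model is not spelled out in this abstract. It assumes E[χ²_j] = N h² ℓ_j / M + N a + 1, where N is the GWAS sample size, M the number of SNPs, ℓ_j the LD score of SNP j and a a confounding term. Under that model, the slope estimates heritability and the intercept captures confounding. The minimal NumPy sketch below uses simulated, noise-free inputs (all values hypothetical) to show how the two quantities are read off an ordinary least-squares fit. Real LDSC software uses weighted regression and block-jackknife standard errors, which are omitted here.

```python
import numpy as np


def ld_score_regression(chi2, ld_scores, n, m):
    """Unweighted LD score regression (simplified sketch).

    Fits chi2_j = intercept + slope * ld_j by ordinary least squares and
    converts the slope to a heritability estimate via slope = N * h2 / M.
    Returns (h2_estimate, intercept); an intercept above 1 is read as
    confounding bias (or other non-polygenic inflation).
    """
    X = np.column_stack([np.ones_like(ld_scores), ld_scores])
    (intercept, slope), *_ = np.linalg.lstsq(X, chi2, rcond=None)
    return slope * m / n, intercept


# Hypothetical example: noise-free chi-squared means generated from the model.
rng = np.random.default_rng(0)
M, N, h2_true, a_true = 10_000, 50_000, 0.3, 2e-6
ld = rng.uniform(1.0, 200.0, size=M)
chi2 = N * h2_true * ld / M + N * a_true + 1.0
h2_hat, intercept_hat = ld_score_regression(chi2, ld, N, M)
print(round(h2_hat, 3), round(intercept_hat, 3))  # expected: about 0.3 and 1.1
```

This toy fit assumes the idealized case in which confounding shifts only the intercept. The abstract argues that, in practice, LDSC and SumHer do not adequately account for confounding, so part of it can end up in the heritability estimate.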
  • A GWAS in Latin Americans identifies novel face shape loci, implicating VPS13B and a Denisovan introgressed region in facial variation
    Bonfante, B ; Faux, P ; Navarro, N ; Mendoza-Revilla, J ; Dubied, M ; Montillot, C ; Wentworth, E ; Poloni, L ; Varon-Gonzalez, C ; Jones, P ; Xiong, Z ; Fuentes-Guajardo, M ; Palmal, S ; Chacon-Duque, JC ; Hurtado, M ; Villegas, V ; Granja, V ; Jaramillo, C ; Arias, W ; Barquera, R ; Everardo-Martinez, P ; Sanchez-Quinto, M ; Gomez-Valdes, J ; Villamil-Ramirez, H ; de Cerqueira, CCS ; Hunemeier, T ; Ramallo, V ; Liu, F ; Weinberg, SM ; Shaffer, JR ; Stergiakouli, E ; Howe, LJ ; Hysi, PG ; Spector, TD ; Gonzalez-Jose, R ; Schuler-Faccini, L ; Bortolini, M-C ; Acuna-Alonzo, V ; Canizales-Quinteros, S ; Gallo, C ; Poletti, G ; Bedoya, G ; Rothhammer, F ; Thauvin-Robinet, C ; Faivre, L ; Costedoat, C ; Balding, D ; Cox, T ; Kayser, M ; Duplomb, L ; Yalcin, B ; Cotney, J ; Adhikari, K ; Ruiz-Linares, A (AMER ASSOC ADVANCEMENT SCIENCE, 2021-02)
    To characterize the genetic basis of facial features in Latin Americans, we performed a genome-wide association study (GWAS) of more than 6000 individuals using 59 landmark-based measurements from two-dimensional profile photographs and ~9,000,000 genotyped or imputed single-nucleotide polymorphisms. We detected significant association of 32 traits with at least 1 (and up to 6) of 32 different genomic regions, more than doubling the number of robustly associated face morphology loci reported until now (from 11 to 23). These GWAS hits are strongly enriched in regulatory sequences active specifically during craniofacial development. The associated region in 1p12 includes a tract of archaic adaptive introgression, with a Denisovan haplotype common in Native Americans affecting particularly lip thickness. Among the nine previously unidentified face morphology loci we identified is the VPS13B gene region, and we show that variants in this region also affect midfacial morphology in mice.
  • Reevaluation of SNP heritability in complex human traits
    Speed, D ; Cai, N ; Johnson, MR ; Nejentsev, S ; Balding, DJ (NATURE PUBLISHING GROUP, 2017-07)
    SNP heritability, the proportion of phenotypic variance explained by SNPs, has been reported for many hundreds of traits. Its estimation requires strong prior assumptions about the distribution of heritability across the genome, but current assumptions have not been thoroughly tested. By analyzing imputed data for a large number of human traits, we empirically derive a model that more accurately describes how heritability varies with minor allele frequency (MAF), linkage disequilibrium (LD) and genotype certainty. Across 19 traits, our improved model leads to estimates of common SNP heritability on average 43% (s.d. 3%) higher than those obtained from the widely used software GCTA and 25% (s.d. 2%) higher than those from the recently proposed extension GCTA-LDMS. Previously, DNase I hypersensitivity sites were reported to explain 79% of SNP heritability; using our improved heritability model, their estimated contribution is only 24%.
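The abstract does not give the functional form of the improved heritability model. The minimal sketch below assumes the form usually associated with this work (the LDAK model). In that form, the expected heritability contributed by SNP j is proportional to [f_j(1 - f_j)]^(1+α) × w_j × r_j. Here f_j is the MAF, w_j an LD-based weight (lower for SNPs in high-LD regions), r_j an imputation-certainty (information) score, and α = -0.25. Setting α = -1, w_j = 1 and r_j = 1 gives the equal-per-SNP expectation implicit in standard GCTA analyses of standardized genotypes. The function name, MAFs, weights and information scores below are hypothetical.

```python
import numpy as np


def expected_heritability_shares(maf, ld_weight, info, alpha):
    """Relative expected heritability per SNP under a power-law MAF model.

    share_j is proportional to [f_j (1 - f_j)]^(1 + alpha) * w_j * r_j, and
    the shares are normalised to sum to 1.
    """
    maf = np.asarray(maf, dtype=float)
    raw = (maf * (1.0 - maf)) ** (1.0 + alpha) * np.asarray(ld_weight) * np.asarray(info)
    return raw / raw.sum()


# Hypothetical SNPs: one rare, two common; the third sits in a high-LD block
# (low LD weight) and is imperfectly imputed (info score 0.8).
maf = [0.01, 0.3, 0.3]
w = [1.0, 1.0, 0.2]
r = [1.0, 1.0, 0.8]
gcta_like = expected_heritability_shares(maf, [1, 1, 1], [1, 1, 1], alpha=-1.0)
ldak_like = expected_heritability_shares(maf, w, r, alpha=-0.25)
print(gcta_like.round(3))  # every SNP gets the same share
print(ldak_like.round(3))
```

Under these assumptions, the rare SNP and the high-LD, poorly imputed SNP receive much smaller expected shares than under the equal-share assumption. This illustrates how the choice of model can shift heritability estimates.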
  • The impact of low-cost, genome-wide resequencing on association studies.
    Balding, D (Springer Science and Business Media LLC, 2005-06)
  • Understanding complex traits: from farmers to pharmas
    Speed, D ; Balding, DJ (BIOMED CENTRAL LTD, 2012-07-27)
    A report on the 4th International Conference on Quantitative Genetics (ICQG4), Edinburgh, UK, June 17-22, 2012.
  • Worldwide FST Estimates Relative to Five Continental-Scale Populations
    Steele, CD ; Court, DS ; Balding, DJ (WILEY, 2014-11)
    We estimate the population genetics parameter FST (also referred to as the fixation index) from short tandem repeat (STR) allele frequencies, comparing many worldwide human subpopulations at approximately the national level with continental-scale populations. FST is commonly used to measure population differentiation, and is important in forensic DNA analysis to account for remote shared ancestry between a suspect and an alternative source of the DNA. We estimate FST comparing subpopulations with a hypothetical ancestral population, which is the approach most widely used in population genetics, and also compare a subpopulation with a sampled reference population, which is more appropriate for forensic applications. Both estimation methods are likelihood-based, in which FST is related to the variance of the multinomial-Dirichlet distribution for allele counts. Overall, we find low FST values, with posterior 97.5th percentiles below 3% when comparing a subpopulation with the most appropriate population, and even for inter-population comparisons we find FST < 5%. These are much smaller than single nucleotide polymorphism-based inter-continental FST estimates, and are also about half the magnitude of STR-based estimates from population genetics surveys that focus on distinct ethnic groups rather than a general population. Our findings support the use of FST up to 3% in forensic calculations, which is in line with some current practice.
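In forensic calculations, FST (often written θ) typically enters through the widely used Balding-Nichols single-locus match probabilities. These formulas are not given in this abstract; the minimal sketch below assumes them. For a homozygous genotype AA, P = [2θ + (1 - θ)p_A][3θ + (1 - θ)p_A] / [(1 + θ)(1 + 2θ)]. For a heterozygous genotype AB, P = 2[θ + (1 - θ)p_A][θ + (1 - θ)p_B] / [(1 + θ)(1 + 2θ)]. With θ = 0, these reduce to p_A² and 2p_A p_B. The profile and allele frequencies below are hypothetical.

```python
from typing import Optional


def match_probability(p_a: float, p_b: Optional[float], theta: float) -> float:
    """Balding-Nichols single-locus match probability.

    Pass p_b=None for a homozygous profile (genotype AA); otherwise the
    profile is heterozygous AB with allele frequencies p_a and p_b.
    """
    denom = (1.0 + theta) * (1.0 + 2.0 * theta)
    if p_b is None:
        return (2 * theta + (1 - theta) * p_a) * (3 * theta + (1 - theta) * p_a) / denom
    return 2 * (theta + (1 - theta) * p_a) * (theta + (1 - theta) * p_b) / denom


# Hypothetical three-locus profile: (p_a, p_b) per locus, None = homozygote.
profile = [(0.1, 0.2), (0.05, None), (0.15, 0.08)]
for theta in (0.0, 0.01, 0.03):
    total = 1.0
    for p_a, p_b in profile:
        # Per-locus probabilities are multiplied across loci, as is common practice.
        total *= match_probability(p_a, p_b, theta)
    print(f"theta={theta:.2f}: profile match probability {total:.3e}")
```

Raising θ from 0 to 0.03 increases each match probability, most noticeably for rare alleles. This is the sense in which a larger θ is conservative towards a suspect, and why an empirical upper value such as the 3% supported here matters in practice.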