School of Mathematics and Statistics - Research Publications

  • Item
    Optimizing genomic medicine in epilepsy through a gene-customized approach to missense variant interpretation
    Traynelis, J ; Silk, M ; Wang, Q ; Berkovic, SF ; Liu, L ; Ascher, DB ; Balding, DJ ; Petrovski, S (COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT, 2017-10)
    Gene panel and exome sequencing have revealed a high rate of molecular diagnoses among diseases where the genetic architecture has proven suitable for sequencing approaches, with a large number of distinct and highly penetrant causal variants identified among a growing list of disease genes. The challenge is, given the DNA sequence of a new patient, to distinguish disease-causing from benign variants. Large samples of human standing variation data highlight regional variation in the tolerance to missense variation within the protein-coding sequence of genes. This information is not well captured by existing bioinformatic tools, but is effective in improving variant interpretation. To address this limitation in existing tools, we introduce the missense tolerance ratio (MTR), which summarizes available human standing variation data within genes to encapsulate population level genetic variation. We find that patient-ascertained pathogenic variants preferentially cluster in low MTR regions (P < 0.005) of well-informed genes. By evaluating 20 publicly available predictive tools across genes linked to epilepsy, we also highlight the importance of understanding the empirical null distribution of existing prediction tools, as these vary across genes. Subsequently integrating the MTR with the empirically selected bioinformatic tools in a gene-specific approach demonstrates a clear improvement in the ability to predict pathogenic missense variants from background missense variation in disease genes. Among an independent test sample of case and control missense variants, case variants (0.83 median score) consistently achieve higher pathogenicity prediction probabilities than control variants (0.02 median score; Mann-Whitney U test, P < 1 × 10⁻¹⁶). We focus on the application to epilepsy genes; however, the framework is applicable to disease genes beyond epilepsy.
  • Item
    Reevaluation of SNP heritability in complex human traits
    Speed, D ; Cai, N ; Johnson, MR ; Nejentsev, S ; Balding, DJ (NATURE PUBLISHING GROUP, 2017-07)
    SNP heritability, the proportion of phenotypic variance explained by SNPs, has been reported for many hundreds of traits. Its estimation requires strong prior assumptions about the distribution of heritability across the genome, but current assumptions have not been thoroughly tested. By analyzing imputed data for a large number of human traits, we empirically derive a model that more accurately describes how heritability varies with minor allele frequency (MAF), linkage disequilibrium (LD) and genotype certainty. Across 19 traits, our improved model leads to estimates of common SNP heritability on average 43% (s.d. 3%) higher than those obtained from the widely used software GCTA and 25% (s.d. 2%) higher than those from the recently proposed extension GCTA-LDMS. Previously, DNase I hypersensitivity sites were reported to explain 79% of SNP heritability; using our improved heritability model, their estimated contribution is only 24%.
  • Item
    Increased Population Risk of AIP-Related Acromegaly and Gigantism in Ireland
    Radian, S ; Diekmann, Y ; Gabrovska, P ; Holland, B ; Bradley, L ; Wallace, H ; Stals, K ; Bussell, A-M ; McGurren, K ; Cuesta, M ; Ryan, AW ; Herincs, M ; Hernandez-Ramirez, LC ; Holland, A ; Samuels, J ; Aflorei, ED ; Barry, S ; Denes, J ; Pernicova, I ; Stiles, CE ; Trivellin, G ; McCloskey, R ; Ajzensztejn, M ; Abid, N ; Akker, SA ; Mercado, M ; Cohen, M ; Thakker, RV ; Baldeweg, S ; Barkan, A ; Musat, M ; Levy, M ; Orme, SM ; Unterlaender, M ; Burger, J ; Kumar, AV ; Ellard, S ; McPartlin, J ; McManus, R ; Linden, GJ ; Atkinson, B ; Balding, DJ ; Agha, A ; Thompson, CJ ; Hunter, SJ ; Thomas, MG ; Morrison, PJ ; Korbonits, M (WILEY-BLACKWELL, 2017-01)
    The aryl hydrocarbon receptor interacting protein (AIP) founder mutation R304* (or p.R304*; NM_003977.3:c.910C>T, p.Arg304Ter) identified in Northern Ireland (NI) predisposes to acromegaly/gigantism; its population health impact remains unexplored. We measured R304* carrier frequency in 936 Mid Ulster, 1,000 Greater Belfast (both in NI) and 2,094 Republic of Ireland (ROI) volunteers and in 116 NI or ROI acromegaly/gigantism patients. Carrier frequencies were 0.0064 in Mid Ulster (95% CI = 0.0027-0.013; P = 0.0005 vs. ROI), 0.001 in Greater Belfast (0.00011-0.0047) and zero in ROI (0-0.0014). R304* prevalence was elevated in acromegaly/gigantism patients in NI (11/87, 12.6%, P < 0.05), but not in ROI (2/29, 6.8%) versus non-Irish patients (0-2.41%). Haploblock conservation supported a common ancestor for all the 18 identified Irish pedigrees (81 carriers, 30 affected). Time to most recent common ancestor (tMRCA) was 2,550 (1,275-5,000) years. tMRCA-based simulations predicted 432 (90-5,175) current carriers, including 86 affected (18-1,035) for 20% penetrance. In conclusion, R304* is frequent in Mid Ulster, resulting in numerous acromegaly/gigantism cases. tMRCA is consistent with historical/folklore accounts of Irish giants. Forward simulations predict many undetected carriers; geographically targeted population screening improves asymptomatic carrier identification, complementing clinical testing of patients/relatives. We generated disease awareness locally, necessary for early diagnosis and improved outcomes of AIP-related disease.
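The affected counts quoted in this abstract appear to follow from multiplying each predicted carrier count by the stated 20% penetrance. The short Python sketch below reproduces that arithmetic; the function name `expected_affected` is illustrative, and rounding to the nearest whole person is an assumption about how the figures were reported, not something the abstract states.

```python
def expected_affected(carriers: int, penetrance: float) -> int:
    """Expected number of affected carriers, rounded to a whole person."""
    return round(carriers * penetrance)


# Predicted current carriers quoted in the abstract: point estimate and range bounds.
predicted_carriers = {"estimate": 432, "lower": 90, "upper": 5175}
PENETRANCE = 0.20  # penetrance value stated in the abstract

# Apply the same penetrance to the point estimate and to both bounds.
affected = {
    label: expected_affected(count, PENETRANCE)
    for label, count in predicted_carriers.items()
}
print(affected)
```

Under these assumptions the sketch recovers the abstract's figures of 86 affected, with a range of 18 to 1,035.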
  • Item
    In-frame seven amino-acid duplication in AIP arose over the last 3000 years, disrupts protein interaction and stability and is associated with gigantism
    Salvatori, R ; Radian, S ; Diekmann, Y ; Iacovazzo, D ; David, A ; Gabrovska, P ; Grassi, G ; Bussell, A-M ; Stals, K ; Weber, A ; Quinton, R ; Crowne, EC ; Corazzini, V ; Metherell, L ; Kearney, T ; Du Plessis, D ; Sinha, AK ; Baborie, A ; Lecoq, A-L ; Chanson, P ; Ansorge, O ; Ellard, S ; Trainer, PJ ; Balding, D ; Thomas, MG ; Korbonits, M (BIOSCIENTIFICA LTD, 2017-09)
    OBJECTIVE: Mutations in the aryl hydrocarbon receptor-interacting protein (AIP) gene are associated with pituitary adenoma, acromegaly and gigantism. Identical alleles in unrelated pedigrees could be inherited from a common ancestor or result from recurrent mutation events. DESIGN AND METHODS: Observational, inferential and experimental study, including: AIP mutation testing; reconstruction of 14 AIP-region (8.3 Mbp) haplotypes; coalescent-based approximate Bayesian estimation of the time to most recent common ancestor (tMRCA) of the derived allele; forward population simulations to estimate current number of allele carriers; proposal of mutation mechanism; protein structure predictions; co-immunoprecipitation and cycloheximide chase experiments. RESULTS: Nine European-origin, unrelated c.805_825dup-positive pedigrees (four familial, five sporadic from the UK, USA and France) included 16 affected (nine gigantism/four acromegaly/two non-functioning pituitary adenoma patients and one prospectively diagnosed acromegaly patient) and nine unaffected carriers. All pedigrees shared a 2.79 Mbp haploblock around AIP with additional haploblocks privately shared between subsets of the pedigrees, indicating the existence of an evolutionarily recent common ancestor, the 'English founder', with an estimated median tMRCA of 47 generations (corresponding to 1175 years) with a confidence interval (9-113 generations, equivalent to 225-2825 years). The mutation occurred in a small tandem repeat region predisposed to slipped strand mispairing. The resulting seven amino-acid duplication disrupts interaction with HSP90 and leads to a marked reduction in protein stability. CONCLUSIONS: The c.805_825dup allele, originating from a common ancestor, associates with a severe clinical phenotype and a high frequency of gigantism. The mutation is likely to be the result of slipped strand mispairing and affects protein-protein interactions and AIP protein stability.
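The generation-to-year conversions in this abstract are consistent with a generation time of 25 years (1175 / 47 = 25). That value is inferred from the quoted numbers and is not stated in the abstract. The Python sketch below shows the conversion under that assumption, with illustrative names throughout.

```python
# Assumption: 25 years per generation, inferred from 1175 years / 47 generations.
YEARS_PER_GENERATION = 25


def generations_to_years(generations: int,
                         years_per_generation: int = YEARS_PER_GENERATION) -> int:
    """Convert a time measured in generations to calendar years."""
    return generations * years_per_generation


# tMRCA values quoted in the abstract, in generations.
tmrca_generations = {"median": 47, "lower": 9, "upper": 113}

tmrca_years = {
    label: generations_to_years(g) for label, g in tmrca_generations.items()
}
print(tmrca_years)
```

With a 25-year generation time, the median and interval bounds map to 1175, 225 and 2825 years, matching the figures in the abstract.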
  • Item
    How convincing is a matching Y-chromosome profile?
    Andersen, MM ; Balding, DJ ; de Knijff, P (PUBLIC LIBRARY SCIENCE, 2017-11)
    The introduction of forensic autosomal DNA profiles was controversial, but the problems were successfully addressed, and DNA profiling has gone on to revolutionise forensic science. Y-chromosome profiles are valuable when there is a mixture of male-source and female-source DNA, and interest centres on the identity of the male source(s) of the DNA. The problem of evaluating evidential weight is even more challenging for Y profiles than for autosomal profiles. Numerous approaches have been proposed, but they fail to deal adequately with the fact that men with matching Y-profiles are related in extended patrilineal clans, many of which may not be represented in available databases. The higher mutation rates of modern profiling kits have led to increased discriminatory power but they have also exacerbated the problem of fairly conveying evidential value. Because the relevant population is difficult to define, yet the number of matching relatives is fixed as population size varies, it is typically infeasible to derive population-based match probabilities relevant to a specific crime. We propose a conceptually simple solution, based on a simulation model and software to approximate the distribution of the number of males with a matching Y profile. We show that this distribution is robust to different values for the variance in reproductive success and the population growth rate. We also use importance sampling reweighting to derive the distribution of the number of matching males conditional on a database frequency, finding that this conditioning typically has only a modest impact. We illustrate the use of our approach to quantify the value of Y profile evidence for a court in a way that is both scientifically valid and easily comprehensible by a judge or juror.
  • Item
    Latin Americans show wide-spread Converso ancestry and imprint of local Native ancestry on physical appearance
    Chacon-Duque, J-C ; Adhikari, K ; Fuentes-Guajardo, M ; Mendoza-Revilla, J ; Acuna-Alonzo, V ; Barquera, R ; Quinto-Sanchez, M ; Gomez-Valdes, J ; Everardo Martinez, P ; Villamil-Ramirez, H ; Hunemeier, T ; Ramallo, V ; Silva de Cerqueira, CC ; Hurtado, M ; Villegas, V ; Granja, V ; Villena, M ; Vasquez, R ; Llop, E ; Sandoval, JR ; Salazar-Granara, AA ; Parolin, M-L ; Sandoval, K ; Penaloza-Espinosa, RI ; Rangel-Villalobos, H ; Winkler, CA ; Klitz, W ; Bravi, C ; Molina, J ; Corach, D ; Barrantes, R ; Gomes, V ; Resende, C ; Gusmao, L ; Amorim, A ; Xue, Y ; Dugoujon, J-M ; Moral, P ; Gonzalez-Jose, R ; Schuler-Faccini, L ; Salzano, FM ; Bortolini, M-C ; Canizales-Quinteros, S ; Poletti, G ; Gallo, C ; Bedoya, G ; Rothhammer, F ; Balding, D ; Hellenthal, G ; Ruiz-Linares, A (NATURE PORTFOLIO, 2018-12-19)
    Historical records and genetic analyses indicate that Latin Americans trace their ancestry mainly to the intermixing (admixture) of Native Americans, Europeans and Sub-Saharan Africans. Using novel haplotype-based methods, here we infer sub-continental ancestry in over 6,500 Latin Americans and evaluate the impact of regional ancestry variation on physical appearance. We find that Native American ancestry components in Latin Americans correspond geographically to the present-day genetic structure of Native groups, and that sources of non-Native ancestry, and admixture timings, match documented migratory flows. We also detect South/East Mediterranean ancestry across Latin America, probably stemming mostly from the clandestine colonial migration of Christian converts of non-European origin (Conversos). Furthermore, we find that ancestry related to highland (Central Andean) versus lowland (Mapuche) Natives is associated with variation in facial features, particularly nose morphology, and detect significant differences in allele frequencies between these groups at loci previously associated with nose morphology in this sample.
  • Item
    How many individuals share a mitochondrial genome?
    Andersen, MM ; Balding, DJ ; Weir, B (PUBLIC LIBRARY SCIENCE, 2018-11)
    Mitochondrial DNA (mtDNA) is useful to assist with identification of the source of a biological sample, or to confirm matrilineal relatedness. Although the autosomal genome is much larger, mtDNA has an advantage for forensic applications of multiple copy number per cell, allowing better recovery of sequence information from degraded samples. In addition, biological samples such as fingernails, old bones, teeth and hair have mtDNA but little or no autosomal DNA. The relatively low mutation rate of the mitochondrial genome (mitogenome) means that there can be large sets of matrilineal-related individuals sharing a common mitogenome. Here we present the mitolina simulation software that we use to describe the distribution of the number of mitogenomes in a population that match a given mitogenome, and investigate its dependence on population size and growth rate, and on a database count of the mitogenome. Further, we report on the distribution of the number of meioses separating pairs of individuals with matching mitogenome. Our results have important implications for assessing the weight of mtDNA profile evidence in forensic science, but mtDNA analysis has many non-human applications, for example in tracking the source of ivory. Our methods and software can also be used for simulations to help validate models of population history in human or non-human populations.
  • Item
    Genome-wide mega-analysis identifies 16 loci and highlights diverse biological mechanisms in the common epilepsies
    Abou-Khalil, B ; Auce, P ; Avbersek, A ; Bahlo, M ; Balding, DJ ; Bast, T ; Baum, L ; Becker, AJ ; Becker, F ; Berghuis, B ; Berkovic, SF ; Boysen, KE ; Bradfield, JP ; Brody, LC ; Buono, RJ ; Campbell, E ; Cascino, GD ; Catarino, CB ; Cavalleri, GL ; Cherny, SS ; Chinthapalli, K ; Coffey, AJ ; Compston, A ; Coppola, A ; Cossette, P ; Craig, JJ ; de Haan, G-J ; De Jonghe, P ; de Kovel, CGF ; Delanty, N ; Depondt, C ; Devinsky, O ; Dlugos, DJ ; Doherty, CP ; Elger, CE ; Eriksson, JG ; Ferraro, TN ; Feucht, M ; Francis, B ; Franke, A ; French, JA ; Freytag, S ; Gaus, V ; Geller, EB ; Gieger, C ; Glauser, T ; Glynn, S ; Goldstein, DB ; Gui, H ; Guo, Y ; Haas, KF ; Hakonarson, H ; Hallmann, K ; Haut, S ; Heinzen, EL ; Helbig, I ; Hengsbach, C ; Hjalgrim, H ; Iacomino, M ; Ingason, A ; Jamnadas-Khoda, J ; Johnson, MR ; Kalviainen, R ; Kantanen, A-M ; Kasperaviciute, D ; Trenite, DK-N ; Kirsch, HE ; Knowlton, RC ; Koeleman, BPC ; Krause, R ; Krenn, M ; Kunz, WS ; Kuzniecky, R ; Kwan, P ; Lal, D ; Lau, Y-L ; Lehesjoki, A-E ; Lerche, H ; Leu, C ; Lieb, W ; Lindhout, D ; Lo, WD ; Lopes-Cendes, I ; Lowenstein, DH ; Malovini, A ; Marson, AG ; Mayer, T ; McCormack, M ; Mills, JL ; Mirza, N ; Moerzinger, M ; Moller, RS ; Molloy, AM ; Muhle, H ; Newton, M ; Ng, P-W ; Noethen, MM ; Nuernberg, P ; O'Brien, TJ ; Oliver, KL ; Palotie, A ; Pangilinan, F ; Peter, S ; Petrovski, S ; Poduri, A ; Privitera, M ; Radtke, R ; Rau, S ; Reif, PS ; Reinthaler, EM ; Rosenow, F ; Sander, JW ; Sander, T ; Scattergood, T ; Schachter, SC ; Schankin, CJ ; Scheffer, IE ; Schmitz, B ; Schoch, S ; Sham, PC ; Shih, JJ ; Sills, GJ ; Sisodiya, SM ; Slattery, L ; Smith, A ; Smith, DF ; Smith, MC ; Smith, PE ; Sonsma, ACM ; Speed, D ; Sperling, MR ; Steinhoff, BJ ; Stephani, U ; Stevelink, R ; Strauch, K ; Striano, P ; Stroink, H ; Surges, R ; Tan, KM ; Thio, LL ; Thomas, GN ; Todaro, M ; Tozzi, R ; Vari, MS ; Vining, EPG ; Visscher, F ; von Spiczak, S ; Walley, NM ; Weber, YG ; Wei, Z ; Weisenberg, J ; Whelan, CD ; Widdess-Walsh, P ; Wolff, M ; Wolking, S ; Yang, W ; Zara, F ; Zimprich, F (NATURE PUBLISHING GROUP, 2018-12-10)
    The epilepsies affect around 65 million people worldwide and have a substantial missing heritability component. We report a genome-wide mega-analysis involving 15,212 individuals with epilepsy and 29,677 controls, which reveals 16 genome-wide significant loci, of which 11 are novel. Using various prioritization criteria, we pinpoint the 21 most likely epilepsy genes at these loci, with the majority in genetic generalized epilepsies. These genes have diverse biological functions, including coding for ion-channel subunits, transcription factors and a vitamin-B6 metabolism enzyme. Converging evidence shows that the common variants associated with epilepsy play a role in epigenetic regulation of gene expression in the brain. The results show an enrichment for monogenic epilepsy genes as well as known targets of antiepileptic drugs. Using SNP-based heritability analyses we disentangle both the unique and overlapping genetic basis to seven different epilepsy subtypes. Together, these findings provide leads for epilepsy therapies based on underlying pathophysiology.