School of Agriculture, Food and Ecosystem Sciences - Research Publications

  • Item
    Inferring Demography from Runs of Homozygosity in Whole-Genome Sequence, with Correction for Sequence Errors
    MacLeod, IM ; Larkin, DM ; Lewin, HA ; Hayes, BJ ; Goddard, ME (OXFORD UNIV PRESS, 2013-09)
    Whole-genome sequence is potentially the richest source of genetic data for inferring ancestral demography. However, full sequence also presents significant challenges to fully utilize such large data sets and to ensure that sequencing errors do not introduce bias into the inferred demography. Using whole-genome sequence data from two Holstein cattle, we demonstrate a new method to correct for bias caused by hidden errors and then infer stepwise changes in ancestral demography up to present. There was a strong upward bias in estimates of recent effective population size (Ne) if the correction method was not applied to the data, both for our method and the Li and Durbin (Inference of human population history from individual whole-genome sequences. Nature 475:493-496) pairwise sequentially Markovian coalescent method. To infer demography, we use an analytical predictor of multiloci linkage disequilibrium (LD) based on a simple coalescent model that allows for changes in Ne. The LD statistic summarizes the distribution of runs of homozygosity for any given demography. We infer a best fit demography as one that predicts a match with the observed distribution of runs of homozygosity in the corrected sequence data. We use multiloci LD because it potentially holds more information about ancestral demography than pairwise LD. The inferred demography indicates a strong reduction in the Ne around 170,000 years ago, possibly related to the divergence of African and European Bos taurus cattle. This is followed by a further reduction coinciding with the period of cattle domestication, with Ne of between 3,500 and 6,000. The most recent reduction of Ne to approximately 100 in the Holstein breed agrees well with estimates from pedigrees. Our approach can be applied to whole-genome sequence from any diploid species and can be scaled up to use sequence from multiple individuals.
  • Item
    Exploiting biological priors and sequence variants enhances QTL discovery and genomic prediction of complex traits
    MacLeod, IM ; Bowman, PJ ; Vander Jagt, CJ ; Haile-Mariam, M ; Kemper, KE ; Chamberlain, AJ ; Schrooten, C ; Hayes, BJ ; Goddard, ME (BMC, 2016-02-27)
    BACKGROUND: Dense SNP genotypes are often combined with complex trait phenotypes to map causal variants, study genetic architecture and provide genomic predictions for individuals with genotypes but no phenotype. A single method of analysis that jointly fits all genotypes in a Bayesian mixture model (BayesR) has been shown to competitively address all 3 purposes simultaneously. However, BayesR and other similar methods ignore prior biological knowledge and assume all genotypes are equally likely to affect the trait. While this assumption is reasonable for SNP array genotypes, it is less sensible if genotypes are whole-genome sequence variants which should include causal variants. RESULTS: We introduce a new method (BayesRC) based on BayesR that incorporates prior biological information in the analysis by defining classes of variants likely to be enriched for causal mutations. The information can be derived from a range of sources, including variant annotation, candidate gene lists and known causal variants. This information is then incorporated objectively in the analysis based on evidence of enrichment in the data. We demonstrate the increased power of BayesRC compared to BayesR using real dairy cattle genotypes with simulated phenotypes. The genotypes were imputed whole-genome sequence variants in coding regions combined with dense SNP markers. BayesRC increased the power to detect causal variants and increased the accuracy of genomic prediction. The relative improvement for genomic prediction was most apparent in validation populations that were not closely related to the reference population. We also applied BayesRC to real milk production phenotypes in dairy cattle using independent biological priors from gene expression analyses. Although current biological knowledge of which genes and variants affect milk production is still very incomplete, our results suggest that the new BayesRC method was equal to or more powerful than BayesR for detecting candidate causal variants and for genomic prediction of milk traits. CONCLUSIONS: BayesRC provides a novel and flexible approach to simultaneously improving the accuracy of QTL discovery and genomic prediction by taking advantage of prior biological knowledge. Approaches such as BayesRC will become increasingly useful as biological knowledge accumulates regarding functional regions of the genome for a range of traits and species.
  • Item
    Rare Variants in Transcript and Potential Regulatory Regions Explain a Small Percentage of the Missing Heritability of Complex Traits in Cattle
    Gonzalez-Recio, O ; Daetwyler, HD ; MacLeod, IM ; Pryce, JE ; Bowman, PJ ; Hayes, BJ ; Goddard, ME ; te Pas, MFW (PUBLIC LIBRARY SCIENCE, 2015-12-07)
    The proportion of genetic variation in complex traits explained by rare variants is a key question for genomic prediction, and for identifying the basis of "missing heritability"--the proportion of additive genetic variation not captured by common variants on SNP arrays. Sequence variants in transcript and regulatory regions from 429 sequenced animals were used to impute high density SNP genotypes of 3311 Holstein sires to sequence. There were 675,062 common variants (MAF>0.05), 102,549 uncommon variants (0.01
  • Item
    Quantifying the contribution of sequence variants with regulatory and evolutionary significance to 34 bovine complex traits
    Xiang, R ; van den Berg, I ; MacLeod, IM ; Hayes, BJ ; Prowse-Wilkins, CP ; Wang, M ; Bolormaa, S ; Liu, Z ; Rochfort, SJ ; Reich, CM ; Mason, BA ; Vander Jagt, CJ ; Daetwyler, HD ; Lund, MS ; Chamberlain, AJ ; Goddard, ME (NATL ACAD SCIENCES, 2019-09-24)
    Many genome variants shaping mammalian phenotype are hypothesized to regulate gene transcription and/or to be under selection. However, most of the evidence to support this hypothesis comes from human studies. Systematic evidence for regulatory and evolutionary signals contributing to complex traits in a different mammalian model is needed. Sequence variants associated with gene expression (expression quantitative trait loci [eQTLs]) and concentration of metabolites (metabolic quantitative trait loci [mQTLs]) and under histone-modification marks in several tissues were discovered from multiomics data of over 400 cattle. Variants under selection and evolutionary constraint were identified using genome databases of multiple species. These analyses defined 30 sets of variants, and for each set, we estimated the genetic variance the set explained across 34 complex traits in 11,923 bulls and 32,347 cows with 17,669,372 imputed variants. The per-variant trait heritability of these sets across traits was highly consistent (r > 0.94) between bulls and cows. Based on the per-variant heritability, conserved sites across 100 vertebrate species and mQTLs ranked the highest, followed by eQTLs, young variants, those under histone-modification marks, and selection signatures. From these results, we defined a Functional-And-Evolutionary Trait Heritability (FAETH) score indicating the functionality and predicted heritability of each variant. In an additional 7,551 cattle, the high FAETH-ranking variants had significantly increased genetic variances and genomic prediction accuracies in 3 production traits compared to the low FAETH-ranking variants. The FAETH framework combines the information of gene regulation, evolution, and trait heritability to rank variants, and the publicly available FAETH data provide a set of biological priors for cattle genomic selection worldwide.
  • Item
    A novel predictor of multilocus haplotype homozygosity: comparison with existing predictors
    MacLeod, IM ; Meuwissen, THE ; Hayes, BJ ; Goddard, ME (HINDAWI LTD, 2009-12)
    The patterns of linkage disequilibrium (LD) between dense polymorphic markers are shaped by the ancestral population history. It is therefore possible to use multilocus predictors of LD to infer past population history and to infer sharing of identical alleles in quantitative trait locus (QTL) studies. We develop a multilocus predictor of LD for pairs of haplotypes, which we term haplotype homozygosity (HHn): the probability that any two haplotypes share a given number n of adjacent identical markers, or 'runs of homozygosity'. Our method, based on simplified coalescence theory, accounts for recombination and mutation. We compare our HHn predictions with HHn in simulated populations and with two published predictors of HHn. Our method performs consistently better than the two published methods across a range of population parameters, including populations with a severe bottleneck followed by expansion. We demonstrate that we can predict the pattern of HHn observed in dense single nucleotide polymorphisms (SNPs) genotyped in a cattle population, given appropriate historical changes in population size. Our method is practical for use with very large numbers of individuals and dense genome-wide polymorphic DNA data. It has potential applications in inferring ancestral population history and QTL mapping studies.
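
    The definition of HHn in the abstract above can be illustrated with a small empirical calculation. The sketch below is not the coalescent-based predictor from the paper; it only counts, in observed data, how often two haplotypes are identical across n adjacent markers. The function name `empirical_hhn`, the string encoding of haplotypes, and the choice to average over all overlapping windows and all haplotype pairs are assumptions made for illustration.

```python
from itertools import combinations


def empirical_hhn(haplotypes, n):
    """Empirical haplotype homozygosity for windows of n adjacent markers.

    Returns the fraction of (haplotype pair, window) combinations in which
    the two haplotypes carry identical alleles at all n markers.
    Assumption: every haplotype is a string (or list) of alleles scored at
    the same ordered set of markers.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(haplotypes) < 2:
        raise ValueError("need at least two haplotypes")
    num_markers = len(haplotypes[0])
    if any(len(h) != num_markers for h in haplotypes):
        raise ValueError("all haplotypes must cover the same markers")
    if n > num_markers:
        raise ValueError("n cannot exceed the number of markers")

    matches = 0
    total = 0
    # Compare every unordered pair of haplotypes.
    for a, b in combinations(haplotypes, 2):
        # Slide a window of n adjacent markers along the pair.
        for start in range(num_markers - n + 1):
            total += 1
            if a[start:start + n] == b[start:start + n]:
                matches += 1
    return matches / total


# Toy data (hypothetical): three haplotypes scored at five biallelic markers.
haps = ["00110", "00100", "10110"]
for n in (1, 2, 3):
    print(n, empirical_hhn(haps, n))
```

    In this toy example HHn falls as n grows, since longer windows are more likely to span a difference between the two haplotypes. An observed curve of this kind is what a demographic model would be asked to reproduce.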