School of Agriculture, Food and Ecosystem Sciences - Research Publications

Permanent URI for this collection

Search Results

Now showing 1 - 6 of 6
  • Item
    Thumbnail Image
    Improving Genomic Prediction of Crossbred and Purebred Dairy Cattle
    Khansefid, M ; Goddard, ME ; Haile-Mariam, M ; Konstantinov, K ; Schrooten, C ; de Jong, G ; Jewell, EG ; O'Connor, E ; Pryce, JE ; Daetwyler, HD ; MacLeod, IM (FRONTIERS MEDIA SA, 2020-12-14)
    This study assessed the accuracy and bias of genomic prediction (GP) in purebred Holstein (H) and Jersey (J) as well as crossbred (H and J) validation cows using different reference sets and prediction strategies. The reference sets were made up of different combinations of 36,695 H and J purebreds and crossbreds. Additionally, the effect of using different sets of marker genotypes on GP was studied (conventional panel: 50k, custom panel enriched with, or close to, causal mutations: XT_50k, and conventional high-density with a limited custom set: pruned HDnGBS). We also compared the use of genomic best linear unbiased prediction (GBLUP) and Bayesian (emBayesR) models, and the traits tested were milk, fat, and protein yields. On average, by including crossbred cows in the reference population, the prediction accuracies increased by 0.01-0.08 and were less biased (regression coefficient closer to 1 by 0.02-0.16), and the benefit was greater for crossbreds compared to purebreds. The accuracy of prediction increased by 0.02 using XT_50k compared to 50k genotypes without affecting the bias. Although using pruned HDnGBS instead of 50k also increased the prediction accuracy by about 0.02, it increased the bias for purebred predictions in emBayesR models. Generally, emBayesR outperformed GBLUP for prediction accuracy when using 50k or pruned HDnGBS genotypes, but the benefits diminished with XT_50k genotypes. Crossbred predictions derived from a joint pure H and J reference were similar in accuracy to crossbred predictions derived from the two separate purebred reference sets and combined proportional to breed composition. However, the latter approach was less biased by 0.13. Most interestingly, using an equalized breed reference instead of an H-dominated reference, on average, reduced the bias of prediction by 0.16-0.19 and increased the accuracy by 0.04 for crossbred and J cows, with a little change in the H accuracy. In conclusion, we observed improved genomic predictions for both crossbreds and purebreds by equalizing breed contributions in a mixed breed reference that included crossbred cows. Furthermore, we demonstrate, that compared to the conventional 50k or high-density panels, our customized set of 50k sequence markers improved or matched the prediction accuracy and reduced bias with both GBLUP and Bayesian models.
  • Item
    Thumbnail Image
    Inferring Demography from Runs of Homozygosity in Whole-Genome Sequence, with Correction for Sequence Errors
    MacLeod, IM ; Larkin, DM ; Lewin, HA ; Hayes, BJ ; Goddard, ME (OXFORD UNIV PRESS, 2013-09)
    Whole-genome sequence is potentially the richest source of genetic data for inferring ancestral demography. However, full sequence also presents significant challenges to fully utilize such large data sets and to ensure that sequencing errors do not introduce bias into the inferred demography. Using whole-genome sequence data from two Holstein cattle, we demonstrate a new method to correct for bias caused by hidden errors and then infer stepwise changes in ancestral demography up to present. There was a strong upward bias in estimates of recent effective population size (Ne) if the correction method was not applied to the data, both for our method and the Li and Durbin (Inference of human population history from individual whole-genome sequences. Nature 475:493-496) pairwise sequentially Markovian coalescent method. To infer demography, we use an analytical predictor of multiloci linkage disequilibrium (LD) based on a simple coalescent model that allows for changes in Ne. The LD statistic summarizes the distribution of runs of homozygosity for any given demography. We infer a best fit demography as one that predicts a match with the observed distribution of runs of homozygosity in the corrected sequence data. We use multiloci LD because it potentially holds more information about ancestral demography than pairwise LD. The inferred demography indicates a strong reduction in the Ne around 170,000 years ago, possibly related to the divergence of African and European Bos taurus cattle. This is followed by a further reduction coinciding with the period of cattle domestication, with Ne of between 3,500 and 6,000. The most recent reduction of Ne to approximately 100 in the Holstein breed agrees well with estimates from pedigrees. Our approach can be applied to whole-genome sequence from any diploid species and can be scaled up to use sequence from multiple individuals.
  • Item
    Thumbnail Image
    Exploiting biological priors and sequence variants enhances QTL discovery and genomic prediction of complex traits
    MacLeod, IM ; Bowman, PJ ; Vander Jagt, CJ ; Haile-Mariam, M ; Kemper, KE ; Chamberlain, AJ ; Schrooten, C ; Hayes, BJ ; Goddard, ME (BMC, 2016-02-27)
    BACKGROUND: Dense SNP genotypes are often combined with complex trait phenotypes to map causal variants, study genetic architecture and provide genomic predictions for individuals with genotypes but no phenotype. A single method of analysis that jointly fits all genotypes in a Bayesian mixture model (BayesR) has been shown to competitively address all 3 purposes simultaneously. However, BayesR and other similar methods ignore prior biological knowledge and assume all genotypes are equally likely to affect the trait. While this assumption is reasonable for SNP array genotypes, it is less sensible if genotypes are whole-genome sequence variants which should include causal variants. RESULTS: We introduce a new method (BayesRC) based on BayesR that incorporates prior biological information in the analysis by defining classes of variants likely to be enriched for causal mutations. The information can be derived from a range of sources, including variant annotation, candidate gene lists and known causal variants. This information is then incorporated objectively in the analysis based on evidence of enrichment in the data. We demonstrate the increased power of BayesRC compared to BayesR using real dairy cattle genotypes with simulated phenotypes. The genotypes were imputed whole-genome sequence variants in coding regions combined with dense SNP markers. BayesRC increased the power to detect causal variants and increased the accuracy of genomic prediction. The relative improvement for genomic prediction was most apparent in validation populations that were not closely related to the reference population. We also applied BayesRC to real milk production phenotypes in dairy cattle using independent biological priors from gene expression analyses. Although current biological knowledge of which genes and variants affect milk production is still very incomplete, our results suggest that the new BayesRC method was equal to or more powerful than BayesR for detecting candidate causal variants and for genomic prediction of milk traits. CONCLUSIONS: BayesRC provides a novel and flexible approach to simultaneously improving the accuracy of QTL discovery and genomic prediction by taking advantage of prior biological knowledge. Approaches such as BayesRC will become increasing useful as biological knowledge accumulates regarding functional regions of the genome for a range of traits and species.
  • Item
    Thumbnail Image
    Rare Variants in Transcript and Potential Regulatory Regions Explain a Small Percentage of the Missing Heritability of Complex Traits in Cattle
    Gonzalez-Recio, O ; Daetwyler, HD ; MacLeod, IM ; Pryce, JE ; Bowman, PJ ; Hayes, BJ ; Goddard, ME ; te Pas, MFW (PUBLIC LIBRARY SCIENCE, 2015-12-07)
    The proportion of genetic variation in complex traits explained by rare variants is a key question for genomic prediction, and for identifying the basis of "missing heritability"--the proportion of additive genetic variation not captured by common variants on SNP arrays. Sequence variants in transcript and regulatory regions from 429 sequenced animals were used to impute high density SNP genotypes of 3311 Holstein sires to sequence. There were 675,062 common variants (MAF>0.05), 102,549 uncommon variants (0.01
  • Item
    Thumbnail Image
    Quantifying the contribution of sequence variants with regulatory and evolutionary significance to 34 bovine complex traits
    Xiang, R ; van den Berg, I ; MacLeod, IM ; Hayes, BJ ; Prowse-Wilkins, CP ; Wang, M ; Bolormaa, S ; Liu, Z ; Rochfort, SJ ; Reich, CM ; Mason, BA ; Vander Jagt, CJ ; Daetwyler, HD ; Lund, MS ; Chamberlain, AJ ; Goddard, ME (NATL ACAD SCIENCES, 2019-09-24)
    Many genome variants shaping mammalian phenotype are hypothesized to regulate gene transcription and/or to be under selection. However, most of the evidence to support this hypothesis comes from human studies. Systematic evidence for regulatory and evolutionary signals contributing to complex traits in a different mammalian model is needed. Sequence variants associated with gene expression (expression quantitative trait loci [eQTLs]) and concentration of metabolites (metabolic quantitative trait loci [mQTLs]) and under histone-modification marks in several tissues were discovered from multiomics data of over 400 cattle. Variants under selection and evolutionary constraint were identified using genome databases of multiple species. These analyses defined 30 sets of variants, and for each set, we estimated the genetic variance the set explained across 34 complex traits in 11,923 bulls and 32,347 cows with 17,669,372 imputed variants. The per-variant trait heritability of these sets across traits was highly consistent (r > 0.94) between bulls and cows. Based on the per-variant heritability, conserved sites across 100 vertebrate species and mQTLs ranked the highest, followed by eQTLs, young variants, those under histone-modification marks, and selection signatures. From these results, we defined a Functional-And-Evolutionary Trait Heritability (FAETH) score indicating the functionality and predicted heritability of each variant. In additional 7,551 cattle, the high FAETH-ranking variants had significantly increased genetic variances and genomic prediction accuracies in 3 production traits compared to the low FAETH-ranking variants. The FAETH framework combines the information of gene regulation, evolution, and trait heritability to rank variants, and the publicly available FAETH data provide a set of biological priors for cattle genomic selection worldwide.
  • Item
    Thumbnail Image
    Effect direction meta-analysis of GWAS identifies extreme, prevalent and shared pleiotropy in a large mammal
    Xiang, R ; van den Berg, I ; MacLeod, IM ; Daetwyler, HD ; Goddard, ME (NATURE PUBLISHING GROUP, 2020-02-28)
    In genome-wide association studies (GWAS), variants showing consistent effect directions across populations are considered as true discoveries. We model this information in an Effect Direction MEta-analysis (EDME) to quantify pleiotropy using GWAS of 34 Cholesky-decorrelated traits in 44,000+ cattle with sequence variants. The effect-direction agreement between independent bull and cow datasets was used to quantify the false discovery rate by effect direction (FDRed) and the number of affected traits for prioritised variants. Variants with multi-trait p < 1e-6 affected 1∼22 traits with an average of 10 traits. EDME assigns pleiotropic variants to each trait which informs the biology behind complex traits. New pleiotropic loci are identified, including signals from the cattle FTO locus mirroring its bystander effects on human obesity. When validated in the 1000-Bull Genome database, the prioritized pleiotropic variants consistently predicted expected phenotypic differences between dairy and beef cattle. EDME provides robust approaches to control GWAS FDR and quantify pleiotropy.