School of Agriculture, Food and Ecosystem Sciences - Research Publications

  • Sharing of either phenotypes or genetic variants can increase the accuracy of genomic prediction of feed efficiency
    Bolormaa, S ; MacLeod, IM ; Khansefid, M ; Marett, LC ; Wales, WJ ; Miglior, F ; Baes, CF ; Schenkel, FS ; Connor, EE ; Manzanilla-Pech, CI ; Stothard, P ; Herman, E ; Nieuwhof, GJ ; Goddard, ME ; Pryce, JE (BMC, 2022-09-06)
    BACKGROUND: Sharing individual phenotype and genotype data between countries is complex and fraught with potential errors, whereas sharing summary statistics of genome-wide association studies (GWAS) is relatively straightforward and would therefore be especially useful for traits that are expensive or difficult to measure, such as feed efficiency. Here we examined: (1) the sharing of individual cow data from international partners; and (2) the use of sequence variants selected from GWAS of international cow data to evaluate the accuracy of genomic estimated breeding values (GEBV) for residual feed intake (RFI) in Australian cows.
RESULTS: GEBV for RFI were estimated using genomic best linear unbiased prediction (GBLUP) with 50k or high-density single nucleotide polymorphisms (SNPs), from a training population of 3797 individuals in univariate to trivariate analyses, where the three traits were RFI phenotypes calculated using 584 Australian lactating cows (AUSc), 824 growing heifers (AUSh), and 2526 international lactating cows (OVE). Accuracies of GEBV in AUSc were evaluated by either cohort-by-birth-year or fourfold random cross-validation. GEBV of AUSc were also predicted using only the AUS training population with a weighted genomic relationship matrix constructed with SNPs from the 50k array and sequence variants selected from a meta-GWAS that included only international datasets. The genomic heritabilities estimated using the AUSc, OVE and AUSh datasets were moderate, ranging from 0.20 to 0.36. The genetic correlations (rg) of traits between heifers and cows ranged from 0.30 to 0.95 but had large standard errors. The mean accuracies of GEBV in Australian cows were up to 0.32 and almost doubled when either overseas cows, or both overseas cows and AUS heifers, were included in the training population. Accuracies also increased when selected sequence variants were combined with 50k SNPs, but by a smaller relative margin.
CONCLUSIONS: The accuracy of RFI GEBV increased when international data were used or when selected sequence variants were combined with 50k SNP array data. This suggests that if direct sharing of data is not feasible, a meta-analysis of summary GWAS statistics could provide selected SNPs for custom panels to use in genomic selection programs. However, since this finding is based on a small cross-validation study, confirmation through a larger study is recommended.
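The accuracies above come from GBLUP evaluated by cross-validation. The following is a minimal single-trait sketch of that idea on simulated data, not the multi-trait analysis or software used in the study. It assumes genotypes coded 0/1/2, a VanRaden-style genomic relationship matrix, a known heritability, and accuracy measured as the correlation between GEBV and simulated true breeding values. The optional `weights` argument is a hypothetical stand-in for a weighted relationship matrix built from selected variants.

```python
import numpy as np


def vanraden_g(geno, weights=None):
    """Genomic relationship matrix in the style of VanRaden (2008), method 1.

    geno: (n_animals, n_snps) genotypes coded 0/1/2.
    weights: optional per-SNP weights (hypothetical way to build a weighted G).
    """
    geno = np.asarray(geno, dtype=float)
    p = geno.mean(axis=0) / 2.0                  # allele frequency per SNP
    z = geno - 2.0 * p                           # centre each SNP
    w = np.ones(geno.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    return (z * w) @ z.T / np.sum(w * 2.0 * p * (1.0 - p))


def gblup_predict(G, y, train, valid, h2):
    """GEBV of validation animals from training phenotypes, with h2 treated as known."""
    lam = (1.0 - h2) / h2                        # sigma2_e / sigma2_g
    G_tt = G[np.ix_(train, train)]
    G_vt = G[np.ix_(valid, train)]
    y_c = y[train] - y[train].mean()             # simple mean instead of a GLS intercept
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y_c)
    return G_vt @ alpha


def cv_accuracy(geno, y, tbv, h2, k=4, seed=0):
    """Mean accuracy (correlation of GEBV with true BV) over k random folds."""
    rng = np.random.default_rng(seed)
    G = vanraden_g(geno)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for i, valid in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        gebv = gblup_predict(G, y, train, valid, h2)
        accs.append(np.corrcoef(gebv, tbv[valid])[0, 1])
    return float(np.mean(accs))


def simulate(n=800, m=1000, n_qtl=50, h2=0.5, seed=1):
    """Toy data: independent SNPs in Hardy-Weinberg equilibrium, additive QTL, Gaussian noise."""
    rng = np.random.default_rng(seed)
    geno = rng.binomial(2, rng.uniform(0.1, 0.9, m), size=(n, m)).astype(float)
    beta = np.zeros(m)
    beta[rng.choice(m, n_qtl, replace=False)] = rng.normal(size=n_qtl)
    tbv = geno @ beta
    tbv = (tbv - tbv.mean()) / tbv.std() * np.sqrt(h2)
    y = tbv + rng.normal(scale=np.sqrt(1.0 - h2), size=n)
    return geno, y, tbv


if __name__ == "__main__":
    geno, y, tbv = simulate()
    print(f"mean 4-fold accuracy: {cv_accuracy(geno, y, tbv, h2=0.5):.2f}")
```

Passing a weighted matrix (for example, 50k SNPs plus up-weighted selected variants) in place of `vanraden_g(geno)` would mimic, in a very simplified way, the weighted-G comparison described in the abstract.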
  • Mutant alleles differentially shape fitness and other complex traits in cattle
    Xiang, R ; Breen, EJ ; Bolormaa, S ; Vander Jagt, CJ ; Chamberlain, AJ ; Macleod, IM ; Goddard, ME (NATURE PORTFOLIO, 2021-12-02)
    Mutant alleles (MAs) that have been classically recognised have large effects on phenotype and tend to be deleterious to traits and fitness. Is this the case for mutations with small effects? We infer MAs for 8 million sequence variants in 113k cattle and quantify the effects of MAs on 37 complex traits. Heterozygosity for variants at genomic sites conserved across 100 vertebrate species increases fertility, stature, and milk production, positively associating these traits with fitness. MAs decrease stature and fat and protein concentration in milk, but increase gestation length and somatic cell count in milk (the latter indicative of mastitis). However, MAs that decrease stature and fat and protein concentration, or increase gestation length and somatic cell count, were at lower frequency than MAs with the opposite effect. These results suggest a bias in the direction of mutational effects (e.g., towards reduced protein in milk), but also selection operating to reduce the frequency of these MAs. Taken together, our results imply two classes of genomic sites subject to long-term selection: sites conserved across vertebrates show hybrid vigour, while sites subject to less long-term selection show a bias in mutation towards undesirable alleles.
  • Genome-wide fine-mapping identifies pleiotropic and functional variants that predict many traits across global cattle populations
    Xiang, R ; MacLeod, IM ; Daetwyler, HD ; de Jong, G ; O'Connor, E ; Schrooten, C ; Chamberlain, AJ ; Goddard, ME (NATURE RESEARCH, 2021-02-08)
    The difficulty in finding causative mutations has hampered their use in genomic prediction. Here, we present a methodology to fine-map potentially causal variants genome-wide by integrating the functional, evolutionary and pleiotropic information of variants using GWAS, variant clustering and Bayesian mixture models. Our analysis of 17 million sequence variants in 44,000+ Australian dairy cattle for 34 traits suggests that, on average, one pleiotropic QTL exists in each 50 kb chromosome segment. We selected a set of 80k variants representing potentially causal variants within each chromosome segment to develop a bovine XT-50K genotyping array. The custom array contains many pleiotropic variants with biological functions, including splicing QTLs and variants at conserved sites across 100 vertebrate species. This biology-informed custom array outperformed the standard array in predicting the genetic value of multiple traits across populations in independent datasets of 90,000+ dairy cattle from the USA, Australia and New Zealand.
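As a rough illustration of picking one candidate per chromosome segment, the following sketch keeps the highest-scoring variant in each fixed 50 kb window. It assumes every variant already carries a causality score (the `score` field, e.g. a posterior probability from a Bayesian mixture model). The `Variant` type and its fields are hypothetical, and the sketch does not reproduce the study's integration of functional, evolutionary and pleiotropic information or its variant clustering.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int       # base-pair position
    vid: str       # variant identifier
    score: float   # hypothetical causality score, e.g. a posterior probability


def best_per_segment(variants, segment_bp=50_000):
    """Keep the highest-scoring variant in each non-overlapping segment.

    Segments are [k * segment_bp, (k + 1) * segment_bp) on each chromosome.
    Output is ordered by chromosome label, then segment index.
    """
    best = {}
    for v in variants:
        key = (v.chrom, v.pos // segment_bp)
        if key not in best or v.score > best[key].score:
            best[key] = v
    return [best[key] for key in sorted(best)]


if __name__ == "__main__":
    variants = [
        Variant("1", 10_000, "a", 0.2),
        Variant("1", 49_999, "b", 0.7),
        Variant("1", 50_000, "c", 0.1),
        Variant("2", 5, "d", 0.9),
    ]
    print([v.vid for v in best_per_segment(variants)])  # ['b', 'c', 'd']
```

Fixed windows are the simplest choice; segment boundaries based on LD or variant clusters would be closer in spirit to fine-mapping.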
  • Improving Genomic Prediction of Crossbred and Purebred Dairy Cattle
    Khansefid, M ; Goddard, ME ; Haile-Mariam, M ; Konstantinov, K ; Schrooten, C ; de Jong, G ; Jewell, EG ; O'Connor, E ; Pryce, JE ; Daetwyler, HD ; MacLeod, IM (FRONTIERS MEDIA SA, 2020-12-14)
    This study assessed the accuracy and bias of genomic prediction (GP) in purebred Holstein (H) and Jersey (J) as well as crossbred (H and J) validation cows using different reference sets and prediction strategies. The reference sets were made up of different combinations of 36,695 H and J purebreds and crossbreds. Additionally, the effect of using different sets of marker genotypes on GP was studied (conventional panel: 50k, custom panel enriched with, or close to, causal mutations: XT_50k, and conventional high-density with a limited custom set: pruned HDnGBS). We also compared the use of genomic best linear unbiased prediction (GBLUP) and Bayesian (emBayesR) models, and the traits tested were milk, fat, and protein yields. On average, by including crossbred cows in the reference population, the prediction accuracies increased by 0.01-0.08 and were less biased (regression coefficient closer to 1 by 0.02-0.16), and the benefit was greater for crossbreds than for purebreds. The accuracy of prediction increased by 0.02 using XT_50k compared to 50k genotypes without affecting the bias. Although using pruned HDnGBS instead of 50k also increased the prediction accuracy by about 0.02, it increased the bias for purebred predictions in emBayesR models. Generally, emBayesR outperformed GBLUP for prediction accuracy when using 50k or pruned HDnGBS genotypes, but the benefits diminished with XT_50k genotypes. Crossbred predictions derived from a joint purebred H and J reference were similar in accuracy to crossbred predictions derived from the two separate purebred reference sets and combined in proportion to breed composition. However, the latter approach was less biased by 0.13. Most interestingly, using an equalized breed reference instead of an H-dominated reference, on average, reduced the bias of prediction by 0.16-0.19 and increased the accuracy by 0.04 for crossbred and J cows, with little change in the H accuracy.
In conclusion, we observed improved genomic predictions for both crossbreds and purebreds by equalizing breed contributions in a mixed breed reference that included crossbred cows. Furthermore, we demonstrate that, compared to the conventional 50k or high-density panels, our customized set of 50k sequence markers improved or matched the prediction accuracy and reduced bias with both GBLUP and Bayesian models.
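The abstract reports accuracy and bias (a regression coefficient closer to 1 meaning less bias) and combines separate purebred predictions in proportion to breed composition. The following sketch uses common definitions that are assumptions here, since the abstract does not give formulas: accuracy as the Pearson correlation between GEBV and an observed validation value, and bias as the slope of the observed value regressed on GEBV. The function names and the `prop_h` argument (Holstein proportion of each cow) are hypothetical.

```python
import numpy as np


def accuracy_and_bias(gebv, observed):
    """Accuracy as Pearson correlation; bias as the slope of observed on GEBV.

    A slope of 1 means no bias; a slope below 1 means the GEBV are over-dispersed.
    """
    gebv = np.asarray(gebv, dtype=float)
    observed = np.asarray(observed, dtype=float)
    r = np.corrcoef(gebv, observed)[0, 1]
    slope = np.cov(gebv, observed)[0, 1] / np.var(gebv, ddof=1)
    return float(r), float(slope)


def crossbred_gebv(gebv_h, gebv_j, prop_h):
    """Combine GEBV from separate Holstein and Jersey reference sets,
    weighting each by the cow's Holstein proportion (prop_h)."""
    prop_h = np.asarray(prop_h, dtype=float)
    return prop_h * np.asarray(gebv_h, dtype=float) + (1.0 - prop_h) * np.asarray(gebv_j, dtype=float)


if __name__ == "__main__":
    # 0.5*1 + 0.5*3 = 2.0 and 0.25*2 + 0.75*4 = 3.5
    print(crossbred_gebv([1.0, 2.0], [3.0, 4.0], [0.5, 0.25]))
    # observed = 2 * GEBV + 1, so r = 1 and slope = 2 (up to rounding)
    print(accuracy_and_bias([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]))
```

Studies often scale the correlation by the square root of heritability when the observed value is a phenotype; that step is omitted here.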
  • Inferring Demography from Runs of Homozygosity in Whole-Genome Sequence, with Correction for Sequence Errors
    MacLeod, IM ; Larkin, DM ; Lewin, HA ; Hayes, BJ ; Goddard, ME (OXFORD UNIV PRESS, 2013-09)
    Whole-genome sequence is potentially the richest source of genetic data for inferring ancestral demography. However, full sequence also presents significant challenges in fully utilizing such large data sets and in ensuring that sequencing errors do not introduce bias into the inferred demography. Using whole-genome sequence data from two Holstein cattle, we demonstrate a new method to correct for bias caused by hidden errors and then infer stepwise changes in ancestral demography up to the present. There was a strong upward bias in estimates of recent effective population size (Ne) if the correction method was not applied to the data, both for our method and for the Li and Durbin (Inference of human population history from individual whole-genome sequences. Nature 475:493-496) pairwise sequentially Markovian coalescent method. To infer demography, we use an analytical predictor of multiloci linkage disequilibrium (LD) based on a simple coalescent model that allows for changes in Ne. The LD statistic summarizes the distribution of runs of homozygosity for any given demography. We infer the best-fit demography as the one whose predicted distribution of runs of homozygosity matches the observed distribution in the corrected sequence data. We use multiloci LD because it potentially holds more information about ancestral demography than pairwise LD. The inferred demography indicates a strong reduction in Ne around 170,000 years ago, possibly related to the divergence of African and European Bos taurus cattle. This is followed by a further reduction, to an Ne between 3,500 and 6,000, coinciding with the period of cattle domestication. The most recent reduction of Ne, to approximately 100 in the Holstein breed, agrees well with estimates from pedigrees. Our approach can be applied to whole-genome sequence from any diploid species and can be scaled up to use sequence from multiple individuals.
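The study's predictor uses multiloci LD (runs of homozygosity), which the authors argue holds more information than pairwise LD. For contrast, here is a minimal sketch of the classical pairwise approach: invert Sved's (1971) approximation E[r^2] = 1/(1 + 4*Ne*c), with c in Morgans, and apply the common heuristic that LD at distance c mostly reflects Ne about 1/(2c) generations ago. This is not the paper's method; it ignores sample-size corrections to r^2 and the sequence-error correction the paper introduces, and the input values are hypothetical.

```python
def ne_from_r2(r2, c_morgans):
    """Invert Sved's (1971) approximation E[r^2] = 1 / (1 + 4 * Ne * c)."""
    if not 0.0 < r2 < 1.0:
        raise ValueError("r2 must lie strictly between 0 and 1")
    if c_morgans <= 0.0:
        raise ValueError("c_morgans must be positive")
    return (1.0 / r2 - 1.0) / (4.0 * c_morgans)


def generations_ago(c_morgans):
    """Heuristic: LD at distance c mostly reflects Ne about 1 / (2c) generations ago."""
    return 1.0 / (2.0 * c_morgans)


if __name__ == "__main__":
    # Hypothetical inputs: mean r^2 of 0.2 between markers 0.1 cM (0.001 M) apart.
    print(ne_from_r2(0.2, 0.001), generations_ago(0.001))  # roughly 1000 and 500
```

Repeating this over many distance bins gives a stepwise Ne trajectory, with short distances informing the distant past and long distances the recent past.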
  • Exploiting biological priors and sequence variants enhances QTL discovery and genomic prediction of complex traits
    MacLeod, IM ; Bowman, PJ ; Vander Jagt, CJ ; Haile-Mariam, M ; Kemper, KE ; Chamberlain, AJ ; Schrooten, C ; Hayes, BJ ; Goddard, ME (BMC, 2016-02-27)
    BACKGROUND: Dense SNP genotypes are often combined with complex trait phenotypes to map causal variants, study genetic architecture and provide genomic predictions for individuals with genotypes but no phenotype. A single method of analysis that jointly fits all genotypes in a Bayesian mixture model (BayesR) has been shown to competitively address all 3 purposes simultaneously. However, BayesR and other similar methods ignore prior biological knowledge and assume all genotypes are equally likely to affect the trait. While this assumption is reasonable for SNP array genotypes, it is less sensible if genotypes are whole-genome sequence variants which should include causal variants. RESULTS: We introduce a new method (BayesRC) based on BayesR that incorporates prior biological information in the analysis by defining classes of variants likely to be enriched for causal mutations. The information can be derived from a range of sources, including variant annotation, candidate gene lists and known causal variants. This information is then incorporated objectively in the analysis based on evidence of enrichment in the data. We demonstrate the increased power of BayesRC compared to BayesR using real dairy cattle genotypes with simulated phenotypes. The genotypes were imputed whole-genome sequence variants in coding regions combined with dense SNP markers. BayesRC increased the power to detect causal variants and increased the accuracy of genomic prediction. The relative improvement for genomic prediction was most apparent in validation populations that were not closely related to the reference population. We also applied BayesRC to real milk production phenotypes in dairy cattle using independent biological priors from gene expression analyses. 
Although current biological knowledge of which genes and variants affect milk production is still very incomplete, our results suggest that the new BayesRC method was equal to or more powerful than BayesR for detecting candidate causal variants and for genomic prediction of milk traits. CONCLUSIONS: BayesRC provides a novel and flexible approach to simultaneously improving the accuracy of QTL discovery and genomic prediction by taking advantage of prior biological knowledge. Approaches such as BayesRC will become increasingly useful as biological knowledge accumulates regarding functional regions of the genome for a range of traits and species.
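To make the difference from BayesR concrete, the following toy Gibbs sampler assigns each variant to a predefined class, and each class has its own mixing proportions over four effect-size components (zero, and normal distributions with variance 1e-4, 1e-3 and 1e-2 times the genetic variance, the commonly used BayesR settings). It is an illustration under simplifying assumptions, not the authors' implementation: genotypes are standardised, the genetic and residual variances are fixed rather than sampled, the intercept has a flat prior, and each class's proportions get a Dirichlet(1, 1, 1, 1) prior.

```python
import numpy as np

# BayesR-style mixture: effect variances as fractions of sigma2_g.
GAMMA = np.array([0.0, 1e-4, 1e-3, 1e-2])


def bayesrc(X, y, snp_class, sigma2_g, sigma2_e, n_iter=300, burn_in=100, seed=0):
    """Toy BayesRC Gibbs sampler with fixed genetic and residual variances.

    X: (n, m) standardised genotypes; y: (n,) phenotypes;
    snp_class: (m,) integer class label per variant (e.g. 1 = prior-annotated).
    Returns posterior mean effects, inclusion probabilities and final
    mixing proportions per class.
    """
    if not 0 <= burn_in < n_iter:
        raise ValueError("need 0 <= burn_in < n_iter")
    rng = np.random.default_rng(seed)
    n, m = X.shape
    classes = np.unique(snp_class)
    pi = {c: np.full(4, 0.25) for c in classes}     # one proportion vector per class
    var_k = GAMMA * sigma2_g
    xtx = np.einsum("ij,ij->j", X, X)
    beta = np.zeros(m)
    mu = float(np.mean(y))
    resid = y - mu
    beta_sum, incl_sum = np.zeros(m), np.zeros(m)
    for it in range(n_iter):
        resid += mu                                 # sample intercept (flat prior)
        mu = rng.normal(resid.mean(), np.sqrt(sigma2_e / n))
        resid -= mu
        counts = {c: np.zeros(4) for c in classes}
        for j in range(m):
            x = X[:, j]
            resid += x * beta[j]                    # take variant j out of the fit
            rhs = x @ resid
            c = snp_class[j]
            # log weight of each component relative to the zero component
            logw = np.log(pi[c])
            v = var_k[1:]
            logw[1:] += (-0.5 * np.log(v * xtx[j] / sigma2_e + 1.0)
                         + 0.5 * rhs**2 * v / (sigma2_e * (sigma2_e + v * xtx[j])))
            w = np.exp(logw - logw.max())
            k = rng.choice(4, p=w / w.sum())
            counts[c][k] += 1
            if k == 0:
                beta[j] = 0.0
            else:
                lhs = xtx[j] + sigma2_e / var_k[k]
                beta[j] = rng.normal(rhs / lhs, np.sqrt(sigma2_e / lhs))
            resid -= x * beta[j]                    # put variant j back
        for c in classes:                           # class-specific Dirichlet update
            pi[c] = rng.dirichlet(counts[c] + 1.0)
        if it >= burn_in:
            beta_sum += beta
            incl_sum += beta != 0.0
    kept = n_iter - burn_in
    return beta_sum / kept, incl_sum / kept, pi
```

Because the proportions are updated separately per class, a class enriched for causal variants (for example, variants in candidate genes) can learn a smaller share of zero effects, which is how the prior information enters the analysis.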
  • Rare Variants in Transcript and Potential Regulatory Regions Explain a Small Percentage of the Missing Heritability of Complex Traits in Cattle
    Gonzalez-Recio, O ; Daetwyler, HD ; MacLeod, IM ; Pryce, JE ; Bowman, PJ ; Hayes, BJ ; Goddard, ME ; te Pas, MFW (PUBLIC LIBRARY SCIENCE, 2015-12-07)
    The proportion of genetic variation in complex traits explained by rare variants is a key question for genomic prediction, and for identifying the basis of "missing heritability", the proportion of additive genetic variation not captured by common variants on SNP arrays. Sequence variants in transcript and regulatory regions from 429 sequenced animals were used to impute high density SNP genotypes of 3311 Holstein sires to sequence. There were 675,062 common variants (MAF>0.05), 102,549 uncommon variants (0.01
  • Quantifying the contribution of sequence variants with regulatory and evolutionary significance to 34 bovine complex traits
    Xiang, R ; van den Berg, I ; MacLeod, IM ; Hayes, BJ ; Prowse-Wilkins, CP ; Wang, M ; Bolormaa, S ; Liu, Z ; Rochfort, SJ ; Reich, CM ; Mason, BA ; Vander Jagt, CJ ; Daetwyler, HD ; Lund, MS ; Chamberlain, AJ ; Goddard, ME (NATL ACAD SCIENCES, 2019-09-24)
    Many genome variants shaping mammalian phenotype are hypothesized to regulate gene transcription and/or to be under selection. However, most of the evidence to support this hypothesis comes from human studies. Systematic evidence for regulatory and evolutionary signals contributing to complex traits in a different mammalian model is needed. Sequence variants associated with gene expression (expression quantitative trait loci [eQTLs]) and with metabolite concentrations (metabolic quantitative trait loci [mQTLs]), and variants under histone-modification marks in several tissues, were discovered from multiomics data of over 400 cattle. Variants under selection and evolutionary constraint were identified using genome databases of multiple species. These analyses defined 30 sets of variants, and for each set, we estimated the genetic variance the set explained across 34 complex traits in 11,923 bulls and 32,347 cows with 17,669,372 imputed variants. The per-variant trait heritability of these sets across traits was highly consistent (r > 0.94) between bulls and cows. Based on the per-variant heritability, conserved sites across 100 vertebrate species and mQTLs ranked the highest, followed by eQTLs, young variants, those under histone-modification marks, and selection signatures. From these results, we defined a Functional-And-Evolutionary Trait Heritability (FAETH) score indicating the functionality and predicted heritability of each variant. In an additional 7,551 cattle, the high FAETH-ranking variants had significantly increased genetic variances and genomic prediction accuracies in 3 production traits compared to the low FAETH-ranking variants. The FAETH framework combines information on gene regulation, evolution, and trait heritability to rank variants, and the publicly available FAETH data provide a set of biological priors for cattle genomic selection worldwide.
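Ranking variant sets by per-variant heritability amounts to dividing the heritability explained by each set by the number of variants in it, assuming that is how the per-variant measure is defined. Here is a minimal sketch with hypothetical toy inputs (not values from the study). The set-level heritabilities would come from a variance-component analysis, and the way the study combines these rankings into the FAETH score is not reproduced here.

```python
def per_variant_h2(set_h2, n_variants):
    """Heritability explained by each variant set divided by its number of variants."""
    return {name: set_h2[name] / n_variants[name] for name in set_h2}


def rank_sets(set_h2, n_variants):
    """Variant-set names ordered from highest to lowest per-variant heritability."""
    pv = per_variant_h2(set_h2, n_variants)
    return sorted(pv, key=pv.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical toy inputs, not values from the study.
    set_h2 = {"set_a": 0.05, "set_b": 0.10, "set_c": 0.20}
    n_variants = {"set_a": 1_000, "set_b": 10_000, "set_c": 100_000}
    print(rank_sets(set_h2, n_variants))  # ['set_a', 'set_b', 'set_c']
```

A set can explain the most total heritability (here `set_c`) yet rank last per variant, which is why the per-variant measure is used to compare sets of very different sizes.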
  • Effect direction meta-analysis of GWAS identifies extreme, prevalent and shared pleiotropy in a large mammal
    Xiang, R ; van den Berg, I ; MacLeod, IM ; Daetwyler, HD ; Goddard, ME (NATURE PUBLISHING GROUP, 2020-02-28)
    In genome-wide association studies (GWAS), variants showing consistent effect directions across populations are considered true discoveries. We model this information in an Effect Direction MEta-analysis (EDME) to quantify pleiotropy using GWAS of 34 Cholesky-decorrelated traits in 44,000+ cattle with sequence variants. The effect-direction agreement between independent bull and cow datasets was used to quantify the false discovery rate by effect direction (FDRed) and the number of affected traits for prioritized variants. Variants with multi-trait p < 1e-6 affected 1 to 22 traits, with an average of 10 traits. EDME assigns pleiotropic variants to each trait, which informs the biology behind complex traits. New pleiotropic loci are identified, including signals from the cattle FTO locus mirroring its bystander effects on human obesity. When validated in the 1000-Bull Genome database, the prioritized pleiotropic variants consistently predicted expected phenotypic differences between dairy and beef cattle. EDME provides robust approaches to control GWAS FDR and quantify pleiotropy.
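The effect-direction idea can be sketched with a simple argument. A false discovery agrees in direction with an independent dataset by chance with probability 0.5, while a true discovery is assumed to always agree. The observed agreement rate a then satisfies a = (1 - FDR) + FDR/2, so FDR = 2(1 - a). This is a simplified illustration of the reasoning, not necessarily the exact FDRed estimator used in EDME, and the function name is hypothetical.

```python
import numpy as np


def fdr_by_effect_direction(effects_discovery, effects_validation):
    """Estimate FDR from sign agreement between two independent GWAS.

    Assumes true associations always agree in sign and false ones agree with
    probability 0.5, so agreement = (1 - FDR) + FDR / 2. Zero effects count
    as disagreements unless both are zero.
    """
    d = np.asarray(effects_discovery, dtype=float)
    v = np.asarray(effects_validation, dtype=float)
    if d.shape != v.shape or d.size == 0:
        raise ValueError("inputs must be non-empty and of equal shape")
    agreement = np.mean(np.sign(d) == np.sign(v))
    return float(min(1.0, max(0.0, 2.0 * (1.0 - agreement))))


if __name__ == "__main__":
    bulls = np.ones(10)
    cows = np.array([1.0] * 8 + [-1.0] * 2)   # 8 of 10 directions agree
    print(fdr_by_effect_direction(bulls, cows))  # about 0.4
```

The estimate is clamped to [0, 1] because sampling noise can push agreement below 0.5 or above 1 - FDR.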
  • A novel predictor of multilocus haplotype homozygosity: comparison with existing predictors
    MacLeod, IM ; Meuwissen, THE ; Hayes, BJ ; Goddard, ME (HINDAWI LTD, 2009-12)
    The patterns of linkage disequilibrium (LD) between dense polymorphic markers are shaped by the ancestral population history. It is therefore possible to use multilocus predictors of LD to infer past population history and to infer sharing of identical alleles in quantitative trait locus (QTL) studies. We develop a multilocus predictor of LD for pairs of haplotypes, which we term haplotype homozygosity (HHn): the probability that any two haplotypes share a run of n adjacent identical markers, or 'run of homozygosity'. Our method, based on simplified coalescence theory, accounts for recombination and mutation. We compare our HHn predictions with HHn in simulated populations and with two published predictors of HHn. Our method performs consistently better than the two published methods across a range of population parameters, including populations with a severe bottleneck followed by expansion. We demonstrate that we can predict the pattern of HHn observed in dense single nucleotide polymorphisms (SNPs) genotyped in a cattle population, given appropriate historical changes in population size. Our method is practical for use with very large numbers of individuals and dense genome-wide polymorphic DNA data. It has potential applications in inferring ancestral population history and in QTL mapping studies.
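An empirical counterpart of HHn can be computed directly from phased haplotypes by counting, over all haplotype pairs and all windows of n adjacent markers, how often the two haplotypes are identical at every marker in the window. The following sketch assumes phased haplotypes with no missing data, coded as integers in a (haplotypes x markers) array. It only gives the observed statistic that a predictor would be compared against and does not implement the coalescent-based predictor itself.

```python
from itertools import combinations

import numpy as np


def empirical_hh(haplotypes, n):
    """Empirical HHn: fraction of (haplotype pair, window) combinations in which
    the two haplotypes carry identical alleles at all n adjacent markers."""
    H = np.asarray(haplotypes)
    if H.ndim != 2 or H.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two haplotypes")
    n_hap, m = H.shape
    if not 1 <= n <= m:
        raise ValueError("n must be between 1 and the number of markers")
    hits = total = 0
    for a, b in combinations(range(n_hap), 2):
        same = (H[a] == H[b]).astype(int)
        # number of identical markers in each window of n adjacent markers
        window_sums = np.convolve(same, np.ones(n, dtype=int), mode="valid")
        hits += int(np.sum(window_sums == n))
        total += window_sums.size
    return hits / total


if __name__ == "__main__":
    haps = [[0, 0, 1, 1], [0, 0, 1, 0], [1, 0, 1, 1]]
    print(empirical_hh(haps, 2))  # 5/9
```

Plotting this statistic against n for real haplotypes gives the observed curve that a demographic model's predicted HHn would be fitted to.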