The genetics of gene expression: from simulations to the early-life origins of immune diseases
Document Type: PhD thesis
Access Status: Open Access
© 2019 Qinqin Huang
Human complex traits and diseases are often highly polygenic. Genome-wide association studies (GWAS) have been successful in identifying the underlying genetic components. However, challenges remain, one of which is the biological interpretation of these findings. Genetic variants that are associated with diseases or traits are enriched in regulatory regions of the genome, suggesting that they may have a role in the regulation of intermediate molecular phenotypes, such as mRNA gene expression. Studies investigating the genetic architecture of gene expression variation, or expression quantitative trait loci (eQTLs), have aided the interpretation of GWAS findings by providing potential mechanisms through which the genetic variants contribute to higher-order phenotypes. In addition, eQTLs identified in disease-relevant tissues, or those that are specific to certain cell types or conditions, are more informative about disease pathogenesis. This thesis first explored eQTL study design and analysis choices using extensive, empirically driven simulations with varying sample sizes, true effect sizes, and allele frequencies of true eQTLs. False discovery rate (FDR) control applied to the entire collection of tests yielded an inflated FDR for genes with eQTLs (eGenes) in most scenarios; in contrast, hierarchical correction procedures had well-calibrated FDR. Significant eQTLs with low allele frequencies identified using small sample sizes were enriched for false positives. Overestimation of eQTL effect sizes was common in scenarios with low statistical power, and a bootstrap method (BootstrapQTL), which can yield more accurate effect size estimates, was developed. Based on insights from the eQTL simulation study, optimal strategies were selected for the subsequent eQTL analysis in two types of neonatal immune cells (monocytes and T cells) under resting and stimulated conditions.
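To make the contrast between pooled and hierarchical correction concrete, the following is a minimal sketch of one possible two-step procedure: a Bonferroni correction across the variants tested for each gene, followed by Benjamini-Hochberg FDR control across genes. The choice of these two methods, the function names, and the toy p-values are assumptions made for illustration; the abstract does not state which combinations of local and global corrections were evaluated.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def hierarchical_egenes(pvalues_by_gene, fdr=0.05):
    """Two-step correction: local (per gene), then global (across genes).

    pvalues_by_gene maps a gene name to the nominal p-values of all
    variants tested against that gene (hypothetical input layout).
    """
    genes = list(pvalues_by_gene)
    # Step 1 (local): Bonferroni-adjust the best variant of each gene
    # for the number of variants tested against that gene.
    gene_level = [
        min(1.0, min(pvalues_by_gene[g]) * len(pvalues_by_gene[g]))
        for g in genes
    ]
    # Step 2 (global): control the FDR across genes.
    adjusted = dict(zip(genes, benjamini_hochberg(gene_level)))
    egenes = {g for g, q in adjusted.items() if q <= fdr}
    return adjusted, egenes


# Toy p-values, invented for illustration only.
toy = {
    "geneA": [1e-6, 0.2, 0.5],
    "geneB": [0.01, 0.3, 0.6, 0.9],
    "geneC": [0.4, 0.7],
}
adjusted, egenes = hierarchical_egenes(toy)
print(adjusted)
print(sorted(egenes))
```

In this sketch each gene contributes exactly one p-value to the global step, so the FDR is controlled at the level of eGenes rather than at the level of individual variant-gene tests.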
A large proportion of cis-eQTLs were specific to a certain cell type or condition, and the majority of them were observed only upon stimulation. Response eQTLs (reQTLs), with effects on gene expression modified by immune responses, were identified for 31% of the eGenes in monocytes and 52% of the eGenes in T cells. Trans-eQTL effects that were mediated through expression of cis-eGenes were observed. Lastly, integrative analyses were performed using the early-life eQTLs together with GWAS variants associated with immune-related diseases obtained from external large cohorts. Significant overlaps between neonatal eQTLs and postnatal disease-associated variants were observed. Some cell type- or condition-specific cis-eQTLs colocalised with disease associations, suggesting that the potential risk genes involved in disease pathogenesis are linked to the stimulation of certain immune cells. Causal effects of genes were evaluated using Mendelian randomisation, and changes in the expression levels of some genes (e.g. BTN3A2) were found to have causal associations with multiple immune-related diseases. Taken together, these results demonstrate that early-life genetic variants and gene expression might contribute to later disease development. In conclusion, this thesis provides a strong evidence base for eQTL study design and guidance for analysis strategies in future studies. The characterisation of genetic regulation of neonatal immune responses and the interaction between regulatory variants and stimulatory conditions is a useful resource, and generates insights into the early-life origins of immune-related diseases that develop later in life.
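As an illustration of how Mendelian randomisation can relate gene expression to disease risk, the sketch below computes a single-instrument Wald ratio: the variant's effect on disease divided by its effect on expression. The abstract does not specify which Mendelian randomisation estimator was used, so the Wald ratio, the first-order standard error, the function name, and the input values are all assumptions chosen for illustration.

```python
import math


def wald_ratio(beta_eqtl, beta_gwas, se_gwas):
    """Single-variant Wald ratio estimate of the effect of expression on disease.

    beta_eqtl: effect of the variant on gene expression (the exposure)
    beta_gwas: effect of the same variant on the disease (the outcome)
    se_gwas:   standard error of beta_gwas
    """
    if beta_eqtl == 0:
        raise ValueError("The instrument must have a non-zero effect on expression.")
    estimate = beta_gwas / beta_eqtl
    # First-order approximation: ignores uncertainty in beta_eqtl.
    se = abs(se_gwas / beta_eqtl)
    z = estimate / se
    # Two-sided p-value under a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return estimate, se, p_value


# Hypothetical summary statistics, not taken from the thesis.
estimate, se, p_value = wald_ratio(beta_eqtl=0.5, beta_gwas=0.1, se_gwas=0.02)
print(estimate, se, p_value)
```

A causal reading of the ratio relies on the usual instrumental-variable assumptions, for example that the variant affects disease only through expression of the gene in question.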
Keywords: eQTLs; Response eQTLs; Neonates; T cells; Monocytes; Immune diseases; Genetics; Gene expression; Genomics; Colocalisation; Mendelian Randomisation