School of Mathematics and Statistics - Research Publications

  • Item
    RNA-seq mixology: designing realistic control experiments to compare protocols and analysis methods
    Holik, AZ ; Law, CW ; Liu, R ; Wang, Z ; Wang, W ; Ahn, J ; Asselin-Labat, M-L ; Smyth, GK ; Ritchie, ME (OXFORD UNIV PRESS, 2017-03-17)
    Carefully designed control experiments provide a gold standard for benchmarking different genomics research tools. A shortcoming of many gene expression control studies is that replication involves profiling the same reference RNA sample multiple times. This leads to low, pure technical noise that is atypical of regular studies. To achieve a more realistic noise structure, we generated an RNA-sequencing mixture experiment using two cell lines of the same cancer type. Variability was added by extracting RNA from independent cell cultures and degrading particular samples. The systematic gene expression changes induced by this design allowed benchmarking of different library preparation kits (standard poly-A versus total RNA with Ribozero depletion) and analysis pipelines. Data generated using the total RNA kit had more signal for introns and various RNA classes (ncRNA, snRNA, snoRNA) and less variability after degradation. For differential expression analysis, voom with quality weights marginally outperformed other popular methods, while for differential splicing, DEXSeq was simultaneously the most sensitive and the most inconsistent method. For sample deconvolution analysis, DeMix outperformed IsoPure convincingly. Our RNA-sequencing data set provides a valuable resource for benchmarking different protocols and data pre-processing workflows. The extra noise mimics routine lab experiments more closely, ensuring any conclusions are widely applicable.
  • Item
    Repression of Igf1 expression by Ezh2 prevents basal cell differentiation in the developing lung
    Galvis, LA ; Holik, AZ ; Short, KM ; Pasquet, J ; Lun, ATL ; Blewitt, ME ; Smyth, IM ; Ritchie, ME ; Asselin-Labat, M-L (COMPANY BIOLOGISTS LTD, 2015-04-15)
    Epigenetic mechanisms involved in the establishment of lung epithelial cell lineage identities during development are largely unknown. Here, we explored the role of the histone methyltransferase Ezh2 during lung lineage determination. Loss of Ezh2 in the lung epithelium leads to defective lung formation and perinatal mortality. We show that Ezh2 is crucial for airway lineage specification and alveolarization. Using optical projection tomography imaging, we found that branching morphogenesis is affected in Ezh2 conditional knockout mice and the remaining bronchioles are abnormal, lacking terminally differentiated secretory club cells. Remarkably, RNA-seq analysis revealed the upregulation of basal genes in Ezh2-deficient epithelium. Three-dimensional imaging for keratin 5 further showed the unexpected presence of a layer of basal cells from the proximal airways to the distal bronchioles in E16.5 embryos. ChIP-seq analysis indicated the presence of Ezh2-mediated repressive marks on the genomic loci of some but not all basal genes, suggesting an indirect mechanism of action of Ezh2. We found that loss of Ezh2 de-represses insulin-like growth factor 1 (Igf1) expression and that modulation of IGF1 signaling ex vivo in wild-type lungs could induce basal cell differentiation. Altogether, our work reveals an unexpected role for Ezh2 in controlling basal cell fate determination in the embryonic lung endoderm, mediated in part by repression of Igf1 expression.
  • Item
    Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses
    Liu, R ; Holik, AZ ; Su, S ; Jansz, N ; Chen, K ; Leong, HS ; Blewitt, ME ; Asselin-Labat, M-L ; Smyth, GK ; Ritchie, ME (OXFORD UNIV PRESS, 2015-09-03)
    Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean-variance relationship of the log-counts-per-million using 'voom'. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source 'limma' package.
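
The weighting idea in this abstract can be illustrated with a small Python sketch. It is not the 'limma' implementation: the function names are hypothetical, and the sample variance factors are estimated with a crude moment estimator (the mean log squared residual per sample, centred so the factors sum to zero) as a stand-in for the log-linear variance model the authors fit. It assumes that residuals from a per-gene linear model fit and observation-level weights (such as those produced by 'voom') are already available, and that the two sets of weights are combined by multiplication.

```python
import math


def sample_log_variance_factors(residuals):
    """Estimate one log-variance factor per sample, shared between genes.

    residuals: list of rows (one per gene), each a list of per-sample
    residuals from a previously fitted linear model.
    Assumption: a simple moment estimator replaces the model fit
    described in the abstract.
    """
    n_genes = len(residuals)
    n_samples = len(residuals[0])
    log_sq_sums = [0.0] * n_samples
    for row in residuals:
        for j, r in enumerate(row):
            # Small constant guards against log(0) for exact-zero residuals.
            log_sq_sums[j] += math.log(r * r + 1e-12)
    means = [s / n_genes for s in log_sq_sums]
    centre = sum(means) / n_samples
    # Centre so the factors sum to zero (relative, not absolute, variability).
    return [m - centre for m in means]


def combine_weights(obs_weights, gammas):
    """Scale observation-level weights by per-sample quality weights.

    A sample with log-variance factor gamma gets weight exp(-gamma),
    so more variable samples are down-weighted rather than removed.
    """
    sample_weights = [math.exp(-g) for g in gammas]
    return [[w * sw for w, sw in zip(row, sample_weights)]
            for row in obs_weights]


# Toy data: the third sample is three times noisier than the other two.
residuals = [[1.0, -1.0, 3.0],
             [-2.0, 2.0, -6.0]]
obs_weights = [[1.0, 1.0, 1.0],
               [2.0, 2.0, 2.0]]

gammas = sample_log_variance_factors(residuals)
combined = combine_weights(obs_weights, gammas)
```

In this toy input the noisier third sample keeps a non-zero weight in every gene but contributes less to the fit than the other two, which is the compromise between discarding and fully retaining variable samples that the abstract describes.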