School of Mathematics and Statistics - Research Publications

  • A guide to creating design matrices for gene expression experiments.
    Law, CW ; Zeglinski, K ; Dong, X ; Alhamdoosh, M ; Smyth, GK ; Ritchie, ME (F1000 Research Ltd, 2020)
    Differential expression analysis of genomic data types, such as RNA-sequencing experiments, uses linear models to determine the size and direction of changes in gene expression. For RNA-sequencing, several established software packages serve this purpose, accompanied by well-described analysis pipelines. However, two crucial steps in the analysis process can be a stumbling block for many: setting up an appropriate model via design matrices, and setting up comparisons of interest via contrast matrices. These steps are particularly troublesome because no extensive catalogue of design and contrast matrices currently exists. Users typically search for example case studies across different platforms and mix and match the advice from those sources to suit the dataset at hand. This article guides the reader through the basics of setting up design and contrast matrices. We take a practical approach by providing code and a graphical representation for each case study, starting with simpler examples (e.g. models with a single explanatory variable) and moving on to more complex ones (e.g. interaction models, mixed effects models, higher-order time series and cyclical models). Although our work has been written specifically with a limma-style pipeline in mind, most of it also applies to other software packages for differential expression analysis, and the ideas covered can be adapted to data analysis for other high-throughput technologies. Where appropriate, we explain the interpretation of, and differences between, models to aid readers in their own model choices. Unnecessary jargon and theory are omitted where possible so that our work is accessible to a wide audience, from beginners to those with experience in genomics data analysis.
  • limma powers differential expression analyses for RNA-sequencing and microarray studies
    Ritchie, ME ; Phipson, B ; Wu, D ; Hu, Y ; Law, CW ; Shi, W ; Smyth, GK (OXFORD UNIV PRESS, 2015-04-20)
    limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go beyond traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
  • A pooled shRNA screen for regulators of primary mammary stem and progenitor cells identifies roles for Asap1 and Prox1
    Sheridan, JM ; Ritchie, ME ; Best, SA ; Jiang, K ; Beck, TJ ; Vaillant, F ; Liu, K ; Dickins, RA ; Smyth, GK ; Lindeman, GJ ; Visvader, JE (BMC, 2015-04-03)
    BACKGROUND: The molecular regulators that orchestrate stem cell renewal, proliferation and differentiation along the mammary epithelial hierarchy remain poorly understood. Here we have performed a large-scale pooled RNAi screen in primary mouse mammary stem cell (MaSC)-enriched basal cells using 1295 shRNAs against genes principally involved in transcriptional regulation. METHODS: MaSC-enriched basal cells transduced with lentivirus pools carrying shRNAs were maintained as non-adherent mammospheres, a system known to support stem and progenitor cells. Integrated shRNAs that altered culture kinetics were identified by next generation sequencing as relative frequency changes over time. RNA-seq-based expression profiling coupled with in vitro progenitor and in vivo transplantation assays was used to confirm a role for candidate genes in mammary stem and/or progenitor cells. RESULTS: Utilizing a mammosphere-based assay, the screen identified several candidate regulators. Although some genes had been previously implicated in mammary gland development, the vast majority of genes uncovered have no known function within the mammary gland. RNA-seq analysis of freshly purified primary mammary epithelial populations and short-term cultured mammospheres was used to confirm the expression of candidate regulators. Two genes, Asap1 and Prox1, respectively implicated in breast cancer metastasis and progenitor cell function in other systems, were selected for further analysis as their roles in the normal mammary gland were unknown. Both Prox1 and Asap1 were shown to act as negative regulators of progenitor activity in vitro, and Asap1 knock-down led to a marked increase in repopulating activity in vivo, implying a role in stem cell activity. CONCLUSIONS: This study has revealed a number of novel genes that influence the activity or survival of mammary stem and/or progenitor cells. Amongst these, we demonstrate that Prox1 and Asap1 behave as negative regulators of mammary stem/progenitor function. Both of these genes have also been implicated in oncogenesis. Our findings provide proof of principle for the use of short-term cultured primary MaSC/basal cells in functional RNAi screens.
  • RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR.
    Law, CW ; Alhamdoosh, M ; Su, S ; Dong, X ; Tian, L ; Smyth, GK ; Ritchie, ME (F1000 Research Ltd, 2016)
    The ability to easily and efficiently analyse RNA-sequencing data is a key strength of the Bioconductor project. Starting with counts summarised at the gene level, a typical analysis involves pre-processing, exploratory data analysis, differential expression testing and pathway analysis, with the results informing future experiments and validation studies. In this workflow article, we analyse RNA-sequencing data from the mouse mammary gland, demonstrating the use of the popular edgeR package to import, organise, filter and normalise the data, followed by the limma package with its voom method, linear modelling and empirical Bayes moderation to assess differential expression and perform gene set testing. This pipeline is further enhanced by the Glimma package, which enables interactive exploration of the results so that individual samples and genes can be examined by the user. The complete analysis offered by these three packages highlights the ease with which researchers can turn the raw counts from an RNA-sequencing experiment into biological insights using Bioconductor.
  • RNA-seq mixology: designing realistic control experiments to compare protocols and analysis methods
    Holik, AZ ; Law, CW ; Liu, R ; Wang, Z ; Wang, W ; Ahn, J ; Asselin-Labat, M-L ; Smyth, GK ; Ritchie, ME (OXFORD UNIV PRESS, 2017-03-17)
    Carefully designed control experiments provide a gold standard for benchmarking different genomics research tools. A shortcoming of many gene expression control studies is that replication involves profiling the same reference RNA sample multiple times. This yields low, purely technical noise that is atypical of regular studies. To achieve a more realistic noise structure, we generated an RNA-sequencing mixture experiment using two cell lines of the same cancer type. Variability was added by extracting RNA from independent cell cultures and degrading particular samples. The systematic gene expression changes induced by this design allowed benchmarking of different library preparation kits (standard poly-A versus total RNA with Ribozero depletion) and analysis pipelines. Data generated using the total RNA kit had more signal for introns and various RNA classes (ncRNA, snRNA, snoRNA) and less variability after degradation. For differential expression analysis, voom with quality weights marginally outperformed other popular methods, while for differential splicing, DEXSeq was simultaneously the most sensitive and the most inconsistent method. For sample deconvolution analysis, DeMix outperformed IsoPure convincingly. Our RNA-sequencing data set provides a valuable resource for benchmarking different protocols and data pre-processing workflows. The extra noise mimics routine lab experiments more closely, ensuring any conclusions are widely applicable.
  • Germline heterozygous mutations in Nxf1 perturb RNA metabolism and trigger thrombocytopenia and lymphopenia in mice
    Chappaz, S ; Law, CW ; Dowling, MR ; Carey, KT ; Lane, RM ; Ngo, LH ; Wickramasinghe, VO ; Smyth, GK ; Ritchie, ME ; Kile, BT (ELSEVIER, 2020-04-14)
    In eukaryotic cells, messenger RNA (mRNA) molecules are exported from the nucleus to the cytoplasm, where they are translated. The highly conserved protein nuclear RNA export factor 1 (Nxf1) is an important mediator of this process. Although studies in yeast and in human cell lines have shed light on the biochemical mechanisms of Nxf1 function, its contribution to mammalian physiology is less clear. Several groups have identified recurrent NXF1 mutations in chronic lymphocytic leukemia (CLL), placing it alongside several RNA-metabolism factors (including SF3B1, XPO, RPS15) whose dysregulation is thought to contribute to CLL pathogenesis. We report here an allelic series of germline point mutations in murine Nxf1. Mice heterozygous for these loss-of-function Nxf1 mutations exhibit thrombocytopenia and lymphopenia, together with milder hematological defects. This is primarily caused by cell-intrinsic defects in the survival of platelets and peripheral lymphocytes, which are sensitized to intrinsic apoptosis. In contrast, Nxf1 mutations have almost no effect on red blood cell homeostasis. Comparative transcriptome analysis of platelets, lymphocytes, and erythrocytes from Nxf1-mutant mice shows that, in response to impaired Nxf1 function, the cytoplasmic representation of transcripts encoding regulators of RNA metabolism is altered in a unique, lineage-specific way. Thus, blood cell lineages exhibit differential requirements for Nxf1-mediated global mRNA export.
  • Targeting triple-negative breast cancers with the Smac-mimetic birinapant
    Lalaoui, N ; Merino, D ; Giner, G ; Vaillant, F ; Chau, D ; Liu, L ; Kratina, T ; Pal, B ; Whittle, JR ; Etemadi, N ; Berthelet, J ; Grasel, J ; Hall, C ; Ritchie, ME ; Ernst, M ; Smyth, GK ; Vaux, DL ; Visvader, JE ; Lindeman, GJ ; Silke, J (Springer Nature, 2020-04-27)
    Smac mimetics target inhibitor of apoptosis (IAP) proteins, thereby suppressing their function to facilitate tumor cell death. Here we have evaluated the efficacy of the preclinical Smac-mimetic compound A and the clinical lead birinapant on breast cancer cells. Both exhibited potent in vitro activity in triple-negative breast cancer (TNBC) cells, including those from patient-derived xenograft (PDX) models. Birinapant was further studied using in vivo PDX models of TNBC and estrogen receptor-positive (ER+) breast cancer. Birinapant exhibited single-agent activity in all TNBC PDX models and augmented the response to docetaxel, the latter through induction of TNF. Transcriptomic analysis of TCGA datasets revealed that genes encoding mediators of Smac-mimetic-induced cell death were expressed at higher levels in TNBC compared with ER+ breast cancer, resulting in a molecular signature associated with responsiveness to Smac mimetics. In addition, the cell death complex was preferentially formed in TNBCs versus ER+ cells in response to Smac mimetics. Taken together, our findings provide a rationale for prospectively selecting patients whose breast tumors contain a competent death receptor signaling pathway for the further evaluation of birinapant in the clinic.
  • Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses
    Liu, R ; Holik, AZ ; Su, S ; Jansz, N ; Chen, K ; Leong, HS ; Blewitt, ME ; Asselin-Labat, M-L ; Smyth, GK ; Ritchie, ME (OXFORD UNIV PRESS, 2015-09-03)
    Variations in sample quality are frequently encountered in small RNA-sequencing experiments and pose a major challenge for differential expression analysis. Removing high-variation samples reduces noise, but at the cost of reduced power, thus limiting our ability to detect biologically meaningful changes. On the other hand, retaining these samples in the analysis may fail to reveal any statistically significant changes because of the higher noise level. A compromise is to use all available data but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level, this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters shared between genes. The estimated sample variance factors are then converted to weights and combined with observational-level weights obtained from the mean-variance relationship of the log-counts-per-million using 'voom'. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared with conventional approaches. This methodology has wide application and is implemented in the open-source 'limma' package.
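The design-matrix guide listed first above starts from models with a single explanatory variable. As a rough illustration of that simplest case only, the Python sketch below builds the two usual parameterizations for one categorical variable: a means model (one column per group, analogous to R's `model.matrix(~0 + group)`) and an intercept model (analogous to `model.matrix(~group)`), then applies a contrast vector. The function names, the toy expression values and the alphabetical choice of reference level are assumptions made for this example; it is not code from the article or from limma.

```python
# Hypothetical sketch of single-factor design matrices; not limma's implementation.
# Pure Python lists keep the example dependency-free.


def means_model(groups):
    """One indicator column per group (like ~0 + group in R)."""
    levels = sorted(set(groups))  # assumed level order: alphabetical
    rows = [[1 if g == lvl else 0 for lvl in levels] for g in groups]
    return levels, rows


def intercept_model(groups):
    """Intercept plus indicators for the non-reference levels (like ~group in R).

    The first level in sorted order is the reference, so each non-intercept
    coefficient is a difference from that reference group.
    """
    levels = sorted(set(groups))
    names = ["(Intercept)"] + levels[1:]
    rows = [[1] + [1 if g == lvl else 0 for lvl in levels[1:]] for g in groups]
    return names, rows


def group_means(groups, values):
    """Least-squares coefficients of the means model are the per-group averages."""
    levels = sorted(set(groups))
    return [
        sum(v for g, v in zip(groups, values) if g == lvl) / groups.count(lvl)
        for lvl in levels
    ]


def apply_contrast(coefs, contrast):
    """Linear combination of coefficients defined by a contrast vector."""
    return sum(c * b for c, b in zip(contrast, coefs))


if __name__ == "__main__":
    groups = ["A", "A", "B", "B", "C", "C"]
    expr = [1.0, 3.0, 4.0, 6.0, 2.0, 4.0]  # made-up log-expression values for one gene

    levels, X = means_model(groups)
    coefs = group_means(groups, expr)
    print(levels, coefs)                                 # ['A', 'B', 'C'] [2.0, 5.0, 3.0]
    print("B - A:", apply_contrast(coefs, [-1, 1, 0]))   # B - A: 3.0

    names, X2 = intercept_model(groups)
    print(names, X2[2])                                  # ['(Intercept)', 'B', 'C'] [1, 1, 0]
```

In the means model, the B-versus-A comparison needs the contrast `[-1, 1, 0]`; in the intercept model the same difference is simply the coefficient named `B`, which is why the two parameterizations lead to equivalent comparisons.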
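The 'Why weight?' abstract above describes converting sample variance factors into weights and combining them with voom's observation-level weights. A minimal Python sketch of that bookkeeping follows, under several assumptions that are not taken from limma's code: the log-CPM offsets (0.5 added to counts, 1 added to library sizes) reflect our understanding of voom's definition; sample weights are taken as the reciprocal of each sample's variance factor; the two kinds of weight are combined by multiplication; and the observation-level weights are supplied as given numbers rather than estimated from a fitted mean-variance trend as voom does. Any rescaling of the weights is ignored.

```python
# Hypothetical sketch of combining sample- and observation-level weights.
# Not limma's implementation; offsets, inverse-variance sample weights and
# multiplicative combination are all assumptions stated in the text.
import math


def log_cpm(gene_counts, lib_sizes):
    """voom-style log2 counts per million for one gene across samples (assumed offsets)."""
    return [math.log2((c + 0.5) / (n + 1) * 1e6) for c, n in zip(gene_counts, lib_sizes)]


def sample_weights(variance_factors):
    """A sample whose variance is inflated by a factor v receives weight 1 / v."""
    return [1.0 / v for v in variance_factors]


def combine_weights(obs_weights, samp_weights):
    """Element-wise product of observation-level and sample-level weights."""
    return [o * s for o, s in zip(obs_weights, samp_weights)]


def weighted_mean(values, weights):
    """Weighted average: observations with smaller weights contribute less."""
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


if __name__ == "__main__":
    counts = [120, 95, 400]                  # made-up counts for one gene in three samples
    libs = [1_000_000, 800_000, 1_200_000]   # made-up library sizes
    y = log_cpm(counts, libs)

    sw = sample_weights([1.0, 1.0, 4.0])      # third sample assumed four times as variable
    w = combine_weights([2.0, 2.0, 2.0], sw)  # hypothetical observation-level weights
    print(w)                                  # [2.0, 2.0, 0.5]
    print(round(weighted_mean(y, w), 3), round(sum(y) / len(y), 3))
```

With these numbers the noisier third sample carries a quarter of the weight of each other sample, so the weighted mean is pulled towards the first two samples, while all data are still used rather than discarding the noisy sample outright.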