School of Mathematics and Statistics - Research Publications

Search Results

  • Item
    A guide to creating design matrices for gene expression experiments.
    Law, CW ; Zeglinski, K ; Dong, X ; Alhamdoosh, M ; Smyth, GK ; Ritchie, ME (F1000 Research Ltd, 2020)
    Differential expression analysis of genomic data types, such as RNA-sequencing experiments, uses linear models to determine the size and direction of the changes in gene expression. For RNA-sequencing, there are several established software packages for this purpose, accompanied by analysis pipelines that are well described. However, two crucial steps in the analysis process can be a stumbling block for many: setting up an appropriate model via design matrices, and setting up comparisons of interest via contrast matrices. These steps are particularly troublesome because an extensive catalogue of design and contrast matrices does not currently exist. One would usually search for example case studies across different platforms and mix and match the advice from those sources to suit the dataset at hand. This article guides the reader through the basics of how to set up design and contrast matrices. We take a practical approach by providing code and a graphical representation of each case study, starting with simpler examples (e.g. models with a single explanatory variable) and moving on to more complex ones (e.g. interaction models, mixed effects models, higher order time series and cyclical models). Although our work has been written specifically with a limma-style pipeline in mind, most of it is also applicable to other software packages for differential expression analysis, and the ideas covered can be adapted to data analysis of other high-throughput technologies. Where appropriate, we explain the interpretation of, and differences between, models to aid readers in their own model choices. Unnecessary jargon and theory are omitted where possible so that our work is accessible to a wide audience of readers, from beginners to those with experience in genomics data analysis.
  • Item
    RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR.
    Law, CW ; Alhamdoosh, M ; Su, S ; Dong, X ; Tian, L ; Smyth, GK ; Ritchie, ME (F1000 Research Ltd, 2016)
    The ability to easily and efficiently analyse RNA-sequencing data is a key strength of the Bioconductor project. Starting with counts summarised at the gene level, a typical analysis involves pre-processing, exploratory data analysis, differential expression testing and pathway analysis, with the results obtained informing future experiments and validation studies. In this workflow article, we analyse RNA-sequencing data from the mouse mammary gland, demonstrating use of the popular edgeR package to import, organise, filter and normalise the data, followed by the limma package with its voom method, linear modelling and empirical Bayes moderation to assess differential expression and perform gene set testing. This pipeline is further enhanced by the Glimma package, which enables interactive exploration of the results so that individual samples and genes can be examined by the user. The complete analysis offered by these three packages highlights the ease with which researchers can turn the raw counts from an RNA-sequencing experiment into biological insights using Bioconductor.
  • Item
    RNA-seq mixology: designing realistic control experiments to compare protocols and analysis methods
    Holik, AZ ; Law, CW ; Liu, R ; Wang, Z ; Wang, W ; Ahn, J ; Asselin-Labat, M-L ; Smyth, GK ; Ritchie, ME (OXFORD UNIV PRESS, 2017-03-17)
    Carefully designed control experiments provide a gold standard for benchmarking different genomics research tools. A shortcoming of many gene expression control studies is that replication involves profiling the same reference RNA sample multiple times. This leads to low, purely technical noise that is atypical of regular studies. To achieve a more realistic noise structure, we generated an RNA-sequencing mixture experiment using two cell lines of the same cancer type. Variability was added by extracting RNA from independent cell cultures and degrading particular samples. The systematic gene expression changes induced by this design allowed benchmarking of different library preparation kits (standard poly-A versus total RNA with Ribozero depletion) and analysis pipelines. Data generated using the total RNA kit had more signal for introns and various RNA classes (ncRNA, snRNA, snoRNA) and less variability after degradation. For differential expression analysis, voom with quality weights marginally outperformed other popular methods, while for differential splicing, DEXSeq was simultaneously the most sensitive and the most inconsistent method. For sample deconvolution analysis, DeMix outperformed IsoPure convincingly. Our RNA-sequencing data set provides a valuable resource for benchmarking different protocols and data pre-processing workflows. The extra noise mimics routine lab experiments more closely, ensuring any conclusions are widely applicable.
  • Item
    Germline heterozygous mutations in Nxf1 perturb RNA metabolism and trigger thrombocytopenia and lymphopenia in mice
    Chappaz, S ; Law, CW ; Dowling, MR ; Carey, KT ; Lane, RM ; Ngo, LH ; Wickramasinghe, VO ; Smyth, GK ; Ritchie, ME ; Kile, BT (ELSEVIER, 2020-04-14)
    In eukaryotic cells, messenger RNA (mRNA) molecules are exported from the nucleus to the cytoplasm, where they are translated. The highly conserved protein nuclear RNA export factor 1 (Nxf1) is an important mediator of this process. Although studies in yeast and in human cell lines have shed light on the biochemical mechanisms of Nxf1 function, its contribution to mammalian physiology is less clear. Several groups have identified recurrent NXF1 mutations in chronic lymphocytic leukemia (CLL), placing it alongside several RNA-metabolism factors (including SF3B1, XPO, RPS15) whose dysregulation is thought to contribute to CLL pathogenesis. We report here an allelic series of germline point mutations in murine Nxf1. Mice heterozygous for these loss-of-function Nxf1 mutations exhibit thrombocytopenia and lymphopenia, together with milder hematological defects. This is primarily caused by cell-intrinsic defects in the survival of platelets and peripheral lymphocytes, which are sensitized to intrinsic apoptosis. In contrast, Nxf1 mutations have almost no effect on red blood cell homeostasis. Comparative transcriptome analysis of platelets, lymphocytes, and erythrocytes from Nxf1-mutant mice shows that, in response to impaired Nxf1 function, the cytoplasmic representation of transcripts encoding regulators of RNA metabolism is altered in a unique, lineage-specific way. Thus, blood cell lineages exhibit differential requirements for Nxf1-mediated global mRNA export.
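
The design and contrast matrices discussed in the first abstract above (Law et al., 2020) can be illustrated with a minimal sketch of the simplest case it mentions, a model with a single explanatory variable. This is not the article's own code, which targets a limma-style pipeline in R. The sample labels, expression values, and variable names below are hypothetical, and NumPy least squares stands in for the model fit.

```python
import numpy as np

# Hypothetical sample annotation: three control and three treated samples.
groups = ["control", "control", "control", "treated", "treated", "treated"]
levels = sorted(set(groups))  # ["control", "treated"]

# Design matrix for a means model (no intercept):
# one indicator column per group, one row per sample.
design = np.array(
    [[1.0 if g == level else 0.0 for level in levels] for g in groups]
)

# Contrast vector for the comparison of interest: treated minus control.
contrast = np.array([-1.0, 1.0])

# Made-up log-expression values for a single gene (illustration only).
y = np.array([5.0, 5.2, 4.8, 7.0, 7.1, 6.9])

# Ordinary least squares fit; in a means model each coefficient
# is the mean log-expression of its group.
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# Applying the contrast gives the estimated log fold change,
# approximately 2.0 for these made-up values.
log_fold_change = float(contrast @ coef)
print(levels, coef, log_fold_change)
```

Keeping the design matrix (how samples map to model coefficients) separate from the contrast vector (which coefficients are compared) mirrors the two steps the abstract identifies; more complex designs would add columns to `design` and entries to `contrast` accordingly.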