School of Mathematics and Statistics - Research Publications

  • Item
    Removing unwanted variation from large-scale RNA sequencing data with PRPS
    Molania, R ; Foroutan, M ; Gagnon-Bartsch, JA ; Gandolfo, LC ; Jain, A ; Sinha, A ; Olshansky, G ; Dobrovic, A ; Papenfuss, AT ; Speed, TP (NATURE PORTFOLIO, 2023-01)
    Accurate identification and effective removal of unwanted variation are essential to derive meaningful biological results from RNA sequencing (RNA-seq) data, especially when the data come from large and complex studies. Using RNA-seq data from The Cancer Genome Atlas (TCGA), we examined several sources of unwanted variation and demonstrate here how these can significantly compromise various downstream analyses, including cancer subtype identification, association between gene expression and survival outcomes, and gene co-expression analysis. We propose a strategy, called pseudo-replicates of pseudo-samples (PRPS), for deploying our recently developed normalization method, called removing unwanted variation III (RUV-III), to remove the variation caused by library size, tumor purity and batch effects in TCGA RNA-seq data. We illustrate the value of our approach by comparing it to the standard TCGA normalizations on several TCGA RNA-seq datasets. RUV-III with PRPS can be used to integrate and normalize other large transcriptomic datasets coming from multiple laboratories or platforms.
  • Item
    A statistical framework for analyzing deep mutational scanning data
    Rubin, AF ; Gelman, H ; Lucas, N ; Bajjalieh, SM ; Papenfuss, AT ; Speed, TP ; Fowler, DM (BMC, 2017-08-07)
    Deep mutational scanning is a widely used method for multiplex measurement of functional consequences of protein variants. We developed a new deep mutational scanning statistical model that generates error estimates for each measurement, capturing both sampling error and consistency between replicates. We apply our model to one novel and five published datasets comprising 243,732 variants and demonstrate its superiority in removing noisy variants and conducting hypothesis testing. Simulations show our model applies to scans based on cell growth or binding and handles common experimental errors. We implemented our model in Enrich2, software that can empower researchers analyzing deep mutational scanning data.
  • Item
    A statistical framework for analyzing deep mutational scanning data (vol 18, 150, 2017)
    Rubin, AF ; Gelman, H ; Lucas, N ; Bajjalieh, SM ; Papenfuss, AT ; Speed, TP ; Fowler, DM (BIOMED CENTRAL LTD, 2018-02-07)
    After publication of our article [1], it was brought to our attention that a line of code was missing from our program to combine the within-replicate variance and between-replicate variance. This led to an overestimation of the standard errors calculated using the Enrich2 random-effects model.