School of Mathematics and Statistics - Research Publications

  • Item
    Removing unwanted variation from large-scale RNA sequencing data with PRPS
    Molania, R ; Foroutan, M ; Gagnon-Bartsch, JA ; Gandolfo, LC ; Jain, A ; Sinha, A ; Olshansky, G ; Dobrovic, A ; Papenfuss, AT ; Speed, TP (NATURE PORTFOLIO, 2023-01)
    Accurate identification and effective removal of unwanted variation is essential to derive meaningful biological results from RNA sequencing (RNA-seq) data, especially when the data come from large and complex studies. Using RNA-seq data from The Cancer Genome Atlas (TCGA), we examined several sources of unwanted variation and demonstrate here how these can significantly compromise various downstream analyses, including cancer subtype identification, association between gene expression and survival outcomes and gene co-expression analysis. We propose a strategy, called pseudo-replicates of pseudo-samples (PRPS), for deploying our recently developed normalization method, called removing unwanted variation III (RUV-III), to remove the variation caused by library size, tumor purity and batch effects in TCGA RNA-seq data. We illustrate the value of our approach by comparing it to the standard TCGA normalizations on several TCGA RNA-seq datasets. RUV-III with PRPS can be used to integrate and normalize other large transcriptomic datasets coming from multiple laboratories or platforms.
  • Item
    Assessment of DNA methylation profiling and copy number variation as indications of clonal relationship in ipsilateral and contralateral breast cancers to distinguish recurrent breast cancer from a second primary tumour
    Huang, KT ; Mikeska, T ; Li, J ; Takano, EA ; Millar, EKA ; Graham, PH ; Boyle, SE ; Campbell, IG ; Speed, TP ; Dobrovic, A ; Fox, SB (BMC, 2015-10-09)
    BACKGROUND: Patients with breast cancer have an increased risk of developing subsequent breast cancers. It is important to distinguish whether these tumours are de novo or recurrences of the primary tumour in order to guide the appropriate therapy. Our aim was to investigate the use of DNA methylation profiling and array comparative genomic hybridization (aCGH) to determine whether the second tumour is clonally related to the first tumour. METHODS: Methylation-sensitive high-resolution melting was used to screen promoter methylation in a panel of 13 genes reported as methylated in breast cancer (RASSF1A, TWIST1, APC, WIF1, MGMT, MAL, CDH13, RARβ, BRCA1, CDH1, CDKN2A, TP73, and GSTP1) in 29 tumour pairs (16 ipsilateral and 13 contralateral). Using the methylation profile of these genes, we employed a Bayesian and an empirical statistical approach to estimate clonal relationship. Copy number alterations were analysed using aCGH on the same set of tumour pairs. RESULTS: There is a higher probability of the second tumour being recurrent in ipsilateral tumours compared with contralateral tumours (38% versus 8%; p < 0.05) based on the methylation profile. Using previously reported recurrence rates as Bayesian prior probabilities, we classified 69% of ipsilateral and 15% of contralateral tumours as recurrent. The inferred clonal relationship results of the tumour pairs were generally concordant between methylation profiling and aCGH. CONCLUSION: Our results show that DNA methylation profiling as well as aCGH have potential as diagnostic tools in improving the clinical decisions to differentiate recurrences from a second de novo tumour.
  • Item
    A new normalization for Nanostring nCounter gene expression data
    Molania, R ; Gagnon-Bartsch, JA ; Dobrovic, A ; Speed, TP (OXFORD UNIV PRESS, 2019-07-09)
    The Nanostring nCounter gene expression assay uses molecular barcodes and single molecule imaging to detect and count hundreds of unique transcripts in a single reaction. These counts need to be normalized to adjust for the amount of sample, variations in assay efficiency and other factors. Most users adopt the normalization approach described in the nSolver analysis software, which involves background correction based on the observed values of negative control probes, a within-sample normalization using the observed values of positive control probes and normalization across samples using reference (housekeeping) genes. Here we present a new normalization method, Removing Unwanted Variation-III (RUV-III), which makes vital use of technical replicates and suitable control genes. We also propose an approach using pseudo-replicates when technical replicates are not available. The effectiveness of RUV-III is illustrated on four different datasets. We also offer suggestions on the design and analysis of studies involving this technology.
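
The three-step conventional normalization summarized in the abstract above (background correction from negative control probes, within-sample scaling from positive control probes, across-sample scaling from housekeeping genes) might be sketched roughly as follows. This is an illustrative sketch only, not the nSolver implementation and not RUV-III: the function names, the use of the mean negative-control count as background, the floor at 1, and the geometric-mean scaling factors are all assumptions, since the abstract does not specify them.

```python
import numpy as np


def geometric_mean(values, axis=0):
    """Geometric mean along an axis; values must be strictly positive."""
    return np.exp(np.mean(np.log(values), axis=axis))


def nsolver_style_normalize(counts, neg_idx, pos_idx, hk_idx):
    """Hypothetical sketch of a conventional three-step normalization.

    counts  : (probes x samples) array of raw counts
    neg_idx : row indices of negative control probes
    pos_idx : row indices of positive control probes
    hk_idx  : row indices of reference (housekeeping) genes
    """
    counts = np.asarray(counts, dtype=float)

    # Step 1 (assumed form): subtract each sample's mean negative-control
    # count, flooring at 1 so later logarithms are defined.
    background = counts[neg_idx].mean(axis=0)
    corrected = np.maximum(counts - background, 1.0)

    # Step 2 (assumed form): scale each sample so the geometric mean of its
    # positive controls matches the average across samples.
    pos_gm = geometric_mean(corrected[pos_idx], axis=0)
    corrected = corrected * (pos_gm.mean() / pos_gm)

    # Step 3 (assumed form): the same kind of scaling, using housekeeping genes.
    hk_gm = geometric_mean(corrected[hk_idx], axis=0)
    return corrected * (hk_gm.mean() / hk_gm)


# Made-up toy counts: 8 probes (rows) by 3 samples (columns).
raw_counts = np.array([
    [5, 7, 6],           # negative control
    [3, 5, 4],           # negative control
    [1000, 2000, 1500],  # positive control
    [250, 500, 400],     # positive control
    [300, 650, 420],     # housekeeping gene
    [280, 600, 410],     # housekeeping gene
    [50, 120, 70],       # endogenous gene
    [900, 1500, 1300],   # endogenous gene
])
normalized = nsolver_style_normalize(
    raw_counts, neg_idx=[0, 1], pos_idx=[2, 3], hk_idx=[4, 5]
)
```

After the final step, every sample has the same housekeeping geometric mean by construction. A scheme of this kind depends entirely on the chosen control probes and reference genes, whereas the RUV-III approach described in the abstract additionally uses technical replicates (or pseudo-replicates) and control genes.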