School of Mathematics and Statistics - Research Publications

  • Item
    Removing unwanted variation from large-scale RNA sequencing data with PRPS
    Molania, R ; Foroutan, M ; Gagnon-Bartsch, JA ; Gandolfo, LC ; Jain, A ; Sinha, A ; Olshansky, G ; Dobrovic, A ; Papenfuss, AT ; Speed, TP (NATURE PORTFOLIO, 2023-01)
    Accurate identification and effective removal of unwanted variation is essential to derive meaningful biological results from RNA sequencing (RNA-seq) data, especially when the data come from large and complex studies. Using RNA-seq data from The Cancer Genome Atlas (TCGA), we examined several sources of unwanted variation and demonstrate here how these can significantly compromise various downstream analyses, including cancer subtype identification, association between gene expression and survival outcomes and gene co-expression analysis. We propose a strategy, called pseudo-replicates of pseudo-samples (PRPS), for deploying our recently developed normalization method, called removing unwanted variation III (RUV-III), to remove the variation caused by library size, tumor purity and batch effects in TCGA RNA-seq data. We illustrate the value of our approach by comparing it to the standard TCGA normalizations on several TCGA RNA-seq datasets. RUV-III with PRPS can be used to integrate and normalize other large transcriptomic datasets coming from multiple laboratories or platforms.
  • Item
    Empirical array quality weights in the analysis of microarray data
    Ritchie, ME ; Diyagama, D ; Neilson, J ; van Laar, R ; Dobrovic, A ; Holloway, A ; Smyth, GK (BMC, 2006-05-19)
    BACKGROUND: Assessment of array quality is an essential step in the analysis of data from microarray experiments. Once detected, less reliable arrays are typically excluded or "filtered" from further analysis to avoid misleading results. RESULTS: In this article, a graduated approach to array quality is considered based on empirical reproducibility of the gene expression measures from replicate arrays. Weights are assigned to each microarray by fitting a heteroscedastic linear model with shared array variance terms. A novel gene-by-gene update algorithm is used to efficiently estimate the array variances. The inverse variances are used as weights in the linear model analysis to identify differentially expressed genes. The method successfully assigns lower weights to less reproducible arrays from different experiments. Down-weighting the observations from suspect arrays increases the power to detect differential expression. In smaller experiments, this approach outperforms the usual method of filtering the data. The method is available in the limma software package which is implemented in the R software environment. CONCLUSION: This method complements existing normalisation and spot quality procedures, and allows poorer quality arrays, which would otherwise be discarded, to be included in an analysis. It is applicable to microarray data from experiments with some level of replication.
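The last step described in this abstract, using inverse array variances as weights in a linear model, can be illustrated with a minimal sketch. It assumes the array variances are already known; the gene-by-gene update algorithm that estimates them is not reproduced here. The function name `weighted_fit` and the toy numbers are hypothetical, and this is plain NumPy weighted least squares for a single gene, not the limma implementation.

```python
import numpy as np

def weighted_fit(X, y, array_variances):
    """Weighted least squares for one gene, with one weight per array.

    X: (arrays x coefficients) design matrix
    y: expression values for one gene, one per array
    array_variances: assumed-known variance for each array
    """
    w = 1.0 / np.asarray(array_variances, dtype=float)  # inverse-variance weights
    Xw = X * w[:, None]                                  # each row of X scaled by its weight
    # Solve the weighted normal equations (X' W X) beta = X' W y
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

# Toy data (hypothetical): four replicate arrays, the last one less reliable.
X = np.ones((4, 1))                              # intercept-only design
y = np.array([1.0, 1.0, 1.0, 5.0])               # fourth array is an outlier
variances = np.array([1.0, 1.0, 1.0, 4.0])       # fourth array has the largest variance

beta = weighted_fit(X, y, variances)
print("weighted estimate:", beta[0], "unweighted mean:", y.mean())
```

In this toy case the suspect array is down-weighted rather than discarded, so it still contributes to the estimate but pulls it less than it would in an unweighted mean.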
  • Item
    Assessment of DNA methylation profiling and copy number variation as indications of clonal relationship in ipsilateral and contralateral breast cancers to distinguish recurrent breast cancer from a second primary tumour
    Huang, KT ; Mikeska, T ; Li, J ; Takano, EA ; Millar, EKA ; Graham, PH ; Boyle, SE ; Campbell, IG ; Speed, TP ; Dobrovic, A ; Fox, SB (BMC, 2015-10-09)
    BACKGROUND: Patients with breast cancer have an increased risk of developing subsequent breast cancers. It is important to distinguish whether these tumours are de novo or recurrences of the primary tumour in order to guide the appropriate therapy. Our aim was to investigate the use of DNA methylation profiling and array comparative genomic hybridization (aCGH) to determine whether the second tumour is clonally related to the first tumour. METHODS: Methylation-sensitive high-resolution melting was used to screen promoter methylation in a panel of 13 genes reported as methylated in breast cancer (RASSF1A, TWIST1, APC, WIF1, MGMT, MAL, CDH13, RARβ, BRCA1, CDH1, CDKN2A, TP73, and GSTP1) in 29 tumour pairs (16 ipsilateral and 13 contralateral). Using the methylation profile of these genes, we employed a Bayesian and an empirical statistical approach to estimate clonal relationship. Copy number alterations were analysed using aCGH on the same set of tumour pairs. RESULTS: There is a higher probability of the second tumour being recurrent in ipsilateral tumours compared with contralateral tumours (38% versus 8%; p < 0.05) based on the methylation profile. Using previously reported recurrence rates as Bayesian prior probabilities, we classified 69% of ipsilateral and 15% of contralateral tumours as recurrent. The inferred clonal relationship results of the tumour pairs were generally concordant between methylation profiling and aCGH. CONCLUSION: Our results show that DNA methylation profiling as well as aCGH have potential as diagnostic tools in improving the clinical decisions to differentiate recurrences from a second de novo tumour.
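The Bayesian step mentioned in this abstract, combining a prior recurrence rate with evidence from the methylation profile, can be sketched with Bayes' rule for two hypotheses. This is an illustration only: the abstract does not give the likelihood model, so the two likelihood values below, the prior, and the function name `posterior_clonal` are all hypothetical.

```python
def posterior_clonal(prior, lik_clonal, lik_independent):
    """Posterior probability that a second tumour is clonally related to the first.

    prior: prior probability of recurrence (e.g. a previously reported rate)
    lik_clonal: probability of the observed profile pair if the tumours are clonal
    lik_independent: probability of the observed profile pair if they are independent
    """
    numerator = prior * lik_clonal
    return numerator / (numerator + (1.0 - prior) * lik_independent)

# Hypothetical inputs, not taken from the study.
p = posterior_clonal(prior=0.2, lik_clonal=0.6, lik_independent=0.1)
print(f"posterior probability of recurrence: {p:.2f}")
```

A tumour pair could then be classified as recurrent when this posterior exceeds a chosen threshold; the threshold used in the study is not stated in the abstract.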
  • Item
    A new normalization for Nanostring nCounter gene expression data
    Molania, R ; Gagnon-Bartsch, JA ; Dobrovic, A ; Speed, TP (OXFORD UNIV PRESS, 2019-07-09)
    The Nanostring nCounter gene expression assay uses molecular barcodes and single molecule imaging to detect and count hundreds of unique transcripts in a single reaction. These counts need to be normalized to adjust for the amount of sample, variations in assay efficiency and other factors. Most users adopt the normalization approach described in the nSolver analysis software, which involves background correction based on the observed values of negative control probes, a within-sample normalization using the observed values of positive control probes and normalization across samples using reference (housekeeping) genes. Here we present a new normalization method, Removing Unwanted Variation-III (RUV-III), which makes vital use of technical replicates and suitable control genes. We also propose an approach using pseudo-replicates when technical replicates are not available. The effectiveness of RUV-III is illustrated on four different datasets. We also offer suggestions on the design and analysis of studies involving this technology.
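The three-step conventional normalization that this abstract attributes to nSolver (background correction from negative controls, within-sample scaling from positive controls, across-sample scaling from housekeeping genes) can be sketched as follows. This is a simplified illustration under stated assumptions, not the nSolver code and not RUV-III: it assumes the background is the mean of the negative controls, that corrected counts are floored at 1, and that each scaling factor is the average geometric mean across samples divided by the sample's own geometric mean. The function names and the toy counts are hypothetical.

```python
import numpy as np

def geomean(a, axis=0):
    """Geometric mean along an axis (values must be positive)."""
    return np.exp(np.mean(np.log(a), axis=axis))

def nsolver_style_normalize(counts, neg_idx, pos_idx, hk_idx):
    """counts: (probes x samples) array of raw counts."""
    counts = np.asarray(counts, dtype=float)
    # 1. Background correction: subtract each sample's mean negative-control count.
    background = counts[neg_idx].mean(axis=0)
    corrected = np.maximum(counts - background, 1.0)  # floor at 1 (assumption)
    # 2. Positive-control scaling: bring every sample to the average geometric mean.
    pos_gm = geomean(corrected[pos_idx], axis=0)
    corrected = corrected * (pos_gm.mean() / pos_gm)
    # 3. Housekeeping scaling: the same idea, using the reference genes.
    hk_gm = geomean(corrected[hk_idx], axis=0)
    return corrected * (hk_gm.mean() / hk_gm)

# Toy data (hypothetical): sample 2 is sample 1 with twice the input amount.
sample1 = np.array([4, 6, 1000, 200, 500, 300, 50, 800], dtype=float)
counts = np.column_stack([sample1, 2 * sample1])
normalized = nsolver_style_normalize(
    counts, neg_idx=[0, 1], pos_idx=[2, 3], hk_idx=[4, 5]
)
print(normalized[6:])  # the two endogenous genes after normalization
```

In the toy data the only difference between the samples is a global factor of two, so the scaling steps should make the endogenous genes agree across samples. Unwanted variation that is not a single per-sample factor is not handled by this kind of scaling.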
  • Item
    Elevated levels of circulating mitochondrial DNA predict early allograft dysfunction in patients following liver transplantation
    Yoshino, O ; Wong, BKL ; Cox, DRA ; Lee, E ; Hepworth, G ; Christophi, C ; Jones, R ; Dobrovic, A ; Muralidharan, V ; Perini, M (WILEY, 2021-12)
    BACKGROUND AND AIM: The role of circulating mitochondrial DNA (cmtDNA) in transplantation remains to be elucidated. cmtDNA may be released into the circulation as a consequence of liver injury; yet recent work also suggests a causative role for cmtDNA leading to hepatocellular injury. We hypothesized that elevated cmtDNA would be associated with adverse events after liver transplantation (LT) and conducted an observational cohort study. METHODS: Twenty-one patients were enrolled prospectively prior to LT. RESULTS: Postoperative complications were observed in 47.6% (n = 10). Seven patients (33.3%) had early allograft dysfunction (EAD), and six patients (28.5%) experienced acute cellular rejection within 6 months of LT. cmtDNA levels were significantly elevated in all recipients after LT compared with healthy controls and preoperative samples (1 361 937 copies/mL [IQR 586 781-3 399 687] after LT; 545 531 copies/mL [IQR 238 562-1 381 015] before LT; and 194 562 copies/mL [IQR 182 359-231 515] in healthy controls) and returned to normal levels by 5 days after transplantation. cmtDNA levels were particularly elevated in those who developed EAD in the early postoperative period (P < 0.001). In all patients, there was initially a strong overall positive correlation between cmtDNA and plasma hepatocellular enzyme levels (P < 0.05). However, the patients with EAD demonstrated a second peak in cmtDNA at postoperative day 7, which did not correlate with liver function tests. CONCLUSIONS: The early release of plasma cmtDNA is strongly associated with hepatocellular damage; however, the late surge in cmtDNA in patients with EAD appeared to be independent of hepatocellular injury as measured by conventional tests.