School of Mathematics and Statistics - Research Publications

  • Item
    A hierarchical approach to removal of unwanted variation for large-scale metabolomics data
    Kim, T ; Tang, O ; Vernon, ST ; Kott, KA ; Koay, YC ; Park, J ; James, DE ; Grieve, SM ; Speed, TP ; Yang, P ; Figtree, GA ; O'Sullivan, JF ; Yang, JYH (NATURE PORTFOLIO, 2021-08-17)
    Liquid chromatography-mass spectrometry-based metabolomics studies are increasingly applied to large population cohorts, for which data acquisition runs for several weeks or even years. This inevitably introduces unwanted intra- and inter-batch variations over time that can overshadow true biological signals and thus hinder potential biological discoveries. To date, normalisation approaches have struggled to mitigate the variability introduced by technical factors whilst preserving biological variance, especially for protracted acquisitions. Here, we propose a study design framework with an arrangement for embedding biological sample replicates to quantify variance within and between batches and a workflow that uses these replicates to remove unwanted variation in a hierarchical manner (hRUV). We use this design to produce a dataset of more than 1000 human plasma samples run over an extended period of time. We demonstrate significant improvement of hRUV over existing methods in preserving biological signals whilst removing unwanted variation for large-scale metabolomics studies. Our tools not only provide a strategy for large-scale data normalisation but also provide guidance on the design strategy for large omics studies.
  • Item
    Strategies to enable large-scale proteomics for reproducible research
    Poulos, RC ; Hains, PG ; Shah, R ; Lucas, N ; Xavier, D ; Manda, SS ; Anees, A ; Koh, JMS ; Mahboob, S ; Wittman, M ; Williams, SG ; Sykes, EK ; Hecker, M ; Dausmann, M ; Wouters, MA ; Ashman, K ; Yang, J ; Wild, PJ ; deFazio, A ; Balleine, RL ; Tully, B ; Aebersold, R ; Speed, TP ; Liu, Y ; Reddel, RR ; Robinson, PJ ; Zhong, Q (NATURE PORTFOLIO, 2020-07-30)
    Reproducible research is the bedrock of experimental science. To enable the deployment of large-scale proteomics, we assess the reproducibility of mass spectrometry (MS) over time and across instruments and develop computational methods for improving quantitative accuracy. We perform 1560 data independent acquisition (DIA)-MS runs of eight samples containing known proportions of ovarian and prostate cancer tissue and yeast, or control HEK293T cells. Replicates are run on six mass spectrometers operating continuously with varying maintenance schedules over four months, interspersed with ~5000 other runs. We utilise negative controls and replicates to remove unwanted variation and enhance biological signal, outperforming existing methods. We also design a method for reducing missing values. Integrating these computational modules into a pipeline (ProNorM), we mitigate variation among instruments over time and accurately predict tissue proportions. We demonstrate how to improve the quantitative analysis of large-scale DIA-MS data, providing a pathway toward clinical proteomics.
  • Item
    Controlling technical variation amongst 6693 patient microarrays of the randomized MINDACT trial
    Jacob, L ; Witteveen, A ; Beumer, I ; Delahaye, L ; Wehkamp, D ; van den Akker, J ; Snel, M ; Chan, B ; Floore, A ; Bakx, N ; Brink, G ; Poncet, C ; Bogaerts, J ; Delorenzi, M ; Piccart, M ; Rutgers, E ; Cardoso, F ; Speed, T ; van't Veer, L ; Glas, A (NATURE PUBLISHING GROUP, 2020-07-27)
    Gene expression data obtained in large studies hold great promise for discovering disease signatures or subtypes through data analysis. Such data are also prone to technical variation, whose removal is essential to avoid spurious discoveries. Because this variation is not always known and can be confounded with biological signals, its removal is a challenging task. Here we provide a step-wise procedure and comprehensive analysis of the MINDACT microarray dataset. The MINDACT trial enrolled 6693 breast cancer patients and prospectively validated the gene expression signature MammaPrint for outcome prediction. The study also yielded a full-transcriptome microarray for each tumor. We show for the first time in such a large dataset how technical variation can be removed while retaining expected biological signals. Because of its unprecedented size, we hope the resulting adjusted dataset will be an invaluable tool to discover or test gene expression signatures and to advance our understanding of breast cancer.
  • Item
    Multiple sclerosis risk variants regulate gene expression in innate and adaptive immune cells
    Gresle, MM ; Jordan, MA ; Stankovich, J ; Spelman, T ; Johnson, LJ ; Laverick, L ; Hamlett, A ; Smith, LD ; Jokubaitis, VG ; Baker, J ; Haartsen, J ; Taylor, B ; Charlesworth, J ; Bahlo, M ; Speed, TP ; Brown, MA ; Field, J ; Baxter, AG ; Butzkueven, H (LIFE SCIENCE ALLIANCE LLC, 2020-07)
    At least 200 single-nucleotide polymorphisms (SNPs) are associated with multiple sclerosis (MS) risk. A key function that could mediate SNP-encoded MS risk is their regulatory effect on gene expression. We performed microarrays using RNA extracted from purified immune cell types from 73 untreated MS cases and 97 healthy controls and then performed cis expression quantitative trait loci mapping studies using additive linear models. We describe MS risk expression quantitative trait loci associations for 129 distinct genes. By extending these models to include an interaction term between genotype and phenotype, we identify MS risk SNPs with opposing effects on gene expression in cases compared with controls, namely, rs2256814 MYT1 in CD4 cells (q = 0.05) and rs12087340 RF00136 in monocyte cells (q = 0.04). The rs703842 SNP was also associated with a differential effect size on the expression of the METTL21B gene in CD8 cells of MS cases relative to controls (q = 0.03). Our study provides a detailed map of MS risk loci that function by regulating gene expression in cell types relevant to MS.