School of Mathematics and Statistics - Research Publications

  • Item
    Evaluating stably expressed genes in single cells
    Lin, Y ; Ghazanfar, S ; Strbenac, D ; Wang, A ; Patrick, E ; Lin, DM ; Speed, T ; Yang, JYH ; Yang, P (OXFORD UNIV PRESS, 2019-09)
    BACKGROUND: Single-cell RNA-seq (scRNA-seq) profiling has revealed remarkable variation in transcription, suggesting that expression of many genes at the single-cell level is intrinsically stochastic and noisy. Yet, on the cell population level, a subset of genes traditionally referred to as housekeeping genes (HKGs) are found to be stably expressed in different cell and tissue types. It is therefore critical to question whether stably expressed genes (SEGs) can be identified on the single-cell level, and if so, how their expression stability can be assessed. We have previously proposed a computational framework for ranking expression stability of genes in single cells for scRNA-seq data normalization and integration. In this study, we perform detailed evaluation and characterization of SEGs derived from this framework. RESULTS: Here, we show that gene expression stability indices derived from the early human and mouse development scRNA-seq datasets and the "Mouse Atlas" dataset are reproducible and conserved across species. We demonstrate that SEGs identified from single cells based on their stability indices are considerably more stable than HKGs defined previously from cell populations across diverse biological systems. Our analyses indicate that SEGs are inherently more stable at the single-cell level and that their characteristics are reminiscent of HKGs, suggesting their potential role in sustaining essential functions in individual cells. CONCLUSIONS: SEGs identified in this study have immediate utility both for understanding variation and stability of single-cell transcriptomes and for practical applications such as scRNA-seq data normalization. Our framework for calculating gene stability index, "scSEGIndex," is incorporated into the scMerge Bioconductor R package (https://sydneybiox.github.io/scMerge/reference/scSEGIndex.html) and can be used for identifying genes with stable expression in scRNA-seq datasets.
  • Item
    Accurate RNA Sequencing From Formalin-Fixed Cancer Tissue to Represent High-Quality Transcriptome From Frozen Tissue
    Li, J ; Fu, C ; Speed, TP ; Wang, W ; Symmans, WF (AMER SOC CLINICAL ONCOLOGY, 2018-01-26)
    PURPOSE: Accurate transcriptional sequencing (RNA-seq) from formalin-fixed, paraffin-embedded (FFPE) tumor samples presents an important challenge for translational research and diagnostic development. In addition, there are now several different protocols to prepare a sequencing library from total RNA. We evaluated the accuracy of RNA-seq data generated from FFPE samples in terms of expression profiling. METHODS: We designed a biospecimen study to directly compare gene expression results from different protocols to prepare libraries for RNA-seq from human breast cancer tissues, with randomization to fresh-frozen (FF) or FFPE conditions. The protocols were compared using multiple computational methods to assess alignment of reads to the reference genome and the uniformity and continuity of coverage, as well as the variance and correlation of overall gene expression and patterns of measuring coding sequence, phenotypic patterns of gene expression, and measurements from representative multigene signatures. RESULTS: The principal determinant of variance in gene expression was use of exon capture probes, followed by the conditions of preservation (FF versus FFPE), and phenotypic differences between breast cancers. One protocol, with RNase H-based rRNA depletion, exhibited the least variability of gene expression measurements and the strongest correlation between FF and FFPE samples, and was generally representative of the transcriptome from standard FF RNA-seq protocols. CONCLUSION: The method of RNA-seq library preparation from FFPE samples had a marked effect on the accuracy of gene expression measurement compared to matched FF samples. Nevertheless, some protocols produced highly concordant expression data from FFPE RNA-seq data, compared to RNA-seq results from matched frozen samples.
  • Item
    Evaluation of cross-platform and interlaboratory concordance via consensus modelling of genomic measurements
    Peters, TJ ; French, HJ ; Bradford, ST ; Pidsley, R ; Stirzaker, C ; Varinli, H ; Nair, S ; Qu, W ; Song, J ; Giles, KA ; Statham, AL ; Speirs, H ; Speed, TP ; Clark, SJ ; Hancock, J (OXFORD UNIV PRESS, 2019-02-15)
    MOTIVATION: A synoptic view of the human genome benefits chiefly from the application of nucleic acid sequencing and microarray technologies. These platforms allow interrogation of patterns such as gene expression and DNA methylation at the vast majority of canonical loci, allowing granular insights and opportunities for validation of original findings. However, problems arise when validating against a "gold standard" measurement, since this immediately biases all subsequent measurements towards that particular technology or protocol. Since all genomic measurements are estimates, in the absence of a "gold standard" we instead empirically assess the measurement precision and sensitivity of a large suite of genomic technologies via a consensus modelling method called the row-linear model. This method is an application of the American Society for Testing and Materials Standard E691 for assessing interlaboratory precision and sources of variability across multiple testing sites. Both cross-platform and cross-locus comparisons can be made across all common loci, allowing identification of technology- and locus-specific tendencies. RESULTS: We assess technologies including the Infinium MethylationEPIC BeadChip, whole genome bisulfite sequencing (WGBS), two different RNA-Seq protocols (PolyA+ and Ribo-Zero) and five different gene expression array platforms. Each technology thus is characterised herein, relative to the consensus. We showcase a number of applications of the row-linear model, including correlation with known interfering traits. We demonstrate a clear effect of cross-hybridisation on the sensitivity of Infinium methylation arrays. Additionally, we perform a true interlaboratory test on a set of samples interrogated on the same platform across twenty-one separate testing laboratories. AVAILABILITY AND IMPLEMENTATION: A full implementation of the row-linear model, plus extra functions for visualisation, is found in the R package consensus at https://github.com/timpeters82/consensus. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
  • Item
    The healthy ageing gene expression signature for Alzheimer's disease diagnosis: a random sampling perspective
    Jacob, L ; Speed, TP (BMC, 2018-07-25)
    In a recent publication, Sood et al. (Genome Biol 16:185, 2015) presented a set of 150 probe sets that could be used in the diagnosis of Alzheimer's disease (AD) based on gene expression. We reproduce some of their experiments and show that their signature is indeed able to discriminate between AD and control patients using blood gene expression in two cohorts. We also show that its performance does not stand out compared to randomly sampled sets of 150 probe sets from the same array.
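The random-sampling comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `random_set_pvalue`, the `score` callback (any measure of how well a probe set discriminates AD from control samples) and the add-one correction are assumptions made for illustration only.

```python
import random
from typing import Callable, Sequence


def random_set_pvalue(
    signature: Sequence[int],
    n_probes: int,
    score: Callable[[Sequence[int]], float],
    n_draws: int = 1000,
    seed: int = 0,
) -> float:
    """Estimate how often a random probe set scores at least as well as the signature.

    `signature` holds probe indices in range(n_probes); `score` is a
    hypothetical user-supplied function where higher means better.
    """
    rng = random.Random(seed)
    observed = score(signature)
    k = len(signature)
    hits = 0
    for _ in range(n_draws):
        # Draw a random probe set of the same size from the same array.
        candidate = rng.sample(range(n_probes), k)
        if score(candidate) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_draws + 1)
```

A value close to 1 would indicate that the signature does not stand out from randomly sampled sets of the same size, while a value close to 0 would indicate that few random sets match it.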
  • Item
    G protein-linked signaling pathways in bipolar and major depressive disorders.
    Tomita, H ; Ziegler, ME ; Kim, HB ; Evans, SJ ; Choudary, PV ; Li, JZ ; Meng, F ; Dai, M ; Myers, RM ; Neal, CR ; Speed, TP ; Barchas, JD ; Schatzberg, AF ; Watson, SJ ; Akil, H ; Jones, EG ; Bunney, WE ; Vawter, MP (Frontiers Media SA, 2013)
    The G-protein linked signaling system (GPLS) comprises a large number of G-proteins, G protein-coupled receptors (GPCRs), GPCR ligands, and downstream effector molecules. G-proteins interact with both GPCRs and downstream effectors such as cyclic adenosine monophosphate (cAMP), phosphatidylinositols, and ion channels. The GPLS is implicated in the pathophysiology and pharmacology of both major depressive disorder (MDD) and bipolar disorder (BPD). This study evaluated whether GPLS is altered at the transcript level. Gene expression in the dorsolateral prefrontal cortex (DLPFC) and anterior cingulate cortex (ACC) was compared among MDD, BPD, and control subjects using Affymetrix Gene Chips and real time quantitative PCR. High quality brain tissue was used in the study to control for confounding effects of agonal events, tissue pH, RNA integrity, gender, and age. GPLS signaling transcripts were altered especially in the ACC of BPD and MDD subjects. Transcript levels of molecules which repress cAMP activity were increased in BPD and decreased in MDD. Two orphan GPCRs, GPRC5B and GPR37, showed significantly decreased expression levels in MDD, and significantly increased expression levels in BPD. Our results suggest opposite changes in BPD and MDD in the GPLS, "activated" cAMP signaling activity in BPD and "blunted" cAMP signaling activity in MDD. GPRC5B and GPR37 both appear to have behavioral effects, and are also candidate genes for neurodegenerative disorders. In the context of the opposite changes observed in BPD and MDD, these GPCRs warrant further study of their brain effects.
  • Item
    Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data
    Dai, MH ; Wang, PL ; Boyd, AD ; Kostov, G ; Athey, B ; Jones, EG ; Bunney, WE ; Myers, RM ; Speed, TP ; Akil, H ; Watson, SJ ; Meng, F (OXFORD UNIV PRESS, 2005)
    Genome-wide expression profiling is a powerful tool for implicating novel gene ensembles in cellular mechanisms of health and disease. The most popular platform for genome-wide expression profiling is the Affymetrix GeneChip. However, its selection of probes relied on earlier genome and transcriptome annotation which is significantly different from current knowledge. The resultant informatics problems have a profound impact on analysis and interpretation of the data. Here, we address these critical issues and offer a solution. We identified several classes of problems at the individual probe level in the existing annotation, under the assumption that current genome and transcriptome databases are more accurate than those used for GeneChip design. We then reorganized probes on more than a dozen popular GeneChips into gene-, transcript- and exon-specific probe sets in light of up-to-date genome, cDNA/EST clustering and single nucleotide polymorphism information. Comparing analysis results between the original and the redefined probe sets reveals approximately 30-50% discrepancy in the genes previously identified as differentially expressed, regardless of analysis method. Our results demonstrate that the original Affymetrix probe set definitions are inaccurate, and many conclusions derived from past GeneChip analyses may be significantly flawed. It will be beneficial to re-analyze existing GeneChip data with updated probe set definitions.
  • Item
    A Quartet of PIF bHLH Factors Provides a Transcriptionally Centered Signaling Hub That Regulates Seedling Morphogenesis through Differential Expression-Patterning of Shared Target Genes in Arabidopsis
    Zhang, Y ; Mayba, O ; Pfeiffer, A ; Shi, H ; Tepperman, JM ; Speed, TP ; Quail, PH ; Copenhaver, GP (PUBLIC LIBRARY SCIENCE, 2013-01)
    Dark-grown seedlings exhibit skotomorphogenic development. Genetic and molecular evidence indicates that a quartet of Arabidopsis Phytochrome (phy)-Interacting bHLH Factors (PIF1, 3, 4, and 5) are critically necessary to maintaining this developmental state and that light activation of phy induces a switch to photomorphogenic development by inducing rapid degradation of the PIFs. Here, using integrated ChIP-seq and RNA-seq analyses, we have identified genes that are direct targets of PIF3 transcriptional regulation, exerted by sequence-specific binding to G-box (CACGTG) or PBE-box (CACATG) motifs in the target promoters genome-wide. In addition, expression analysis of selected genes in this set, in all triple pif-mutant combinations, provides evidence that the PIF quartet members collaborate to generate an expression pattern that is the product of a mosaic of differential transcriptional responsiveness of individual genes to the different PIFs and of differential regulatory activity of individual PIFs toward the different genes. Together with prior evidence that all four PIFs can bind to G-boxes, the data suggest that this collective activity may be exerted via shared occupancy of binding sites in target promoters.
  • Item
    International network of cancer genome projects
    Hudson, TJ ; Anderson, W ; Aretz, A ; Barker, AD ; Bell, C ; Bernabe, RR ; Bhan, MK ; Calvo, F ; Eerola, I ; Gerhard, DS ; Guttmacher, A ; Guyer, M ; Hemsley, FM ; Jennings, JL ; Kerr, D ; Klatt, P ; Kolar, P ; Kusuda, J ; Lane, DP ; Laplace, F ; Lu, Y ; Nettekoven, G ; Ozenberger, B ; Peterson, J ; Rao, TS ; Remacle, J ; Schafer, AJ ; Shibata, T ; Stratton, MR ; Vockley, JG ; Watanabe, K ; Yang, H ; Yuen, MMF ; Knoppers, M ; Bobrow, M ; Cambon-Thomsen, A ; Dressler, LG ; Dyke, SOM ; Joly, Y ; Kato, K ; Kennedy, KL ; Nicolas, P ; Parker, MJ ; Rial-Sebbag, E ; Romeo-Casabona, CM ; Shaw, KM ; Wallace, S ; Wiesner, GL ; Zeps, N ; Lichter, P ; Biankin, AV ; Chabannon, C ; Chin, L ; Clement, B ; de Alava, E ; Degos, F ; Ferguson, ML ; Geary, P ; Hayes, DN ; Johns, AL ; Nakagawa, H ; Penny, R ; Piris, MA ; Sarin, R ; Scarpa, A ; van de Vijver, M ; Futreal, PA ; Aburatani, H ; Bayes, M ; Bowtell, DDL ; Campbell, PJ ; Estivill, X ; Grimmond, SM ; Gut, I ; Hirst, M ; Lopez-Otin, C ; Majumder, P ; Marra, M ; Ning, Z ; Puente, XS ; Ruan, Y ; Stunnenberg, HG ; Swerdlow, H ; Velculescu, VE ; Wilson, RK ; Xue, HH ; Yang, L ; Spellman, PT ; Bader, GD ; Boutros, PC ; Flicek, P ; Getz, G ; Guigo, R ; Guo, G ; Haussler, D ; Heath, S ; Hubbard, TJ ; Jiang, T ; Jones, SM ; Li, Q ; Lopez-Bigas, N ; Luo, R ; Pearson, JV ; Quesada, V ; Raphael, BJ ; Sander, C ; Speed, TP ; Stuart, JM ; Teague, JW ; Totoki, Y ; Tsunoda, T ; Valencia, A ; Wheeler, DA ; Wu, H ; Zhao, S ; Zhou, G ; Stein, LD ; Lathrop, M ; Ouellette, BFF ; Thomas, G ; Yoshida, T ; Axton, M ; Gunter, C ; McPherson, JD ; Miller, LJ ; Kasprzyk, A ; Zhang, J ; Haider, SA ; Wang, J ; Yung, CK ; Cross, A ; Liang, Y ; Gnaneshan, S ; Guberman, J ; Hsu, J ; Chalmers, DRC ; Hasel, KW ; Kaan, TSH ; Knoppers, BM ; Lowrance, WW ; Masui, T ; Rodriguez, LL ; Vergely, C ; Cloonan, N ; Defazio, A ; Eshleman, JR ; Etemadmoghadam, D ; Gardiner, BA ; Kench, JG ; Sutherland, RL ; Tempero, MA ; Waddell, NJ ; Wilson, PJ ; Gallinger, S ; Tsao, M-S ; Shaw, PA ; Petersen, GM ; Mukhopadhyay, D ; DePinho, RA ; Thayer, S ; Muthuswamy, L ; Shazand, K ; Beck, T ; Sam, M ; Timms, L ; Ballin, V ; Ji, J ; Zhang, X ; Chen, F ; Hu, X ; Yang, Q ; Tian, G ; Zhang, L ; Xing, X ; Li, X ; Zhu, Z ; Yu, Y ; Yu, J ; Tost, J ; Brennan, P ; Holcatova, I ; Zaridze, D ; Brazma, A ; Egevad, L ; Prokhortchouk, E ; Banks, RE ; Uhlen, M ; Viksna, J ; Ponten, F ; Skryabin, K ; Birney, E ; Borg, A ; Borresen-Dale, A-L ; Caldas, C ; Foekens, JA ; Martin, S ; Reis-Filho, JS ; Richardson, AL ; Sotiriou, C ; van't Veer, L ; Birnbaum, D ; Blanche, H ; Boucher, P ; Boyault, S ; Masson-Jacquemier, JD ; Pauporte, I ; Pivot, X ; Vincent-Salomon, A ; Tabone, E ; Theillet, C ; Treilleux, I ; Bioulac-Sage, P ; Decaens, T ; Franco, D ; Gut, M ; Samuel, D ; Zucman-Rossi, J ; Eils, R ; Brors, B ; Korbel, JO ; Korshunov, A ; Landgraf, P ; Lehrach, H ; Pfister, S ; Radlwimmer, B ; Reifenberger, G ; Taylor, MD ; von Kalle, C ; Majumder, PP ; Pederzoli, P ; Lawlor, RT ; Delledonne, M ; Bardelli, A ; Gress, T ; Klimstra, D ; Zamboni, G ; Nakamura, Y ; Miyano, S ; Fujimoto, A ; Campo, E ; de Sanjose, S ; Montserrat, E ; Gonzalez-Diaz, M ; Jares, P ; Himmelbaue, H ; Bea, S ; Aparicio, S ; Easton, DF ; Collins, FS ; Compton, CC ; Lander, ES ; Burke, W ; Green, AR ; Hamilton, SR ; Kallioniemi, OP ; Ley, TJ ; Liu, ET ; Wainwright, BJ (NATURE PORTFOLIO, 2010-04-15)
    The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of more than 25,000 cancer genomes at the genomic, epigenomic and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies.
  • Item
    Investigating and Correcting Plasma DNA Sequencing Coverage Bias to Enhance Aneuploidy Discovery
    Chandrananda, D ; Thorne, NP ; Ganesamoorthy, D ; Bruno, DL ; Benjamini, Y ; Speed, TP ; Slater, HR ; Bahlo, M ; Zhou, F (PUBLIC LIBRARY SCIENCE, 2014-01-29)
    Pregnant women carry a mixture of cell-free DNA fragments from self and fetus (non-self) in their circulation. In recent years multiple independent studies have demonstrated the ability to detect fetal trisomies such as trisomy 21, the cause of Down syndrome, by Next-Generation Sequencing of maternal plasma. The current clinical tests based on this approach show very high sensitivity and specificity, although as yet they have not become the standard diagnostic test. Here we describe improvements to the analysis of the sequencing data by reducing GC bias and better handling of the genomic repeats. We show substantial improvements in the sensitivity of the standard trisomy 21 statistical tests, which we measure by artificially reducing read coverage. We also explore the bias stemming from the natural cleavage of plasma DNA by examining DNA motifs and position specific base distributions. We propose a model to correct this fragmentation bias and observe that incorporating this bias does not lead to any further improvements in the detection of fetal trisomy. The improved bias corrections that we demonstrate in this work can be readily adopted into existing fetal trisomy detection protocols and should also lead to improvements in sub-chromosomal copy number variation detection.
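As a rough illustration of what reducing GC bias in binned read coverage can look like, the sketch below rescales each genomic bin by the median count of bins with similar GC content. This is a generic median-scaling approach chosen for illustration, not the correction developed in the paper; the function name `gc_correct`, the bin-level inputs and the number of strata are all assumptions.

```python
from collections import defaultdict
from statistics import median
from typing import List, Sequence


def gc_correct(
    counts: Sequence[float],
    gc_fractions: Sequence[float],
    n_strata: int = 10,
) -> List[float]:
    """Rescale per-bin read counts so every GC stratum shares the global median."""

    def stratum(gc: float) -> int:
        # Map a GC fraction in [0, 1] to one of n_strata equal-width strata.
        return min(int(gc * n_strata), n_strata - 1)

    by_stratum = defaultdict(list)
    for count, gc in zip(counts, gc_fractions):
        by_stratum[stratum(gc)].append(count)

    global_median = median(counts)
    stratum_median = {idx: median(vals) for idx, vals in by_stratum.items()}

    corrected = []
    for count, gc in zip(counts, gc_fractions):
        m = stratum_median[stratum(gc)]
        # Bins in a stratum with zero median coverage are set to zero.
        corrected.append(count * global_median / m if m > 0 else 0.0)
    return corrected
```

For example, if low-GC bins have 100 reads each and high-GC bins have 200 reads each, every bin is rescaled to the overall median, so coverage differences that remain after correction are less likely to reflect GC content alone.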
  • Item
    Lineage-specific expansion of proteins exported to erythrocytes in malaria parasites
    Sargeant, TJ ; Marti, M ; Caler, E ; Carlton, JM ; Simpson, K ; Speed, TP ; Cowman, AF (BMC, 2006)
    BACKGROUND: The apicomplexan parasite Plasmodium falciparum causes the most severe form of malaria in humans. After invasion into erythrocytes, asexual parasite stages drastically alter their host cell and export remodeling and virulence proteins. Previously, we have reported identification and functional analysis of a short motif necessary for export of proteins out of the parasite and into the red blood cell. RESULTS: We have developed software for the prediction of exported proteins in the genus Plasmodium, and identified exported proteins conserved between malaria parasites infecting rodents and the two major causes of human malaria, P. falciparum and P. vivax. This conserved 'exportome' is confined to a few subtelomeric chromosomal regions in P. falciparum and the synteny of these and surrounding regions is conserved in P. vivax. We have identified a novel gene family PHIST (for Plasmodium helical interspersed subtelomeric family) that shares a unique domain with 72 paralogs in P. falciparum and 39 in P. vivax; however, there is only one member in each of the three species studied from the P. berghei lineage. CONCLUSION: These data suggest radiation of genes encoding remodeling and virulence factors from a small number of loci in a common Plasmodium ancestor, and imply a closer phylogenetic relationship between the P. vivax and P. falciparum lineages than previously believed. The presence of a conserved 'exportome' in the genus Plasmodium has important implications for our understanding of both common mechanisms and species-specific differences in host-parasite interactions, and may be crucial in developing novel antimalarial drugs against this infectious disease.