School of Mathematics and Statistics - Research Publications

  • Item
    Evaluating stably expressed genes in single cells
    Lin, Y ; Ghazanfar, S ; Strbenac, D ; Wang, A ; Patrick, E ; Lin, DM ; Speed, T ; Yang, JYH ; Yang, P (OXFORD UNIV PRESS, 2019-09)
    BACKGROUND: Single-cell RNA-seq (scRNA-seq) profiling has revealed remarkable variation in transcription, suggesting that expression of many genes at the single-cell level is intrinsically stochastic and noisy. Yet, on the cell population level, a subset of genes traditionally referred to as housekeeping genes (HKGs) are found to be stably expressed in different cell and tissue types. It is therefore critical to question whether stably expressed genes (SEGs) can be identified on the single-cell level, and if so, how their expression stability can be assessed. We have previously proposed a computational framework for ranking expression stability of genes in single cells for scRNA-seq data normalization and integration. In this study, we perform detailed evaluation and characterization of SEGs derived from this framework. RESULTS: Here, we show that gene expression stability indices derived from the early human and mouse development scRNA-seq datasets and the "Mouse Atlas" dataset are reproducible and conserved across species. We demonstrate that SEGs identified from single cells based on their stability indices are considerably more stable than HKGs defined previously from cell populations across diverse biological systems. Our analyses indicate that SEGs are inherently more stable at the single-cell level and that their characteristics are reminiscent of HKGs, suggesting their potential role in sustaining essential functions in individual cells. CONCLUSIONS: SEGs identified in this study have immediate utility both for understanding variation and stability of single-cell transcriptomes and for practical applications such as scRNA-seq data normalization. Our framework for calculating gene stability index, "scSEGIndex," is incorporated into the scMerge Bioconductor R package (https://sydneybiox.github.io/scMerge/reference/scSEGIndex.html) and can be used for identifying genes with stable expression in scRNA-seq datasets.
  • Item
    Accurate RNA Sequencing From Formalin-Fixed Cancer Tissue to Represent High-Quality Transcriptome From Frozen Tissue
    Li, J ; Fu, C ; Speed, TP ; Wang, W ; Symmans, WF (AMER SOC CLINICAL ONCOLOGY, 2018-01-26)
    PURPOSE: Accurate transcriptional sequencing (RNA-seq) from formalin-fixation and paraffin-embedding (FFPE) tumor samples presents an important challenge for translational research and diagnostic development. In addition, there are now several different protocols to prepare a sequencing library from total RNA. We evaluated the accuracy of RNA-seq data generated from FFPE samples in terms of expression profiling. METHODS: We designed a biospecimen study to directly compare gene expression results from different protocols to prepare libraries for RNA-seq from human breast cancer tissues, with randomization to fresh-frozen (FF) or FFPE conditions. The protocols were compared using multiple computational methods to assess alignment of reads to reference genome, and the uniformity and continuity of coverage; as well as the variance and correlation, of overall gene expression and patterns of measuring coding sequence, phenotypic patterns of gene expression, and measurements from representative multigene signatures. RESULTS: The principal determinant of variance in gene expression was use of exon capture probes, followed by the conditions of preservation (FF versus FFPE), and phenotypic differences between breast cancers. One protocol, with RNase H-based rRNA depletion, exhibited least variability of gene expression measurements, strongest correlation between FF and FFPE samples, and was generally representative of the transcriptome from standard FF RNA-seq protocols. CONCLUSION: Method of RNA-seq library preparation from FFPE samples had marked effect on the accuracy of gene expression measurement compared to matched FF samples. Nevertheless, some protocols produced highly concordant expression data from FFPE RNA-seq data, compared to RNA-seq results from matched frozen samples.
  • Item
    Evaluation of cross-platform and interlaboratory concordance via consensus modelling of genomic measurements
    Peters, TJ ; French, HJ ; Bradford, ST ; Pidsley, R ; Stirzaker, C ; Varinli, H ; Nair, S ; Qu, W ; Song, J ; Giles, KA ; Statham, AL ; Speirs, H ; Speed, TP ; Clark, SJ ; Hancock, J (OXFORD UNIV PRESS, 2019-02-15)
    MOTIVATION: A synoptic view of the human genome benefits chiefly from the application of nucleic acid sequencing and microarray technologies. These platforms allow interrogation of patterns such as gene expression and DNA methylation at the vast majority of canonical loci, allowing granular insights and opportunities for validation of original findings. However, problems arise when validating against a "gold standard" measurement, since this immediately biases all subsequent measurements towards that particular technology or protocol. Since all genomic measurements are estimates, in the absence of a "gold standard" we instead empirically assess the measurement precision and sensitivity of a large suite of genomic technologies via a consensus modelling method called the row-linear model. This method is an application of the American Society for Testing and Materials Standard E691 for assessing interlaboratory precision and sources of variability across multiple testing sites. Both cross-platform and cross-locus comparisons can be made across all common loci, allowing identification of technology- and locus-specific tendencies. RESULTS: We assess technologies including the Infinium MethylationEPIC BeadChip, whole genome bisulfite sequencing (WGBS), two different RNA-Seq protocols (PolyA+ and Ribo-Zero) and five different gene expression array platforms. Each technology thus is characterised herein, relative to the consensus. We showcase a number of applications of the row-linear model, including correlation with known interfering traits. We demonstrate a clear effect of cross-hybridisation on the sensitivity of Infinium methylation arrays. Additionally, we perform a true interlaboratory test on a set of samples interrogated on the same platform across twenty-one separate testing laboratories. 
    AVAILABILITY AND IMPLEMENTATION: A full implementation of the row-linear model, plus extra functions for visualisation, is available in the R package consensus at https://github.com/timpeters82/consensus. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
  • Item
    The healthy ageing gene expression signature for Alzheimer's disease diagnosis: a random sampling perspective
    Jacob, L ; Speed, TP (BMC, 2018-07-25)
    In a recent publication, Sood et al. (Genome Biol 16:185, 2015) presented a set of 150 probe sets that could be used in the diagnosis of Alzheimer's disease (AD) based on gene expression. We reproduce some of their experiments and show that their signature is indeed able to discriminate between AD and control patients using blood gene expression in two cohorts. We also show that its performance does not stand out compared to randomly sampled sets of 150 probe sets from the same array.
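The random-sampling comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `random_set_baseline`, the scoring callback `score_fn`, and the toy data are all hypothetical, and the scoring function stands in for whatever performance measure (for example, classification accuracy) is computed for a set of probe sets.

```python
import random
from statistics import mean


def random_set_baseline(score_fn, all_probes, signature, n_random=1000, seed=0):
    """Score a signature and compare it with random probe sets of equal size.

    score_fn is a hypothetical callback that maps a list of probe IDs to a
    performance score (higher is better).
    """
    rng = random.Random(seed)
    observed = score_fn(signature)
    k = len(signature)
    # Draw n_random sets of k probes, without replacement within each set.
    random_scores = [score_fn(rng.sample(all_probes, k)) for _ in range(n_random)]
    # Empirical p-value with the usual +1 correction: the share of random
    # sets that score at least as well as the signature.
    exceed = sum(s >= observed for s in random_scores)
    p_value = (exceed + 1) / (n_random + 1)
    return observed, random_scores, p_value


if __name__ == "__main__":
    # Toy data (an assumption for illustration): 1000 probes with made-up weights.
    probes = list(range(1000))
    weights = {p: p / 1000 for p in probes}
    toy_score = lambda probe_set: mean(weights[p] for p in probe_set)
    obs, scores, p = random_set_baseline(toy_score, probes, probes[-150:], n_random=200)
    print(f"observed={obs:.3f}, random mean={mean(scores):.3f}, p={p:.4f}")
```

A signature that "does not stand out" in the sense of the abstract would yield a large empirical p-value here, because many random sets of the same size would score comparably.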
  • Item
    G protein-linked signaling pathways in bipolar and major depressive disorders.
    Tomita, H ; Ziegler, ME ; Kim, HB ; Evans, SJ ; Choudary, PV ; Li, JZ ; Meng, F ; Dai, M ; Myers, RM ; Neal, CR ; Speed, TP ; Barchas, JD ; Schatzberg, AF ; Watson, SJ ; Akil, H ; Jones, EG ; Bunney, WE ; Vawter, MP (Frontiers Media SA, 2013)
    The G-protein linked signaling system (GPLS) comprises a large number of G-proteins, G protein-coupled receptors (GPCRs), GPCR ligands, and downstream effector molecules. G-proteins interact with both GPCRs and downstream effectors such as cyclic adenosine monophosphate (cAMP), phosphatidylinositols, and ion channels. The GPLS is implicated in the pathophysiology and pharmacology of both major depressive disorder (MDD) and bipolar disorder (BPD). This study evaluated whether GPLS is altered at the transcript level. The gene expression in the dorsolateral prefrontal (DLPFC) and anterior cingulate (ACC) were compared from MDD, BPD, and control subjects using Affymetrix Gene Chips and real time quantitative PCR. High quality brain tissue was used in the study to control for confounding effects of agonal events, tissue pH, RNA integrity, gender, and age. GPLS signaling transcripts were altered especially in the ACC of BPD and MDD subjects. Transcript levels of molecules which repress cAMP activity were increased in BPD and decreased in MDD. Two orphan GPCRs, GPRC5B and GPR37, showed significantly decreased expression levels in MDD, and significantly increased expression levels in BPD. Our results suggest opposite changes in BPD and MDD in the GPLS, "activated" cAMP signaling activity in BPD and "blunted" cAMP signaling activity in MDD. GPRC5B and GPR37 both appear to have behavioral effects, and are also candidate genes for neurodegenerative disorders. In the context of the opposite changes observed in BPD and MDD, these GPCRs warrant further study of their brain effects.
  • Item
    A Quartet of PIF bHLH Factors Provides a Transcriptionally Centered Signaling Hub That Regulates Seedling Morphogenesis through Differential Expression-Patterning of Shared Target Genes in Arabidopsis
    Zhang, Y ; Mayba, O ; Pfeiffer, A ; Shi, H ; Tepperman, JM ; Speed, TP ; Quail, PH ; Copenhaver, GP (PUBLIC LIBRARY SCIENCE, 2013-01)
    Dark-grown seedlings exhibit skotomorphogenic development. Genetic and molecular evidence indicates that a quartet of Arabidopsis Phytochrome (phy)-Interacting bHLH Factors (PIF1, 3, 4, and 5) are critically necessary to maintaining this developmental state and that light activation of phy induces a switch to photomorphogenic development by inducing rapid degradation of the PIFs. Here, using integrated ChIP-seq and RNA-seq analyses, we have identified genes that are direct targets of PIF3 transcriptional regulation, exerted by sequence-specific binding to G-box (CACGTG) or PBE-box (CACATG) motifs in the target promoters genome-wide. In addition, expression analysis of selected genes in this set, in all triple pif-mutant combinations, provides evidence that the PIF quartet members collaborate to generate an expression pattern that is the product of a mosaic of differential transcriptional responsiveness of individual genes to the different PIFs and of differential regulatory activity of individual PIFs toward the different genes. Together with prior evidence that all four PIFs can bind to G-boxes, the data suggest that this collective activity may be exerted via shared occupancy of binding sites in target promoters.
  • Item
    International network of cancer genome projects
    Hudson, TJ ; Anderson, W ; Aretz, A ; Barker, AD ; Bell, C ; Bernabe, RR ; Bhan, MK ; Calvo, F ; Eerola, I ; Gerhard, DS ; Guttmacher, A ; Guyer, M ; Hemsley, FM ; Jennings, JL ; Kerr, D ; Klatt, P ; Kolar, P ; Kusuda, J ; Lane, DP ; Laplace, F ; Lu, Y ; Nettekoven, G ; Ozenberger, B ; Peterson, J ; Rao, TS ; Remacle, J ; Schafer, AJ ; Shibata, T ; Stratton, MR ; Vockley, JG ; Watanabe, K ; Yang, H ; Yuen, MMF ; Knoppers, M ; Bobrow, M ; Cambon-Thomsen, A ; Dressler, LG ; Dyke, SOM ; Joly, Y ; Kato, K ; Kennedy, KL ; Nicolas, P ; Parker, MJ ; Rial-Sebbag, E ; Romeo-Casabona, CM ; Shaw, KM ; Wallace, S ; Wiesner, GL ; Zeps, N ; Lichter, P ; Biankin, AV ; Chabannon, C ; Chin, L ; Clement, B ; de Alava, E ; Degos, F ; Ferguson, ML ; Geary, P ; Hayes, DN ; Johns, AL ; Nakagawa, H ; Penny, R ; Piris, MA ; Sarin, R ; Scarpa, A ; van de Vijver, M ; Futreal, PA ; Aburatani, H ; Bayes, M ; Bowtell, DDL ; Campbell, PJ ; Estivill, X ; Grimmond, SM ; Gut, I ; Hirst, M ; Lopez-Otin, C ; Majumder, P ; Marra, M ; Ning, Z ; Puente, XS ; Ruan, Y ; Stunnenberg, HG ; Swerdlow, H ; Velculescu, VE ; Wilson, RK ; Xue, HH ; Yang, L ; Spellman, PT ; Bader, GD ; Boutros, PC ; Flicek, P ; Getz, G ; Guigo, R ; Guo, G ; Haussler, D ; Heath, S ; Hubbard, TJ ; Jiang, T ; Jones, SM ; Li, Q ; Lopez-Bigas, N ; Luo, R ; Pearson, JV ; Quesada, V ; Raphael, BJ ; Sander, C ; Speed, TP ; Stuart, JM ; Teague, JW ; Totoki, Y ; Tsunoda, T ; Valencia, A ; Wheeler, DA ; Wu, H ; Zhao, S ; Zhou, G ; Stein, LD ; Lathrop, M ; Ouellette, BFF ; Thomas, G ; Yoshida, T ; Axton, M ; Gunter, C ; McPherson, JD ; Miller, LJ ; Kasprzyk, A ; Zhang, J ; Haider, SA ; Wang, J ; Yung, CK ; Cross, A ; Liang, Y ; Gnaneshan, S ; Guberman, J ; Hsu, J ; Chalmers, DRC ; Hasel, KW ; Kaan, TSH ; Knoppers, BM ; Lowrance, WW ; Masui, T ; Rodriguez, LL ; Vergely, C ; Cloonan, N ; Defazio, A ; Eshleman, JR ; Etemadmoghadam, D ; Gardiner, BA ; Kench, JG ; Sutherland, RL ; Tempero, MA ; Waddell, NJ ; Wilson, PJ ; Gallinger, S ; Tsao, M-S ; Shaw, PA ; Petersen, GM ; Mukhopadhyay, D ; DePinho, RA ; Thayer, S ; Muthuswamy, L ; Shazand, K ; Beck, T ; Sam, M ; Timms, L ; Ballin, V ; Ji, J ; Zhang, X ; Chen, F ; Hu, X ; Yang, Q ; Tian, G ; Zhang, L ; Xing, X ; Li, X ; Zhu, Z ; Yu, Y ; Yu, J ; Tost, J ; Brennan, P ; Holcatova, I ; Zaridze, D ; Brazma, A ; Egevad, L ; Prokhortchouk, E ; Banks, RE ; Uhlen, M ; Viksna, J ; Ponten, F ; Skryabin, K ; Birney, E ; Borg, A ; Borresen-Dale, A-L ; Caldas, C ; Foekens, JA ; Martin, S ; Reis-Filho, JS ; Richardson, AL ; Sotiriou, C ; van't Veer, L ; Birnbaum, D ; Blanche, H ; Boucher, P ; Boyault, S ; Masson-Jacquemier, JD ; Pauporte, I ; Pivot, X ; Vincent-Salomon, A ; Tabone, E ; Theillet, C ; Treilleux, I ; Bioulac-Sage, P ; Decaens, T ; Franco, D ; Gut, M ; Samuel, D ; Zucman-Rossi, J ; Eils, R ; Brors, B ; Korbel, JO ; Korshunov, A ; Landgraf, P ; Lehrach, H ; Pfister, S ; Radlwimmer, B ; Reifenberger, G ; Taylor, MD ; von Kalle, C ; Majumder, PP ; Pederzoli, P ; Lawlor, RT ; Delledonne, M ; Bardelli, A ; Gress, T ; Klimstra, D ; Zamboni, G ; Nakamura, Y ; Miyano, S ; Fujimoto, A ; Campo, E ; de Sanjose, S ; Montserrat, E ; Gonzalez-Diaz, M ; Jares, P ; Himmelbauer, H ; Bea, S ; Aparicio, S ; Easton, DF ; Collins, FS ; Compton, CC ; Lander, ES ; Burke, W ; Green, AR ; Hamilton, SR ; Kallioniemi, OP ; Ley, TJ ; Liu, ET ; Wainwright, BJ (NATURE PORTFOLIO, 2010-04-15)
    The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of more than 25,000 cancer genomes at the genomic, epigenomic and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies.
  • Item
    Investigating and Correcting Plasma DNA Sequencing Coverage Bias to Enhance Aneuploidy Discovery
    Chandrananda, D ; Thorne, NP ; Ganesamoorthy, D ; Bruno, DL ; Benjamini, Y ; Speed, TP ; Slater, HR ; Bahlo, M ; Zhou, F (PUBLIC LIBRARY SCIENCE, 2014-01-29)
    Pregnant women carry a mixture of cell-free DNA fragments from self and fetus (non-self) in their circulation. In recent years multiple independent studies have demonstrated the ability to detect fetal trisomies such as trisomy 21, the cause of Down syndrome, by Next-Generation Sequencing of maternal plasma. The current clinical tests based on this approach show very high sensitivity and specificity, although as yet they have not become the standard diagnostic test. Here we describe improvements to the analysis of the sequencing data by reducing GC bias and better handling of the genomic repeats. We show substantial improvements in the sensitivity of the standard trisomy 21 statistical tests, which we measure by artificially reducing read coverage. We also explore the bias stemming from the natural cleavage of plasma DNA by examining DNA motifs and position specific base distributions. We propose a model to correct this fragmentation bias and observe that incorporating this bias does not lead to any further improvements in the detection of fetal trisomy. The improved bias corrections that we demonstrate in this work can be readily adopted into existing fetal trisomy detection protocols and should also lead to improvements in sub-chromosomal copy number variation detection.
  • Item
    Copy Number Variation in Patients with Disorders of Sex Development Due to 46,XY Gonadal Dysgenesis
    White, S ; Ohnesorg, T ; Notini, A ; Roeszler, K ; Hewitt, J ; Daggag, H ; Smith, C ; Turbitt, E ; Gustin, S ; van den Bergen, J ; Miles, D ; Western, P ; Arboleda, V ; Schumacher, V ; Gordon, L ; Bell, K ; Bengtsson, H ; Speed, T ; Hutson, J ; Warne, G ; Harley, V ; Koopman, P ; Vilain, E ; Sinclair, A ; Orban, L (PUBLIC LIBRARY SCIENCE, 2011-03-07)
    Disorders of sex development (DSD), ranging in severity from mild genital abnormalities to complete sex reversal, represent a major concern for patients and their families. DSD are often due to disruption of the genetic programs that regulate gonad development. Although some genes have been identified in these developmental pathways, the causative mutations have not been identified in more than 50% of 46,XY DSD cases. We used the Affymetrix Genome-Wide Human SNP Array 6.0 to analyse copy number variation in 23 individuals with unexplained 46,XY DSD due to gonadal dysgenesis (GD). Here we describe three discrete changes in copy number that are the likely cause of the GD. Firstly, we identified a large duplication on the X chromosome that included DAX1 (NR0B1). Secondly, we identified a rearrangement that appears to affect a novel gonad-specific regulatory region in a known testis gene, SOX9. Surprisingly, this patient lacked any signs of campomelic dysplasia, suggesting that the deletion affected expression of SOX9 only in the gonad. Functional analysis of potential SRY binding sites within this deleted region identified five putative enhancers, suggesting that sequences additional to the known SRY-binding TES enhancer influence human testis-specific SOX9 expression. Thirdly, we identified a small deletion immediately downstream of GATA4, supporting a role for GATA4 in gonad development in humans. These CNV analyses give new insights into the pathways involved in human gonad development and dysfunction, and suggest that rearrangements of non-coding sequences disturbing gene regulation may account for a significant proportion of DSD cases.
  • Item
    A High Force of Plasmodium vivax Blood-Stage Infection Drives the Rapid Acquisition of Immunity in Papua New Guinean Children
    Koepfli, C ; Colborn, KL ; Kiniboro, B ; Lin, E ; Speed, TP ; Siba, PM ; Felger, I ; Mueller, I ; McCarthy, JS (PUBLIC LIBRARY SCIENCE, 2013-09)
    BACKGROUND: When both parasite species are co-endemic, Plasmodium vivax incidence peaks in younger children compared to P. falciparum. To identify differences in the number of blood stage infections of these species and its potential link to acquisition of immunity, we have estimated the molecular force of blood-stage infection of P. vivax ((mol)FOB, i.e. the number of genetically distinct blood-stage infections over time), and compared it to previously reported values for P. falciparum. METHODS: P. vivax (mol)FOB was estimated by high-resolution genotyping of parasites in samples collected over 16 months in a cohort of 264 Papua New Guinean children living in an area highly endemic for P. falciparum and P. vivax. In this cohort, P. vivax episodes decreased three-fold over the age range of 1-4.5 years. RESULTS: On average, children acquired 14.0 new P. vivax blood-stage clones/child/year-at-risk. While the incidence of clinical P. vivax illness was strongly associated with (mol)FOB (incidence rate ratio (IRR) = 1.99, 95% confidence interval (CI95) [1.80, 2.19]), (mol)FOB did not change with age. The incidence of P. vivax showed a faster decrease with age in children with high exposure (IRR = 0.49, CI95 [0.38, 0.64], p < 0.001) compared to those with low exposure (IRR = 0.63, CI95 [0.43, 0.93], p = 0.02). CONCLUSION: P. vivax (mol)FOB is considerably higher than P. falciparum (mol)FOB (5.5 clones/child/year-at-risk). The high number of P. vivax clones that infect children in early childhood contributes to the rapid acquisition of immunity against clinical P. vivax malaria.