School of Mathematics and Statistics - Research Publications

  • Item
    Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data
    Dai, MH ; Wang, PL ; Boyd, AD ; Kostov, G ; Athey, B ; Jones, EG ; Bunney, WE ; Myers, RM ; Speed, TP ; Akil, H ; Watson, SJ ; Meng, F (OXFORD UNIV PRESS, 2005)
    Genome-wide expression profiling is a powerful tool for implicating novel gene ensembles in cellular mechanisms of health and disease. The most popular platform for genome-wide expression profiling is the Affymetrix GeneChip. However, its selection of probes relied on earlier genome and transcriptome annotation which is significantly different from current knowledge. The resultant informatics problems have a profound impact on analysis and interpretation of the data. Here, we address these critical issues and offer a solution. We identified several classes of problems at the individual probe level in the existing annotation, under the assumption that current genome and transcriptome databases are more accurate than those used for GeneChip design. We then reorganized probes on more than a dozen popular GeneChips into gene-, transcript- and exon-specific probe sets in light of up-to-date genome, cDNA/EST clustering and single nucleotide polymorphism information. Comparing analysis results between the original and the redefined probe sets reveals approximately 30-50% discrepancy in the genes previously identified as differentially expressed, regardless of analysis method. Our results demonstrate that the original Affymetrix probe set definitions are inaccurate, and many conclusions derived from past GeneChip analyses may be significantly flawed. It will be beneficial to re-analyze existing GeneChip data with updated probe set definitions.
  • Item
    International network of cancer genome projects
    Hudson, TJ ; Anderson, W ; Aretz, A ; Barker, AD ; Bell, C ; Bernabe, RR ; Bhan, MK ; Calvo, F ; Eerola, I ; Gerhard, DS ; Guttmacher, A ; Guyer, M ; Hemsley, FM ; Jennings, JL ; Kerr, D ; Klatt, P ; Kolar, P ; Kusuda, J ; Lane, DP ; Laplace, F ; Lu, Y ; Nettekoven, G ; Ozenberger, B ; Peterson, J ; Rao, TS ; Remacle, J ; Schafer, AJ ; Shibata, T ; Stratton, MR ; Vockley, JG ; Watanabe, K ; Yang, H ; Yuen, MMF ; Knoppers, M ; Bobrow, M ; Cambon-Thomsen, A ; Dressler, LG ; Dyke, SOM ; Joly, Y ; Kato, K ; Kennedy, KL ; Nicolas, P ; Parker, MJ ; Rial-Sebbag, E ; Romeo-Casabona, CM ; Shaw, KM ; Wallace, S ; Wiesner, GL ; Zeps, N ; Lichter, P ; Biankin, AV ; Chabannon, C ; Chin, L ; Clement, B ; de Alava, E ; Degos, F ; Ferguson, ML ; Geary, P ; Hayes, DN ; Johns, AL ; Nakagawa, H ; Penny, R ; Piris, MA ; Sarin, R ; Scarpa, A ; van de Vijver, M ; Futreal, PA ; Aburatani, H ; Bayes, M ; Bowtell, DDL ; Campbell, PJ ; Estivill, X ; Grimmond, SM ; Gut, I ; Hirst, M ; Lopez-Otin, C ; Majumder, P ; Marra, M ; Ning, Z ; Puente, XS ; Ruan, Y ; Stunnenberg, HG ; Swerdlow, H ; Velculescu, VE ; Wilson, RK ; Xue, HH ; Yang, L ; Spellman, PT ; Bader, GD ; Boutros, PC ; Flicek, P ; Getz, G ; Guigo, R ; Guo, G ; Haussler, D ; Heath, S ; Hubbard, TJ ; Jiang, T ; Jones, SM ; Li, Q ; Lopez-Bigas, N ; Luo, R ; Pearson, JV ; Quesada, V ; Raphael, BJ ; Sander, C ; Speed, TP ; Stuart, JM ; Teague, JW ; Totoki, Y ; Tsunoda, T ; Valencia, A ; Wheeler, DA ; Wu, H ; Zhao, S ; Zhou, G ; Stein, LD ; Lathrop, M ; Ouellette, BFF ; Thomas, G ; Yoshida, T ; Axton, M ; Gunter, C ; McPherson, JD ; Miller, LJ ; Kasprzyk, A ; Zhang, J ; Haider, SA ; Wang, J ; Yung, CK ; Cross, A ; Liang, Y ; Gnaneshan, S ; Guberman, J ; Hsu, J ; Chalmers, DRC ; Hasel, KW ; Kaan, TSH ; Knoppers, BM ; Lowrance, WW ; Masui, T ; Rodriguez, LL ; Vergely, C ; Cloonan, N ; Defazio, A ; Eshleman, JR ; Etemadmoghadam, D ; Gardiner, BA ; Kench, JG ; Sutherland, RL ; Tempero, MA ; Waddell, NJ ; Wilson, PJ ; Gallinger, S ; Tsao, M-S ; Shaw, PA ; Petersen, GM ; Mukhopadhyay, D ; DePinho, RA ; Thayer, S ; Muthuswamy, L ; Shazand, K ; Beck, T ; Sam, M ; Timms, L ; Ballin, V ; Ji, J ; Zhang, X ; Chen, F ; Hu, X ; Yang, Q ; Tian, G ; Zhang, L ; Xing, X ; Li, X ; Zhu, Z ; Yu, Y ; Yu, J ; Tost, J ; Brennan, P ; Holcatova, I ; Zaridze, D ; Brazma, A ; Egevad, L ; Prokhortchouk, E ; Banks, RE ; Uhlen, M ; Viksna, J ; Ponten, F ; Skryabin, K ; Birney, E ; Borg, A ; Borresen-Dale, A-L ; Caldas, C ; Foekens, JA ; Martin, S ; Reis-Filho, JS ; Richardson, AL ; Sotiriou, C ; van't Veer, L ; Birnbaum, D ; Blanche, H ; Boucher, P ; Boyault, S ; Masson-Jacquemier, JD ; Pauporte, I ; Pivot, X ; Vincent-Salomon, A ; Tabone, E ; Theillet, C ; Treilleux, I ; Bioulac-Sage, P ; Decaens, T ; Franco, D ; Gut, M ; Samuel, D ; Zucman-Rossi, J ; Eils, R ; Brors, B ; Korbel, JO ; Korshunov, A ; Landgraf, P ; Lehrach, H ; Pfister, S ; Radlwimmer, B ; Reifenberger, G ; Taylor, MD ; von Kalle, C ; Majumder, PP ; Pederzoli, P ; Lawlor, RT ; Delledonne, M ; Bardelli, A ; Gress, T ; Klimstra, D ; Zamboni, G ; Nakamura, Y ; Miyano, S ; Fujimoto, A ; Campo, E ; de Sanjose, S ; Montserrat, E ; Gonzalez-Diaz, M ; Jares, P ; Himmelbaue, H ; Bea, S ; Aparicio, S ; Easton, DF ; Collins, FS ; Compton, CC ; Lander, ES ; Burke, W ; Green, AR ; Hamilton, SR ; Kallioniemi, OP ; Ley, TJ ; Liu, ET ; Wainwright, BJ (NATURE PORTFOLIO, 2010-04-15)
    The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of more than 25,000 cancer genomes at the genomic, epigenomic and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies.
  • Item
    Lineage-specific expansion of proteins exported to erythrocytes in malaria parasites
    Sargeant, TJ ; Marti, M ; Caler, E ; Carlton, JM ; Simpson, K ; Speed, TP ; Cowman, AF (BMC, 2006)
    BACKGROUND: The apicomplexan parasite Plasmodium falciparum causes the most severe form of malaria in humans. After invasion into erythrocytes, asexual parasite stages drastically alter their host cell and export remodeling and virulence proteins. Previously, we have reported identification and functional analysis of a short motif necessary for export of proteins out of the parasite and into the red blood cell. RESULTS: We have developed software for the prediction of exported proteins in the genus Plasmodium, and identified exported proteins conserved between malaria parasites infecting rodents and the two major causes of human malaria, P. falciparum and P. vivax. This conserved 'exportome' is confined to a few subtelomeric chromosomal regions in P. falciparum and the synteny of these and surrounding regions is conserved in P. vivax. We have identified a novel gene family PHIST (for Plasmodium helical interspersed subtelomeric family) that shares a unique domain with 72 paralogs in P. falciparum and 39 in P. vivax; however, there is only one member in each of the three species studied from the P. berghei lineage. CONCLUSION: These data suggest radiation of genes encoding remodeling and virulence factors from a small number of loci in a common Plasmodium ancestor, and imply a closer phylogenetic relationship between the P. vivax and P. falciparum lineages than previously believed. The presence of a conserved 'exportome' in the genus Plasmodium has important implications for our understanding of both common mechanisms and species-specific differences in host-parasite interactions, and may be crucial in developing novel antimalarial drugs to this infectious disease.
  • Item
    Integrative analysis of RUNX1 downstream pathways and target genes
    Michaud, J ; Simpson, KM ; Escher, R ; Buchet-Poyau, K ; Beissbarth, T ; Carmichael, C ; Ritchie, ME ; Schuetz, F ; Cannon, P ; Liu, M ; Shen, X ; Ito, Y ; Raskind, WH ; Horwitz, MS ; Osato, M ; Turner, DR ; Speed, TP ; Kavallaris, M ; Smyth, GK ; Scott, HS (BMC, 2008-07-31)
    BACKGROUND: The RUNX1 transcription factor gene is frequently mutated in sporadic myeloid and lymphoid leukemia through translocation, point mutation or amplification. It is also responsible for a familial platelet disorder with predisposition to acute myeloid leukemia (FPD-AML). The disruption of the largely unknown biological pathways controlled by RUNX1 is likely to be responsible for the development of leukemia. We have used multiple microarray platforms and bioinformatic techniques to help identify these biological pathways to aid in the understanding of why RUNX1 mutations lead to leukemia. RESULTS: Here we report genes regulated either directly or indirectly by RUNX1 based on the study of gene expression profiles generated from 3 different human and mouse platforms. The platforms used were global gene expression profiling of: 1) cell lines with RUNX1 mutations from FPD-AML patients, 2) over-expression of RUNX1 and CBFbeta, and 3) Runx1 knockout mouse embryos using either cDNA or Affymetrix microarrays. We observe that our datasets (lists of differentially expressed genes) significantly correlate with published microarray data from sporadic AML patients with mutations in either RUNX1 or its cofactor, CBFbeta. A number of biological processes were identified among the differentially expressed genes and functional assays suggest that heterozygous RUNX1 point mutations in patients with FPD-AML impair cell proliferation, microtubule dynamics and possibly genetic stability. In addition, analysis of the regulatory regions of the differentially expressed genes has for the first time systematically identified numerous potential novel RUNX1 target genes. CONCLUSION: This work is the first large-scale study attempting to identify the genetic networks regulated by RUNX1, a master regulator in the development of the hematopoietic system and leukemia. The biological pathways and target genes controlled by RUNX1 will have considerable importance in disease progression in both familial and sporadic leukemia as well as therapeutic implications.
  • Item
    Copy Number Variation in Patients with Disorders of Sex Development Due to 46,XY Gonadal Dysgenesis
    White, S ; Ohnesorg, T ; Notini, A ; Roeszler, K ; Hewitt, J ; Daggag, H ; Smith, C ; Turbitt, E ; Gustin, S ; van den Bergen, J ; Miles, D ; Western, P ; Arboleda, V ; Schumacher, V ; Gordon, L ; Bell, K ; Bengtsson, H ; Speed, T ; Hutson, J ; Warne, G ; Harley, V ; Koopman, P ; Vilain, E ; Sinclair, A ; Orban, L (PUBLIC LIBRARY SCIENCE, 2011-03-07)
    Disorders of sex development (DSD), ranging in severity from mild genital abnormalities to complete sex reversal, represent a major concern for patients and their families. DSD are often due to disruption of the genetic programs that regulate gonad development. Although some genes have been identified in these developmental pathways, the causative mutations have not been identified in more than 50% of 46,XY DSD cases. We used the Affymetrix Genome-Wide Human SNP Array 6.0 to analyse copy number variation in 23 individuals with unexplained 46,XY DSD due to gonadal dysgenesis (GD). Here we describe three discrete changes in copy number that are the likely cause of the GD. Firstly, we identified a large duplication on the X chromosome that included DAX1 (NR0B1). Secondly, we identified a rearrangement that appears to affect a novel gonad-specific regulatory region in a known testis gene, SOX9. Surprisingly this patient lacked any signs of campomelic dysplasia, suggesting that the deletion affected expression of SOX9 only in the gonad. Functional analysis of potential SRY binding sites within this deleted region identified five putative enhancers, suggesting that sequences additional to the known SRY-binding TES enhancer influence human testis-specific SOX9 expression. Thirdly, we identified a small deletion immediately downstream of GATA4, supporting a role for GATA4 in gonad development in humans. These CNV analyses give new insights into the pathways involved in human gonad development and dysfunction, and suggest that rearrangements of non-coding sequences disturbing gene regulation may account for a significant proportion of DSD cases.
  • Item
    Differential splicing using whole-transcript microarrays
    Robinson, MD ; Speed, TP (BMC, 2009-05-22)
    BACKGROUND: The latest generation of Affymetrix microarrays are designed to interrogate expression over the entire length of every locus, thus giving the opportunity to study alternative splicing genome-wide. The Exon 1.0 ST (sense target) platform, with versions for Human, Mouse and Rat, is designed primarily to probe every known or predicted exon. The smaller Gene 1.0 ST array is designed as an expression microarray but still interrogates expression with probes along the full length of each well-characterized transcript. We explore the possibility of using the Gene 1.0 ST platform to identify differential splicing events. RESULTS: We propose a strategy to score differential splicing by using the auxiliary information from fitting the statistical model, RMA (robust multichip analysis). RMA partitions the probe-level data into probe effects and expression levels, operating robustly so that if a small number of probes behave differently than the rest, they are downweighted in the fitting step. We argue that adjacent poorly fitting probes for a given sample can be evidence of differential splicing and have designed a statistic to search for this behaviour. Using a public tissue panel dataset, we show many examples of tissue-specific alternative splicing. Furthermore, we show that evidence for putative alternative splicing has a strong correspondence between the Gene 1.0 ST and Exon 1.0 ST platforms. CONCLUSION: We propose a new approach, FIRMAGene, to search for differentially spliced genes using the Gene 1.0 ST platform. Such an analysis complements the search for differential expression. We validate the method by illustrating several known examples and we note some of the challenges in interpreting the probe-level data. Software implementing our methods is freely available as an R package.
  • Item
    Evolution and comparative analysis of the MHC Class III inflammatory region
    Deakin, JE ; Papenfuss, AT ; Belov, K ; Cross, JGR ; Coggill, P ; Palmer, S ; Sims, S ; Speed, TP ; Beck, S ; Graves, JAM (BMC, 2006-11-02)
    BACKGROUND: The Major Histocompatibility Complex (MHC) is essential for immune function. Historically, it has been subdivided into three regions (Class I, II, and III), but a cluster of functionally related genes within the Class III region has also been referred to as the Class IV region or "inflammatory region". This group of genes is involved in the inflammatory response, and includes members of the tumour necrosis family. Here we report the sequencing, annotation and comparative analysis of a tammar wallaby BAC containing the inflammatory region. We also discuss the extent of sequence conservation across the entire region and identify elements conserved in evolution. RESULTS: Fourteen Class III genes from the tammar wallaby inflammatory region were characterised and compared to their orthologues in other vertebrates. The organisation and sequence of genes in the inflammatory region of both the wallaby and South American opossum are highly conserved compared to known genes from eutherian ("placental") mammals. Some minor differences separate the two marsupial species. Eight genes within the inflammatory region have remained tightly clustered for at least 360 million years, predating the divergence of the amphibian lineage. Analysis of sequence conservation identified 354 elements that are conserved. These range in size from 7 to 431 bases and cover 15.6% of the inflammatory region, representing approximately a 4-fold increase compared to the average for vertebrate genomes. About 5.5% of this conserved sequence is marsupial-specific, including three cases of marsupial-specific repeats. Highly Conserved Elements were also characterised. CONCLUSION: Using comparative analysis, we show that a cluster of MHC genes involved in inflammation, including TNF, LTA (or its putative teleost homolog TNF-N), APOM, and BAT3 have remained together for over 450 million years, predating the divergence of mammals from fish. The observed enrichment in conserved sequences within the inflammatory region suggests conservation at the transcriptional regulatory level, in addition to the functional level.
  • Item
    Proximal genomic localization of STAT1 binding and regulated transcriptional activity
    Wormald, S ; Hilton, DJ ; Smyth, GK ; Speed, TP (BMC, 2006-10-11)
    BACKGROUND: Signal transducer and activator of transcription (STAT) proteins are key regulators of gene expression in response to the interferon (IFN) family of anti-viral and anti-microbial cytokines. We have examined the genomic relationship between STAT1 binding and regulated transcription using multiple tiling microarray and chromatin immunoprecipitation microarray (ChIP-chip) experiments from public repositories. RESULTS: In response to IFN-gamma, STAT1 bound proximally to regions of the genome that exhibit regulated transcriptional activity. This finding was consistent between different tiling microarray platforms, and between different measures of transcriptional activity, including differential binding of RNA polymerase II, and differential mRNA transcription. Re-analysis of tiling microarray data from a recent study of IFN-gamma-induced STAT1 ChIP-chip and mRNA expression revealed that STAT1 binding is tightly associated with localized mRNA transcription in response to IFN-gamma. Close relationships were also apparent between STAT1 binding, STAT2 binding, and mRNA transcription in response to IFN-alpha. Furthermore, we found that sites of STAT1 binding within the Encyclopedia of DNA Elements (ENCODE) region are precisely correlated with sites of either enhanced or diminished binding by the RNA polymerase II complex. CONCLUSION: Together, our results indicate that STAT1 binds proximally to regions of the genome that exhibit regulated transcriptional activity. This finding establishes a generalized basis for the positioning of STAT1 binding sites within the genome, and supports a role for STAT1 in the direct recruitment of the RNA polymerase II complex to the promoters of IFN-gamma-responsive genes.
  • Item
    Estrogenic Plant Extracts Reverse Weight Gain and Fat Accumulation without Causing Mammary Gland or Uterine Proliferation
    Saunier, EF ; Vivar, OI ; Rubenstein, A ; Zhao, X ; Olshansky, M ; Baggett, S ; Staub, RE ; Tagliaferri, M ; Cohen, I ; Speed, TP ; Baxter, JD ; Leitman, DC ; Laudet, V (PUBLIC LIBRARY SCIENCE, 2011-12-07)
    Long-term estrogen deficiency increases the risk of obesity, diabetes and metabolic syndrome in postmenopausal women. Menopausal hormone therapy containing estrogens might prevent these conditions, but its prolonged use increases the risk of breast cancer, as well as endometrial cancer if used without progestins. Animal studies indicate that beneficial effects of estrogens in adipose tissue and adverse effects on mammary gland and uterus are mediated by estrogen receptor alpha (ERα). One strategy to improve the safety of estrogens to prevent/treat obesity, diabetes and metabolic syndrome is to develop estrogens that act as agonists in adipose tissue, but not in mammary gland and uterus. We considered plant extracts, which have been the source of many pharmaceuticals, as a source of tissue selective estrogens. Extracts from two plants, Glycyrrhiza uralensis (RG) and Pueraria montana var. lobata (RP) bound to ERα, activated ERα responsive reporters, and reversed weight gain and fat accumulation comparable to estradiol in ovariectomized obese mice maintained on a high fat diet. Unlike estradiol, RG and RP did not induce proliferative effects on mammary gland and uterus. Gene expression profiling demonstrated that RG and RP induced estradiol-like regulation of genes in abdominal fat, but not in mammary gland and uterus. The compounds in extracts from RG and RP might constitute a new class of tissue selective estrogens to reverse weight gain, fat accumulation and metabolic syndrome in postmenopausal women.
  • Item
    Rooting a phylogenetic tree with nonreversible substitution models
    Yap, VB ; Speed, T (BMC, 2005-01-04)
    BACKGROUND: We compared two methods of rooting a phylogenetic tree: the stationary and the nonstationary substitution processes. These methods do not require an outgroup. METHODS: Given a multiple alignment and an unrooted tree, the maximum likelihood estimates of branch lengths and substitution parameters for each associated rooted tree are found; rooted trees are compared using their likelihood values. Site variation in substitution rates is handled by assigning sites into several classes before the analysis. RESULTS: In three test datasets where the trees are small and the roots are assumed known, the nonstationary process gets the correct estimate significantly more often, and fits data much better, than the stationary process. Both processes give biologically plausible root placements in a set of nine primate mitochondrial DNA sequences. CONCLUSIONS: The nonstationary process is simple to use and is much better than the stationary process at inferring the root. It could be useful for situations where an outgroup is unavailable.