School of Mathematics and Statistics - Research Publications

Search Results

  • Item
    Integrative analysis of RUNX1 downstream pathways and target genes
    Michaud, J ; Simpson, KM ; Escher, R ; Buchet-Poyau, K ; Beissbarth, T ; Carmichael, C ; Ritchie, ME ; Schuetz, F ; Cannon, P ; Liu, M ; Shen, X ; Ito, Y ; Raskind, WH ; Horwitz, MS ; Osato, M ; Turner, DR ; Speed, TP ; Kavallaris, M ; Smyth, GK ; Scott, HS (BMC, 2008-07-31)
    BACKGROUND: The RUNX1 transcription factor gene is frequently mutated in sporadic myeloid and lymphoid leukemia through translocation, point mutation or amplification. It is also responsible for a familial platelet disorder with predisposition to acute myeloid leukemia (FPD-AML). The disruption of the largely unknown biological pathways controlled by RUNX1 is likely to be responsible for the development of leukemia. We have used multiple microarray platforms and bioinformatic techniques to help identify these biological pathways to aid in the understanding of why RUNX1 mutations lead to leukemia. RESULTS: Here we report genes regulated either directly or indirectly by RUNX1 based on the study of gene expression profiles generated from 3 different human and mouse platforms. The platforms used were global gene expression profiling of: 1) cell lines with RUNX1 mutations from FPD-AML patients, 2) over-expression of RUNX1 and CBFbeta, and 3) Runx1 knockout mouse embryos using either cDNA or Affymetrix microarrays. We observe that our datasets (lists of differentially expressed genes) significantly correlate with published microarray data from sporadic AML patients with mutations in either RUNX1 or its cofactor, CBFbeta. A number of biological processes were identified among the differentially expressed genes and functional assays suggest that heterozygous RUNX1 point mutations in patients with FPD-AML impair cell proliferation, microtubule dynamics and possibly genetic stability. In addition, analysis of the regulatory regions of the differentially expressed genes has for the first time systematically identified numerous potential novel RUNX1 target genes. CONCLUSION: This work is the first large-scale study attempting to identify the genetic networks regulated by RUNX1, a master regulator in the development of the hematopoietic system and leukemia. 
    The biological pathways and target genes controlled by RUNX1 will have considerable importance in disease progression in both familial and sporadic leukemia as well as therapeutic implications.
  • Item
    Differential splicing using whole-transcript microarrays
    Robinson, MD ; Speed, TP (BMC, 2009-05-22)
    BACKGROUND: The latest generation of Affymetrix microarrays are designed to interrogate expression over the entire length of every locus, thus giving the opportunity to study alternative splicing genome-wide. The Exon 1.0 ST (sense target) platform, with versions for Human, Mouse and Rat, is designed primarily to probe every known or predicted exon. The smaller Gene 1.0 ST array is designed as an expression microarray but still interrogates expression with probes along the full length of each well-characterized transcript. We explore the possibility of using the Gene 1.0 ST platform to identify differential splicing events. RESULTS: We propose a strategy to score differential splicing by using the auxiliary information from fitting the statistical model, RMA (robust multichip analysis). RMA partitions the probe-level data into probe effects and expression levels, operating robustly so that if a small number of probes behave differently than the rest, they are downweighted in the fitting step. We argue that adjacent poorly fitting probes for a given sample can be evidence of differential splicing and have designed a statistic to search for this behaviour. Using a public tissue panel dataset, we show many examples of tissue-specific alternative splicing. Furthermore, we show that evidence for putative alternative splicing has a strong correspondence between the Gene 1.0 ST and Exon 1.0 ST platforms. CONCLUSION: We propose a new approach, FIRMAGene, to search for differentially spliced genes using the Gene 1.0 ST platform. Such an analysis complements the search for differential expression. We validate the method by illustrating several known examples and we note some of the challenges in interpreting the probe-level data. Software implementing our methods is freely available as an R package.
  • Item
    Drug and Cell Type-Specific Regulation of Genes with Different Classes of Estrogen Receptor β-Selective Agonists
    Paruthiyil, S ; Cvoro, A ; Zhao, X ; Wu, Z ; Sui, Y ; Staub, RE ; Baggett, S ; Herber, CB ; Griffin, C ; Tagliaferri, M ; Harris, HA ; Cohen, I ; Bjeldanes, LF ; Speed, TP ; Schaufele, F ; Leitman, DC ; Laudet, V (PUBLIC LIBRARY SCIENCE, 2009-07-17)
    Estrogens produce biological effects by interacting with two estrogen receptors, ERalpha and ERbeta. Drugs that selectively target ERalpha or ERbeta might be safer for conditions that have been traditionally treated with non-selective estrogens. Several synthetic and natural ERbeta-selective compounds have been identified. One class of ERbeta-selective agonists is represented by ERB-041 (WAY-202041) which binds to ERbeta much greater than ERalpha. A second class of ERbeta-selective agonists derived from plants include MF101, nyasol and liquiritigenin that bind similarly to both ERs, but only activate transcription with ERbeta. Diarylpropionitrile represents a third class of ERbeta-selective compounds because its selectivity is due to a combination of greater binding to ERbeta and transcriptional activity. However, it is unclear if these three classes of ERbeta-selective compounds produce similar biological activities. The goals of these studies were to determine the relative ERbeta selectivity and pattern of gene expression of these three classes of ERbeta-selective compounds compared to estradiol (E(2)), which is a non-selective ER agonist. U2OS cells stably transfected with ERalpha or ERbeta were treated with E(2) or the ERbeta-selective compounds for 6 h. Microarray data demonstrated that ERB-041, MF101 and liquiritigenin were the most ERbeta-selective agonists compared to estradiol, followed by nyasol and then diarylpropionitrile. FRET analysis showed that all compounds induced a similar conformation of ERbeta, which is consistent with the finding that most genes regulated by the ERbeta-selective compounds were similar to each other and E(2). However, there were some classes of genes differentially regulated by the ERbeta agonists and E(2). Two ERbeta-selective compounds, MF101 and liquiritigenin had cell type-specific effects as they regulated different genes in HeLa, Caco-2 and Ishikawa cell lines expressing ERbeta. 
    Our gene profiling studies demonstrate that while most of the genes were commonly regulated by ERbeta-selective agonists and E(2), there were some genes regulated that were distinct from each other and E(2), suggesting that different ERbeta-selective agonists might produce distinct biological and clinical effects.
  • Item
    Analysis of gene expression during neurite outgrowth and regeneration
    Szpara, ML ; Vranizan, K ; Tai, YC ; Goodman, CS ; Speed, TP ; Ngai, J (BMC, 2007-11-23)
    BACKGROUND: The ability of a neuron to regenerate functional connections after injury is influenced by both its intrinsic state and also by extrinsic cues in its surroundings. Investigations of the transcriptional changes undergone by neurons during in vivo models of injury and regeneration have revealed many transcripts associated with these processes. Because of the complex milieu of interactions in vivo, these results include not only expression changes directly related to regenerative outgrowth but also unrelated responses to surrounding cells and signals. In vitro models of neurite outgrowth provide a means to study the intrinsic transcriptional patterns of neurite outgrowth in the absence of extensive extrinsic cues from nearby cells and tissues. RESULTS: We have undertaken a genome-wide study of transcriptional activity in embryonic superior cervical ganglia (SCG) and dorsal root ganglia (DRG) during a time course of neurite outgrowth in vitro. Gene expression observed in these models likely includes both developmental gene expression patterns and regenerative responses to axotomy, which occurs as the result of tissue dissection. Comparison across both models revealed many genes with similar gene expression patterns during neurite outgrowth. These patterns were minimally affected by exposure to the potent inhibitory cue Semaphorin3A, indicating that this extrinsic cue does not exert major effects at the level of nuclear transcription. We also compared our data to several published studies of DRG and SCG gene expression in animal models of regeneration, and found the expression of a large number of genes in common between neurite outgrowth in vitro and regeneration in vivo. CONCLUSION: Many gene expression changes undergone by SCG and DRG during in vitro outgrowth are shared between these two tissue types and in common with in vivo regeneration models. This suggests that the genes identified in this in vitro study may represent new candidates worthy of further study for potential roles in the therapeutic regrowth of neuronal connections.
  • Item
    FIRMA: a method for detection of alternative splicing from exon array data
    Purdom, E ; Simpson, KM ; Robinson, MD ; Conboy, JG ; Lapuk, AV ; Speed, TP (OXFORD UNIV PRESS, 2008-08-01)
    MOTIVATION: Analyses of EST data show that alternative splicing is much more widespread than once thought. The advent of exon and tiling microarrays means that researchers now have the capacity to experimentally measure alternative splicing on a genome wide level. New methods are needed to analyze the data from these arrays. RESULTS: We present a method, finding isoforms using robust multichip analysis (FIRMA), for detecting differential alternative splicing in exon array data. FIRMA has been developed for Affymetrix exon arrays, but could in principle be extended to other exon arrays, tiling arrays or splice junction arrays. We have evaluated the method using simulated data, and have also applied it to two datasets: a panel of 11 human tissues and a set of 10 pairs of matched normal and tumor colon tissue. FIRMA is able to detect exons in several genes confirmed by reverse transcriptase PCR. AVAILABILITY: R code implementing our methods is contributed to the package aroma.affymetrix.
  • Item
    A comparison of Affymetrix gene expression arrays
    Robinson, MD ; Speed, TP (BMC, 2007-11-15)
    BACKGROUND: Affymetrix GeneChips are an important tool in many facets of biological research. Recently, notable design changes to the chips have been made. In this study, we use publicly available data from Affymetrix to gauge the performance of three human gene expression arrays: Human Genome U133 Plus 2.0 (U133), Human Exon 1.0 ST (HuEx) and Human Gene 1.0 ST (HuGene). RESULTS: We studied probe-, exon- and gene-level reproducibility of technical and biological replicates from each of the 3 platforms. The U133 array has larger feature sizes so it is no surprise that probe-level variances are smaller, however the larger number of probes per gene on the HuGene array seems to produce gene-level summaries that have similar variances. The gene-level summaries of the HuEx array are less reproducible than the other two, despite having the largest average number of probes per gene. Greater than 80% of the content on the HuEx arrays is expressed at or near background. Biological variation seems to have a smaller effect on U133 data. Comparing the overlap of differentially expressed genes, we see a high overall concordance among all 3 platforms, with HuEx and HuGene having greater overlap, as expected given their design. We performed an analysis of detection rates and area under ROC curves using an experiment made up of several mixtures of 2 human tissues. Though it appears that the HuEx array has worse performance in terms of detection rates, all arrays have similar ability to separate differentially expressed and non-differentially expressed genes. CONCLUSION: Despite noticeable differences in the probe-level reproducibility, gene-level reproducibility and differential expression detection are quite similar across the three platforms. The HuEx array, an all-encompassing array, has the flexibility of measuring all known or predicted exonic content. However, the HuEx array induces poorer reproducibility for genes with fewer exons. 
    The HuGene measures just the well-annotated genome content and appears to perform well. The U133 array, though not able to measure across the full length of a transcript, appears to perform as well as the newer designs on the set of genes common to all 3 platforms.
  • Item
    Global analyses of mRNA translational control during early Drosophila embryogenesis
    Qin, X ; Ahn, S ; Speed, TP ; Rubin, GM (BMC, 2007)
    BACKGROUND: In many animals, the first few hours of life proceed with little or no transcription, and developmental regulation at these early stages is dependent on maternal cytoplasm rather than the zygotic nucleus. Translational control is critical for early Drosophila embryogenesis and is exerted mainly at the gene level. To understand post-transcriptional regulation during Drosophila early embryonic development, we used sucrose polysomal gradient analyses and GeneChip analysis to illustrate the translation profile of individual mRNAs. RESULTS: We determined ribosomal density and ribosomal occupancy of over 10,000 transcripts during the first ten hours after egg laying. CONCLUSION: We report the extent and general nature of gene regulation at the translational level during early Drosophila embryogenesis on a genome-wide basis. The diversity of the translation profiles indicates multiple mechanisms modulating transcript-specific translation. Cluster analyses suggest that the genes involved in some biological processes are co-regulated at the translational level at certain developmental stages.
  • Item
    A single-array preprocessing method for estimating full-resolution raw copy numbers from all Affymetrix genotyping arrays including GenomeWideSNP 5 & 6
    Bengtsson, H ; Wirapati, P ; Speed, TP (OXFORD UNIV PRESS, 2009-09-01)
    MOTIVATION: High-resolution copy-number (CN) analysis has in recent years gained much attention, not only for the purpose of identifying CN aberrations associated with a certain phenotype, but also for identifying CN polymorphisms. In order for such studies to be successful and cost effective, the statistical methods have to be optimized. We propose a single-array preprocessing method for estimating full-resolution total CNs. It is applicable to all Affymetrix genotyping arrays, including the recent ones that also contain non-polymorphic probes. A reference signal is only needed at the last step when calculating relative CNs. RESULTS: As with our method for earlier generations of arrays, this one controls for allelic crosstalk, probe affinities and PCR fragment-length effects. Additionally, it also corrects for probe sequence effects and co-hybridization of fragments digested by multiple enzymes that takes place on the latest chips. We compare our method with Affymetrix's CN5 method and the dChip method by assessing how well they differentiate between various CN states at the full resolution and various amounts of smoothing. Although CRMA v2 is a single-array method, we observe that it performs as well as or better than alternative methods that use data from all arrays for their preprocessing. This shows that it is possible to do online analysis in large-scale projects where additional arrays are introduced over time.
  • Item
    Transcription factors bind thousands of active and inactive regions in the Drosophila blastoderm
    Li, X-Y ; MacArthur, S ; Bourgon, R ; Nix, D ; Pollard, DA ; Iyer, VN ; Hechmer, A ; Simirenko, L ; Stapleton, M ; Hendriks, CLL ; Chu, HC ; Ogawa, N ; Inwood, W ; Sementchenko, V ; Beaton, A ; Weiszmann, R ; Celniker, SE ; Knowles, DW ; Gingeras, T ; Speed, TP ; Eisen, MB ; Biggin, MD ; Kadonaga, J (PUBLIC LIBRARY SCIENCE, 2008-02)
    Identifying the genomic regions bound by sequence-specific regulatory factors is central both to deciphering the complex DNA cis-regulatory code that controls transcription in metazoans and to determining the range of genes that shape animal morphogenesis. We used whole-genome tiling arrays to map sequences bound in Drosophila melanogaster embryos by the six maternal and gap transcription factors that initiate anterior-posterior patterning. We find that these sequence-specific DNA binding proteins bind with quantitatively different specificities to highly overlapping sets of several thousand genomic regions in blastoderm embryos. Specific high- and moderate-affinity in vitro recognition sequences for each factor are enriched in bound regions. This enrichment, however, is not sufficient to explain the pattern of binding in vivo and varies in a context-dependent manner, demonstrating that higher-order rules must govern targeting of transcription factors. The more highly bound regions include all of the over 40 well-characterized enhancers known to respond to these factors as well as several hundred putative new cis-regulatory modules clustered near developmental regulators and other genes with patterned expression at this stage of embryogenesis. The new targets include most of the microRNAs (miRNAs) transcribed in the blastoderm, as well as all major zygotically transcribed dorsal-ventral patterning genes, whose expression we show to be quantitatively modulated by anterior-posterior factors. In addition to these highly bound regions, there are several thousand regions that are reproducibly bound at lower levels. However, these poorly bound regions are, collectively, far more distant from genes transcribed in the blastoderm than highly bound regions; are preferentially found in protein-coding sequences; and are less conserved than highly bound regions. 
    Together these observations suggest that many of these poorly bound regions are not involved in early-embryonic transcriptional regulation, and a significant proportion may be nonfunctional. Surprisingly, for five of the six factors, their recognition sites are not unambiguously more constrained evolutionarily than the immediate flanking DNA, even in more highly bound and presumably functional regions, indicating that comparative DNA sequence analysis is limited in its ability to identify functional transcription factor targets.
  • Item
    A single-sample method for normalizing and combining full-resolution copy numbers from multiple platforms, labs and analysis methods
    Bengtsson, H ; Ray, A ; Spellman, P ; Speed, TP (OXFORD UNIV PRESS, 2009-04-01)
    MOTIVATION: The rapid expansion of whole-genome copy number (CN) studies brings a demand for increased precision and resolution of CN estimates. Recent studies have obtained CN estimates from more than one platform for the same set of samples, and it is natural to want to combine the different estimates in order to meet this demand. Estimates from different platforms show different degrees of attenuation of the true CN changes. Similar differences can be observed in CNs from the same platform run in different labs, or in the same lab, with different analytical methods. This is the reason why it is not straightforward to combine CN estimates from different sources (platforms, labs and analysis methods). RESULTS: We propose a single-sample multi source normalization that brings full-resolution CN estimates to the same scale across sources. The normalized CNs are such that for any underlying CN level, their mean level is the same regardless of the source, which make them better suited for being combined across sources, e.g. existing segmentation methods may be used to identify aberrant regions. We use microarray-based CN estimates from 'The Cancer Genome Atlas' (TCGA) project to illustrate and validate the method. We show that the normalized and combined data better separate two CN states at a given resolution. We conclude that it is possible to combine CNs from multiple sources such that the resolution becomes effectively larger, and when multiple platforms are combined, they also enhance the genome coverage by complementing each other in different regions. AVAILABILITY: A bounded-memory implementation is available in aroma.cn.