School of Mathematics and Statistics - Research Publications

  • Item
    Polycomb repressive complex 2 (PRC2) restricts hematopoietic stem cell activity
    Majewski, IJ ; Blewitt, ME ; de Graaf, CA ; McManus, EJ ; Bahlo, M ; Hilton, AA ; Hyland, CD ; Smyth, GK ; Corbin, JE ; Metcalf, D ; Alexander, WS ; Hilton, DJ ; Goodell, MA (PUBLIC LIBRARY SCIENCE, 2008-04)
    Polycomb group proteins are transcriptional repressors that play a central role in the establishment and maintenance of gene expression patterns during development. Using mice with an N-ethyl-N-nitrosourea (ENU)-induced mutation in Suppressor of Zeste 12 (Suz12), a core component of Polycomb Repressive Complex 2 (PRC2), we show here that loss of Suz12 function enhances hematopoietic stem cell (HSC) activity. In addition to these effects on a wild-type genetic background, mutations in Suz12 are sufficient to ameliorate the stem cell defect and thrombocytopenia present in mice that lack the thrombopoietin receptor (c-Mpl). To investigate the molecular targets of the PRC2 complex in the HSC compartment, we examined changes in global patterns of gene expression in cells deficient in Suz12. We identified a distinct set of genes that are regulated by Suz12 in hematopoietic cells, including eight genes that appear to be highly responsive to PRC2 function within this compartment. These data suggest that PRC2 is required to maintain a specific gene expression pattern in hematopoiesis that is indispensable to normal stem cell function.
  • Item
    Statistical analysis of an RNA titration series evaluates microarray precision and sensitivity on a whole-array basis
    Holloway, AJ ; Oshlack, A ; Diyagama, DS ; Bowtell, DDL ; Smyth, GK (BMC, 2006-11-22)
    BACKGROUND: Concerns are often raised about the accuracy of microarray technologies and the degree of cross-platform agreement, but there are yet no methods which can unambiguously evaluate precision and sensitivity for these technologies on a whole-array basis. RESULTS: A methodology is described for evaluating the precision and sensitivity of whole-genome gene expression technologies such as microarrays. The method consists of an easy-to-construct titration series of RNA samples and an associated statistical analysis using non-linear regression. The method evaluates the precision and responsiveness of each microarray platform on a whole-array basis, i.e., using all the probes, without the need to match probes across platforms. An experiment is conducted to assess and compare four widely used microarray platforms. All four platforms are shown to have satisfactory precision but the commercial platforms are superior for resolving differential expression for genes at lower expression levels. The effective precision of the two-color platforms is improved by allowing for probe-specific dye-effects in the statistical model. The methodology is used to compare three data extraction algorithms for the Affymetrix platforms, demonstrating poor performance for the commonly used proprietary algorithm relative to the other algorithms. For probes which can be matched across platforms, the cross-platform variability is decomposed into within-platform and between-platform components, showing that platform disagreement is almost entirely systematic rather than due to measurement variability. CONCLUSION: The results demonstrate good precision and sensitivity for all the platforms, but highlight the need for improved probe annotation. They quantify the extent to which cross-platform measures can be expected to be less accurate than within-platform comparisons for predicting disease progression or outcome.
  • Item
    Empirical array quality weights in the analysis of microarray data
    Ritchie, ME ; Diyagama, D ; Neilson, J ; van Laar, R ; Dobrovic, A ; Holloway, A ; Smyth, GK (BMC, 2006-05-19)
    BACKGROUND: Assessment of array quality is an essential step in the analysis of data from microarray experiments. Once detected, less reliable arrays are typically excluded or "filtered" from further analysis to avoid misleading results. RESULTS: In this article, a graduated approach to array quality is considered based on empirical reproducibility of the gene expression measures from replicate arrays. Weights are assigned to each microarray by fitting a heteroscedastic linear model with shared array variance terms. A novel gene-by-gene update algorithm is used to efficiently estimate the array variances. The inverse variances are used as weights in the linear model analysis to identify differentially expressed genes. The method successfully assigns lower weights to less reproducible arrays from different experiments. Down-weighting the observations from suspect arrays increases the power to detect differential expression. In smaller experiments, this approach outperforms the usual method of filtering the data. The method is available in the limma software package which is implemented in the R software environment. CONCLUSION: This method complements existing normalisation and spot quality procedures, and allows poorer quality arrays, which would otherwise be discarded, to be included in an analysis. It is applicable to microarray data from experiments with some level of replication.
  • Item
    Molecular networks involved in mouse cerebral corticogenesis and spatio-temporal regulation of Sox4 and Sox11 novel antisense transcripts revealed by transcriptome profiling
    Ling, K-H ; Hewitt, CA ; Beissbarth, T ; Hyde, L ; Banerjee, K ; Cheah, P-S ; Cannon, PZ ; Hahn, CN ; Thomas, PQ ; Smyth, GK ; Tan, S-S ; Thomas, T ; Scott, HS (BMC, 2009)
    BACKGROUND: Development of the cerebral cortex requires highly specific spatio-temporal regulation of gene expression. It is proposed that transcriptome profiling of the cerebral cortex at various developmental time points or regions will reveal candidate genes and associated molecular pathways involved in cerebral corticogenesis. RESULTS: Serial analysis of gene expression (SAGE) libraries were constructed from C57BL/6 mouse cerebral cortices of age embryonic day (E) 15.5, E17.5, postnatal day (P) 1.5 and 4 to 6 months. Hierarchical clustering analysis of 561 differentially expressed transcripts showed regionalized, stage-specific and co-regulated expression profiles. SAGE expression profiles of 70 differentially expressed transcripts were validated using quantitative RT-PCR assays. Ingenuity pathway analyses of validated differentially expressed transcripts demonstrated that these transcripts possess distinctive functional properties related to various stages of cerebral corticogenesis and human neurological disorders. Genomic clustering analysis of the differentially expressed transcripts identified two highly transcribed genomic loci, Sox4 and Sox11, during embryonic cerebral corticogenesis. These loci feature unusual overlapping sense and antisense transcripts with alternative polyadenylation sites and differential expression. The Sox4 and Sox11 antisense transcripts were highly expressed in the brain compared to other mouse organs and are differentially expressed in both the proliferating and differentiating neural stem/progenitor cells and P19 (embryonal carcinoma) cells. CONCLUSIONS: We report validated gene expression profiles that have implications for understanding the associations between differentially expressed transcripts, novel targets and related disorders pertaining to cerebral corticogenesis. 
    The study reports, for the first time, spatio-temporally regulated Sox4 and Sox11 antisense transcripts in the brain, neural stem/progenitor cells and P19 cells, suggesting they have an important role in cerebral corticogenesis and neuronal/glial cell differentiation.
  • Item
    Integrative analysis of RUNX1 downstream pathways and target genes
    Michaud, J ; Simpson, KM ; Escher, R ; Buchet-Poyau, K ; Beissbarth, T ; Carmichael, C ; Ritchie, ME ; Schuetz, F ; Cannon, P ; Liu, M ; Shen, X ; Ito, Y ; Raskind, WH ; Horwitz, MS ; Osato, M ; Turner, DR ; Speed, TP ; Kavallaris, M ; Smyth, GK ; Scott, HS (BMC, 2008-07-31)
    BACKGROUND: The RUNX1 transcription factor gene is frequently mutated in sporadic myeloid and lymphoid leukemia through translocation, point mutation or amplification. It is also responsible for a familial platelet disorder with predisposition to acute myeloid leukemia (FPD-AML). The disruption of the largely unknown biological pathways controlled by RUNX1 is likely to be responsible for the development of leukemia. We have used multiple microarray platforms and bioinformatic techniques to help identify these biological pathways to aid in the understanding of why RUNX1 mutations lead to leukemia. RESULTS: Here we report genes regulated either directly or indirectly by RUNX1 based on the study of gene expression profiles generated from 3 different human and mouse platforms. The platforms used were global gene expression profiling of: 1) cell lines with RUNX1 mutations from FPD-AML patients, 2) over-expression of RUNX1 and CBFbeta, and 3) Runx1 knockout mouse embryos using either cDNA or Affymetrix microarrays. We observe that our datasets (lists of differentially expressed genes) significantly correlate with published microarray data from sporadic AML patients with mutations in either RUNX1 or its cofactor, CBFbeta. A number of biological processes were identified among the differentially expressed genes and functional assays suggest that heterozygous RUNX1 point mutations in patients with FPD-AML impair cell proliferation, microtubule dynamics and possibly genetic stability. In addition, analysis of the regulatory regions of the differentially expressed genes has for the first time systematically identified numerous potential novel RUNX1 target genes. CONCLUSION: This work is the first large-scale study attempting to identify the genetic networks regulated by RUNX1, a master regulator in the development of the hematopoietic system and leukemia. 
    The biological pathways and target genes controlled by RUNX1 will have considerable importance in disease progression in both familial and sporadic leukemia as well as therapeutic implications.
  • Item
    Illumina WG-6 BeadChip strips should be normalized separately
    Shi, W ; Banerjee, A ; Ritchie, ME ; Gerondakis, S ; Smyth, GK (BMC, 2009-11-11)
    BACKGROUND: Illumina Sentrix-6 Whole-Genome Expression BeadChips are relatively new microarray platforms which have been used in many microarray studies in the past few years. These Chips have a unique design in which each Chip contains six microarrays and each microarray consists of two separate physical strips, posing special challenges for precise between-array normalization of expression values. RESULTS: None of the normalization strategies proposed so far for this microarray platform allow for the possibility of systematic variation between the two strips comprising each array. That this variation can be substantial is illustrated by a data example. We demonstrate that normalizing at the strip-level rather than at the array-level can effectively remove this between-strip variation, improve the precision of gene expression measurements and discover more differentially expressed genes. The gain is substantial, yielding a 20% increase in statistical information and doubling the number of genes detected at a 5% false discovery rate. Functional analysis reveals that the extra genes found tend to have interesting biological meanings, dramatically strengthening the biological conclusions from the experiment. Strip-level normalization still outperforms array-level normalization when non-expressed probes are filtered out. CONCLUSION: Plots are proposed which demonstrate how the need for strip-level normalization relates to inconsistent intensity range variation between the strips. Strip-level normalization is recommended for the preprocessing of Illumina Sentrix-6 BeadChips whenever the intensity range is seen to be inconsistent between the strips. R code is provided to implement the recommended plots and normalization algorithms.
  • Item
    Proximal genomic localization of STAT1 binding and regulated transcriptional activity
    Wormald, S ; Hilton, DJ ; Smyth, GK ; Speed, TP (BMC, 2006-10-11)
    BACKGROUND: Signal transducer and activator of transcription (STAT) proteins are key regulators of gene expression in response to the interferon (IFN) family of anti-viral and anti-microbial cytokines. We have examined the genomic relationship between STAT1 binding and regulated transcription using multiple tiling microarray and chromatin immunoprecipitation microarray (ChIP-chip) experiments from public repositories. RESULTS: In response to IFN-gamma, STAT1 bound proximally to regions of the genome that exhibit regulated transcriptional activity. This finding was consistent between different tiling microarray platforms, and between different measures of transcriptional activity, including differential binding of RNA polymerase II, and differential mRNA transcription. Re-analysis of tiling microarray data from a recent study of IFN-gamma-induced STAT1 ChIP-chip and mRNA expression revealed that STAT1 binding is tightly associated with localized mRNA transcription in response to IFN-gamma. Close relationships were also apparent between STAT1 binding, STAT2 binding, and mRNA transcription in response to IFN-alpha. Furthermore, we found that sites of STAT1 binding within the Encyclopedia of DNA Elements (ENCODE) region are precisely correlated with sites of either enhanced or diminished binding by the RNA polymerase II complex. CONCLUSION: Together, our results indicate that STAT1 binds proximally to regions of the genome that exhibit regulated transcriptional activity. This finding establishes a generalized basis for the positioning of STAT1 binding sites within the genome, and supports a role for STAT1 in the direct recruitment of the RNA polymerase II complex to the promoters of IFN-gamma-responsive genes.
  • Item
    Microarray background correction: maximum likelihood estimation for the normal-exponential convolution
    Silver, JD ; Ritchie, ME ; Smyth, GK (OXFORD UNIV PRESS, 2009-04)
    Background correction is an important preprocessing step for microarray data that attempts to adjust the data for the ambient intensity surrounding each feature. The "normexp" method models the observed pixel intensities as the sum of 2 random variables, one normally distributed and the other exponentially distributed, representing background noise and signal, respectively. Using a saddle-point approximation, Ritchie and others (2007) found normexp to be the best background correction method for 2-color microarray data. This article develops the normexp method further by improving the estimation of the parameters. A complete mathematical development is given of the normexp model and the associated saddle-point approximation. Some subtle numerical programming issues are solved which caused the original normexp method to fail occasionally when applied to unusual data sets. A practical and reliable algorithm is developed for exact maximum likelihood estimation (MLE) using high-quality optimization software and using the saddle-point estimates as starting values. "MLE" is shown to outperform heuristic estimators proposed by other authors, both in terms of estimation accuracy and in terms of performance on real data. The saddle-point approximation is an adequate replacement in most practical situations. The performance of normexp for assessing differential expression is improved by adding a small offset to the corrected intensities.
  • Item
    Deaf-1 regulates epithelial cell proliferation and side-branching in the mammary gland
    Barker, HE ; Smyth, GK ; Wettenhall, J ; Ward, TA ; Bath, ML ; Lindeman, GJ ; Visvader, JE (BMC, 2008-10-01)
    BACKGROUND: The transcription factor DEAF-1 has been identified as a high affinity binding partner of the LIM-only protein LMO4 that plays important roles in mammary gland development and breast cancer. Here we investigated the influence of DEAF-1 on human and mouse mammary epithelial cells both in vitro and in vivo and identified a potential target gene. RESULTS: Overexpression of DEAF-1 in human breast epithelial MCF10A cells enhanced cell proliferation in the mammary acini that develop in 3D cultures. To investigate the effects of Deaf-1 on mammary gland development and oncogenesis, we generated MMTV-Deaf-1 transgenic mice. Increased ductal side-branching was observed in young virgin mammary glands, accompanied by augmented cell proliferation. In addition, the ratio of the progesterone receptor isoforms PRA and PRB, previously implicated in regulating ductal side-branching, was altered. Affymetrix gene profiling studies revealed Rac3 as a potential target gene and quantitative RT-PCR analysis confirmed that Rac3 was upregulated by Deaf-1 in immortalized mouse mammary epithelial cells. Furthermore, MMTV-Deaf-1 transgenic mammary glands were found to have elevated levels of Rac3 mRNA, suggesting that it is a bona fide target. CONCLUSION: We have demonstrated that overexpression of Deaf-1 enhances the proliferation of human breast epithelial cells in vitro and mouse epithelial cells in vivo. Transgenic mammary glands overexpressing Deaf-1 exhibited a modest side-branching phenotype, accompanied by an increase in the number of BrdU-positive cells and a decrease in the proportion of PRA-expressing cells. Although proliferation was enhanced in Deaf-1 transgenic mice, overexpression of this gene was not sufficient to induce the formation of mammary tumors. In addition, our studies identified Rac3, encoding a small Rho-like GTPase, as a potential target of Deaf-1 in mouse mammary epithelial cells.
  • Item
    Normalization of boutique two-color microarrays with a high proportion of differentially expressed probes
    Oshlack, A ; Emslie, D ; Corcoran, LM ; Smyth, GK (BMC, 2007)
    Normalization is critical for removing systematic variation from microarray data. For two-color microarray platforms, intensity-dependent lowess normalization is commonly used to correct relative gene expression values for biases. Here we outline a normalization method for use when the assumptions of lowess normalization fail. Specifically, this can occur when specialized boutique arrays are constructed that contain a subset of genes selected to test particular biological functions.