Medical Biology - Research Publications

Permanent URI for this collection

Search Results

Now showing 1 - 10 of 10
  • Item
    Thumbnail Image
    Empirical array quality weights in the analysis of microarray data
    Ritchie, ME ; Diyagama, D ; Neilson, J ; van Laar, R ; Dobrovic, A ; Holloway, A ; Smyth, GK (BMC, 2006-05-19)
    BACKGROUND: Assessment of array quality is an essential step in the analysis of data from microarray experiments. Once detected, less reliable arrays are typically excluded or "filtered" from further analysis to avoid misleading results. RESULTS: In this article, a graduated approach to array quality is considered based on empirical reproducibility of the gene expression measures from replicate arrays. Weights are assigned to each microarray by fitting a heteroscedastic linear model with shared array variance terms. A novel gene-by-gene update algorithm is used to efficiently estimate the array variances. The inverse variances are used as weights in the linear model analysis to identify differentially expressed genes. The method successfully assigns lower weights to less reproducible arrays from different experiments. Down-weighting the observations from suspect arrays increases the power to detect differential expression. In smaller experiments, this approach outperforms the usual method of filtering the data. The method is available in the limma software package which is implemented in the R software environment. CONCLUSION: This method complements existing normalisation and spot quality procedures, and allows poorer quality arrays, which would otherwise be discarded, to be included in an analysis. It is applicable to microarray data from experiments with some level of replication.
  • Item
    Thumbnail Image
    Integrative analysis of RUNX1 downstream pathways and target genes
    Michaud, J ; Simpson, KM ; Escher, R ; Buchet-Poyau, K ; Beissbarth, T ; Carmichael, C ; Ritchie, ME ; Schuetz, F ; Cannon, P ; Liu, M ; Shen, X ; Ito, Y ; Raskind, WH ; Horwitz, MS ; Osato, M ; Turner, DR ; Speed, TP ; Kavallaris, M ; Smyth, GK ; Scott, HS (BMC, 2008-07-31)
    BACKGROUND: The RUNX1 transcription factor gene is frequently mutated in sporadic myeloid and lymphoid leukemia through translocation, point mutation or amplification. It is also responsible for a familial platelet disorder with predisposition to acute myeloid leukemia (FPD-AML). The disruption of the largely unknown biological pathways controlled by RUNX1 is likely to be responsible for the development of leukemia. We have used multiple microarray platforms and bioinformatic techniques to help identify these biological pathways to aid in the understanding of why RUNX1 mutations lead to leukemia. RESULTS: Here we report genes regulated either directly or indirectly by RUNX1 based on the study of gene expression profiles generated from 3 different human and mouse platforms. The platforms used were global gene expression profiling of: 1) cell lines with RUNX1 mutations from FPD-AML patients, 2) over-expression of RUNX1 and CBFbeta, and 3) Runx1 knockout mouse embryos using either cDNA or Affymetrix microarrays. We observe that our datasets (lists of differentially expressed genes) significantly correlate with published microarray data from sporadic AML patients with mutations in either RUNX1 or its cofactor, CBFbeta. A number of biological processes were identified among the differentially expressed genes and functional assays suggest that heterozygous RUNX1 point mutations in patients with FPD-AML impair cell proliferation, microtubule dynamics and possibly genetic stability. In addition, analysis of the regulatory regions of the differentially expressed genes has for the first time systematically identified numerous potential novel RUNX1 target genes. CONCLUSION: This work is the first large-scale study attempting to identify the genetic networks regulated by RUNX1, a master regulator in the development of the hematopoietic system and leukemia. The biological pathways and target genes controlled by RUNX1 will have considerable importance in disease progression in both familial and sporadic leukemia as well as therapeutic implications.
  • Item
    Thumbnail Image
    Illumina WG-6 BeadChip strips should be normalized separately
    Shi, W ; Banerjee, A ; Ritchie, ME ; Gerondakis, S ; Smyth, GK (BMC, 2009-11-11)
    BACKGROUND: Illumina Sentrix-6 Whole-Genome Expression BeadChips are relatively new microarray platforms which have been used in many microarray studies in the past few years. These Chips have a unique design in which each Chip contains six microarrays and each microarray consists of two separate physical strips, posing special challenges for precise between-array normalization of expression values. RESULTS: None of the normalization strategies proposed so far for this microarray platform allow for the possibility of systematic variation between the two strips comprising each array. That this variation can be substantial is illustrated by a data example. We demonstrate that normalizing at the strip-level rather than at the array-level can effectively remove this between-strip variation, improve the precision of gene expression measurements and discover more differentially expressed genes. The gain is substantial, yielding a 20% increase in statistical information and doubling the number of genes detected at a 5% false discovery rate. Functional analysis reveals that the extra genes found tend to have interesting biological meanings, dramatically strengthening the biological conclusions from the experiment. Strip-level normalization still outperforms array-level normalization when non-expressed probes are filtered out. CONCLUSION: Plots are proposed which demonstrate how the need for strip-level normalization relates to inconsistent intensity range variation between the strips. Strip-level normalization is recommended for the preprocessing of Illumina Sentrix-6 BeadChips whenever the intensity range is seen to be inconsistent between the strips. R code is provided to implement the recommended plots and normalization algorithms.
  • Item
    Thumbnail Image
    Microarray background correction: maximum likelihood estimation for the normal-exponential convolution
    Silver, JD ; Ritchie, ME ; Smyth, GK (OXFORD UNIV PRESS, 2009-04)
    Background correction is an important preprocessing step for microarray data that attempts to adjust the data for the ambient intensity surrounding each feature. The "normexp" method models the observed pixel intensities as the sum of 2 random variables, one normally distributed and the other exponentially distributed, representing background noise and signal, respectively. Using a saddle-point approximation, Ritchie and others (2007) found normexp to be the best background correction method for 2-color microarray data. This article develops the normexp method further by improving the estimation of the parameters. A complete mathematical development is given of the normexp model and the associated saddle-point approximation. Some subtle numerical programming issues are solved which caused the original normexp method to fail occasionally when applied to unusual data sets. A practical and reliable algorithm is developed for exact maximum likelihood estimation (MLE) using high-quality optimization software and using the saddle-point estimates as starting values. "MLE" is shown to outperform heuristic estimators proposed by other authors, both in terms of estimation accuracy and in terms of performance on real data. The saddle-point approximation is an adequate replacement in most practical situations. The performance of normexp for assessing differential expression is improved by adding a small offset to the corrected intensities.
  • Item
    Thumbnail Image
    BASH: a tool for managing BeadArray spatial artefacts
    Cairns, JM ; Dunning, MJ ; Ritchie, ME ; Russell, R ; Lynch, AG (OXFORD UNIV PRESS, 2008-12-15)
    SUMMARY: With their many replicates and their random layouts, Illumina BeadArrays provide greater scope fordetecting spatial artefacts than do other microarray technologies. They are also robust to artefact exclusion, yet there is a lack of tools that can perform these tasks for Illumina. We present BASH, a tool for this purpose. BASH adopts the concepts of Harshlight, but implements them in a manner that utilizes the unique characteristics of the Illumina technology. Using bead-level data, spatial artefacts of various kinds can thus be identified and excluded from further analyses. AVAILABILITY: The beadarray Bioconductor package (version 1.10 onwards), www.bioconductor.org
  • Item
    Thumbnail Image
    Modifier Effects between Regulatory and Protein-Coding Variation
    Dimas, AS ; Stranger, BE ; Beazley, C ; Finn, RD ; Ingle, CE ; Forrest, MS ; Ritchie, ME ; Deloukas, P ; Tavare, S ; Dermitzakis, ET ; Gibson, G (PUBLIC LIBRARY SCIENCE, 2008-10)
    Genome-wide associations have shown a lot of promise in dissecting the genetics of complex traits in humans with single variants, yet a large fraction of the genetic effects is still unaccounted for. Analyzing genetic interactions between variants (epistasis) is one of the potential ways forward. We investigated the abundance and functional impact of a specific type of epistasis, namely the interaction between regulatory and protein-coding variants. Using genotype and gene expression data from the 210 unrelated individuals of the original four HapMap populations, we have explored the combined effects of regulatory and protein-coding single nucleotide polymorphisms (SNPs). We predict that about 18% (1,502 out of 8,233 nsSNPs) of protein-coding variants are differentially expressed among individuals and demonstrate that regulatory variants can modify the functional effect of a coding variant in cis. Furthermore, we show that such interactions in cis can affect the expression of downstream targets of the gene containing the protein-coding SNP. In this way, a cis interaction between regulatory and protein-coding variants has a trans impact on gene expression. Given the abundance of both types of variants in human populations, we propose that joint consideration of regulatory and protein-coding variants may reveal additional genetic effects underlying complex traits and disease and may shed light on causes of differential penetrance of known disease variants.
  • Item
    Thumbnail Image
    R/Bioconductor software for Illumina's Infinium whole-genome genotyping BeadChips
    Ritchie, ME ; Carvalho, BS ; Hetrick, KN ; Tavare, S ; Irizarry, RA (OXFORD UNIV PRESS, 2009-10-01)
    UNLABELLED: Illumina produces a number of microarray-based technologies for human genotyping. An Infinium BeadChip is a two-color platform that types between 10(5) and 10(6) single nucleotide polymorphisms (SNPs) per sample. Despite being widely used, there is a shortage of open source software to process the raw intensities from this platform into genotype calls. To this end, we have developed the R/Bioconductor package crlmm for analyzing BeadChip data. After careful preprocessing, our software applies the CRLMM algorithm to produce genotype calls, confidence scores and other quality metrics at both the SNP and sample levels. We provide access to the raw summary-level intensity data, allowing users to develop their own methods for genotype calling or copy number analysis if they wish. AVAILABILITY AND IMPLEMENTATION: The crlmm Bioconductor package is available from http://www.bioconductor.org. Data packages and documentation are available from http://rafalab.jhsph.edu/software.html.
  • Item
    Thumbnail Image
    Spike-in validation of an Illumina-specific variance-stabilizing transformation.
    Dunning, MJ ; Ritchie, ME ; Barbosa-Morais, NL ; Tavaré, S ; Lynch, AG (Springer Science and Business Media LLC, 2008-06-04)
    BACKGROUND: Variance-stabilizing techniques have been used for some time in the analysis of gene expression microarray data. A new adaptation, the variance-stabilizing transformation (VST), has recently been developed to take advantage of the unique features of Illumina BeadArrays. VST has been shown to perform well in comparison with the widely-used approach of taking a log2 transformation, but has not been validated on a spike-in experiment. We apply VST to the data from a recently published spike-in experiment and compare it both to a regular log2 analysis and a recently recommended analysis that can be applied if all raw data are available. FINDINGS: VST provides more power to detect differentially expressed genes than a log2 transformation. However, the gain in power is roughly the same as utilizing the raw data from an experiment and weighting observations accordingly. VST is still advantageous when large changes in expression are anticipated, while a weighted log2 approach performs better for smaller changes. CONCLUSION: VST can be recommended for summarized Illumina data regardless of which Illumina pre-processing options have been used. However, using the raw data is still encouraged whenever possible.
  • Item
    Thumbnail Image
    Swift: primary data analysis for the Illumina Solexa sequencing platform
    Whiteford, N ; Skelly, T ; Curtis, C ; Ritchie, ME ; Loehr, A ; Zaranek, AW ; Abnizova, I ; Brown, C (OXFORD UNIV PRESS, 2009-09-01)
    MOTIVATION: Primary data analysis methods are of critical importance in second generation DNA sequencing. Improved methods have the potential to increase yield and reduce the error rates. Openly documented analysis tools enable the user to understand the primary data, this is important for the optimization and validity of their scientific work. RESULTS: In this article, we describe Swift, a new tool for performing primary data analysis on the Illumina Solexa Sequencing Platform. Swift is the first tool, outside of the vendors own software, which completes the full analysis process, from raw images through to base calls. As such it provides an alternative to, and independent validation of, the vendor supplied tool. Our results show that Swift is able to increase yield by 13.8%, at comparable error rate.
  • Item
    Thumbnail Image
    Statistical issues in the analysis of Illumina data
    Dunning, MJ ; Barbosa-Morais, NL ; Lynch, AG ; Tavare, S ; Ritchie, ME (BMC, 2008-02-06)
    BACKGROUND: Illumina bead-based arrays are becoming increasingly popular due to their high degree of replication and reported high data quality. However, little attention has been paid to the pre-processing of Illumina data. In this paper, we present our experience of analysing the raw data from an Illumina spike-in experiment and offer guidelines for those wishing to analyse expression data or develop new methodologies for this technology. RESULTS: We find that the local background estimated by Illumina is consistently low, and subtracting this background is beneficial for detecting differential expression (DE). Illumina's summary method performs well at removing outliers, producing estimates which are less biased and are less variable than other robust summary methods. However, quality assessment on summarised data may miss spatial artefacts present in the raw data. Also, we find that the background normalisation method used in Illumina's proprietary software (BeadStudio) can cause problems with a standard DE analysis. We demonstrate that variances calculated from the raw data can be used as inverse weights in the DE analysis to improve power. Finally, variability in both expression levels and DE statistics can be attributed to differences in probe composition. These differences are not accounted for by current analysis methods and require further investigation. CONCLUSION: Analysing Illumina expression data using BeadStudio is reasonable because of the conservative estimates of summary values produced by the software. Improvements can however be made by not using background normalisation. Access to the raw data allows for a more detailed quality assessment and flexible analyses. In the case of a gene expression study, data can be analysed on an appropriate scale using established tools. Similar improvements can be expected for other Illumina assays.