School of Mathematics and Statistics - Research Publications

  • Item
    The EADGENE microarray data analysis workshop (Open Access publication)
    De Koning, D-J ; Jaffrezic, F ; Lund, MS ; Watson, M ; Channing, C ; Hulsegge, I ; Pool, MH ; Buitenhuis, B ; Hedegaard, J ; Hornshoj, H ; Jiang, L ; Sorensen, P ; Marot, G ; Delmas, C ; Le Cao, K-A ; Cristobal, MS ; Baron, MD ; Malinverni, R ; Stella, A ; Brunner, RM ; Seyfert, H-M ; Jensen, K ; Mouzaki, D ; Waddington, D ; Jimenez-Marin, A ; Perez-Alegre, M ; Perez-Reinado, E ; Closset, R ; Detilleux, JC ; Dovc, P ; Lavric, M ; Nie, H ; Janss, L (EDP SCIENCES S A, 2007)
    Microarray analyses have become an important tool in animal genomics. While their use is becoming widespread, there is still a lot of ongoing research regarding the analysis of microarray data. In the context of a European Network of Excellence, 31 researchers representing 14 research groups from 10 countries performed and discussed the statistical analyses of real and simulated 2-colour microarray data that were distributed among participants. The real data consisted of 48 microarrays from a disease challenge experiment in dairy cattle, while the simulated data consisted of 10 microarrays from a direct comparison of two treatments (dye-balanced). While there was broader agreement with regard to methods of microarray normalisation and significance testing, there were major differences with regard to quality control. The quality control approaches varied from none, through using statistical weights, to omitting a large number of spots or omitting entire slides. Surprisingly, these very different approaches gave quite similar results when applied to the simulated data, although not all participating groups analysed both real and simulated data. The workshop was very successful in facilitating interaction between scientists with diverse backgrounds but a common interest in microarray analyses.
  • Item
    Analysis of a simulated microarray dataset: Comparison of methods for data normalisation and detection of differential expression (Open Access publication)
    Watson, M ; Alegre, MP ; Baron, MD ; Delmas, C ; Dovc, P ; Duval, M ; Foulley, JL ; Pavon, JJG ; Hulsegge, I ; Jaffrezic, F ; Marin, AJ ; Lavric, M ; Le Cao, KA ; Marot, G ; Mouzaki, D ; Pool, MH ; Granie, CR ; Cristobal, MS ; Klopp, GT ; Waddington, D ; De Koning, DJ (EDP SCIENCES S A, 2007)
    Microarrays allow researchers to measure the expression of thousands of genes in a single experiment. Before statistical comparisons can be made, the data must be assessed for quality and normalisation procedures must be applied, of which many have been proposed. Methods of comparing the normalised data are also abundant, and no clear consensus has yet been reached. The purpose of this paper was to compare those methods used by the EADGENE network on a very noisy simulated data set. With the a priori knowledge of which genes are differentially expressed, it is possible to compare the success of each approach quantitatively. Use of an intensity-dependent normalisation procedure was common, as was correction for multiple testing. Most variety in performance resulted from differing approaches to data quality and the use of different statistical tests. Very few of the methods used any kind of background correction. A number of approaches achieved a success rate of 95% or above, with relatively small numbers of false positives and negatives. Applying stringent spot selection criteria and elimination of data did not improve the false positive rate and greatly increased the false negative rate. However, most approaches performed well, and it is encouraging that widely available techniques can achieve such good results on a very noisy data set.
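The quantitative comparison described in this abstract relies on knowing in advance which simulated genes are differentially expressed. A minimal sketch of that kind of scoring follows, assuming each method's output is simply a list of gene identifiers. The function name, the gene labels, and the example lists are hypothetical and are not taken from the paper.

```python
def detection_errors(called, truly_de, all_genes):
    """Score one method's gene list against the known simulation truth.

    called    -- genes the method reported as differentially expressed
    truly_de  -- genes simulated as differentially expressed
    all_genes -- every gene on the array
    """
    called, truly_de, all_genes = set(called), set(truly_de), set(all_genes)
    tp = len(called & truly_de)              # correctly detected
    fp = len(called - truly_de)              # reported but not truly changed
    fn = len(truly_de - called)              # truly changed but missed
    tn = len(all_genes - called - truly_de)  # correctly left alone
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }


# Hypothetical toy example: 10 genes, 3 of them truly changed.
genes = [f"g{i}" for i in range(1, 11)]
truth = ["g1", "g2", "g3"]
method_a = ["g1", "g2", "g4"]
scores = detection_errors(method_a, truth, genes)
```

Scoring a lenient list and a heavily filtered list in the same way shows the trade-off the abstract reports: removing many spots can leave the false positive count unchanged while the false negative count grows.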
  • Item
    Analysis of the real EADGENE data set: Multivariate approaches and post analysis (Open Access publication)
    Sorensen, P ; Bonnet, A ; Buitenhuis, B ; Closset, R ; Dejean, S ; Delmas, C ; Duval, M ; Glass, L ; Hedegaard, J ; Hornshoj, H ; Hulsegge, I ; Jaffrezic, F ; Jensen, K ; Jiang, L ; De Koning, D-J ; Le Cao, K-A ; Nie, H ; Petzl, W ; Pool, MH ; Robert-Granie, C ; Cristobal, MS ; Lund, MS ; Van Schothorst, EM ; Schuberth, H-J ; Seyfert, H-M ; Tosser-Klopp, G ; Waddington, D ; Watson, M ; Yang, W ; Zerbe, H (BMC, 2007)
    The aim of this paper was to describe, and when possible compare, the multivariate methods used by the participants in the EADGENE WP1.4 workshop. The first approach was for class discovery and class prediction using evidence from the data at hand. Several teams used hierarchical clustering (HC) or principal component analysis (PCA) to identify groups of differentially expressed genes with a similar expression pattern over time points and infective agent (E. coli or S. aureus). The main result from these analyses was that HC and PCA were able to separate tissue samples taken at 24 h following E. coli infection from the other samples. The second approach identified groups of differentially co-expressed genes, by identifying clusters of genes highly correlated when animals were infected with E. coli but not correlated more than expected by chance when the infective pathogen was S. aureus. The third approach looked at differential expression of predefined gene sets. Gene sets were defined based on information retrieved from biological databases such as Gene Ontology. Based on these annotation sources the teams used either the GlobalTest or the Fisher exact test to identify differentially expressed gene sets. The main result from these analyses was that gene sets involved in immune defence responses were differentially expressed.
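The gene set step mentioned in this abstract can be illustrated with a one-sided Fisher exact test for over-representation. The sketch below assumes the usual 2x2 setup (in the gene set or not, differentially expressed or not) and computes the hypergeometric tail directly. The abstract does not say how the teams built their tables, so the function name, argument names, and example counts are assumptions.

```python
from math import comb


def fisher_enrichment_p(n_universe, n_set, n_de, n_overlap):
    """One-sided Fisher exact p-value for gene set over-representation.

    n_universe -- genes on the array with annotation
    n_set      -- genes belonging to the gene set
    n_de       -- genes called differentially expressed
    n_overlap  -- differentially expressed genes that are in the set

    Returns P(overlap >= n_overlap) under the hypergeometric null.
    """
    max_overlap = min(n_set, n_de)
    total = comb(n_universe, n_de)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_de - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 10 annotated genes, a set of 3, 3 genes called,
# and all 3 called genes fall inside the set.
p_all_in_set = fisher_enrichment_p(10, 3, 3, 3)
p_no_overlap = fisher_enrichment_p(10, 3, 3, 0)
```

When many gene sets are tested, the resulting p-values would normally be corrected for multiple testing before any set is reported.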
  • Item
    Analysis of the real EADGENE data set: Comparison of methods and guidelines for data normalisation and selection of differentially expressed genes (Open Access publication)
    Jaffrezic, F ; De Koning, D-J ; Boettcher, PJ ; Bonnet, A ; Buitenhuis, B ; Closset, R ; Dejean, S ; Delmas, C ; Detilleux, JC ; Dovc, P ; Duval, M ; Foulley, J-L ; Hedegaard, J ; Hornshoj, H ; Hulsegge, I ; Janss, L ; Jensen, K ; Jiang, L ; Lavric, M ; Le Cao, K-A ; Lund, MS ; Malinverni, R ; Marot, G ; Nie, H ; Petzl, W ; Pool, MH ; Granie, CR ; Cristobal, MS ; Van Schothorst, EM ; Schuberth, H-J ; Sorensen, P ; Stella, A ; Tosser-Klopp, G ; Waddington, D ; Watson, M ; Yang, W ; Zerbe, H ; Seyfert, H-M (BMC, 2007)
    A large variety of methods has been proposed in the literature for microarray data analysis. The aim of this paper was to present techniques used by the EADGENE (European Animal Disease Genomics Network of Excellence) WP1.4 participants for data quality control, normalisation and statistical methods for the detection of differentially expressed genes in order to provide some more general data analysis guidelines. All the workshop participants were given a real data set obtained in an EADGENE-funded microarray study looking at the gene expression changes following artificial infection with two different mastitis-causing bacteria: Escherichia coli and Staphylococcus aureus. It was reassuring to see that most of the teams found the same main biological results. In fact, most of the differentially expressed genes were found for infection by E. coli between uninfected and 24 h challenged udder quarters. Very little transcriptional variation was observed for the bacterium S. aureus. Lists of differentially expressed genes found by the different research teams were, however, quite dependent on the method used, especially concerning the data quality control step. These analyses also emphasised a biological problem of cross-talk between infected and uninfected quarters which will have to be dealt with for further microarray studies.
  • Item
    Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data
    Dai, MH ; Wang, PL ; Boyd, AD ; Kostov, G ; Athey, B ; Jones, EG ; Bunney, WE ; Myers, RM ; Speed, TP ; Akil, H ; Watson, SJ ; Meng, F (OXFORD UNIV PRESS, 2005)
    Genome-wide expression profiling is a powerful tool for implicating novel gene ensembles in cellular mechanisms of health and disease. The most popular platform for genome-wide expression profiling is the Affymetrix GeneChip. However, its selection of probes relied on earlier genome and transcriptome annotation which is significantly different from current knowledge. The resultant informatics problems have a profound impact on analysis and interpretation of the data. Here, we address these critical issues and offer a solution. We identified several classes of problems at the individual probe level in the existing annotation, under the assumption that current genome and transcriptome databases are more accurate than those used for GeneChip design. We then reorganized probes on more than a dozen popular GeneChips into gene-, transcript- and exon-specific probe sets in light of up-to-date genome, cDNA/EST clustering and single nucleotide polymorphism information. Comparing analysis results between the original and the redefined probe sets reveals approximately 30-50% discrepancy in the genes previously identified as differentially expressed, regardless of analysis method. Our results demonstrate that the original Affymetrix probe set definitions are inaccurate, and many conclusions derived from past GeneChip analyses may be significantly flawed. It will be beneficial to re-analyze existing GeneChip data with updated probe set definitions.
  • Item
    A chain multinomial model for estimating the real-time fatality rate of a disease, with an application to severe acute respiratory syndrome
    Yip, PSF ; Lau, EHY ; Lam, KF ; Huggins, RM (OXFORD UNIV PRESS INC, 2005-04-01)
    It is well known that statistics using cumulative data are insensitive to changes. World Health Organization (WHO) estimates of fatality rates are of the above type, which may not be able to reflect the latest changes in fatality due to treatment or government policy in a timely fashion. Here, the authors propose an estimate of a real-time fatality rate based on a chain multinomial model with a kernel function. It is more accurate than the WHO estimate in describing fatality, especially earlier in the course of an epidemic. The estimator provides useful information for public health policy makers for understanding the severity of the disease or evaluating the effects of treatments or policies within a shorter time period, which is critical in disease control during an outbreak. Simulation results showed that the performance of the proposed estimator is superior to that of the WHO estimator in terms of its sensitivity to changes and its timeliness in reflecting the severity of the disease.
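The opening claim of this abstract, that cumulative statistics are insensitive to change, can be shown with a small numerical sketch. The code below is not the authors' chain multinomial estimator. It only contrasts a cumulative ratio of deaths to resolved cases with a ratio that weights recent days more heavily through a one-sided triangular kernel. The function names, the kernel, the bandwidth, and the daily counts are all assumptions made for illustration.

```python
def cumulative_rate(deaths, recoveries, t):
    """Deaths / (deaths + recoveries), using all days up to and including t."""
    d = sum(deaths[: t + 1])
    r = sum(recoveries[: t + 1])
    return d / (d + r) if d + r else float("nan")


def kernel_rate(deaths, recoveries, t, bandwidth=3):
    """Same ratio, but with linearly decaying weights on the most recent days.

    Day t gets weight 1, day t-1 gets 1 - 1/bandwidth, and so on;
    days older than `bandwidth` are ignored.
    """
    d = r = 0.0
    for lag in range(min(bandwidth, t + 1)):
        weight = 1.0 - lag / bandwidth
        d += weight * deaths[t - lag]
        r += weight * recoveries[t - lag]
    return d / (d + r) if d + r else float("nan")


# Hypothetical outbreak: daily deaths fall from 5 to 1 after day 10,
# while daily recoveries stay at 5.
daily_deaths = [5] * 10 + [1] * 10
daily_recoveries = [5] * 20

late_cumulative = cumulative_rate(daily_deaths, daily_recoveries, 19)
late_kernel = kernel_rate(daily_deaths, daily_recoveries, 19)
```

On the last day the cumulative ratio is still pulled up by the early period, while the kernel-weighted ratio reflects only the recent, lower fatality. This is the kind of timeliness the abstract attributes to a real-time estimate.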
  • Item
    Polycomb repressive complex 2 (PRC2) restricts hematopoietic stem cell activity
    Majewski, IJ ; Blewitt, ME ; de Graaf, CA ; McManus, EJ ; Bahlo, M ; Hilton, AA ; Hyland, CD ; Smyth, GK ; Corbin, JE ; Metcalf, D ; Alexander, WS ; Hilton, DJ ; Goodell, MA (PUBLIC LIBRARY SCIENCE, 2008-04)
    Polycomb group proteins are transcriptional repressors that play a central role in the establishment and maintenance of gene expression patterns during development. Using mice with an N-ethyl-N-nitrosourea (ENU)-induced mutation in Suppressor of Zeste 12 (Suz12), a core component of Polycomb Repressive Complex 2 (PRC2), we show here that loss of Suz12 function enhances hematopoietic stem cell (HSC) activity. In addition to these effects on a wild-type genetic background, mutations in Suz12 are sufficient to ameliorate the stem cell defect and thrombocytopenia present in mice that lack the thrombopoietin receptor (c-Mpl). To investigate the molecular targets of the PRC2 complex in the HSC compartment, we examined changes in global patterns of gene expression in cells deficient in Suz12. We identified a distinct set of genes that are regulated by Suz12 in hematopoietic cells, including eight genes that appear to be highly responsive to PRC2 function within this compartment. These data suggest that PRC2 is required to maintain a specific gene expression pattern in hematopoiesis that is indispensable to normal stem cell function.
  • Item
    A Mouse Model of Harlequin Ichthyosis Delineates a Key Role for Abca12 in Lipid Homeostasis
    Smyth, I ; Hacking, DF ; Hilton, AA ; Mukhamedova, N ; Meikle, PJ ; Ellis, S ; Slattery, K ; Collinge, JE ; de Graaf, CA ; Bahlo, M ; Sviridov, D ; Kile, BT ; Hilton, DJ ; Beier, DR (PUBLIC LIBRARY SCIENCE, 2008-09)
    Harlequin Ichthyosis (HI) is a severe and often lethal hyperkeratotic skin disease caused by mutations in the ABCA12 transport protein. In keratinocytes, ABCA12 is thought to regulate the transfer of lipids into small intracellular trafficking vesicles known as lamellar bodies. However, the nature and scope of this regulation remains unclear. As part of an original recessive mouse ENU mutagenesis screen, we have identified and characterised an animal model of HI and showed that it displays many of the hallmarks of the disease including hyperkeratosis, loss of barrier function, and defects in lipid homeostasis. We have used this model to follow disease progression in utero and present evidence that loss of Abca12 function leads to premature differentiation of basal keratinocytes. A comprehensive analysis of lipid levels in mutant epidermis demonstrated profound defects in lipid homeostasis, illustrating for the first time the extent to which Abca12 plays a pivotal role in maintaining lipid balance in the skin. To further investigate the scope of Abca12's activity, we have utilised cells from the mutant mouse to ascribe direct transport functions to the protein and, in doing so, we demonstrate activities independent of its role in lamellar body function. These cells have severely impaired lipid efflux leading to intracellular accumulation of neutral lipids. Furthermore, we identify Abca12 as a mediator of Abca1-regulated cellular cholesterol efflux, a finding that may have significant implications for other diseases of lipid metabolism and homeostasis, including atherosclerosis.
  • Item
    Statistical analysis of an RNA titration series evaluates microarray precision and sensitivity on a whole-array basis
    Holloway, AJ ; Oshlack, A ; Diyagama, DS ; Bowtell, DDL ; Smyth, GK (BMC, 2006-11-22)
    BACKGROUND: Concerns are often raised about the accuracy of microarray technologies and the degree of cross-platform agreement, but there are as yet no methods which can unambiguously evaluate precision and sensitivity for these technologies on a whole-array basis. RESULTS: A methodology is described for evaluating the precision and sensitivity of whole-genome gene expression technologies such as microarrays. The method consists of an easy-to-construct titration series of RNA samples and an associated statistical analysis using non-linear regression. The method evaluates the precision and responsiveness of each microarray platform on a whole-array basis, i.e., using all the probes, without the need to match probes across platforms. An experiment is conducted to assess and compare four widely used microarray platforms. All four platforms are shown to have satisfactory precision but the commercial platforms are superior for resolving differential expression for genes at lower expression levels. The effective precision of the two-color platforms is improved by allowing for probe-specific dye-effects in the statistical model. The methodology is used to compare three data extraction algorithms for the Affymetrix platforms, demonstrating poor performance for the commonly used proprietary algorithm relative to the other algorithms. For probes which can be matched across platforms, the cross-platform variability is decomposed into within-platform and between-platform components, showing that platform disagreement is almost entirely systematic rather than due to measurement variability. CONCLUSION: The results demonstrate good precision and sensitivity for all the platforms, but highlight the need for improved probe annotation. They quantify the extent to which cross-platform measures can be expected to be less accurate than within-platform comparisons for predicting disease progression or outcome.
  • Item
    Empirical array quality weights in the analysis of microarray data
    Ritchie, ME ; Diyagama, D ; Neilson, J ; van Laar, R ; Dobrovic, A ; Holloway, A ; Smyth, GK (BMC, 2006-05-19)
    BACKGROUND: Assessment of array quality is an essential step in the analysis of data from microarray experiments. Once detected, less reliable arrays are typically excluded or "filtered" from further analysis to avoid misleading results. RESULTS: In this article, a graduated approach to array quality is considered based on empirical reproducibility of the gene expression measures from replicate arrays. Weights are assigned to each microarray by fitting a heteroscedastic linear model with shared array variance terms. A novel gene-by-gene update algorithm is used to efficiently estimate the array variances. The inverse variances are used as weights in the linear model analysis to identify differentially expressed genes. The method successfully assigns lower weights to less reproducible arrays from different experiments. Down-weighting the observations from suspect arrays increases the power to detect differential expression. In smaller experiments, this approach outperforms the usual method of filtering the data. The method is available in the limma software package which is implemented in the R software environment. CONCLUSION: This method complements existing normalisation and spot quality procedures, and allows poorer quality arrays, which would otherwise be discarded, to be included in an analysis. It is applicable to microarray data from experiments with some level of replication.
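The idea of using inverse array variances as weights can be sketched for the simplest possible linear model, a single mean log-ratio estimated from replicate arrays. The code below is not the gene-by-gene update algorithm from the paper and does not reproduce the limma implementation. It assumes the array variances are already known, and the function name and example numbers are hypothetical.

```python
def weighted_log_ratio(values, array_variances):
    """Weighted least squares estimate of a mean, one value per array.

    Each array is weighted by the inverse of its variance, so noisy
    arrays contribute less instead of being discarded outright.
    Returns the estimate and its standard error.
    """
    weights = [1.0 / v for v in array_variances]
    total_weight = sum(weights)
    estimate = sum(w * x for w, x in zip(weights, values)) / total_weight
    std_error = (1.0 / total_weight) ** 0.5
    return estimate, std_error


# Hypothetical gene measured on three replicate arrays; the third array
# is assumed to have four times the variance of the other two.
log_ratios = [1.0, 1.2, 3.0]
variances = [1.0, 1.0, 4.0]

estimate, std_error = weighted_log_ratio(log_ratios, variances)
unweighted = sum(log_ratios) / len(log_ratios)
```

The suspect third array still contributes to the estimate, but with a quarter of the weight of the others. This is the graduated alternative to filtering that the abstract describes.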