School of Mathematics and Statistics - Research Publications

Search Results

Now showing 1 - 10 of 111
  • Item
    Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data
    Dai, MH ; Wang, PL ; Boyd, AD ; Kostov, G ; Athey, B ; Jones, EG ; Bunney, WE ; Myers, RM ; Speed, TP ; Akil, H ; Watson, SJ ; Meng, F (OXFORD UNIV PRESS, 2005)
    Genome-wide expression profiling is a powerful tool for implicating novel gene ensembles in cellular mechanisms of health and disease. The most popular platform for genome-wide expression profiling is the Affymetrix GeneChip. However, its selection of probes relied on earlier genome and transcriptome annotation which is significantly different from current knowledge. The resultant informatics problems have a profound impact on analysis and interpretation of the data. Here, we address these critical issues and offer a solution. We identified several classes of problems at the individual probe level in the existing annotation, under the assumption that current genome and transcriptome databases are more accurate than those used for GeneChip design. We then reorganized probes on more than a dozen popular GeneChips into gene-, transcript- and exon-specific probe sets in light of up-to-date genome, cDNA/EST clustering and single nucleotide polymorphism information. Comparing analysis results between the original and the redefined probe sets reveals approximately 30-50% discrepancy in the genes previously identified as differentially expressed, regardless of analysis method. Our results demonstrate that the original Affymetrix probe set definitions are inaccurate, and many conclusions derived from past GeneChip analyses may be significantly flawed. It will be beneficial to re-analyze existing GeneChip data with updated probe set definitions.
  • Item
    The impact of low-cost, genome-wide resequencing on association studies.
    Balding, D (Springer Science and Business Media LLC, 2005-06)
  • Item
    A chain multinomial model for estimating the real-time fatality rate of a disease, with an application to severe acute respiratory syndrome
    Yip, PSF ; Lau, EHY ; Lam, KF ; Huggins, RM (OXFORD UNIV PRESS INC, 2005-04-01)
    It is well known that statistics using cumulative data are insensitive to changes. World Health Organization (WHO) estimates of fatality rates are of the above type, which may not be able to reflect the latest changes in fatality due to treatment or government policy in a timely fashion. Here, the authors propose an estimate of a real-time fatality rate based on a chain multinomial model with a kernel function. It is more accurate than the WHO estimate in describing fatality, especially earlier in the course of an epidemic. The estimator provides useful information for public health policy makers for understanding the severity of the disease or evaluating the effects of treatments or policies within a shorter time period, which is critical in disease control during an outbreak. Simulation results showed that the performance of the proposed estimator is superior to that of the WHO estimator in terms of its sensitivity to changes and its timeliness in reflecting the severity of the disease.
  • Item
    Statistical analysis of an RNA titration series evaluates microarray precision and sensitivity on a whole-array basis
    Holloway, AJ ; Oshlack, A ; Diyagama, DS ; Bowtell, DDL ; Smyth, GK (BMC, 2006-11-22)
    BACKGROUND: Concerns are often raised about the accuracy of microarray technologies and the degree of cross-platform agreement, but there are yet no methods which can unambiguously evaluate precision and sensitivity for these technologies on a whole-array basis. RESULTS: A methodology is described for evaluating the precision and sensitivity of whole-genome gene expression technologies such as microarrays. The method consists of an easy-to-construct titration series of RNA samples and an associated statistical analysis using non-linear regression. The method evaluates the precision and responsiveness of each microarray platform on a whole-array basis, i.e., using all the probes, without the need to match probes across platforms. An experiment is conducted to assess and compare four widely used microarray platforms. All four platforms are shown to have satisfactory precision but the commercial platforms are superior for resolving differential expression for genes at lower expression levels. The effective precision of the two-color platforms is improved by allowing for probe-specific dye-effects in the statistical model. The methodology is used to compare three data extraction algorithms for the Affymetrix platforms, demonstrating poor performance for the commonly used proprietary algorithm relative to the other algorithms. For probes which can be matched across platforms, the cross-platform variability is decomposed into within-platform and between-platform components, showing that platform disagreement is almost entirely systematic rather than due to measurement variability. CONCLUSION: The results demonstrate good precision and sensitivity for all the platforms, but highlight the need for improved probe annotation. They quantify the extent to which cross-platform measures can be expected to be less accurate than within-platform comparisons for predicting disease progression or outcome.
  • Item
    Empirical array quality weights in the analysis of microarray data
    Ritchie, ME ; Diyagama, D ; Neilson, J ; van Laar, R ; Dobrovic, A ; Holloway, A ; Smyth, GK (BMC, 2006-05-19)
    BACKGROUND: Assessment of array quality is an essential step in the analysis of data from microarray experiments. Once detected, less reliable arrays are typically excluded or "filtered" from further analysis to avoid misleading results. RESULTS: In this article, a graduated approach to array quality is considered based on empirical reproducibility of the gene expression measures from replicate arrays. Weights are assigned to each microarray by fitting a heteroscedastic linear model with shared array variance terms. A novel gene-by-gene update algorithm is used to efficiently estimate the array variances. The inverse variances are used as weights in the linear model analysis to identify differentially expressed genes. The method successfully assigns lower weights to less reproducible arrays from different experiments. Down-weighting the observations from suspect arrays increases the power to detect differential expression. In smaller experiments, this approach outperforms the usual method of filtering the data. The method is available in the limma software package which is implemented in the R software environment. CONCLUSION: This method complements existing normalisation and spot quality procedures, and allows poorer quality arrays, which would otherwise be discarded, to be included in an analysis. It is applicable to microarray data from experiments with some level of replication.
  • Item
    Lineage-specific expansion of proteins exported to erythrocytes in malaria parasites
    Sargeant, TJ ; Marti, M ; Caler, E ; Carlton, JM ; Simpson, K ; Speed, TP ; Cowman, AF (BMC, 2006)
    BACKGROUND: The apicomplexan parasite Plasmodium falciparum causes the most severe form of malaria in humans. After invasion into erythrocytes, asexual parasite stages drastically alter their host cell and export remodeling and virulence proteins. Previously, we have reported identification and functional analysis of a short motif necessary for export of proteins out of the parasite and into the red blood cell. RESULTS: We have developed software for the prediction of exported proteins in the genus Plasmodium, and identified exported proteins conserved between malaria parasites infecting rodents and the two major causes of human malaria, P. falciparum and P. vivax. This conserved 'exportome' is confined to a few subtelomeric chromosomal regions in P. falciparum and the synteny of these and surrounding regions is conserved in P. vivax. We have identified a novel gene family PHIST (for Plasmodium helical interspersed subtelomeric family) that shares a unique domain with 72 paralogs in P. falciparum and 39 in P. vivax; however, there is only one member in each of the three species studied from the P. berghei lineage. CONCLUSION: These data suggest radiation of genes encoding remodeling and virulence factors from a small number of loci in a common Plasmodium ancestor, and imply a closer phylogenetic relationship between the P. vivax and P. falciparum lineages than previously believed. The presence of a conserved 'exportome' in the genus Plasmodium has important implications for our understanding of both common mechanisms and species-specific differences in host-parasite interactions, and may be crucial in developing novel antimalarial drugs to this infectious disease.
  • Item
    Rethinking the "diseases of affluence" paradigm: Global patterns of nutritional risks in relation to economic development
    Ezzati, M ; Vander Hoorn, S ; Lawes, CMM ; Leach, R ; James, WPT ; Lopez, AD ; Rodgers, A ; Murray, CJL ; Novotny, T (PUBLIC LIBRARY SCIENCE, 2005-05)
    BACKGROUND: Cardiovascular diseases and their nutritional risk factors--including overweight and obesity, elevated blood pressure, and cholesterol--are among the leading causes of global mortality and morbidity, and have been predicted to rise with economic development. METHODS AND FINDINGS: We examined age-standardized mean population levels of body mass index (BMI), systolic blood pressure, and total cholesterol in relation to national income, food share of household expenditure, and urbanization in a cross-country analysis. Data were from a total of over 100 countries and were obtained from systematic reviews of published literature, and from national and international health agencies. BMI and cholesterol increased rapidly in relation to national income, then flattened, and eventually declined. BMI increased most rapidly until an income of about ID 5,000 (international dollars) and peaked at about ID 12,500 for females and ID 17,000 for males. Cholesterol's point of inflection and peak were at higher income levels than those of BMI (about ID 8,000 and ID 18,000, respectively). There was an inverse relationship between BMI/cholesterol and the food share of household expenditure, and a positive relationship with proportion of population in urban areas. Mean population blood pressure was not correlated or only weakly correlated with the economic factors considered, or with cholesterol and BMI. CONCLUSIONS: When considered together with evidence on shifts in income-risk relationships within developed countries, the results indicate that cardiovascular disease risks are expected to systematically shift to low-income and middle-income countries and, together with the persistent burden of infectious diseases, further increase global health inequalities. Preventing obesity should be a priority from early stages of economic development, accompanied by population-level and personal interventions for blood pressure and cholesterol.
  • Item
    Evolution and comparative analysis of the MHC Class III inflammatory region
    Deakin, JE ; Papenfuss, AT ; Belov, K ; Cross, JGR ; Coggill, P ; Palmer, S ; Sims, S ; Speed, TP ; Beck, S ; Graves, JAM (BMC, 2006-11-02)
    BACKGROUND: The Major Histocompatibility Complex (MHC) is essential for immune function. Historically, it has been subdivided into three regions (Class I, II, and III), but a cluster of functionally related genes within the Class III region has also been referred to as the Class IV region or "inflammatory region". This group of genes is involved in the inflammatory response, and includes members of the tumour necrosis family. Here we report the sequencing, annotation and comparative analysis of a tammar wallaby BAC containing the inflammatory region. We also discuss the extent of sequence conservation across the entire region and identify elements conserved in evolution. RESULTS: Fourteen Class III genes from the tammar wallaby inflammatory region were characterised and compared to their orthologues in other vertebrates. The organisation and sequence of genes in the inflammatory region of both the wallaby and South American opossum are highly conserved compared to known genes from eutherian ("placental") mammals. Some minor differences separate the two marsupial species. Eight genes within the inflammatory region have remained tightly clustered for at least 360 million years, predating the divergence of the amphibian lineage. Analysis of sequence conservation identified 354 elements that are conserved. These range in size from 7 to 431 bases and cover 15.6% of the inflammatory region, representing approximately a 4-fold increase compared to the average for vertebrate genomes. About 5.5% of this conserved sequence is marsupial-specific, including three cases of marsupial-specific repeats. Highly Conserved Elements were also characterised. CONCLUSION: Using comparative analysis, we show that a cluster of MHC genes involved in inflammation, including TNF, LTA (or its putative teleost homolog TNF-N), APOM, and BAT3 have remained together for over 450 million years, predating the divergence of mammals from fish. The observed enrichment in conserved sequences within the inflammatory region suggests conservation at the transcriptional regulatory level, in addition to the functional level.
  • Item
    Proximal genomic localization of STAT1 binding and regulated transcriptional activity
    Wormald, S ; Hilton, DJ ; Smyth, GK ; Speed, TP (BMC, 2006-10-11)
    BACKGROUND: Signal transducer and activator of transcription (STAT) proteins are key regulators of gene expression in response to the interferon (IFN) family of anti-viral and anti-microbial cytokines. We have examined the genomic relationship between STAT1 binding and regulated transcription using multiple tiling microarray and chromatin immunoprecipitation microarray (ChIP-chip) experiments from public repositories. RESULTS: In response to IFN-gamma, STAT1 bound proximally to regions of the genome that exhibit regulated transcriptional activity. This finding was consistent between different tiling microarray platforms, and between different measures of transcriptional activity, including differential binding of RNA polymerase II, and differential mRNA transcription. Re-analysis of tiling microarray data from a recent study of IFN-gamma-induced STAT1 ChIP-chip and mRNA expression revealed that STAT1 binding is tightly associated with localized mRNA transcription in response to IFN-gamma. Close relationships were also apparent between STAT1 binding, STAT2 binding, and mRNA transcription in response to IFN-alpha. Furthermore, we found that sites of STAT1 binding within the Encyclopedia of DNA Elements (ENCODE) region are precisely correlated with sites of either enhanced or diminished binding by the RNA polymerase II complex. CONCLUSION: Together, our results indicate that STAT1 binds proximally to regions of the genome that exhibit regulated transcriptional activity. This finding establishes a generalized basis for the positioning of STAT1 binding sites within the genome, and supports a role for STAT1 in the direct recruitment of the RNA polymerase II complex to the promoters of IFN-gamma-responsive genes.
  • Item
    Rooting a phylogenetic tree with nonreversible substitution models
    Yap, VB ; Speed, T (BMC, 2005-01-04)
    BACKGROUND: We compared two methods of rooting a phylogenetic tree: the stationary and the nonstationary substitution processes. These methods do not require an outgroup. METHODS: Given a multiple alignment and an unrooted tree, the maximum likelihood estimates of branch lengths and substitution parameters for each associated rooted tree are found; rooted trees are compared using their likelihood values. Site variation in substitution rates is handled by assigning sites into several classes before the analysis. RESULTS: In three test datasets where the trees are small and the roots are assumed known, the nonstationary process gets the correct estimate significantly more often, and fits data much better, than the stationary process. Both processes give biologically plausible root placements in a set of nine primate mitochondrial DNA sequences. CONCLUSIONS: The nonstationary process is simple to use and is much better than the stationary process at inferring the root. It could be useful for situations where an outgroup is unavailable.