School of Mathematics and Statistics - Research Publications
Investigating and Correcting Plasma DNA Sequencing Coverage Bias to Enhance Aneuploidy Discovery
(PUBLIC LIBRARY SCIENCE, 2014-01-29)
Pregnant women carry a mixture of cell-free DNA fragments from self and fetus (non-self) in their circulation. In recent years multiple independent studies have demonstrated the ability to detect fetal trisomies such as trisomy 21, the cause of Down syndrome, by Next-Generation Sequencing of maternal plasma. The current clinical tests based on this approach show very high sensitivity and specificity, although as yet they have not become the standard diagnostic test. Here we describe improvements to the analysis of the sequencing data by reducing GC bias and better handling of the genomic repeats. We show substantial improvements in the sensitivity of the standard trisomy 21 statistical tests, which we measure by artificially reducing read coverage. We also explore the bias stemming from the natural cleavage of plasma DNA by examining DNA motifs and position specific base distributions. We propose a model to correct this fragmentation bias and observe that incorporating this bias does not lead to any further improvements in the detection of fetal trisomy. The improved bias corrections that we demonstrate in this work can be readily adopted into existing fetal trisomy detection protocols and should also lead to improvements in sub-chromosomal copy number variation detection.
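The abstract does not spell out its correction or its test statistic, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a generic stratified-median GC normalisation of per-bin read counts and a plain z-score on the chromosome 21 read fraction against euploid reference samples. The function names, the number of GC strata, and all numbers are hypothetical, and this is not the authors' improved method.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def gc_correct(counts, gc_fractions, n_strata=10):
    """Rescale per-bin read counts so every GC stratum shares one median.

    Assumption: a generic stratified-median correction, used here only
    to illustrate what "reducing GC bias" can mean in practice.
    """
    strata = defaultdict(list)
    for i, gc in enumerate(gc_fractions):
        # Group genomic bins by GC content (0.0-1.0) into n_strata classes.
        strata[min(int(gc * n_strata), n_strata - 1)].append(i)

    overall = median(counts)
    corrected = [float(c) for c in counts]
    for members in strata.values():
        stratum_median = median(counts[i] for i in members)
        if stratum_median > 0:
            for i in members:
                corrected[i] = counts[i] * overall / stratum_median
    return corrected


def z_score(test_fraction, reference_fractions):
    """z-score of one sample's chr21 read fraction against euploid references."""
    return (test_fraction - mean(reference_fractions)) / stdev(reference_fractions)
```

In this sketch, bins with high GC content that attract twice the coverage are pulled back to the overall median before chromosome fractions are computed, which is the kind of bias removal that can tighten the reference distribution used by the z-score.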
Loss of Bak enhances lymphocytosis but does not ameliorate thrombocytopaenia in BCL-2 transgenic mice
(NATURE PUBLISHING GROUP, 2014-05-01)
Bax and Bak are critical effectors of apoptosis. Although both are widely expressed and usually functionally redundant, recent studies suggest that Bak has particular importance in certain cell types. Genetic and biochemical studies indicate that Bak activation is prevented primarily by Mcl-1 and Bcl-xL, whereas Bax is held in check by all pro-survival Bcl-2 homologues, including Bcl-2 itself. In this study, we have investigated whether loss of Bak or elevated Mcl-1 modulates haemopoietic abnormalities provoked by overexpression of Bcl-2. The Mcl-1 transgene had little impact, probably because the expression level was insufficient to effectively reduce Bak activation. However, loss of Bak enhanced lymphocytosis in vavP-BCL-2 transgenic mice and increased resistance of their thymocytes to some cytotoxic agents, implying that Bak-specific signals can be triggered in certain lymphoid populations. Nevertheless, lack of Bak had no significant impact on thymic abnormalities in vavP-BCL-2tg mice, which kinetic analysis suggested was due to accumulation of self-reactive thymocytes that resist deletion. Intriguingly, although Bak(-/-) mice have elevated platelet counts, Bak(-/-)vavP-BCL-2 mice, like vavP-BCL-2 littermates, were thrombocytopaenic. To clarify why, the vavP-BCL-2 platelet phenotype was scrutinised more closely. Platelet life span was found to be elevated in vavP-BCL-2 mice, which should have provoked thrombocytosis, as in Bak(-/-) mice. Analysis of bone marrow chimaeric mice suggested the low platelet phenotype was due principally to extrinsic factors. Following splenectomy, blood platelets remained lower in vavP-BCL-2 than wild-type mice. However, in Rag1(-/-) BCL-2tg mice, platelet levels were normal, implying that elevated lymphocytes are primarily responsible for BCL-2tg-induced thrombocytopaenia.
Cell-Type-Specific Transcriptional Profiles of the Dimorphic Pathogen Penicillium marneffei Reflect Distinct Reproductive, Morphological, and Environmental Demands
(GENETICS SOCIETY AMERICA, 2013-11-01)
Penicillium marneffei is an opportunistic human pathogen endemic to Southeast Asia. At 25° P. marneffei grows in a filamentous hyphal form and can undergo asexual development (conidiation) to produce spores (conidia), the infectious agent. At 37° P. marneffei grows in the pathogenic yeast cell form that replicates by fission. Switching between these growth forms, known as dimorphic switching, is dependent on temperature. To understand the process of dimorphic switching and the physiological capacity of the different cell types, two microarray-based profiling experiments covering approximately 42% of the genome were performed. The first experiment compared cells from the hyphal, yeast, and conidiation phases to identify "phase or cell-state-specific" gene expression. The second experiment examined gene expression during the dimorphic switch from one morphological state to another. The data identified a variety of differentially expressed genes that have been organized into metabolic clusters based on predicted function and expression patterns. In particular, C-14 sterol reductase-encoding gene ergM of the ergosterol biosynthesis pathway showed high-level expression throughout yeast morphogenesis compared to hyphal. Deletion of ergM resulted in severe growth defects with increased sensitivity to azole-type antifungal agents but not amphotericin B. The data defined gene classes based on spatio-temporal expression such as those expressed early in the dimorphic switch but not in the terminal cell types and those expressed late. Such classifications have been helpful in linking a given gene of interest to its expression pattern throughout the P. marneffei dimorphic life cycle and its likely role in pathogenicity.
Integration of Steady-State and Temporal Gene Expression Data for the Inference of Gene Regulatory Networks
(PUBLIC LIBRARY SCIENCE, 2013-08-14)
We develop a new regression algorithm, cMIKANA, for inference of gene regulatory networks from combinations of steady-state and time-series gene expression data. Using simulated gene expression datasets to assess the accuracy of reconstructing gene regulatory networks, we show that steady-state and time-series data sets can successfully be combined to identify gene regulatory interactions using the new algorithm. Inferring gene networks from combined data sets was found to be advantageous when using noisy measurements collected with either lower sampling rates or a limited number of experimental replicates. We illustrate our method by applying it to a microarray gene expression dataset from human umbilical vein endothelial cells (HUVECs) which combines time series data from treatment with growth factor TNF and steady state data from siRNA knockdown treatments. Our results suggest that the combination of steady-state and time-series datasets may provide better prediction of RNA-to-RNA interactions, and may also reveal biological features that cannot be identified from dynamic or steady state information alone. Finally, we consider the experimental design of genomics experiments for gene regulatory network inference and show that network inference can be improved by incorporating steady-state measurements with time-series data.
Separate-channel analysis of two-channel microarrays: recovering inter-spot information
BACKGROUND: Two-channel (or two-color) microarrays are cost-effective platforms for comparative analysis of gene expression. They are traditionally analysed in terms of the log-ratios (M-values) of the two channel intensities at each spot, but this analysis does not use all the information available in the separate channel observations. Mixed models have been proposed to analyse intensities from the two channels as separate observations, but such models can be complex to use and the gain in efficiency over the log-ratio analysis is difficult to quantify. Mixed models yield test statistics for which the null distributions can be specified only approximately, and some approaches do not borrow strength between genes. RESULTS: This article reformulates the mixed model to clarify the relationship with the traditional log-ratio analysis, to facilitate information borrowing between genes, and to obtain an exact distributional theory for the resulting test statistics. The mixed model is transformed to operate on the M-values and A-values (average log-expression for each spot) instead of on the log-expression values. The log-ratio analysis is shown to ignore information contained in the A-values. The relative efficiency of the log-ratio analysis is shown to depend on the size of the intra-spot correlation. A new separate channel analysis method is proposed that assumes a constant intra-spot correlation coefficient across all genes. This approach permits the mixed model to be transformed into an ordinary linear model, allowing the data analysis to use a well-understood empirical Bayes analysis pipeline for linear modeling of microarray data. This yields statistically powerful test statistics that have an exact distributional theory. The log-ratio, mixed model and common correlation methods are compared using three case studies. The results show that separate channel analyses that borrow strength between genes are more powerful than log-ratio analyses. The common correlation analysis is the most powerful of all. CONCLUSIONS: The common correlation method proposed in this article for separate-channel analysis of two-channel microarray data is no more difficult to apply in practice than the traditional log-ratio analysis. It provides an intuitive and powerful means to conduct analyses and make comparisons that might otherwise not be possible.
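The M-values and A-values mentioned in this abstract can be illustrated with a short sketch. It assumes base-2 logarithms and the common convention that M is the red-minus-green log-ratio; the abstract does not fix which channel is the numerator, and the function names are hypothetical.

```python
import math


def ma_values(red, green):
    """Return (M, A) for one spot from its two channel intensities.

    M is the log-ratio of the channels; A is the average log-expression.
    Assumption: log base 2, with the red channel as the numerator.
    """
    log_red = math.log2(red)
    log_green = math.log2(green)
    return log_red - log_green, (log_red + log_green) / 2


def log_channels_from_ma(m, a):
    """Invert the transformation: recover (log2 red, log2 green)."""
    return a + m / 2, a - m / 2
```

Because the transformation is invertible, the pair (M, A) carries the same information as the two separate log-intensities. An analysis that keeps only M therefore discards whatever the A-values contribute, which is the point the abstract makes about the log-ratio analysis.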
HDAC inhibitors induce tumor-cell-selective pro-apoptotic transcriptional responses
(NATURE PUBLISHING GROUP, 2013-02-01)
The identification of recurrent somatic mutations in genes encoding epigenetic enzymes has provided a strong rationale for the development of compounds that target the epigenome for the treatment of cancer. This notion is supported by biochemical studies demonstrating aberrant recruitment of epigenetic enzymes such as histone deacetylases (HDACs) and histone methyltransferases to promoter regions through association with oncogenic fusion proteins such as PML-RARα and AML1-ETO. HDAC inhibitors (HDACi) are potent inducers of tumor cell apoptosis; however, it remains unclear why tumor cells are more sensitive to HDACi-induced cell death than normal cells. Herein, we assessed the biological and molecular responses of isogenic normal and transformed cells to the FDA-approved HDACi vorinostat and romidepsin. Both HDACi selectively killed cells of diverse tissue origin that had been transformed through the serial introduction of different oncogenes. Time-course microarray expression profiling revealed that normal and transformed cells transcriptionally responded to vorinostat treatment. Over 4200 genes responded differently to vorinostat in normal and transformed cells and gene ontology and pathway analyses identified a tumor-cell-selective pro-apoptotic gene-expression signature that consisted of BCL2 family genes. In particular, HDACi induced tumor-cell-selective upregulation of the pro-apoptotic gene BMF and downregulation of the pro-survival gene BCL2A1 encoding BFL-1. Maintenance of BFL-1 levels in transformed cells through forced expression conferred vorinostat resistance, indicating that specific and selective engagement of the intrinsic apoptotic pathway underlies the tumor-cell-selective apoptotic activities of these agents. The ability of HDACi to affect the growth and survival of tumor cells whilst leaving normal cells relatively unharmed is fundamental to their successful clinical application. This study provides new insight into the transcriptional effects of HDACi in human donor-matched normal and transformed cells, and implicates specific molecules and pathways in the tumor-selective cytotoxic activity of these compounds.
Polycomb repressive complex 2 (PRC2) restricts hematopoietic stem cell activity
(PUBLIC LIBRARY SCIENCE, 2008-04-01)
Polycomb group proteins are transcriptional repressors that play a central role in the establishment and maintenance of gene expression patterns during development. Using mice with an N-ethyl-N-nitrosourea (ENU)-induced mutation in Suppressor of Zeste 12 (Suz12), a core component of Polycomb Repressive Complex 2 (PRC2), we show here that loss of Suz12 function enhances hematopoietic stem cell (HSC) activity. In addition to these effects on a wild-type genetic background, mutations in Suz12 are sufficient to ameliorate the stem cell defect and thrombocytopenia present in mice that lack the thrombopoietin receptor (c-Mpl). To investigate the molecular targets of the PRC2 complex in the HSC compartment, we examined changes in global patterns of gene expression in cells deficient in Suz12. We identified a distinct set of genes that are regulated by Suz12 in hematopoietic cells, including eight genes that appear to be highly responsive to PRC2 function within this compartment. These data suggest that PRC2 is required to maintain a specific gene expression pattern in hematopoiesis that is indispensable to normal stem cell function.
A Mouse Model of Harlequin Ichthyosis Delineates a Key Role for Abca12 in Lipid Homeostasis
(PUBLIC LIBRARY SCIENCE, 2008-09-01)
Harlequin Ichthyosis (HI) is a severe and often lethal hyperkeratotic skin disease caused by mutations in the ABCA12 transport protein. In keratinocytes, ABCA12 is thought to regulate the transfer of lipids into small intracellular trafficking vesicles known as lamellar bodies. However, the nature and scope of this regulation remain unclear. As part of an original recessive mouse ENU mutagenesis screen, we have identified and characterised an animal model of HI and shown that it displays many of the hallmarks of the disease including hyperkeratosis, loss of barrier function, and defects in lipid homeostasis. We have used this model to follow disease progression in utero and present evidence that loss of Abca12 function leads to premature differentiation of basal keratinocytes. A comprehensive analysis of lipid levels in mutant epidermis demonstrated profound defects in lipid homeostasis, illustrating for the first time the extent to which Abca12 plays a pivotal role in maintaining lipid balance in the skin. To further investigate the scope of Abca12's activity, we have utilised cells from the mutant mouse to ascribe direct transport functions to the protein and, in doing so, we demonstrate activities independent of its role in lamellar body function. These cells have severely impaired lipid efflux leading to intracellular accumulation of neutral lipids. Furthermore, we identify Abca12 as a mediator of Abca1-regulated cellular cholesterol efflux, a finding that may have significant implications for other diseases of lipid metabolism and homeostasis, including atherosclerosis.
Statistical analysis of an RNA titration series evaluates microarray precision and sensitivity on a whole-array basis
BACKGROUND: Concerns are often raised about the accuracy of microarray technologies and the degree of cross-platform agreement, but there are yet no methods which can unambiguously evaluate precision and sensitivity for these technologies on a whole-array basis. RESULTS: A methodology is described for evaluating the precision and sensitivity of whole-genome gene expression technologies such as microarrays. The method consists of an easy-to-construct titration series of RNA samples and an associated statistical analysis using non-linear regression. The method evaluates the precision and responsiveness of each microarray platform on a whole-array basis, i.e., using all the probes, without the need to match probes across platforms. An experiment is conducted to assess and compare four widely used microarray platforms. All four platforms are shown to have satisfactory precision but the commercial platforms are superior for resolving differential expression for genes at lower expression levels. The effective precision of the two-color platforms is improved by allowing for probe-specific dye-effects in the statistical model. The methodology is used to compare three data extraction algorithms for the Affymetrix platforms, demonstrating poor performance for the commonly used proprietary algorithm relative to the other algorithms. For probes which can be matched across platforms, the cross-platform variability is decomposed into within-platform and between-platform components, showing that platform disagreement is almost entirely systematic rather than due to measurement variability. CONCLUSION: The results demonstrate good precision and sensitivity for all the platforms, but highlight the need for improved probe annotation. They quantify the extent to which cross-platform measures can be expected to be less accurate than within-platform comparisons for predicting disease progression or outcome.
Neither loss of Bik alone, nor combined loss of Bik and Noxa, accelerate murine lymphoma development or render lymphoma cells resistant to DNA damaging drugs
(NATURE PUBLISHING GROUP, 2012-05-01)
The pro-apoptotic BH3-only protein, BIK, is widely expressed and although many critical functions in developmental or stress-induced death have been ascribed to this protein, mice lacking Bik display no overt abnormalities. It has been postulated that Bik can serve as a tumour suppressor, on the basis that its deficiency and loss of apoptotic function have been reported in many human cancers, including lymphoid malignancies. Evasion of apoptosis is a major factor contributing to c-Myc-induced tumour development, but despite this, we found that Bik deficiency did not accelerate Eμ-Myc-induced lymphomagenesis. Co-operation between BIK and NOXA, another BH3-only protein, has been previously described, and was attributed to their complementary binding specificities to distinct subsets of pro-survival BCL-2 family proteins. Nevertheless, combined deficiency of Bik and Noxa did not alter the onset of Eμ-Myc transgene induced lymphoma development. Moreover, although p53-mediated induction of Bik has been reported, neither Eμ-Myc/Bik(-/-) nor Eμ-Myc/Bik(-/-)Noxa(-/-) lymphomas were more resistant than control Eμ-Myc lymphomas to killing by DNA damaging drugs, either in vitro or in vivo. These results suggest that Bik, even in combination with Noxa, is not a potent suppressor of c-Myc-driven tumourigenesis or critical for chemotherapeutic drug-induced killing of Myc-driven tumours.
The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote
(OXFORD UNIV PRESS, 2013-05-01)
Read alignment is an ongoing challenge for the analysis of data from sequencing technologies. This article proposes an elegantly simple multi-seed strategy, called seed-and-vote, for mapping reads to a reference genome. The new strategy chooses the mapped genomic location for the read directly from the seeds. It uses a relatively large number of short seeds (called subreads) extracted from each read and allows all the seeds to vote on the optimal location. When the read length is <160 bp, overlapping subreads are used. More conventional alignment algorithms are then used to fill in detailed mismatch and indel information between the subreads that make up the winning voting block. The strategy is fast because the overall genomic location has already been chosen before the detailed alignment is done. It is sensitive because no individual subread is required to map exactly, nor are individual subreads constrained to map close by other subreads. It is accurate because the final location must be supported by several different subreads. The strategy extends easily to find exon junctions, by locating reads that contain sets of subreads mapping to different exons of the same gene. It scales up efficiently for longer reads.
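The voting step described in this abstract can be sketched in a few lines. This is a toy illustration, not the Subread implementation: it assumes exact k-mer lookups in an in-memory dictionary, a single strand, and no indel handling, and the function names and parameters (`k`, `step`) are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_and_vote(read, index, k, step=1):
    """Return (best_start, votes) for the read, or (None, 0) if no seed hits.

    Each short seed (subread) is looked up on its own. Every hit votes for
    the reference position at which the whole read would start, and the
    position with the most votes wins.
    """
    votes = Counter()
    for offset in range(0, len(read) - k + 1, step):
        for pos in index.get(read[offset:offset + k], ()):
            votes[pos - offset] += 1
    if not votes:
        return None, 0
    return votes.most_common(1)[0]
```

A read with a mismatch loses only the seeds that overlap the mismatch, and the remaining seeds still agree on one start position. This reflects the abstract's remark that no individual subread is required to map exactly, while the final location must be supported by several different subreads. Detailed alignment within the winning region would follow as a separate step.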
Identity-by-Descent Mapping to Detect Rare Variants Conferring Susceptibility to Multiple Sclerosis
(PUBLIC LIBRARY SCIENCE, 2013-03-05)
Genome-wide association studies (GWAS) have identified around 60 common variants associated with multiple sclerosis (MS), but these loci only explain a fraction of the heritability of MS. Some missing heritability may be caused by rare variants that have been suggested to play an important role in the aetiology of complex diseases such as MS. However, current genetic and statistical methods for detecting rare variants are expensive and time-consuming. 'Population-based linkage analysis' (PBLA), or so-called identity-by-descent (IBD) mapping, is a novel way to detect rare variants in extant GWAS datasets. We employed BEAGLE fastIBD to search for rare MS variants utilising IBD mapping in a large GWAS dataset of 3,543 cases and 5,898 controls. We identified a genome-wide significant linkage signal on chromosome 19 (LOD = 4.65; p = 1.9×10^-6). Network analysis of cases and controls sharing haplotypes on chromosome 19 further strengthened the association, as there are more large networks of cases sharing haplotypes than controls. This linkage region includes a cluster of zinc finger genes of unknown function. Analysis of genome-wide transcriptome data suggests that genes in this zinc finger cluster may be involved in very early developmental regulation of the CNS. Our study also indicates that BEAGLE fastIBD allowed identification of rare variants in a large unrelated population with moderate computational intensity. Even with the development of whole-genome sequencing, IBD mapping may still be a promising way to narrow down the region of interest for sequencing priority.
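The relationship between the reported LOD score and p-value can be checked with a short calculation. The abstract does not say how the p-value was derived, so this sketch assumes the standard asymptotic conversion, in which 2·ln(10)·LOD is treated as a chi-square statistic with one degree of freedom and the test is one-sided. The function name is hypothetical.

```python
import math


def lod_to_pvalue(lod):
    """Pointwise one-sided p-value for a LOD score.

    Assumption: 2 * ln(10) * LOD follows a chi-square distribution with
    one degree of freedom under the null, and the test is one-sided, so
    the p-value is the upper normal tail at z = sqrt(chi-square).
    """
    chi_square = 2 * math.log(10) * lod
    z = math.sqrt(chi_square)
    return 0.5 * math.erfc(z / math.sqrt(2))
```

Under these assumptions, `lod_to_pvalue(4.65)` gives a value of roughly 1.85×10^-6, close to the p-value quoted in the abstract.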