School of BioSciences - Research Publications

  • Item
    What the papers say: text mining for genomics and systems biology.
    Harmston, N ; Filsell, W ; Stumpf, MPH (Springer Science and Business Media LLC, 2010-10)
    Keeping up with the rapidly growing literature has become virtually impossible for most scientists. This can have dire consequences. First, we may waste research time and resources on reinventing the wheel simply because we can no longer maintain a reliable grasp on the published literature. Second, and perhaps more detrimental, judicious (or serendipitous) combination of knowledge from different scientific disciplines, which would require following disparate and distinct research literatures, is rapidly becoming impossible for even the most ardent readers of research publications. Text mining - the automated extraction of information from (electronically) published sources - could potentially fulfil an important role - but only if we know how to harness its strengths and overcome its weaknesses. As we do not expect that the rate at which scientific results are published will decrease, text mining tools are now becoming essential in order to cope with, and derive maximum benefit from, this information explosion. In genomics, this is particularly pressing as more and more rare disease-causing variants are found and need to be understood. Not being conversant with this technology may put scientists and biomedical regulators at a severe disadvantage. In this review, we introduce the basic concepts underlying modern text mining and its applications in genomics and systems biology. We hope that this review will serve three purposes: (i) to provide a timely and useful overview of the current status of this field, including a survey of present challenges; (ii) to enable researchers to decide how and when to apply text mining tools in their own research; and (iii) to highlight how the research communities in genomics and systems biology can help to make text mining from biomedical abstracts and texts more straightforward.
  • Item
    From qualitative data to quantitative models: analysis of the phage shock protein stress response in Escherichia coli
    Toni, T ; Jovanovic, G ; Huvet, M ; Buck, M ; Stumpf, MPH (BMC, 2011-05-12)
    BACKGROUND: Bacteria have evolved a rich set of mechanisms for sensing and adapting to adverse conditions in their environment. These are crucial for their survival, which requires them to react to extracellular stresses such as heat shock, ethanol treatment or phage infection. Here we focus on studying the phage shock protein (Psp) stress response in Escherichia coli induced by a phage infection or other damage to the bacterial membrane. This system has not yet been theoretically modelled or analysed in silico. RESULTS: We develop a model of the Psp response system, and illustrate how such models can be constructed and analyzed in light of available sparse and qualitative information in order to generate novel biological hypotheses about their dynamical behaviour. We analyze this model using tools from Petri-net theory and study its dynamical range that is consistent with currently available knowledge by conditioning model parameters on the available data in an approximate Bayesian computation (ABC) framework. Within this ABC approach we analyze stochastic and deterministic dynamics. This analysis allows us to identify different types of behaviour and these mechanistic insights can in turn be used to design new, more detailed and time-resolved experiments. CONCLUSIONS: We have developed the first mechanistic model of the Psp response in E. coli. This model allows us to predict the possible qualitative stochastic and deterministic dynamic behaviours of key molecular players in the stress response. Our inferential approach can be applied to stress response and signalling systems more generally: in the ABC framework we can condition mathematical models on qualitative data in order to delimit e.g. parameter ranges or the qualitative system dynamics in light of available end-point or qualitative information.
  • Item
    A systems biology analysis of long and short-term memories of osmotic stress adaptation in fungi.
    You, T ; Ingram, P ; Jacobsen, MD ; Cook, E ; McDonagh, A ; Thorne, T ; Lenardon, MD ; de Moura, APS ; Romano, MC ; Thiel, M ; Stumpf, M ; Gow, NAR ; Haynes, K ; Grebogi, C ; Stark, J ; Brown, AJP (Springer Science and Business Media LLC, 2012-05-25)
    BACKGROUND: Saccharomyces cerevisiae senses hyperosmotic conditions via the HOG signaling network that activates the stress-activated protein kinase, Hog1, and modulates metabolic fluxes and gene expression to generate appropriate adaptive responses. The integral control mechanism by which Hog1 modulates glycerol production remains uncharacterized. An additional Hog1-independent mechanism retains intracellular glycerol for adaptation. Candida albicans also adapts to hyperosmolarity via a HOG signaling network. However, it remains unknown whether Hog1 exerts integral or proportional control over glycerol production in C. albicans. RESULTS: We combined modeling and experimental approaches to study osmotic stress responses in S. cerevisiae and C. albicans. We propose a simple ordinary differential equation (ODE) model that highlights the integral control that Hog1 exerts over glycerol biosynthesis in these species. If integral control arises from a separation of time scales (i.e. rapid HOG activation of glycerol production capacity which decays slowly under hyperosmotic conditions), then the model predicts that glycerol production rates elevate upon adaptation to a first stress and this makes the cell adapt faster to a second hyperosmotic stress. It appears as if the cell is able to remember the stress history that is longer than the timescale of signal transduction. This is termed the long-term stress memory. Our experimental data verify this. Like S. cerevisiae, C. albicans minimizes glycerol efflux during adaptation to hyperosmolarity. Also, transient activation of intermediate kinases in the HOG pathway results in a short-term memory in the signaling pathway. This determines the amplitude of Hog1 phosphorylation under a periodic sequence of stress and non-stressed intervals. Our model suggests that the long-term memory also affects the way a cell responds to periodic stress conditions. Hence, during osmohomeostasis, short-term memory is dependent upon long-term memory. This is relevant in the context of fungal responses to dynamic and changing environments. CONCLUSIONS: Our experiments and modeling have provided an example of identifying integral control that arises from time-scale separation in different processes, which is an important functional module in various contexts.
  • Item
    Trees on networks: resolving statistical patterns of phylogenetic similarities among interacting proteins
    Kelly, WP ; Stumpf, MPH (BMC, 2010-09-20)
    BACKGROUND: Phylogenies capture the evolutionary ancestry linking extant species. Correlations and similarities among a set of species are mediated by and need to be understood in terms of the phylogenetic tree. In a similar way it has been argued that biological networks also induce correlations among sets of interacting genes or their protein products. RESULTS: We develop suitable statistical resampling schemes that can incorporate these two potential sources of correlation into a single inferential framework. To illustrate our approach we apply it to protein interaction data in yeast and investigate whether the phylogenetic trees of interacting proteins in a panel of yeast species are more similar than would be expected by chance. CONCLUSIONS: While we find only negligible evidence for such increased levels of similarities, our statistical approach allows us to resolve the previously reported contradictory results on the levels of co-evolution induced by protein-protein interactions. We conclude with a discussion as to how we may employ the statistical framework developed here in further functional and evolutionary analyses of biological networks and systems.
  • Item
    Designing attractive models via automated identification of chaotic and oscillatory dynamical regimes
    Silk, D ; Kirk, PDW ; Barnes, CP ; Toni, T ; Rose, A ; Moon, S ; Dallman, MJ ; Stumpf, MPH (NATURE PUBLISHING GROUP, 2011-10)
    Chaos and oscillations continue to capture the interest of both the scientific and public domains. Yet despite the importance of these qualitative features, most attempts at constructing mathematical models of such phenomena have taken an indirect, quantitative approach, for example, by fitting models to a finite number of data points. Here we develop a qualitative inference framework that allows us to both reverse-engineer and design systems exhibiting these and other dynamical behaviours by directly specifying the desired characteristics of the underlying dynamical attractor. This change in perspective from quantitative to qualitative dynamics provides fundamental and new insights into the properties of dynamical systems.
  • Item
    GPU accelerated biochemical network simulation
    Zhou, Y ; Liepe, J ; Sheng, X ; Stumpf, MPH ; Barnes, C (OXFORD UNIV PRESS, 2011-03-15)
    MOTIVATION: Mathematical modelling is central to systems and synthetic biology. Using simulations to calculate statistics or to explore parameter space is a common means for analysing these models and can be computationally intensive. However, in many cases, the simulations are easily parallelizable. Graphics processing units (GPUs) are capable of efficiently running highly parallel programs and outperform CPUs in terms of raw computing power. Despite their computational advantages, their adoption by the systems biology community is relatively slow, since differences in hardware architecture between GPUs and CPUs complicate the porting of existing code. RESULTS: We present a Python package, cuda-sim, that provides highly parallelized algorithms for the repeated simulation of biochemical network models on NVIDIA CUDA GPUs. Algorithms are implemented for the three popular types of model formalisms: the LSODA algorithm for ODE integration, the Euler-Maruyama algorithm for SDE simulation and the Gillespie algorithm for MJP simulation. No knowledge of GPU computing is required from the user. Models can be specified in SBML format or provided as CUDA code. For running a large number of simulations in parallel, up to 360-fold decrease in simulation runtime is attained when compared to single CPU implementations. AVAILABILITY: http://cuda-sim.sourceforge.net/
  • Item
    Statistical inference of the time-varying structure of gene-regulation networks
    Lebre, S ; Becq, J ; Devaux, F ; Stumpf, MPH ; Lelandais, G (BMC, 2010-09-22)
    BACKGROUND: Biological networks are highly dynamic in response to environmental and physiological cues. This variability is in contrast to conventional analyses of biological networks, which have overwhelmingly employed static graph models which stay constant over time to describe biological systems and their underlying molecular interactions. METHODS: To overcome these limitations, we propose here a new statistical modelling framework, the ARTIVA formalism (Auto Regressive TIme VArying models), and an associated inferential procedure that allows us to learn temporally varying gene-regulation networks from biological time-course expression data. ARTIVA simultaneously infers the topology of a regulatory network and how it changes over time. It allows us to recover the chronology of regulatory associations for individual genes involved in a specific biological process (development, stress response, etc.). RESULTS: We demonstrate that the ARTIVA approach generates detailed insights into the function and dynamics of complex biological systems and efficiently exploits time-course data in systems biology. In particular, two biological scenarios are analyzed: the developmental stages of Drosophila melanogaster and the response of Saccharomyces cerevisiae to benomyl poisoning. CONCLUSIONS: ARTIVA does recover essential temporal dependencies in biological systems from transcriptional data, and provides a natural starting point to learn and investigate their dynamics in greater detail.
  • Item
    Graph spectral analysis of protein interaction network evolution
    Thorne, T ; Stumpf, MPH (ROYAL SOC, 2012-10-07)
    We present an analysis of protein interaction network data via the comparison of models of network evolution to the observed data. We take a Bayesian approach and perform posterior density estimation using an approximate Bayesian computation with sequential Monte Carlo method. Our approach allows us to perform model selection over a selection of potential network growth models. The methodology we apply uses a distance defined in terms of graph spectra which captures the network data more naturally than previously used summary statistics such as the degree distribution. Furthermore, we include the effects of sampling into the analysis, to properly correct for the incompleteness of existing datasets, and have analysed the performance of our method under various degrees of sampling. We consider a number of models focusing not only on the biologically relevant class of duplication models, but also including models of scale-free network growth that have previously been claimed to describe such data. We find a preference for a duplication-divergence with linear preferential attachment model in the majority of the interaction datasets considered. We also illustrate how our method can be used to perform multi-model inference of network parameters to estimate properties of the full network from sampled data.
  • Item
    Inference of temporally varying Bayesian networks.
    Thorne, T ; Stumpf, MPH (Oxford University Press (OUP), 2012-12-15)
    MOTIVATION: When analysing gene expression time series data, an often overlooked but crucial aspect of the model is that the regulatory network structure may change over time. Although some approaches have addressed this problem previously in the literature, many are not well suited to the sequential nature of the data. RESULTS: Here, we present a method that allows us to infer regulatory network structures that may vary between time points, using a set of hidden states that describe the network structure at a given time point. To model the distribution of the hidden states, we have applied the Hierarchical Dirichlet Process Hidden Markov Model, a non-parametric extension of the traditional Hidden Markov Model, which does not require us to fix the number of hidden states in advance. We apply our method to existing microarray expression data as well as demonstrating its efficacy on simulated test data.
  • Item
    ABC-SysBio-approximate Bayesian computation in Python with GPU support
    Liepe, J ; Barnes, C ; Cule, E ; Erguler, K ; Kirk, P ; Toni, T ; Stumpf, MPH (OXFORD UNIV PRESS, 2010-07-15)
    MOTIVATION: The growing field of systems biology has driven demand for flexible tools to model and simulate biological systems. Two established problems in the modeling of biological processes are model selection and the estimation of associated parameters. A number of statistical approaches, both frequentist and Bayesian, have been proposed to answer these questions. RESULTS: Here we present a Python package, ABC-SysBio, that implements parameter inference and model selection for dynamical systems in an approximate Bayesian computation (ABC) framework. ABC-SysBio combines three algorithms: ABC rejection sampler, ABC SMC for parameter inference and ABC SMC for model selection. It is designed to work with models written in Systems Biology Markup Language (SBML). Deterministic and stochastic models can be analyzed in ABC-SysBio. AVAILABILITY: http://abc-sysbio.sourceforge.net
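
Illustrative note: the ABC rejection sampler named in the abstract above can be sketched in a few lines. This is a generic, minimal sketch and not the ABC-SysBio API. The toy decay model, the uniform prior on the rate k, the Euclidean distance, the tolerance epsilon and all function names are assumptions made only for illustration.

```python
import math
import random


def simulate(k, times, x0=1.0):
    """Toy deterministic model (assumed): exponential decay x(t) = x0 * exp(-k * t)."""
    return [x0 * math.exp(-k * t) for t in times]


def distance(a, b):
    """Euclidean distance between two trajectories of equal length."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def abc_rejection(observed, times, n_accept, epsilon, prior=(0.0, 2.0), seed=0):
    """Sample k from a uniform prior; keep draws whose simulation is within epsilon of the data."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n_accept:
        k = rng.uniform(*prior)  # draw a candidate parameter from the prior
        if distance(simulate(k, times), observed) <= epsilon:
            accepted.append(k)  # accept: simulated data are close enough to the observations
    return accepted


times = [0.0, 0.5, 1.0, 1.5, 2.0]
observed = simulate(0.7, times)  # synthetic "data" generated with k = 0.7
posterior_sample = abc_rejection(observed, times, n_accept=200, epsilon=0.05)
posterior_mean = sum(posterior_sample) / len(posterior_sample)
```

The accepted values approximate the posterior of k given the data and the tolerance. ABC SMC, also mentioned in the abstract, improves on plain rejection by lowering the tolerance over a sequence of weighted populations; that refinement is not shown here.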