School of BioSciences - Research Publications

  • Item
    Using likelihood-free inference to compare evolutionary dynamics of the protein networks of H. pylori and P. falciparum
    Ratmann, O ; Jorgensen, O ; Hinkley, T ; Stumpf, M ; Richardson, S ; Wiuf, C ; Bonhoeffer, S (PUBLIC LIBRARY SCIENCE, 2007-11)
    Gene duplication with subsequent interaction divergence is one of the primary driving forces in the evolution of genetic systems. Yet little is known about the precise mechanisms and the role of duplication divergence in the evolution of protein networks from the prokaryote and eukaryote domains. We developed a novel, model-based approach for Bayesian inference on biological network data that centres on approximate Bayesian computation, or likelihood-free inference. Instead of computing the intractable likelihood of the protein network topology, our method summarizes key features of the network and, based on these, uses an MCMC algorithm to approximate the posterior distribution of the model parameters. This allowed us to reliably fit a flexible mixture model that captures hallmarks of evolution by gene duplication and subfunctionalization to protein interaction network (PIN) data of Helicobacter pylori and Plasmodium falciparum. The 80% credible intervals for the duplication-divergence component are [0.64, 0.98] for H. pylori and [0.87, 0.99] for P. falciparum. The remaining parameter estimates are not inconsistent with sequence data. An extensive sensitivity analysis showed that incompleteness of PIN data does not largely affect the analysis of models of protein network evolution, and that the degree sequence alone barely captures the evolutionary footprints of protein networks relative to other statistics. Our likelihood-free inference approach enables a fully Bayesian analysis of a complex and highly stochastic system that is otherwise intractable at present. Modelling the evolutionary history of PIN data, it transpires that only the simultaneous analysis of several global aspects of protein networks enables credible and consistent inference to be made from available datasets. Our results indicate that gene duplication has played a larger part in the network evolution of the eukaryote than in the prokaryote, and suggest that single gene duplications with immediate divergence alone may explain more than 60% of biological network data in both domains.
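    The likelihood-free MCMC idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy model (a Bernoulli process summarized by its mean), the uniform prior, the tolerance `eps`, and all function names are assumptions chosen for illustration. In the paper the summaries are network statistics and the model is a network-evolution mixture model.

```python
import random

def simulate_summary(p, n, rng):
    """Simulate n Bernoulli(p) draws and return their mean (the summary statistic)."""
    return sum(rng.random() < p for _ in range(n)) / n

def abc_mcmc(observed, n, steps, eps, step_size, seed=0):
    """Toy ABC-MCMC: uniform prior on (0, 1), symmetric Gaussian random-walk proposal.

    With a flat prior and a symmetric proposal the Metropolis-Hastings ratio
    is 1 inside the prior support, so a proposal is accepted whenever the
    simulated summary lies within eps of the observed summary.
    """
    rng = random.Random(seed)
    theta = 0.5  # hypothetical starting value
    chain = []
    for _ in range(steps):
        proposal = theta + rng.gauss(0.0, step_size)
        if 0.0 < proposal < 1.0:  # outside the prior support: reject
            simulated = simulate_summary(proposal, n, rng)
            if abs(simulated - observed) <= eps:
                theta = proposal
        chain.append(theta)
    return chain

# Hypothetical observed summary: 70% successes in 100 trials.
chain = abc_mcmc(observed=0.7, n=100, steps=5000, eps=0.05, step_size=0.1)
posterior = chain[1000:]  # discard burn-in
posterior_mean = sum(posterior) / len(posterior)
print(round(posterior_mean, 2))
```

    The likelihood is never evaluated; only forward simulation and a distance between summaries are needed, which is what makes the approach usable when the likelihood of a network topology is intractable.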
  • Item
    SNPSTR: a database of compound microsatellite-SNP markers
    Agrafioti, I ; Stumpf, MPH (OXFORD UNIV PRESS, 2007-01)
    There has been widespread and growing interest in genetic markers suitable for drawing population genetic inferences about past demographic events and for detecting the effects of selection. In addition to single nucleotide polymorphisms (SNPs), microsatellites (or short tandem repeats, STRs) have received great attention in the analysis of human population history. In the SNPSTR database (http://www.imperial.ac.uk/theoreticalgenomics/data-software) we catalogue a relatively new type of compound genetic marker called SNPSTR, which combines a microsatellite marker (STR) with one or more tightly linked SNPs. Here, the SNP(s) and the microsatellite are less than 250 bp apart, so each SNPSTR can be considered a small haplotype with no recombination occurring between the two individual markers. Thus, SNPSTRs have the potential to become a very useful tool in the field of population genetics. The SNPSTR database contains all inferable human SNPSTRs as well as those in mouse, rat, dog and chicken, i.e. all model organisms for which extensive SNP datasets are available.
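    The 250 bp criterion can be sketched as a simple interval query. This is an illustration under stated assumptions, not the database's actual pipeline: coordinates are taken to lie on a single chromosome, distance is measured from the nearest STR boundary, SNPs falling inside the repeat itself are excluded, and the names `find_snpstrs`, `STR_A` and `STR_B` are hypothetical.

```python
from bisect import bisect_left, bisect_right

def find_snpstrs(strs, snp_positions, max_gap=250):
    """Pair each STR with the SNPs lying less than max_gap bp from it.

    strs maps a marker name to (start, end) coordinates, inclusive.
    Returns a dict of marker name -> sorted list of linked SNP positions;
    STRs without any linked SNP are omitted.
    """
    snps = sorted(snp_positions)
    result = {}
    for name, (start, end) in strs.items():
        # Candidate SNPs lie strictly inside (start - max_gap, end + max_gap).
        lo = bisect_right(snps, start - max_gap)
        hi = bisect_left(snps, end + max_gap)
        # Assumption: SNPs inside the repeat itself are not counted.
        linked = [p for p in snps[lo:hi] if not (start <= p <= end)]
        if linked:
            result[name] = linked
    return result

# Made-up coordinates for illustration only.
strs = {"STR_A": (1000, 1040), "STR_B": (5000, 5030)}
snps = [700, 751, 1020, 1289, 1290, 4000]
pairs = find_snpstrs(strs, snps)
print(pairs)  # {'STR_A': [751, 1289]}
```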
  • Item
    Generating confidence intervals on biological networks
    Thorne, T ; Stumpf, MPH (BMC, 2007-11-30)
    BACKGROUND: In the analysis of networks we frequently require the statistical significance of some network statistic, such as measures of similarity for the properties of interacting nodes. The structure of the network may introduce dependencies among the nodes and it will in general be necessary to account for these dependencies in the statistical analysis. To this end we require some form of Null model of the network: generally rewired replicates of the network are generated which preserve only the degree (number of interactions) of each node. We show that this can fail to capture important features of network structure, and may result in unrealistic significance levels, when potentially confounding additional information is available. METHODS: We present a new network resampling Null model which takes into account the degree sequence as well as available biological annotations. Using gene ontology information as an illustration we show how this information can be accounted for in the resampling approach, and the impact such information has on the assessment of statistical significance of correlations and motif-abundances in the Saccharomyces cerevisiae protein interaction network. An algorithm, GOcardShuffle, is introduced to allow for the efficient construction of an improved Null model for network data. RESULTS: We use the protein interaction network of S. cerevisiae; correlations between the evolutionary rates and expression levels of interacting proteins and their statistical significance were assessed for Null models which condition on different aspects of the available data. The novel GOcardShuffle approach results in a Null model for annotated network data which appears to describe the properties of real biological networks better. CONCLUSION: An improved approach for the statistical analysis of biological network data, which conditions on the available biological information, leads to qualitatively different results compared to approaches which ignore such annotations. In particular we demonstrate that the effects of the biological organization of the network can be sufficient to explain the observed similarity of interacting proteins.
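    The baseline Null model described in the BACKGROUND, rewired replicates that preserve only each node's degree, can be sketched with repeated edge swaps. This is a generic illustration and not GOcardShuffle, whose annotation-aware details are not given in this abstract; the function names and the small example graph are assumptions.

```python
import random
from collections import Counter

def degree_sequence(edges):
    """Count how many edges touch each node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def rewire(edges, n_swaps, seed=0):
    """Degree-preserving rewiring of a simple undirected graph.

    Repeatedly picks two edges (a, b) and (c, d) and replaces them with
    (a, d) and (c, b), skipping swaps that would create a self-loop or a
    duplicate edge. Every node keeps its original degree.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:  # consider both orientations of the second edge
            c, d = d, c
        if a == d or c == b:  # would create a self-loop
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:  # would duplicate an edge
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

# Made-up example: a ring of 8 nodes plus two chords.
original = [(k, (k + 1) % 8) for k in range(8)] + [(0, 4), (2, 6)]
replicate = rewire(original, n_swaps=200)
print(degree_sequence(replicate) == degree_sequence(original))  # True
```

    A significance level for a network statistic would then be estimated by comparing its observed value with its distribution over many such replicates. An annotation-aware Null model of the kind the abstract proposes would additionally constrain which swaps are allowed.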