School of BioSciences - Research Publications

  • Item
    The extent and importance of intragenic recombination.
    de Silva, E ; Kelley, LA ; Stumpf, MPH (Springer Science and Business Media LLC, 2004-11)
    We have studied the recombination rate behaviour of a set of 140 genes which were investigated for their potential importance in inflammatory disease. Each gene was extensively sequenced in 24 individuals of African descent and 23 individuals of European descent, and the recombination process was studied separately in the two population samples. The results obtained from the two populations were highly correlated, suggesting that demographic bias does not affect our population genetic estimation procedure. We found evidence that levels of recombination correlate with levels of nucleotide diversity. High marker density allowed us to study recombination rate variation on a very fine spatial scale. We found that about 40 per cent of genes showed evidence of uniform recombination, while approximately 12 per cent of genes carried distinct signatures of recombination hotspots. On studying the locations of these hotspots, we found that they are not always confined to introns but can also stretch across exons. An investigation of the protein products of these genes suggested that recombination hotspots can sometimes separate exons belonging to different protein domains; however, this occurs much less frequently than might be expected based on evolutionary studies into the origins of recombination. This suggests that evolutionary analysis of the recombination process is greatly aided by considering nucleotide sequences and protein products jointly.
  • Item
    Evolution of pathogenicity and sexual reproduction in eight Candida genomes.
    Butler, G ; Rasmussen, MD ; Lin, MF ; Santos, MAS ; Sakthikumar, S ; Munro, CA ; Rheinbay, E ; Grabherr, M ; Forche, A ; Reedy, JL ; Agrafioti, I ; Arnaud, MB ; Bates, S ; Brown, AJP ; Brunke, S ; Costanzo, MC ; Fitzpatrick, DA ; de Groot, PWJ ; Harris, D ; Hoyer, LL ; Hube, B ; Klis, FM ; Kodira, C ; Lennard, N ; Logue, ME ; Martin, R ; Neiman, AM ; Nikolaou, E ; Quail, MA ; Quinn, J ; Santos, MC ; Schmitzberger, FF ; Sherlock, G ; Shah, P ; Silverstein, KAT ; Skrzypek, MS ; Soll, D ; Staggs, R ; Stansfield, I ; Stumpf, MPH ; Sudbery, PE ; Srikantha, T ; Zeng, Q ; Berman, J ; Berriman, M ; Heitman, J ; Gow, NAR ; Lorenz, MC ; Birren, BW ; Kellis, M ; Cuomo, CA (Springer Science and Business Media LLC, 2009-06-04)
    Candida species are the most common cause of opportunistic fungal infection worldwide. Here we report the genome sequences of six Candida species and compare these and related pathogens and non-pathogens. There are significant expansions of cell wall, secreted and transporter gene families in pathogenic species, suggesting adaptations associated with virulence. Large genomic tracts are homozygous in three diploid species, possibly resulting from recent recombination events. Surprisingly, key components of the mating and meiosis pathways are missing from several species. These include major differences at the mating-type loci (MTL); Lodderomyces elongisporus lacks MTL, and components of the a1/α2 cell identity determinant were lost in other species, raising questions about how mating and cell types are controlled. Analysis of the CUG leucine-to-serine genetic-code change reveals that 99% of ancestral CUG codons were erased and new ones arose elsewhere. Lastly, we revise the Candida albicans gene catalogue, identifying many new genes.
  • Item
    Gaussian process regression bootstrapping: exploring the effects of uncertainty in time course data.
    Kirk, PDW ; Stumpf, MPH (Oxford University Press (OUP), 2009-05-15)
    MOTIVATION: Although widely accepted that high-throughput biological data are typically highly noisy, the effects that this uncertainty has upon the conclusions we draw from these data are often overlooked. However, in order to assign any degree of confidence to our conclusions, we must quantify these effects. Bootstrap resampling is one method by which this may be achieved. Here, we present a parametric bootstrapping approach for time-course data, in which Gaussian process regression (GPR) is used to fit a probabilistic model from which replicates may then be drawn. This approach implicitly allows the time dependence of the data to be taken into account, and is applicable to a wide range of problems. RESULTS: We apply GPR bootstrapping to two datasets from the literature. In the first example, we show how the approach may be used to investigate the effects of data uncertainty upon the estimation of parameters in an ordinary differential equations (ODE) model of a cell signalling pathway. Although we find that the parameter estimates inferred from the original dataset are relatively robust to data uncertainty, we also identify a distinct second set of estimates. In the second example, we use our method to show that the topology of networks constructed from time-course gene expression data appears to be sensitive to data uncertainty, although there may be individual edges in the network that are robust in light of present data. AVAILABILITY: Matlab code for performing GPR bootstrapping is available from our web site: http://www3.imperial.ac.uk/theoreticalsystemsbiology/data-software/.
  • Item
    The effects of incomplete protein interaction data on structural and evolutionary inferences.
    de Silva, E ; Thorne, T ; Ingram, P ; Agrafioti, I ; Swire, J ; Wiuf, C ; Stumpf, MPH (Springer Science and Business Media LLC, 2006-11-03)
    BACKGROUND: Present protein interaction network data sets include only interactions among subsets of the proteins in an organism. Previously this has been ignored, but in principle any global network analysis that only looks at partial data may be biased. Here we demonstrate the need to consider network sampling properties explicitly and from the outset in any analysis. RESULTS: Here we study how properties of the yeast protein interaction network are affected by random and non-random sampling schemes using a range of different network statistics. Effects are shown to be independent of the inherent noise in protein interaction data. The effects of the incomplete nature of network data become very noticeable, especially for so-called network motifs. We also consider the effect of incomplete network data on functional and evolutionary inferences. CONCLUSION: Crucially, when only small, partial network data sets are considered, bias is virtually inevitable. Given the scope of effects considered here, previous analyses may have to be carefully reassessed: ignoring the fact that present network data are incomplete will severely affect our ability to understand biological systems.
  • Item
    Using likelihood-free inference to compare evolutionary dynamics of the protein networks of H. pylori and P. falciparum.
    Ratmann, O ; Jørgensen, O ; Hinkley, T ; Stumpf, M ; Richardson, S ; Wiuf, C ; Bonhoeffer, S (Public Library of Science (PLoS), 2007-11)
    Gene duplication with subsequent interaction divergence is one of the primary driving forces in the evolution of genetic systems. Yet little is known about the precise mechanisms and the role of duplication divergence in the evolution of protein networks from the prokaryote and eukaryote domains. We developed a novel, model-based approach for Bayesian inference on biological network data that centres on approximate Bayesian computation, or likelihood-free inference. Instead of computing the intractable likelihood of the protein network topology, our method summarizes key features of the network and, based on these, uses a MCMC algorithm to approximate the posterior distribution of the model parameters. This allowed us to reliably fit a flexible mixture model that captures hallmarks of evolution by gene duplication and subfunctionalization to protein interaction network data of Helicobacter pylori and Plasmodium falciparum. The 80% credible intervals for the duplication-divergence component are [0.64, 0.98] for H. pylori and [0.87, 0.99] for P. falciparum. The remaining parameter estimates are not inconsistent with sequence data. An extensive sensitivity analysis showed that incompleteness of PIN data does not largely affect the analysis of models of protein network evolution, and that the degree sequence alone barely captures the evolutionary footprints of protein networks relative to other statistics. Our likelihood-free inference approach enables a fully Bayesian analysis of a complex and highly stochastic system that is otherwise intractable at present. Modelling the evolutionary history of PIN data, it transpires that only the simultaneous analysis of several global aspects of protein networks enables credible and consistent inference to be made from available datasets. 
    Our results indicate that gene duplication has played a larger part in the network evolution of the eukaryote than in the prokaryote, and suggest that single gene duplications with immediate divergence alone may explain more than 60% of biological network data in both domains.
  • Item
    SNPSTR: a database of compound microsatellite-SNP markers.
    Agrafioti, I ; Stumpf, MPH (Oxford University Press (OUP), 2007-01)
    There has been widespread and growing interest in genetic markers suitable for drawing population genetic inferences about past demographic events and to detect the effects of selection. In addition to single nucleotide polymorphisms (SNPs), microsatellites (or short tandem repeats, STRs) have received great attention in the analysis of human population history. In the SNPSTR database (http://www.imperial.ac.uk/theoreticalgenomics/data-software) we catalogue a relatively new type of compound genetic marker called SNPSTR which combines a microsatellite marker (STR) with one or more tightly linked SNPs. Here, the SNP(s) and the microsatellite are less than 250 bp apart so each SNPSTR can be considered a small haplotype with no recombination occurring between the two individual markers. Thus, SNPSTRs have the potential to become a very useful tool in the field of population genetics. The SNPSTR database contains all inferable human SNPSTRs as well as those in mouse, rat, dog and chicken, i.e. all model organisms for which extensive SNP datasets are available.
  • Item
    Phylogenetic diversity of stress signalling pathways in fungi.
    Nikolaou, E ; Agrafioti, I ; Stumpf, M ; Quinn, J ; Stansfield, I ; Brown, AJP (Springer Science and Business Media LLC, 2009-02-21)
    BACKGROUND: Microbes must sense environmental stresses, transduce these signals and mount protective responses to survive in hostile environments. In this study we have tested the hypothesis that fungal stress signalling pathways have evolved rapidly in a niche-specific fashion that is independent of phylogeny. To test this hypothesis we have compared the conservation of stress signalling molecules in diverse fungal species with their stress resistance. These fungi, which include ascomycetes, basidiomycetes and microsporidia, occupy highly divergent niches from saline environments to plant or mammalian hosts. RESULTS: The fungi displayed significant variation in their resistance to osmotic (NaCl and sorbitol), oxidative (H2O2 and menadione) and cell wall stresses (Calcofluor White and Congo Red). There was no strict correlation between fungal phylogeny and stress resistance. Rather, the human pathogens tended to be more resistant to all three types of stress, an exception being the sensitivity of Candida albicans to the cell wall stress, Calcofluor White. In contrast, the plant pathogens were relatively sensitive to oxidative stress. The degree of conservation of osmotic, oxidative and cell wall stress signalling pathways amongst the eighteen fungal species was examined. Putative orthologues of functionally defined signalling components in Saccharomyces cerevisiae were identified by performing reciprocal BLASTP searches, and the percent amino acid identities of these orthologues recorded. This revealed that in general, central components of the osmotic, oxidative and cell wall stress signalling pathways are relatively well conserved, whereas the sensors lying upstream and transcriptional regulators lying downstream of these modules have diverged significantly. There was no obvious correlation between the degree of conservation of stress signalling pathways and the resistance of a particular fungus to the corresponding stress. 
    CONCLUSION: Our data are consistent with the hypothesis that fungal stress signalling components have undergone rapid recent evolution to tune the stress responses in a niche-specific fashion.
  • Item
    Network motifs: structure does not determine function.
    Ingram, PJ ; Stumpf, MPH ; Stark, J (Springer Science and Business Media LLC, 2006-05-05)
    BACKGROUND: A number of publications have recently examined the occurrence and properties of the feed-forward motif in a variety of networks, including those that are of interest in genome biology, such as gene networks. The present work looks in some detail at the dynamics of the bi-fan motif, using systems of ordinary differential equations to model the populations of transcription factors, mRNA and protein, with the aim of extending our understanding of what appear to be important building blocks of gene network structure. RESULTS: We develop an ordinary differential equation model of the bi-fan motif and analyse variants of the motif corresponding to its behaviour under various conditions. In particular, we examine the effects of different steady and pulsed inputs to five variants of the bifan motif, based on evidence in the literature of bifan motifs found in Saccharomyces cerevisiae (commonly known as baker's yeast). Using this model, we characterize the dynamical behaviour of the bi-fan motif for a wide range of biologically plausible parameters and configurations. We find that there is no characteristic behaviour for the motif, and with the correct choice of parameters and of internal structure, very different, indeed even opposite behaviours may be obtained. CONCLUSION: Even with this relatively simple model, the bi-fan motif can exhibit a wide range of dynamical responses. This suggests that it is difficult to gain significant insights into biological function simply by considering the connection architecture of a gene network, or its decomposition into simple structural motifs. It is necessary to supplement such structural information by kinetic parameters, or dynamic time series experimental data, both of which are currently difficult to obtain.
  • Item
    Comparative analysis of the Saccharomyces cerevisiae and Caenorhabditis elegans protein interaction networks.
    Agrafioti, I ; Swire, J ; Abbott, J ; Huntley, D ; Butcher, S ; Stumpf, MPH (Springer Science and Business Media LLC, 2005-03-18)
    BACKGROUND: Protein interaction networks aim to summarize the complex interplay of proteins in an organism. Early studies suggested that the position of a protein in the network determines its evolutionary rate but there has been considerable disagreement as to what extent other factors, such as protein abundance, modify this reported dependence. RESULTS: We compare the genomes of Saccharomyces cerevisiae and Caenorhabditis elegans with those of closely related species to elucidate the recent evolutionary history of their respective protein interaction networks. Interaction and expression data are studied in the light of a detailed phylogenetic analysis. The underlying network structure is incorporated explicitly into the statistical analysis. The increased phylogenetic resolution, paired with high-quality interaction data, allows us to resolve the way in which protein interaction network structure and abundance of proteins affect the evolutionary rate. We find that expression levels are better predictors of the evolutionary rate than a protein's connectivity. Detailed analysis of the two organisms also shows that the evolutionary rates of interacting proteins are not sufficiently similar to be mutually predictive. CONCLUSION: It appears that meaningful inferences about the evolution of protein interaction networks require comparative analysis of reasonably closely related species. The signature of protein evolution is shaped by a protein's abundance in the organism and its function and the biological process it is involved in. Its position in the interaction networks and its connectivity may modulate this but they appear to have only minor influence on a protein's evolutionary rate.
  • Item
    Generating confidence intervals on biological networks.
    Thorne, T ; Stumpf, MPH (Springer Science and Business Media LLC, 2007-11-30)
    BACKGROUND: In the analysis of networks we frequently require the statistical significance of some network statistic, such as measures of similarity for the properties of interacting nodes. The structure of the network may introduce dependencies among the nodes and it will in general be necessary to account for these dependencies in the statistical analysis. To this end we require some form of Null model of the network: generally rewired replicates of the network are generated which preserve only the degree (number of interactions) of each node. We show that this can fail to capture important features of network structure, and may result in unrealistic significance levels, when potentially confounding additional information is available. METHODS: We present a new network resampling Null model which takes into account the degree sequence as well as available biological annotations. Using gene ontology information as an illustration we show how this information can be accounted for in the resampling approach, and the impact such information has on the assessment of statistical significance of correlations and motif-abundances in the Saccharomyces cerevisiae protein interaction network. An algorithm, GOcardShuffle, is introduced to allow for the efficient construction of an improved Null model for network data. RESULTS: We use the protein interaction network of S. cerevisiae; correlations between the evolutionary rates and expression levels of interacting proteins and their statistical significance were assessed for Null models which condition on different aspects of the available data. The novel GOcardShuffle approach results in a Null model for annotated network data which appears better to describe the properties of real biological networks. 
    CONCLUSION: An improved statistical approach for the analysis of biological network data, which conditions on the available biological information, leads to qualitatively different results compared to approaches which ignore such annotations. In particular we demonstrate that the effects of the biological organization of the network can be sufficient to explain the observed similarity of interacting proteins.
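
    Illustrative note on the last abstract above: the baseline Null model it describes (rewired replicates that preserve only the degree of each node) can be sketched with repeated double-edge swaps. This is a minimal sketch under stated assumptions, not the authors' code and not the GOcardShuffle algorithm; the function names, the toy edge list and the swap count are all hypothetical, and the input is assumed to be a simple undirected graph with no self-loops or duplicate edges.

```python
import random
from collections import Counter


def degree_sequence(edges):
    """Return a Counter mapping each node to its number of interactions."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def rewire_preserving_degree(edges, n_swaps, seed=0):
    """Hypothetical helper: degree-preserving rewiring via double-edge swaps.

    Picks two edges (a, b) and (c, d) and proposes (a, d) and (c, b).
    Proposals that would create a self-loop or a duplicate edge are rejected,
    so every node keeps exactly its original degree.
    """
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # guard against graphs with few valid swaps
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:  # consider both orientations of the second edge
            c, d = d, c
        if a == d or c == b:  # would create a self-loop
            continue
        new1 = tuple(sorted((a, d)))
        new2 = tuple(sorted((c, b)))
        if new1 in edge_set or new2 in edge_set or new1 == new2:
            continue  # would create a duplicate edge
        edge_set.discard(edge_list[i])
        edge_set.discard(edge_list[j])
        edge_set.add(new1)
        edge_set.add(new2)
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_list


# Toy network (hypothetical data): one rewired replicate.
toy_edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (1, 4), (2, 5)]
replicate = rewire_preserving_degree(toy_edges, n_swaps=20, seed=1)
```

    A network statistic computed on many such replicates gives a reference distribution for the observed value. As the abstract argues, a Null model of this kind conditions only on degree, so it may misstate significance when annotations such as gene ontology terms are available.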