Biochemistry and Pharmacology - Research Publications

  • Item
    A Comprehensive Bioinformatics Analysis of the Nudix Superfamily in Arabidopsis thaliana
    Gunawardana, D ; Likic, VA ; Gayler, KR (HINDAWI LTD, 2009)
    Nudix enzymes are a superfamily with a conserved common reaction mechanism that provides the capacity for the hydrolysis of a broad spectrum of metabolites. We used hidden Markov models based on Nudix sequences from the PFAM and PROSITE databases to identify Nudix hydrolases encoded by the Arabidopsis genome. Twenty-five Nudix hydrolases were identified and classified into 11 individual families by pairwise sequence alignments. Intron phases were strikingly conserved in each family. Phylogenetic analysis showed that all multimember families formed monophyletic clusters. Conserved familial sequence motifs were identified with the MEME motif analysis algorithm. One motif (motif 4) was found in three diverse families. All proteins containing motif 4 demonstrated a degree of preference for substrates containing an ADP moiety. We conclude that HMM-based genome scanning and MEME motif analysis can significantly improve, respectively, the identification and the functional assignment of new members of this mechanistically diverse protein superfamily.
  • Item
    Extraction of pure components from overlapped signals in gas chromatography-mass spectrometry (GC-MS)
    Likic, VA (BMC, 2009)
    Gas chromatography-mass spectrometry (GC-MS) is a widely used analytical technique for the identification and quantification of trace chemicals in complex mixtures. When complex samples are analyzed by GC-MS it is common to observe co-elution of two or more components, resulting in an overlap of signal peaks observed in the total ion chromatogram. In such situations manual signal analysis is often the most reliable means for the extraction of pure component signals; however, a systematic manual analysis over a number of samples is both tedious and prone to error. In the past 30 years a number of computational approaches have been proposed to assist in the extraction of pure signals from co-eluting GC-MS components. These include empirical methods, comparison with library spectra, eigenvalue analysis, regression and others. However, to date no approach has been recognized as best, nor accepted as standard. This situation hampers general GC-MS capabilities, and in particular has implications for the development of robust, high-throughput GC-MS analytical protocols required in metabolic profiling and biomarker discovery. Here we first discuss the nature of GC-MS data, and then review some of the approaches proposed for the extraction of pure signals from co-eluting components. We summarize and classify different approaches to this problem, and examine why so many approaches proposed in the past have failed to live up to their full promise. Finally, we give some thoughts on future developments in this field, and suggest that the progress in general computing capabilities attained in the past two decades has opened new horizons for tackling this important problem.
  • Item
    LeishCyc: a biochemical pathways database for Leishmania major
    Doyle, MA ; MacRae, JI ; De Souza, DP ; Saunders, EC ; McConville, MJ ; Likic, VA (BMC, 2009-06-05)
    BACKGROUND: Leishmania spp. are sandfly-transmitted protozoan parasites that cause a spectrum of diseases in more than 12 million people worldwide. Much research is now focusing on how these parasites adapt to the distinct nutrient environments they encounter in the digestive tract of the sandfly vector and the phagolysosome compartment of mammalian macrophages. While data mining and annotation of the genomes of three Leishmania species have provided an initial inventory of predicted metabolic components and associated pathways, resources for integrating this information into metabolic networks and incorporating data from transcript, protein, and metabolite profiling studies are currently lacking. The development of a reliable, expertly curated, and widely available model of Leishmania metabolic networks is required to facilitate systems analysis, as well as discovery and prioritization of new drug targets for this important human pathogen. DESCRIPTION: The LeishCyc database was initially built from the genome sequence of Leishmania major (v5.2), based on the annotation published by the Wellcome Trust Sanger Institute. LeishCyc was manually curated to remove errors, correct automated predictions, and add information from the literature. The ongoing curation is based on public sources, literature searches, and our own experimental and bioinformatics studies. In a number of instances we have improved on the original genome annotation, and, in some ambiguous cases, collected relevant information from the literature in order to help clarify gene or protein annotation in the future. All genes in LeishCyc are linked to the corresponding entry in GeneDB (Wellcome Trust Sanger Institute). CONCLUSION: The LeishCyc database describes Leishmania major genes, gene products, metabolites, their relationships and biochemical organization into metabolic pathways. LeishCyc provides a systematic approach to organizing the evolving information about Leishmania biochemical networks and is a tool for analysis, interpretation, and visualization of Leishmania omics data (transcriptomics, proteomics, metabolomics) in the context of metabolic pathways. LeishCyc is the first such database for the Trypanosomatidae family, which includes a number of other important human parasites. Flexible query/visualization capabilities are provided by the Pathway Tools software and its Web interface. The LeishCyc database is made freely available over the Internet at http://www.leishcyc.org.
  • Item
    Protein secretion and outer membrane assembly in Alphaproteobacteria
    Gatsos, X ; Perry, AJ ; Anwari, K ; Dolezal, P ; Wolynec, PP ; Likic, VA ; Purcell, AW ; Buchanan, SK ; Lithgow, T (OXFORD UNIV PRESS, 2008-11)
    The assembly of beta-barrel proteins into membranes is a fundamental process that is essential in Gram-negative bacteria, mitochondria and plastids. Our understanding of the mechanism of beta-barrel assembly is progressing from studies carried out in Escherichia coli and Neisseria meningitidis. Comparative sequence analysis suggests that while many components mediating beta-barrel protein assembly are conserved in all groups of bacteria with outer membranes, some components are notably absent. The Alphaproteobacteria in particular seem prone to gene loss and show the presence or absence of specific components mediating the assembly of beta-barrels: some components of the pathway appear to be missing from whole groups of bacteria (e.g. Skp, YfgL and NlpB), while other proteins are conserved but are missing characteristic domains (e.g. SurA). This comparative analysis is also revealing important structural signatures that are vague unless multiple members from a protein family are considered as a group (e.g. tetratricopeptide repeat (TPR) motifs in YfiO, beta-propeller signatures in YfgL). Given that the process of beta-barrel assembly is conserved, analysis of outer membrane biogenesis in Alphaproteobacteria, the bacterial group that gave rise to mitochondria, also promises insight into the assembly of beta-barrel proteins in eukaryotes.
  • Item
    A dynamic programming approach for the alignment of signal peaks in multiple gas chromatography-mass spectrometry experiments
    Robinson, MD ; De Souza, DP ; Keen, WW ; Saunders, EC ; McConville, MJ ; Speed, TP ; Likic, VA (BMC, 2007-10-29)
    BACKGROUND: Gas chromatography-mass spectrometry (GC-MS) is a robust platform for the profiling of certain classes of small molecules in biological samples. When multiple samples are profiled, including replicates of the same sample and/or different sample states, one needs to account for retention time drifts between experiments. This can be achieved either by the alignment of chromatographic profiles prior to peak detection, or by matching signal peaks after they have been extracted from chromatogram data matrices. Automated retention time correction is particularly important in non-targeted profiling studies. RESULTS: A new approach for matching signal peaks based on dynamic programming is presented. The proposed approach relies on both peak retention times and mass spectra. The alignment of more than two peak lists involves three steps: (1) all possible pairs of peak lists are aligned, and the similarity of each pair of peak lists is estimated; (2) a guide tree is built based on the similarity between the peak lists; (3) peak lists are progressively aligned starting with the two most similar peak lists, following the guide tree until all peak lists are exhausted. When two or more experiments are performed on different sample states and each consists of multiple replicates, peak lists within each set of replicate experiments are aligned first (within-state alignment), and subsequently the resulting alignments are themselves aligned (between-state alignment). When more than two sets of replicate experiments are present, the between-state alignment also employs the guide tree. We demonstrate the usefulness of this approach on GC-MS metabolic profiling experiments acquired on wild-type and mutant Leishmania mexicana parasites. CONCLUSION: We propose a progressive method to match signal peaks across multiple GC-MS experiments based on dynamic programming. A sensitive peak similarity function is proposed to balance peak retention time and peak mass spectra similarities. This approach can produce the optimal alignment between an arbitrary number of peak lists, and explicitly models within-state and between-state peak alignment. The accuracy of the proposed method was close to the accuracy of manually curated peak matching, which required tens of man-hours for the analyzed data sets. The proposed approach may offer significant advantages for processing of high-throughput metabolomics data, especially when large numbers of experimental replicates and multiple sample states are analyzed.
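
The pairwise step described in this abstract, matching two peak lists by dynamic programming on a score that combines retention time and mass-spectral similarity, can be illustrated with a minimal sketch. This is not the authors' implementation: the similarity function (cosine similarity of spectra multiplied by a Gaussian retention-time weight), the parameter names `rt_tol` and `gap`, and their default values are all assumptions chosen for illustration. The guide tree and progressive multi-list alignment are not shown.

```python
import math


def peak_similarity(p, q, rt_tol=2.0):
    """Hypothetical score in [0, 1] for two peaks given as (retention_time, spectrum).

    Assumption: cosine similarity of the spectra, down-weighted by a Gaussian
    in the retention-time difference. The published function may differ.
    """
    rt_p, spec_p = p
    rt_q, spec_q = q
    dot = sum(a * b for a, b in zip(spec_p, spec_q))
    norm = math.sqrt(sum(a * a for a in spec_p)) * math.sqrt(sum(b * b for b in spec_q))
    cosine = dot / norm if norm else 0.0
    rt_weight = math.exp(-((rt_p - rt_q) ** 2) / (2.0 * rt_tol ** 2))
    return cosine * rt_weight


def align_peak_lists(a, b, gap=0.3, rt_tol=2.0):
    """Globally align two retention-time-ordered peak lists.

    Returns (i, j) index pairs; None marks a peak with no partner in the other list.
    Matching costs 1 - similarity, leaving a peak unmatched costs `gap`.
    """
    n, m = len(a), len(b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]  # traceback pointers
    for i in range(1, n + 1):
        cost[i][0], move[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        cost[0][j], move[0][j] = j * gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (cost[i - 1][j - 1] + 1.0 - peak_similarity(a[i - 1], b[j - 1], rt_tol), "diag"),
                (cost[i - 1][j] + gap, "up"),
                (cost[i][j - 1] + gap, "left"),
            ]
            cost[i][j], move[i][j] = min(options, key=lambda t: t[0])
    # Walk the pointers back from the bottom-right corner.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "up":
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


# Toy data: three peaks in one run, two in the other (spectra are made up).
run_a = [(10.0, [1, 0, 0]), (20.0, [0, 1, 0]), (30.0, [0, 0, 1])]
run_b = [(10.5, [1, 0, 0]), (30.2, [0, 0, 1])]
print(align_peak_lists(run_a, run_b))
```

On the toy data the first and last peaks of `run_a` pair with the two peaks of `run_b`, and the middle peak is left unmatched. With these assumed settings a match is preferred over two gaps whenever the similarity exceeds 1 - 2 × `gap`, so `gap` acts as the knob that trades missed matches against false ones.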