University Library
A gateway to Melbourne's research publications
Minerva Access is the University's Institutional Repository. It aims to collect, preserve, and showcase the intellectual output of staff and students of the University of Melbourne for a global audience.

    Efficient large-scale protein sequence comparison and gene matching to identify orthologs and co-orthologs

    Published version (699.7Kb)

    Author
    Mahmood, K; Webb, GI; Song, J; Whisstock, JC; Konagurthu, AS
    Date
    2012-03-01
    Source Title
    Nucleic Acids Research
    Publisher
    OXFORD UNIV PRESS
    University of Melbourne Author/s
    Mahmood, Khalid
    Affiliation
    Medicine Dentistry & Health Sciences
    Document Type
    Journal Article
    Citations
    Mahmood, K., Webb, G. I., Song, J., Whisstock, J. C. & Konagurthu, A. S. (2012). Efficient large-scale protein sequence comparison and gene matching to identify orthologs and co-orthologs. Nucleic Acids Research, 40(6). https://doi.org/10.1093/nar/gkr1261
    Access Status
    Open Access
    URI
    http://hdl.handle.net/11343/253582
    DOI
    10.1093/nar/gkr1261
    Abstract
    Broadly, computational ortholog assignment is a three-step process: (i) identify all putative homologs between the genomes, (ii) identify gene anchors and (iii) link anchors to identify the best gene matches given their order and context. In this article, we engineer two methods to improve two important aspects of this pipeline [specifically steps (ii) and (iii)]. First, computing sequence similarity data [step (i)] is a computationally intensive task for large sequence sets, creating a bottleneck in the ortholog assignment pipeline. We have designed a fast and highly scalable sort-join method (afree) based on k-mer counts to rapidly compare all pairs of sequences in a large protein sequence set and identify putative homologs. Second, the availability of complex genomes containing large gene families, with a prevalence of complex evolutionary events such as duplications, has made the task of assigning orthologs and co-orthologs difficult. Here, we have developed an iterative graph matching strategy in which the best gene assignments are identified at each iteration, resulting in a set of orthologs and co-orthologs. We find that the afree algorithm is faster than existing methods and maintains high accuracy in identifying similar genes. The iterative graph matching strategy also showed high accuracy in identifying complex gene relationships. Standalone afree is available from http://vbc.med.monash.edu.au/~kmahmood/afree. EGM2, the complete ortholog assignment pipeline (including afree and the iterative graph matching method), is available from http://vbc.med.monash.edu.au/~kmahmood/EGM2.
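
    The abstract does not give implementation details for afree, so the following is only a minimal sketch of a generic sort-join over shared k-mers, not the authors' code. The function name `shared_kmer_counts`, the default `k=3`, and the toy sequences are all assumptions made for illustration. The idea is to emit one (k-mer, sequence id) record per distinct k-mer, sort the records so equal k-mers become adjacent, and then count how many k-mers each pair of sequences shares without comparing every pair directly.

```python
from collections import Counter
from itertools import combinations, groupby


def shared_kmer_counts(seqs, k=3):
    """Count distinct k-mers shared by each pair of sequences (hypothetical helper)."""
    # Emit one (kmer, seq_id) record per distinct k-mer in each sequence.
    records = []
    for seq_id, seq in seqs.items():
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        records.extend((kmer, seq_id) for kmer in kmers)

    # Sort step: records with the same k-mer become adjacent.
    records.sort()

    # Join step: every pair of sequences inside one k-mer group shares that k-mer.
    counts = Counter()
    for _, group in groupby(records, key=lambda r: r[0]):
        ids = [seq_id for _, seq_id in group]
        for a, b in combinations(ids, 2):
            counts[(a, b)] += 1
    return counts


# Toy input (assumed data, not from the article).
example = {"p1": "MKVLAAG", "p2": "MKVLSAG", "p3": "GGGGGGG"}
pair_counts = shared_kmer_counts(example)
```

    Pairs whose shared count exceeds some threshold would then be treated as putative homologs; how afree scores and filters pairs is not stated in the abstract, so that step is left out here.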


    Collections
    • Minerva Elements Records [45770]
    • Medicine, Dentistry & Health Sciences Collected Works - Research Publications [578]