
    Data-poor categorization and passage retrieval for gene ontology annotation in Swiss-Prot.

    Author
    Ehrler, F; Geissbühler, A; Jimeno, A; Ruch, P
    Date
    2005
    Source Title
    BMC Bioinformatics
    Publisher
    Springer Science and Business Media LLC
    University of Melbourne Author/s
    Jimeno Yepes, Antonio
    Affiliation
    Computing and Information Systems
    Metadata
    Show full item record
    Document Type
    Journal Article
    Citations
    Ehrler, F., Geissbühler, A., Jimeno, A. & Ruch, P. (2005). Data-poor categorization and passage retrieval for gene ontology annotation in Swiss-Prot. BMC Bioinformatics, 6 Suppl 1 (SUPPL.1), pp.S23-. https://doi.org/10.1186/1471-2105-6-S1-S23.
    Access Status
    Open Access
    URI
    http://hdl.handle.net/11343/259050
    DOI
    10.1186/1471-2105-6-S1-S23
    Open Access at PMC
    http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1869016
    Abstract
    BACKGROUND: In the context of the BioCreative competition, where training data were very sparse, we investigated two complementary tasks: 1) given a Swiss-Prot triplet, containing a protein, a GO (Gene Ontology) term and a relevant article, extraction of a short passage that justifies the GO category assignment; 2) given a Swiss-Prot pair, containing a protein and a relevant article, automatic assignment of a set of categories. METHODS: The sentence is the basic retrieval unit. Our classifier computes a distance between each sentence and the GO category provided with the Swiss-Prot entry. The Text Categorizer computes a distance between each GO term and the text of the article. Evaluations are reported both based on annotator judgements as established by the competition and based on mean average precision measures computed using a curated sample of Swiss-Prot. RESULTS: Our system achieved the best recall and precision combination both for passage retrieval and text categorization as evaluated by official evaluators. However, text categorization results were far below those in other data-poor text categorization experiments. The top proposed term is relevant in less than 20% of cases, whereas with other biomedical controlled vocabularies, such as the Medical Subject Headings, we achieved more than 90% precision. We also observe that the scoring methods used in our experiments, based on the retrieval status value of our engines, exhibit effective confidence estimation capabilities. CONCLUSION: From a comparative perspective, the combination of retrieval and natural language processing methods we designed achieved very competitive performance. Largely data-independent, our systems were no less effective than data-intensive approaches. These results suggest that the overall strategy could benefit a large class of information extraction tasks, especially when training data are missing. However, from a user perspective, results were disappointing. Further investigations are needed to design applicable end-user text mining tools for biologists.
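
The two ideas named in the METHODS part of the abstract, scoring each sentence against a GO term and evaluating rankings by mean average precision, can be illustrated with a minimal sketch. The abstract does not say which distance the authors used, so this sketch substitutes a plain bag-of-words cosine similarity as a stand-in. All function names (`tokenize`, `rank_sentences`, `average_precision`, `mean_average_precision`) are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sentences(go_term, sentences):
    """Return (score, sentence) pairs, most similar to the GO term first.

    Assumption: cosine similarity stands in for the unspecified
    sentence-to-category distance mentioned in the abstract.
    """
    query = Counter(tokenize(go_term))
    scored = [(cosine_similarity(query, Counter(tokenize(s))), s)
              for s in sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def average_precision(ranked, relevant):
    """Average of precision@k taken at each rank k that holds a relevant item."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_items, relevant_set) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Calling `rank_sentences` with a GO term label and the sentences of an article returns the candidate passages in descending score order. `mean_average_precision` then summarizes how well such rankings, or ranked lists of proposed GO terms, agree with a curated set of relevant items.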
