
dc.contributor.author: Jimeno-Yepes, AJ
dc.contributor.author: Sticco, JC
dc.contributor.author: Mork, JG
dc.contributor.author: Aronson, AR
dc.date.accessioned: 2021-02-04T00:10:26Z
dc.date.available: 2021-02-04T00:10:26Z
dc.date.issued: 2013-05-31
dc.identifier: pii: 1471-2105-14-171
dc.identifier.citation: Jimeno-Yepes, A. J., Sticco, J. C., Mork, J. G. & Aronson, A. R. (2013). GeneRIF indexing: sentence selection based on machine learning. BMC BIOINFORMATICS, 14 (1), https://doi.org/10.1186/1471-2105-14-171.
dc.identifier.issn: 1471-2105
dc.identifier.uri: http://hdl.handle.net/11343/259046
dc.description.abstract: BACKGROUND: A Gene Reference Into Function (GeneRIF) describes novel functionality of genes. GeneRIFs are available from the National Center for Biotechnology Information (NCBI) Gene database. GeneRIF indexing is performed manually, and the intention of our work is to provide methods to support creating the GeneRIF entries. The creation of GeneRIF entries involves the identification of the genes mentioned in MEDLINE® citations and the sentences describing a novel function. RESULTS: We have compared several learning algorithms and several features extracted or derived from MEDLINE sentences to determine if a sentence should be selected for GeneRIF indexing. Features are derived from the sentences or using mechanisms to augment the information provided by them: assigning a discourse label using a previously trained model, for example. We show that machine learning approaches with specific feature combinations achieve results close to one of the annotators. We have evaluated different feature sets and learning algorithms. In particular, Naïve Bayes achieves better performance with a selection of features similar to one used in related work, which considers the location of the sentence, the discourse of the sentence and the functional terminology in it. CONCLUSIONS: The current performance is at a level similar to human annotation and it shows that machine learning can be used to automate the task of sentence selection for GeneRIF annotation. The current experiments are limited to the human species. We would like to see how the methodology can be extended to other species, specifically the normalization of gene mentions in other species.
dc.language: English
dc.publisher: BMC
dc.rights.uri: https://creativecommons.org/licenses/by/4.0
dc.title: GeneRIF indexing: sentence selection based on machine learning
dc.type: Journal Article
dc.identifier.doi: 10.1186/1471-2105-14-171
melbourne.affiliation.department: Computing and Information Systems
melbourne.affiliation.faculty: Engineering and Information Technology
melbourne.source.title: BMC Bioinformatics
melbourne.source.volume: 14
melbourne.source.issue: 1
dc.rights.license: CC BY
melbourne.elementsid: 1197635
melbourne.contributor.author: Jimeno Yepes, Antonio
dc.identifier.eissn: 1471-2105
melbourne.accessrights: Open Access

