dc.contributor.author: Nagel, K
dc.contributor.author: Jimeno-Yepes, A
dc.contributor.author: Rebholz-Schuhmann, D
dc.date.accessioned: 2021-02-04T00:03:38Z
dc.date.available: 2021-02-04T00:03:38Z
dc.date.issued: 2009-01-01
dc.identifier: pii: 1471-2105-10-S8-S4
dc.identifier.citation: Nagel, K., Jimeno-Yepes, A. & Rebholz-Schuhmann, D. (2009). Annotation of protein residues based on a literature analysis: cross-validation against UniProtKb. BMC BIOINFORMATICS, 10 (SUPPL. 8), https://doi.org/10.1186/1471-2105-10-S8-S4.
dc.identifier.issn: 1471-2105
dc.identifier.uri: http://hdl.handle.net/11343/259021
dc.description.abstract: BACKGROUND: A protein annotation database, such as the Universal Protein Resource knowledge base (UniProtKb), is a valuable resource for the validation and interpretation of predicted 3D structure patterns in proteins. Existing studies have focussed on point mutation extraction methods from biomedical literature which can be used to support the time consuming work of manual database curation. However, these methods were limited to point mutation extraction and do not extract features for the annotation of proteins at the residue level. RESULTS: This work introduces a system that identifies protein residues in MEDLINE abstracts and annotates them with features extracted from the context written in the surrounding text. MEDLINE abstract texts have been processed to identify protein mentions in combination with taxonomic species and protein residues (F1-measure 0.52). The identified protein-species-residue triplets have been validated and benchmarked against reference data resources (UniProtKb, average F1-measure of 0.54). Then, contextual features were extracted through shallow and deep parsing and the features have been classified into predefined categories (F1-measure ranges from 0.15 to 0.67). Furthermore, the feature sets have been aligned with annotation types in UniProtKb to assess the relevance of the annotations for ongoing curation projects. Altogether, the annotations have been assessed automatically and manually against reference data resources. CONCLUSION: This work proposes a solution for the automatic extraction of functional annotation for protein residues from biomedical articles. The presented approach is an extension to other existing systems in that a wider range of residue entities are considered and that features of residues are extracted as annotations.
dc.language: English
dc.publisher: BMC
dc.rights.uri: https://creativecommons.org/licenses/by/4.0
dc.title: Annotation of protein residues based on a literature analysis: cross-validation against UniProtKb
dc.type: Journal Article
dc.identifier.doi: 10.1186/1471-2105-10-S8-S4
melbourne.affiliation.department: Computing and Information Systems
melbourne.affiliation.faculty: Engineering and Information Technology
melbourne.source.title: BMC Bioinformatics
melbourne.source.volume: 10
melbourne.source.issue: SUPPL. 8
melbourne.source.pages: S4-
dc.rights.license: CC BY
melbourne.elementsid: 1197670
melbourne.openaccess.pmc: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2745586
melbourne.contributor.author: Jimeno Yepes, Antonio
dc.identifier.eissn: 1471-2105
melbourne.accessrights: Open Access

