Annotation of protein residues based on a literature analysis: cross-validation against UniProtKb
Author: Nagel, K; Jimeno-Yepes, A; Rebholz-Schuhmann, D
Source Title: BMC Bioinformatics
University of Melbourne Author/s: Jimeno Yepes, Antonio
Affiliation: Computing and Information Systems
Document Type: Journal Article
Citations: Nagel, K., Jimeno-Yepes, A. & Rebholz-Schuhmann, D. (2009). Annotation of protein residues based on a literature analysis: cross-validation against UniProtKb. BMC BIOINFORMATICS, 10 (SUPPL. 8), https://doi.org/10.1186/1471-2105-10-S8-S4.
Access Status: Open Access
Open Access at PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2745586
BACKGROUND: A protein annotation database, such as the Universal Protein Resource knowledge base (UniProtKb), is a valuable resource for the validation and interpretation of predicted 3D structure patterns in proteins. Existing studies have focussed on methods that extract point mutations from the biomedical literature, which can support the time-consuming work of manual database curation. However, these methods are limited to point mutations and do not extract features for the annotation of proteins at the residue level.
RESULTS: This work introduces a system that identifies protein residues in MEDLINE abstracts and annotates them with features extracted from the surrounding text. MEDLINE abstracts were processed to identify protein mentions in combination with taxonomic species and protein residues (F1-measure 0.52). The identified protein-species-residue triplets were validated and benchmarked against reference data resources (UniProtKb, average F1-measure of 0.54). Contextual features were then extracted through shallow and deep parsing and classified into predefined categories (F1-measure ranging from 0.15 to 0.67). Furthermore, the feature sets were aligned with annotation types in UniProtKb to assess the relevance of the annotations for ongoing curation projects. Altogether, the annotations were assessed both automatically and manually against reference data resources.
CONCLUSION: This work proposes a solution for the automatic extraction of functional annotation for protein residues from biomedical articles. The presented approach extends existing systems in that a wider range of residue entities is considered and features of residues are extracted as annotations.
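To make the residue identification step and the F1-measure reported in the abstract more concrete, here is a minimal Python sketch. It is not the authors' system: the regular expression, the function names `find_residues` and `f1_score`, and the sample sentence and gold set are all hypothetical and chosen only for illustration. It assumes residues are written as a three-letter amino acid code followed by a position (for example "Ser-195" or "His57"), which covers only a small part of the mention variety a real system would need to handle.

```python
import re

# Three-letter amino acid codes mapped to one-letter codes (standard 20).
AA3 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Hypothetical pattern: code, optional hyphen or space, then the position.
RESIDUE_RE = re.compile(r"\b(" + "|".join(AA3) + r")[- ]?(\d+)\b")


def find_residues(text):
    """Return the set of (one-letter code, position) pairs found in text."""
    return {(AA3[m.group(1)], int(m.group(2)))
            for m in RESIDUE_RE.finditer(text)}


def f1_score(predicted, gold):
    """Harmonic mean of precision and recall over two sets of mentions."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Illustrative sentence and gold annotations (invented for this sketch).
sentence = ("Mutation of Ser-195 and His57 abolished activity, "
            "whereas Asp 102 was unaffected.")
predicted = find_residues(sentence)
gold = {("S", 195), ("H", 57), ("G", 193)}
score = f1_score(predicted, gold)
```

In this invented example two of the three predicted residues appear in the gold set, so precision and recall are both 2/3. Benchmarking against a resource such as UniProtKb would additionally require linking each residue to a protein and species, as the protein-species-residue triplets in the abstract indicate.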