BIOADI: a machine learning approach to identifying abbreviations and definitions in biological literature
Author: Kuo, C-J; Ling, MHT; Lin, K-T; Hsu, C-N
Source Title: BMC BIOINFORMATICS
University of Melbourne Author/s: LING, HAN
Document Type: Journal Article
Citations: Kuo, C. -J., Ling, M. H. T., Lin, K. -T. & Hsu, C. -N. (2009). BIOADI: a machine learning approach to identifying abbreviations and definitions in biological literature. BMC BIOINFORMATICS, 10 (SUPPL. 15), https://doi.org/10.1186/1471-2105-10-S15-S7.
Access Status: Open Access
BACKGROUND: Text mining tools are becoming essential for automatically processing large quantities of biological literature for knowledge discovery and information curation. Abbreviation recognition is related to named entity recognition (NER). It can be treated as a pair recognition task: finding a term and its corresponding abbreviation in free text. Correctly identifying an abbreviation and its definition is a prerequisite for indexing the terms of text databases so that articles of related interest can be retrieved. It is also a building block for improving existing gene mention tagging and gene normalization tools. RESULTS: Our approach to abbreviation recognition (AR) is based on machine learning and uses a novel set of rich features to learn rules from training data. On the AB3P corpus, our system achieved an F-score of 89.90% with 95.86% precision at 84.64% recall. This is higher than the result of the best-performing existing AR system. We also annotated a new corpus of 1200 PubMed abstracts derived from the BioCreative II gene normalization corpus. On this corpus, our system achieved an F-score of 86.20% with 93.52% precision at 79.95% recall, again outperforming all tested systems. CONCLUSION: We constructed BIOADI by applying our system to extract all short form-long form pairs from all available PubMed abstracts. Mining BIOADI reveals many interesting trends in biomedical research. We also provide offline AR software in the download section at http://bioagent.iis.sinica.edu.tw/BIOADI/.
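The abstract reports each result as precision, recall and F-score. The short Python sketch below shows how those three figures relate. It assumes the reported F-score is the balanced F1 measure (beta = 1), which is the harmonic mean of precision and recall; the abstract itself does not state the weighting. The function name `f_score` is a hypothetical helper, not part of the BIOADI software.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta).

    Assumption: beta = 1 (balanced F1) matches the abstract's "F-score".
    Inputs may be fractions or percentages; the output uses the same scale.
    """
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when nothing is correct
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision/recall pairs (in percent) as reported in the abstract.
results = [
    ("AB3P corpus", 95.86, 84.64),
    ("new 1200-abstract corpus", 93.52, 79.95),
]
for corpus, p, r in results:
    print(f"{corpus}: F = {f_score(p, r):.2f}%")
```

Under the F1 assumption, this reproduces the reported F-scores of 89.90% and 86.20%. Because a harmonic mean is pulled toward the smaller of its two inputs, the comparatively lower recall accounts for most of the gap between these F-scores and the high precision figures.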