Coreference resolution for biomedical pathway data
AuthorChoi, Miji Jooyoung
AffiliationComputing and Information Systems
Document TypePhD thesis
Access StatusOpen Access
© 2017 Dr Miji Jooyung Choi
The study of biological pathways is a major activity in the life sciences. Biological pathways provide understanding and interpretation of many different kinds of biological mechanisms such as metabolism, sending of signals between cells, regulation of gene expression, and production of cells. If there are defects in a pathway, the result may be a disease. Thus, biological pathways are used to support diagnosis of disease, more effective drug prescription, or personalised treatments. Even though there are many pathway resources providing useful information discovered with manual efforts, a great deal of relevant information concerning in such pathways is scattered through the vast biomedical literature. With the growth in the volume of the biomedical literature, many natural language processing methods for automatic information extraction have been studied, but there still exist a variety of challenges such as complex or hidden representations due to the use of coreference expressions in texts. Linguistic expressions such as it, they, or the gene are frequently used by authors to avoid repeating the names of entities or repeating complex descriptions that have previously been introduced in the same text. This thesis addresses three research goals: (1) examining whether an existing coreference resolution approach in the general domain can be adapted to the biomedical domain; (2) investigation of a heuristic strategy for coreference resolution in the biomedical literature; and (3) examining how coreference resolution can improve biological pathway data from the perspectives of information extraction, and of evaluation of existing pathway resources. In this thesis, we propose a new categorical framework that provides detailed analysis of performance of coreference resolution systems, based on analysis of syntactic and semantic characteristics of coreference relations in the biomedical domain. The framework not only can identify weaknesses of existing approaches, but also can provide insights into strategies for further improvement. We propose an approach to biomedical domain-specific coreference resolution that combines a set of syntactically and semantically motivated rules in terms of coreference type. Finally, we demonstrate that coreference resolution is a valuable process for pathway information discovery, through case studies. Our results show that an approach incorporating a coreference resolution process significantly improves information extraction performance.
Keywordsnatural language processing; biomedical text mining; coreference resolution; pathway information extraction
- Click on "Export Reference in RIS Format" and choose "open with... Endnote".
- Click on "Export Reference in RIS Format". Login to Refworks, go to References => Import References