Computing and Information Systems

Binary relation extraction is an essential component of information extraction systems, wherein the aim is to extract meaningful relations that might exist between a pair of entities within a sentence. Binary relation extraction systems have witnessed a significant improvement over past three decades, ranging from rule-based systems to statistical natural language techniques including supervised, semi-supervised and unsupervised machine learning approaches. Modern question answering and summarization systems have motivated the need for extracting complex relations wherein the number of related entities is more than two. Complex relation extraction (CRE) systems are highly domain specific and often rely on traditional binary relation extraction techniques employed in a pipeline fashion, thus susceptible to processing-induced error propagation. In this thesis, we investigate and develop approaches to extract complex relations directly from natural language text. In particular, we deviate from the traditional disintegration of complex relations into constituent binary relations and propose usage of shortest dependency parse spanning the n related entities as an alternative to facilitate direct CRE. We investigate this proposed approach by a comprehensive study of supervised learning algorithms with a special focus on training support vector machines, convolutional neural networks and deep learning ensemble algorithms. Research in the domain of CRE is stymied by paucity of annotated data. To facilitate future exploration, we create two new datasets to evaluate our proposed CRE approaches on a pilot biographical fact extraction task. An evaluation of results on new and standard datasets concludes that usage of shortest path dependency parse in a supervised setting enables direct CRE with an improved accuracy, beating current state-of-the-art CRE systems. We further show the application of CRE to achieve state-of-the-art performance for directly extracting events without the need of disintegrating them into event trigger and event argument extraction processes.

The purpose of relation extraction is to identify novel pairs of entities which are related by a pre-specified relation such as hypernym or synonym. The traditional approach to relation extraction is to building a dedicated system for a particular relation, meaning that significant effort is required to repurpose the approach to new relations. We propose a generic approach based on supervised learning, which provides a standardised process for performing relation extraction on different relations and domains. We explore the feasibility of the approach over a range of relations and corpora, focusing particularly on the development of a realistic evaluation methodology for relation extraction. In addition to this, we investigate the impact of semantic categories on extraction effectiveness.

Computing and Information Systems - Theses

Permanent URI for this collection

Filters

Date

Author

Subject

Type

Settings

Sort By

Results per page

Statistics

Citations

Search Results