Computing and Information Systems - Theses

Permanent URI for this collection

Search Results

Now showing 1 - 2 of 2
  • Item
    Thumbnail Image
    Improving the efficiency and capabilities of document structuring
    MARSHALL, ROBERT ( 2007)
    Natural language generation (NLG), the problem of creating human-readable documents by computer, is one of the major fields of research in computational linguistics The task of creating a document is extremely common in many fields of activity. Accordingly, there are many potential applications for NLG - almost any document creation task could potentially be automated by an NLG system. Advanced forms of NLG could also be used to generate a document in multiple languages, or as an output interface for other programs, which might ordinarily produce a less-manageable collection of data. They may also be able to create documents tailored to the needs of individual users. This thesis deals with document structure, a recent theory which describes those aspects of a document’s layout which affect its meaning. As well as its theoretical interest, it is a useful intermediate representation in the process of NLG. There is a well-defined process for generating a document structure using constraint programming. We show how this process can be made considerably more efficient. This in turn allows us to extend the document structuring task to allow for summarisation and finer control of the document layout. This thesis is organised as follows. Firstly, we review the necessary background material in both natural language processing and constraint programming.
  • Item
    Thumbnail Image
    The effects of sampling and semantic categories on large-scale supervised relation extraction
    Willy ( 2012)
    The purpose of relation extraction is to identify novel pairs of entities which are related by a pre-specified relation such as hypernym or synonym. The traditional approach to relation extraction is to building a dedicated system for a particular relation, meaning that significant effort is required to repurpose the approach to new relations. We propose a generic approach based on supervised learning, which provides a standardised process for performing relation extraction on different relations and domains. We explore the feasibility of the approach over a range of relations and corpora, focusing particularly on the development of a realistic evaluation methodology for relation extraction. In addition to this, we investigate the impact of semantic categories on extraction effectiveness.