Computing and Information Systems - Research Publications

Search Results

  • Item
    Automatic keyphrase extraction from scientific articles
    Kim, SN ; Medelyan, O ; Kan, M-Y ; Baldwin, T (SPRINGER, 2013-09)
  • Item
    Detecting modification of biomedical events using a deep parsing approach
    MacKinlay, A ; Martinez, D ; Baldwin, T (BMC, 2012-04-30)
    BACKGROUND: This work describes a system for identifying event mentions in bio-molecular research abstracts that are either speculative (e.g. analysis of IkappaBalpha phosphorylation, where it is not specified whether phosphorylation did or did not occur) or negated (e.g. inhibition of IkappaBalpha phosphorylation, where phosphorylation did not occur). The data comes from a standard dataset created for the BioNLP 2009 Shared Task. The system uses a machine-learning approach, where the features used for classification are a combination of shallow features derived from the words of the sentences and more complex features based on the semantic outputs produced by a deep parser. METHOD: To detect event modification, we use a Maximum Entropy learner with features extracted from the data relative to the trigger words of the events. The shallow features are bag-of-words features based on a small sliding context window of 3-4 tokens on either side of the trigger word. The deep parser features are derived from parses produced by the English Resource Grammar and the RASP parser. The outputs of these parsers are converted into the Minimal Recursion Semantics formalism, and from this, we extract features motivated by linguistics and the data itself. All of these features are combined to create training or test data for the machine learning algorithm. RESULTS: Over the test data, our methods produce approximately a 4% absolute increase in F-score for detection of event modification compared to a baseline based only on the shallow bag-of-words features. CONCLUSIONS: Our results indicate that grammar-based techniques can enhance the accuracy of methods for detecting event modification.
  • Item
    Word sense disambiguation for event trigger word detection in biomedicine
    Martinez, D ; Baldwin, T (BMC, 2011-03-29)
    This paper describes a method for detecting event trigger words in biomedical text based on a word sense disambiguation (WSD) approach. We first investigate the applicability of existing WSD techniques to trigger word disambiguation in the BioNLP 2009 shared task data, and find that we are able to outperform a traditional CRF-based approach for certain word types. On the basis of this finding, we combine the WSD approach with the CRF, and obtain significant improvements over the standalone CRF, gaining particularly in recall.
  • Item
    VetCompass Australia: A National Big Data Collection System for Veterinary Science
    McGreevy, P ; Thomson, P ; Dhand, NK ; Raubenheimer, D ; Masters, S ; Mansfield, CS ; Baldwin, T ; Magalhaes, RJS ; Rand, J ; Hill, P ; Peaston, A ; Gilkerson, J ; Combs, M ; Raidal, S ; Irwin, P ; Irons, P ; Squires, R ; Brodbelt, D ; Hammond, J (MDPI, 2017-10)
    VetCompass Australia is veterinary medical records-based research coordinated with the global VetCompass endeavor to maximize its quality and effectiveness for Australian companion animals (cats, dogs, and horses). Bringing together all seven Australian veterinary schools, it is the first nationwide surveillance system collating clinical records on companion-animal diseases and treatments. VetCompass data service collects and aggregates real-time, clinical records for researchers to interrogate, delivering sustainable and cost-effective access to data from hundreds of veterinary practitioners nationwide. Analysis of these clinical records will reveal geographical and temporal trends in the prevalence of inherited and acquired diseases, identify frequently prescribed treatments, revolutionize clinical auditing, help the veterinary profession to rank research priorities, and assure evidence-based companion-animal curricula in veterinary schools. VetCompass Australia will progress in three phases: (1) roll-out of the VetCompass platform to harvest Australian veterinary clinical record data; (2) development and enrichment of the coding (data-presentation) platform; and (3) creation of a world-first, real-time surveillance interface with natural language processing (NLP) technology. The first of these three phases is described in the current article. Advances in the collection and sharing of records from numerous practices will enable veterinary professionals to deliver a vastly improved level of care for companion animals that will improve their quality of life.
  • Item
    Take and Took, Gaggle and Goose, Book and Read: Evaluating the Utility of Vector Differences for Lexical Relation Learning
    Vylomova, E ; Rimell, L ; Cohn, T ; Baldwin, T ; Erk, K ; Smith, NA (The Association for Computational Linguistics, 2016)
    Recent work on word embeddings has shown that simple vector subtraction over pre-trained embeddings is surprisingly effective at capturing different lexical relations, despite lacking explicit supervision. Prior work has evaluated this intriguing result using a word analogy prediction formulation and hand-selected relations, but the generality of the finding over a broader range of lexical relation types and different learning settings has not been evaluated. In this paper, we carry out such an evaluation in two learning settings: (1) spectral clustering to induce word relations, and (2) supervised learning to classify vector differences into relation types. We find that word embeddings capture a surprising amount of information, and that, under suitable supervised training, vector subtraction generalises well to a broad range of relations, including over unseen lexical items.
  • Item
    Can machine translation systems be evaluated by the crowd alone
    Graham, Y ; Baldwin, T ; Moffat, A ; Zobel, J (CAMBRIDGE UNIV PRESS, 2017-01)
    Crowd-sourced assessments of machine translation quality allow evaluations to be carried out cheaply and on a large scale. It is essential, however, that the crowd's work be filtered to avoid contamination of results through the inclusion of false assessments. One method is to filter via agreement with experts, but even amongst experts agreement levels may not be high. In this paper, we present a new methodology for crowd-sourcing human assessments of translation quality, which allows individual workers to develop their own individual assessment strategy. Agreement with experts is no longer required, and a worker is deemed reliable if they are consistent relative to their own previous work. Individual translations are assessed in isolation from all others in the form of direct estimates of translation quality. This allows more meaningful statistics to be computed for systems and enables significance to be determined on smaller sets of assessments. We demonstrate the methodology's feasibility in large-scale human evaluation through replication of the human evaluation component of the Workshop on Statistical Machine Translation shared translation task for two language pairs, Spanish-to-English and English-to-Spanish. Results for measurement based solely on crowd-sourced assessments show system rankings in line with those of the original evaluation. Comparison of results produced by the relative preference approach and the direct estimate method described here demonstrates that the direct estimate method has a substantially increased ability to identify significant differences between translation systems.
  • Item
    Classifying dialogue acts in one-on-one live chats
    Kim, SN ; Cavedon, L ; Baldwin, T (The Association for Computational Linguistics, 2010-12-01)
  • Item
    Unsupervised parse selection for HPSG
    Dridan, R ; Baldwin, T (The Association for Computational Linguistics, 2010-12-01)
  • Item
    Best topic word selection for topic labelling
    Lau, JH ; Newman, D ; Karimi, S ; Baldwin, T (The Association for Computational Linguistics, 2010-12-01)
  • Item
    Word sense disambiguation for event trigger word detection
    Martinez, D ; Baldwin, T (ACM, 2010-12-01)