Computing and Information Systems - Research Publications

  • Item
    Towards a semantic lexicon for biological language processing
    Verspoor, K (HINDAWI LTD, 2005)
    This paper explores the use of the resources in the National Library of Medicine's Unified Medical Language System (UMLS) for the construction of a lexicon useful for processing texts in the field of molecular biology. A lexicon is constructed from overlapping terms in the UMLS SPECIALIST lexicon and the UMLS Metathesaurus to obtain both morphosyntactic and semantic information for terms, and the coverage of a domain corpus is assessed. Over 77% of tokens in the domain corpus are found in the constructed lexicon, validating the lexicon's coverage of the most frequent terms in the domain and indicating that the constructed lexicon is potentially an important resource for biological text processing.
  • Item
    Structuring Documents Efficiently
    MARSHALL, RGJ ; BIRD, SG ; STUCKEY, PJ (University of Sydney, 2005)
  • Item
    A classification-based framework for learning object assembly
    Farmer, R. A. ; Hughes, B. (IEEE Computer Society Press, 2005)
    Relations between learning outcomes and the learning objects which are assembled to facilitate their achievement are the subject of increasingly prevalent investigation, particularly with approaches which advocate the aggregation of learning objects as complex constituencies for achieving learning outcomes. From the perspective of situated learning, we show how the CASE framework imbues learning objects with a closed set of properties which can be classified and aggregated into learning object assemblies in a principled fashion. We argue that the computational and pedagogical tractability of this model provides a new insight into learning object evaluation, and hence learning outcomes.
  • Item
    NICTA i2d2 at GeoCLEF 2005
    HUGHES, BADEN (2005)
    This paper describes the participation of the Interactive Information Discovery and Delivery (i2d2) project of National ICT Australia (NICTA) in the GeoCLEF track of the Cross Language Evaluation Forum 2005. We present some background information about the NICTA i2d2 project to motivate our involvement, describing our systems and experimental interests. We review the design of our runs and the results of our submitted and subsequent experiments, and contribute a range of suggestions for future instantiations of a geospatial information retrieval track within a shared evaluation task framework.
  • Item
    TalkBank: Building an open unified multimodal database of communicative interaction
    MacWhinney, B ; Bird, S ; Cieri, C ; Martell, C (Evaluations and Language resources Distribution Agency, 2004-01-01)
  • Item
    LPath+: A first-order complete language for linguistic tree query
    Lai, C. ; Bird, S. G. (Academia Sinica, 2005)
    Annotated linguistic databases are widely used in linguistic research and in language technology development. These annotations are typically hierarchical, and represent the nested structure of syntactic and prosodic constituents. Recently, the LPath language has been proposed as a convenient path-based language for querying linguistic trees. We establish the formal expressiveness of LPath relative to the XPath family of languages. We also extend LPath to permit simple closures, resulting in a first-order complete language which we believe is sufficiently expressive for the majority of linguistic tree query needs.
  • Item
    NLTK: The Natural Language Toolkit
    BIRD, SG ; LOPER, E (Association for Computational Linguistics, 2004)
  • Item
    Towards a General Model for Linguistic Paradigms
    PENTON, D ; BOW, C ; BIRD, S ; HUGHES, B (emeld.org, 2004)
  • Item
    Automatic utterance segmentation in Instant Messaging dialogue
    Ivanovic, Edward (Australasian Language Technology Association, 2005)
    Instant Messaging (IM) chat sessions are real-time, text-based conversations which can be analyzed using dialogue-act models. Dialogue acts represent the semantic information of an utterance; however, messages must be segmented into utterances before classification can take place. We describe and compare two statistical methods for automatic utterance segmentation and dialogue-act classification in task-based IM dialogue. It is shown that IM messages can be automatically segmented and classified to a very high accuracy using statistical machine learning.