Computing and Information Systems - Research Publications

  • Collecting low-density language materials on the Web
    Baldwin, Timothy ; Bird, Steven ; Hughes, Baden (Southern Cross University, 2006)
    Most web content exists in a few dozen languages. Hundreds of other languages, the 'low-density languages', are represented on the web only in scarce quantities. How can we locate, store and describe these low-density resources? In particular, how can we identify linguistically interesting resources, such as translation sets and multilingual documents? In this paper we describe ongoing research in which we integrate a number of discrete systems (a language data crawler, automated metadata generation tools, language data repositories and federated search services) to address the identification, retrieval, description, storage and access issues for low-density language materials from the web.
  • Reconsidering language identification for written language resources
    Hughes, Baden ; Baldwin, Timothy ; Bird, Steven ; Nicholson, Jeremy ; MacKinlay, Andrew (European Language Resources Association, 2006)
    The task of identifying the language in which a given document (ranging from a sentence to thousands of pages) is written has been relatively well studied over several decades. Automated approaches to written language identification are used widely throughout research and industrial contexts, over both oral and written source materials. Despite this widespread acceptance, a review of previous research in written language identification reveals a number of questions which remain open and ripe for further investigation.
  • Towards a General Model for Linguistic Paradigms
    Penton, D ; Bow, C ; Bird, S ; Hughes, B (emeld.org, 2004)
  • Securing interpretability: The case of Ega language documentation
    Gibbon, D ; Bow, C ; Bird, S ; Hughes, B (Evaluations and Language resources Distribution Agency, 2004)
  • Management of metadata in linguistic fieldwork: Experience from the ACLA project
    Hughes, B ; Penton, D ; Bird, S ; Bow, C ; Wigglesworth, G ; McConvell, P ; Simpson, J (European Language Resources Association, 2004)
  • Functional requirements for an interlinear text editor
    Hughes, Baden ; Bow, Catherine ; Bird, Steven (European Language Resources Association, 2004)
    Interlinear text has long been considered a valuable format for presenting multilingual data, and a variety of software tools have facilitated the creation and processing of such texts by researchers. Despite the diversity of these tools, they share a common core of editorial functionality. Identifying these core functions has important implications for software engineers who seek to build tools that support interlinear text editing efficiently. While few applications are specifically designed for the creation or manipulation of interlinear text, a number of tools offer varying degrees of incidental support for this modality. In this paper we provide a comprehensive set of criteria on which functional requirements can be based. We describe the basis on which a group of tools was selected for investigation, along with the evaluation criteria. Finally, we consolidate our findings into a functional specification for the development of software applications for editing interlinear text.
  • Grid-enabling natural language engineering by stealth
    Hughes, Baden ; Bird, Steven (Association for Computational Linguistics, 2003)
    We describe a proposal for an extensible, component-based software architecture for natural language engineering applications. Our model leverages existing linguistic resource description and discovery mechanisms based on extended Dublin Core metadata. In addition, the application design is flexible, allowing disparate components to be combined to suit the overall application functionality. An application specification language provides abstraction from the programming environment and simplifies interfacing with computational grids via a broker.
  • Encoding and presenting interlinear text using XML technologies
    Hughes, Baden ; Bird, Steven ; Bow, Catherine (Australasian Language Technology Association, 2003)
    Interlinear text is a common presentational format for linguistic information, and its creation and management have been greatly facilitated by the development of specialised software. In earlier work we developed a four-level model and corresponding formal specification for interlinear text. Here we describe a suitable XML representation for the model and show how it can be rendered into a variety of convenient presentational formats. We conclude by discussing architectural extensions, an application programming interface for interlinear text, and prospects for embedding the interlinear model into existing applications.