School of Languages and Linguistics - Research Publications

Search Results

  • Item
    Towards a Web search service for minority language communities
    Hughes, Baden (State Library of Victoria, 2006)
    Locating resources of interest on the web is, in the general case, at best a low-precision activity owing to the large number of pages on the web (for example, Google covers more than 8 billion web pages). As language communities (at all points on the spectrum) increasingly self-publish materials on the web, interested users are beginning to search for them in the same way that they search for general internet resources: using broad-coverage search engines with typically simple queries. Given that language resources are a minority of web content in general, finding relevant materials for low-density or lesser-used languages on the web is an increasingly inefficient exercise, even for experienced searchers. Furthermore, inconsistent coverage of web content across search engines complicates matters further. A number of previous research efforts have used web data to create language corpora, mine linguistic data, build language ontologies, create thesauri, and so on. The work reported in this paper contrasts with previous research in that it is not oriented towards creating language resources directly from web data, but rather towards increasing the likelihood that end users searching for resources in minority languages will actually find useful results from web searches. It also differs from earlier work by focusing directly on search optimization, rather than treating it as a component of a larger process (other researchers use the seed URIs discovered via the mechanism described in this paper in their own varied work). The work here can be seen as a contribution to a user-centric agenda for locating language resources for lesser-used languages on the web. (From Introduction)
  • Item
    Management of metadata in linguistic fieldwork: Experience from the ACLA project
    Hughes, B; Penton, D; Bird, S; Bow, C; Wigglesworth, G; McConvell, P; Simpson, J (European Language Resources Association, 2004-01-01)
  • Item
    Functional requirements for an interlinear text editor
    Hughes, Baden; Bow, Catherine; Bird, Steven (European Language Resources Association, 2004)
    Interlinear text has long been considered a valuable format for the presentation of multilingual data, and a variety of software tools have facilitated the creation and processing of such texts by researchers. Despite the diversity of tools, a common core of editorial functionality is provided. Identifying these core functions has important implications for software engineers who seek to efficiently build tools that support interlinear text editing. While few applications are specifically designed for the creation or manipulation of interlinear text, a number of tools offer varying degrees of incidental support for this modality. In this paper we provide a comprehensive set of criteria on which the derivation of functional requirements can be based. We describe the basis on which a group of tools was selected for investigation, along with the evaluation criteria. Finally, we consolidate our findings into a functional specification for the development of software applications for the editing of interlinear text.
  • Item
    Encoding and presenting interlinear text using XML technologies
    Hughes, Baden; Bird, Steven; Bow, Catherine (Australasian Language Technology Association, 2003)
    Interlinear text is a common presentational format for linguistic information, and its creation and management have been greatly facilitated by the development of specialised software. In earlier work we developed a four-level model and corresponding formal specification for interlinear text. Here we describe a suitable XML representation for the model and show how it can be rendered into a variety of convenient presentational formats. We conclude by discussing architectural extensions, an application programming interface for interlinear text, and prospects for embedding the interlinear model into existing applications.