School of Languages and Linguistics - Research Publications

  • Item
    Management of metadata in linguistic fieldwork: Experience from the ACLA project
    Hughes, B ; Penton, D ; Bird, S ; Bow, C ; Wigglesworth, G ; McConvell, P ; Simpson, J (European Language Resources Association, 2004-01-01)
  • Item
    Functional requirements for an interlinear text editor
    HUGHES, BADEN ; BOW, CATHERINE ; BIRD, STEVEN (European Language Resources Association, 2004)
    Interlinear text has long been considered a valuable format in the presentation of multilingual data, and a variety of software tools have facilitated the creation and processing of such texts by researchers. Despite the diversity of tools, a common core of editorial functionality is provided. Identifying these core functions has important implications for software engineers who seek to efficiently build tools that support interlinear text editing. While few applications are specifically designed for the creation or manipulation of interlinear text, a number of tools offer varying degrees of incidental support for this modality. In this paper we provide a comprehensive set of criteria upon which the derivation of functional criteria can be based. We describe the basis on which a group of tools was selected for investigation, along with the evaluation criteria. Finally, we consolidate our findings into a functional specification for the development of software applications for the editing of interlinear text.
  • Item
    Encoding and presenting interlinear text using XML technologies
    HUGHES, BADEN ; BIRD, STEVEN ; BOW, CATHERINE (Australasian Language Technology Association, 2003)
    Interlinear text is a common presentational format for linguistic information, and its creation and management have been greatly facilitated by the development of specialised software. In earlier work we developed a four-level model and corresponding formal specification for interlinear text. Here we describe a suitable XML representation for the model and show how it can be rendered into a variety of convenient presentational formats. We conclude by discussing architectural extensions, an application programming interface for interlinear text, and prospects for embedding the interlinear model into existing applications.
  • Item
    A Blueprint for a Comprehensive Australian English Auditory-Visual Speech Corpus
    Burnham, D ; Ambikairajah, E ; Arciuli, J ; Bennamoun, M ; Best, CT ; Bird, S ; Butcher, AR ; Cassidy, S ; Chetty, G ; Cox, FM ; Cutler, A ; Dale, R ; Epps, JR ; Fletcher, JM ; Goecke, R ; Grayden, DB ; Hajek, JT ; Ingram, JC ; Ishihara, S ; Kemp, N ; Kinoshita, Y ; Kuratate, T ; Lewis, TW ; Loakes, DE ; Onslow, M ; Powers, DM ; Rose, P ; Togneri, R ; Tran, D ; Wagner, M (Cascadilla Press, 2009)
    Contemporary speech science is driven by the availability of large, diverse speech corpora. Such infrastructure underpins research and technological advances in various practical, socially beneficial and economically fruitful endeavours, from automatic speech recognition (ASR) to hearing prostheses. Unfortunately, speech corpora are not easy to come by because they are expensive to collect and are not favoured by the usual funding sources, as their collection per se does not fall under the classification of ‘research’. Nevertheless, they provide the sine qua non for many avenues of research endeavour in speech science. The only publicly available Australian speech corpus is the 12-year-old Australian National Database of Spoken Language (ANDOSL) (see http://andosl.anu.edu.au/; Millar, Dermody, Harrington, & Vonwiller, 1990), which is now outmoded due to its small number of participants, single recording session per speaker, low fidelity, audio-only rather than AV data, lack of disordered speech, and limited coverage of indigenous and ethnocultural Australian English (AusE) variants. There are more up-to-date UK and US English language corpora, but these are mostly audio-only, and their use for AusE purposes is suboptimal and results in inaccuracies.