School of Languages and Linguistics - Research Publications

  • Item
    Linguistic Data Management
    Thieberger, N ; Berez, AL ; Thieberger, N (Oxford University Press, 2012-09-18)
  • Item
    Documentation in practice: developing a linked media corpus of South Efate
    Thieberger, N (Hans Rausing Endangered Languages Project, School of Oriental and African Studies, University of London, 2004)
    There is a growing need for linguists working with endangered languages to be able to provide documentation of those languages that will serve two functions, not only the analysis and presentation of examples and texts, but also the means for accessing the material in the future. In this paper I describe a workflow for building documentation into a language description developed in the course of writing a grammar of South Efate, an Oceanic language of Vanuatu, for a PhD dissertation. I suggest that, with appropriate tools, the effort of recording and transcribing documentary field recordings can result in a media corpus from which we can produce instant links between text and media, which in turn enriches our analysis. Further, these annotations are in an ideal form for archiving and for providing access to data by the speakers of the language. I take it as axiomatic that we must archive our recordings and associated material and that this step is integral to the larger project of language documentation.
  • Item
    ALS Hypothetical
    Musgrave, S ; Thieberger, N (2005-09)
    A hypothetical scenario prepared by Simon Musgrave and Nick Thieberger and presented by John Henderson (who added his own flourishes). This scenario draws out some significant ethical issues facing linguists, in particular those arising from the use of recording media and, increasingly, from more in-depth studies of small, so-called ‘endangered’ languages in use in various contexts. The scenario is a background against which the panel can discuss the issues in character, and it should be made clear to the audience that the characters do not reflect on the real-life panel members except in occasional asides, and that any resemblance to any person, living or dead, is inevitable.
  • Item
    Building an interactive corpus of field recordings
    Thieberger, N (Paris: ELRA, 2004)
    There is a growing need for linguists working with small and endangered languages to be able to provide documentation of those languages that will serve two functions, not only the analysis and presentation of examples and texts, but also the means for others to access the material in the future. In this presentation I describe the workflow developed in the course of writing a description of South Efate, an Oceanic language of Vanuatu, for a PhD dissertation. This workflow steps through (i) field recording; (ii) digitising or capturing media data as citable objects for archival purposes; (iii) transcribing those objects with time-alignment; (iv) establishing a media corpus indexed by the transcript; (v) instantiating links between text and media using a purpose-built tool (Audiamus); (vi) exporting from Audiamus to interlinearise while maintaining timecodes; (vii) extracting citable example sentences for use in a grammatical description; (viii) exporting from Audiamus in XML, QuickTime or other formats.
  • Item
    As we may link: time-aligned concordances of field recordings. A working model
    Thieberger, N (2001)
    "One can now picture a future investigator in his laboratory. His hands are free, and he is not anchored. As he moves about and observes, he photographs and comments. Time is automatically recorded to tie the two records together. If he goes into the field, he may be connected by radio to his recorder. As he ponders over his notes in the evening, he again talks his comments into the record." (Vannevar Bush, 1945) It has taken some time, but we are now able to create a system like the one envisaged by Vannevar Bush over 50 years ago. And despite the obvious leaps and bounds in technologies, there are still areas in which much needs to be done. Linguists working on small languages (those typically spoken by indigenous people) with limited research grants typically patch together tools that will do what we want. Our research involves recording stories, sentences and so on, and then analysing that material to write a grammatical description. What we have done is record onto cassette, then transcribe the cassette and store it safely somewhere (like in our garage, or a cupboard). However, a growing awareness that the products of our work need to be preserved in perpetuity means that we are also actively seeking principled approaches to language documentation.
  • Item
    Documentary linguistics and ethical issues
    Thieberger, N ; Musgrave, S (Hans Rausing Endangered Languages Project, 2007)
    In recent years, there has been an increasing emphasis on documentary linguistics within our discipline. This change of emphasis has been motivated by our concern over the pace of language loss, and has been facilitated by coincidental technological changes. Within this developing field, and especially as a result of the technological resources now available, we suggest that new ethical challenges arise in the professional practice of the linguist. The issues which we wish to raise in this paper stand outside of the area covered by existing institutional ethics procedures. The practice of documentary linguistics has a greater impact in a community than traditional data collection practice. There are two aspects to this impact. Firstly, a good documentation attempts to record as wide a range of language events as possible, in many genres and in many settings. This implies that the researcher’s presence in the community will be more intrusive than if the sole aim is to record sufficient material to prepare a grammatical description. Secondly, the nature of the data captured is also more intrusive, with video recording common and high quality audio recording more or less standard. Language documentation also implies the existence of archival data, that is, high quality data which is intended for persistent storage, which is accompanied by metadata sufficient to allow for the discovery of the resource, and which is under the control of a third party. Both of these aspects of documentation raise ethical issues. What procedures are appropriate to obtain informed consent to the type of data collection discussed above? What sort of rights and responsibilities does an archive have as another interested party in the negotiation of agreements between researchers and speakers / communities? Given the technological possibilities for dissemination and reproduction, how can ownership rights in recorded material be handled? How far should communities’ concepts of ownership be taken into account? How can ownership and access rights be negotiated so that they hold over the time frame which archiving assumes? What may be the consequences for a community when material is returned to them by researchers or archivists, given that the research and archiving process will inevitably have changed the nature of the material and its status in the community? We suggest that it is time for linguists to engage with these issues. We will discuss who the interested parties are in these processes, what responsibilities and rights each party may have, and some of the areas of potential conflict between those rights and responsibilities.
  • Item
    Steps toward a grammar embedded in data
    Thieberger, N ; Epps, P ; Arkhipov, A (Walter de Gruyter, 2009-06-05)
    This volume continues the tradition of presenting the latest findings by typologists and field linguists, relevant to general linguistic theory and research methodology.