School of Languages and Linguistics - Research Publications

Permanent URI for this collection

http://hdl.handle.net/11343/392

LD Tools and Methods Summit Report

Thieberger, N (2016)

This document provides an overview of the main points arising from discussion at the Language Documentation Tools and Methods Summit (http://bit.ly/LDsummit2016), held at the University of Melbourne on 1-3 June 2016 and convened by Nick Thieberger and Simon Musgrave for the Centre of Excellence for the Dynamics of Language, funded by the Australian Research Council. Invited participants were asked to consider key issues that had been circulated in advance and to prepare discussion points for the meeting. Each theme leader took notes; these are summarised below, together with links to the original notes. There is necessarily some overlap between the reports on the group discussions.

ARC Centre of Excellence for the Dynamics of Language: Indigenous Linguistic & Cultural Heritage Ethics Document

Thieberger, N ; Jones, C (ARC Centre of Excellence for the Dynamics of Language, 2017)

A significant part of the Centre’s research is reliant on the participation of indigenous communities in Australia and the Asia-Pacific, and actively contributes to the transmission and safeguarding of important cultural, linguistic and historical information. The Centre recognises the right of indigenous communities and individuals to maintain, control, protect and develop their traditional knowledge and cultural expressions, and the inherent ownership they have over this intellectual property. The Centre also recognises that communities and individuals within the region hold different views as to what these rights entail. Research conducted by Centre staff and students at the collaborating institutions is subject to approval by the respective institutional human research ethics committees. These statutory committees review and approve research involving Indigenous people with specific reference to Values and Ethics: Guidelines for Ethical Conduct in Aboriginal and Torres Strait Islander Health Research (NHMRC 2003), and AIATSIS Code of Ethics for Aboriginal and Torres Strait Islander Research (AIATSIS 2021), plus the National Statement on Ethical Conduct in Human Research (NHMRC, ARC, AVCC 2007), and they ask researchers to consider the expectations set out in Keeping Research on Track (NHMRC 2006). However, the Centre acknowledges that simply adhering to institutional requirements does not guarantee an ethical outcome, and we endorse the NHMRC’s statement that it “is possible for researchers to ‘meet’ rule-based requirements without engaging fully with the implications of difference and values relevant to their research. The approach advanced in these guidelines is more demanding of researchers as it seeks to move from compliance to trust.” (NHMRC 2003: 4)

It's a word isn't it? Language affection as an outcome of language programmes.

Thieberger, N (School of Languages and Linguistics, 2000)

Structural linguistics has a particular view of the integrity of language which may be detrimental to the construction of appropriate language maintenance programmes for small indigenous languages. In this paper I outline ways in which ‘affective’ use of language may be the most useful target of language programmes in some situations, based on my experience with Australian indigenous languages. Fluency in a language may not be an achievable outcome of a language course for a number of reasons, not least the enormity of the task as perceived by learners of the language. For languages with few or no speakers, we should be able to construct language programmes in which the use of a small number of terms in the target language, for purposes of identity, is a sufficient and realistic outcome.

The Aboriginal Studies Electronic Data Archive (ASEDA)

Thieberger, N (De Gruyter, 1995)

Community-Led Documentation of Nafsan (Erakor, Vanuatu)

Krajinovic, A ; Billington, R ; Emil, L ; Kaltapau, G ; Thieberger, N ; Vetulani, Z ; Paroubek, P ; Kubis, M (Springer International Publishing AG, 2022)

This paper describes a collaboration between community members and visiting linguists in Erakor, Vanuatu, aiming to build the capacity of community-based researchers to undertake and sustain documentation of Nafsan, the local indigenous language. We focus on the technical and procedural skills required to collect, manage, and work with audio and video data, and give an overview of the outcomes of community-led documentation after initial training. We discuss the benefits and challenges of this type of project from the perspective of both the community researchers and the external linguists. We show that community-led documentation such as the Erakor project, in which data management and archiving are incorporated into the documentation process, has crucial benefits for both the community and the linguists. The two most salient benefits are: a) long-term documentation of linguistic and cultural practices calibrated to the community’s needs, and b) the collection by community members of larger quantities of data, often of better quality and broader scope than data collected by visiting linguists. Besides being readily available for research, these data have great potential for training and testing emerging language technologies for less-resourced languages, such as Automatic Speech Recognition (ASR).

Reflections on software and technology for language documentation

Arkhipov, A ; Thieberger, N (2020-01-01)

Technological developments over recent decades have enabled unprecedented growth in the volume and quality of collected language data. Emerging challenges include ensuring the longevity of the records and making them accessible and reusable for fellow researchers as well as for the speech communities. These records are robust research data on which verifiable claims can be based and on which future research can be built, and they are the basis for the revitalization of cultural practices, including language and music performance. As recording, storage and analysis technologies become more lightweight and portable, language speakers can participate actively in documentation activities. This in turn creates a growing need for training and support, and thus for more interaction and collaboration between linguists, developers and speakers. Both cutting-edge speech technologies and crowdsourcing methods can be used effectively to overcome bottlenecks between different stages of analysis. While developing a single all-purpose integrated workbench for documentary linguists may not be achievable, investing in robust open interchange formats that can be accessed and enriched by independent pieces of software seems more promising for the near future.

Carl Georg von Brandenstein’s legacy: The past in the present

Thieberger, N ; Peterson, N ; Kenny, A (ANU Press, 2017-09-21)

Interned as a prisoner of war in Australia in the 1940s, the Hittite specialist Carl Georg von Brandenstein went on to work with speakers of a number of Australian languages in Western Australia. At a time when the dominant paradigms in linguistics were either Chomskyan reductionism or writing a grammar to the exclusion of textual material, Carl followed his own direction, producing substantial collections of texts and recordings in Ngarluma, Yindjibarndi, Nyiyaparli, Ngadju and Noongar, as well as information about a number of other Australian languages. Part of his motivation was to obtain examples to reconstruct what he considered to be the original human language that diffused to all corners of the world, so he put some effort into comparing Australian languages with the classical languages he had previously studied.

When Your Data is My Grandparents Singing. Digitisation and Access for Cultural Records, the Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC)

Thieberger, N ; Harris, A (Ubiquity Press, Ltd., 2022-04-04)

In this paper we discuss the Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC), a research repository that explicitly aims to act as a conduit for research outputs to a range of audiences, both within and outside of academia. PARADISEC has been operating for 19 years and has grown to hold over 390,000 files, currently totaling 150 terabytes and representing 1,312 languages, many of them from Papua New Guinea and the Pacific. Our focus is on recordings and transcripts in the many small languages of the world: the songs and stories that are unique cultural expressions. While these research data are created for a particular project, they have huge value beyond academic research, as they typically capture oral traditions recorded in places where little else has been recorded. There is an increasing focus in academia on reproducible research and research data management, and repositories are the key to successful data management. We discuss the importance for research practice of having discipline-specific repositories. The data in our work are also cultural material that has value to the people recorded and their descendants: these recordings are their grandparents, and so we, as outsider researchers, have special responsibilities to treat the materials with respect and to ensure they are accessible to the people we have worked with.

Digital curation and access to recordings of traditional cultural performance.

Thieberger, N ; Harris, A (UNESCO, 2021)

Being home to over a quarter of the world’s languages, the Pacific is a particularly good place to focus on how language records can be made accessible. The creation and description of research records has not always been a priority for humanities academics, and any records that are created have typically not been provided with good archival solutions. This is despite these records often being of cultural or historical relevance beyond academia. Many cultural agencies struggle to keep track of recordings they have made, and it is the same for many researchers. Often it is only when researchers prepare recordings for archiving that they realize how many (or few) are described adequately, or have been transcribed or translated.

Building Speech Recognition Systems for Language Documentation: The CoEDL Endangered Language Pipeline and Inference System (ELPIS)

Foley, B ; Arnold, J ; Coto-Solano, R ; Durantin, G ; Ellison, TM ; van Esch, D ; Heath, S ; Kratochvíl, F ; Maxwell-Smith, Z ; Nash, D ; Olsson, O ; Richards, M ; San, N ; Stoakes, H ; Thieberger, N ; Wiles, J (ISCA, 2018)

Machine learning has revolutionized speech technologies for major world languages, but these technologies have generally not been available for the roughly 4,000 languages with populations of fewer than 10,000 speakers. This paper describes the development of ELPIS, a pipeline which language documentation workers with minimal computational experience can use to build their own speech recognition models, resulting in models being built for 16 languages from the Asia-Pacific region. ELPIS puts machine learning speech technologies within reach of people working with languages with scarce data, in a scalable way. This is impactful since it enables language communities to cross the digital divide, and speeds up language documentation. Complete automation of the process is not feasible for languages with small quantities of data and potentially large vocabularies. Hence our goal is not full automation, but rather to make a practical and effective workflow that integrates machine learning technologies.