School of Languages and Linguistics - Research Publications

  • Item
    Nyingarn: Supporting Australian Indigenous languages from textual sources
    Thieberger, N ; Lewincamp, S ; Rosa, ML (IEEE, 2023-01-01)
  • Item
    Customary song in Christian clothing
    Thieberger, N ; Barwick, L (Presses universitaires de la Nouvelle‐Calédonie, 2023)
    In this paper, we illustrate the maintenance of archaic forms of Nafsan (a language spoken in Efate, Vanuatu) in song, and take one particular song as an example. Nafsan is known for having lost medial and final vowels in everyday language, but these can be, as in many languages, retained in song. One of the very few books written in Nafsan by Nafsan speakers was produced in 1983 in Port Vila (Wai et al.). It contains twelve stories, and ends with a cryptic inscription, M‐dd‐M‐dd‐ddl‐S‐dl‐s‐dd. All the stories were transcribed and translated as part of Thieberger’s research, but he was not sure what to do with this collection of letters. By chance, a copy of a hymnal on Lelepa island had the same cryptic letters that were evidently a form of musical notation known as solfa, Tonic Sol‐fa, or Solfege. Translations of Christian hymns into Nafsan were first made in the 1840s, but none of these hymnals includes solfa notation. As Stevens (2005) notes, solfa “often resulted in the emergence of a school of indigenous composers writing in Tonic Sol-fa notation and using the tonal harmonic style”. That is clearly the case in this Nafsan story. In this paper, we will look in more detail at the Ririal song, noting its archaic content. Early translations of hymns often maintain vowels that are now lost in Nafsan, and the same appears to be the case with the Ririal song. It is indicative of the syncretism with which Christianity has been received in Efate that a method of transcription originally intended to make Christian hymns more accessible has been adapted in a monolingual set of kastom stories to present a traditional song.
  • Item
    The Pacific Expansion: Optimizing phonetic transcription of archival corpora
    Billington, R ; Stoakes, H ; Thieberger, N (ISCA-INT SPEECH COMMUNICATION ASSOC, 2021)
    For most of the world’s languages, detailed phonetic analyses across different aspects of the sound system do not exist, due in part to limitations in available speech data and tools for efficiently processing such data for low-resource languages. Archival language documentation collections offer opportunities to extend the scope and scale of phonetic research on low-resource languages, and developments in methods for automatic recognition and alignment of speech facilitate the preparation of phonetic corpora based on these collections. We present a case study applying speech modelling and forced alignment methods to narrative data for Nafsan, an Oceanic language of central Vanuatu. We examine the accuracy of the forced-aligned phonetic labelling based on limited speech data used in the modelling process, and compare acoustic and durational measures of 17,851 vowel tokens for 11 speakers with previous experimental phonetic data for Nafsan. Results point to the suitability of archival data for large-scale studies of phonetic variation in low-resource languages, and also suggest that this approach can feasibly be used as a starting point in expanding to phonetic comparisons across closely-related Oceanic languages.
  • Item
    Be Not Like the Wind: Access to Language and Music Records, Next Steps
    Thieberger, N ; Harris, A (European Language Resources Association (ELRA), 2020)
    Language archives play an important role in keeping records of the world’s languages safe. Accessible audio recordings held in archives can be used by speakers of small and endangered languages, and their communities, and provide a base for further research and documentation. There is an urgent need for historical analog tape recordings to be located and digitised, as they will soon be unplayable. PARADISEC holds records in 1228 languages. We run training for language documentation and are developing technologies to localise access to language records. A concerted effort is needed to support language archives and sustain language diversity.
  • Item
    The Language Documentation Quartet
    Musgrave, S ; Thieberger, N (University of Colorado at Boulder, 2021)
    As we noted in an earlier paper (Musgrave & Thieberger 2012), the written description of a language is an essentially hypertextual exercise, linking various kinds of material in a dense network. An aim based on that insight is to provide a model that can be implemented in tools for language documentation, allowing instantiation of the links always followed in writing a grammar or a dictionary, tracking backwards and forwards to the texts and media as the source of authority for claims made in an analysis. Our earlier paper described our initial efforts to encode Heath’s (1984) grammar, texts (1980), and dictionary (1982) of Nunggubuyu, an Australian language from eastern Arnhemland. We chose this body of work because it was written with many internal links between the three volumes. The links are all encoded with textual indexes which looked to be ready to be instantiated as automated hyperlinks once the technology was available. In this paper, we discuss our progress in identifying how the four component parts of a description (grammar, text, dictionary, media, henceforth the quartet) can be interlinked, what are the logical points at which to join them, and whether there are practical limits to how far this linking should be carried. We suggest that the problems which are exposed in this process can inform the development of an abstract or theoretical data structure for each of the components and these in turn can provide models for language documentation work which can feed into hypertext presentations of the type we are developing.