School of Languages and Linguistics - Research Publications

  • Item
    Building Speech Recognition Systems for Language Documentation: The CoEDL Endangered Language Pipeline and Inference System (ELPIS)
    Foley, B ; Arnold, J ; Coto-Solano, R ; Durantin, G ; Ellison, TM ; van Esch, D ; Heath, S ; Kratochvíl, F ; Maxwell-Smith, Z ; Nash, D ; Olsson, O ; Richards, M ; San, N ; Stoakes, H ; Thieberger, N ; Wiles, J (ISCA, 2018)
    Machine learning has revolutionized speech technologies for major world languages, but these technologies have generally not been available for the roughly 4,000 languages with populations of fewer than 10,000 speakers. This paper describes the development of ELPIS, a pipeline which language documentation workers with minimal computational experience can use to build their own speech recognition models, resulting in models being built for 16 languages from the Asia-Pacific region. ELPIS puts machine learning speech technologies within reach of people working with languages with scarce data, in a scalable way. This is impactful since it enables language communities to cross the digital divide, and speeds up language documentation. Complete automation of the process is not feasible for languages with small quantities of data and potentially large vocabularies. Hence our goal is not full automation, but rather to make a practical and effective workflow that integrates machine learning technologies.
  • Item
    Nasal aerodynamics and coarticulation in Bininj Kunwok: Smoothing Spline Analysis of Variance
    Stoakes, H ; Fletcher, J ; Butcher, AR ; Carignan, C ; Tyler, M (ASSTA, 2016-12-06)
    Nasal phonemes are well represented within the lexicon of Bininj Kunwok. This study examines intervocalic, word-medial nasals and reports patterns of coarticulation using a Smoothing Spline Analysis of Variance (SSANOVA). This allows for detailed comparisons of peak nasal airflow across six female speakers of the language. Results show that in a VNV sequence there is very little anticipatory vowel nasalisation and greater carryover into a following vowel. The maximum peak nasal flow is delayed for coronals when compared to the onset of oral closure in the nasal, indicating a delayed velum opening gesture. The velar place of articulation is the exception to this pattern with some limited anticipatory nasalisation. The SSANOVA has been shown to be an appropriate technique for quantifying these patterns and dynamic speech data in general.
  • Item
    Prosodically Conditioned Consonant Duration in Djambarrpuyŋu.
    Jepson, K ; Fletcher, J ; Stoakes, H (SAGE Publications, 2019-03-01)
    Cross-linguistically, segments typically lengthen because of proximity to prosodic events such as intonational phrase or phonological phrase boundaries, a phrasal accent, or due to lexical stress. Australian Indigenous languages have been claimed to operate somewhat differently in terms of prosodically conditioned consonant lengthening and strengthening. Consonants have been found to lengthen after a vowel bearing a phrasal pitch accent. It is further claimed that this post-tonic position is a position of prosodic strength in Australian languages. In this study, we investigate the effects of proximity to a phrasal pitch accent and prosodic constituent boundaries on the duration of stop and nasal consonants in words of varying lengths in Djambarrpuyŋu, an Australian Indigenous language spoken in northeast Arnhem Land, Northern Territory, Australia. Our results suggest that the post-tonic consonant position does not condition longer consonant duration compared with other word-medial consonants, with one exception: Intervocalic post-tonic consonants in disyllabic words are significantly longer than word-medial consonants elsewhere. Therefore, it appears that polysyllabic shortening has a strong effect on segment duration in these data. Word-initial position did not condition longer consonant duration than word-medial position. Further, initial consonants in higher-level prosodic domains had shorter consonant duration compared with domain-medial word-initial consonants. By contrast, domain-final lengthening was observed in our data, with word-final nasals preceding a pause found to be significantly longer than all other consonants. Taken together, these findings for Djambarrpuyŋu suggest that, unlike other Australian languages, post-tonic lengthening is not a cue to prosodic prominence, whereas prosodic domain-initial and -final duration patterns of consonants are like those that have been observed in other languages of the world.
  • Item
    The inconspicuous substratum: Indigenous Australian languages and the phonetics of stop contrasts in English on Croker Island
    Mailhammer, R ; Sherwood, S ; Stoakes, H (John Benjamins Publishing, 2020-01-01)
    Descriptions of Australian Aboriginal English list the neutralisation of the Standard English contrast between so-called voiced and voiceless stops as one characteristic feature. This paper reports on the results of an acoustic analysis of data collected in a production task by monolingual speakers of Standard Australian English in Sydney, of Aboriginal English on Croker Island, Northern Territory, and bilingual speakers of Iwaidja/Aboriginal English and Kunwinjku/Aboriginal English on Croker Island. The results show that average values for Voice Onset Time, the main correlate of the “stop voicing contrast” in English, and Closure Duration collected from Aboriginal speakers of English do not significantly differ from those of speakers of Standard Australian English, irrespective of language background. This result proves that the stop contrast is not neutralised by these Aboriginal speakers of English. However, it can be shown that phonetic voicing manifesting itself in Voice Termination Time is a prevalent and characteristic feature of Aboriginal English on Croker Island. This feature aligns Aboriginal English on Croker Island with local Aboriginal languages and differentiates it from Standard Australian English.
  • Item
    The Pacific Expansion: Optimizing phonetic transcription of archival corpora
    Billington, R ; Stoakes, H ; Thieberger, N (ISCA-INT SPEECH COMMUNICATION ASSOC, 2021-01-01)
    For most of the world’s languages, detailed phonetic analyses across different aspects of the sound system do not exist, due in part to limitations in available speech data and tools for efficiently processing such data for low-resource languages. Archival language documentation collections offer opportunities to extend the scope and scale of phonetic research on low-resource languages, and developments in methods for automatic recognition and alignment of speech facilitate the preparation of phonetic corpora based on these collections. We present a case study applying speech modelling and forced alignment methods to narrative data for Nafsan, an Oceanic language of central Vanuatu. We examine the accuracy of the forced-aligned phonetic labelling based on limited speech data used in the modelling process, and compare acoustic and durational measures of 17,851 vowel tokens for 11 speakers with previous experimental phonetic data for Nafsan. Results point to the suitability of archival data for large-scale studies of phonetic variation in low-resource languages, and also suggest that this approach can feasibly be used as a starting point in expanding to phonetic comparisons across closely-related Oceanic languages.
  • Item
    Nasal coarticulation in Bininj Kunwok: An aerodynamic analysis
    Stoakes, HM ; Fletcher, JM ; Butcher, AR (Cambridge University Press (CUP), 2020-12-01)
    Bininj Kunwok (BKw), a language spoken in Northern Australia, restricts the degree of anticipatory nasalization, as suggested by previous aerodynamic and acoustic analyses (Butcher 1999). The current study uses aerodynamic measurements of speech to investigate patterns of nasalization and nasal articulation in Bininj Kunwok to compare with Australian languages more generally. The role of nasal coarticulation in ensuring language comprehensibility, a key question in phonetics research today, is explored. Nasal aerodynamics is measured in intervocalic, word-medial nasals in the speech of five female speakers of BKw and data are analyzed using Smoothing Spline Analysis of Variance (SSANOVA) and Functional Data Analysis averaging techniques. Results show that in a VNV sequence there is very little anticipatory vowel nasalization with no restriction on carryover nasalization for a following vowel. The maximum peak nasal flow is delayed until the oral release of a nasal for coronal articulations, indicating a delayed velum opening gesture. Patterns of anticipatory nasalization appear similar to nasal airflow in French non-nasalized vowels in oral vowel plus nasal environments (Delvaux et al. 2008). Findings show that Bininj Kunwok speakers use language-specific strategies in order to limit anticipatory nasalization, enhancing place of articulation cues at a site of intonational prominence which is also the location of the majority of place of articulation contrasts within the language. Patterns of airflow suggest enhancement and coarticulatory resistance in prosodically prominent VN and VNC sequences which we interpret as evidence of speakers maintaining a phonological contrast to enhance place of articulation cues.
  • Item
    Scaling processes of clause chains in Pitjantjatjara
    Defina, R ; Torres, C ; Stoakes, H (Interspeech, 2020)
    Clause chains are a syntactic strategy for combining multiple clauses into a single unit. They are reported in many languages, including Korean and Turkish. However, they have seen relatively little focused research. In particular, prosodic features are often mentioned in descriptions of clause chaining; however, there have been vanishingly few investigations. Corpus-based studies of the prosody of clause chains in two unrelated languages of Papua New Guinea report that they are typically produced as a sequence of Intonation phrases united by pitch-scaling of the L% boundary tones in each clause with only the final, finite, clause descending to a full L%. The present study is the first experimental investigation of the prosody of clause chains in Pitjantjatjara. This paper focuses on one type of clause chain found in the Australian Indigenous language Pitjantjatjara. We examine a set of 120 clause chains read out by three native Pitjantjatjara speakers. Prosodic analysis reveals that these Pitjantjatjara clause chains are produced within a single Intonational Phrase. Speakers do not pause between the clauses in the chain; there is consistent linear downstep throughout the phrase, and, additionally, phrase-final lowering occurs at the end of the utterance. This differs from previous impressionistic studies of the prosody of clause chains.
  • Item
    Building Speech Recognition Systems for Language Documentation: The CoEDL Endangered Language Pipeline and Inference System (ELPIS)
    Foley, B ; Arnold, J ; Coto-Solano, R ; Durantin, G ; Mark, E ; van Esch, D ; Heath, S ; Kratochvíl, F ; Maxwell-Smith, Z ; Nash, D ; Olsson, O ; Richards, M ; San, N ; Stoakes, H ; Thieberger, N ; Wiles, J (International Speech Communication Association, 2018-08-30)
    Machine learning has revolutionised speech technologies for major world languages, but these technologies have generally not been available for the roughly 4,000 languages with populations of fewer than 10,000 speakers. This paper describes the development of Elpis, a pipeline which language documentation workers with minimal computational experience can use to build their own speech recognition models, resulting in models being built for 16 languages from the Asia-Pacific region. Elpis puts machine learning speech technologies within reach of people working with languages with scarce data, in a scalable way. This is impactful since it enables language communities to cross the digital divide, and speeds up language documentation. Complete automation of the process is not feasible for languages with small quantities of data and potentially large vocabularies. Hence our goal is not full automation, but rather to make a practical and effective workflow that integrates machine learning technologies.
  • Item
    Intonational correlates of subject and object realisation in Mawng (Australian)
    Fletcher, J ; Stoakes, H ; Singer, R ; Loakes, D ; Barnes, J ; Veilleux, N ; Shattuck-Hufnagel, S ; Brugos, A (ISCA, 2016)
    A range of intonational devices can be used in the grammar of information and corrective focus marking in languages with relatively free word order. In this paper we explore whether nouns in the Australian Indigenous language Mawng are realised differently depending on syntactic function and focus. Results show that the pitch level associated with Subjects is higher in conditions of corrective focus compared to other utterance contexts and there is a strong correlation between focus and utterance position. Placing a word in a corrective focus context does not appear to have an effect on word duration in this corpus confirming that pitch register variation and intonational phrasing are the major prosodic cues associated with corrective focus in Mawng.
  • Item
    Pointing out directions in Murrinhpatha
    Blythe, J ; Mardigan, KC ; Perdjert, ME ; Stoakes, H (De Gruyter Open, 2016)
    Rather than using abstract directionals, speakers of the Australian Aboriginal language Murrinhpatha make reference to locations of interest using named landmarks, demonstratives and pointing. Building on a culturally prescribed avoidance for certain placenames, this study reports on the use of demonstratives, pointing and landmarks for direction giving. Whether or not pointing will be used, and which demonstratives will be selected is determined partly by the relative epistemic incline between interlocutors and partly by whether information about a location is being sought or being provided. The reliance on pointing for the representation of spatial vectors requires a construal of language that includes the visuo-corporal modality.