School of Languages and Linguistics - Research Publications

  • Item
    Крылатые выражения из советских кинофильмов как элементы национальной идентичности [Winged expressions from Soviet films as elements of national identity]
    Kabiak, N (Издательство "Научный консультант", 2019)
    The paper examines how national identity is reflected in the headlines of the newspapers Komsomol'skaya Pravda – Moscow, Izvestiya and Literaturnaya Gazeta between 1 January 2017 and 1 July 2018, drawing on examples of winged words adapted from Soviet films. Specific winged phrases considered uniquely reflective of Russian culture are highlighted. An attempt is made to explain the main reasons behind the transformations of winged phrases. The paper stresses the value of learning winged phrases from Soviet films in practical classes of Russian as a foreign language.
  • Item
    Building Speech Recognition Systems for Language Documentation: The CoEDL Endangered Language Pipeline and Inference System (ELPIS)
    Foley, B ; Arnold, J ; Coto-Solano, R ; Durantin, G ; Ellison, TM ; van Esch, D ; Heath, S ; Kratochvíl, F ; Maxwell-Smith, Z ; Nash, D ; Olsson, O ; Richards, M ; San, N ; Stoakes, H ; Thieberger, N ; Wiles, J (ISCA, 2018)
    Machine learning has revolutionized speech technologies for major world languages, but these technologies have generally not been available for the roughly 4,000 languages with populations of fewer than 10,000 speakers. This paper describes the development of ELPIS, a pipeline which language documentation workers with minimal computational experience can use to build their own speech recognition models, resulting in models being built for 16 languages from the Asia-Pacific region. ELPIS puts machine learning speech technologies within reach of people working with languages with scarce data, in a scalable way. This is impactful since it enables language communities to cross the digital divide, and speeds up language documentation. Complete automation of the process is not feasible for languages with small quantities of data and potentially large vocabularies. Hence our goal is not full automation, but rather to make a practical and effective workflow that integrates machine learning technologies.
  • Item
    Nasal aerodynamics and coarticulation in Bininj Kunwok: Smoothing Spline Analysis of Variance
    Stoakes, H ; Fletcher, J ; Butcher, AR ; Carignan, C ; Tyler, M (ASSTA, 2016-12-06)
    Nasal phonemes are well represented within the lexicon of Bininj Kunwok. This study examines intervocalic, word-medial nasals and reports patterns of coarticulation using a Smoothing Spline Analysis of Variance (SSANOVA). This allows for detailed comparisons of peak nasal airflow across six female speakers of the language. Results show that in a VNV sequence there is very little anticipatory vowel nasalisation and greater carryover into a following vowel. The maximum peak nasal flow is delayed for coronals when compared to the onset of oral closure in the nasal, indicating a delayed velum opening gesture. The velar place of articulation is the exception to this pattern, with some limited anticipatory nasalisation. The SSANOVA has been shown to be an appropriate technique for quantifying these patterns and dynamic speech data in general.
  • Item
    The Pacific Expansion: Optimizing phonetic transcription of archival corpora
    Billington, R ; Stoakes, H ; Thieberger, N (ISCA-INT SPEECH COMMUNICATION ASSOC, 2021-01-01)
    For most of the world’s languages, detailed phonetic analyses across different aspects of the sound system do not exist, due in part to limitations in available speech data and tools for efficiently processing such data for low-resource languages. Archival language documentation collections offer opportunities to extend the scope and scale of phonetic research on low-resource languages, and developments in methods for automatic recognition and alignment of speech facilitate the preparation of phonetic corpora based on these collections. We present a case study applying speech modelling and forced alignment methods to narrative data for Nafsan, an Oceanic language of central Vanuatu. We examine the accuracy of the forced-aligned phonetic labelling based on limited speech data used in the modelling process, and compare acoustic and durational measures of 17,851 vowel tokens for 11 speakers with previous experimental phonetic data for Nafsan. Results point to the suitability of archival data for large-scale studies of phonetic variation in low-resource languages, and also suggest that this approach can feasibly be used as a starting point in expanding to phonetic comparisons across closely-related Oceanic languages.
  • Item
    Developing a workforce to support research reliant on data and compute
    Turpin, A ; Gruba, P ; Pozanenko, A ; Stupnikov, S ; Thalheim, B ; Mendez, E ; Kiselyova, N (CEUR, 2021-01-01)
    We describe the construction, operation and evaluation of the Melbourne Data Analytics Platform, a group of academics whose mission is to support research requiring non-trivial data analysis or compute at the University of Melbourne.
  • Item
    Developing global citizenship through real-world tasks – a virtual exchange between North American university students and Italian upper-secondary school students
    Trapè, R ; Hauck, M ; Muller-Hartmann, A (Research-publishing.net, 2020-11-30)
    This paper concerns a virtual exchange project between the University of Virginia (UVa), United States, and an upper-secondary school in Pavia, Italy. Centred on the question of gender equality, the project has been designed to take place over three years (2018–2021) with a direct reference to Robert O’Dowd’s transnational model of virtual exchange for global citizenship education, proposed in 2018. As an integrated part of the language learning curriculum, the project creates a virtual space which parallels the space-time of traditional class tuition, and which students can inhabit with a significant degree of autonomy. More specifically, this paper gives an account of how students, through real-world tasks, could develop global citizenship.
  • Item
    AusKidTalk: An Auditory-Visual Corpus of 3- to 12-Year-Old Australian Children's Speech
    Ahmed, B ; Ballard, KJ ; Burnham, D ; Sirojan, T ; Mehmood, H ; Estival, D ; Baker, E ; Cox, F ; Arciuli, J ; Benders, T ; Demuth, K ; Kelly, B ; Diskin-Holdaway, C ; Shahin, M ; Sethu, V ; Epps, J ; Lee, CB ; Ambikairajah, E (ISCA, 2021)
    Here we present AusKidTalk [1], an audio-visual (AV) corpus of Australian children’s speech collected to facilitate the development of speech-based technological solutions for children. It builds upon the technology and expertise developed through the collection of an earlier corpus of Australian adult speech, AusTalk [2,3]. This multi-site initiative was established to remedy the dire shortage, in Australia and around the world, of children’s speech corpora large enough to train accurate automated speech processing tools for children. We are collecting ~600 hours of speech from children aged 3–12 years, including single-word and sentence productions as well as narrative and emotional speech. In this paper, we discuss the key requirements for AusKidTalk and how we designed the recording setup and protocol to meet them. We also discuss key findings from our feasibility study of the recording protocol, recording tools, and user interface.
  • Item
    Multilingualism in Cyberspace - Longevity for Documentation of Small Languages
    Thieberger, N (Interregional Library Cooperation Centre, 2012)
  • Item
    The “other” Spanish: Methodological issues in the study of speech timing in Chilean Spanish
    Reynolds, I ; Maxwell, O ; Wigglesworth, G (International Speech Communication Association (ISCA), 2020-01-01)
    This paper is a preliminary account of speech rhythm and some phonological properties of Chilean Spanish in spontaneous dialogues. Different dialects of Spanish have been studied using rhythm metrics measuring the durational variability of vocalic and consonantal intervals. There are, however, methodological issues regarding the segmentation of intervals, often overlooked in previous research, such as the criteria for categorising certain segments into the different intervals and the segmentation of different voice qualities. The present study addresses this gap and compares rhythm metrics obtained using two methods of segmentation based on the available literature. The analyses reveal that a strictly 'acoustic' approach to segmentation of intervals results in slightly inflated metrics. Nevertheless, both methods show there is significant durational interval variability in Chilean Spanish, compared to other dialects of Spanish, that may be connected to phonological properties of the variety.
  • Item
    Be Not Like the Wind: Access to Language and Music Records, Next Steps
    Thieberger, N ; Harris, A (European Language Resources Association (ELRA), 2020)
    Language archives play an important role in keeping records of the world’s languages safe. Accessible audio recordings held in archives can be used by speakers of small and endangered languages, and their communities, and provide a base for further research and documentation. There is an urgent need for historical analog tape recordings to be located and digitised, as they will soon be unplayable. PARADISEC holds records in 1228 languages. We run training for language documentation and are developing technologies to localise access to language records. A concerted effort is needed to support language archives and sustain language diversity.