School of Languages and Linguistics - Research Publications

  • Item
    Does Automatic Speech Recognition (ASR) Have a Role in the Transcription of Indistinct Covert Recordings for Forensic Purposes?
    Loakes, D (FRONTIERS MEDIA SA, 2022-06-14)
    The transcription of covert recordings used as evidence in court is a huge issue for forensic linguistics. Covert recordings are typically made under conditions in which the device needs to be hidden, and so the resulting speech is generally indistinct, with overlapping voices and background noise, and in many cases the acoustic record cannot be analyzed via conventional phonetic techniques (i.e. phonetic segments are unclear, or there are no cues at all present acoustically). In the case of indistinct audio, the resulting transcripts, often produced by police working on the case, are frequently questionable and, despite their unreliable nature, can be provided as evidence in court. Injustices can, and have, occurred. Given the growing performance of automatic speech recognition (ASR) technologies, and growing reliance on such technologies in everyday life, a common question asked, especially by lawyers and other legal professionals, is whether ASR can solve the problem of what was said in indistinct forensic audio, and this is the main focus of the current paper. The paper also looks at forced alignment, a way of automatically aligning an existing transcription to audio. This is an area that needs to be explored in the context of forensic linguistics because a transcript can technically be “aligned” with any audio, making it seem as if it is “correct” even if it is not. The aim of this research is to demonstrate how automatic transcription systems fare using forensic-like audio, and with more than one system. Forensic-like audio is most appropriate for research, because there is greater certainty with what the speech material consists of (unlike in forensic situations where it cannot be verified). Examples of how various ASR systems cope with indistinct audio are shown, highlighting that when a good-quality recording is used ASR systems cope well, with the resulting transcript being usable and, for the most part, accurate. When a poor-quality, forensic-like recording is used, on the other hand, the resulting transcript is effectively unusable, with numerous errors and very few words recognized (and in some cases, no words recognized). The paper also demonstrates some of the problems that arise when forced alignment is used with indistinct forensic-like audio: the transcript is simply “forced” onto an audio signal, giving completely wrong alignment. This research shows that, as things currently stand, computational methods are not suitable for solving the issue of transcription of indistinct forensic audio for a range of reasons. Such systems cannot transcribe what was said in indistinct covert recordings, nor can they determine who uttered the words and phrases in such recordings, nor prove that a transcript is “right” (or wrong). These systems can indeed be used advantageously in research, and for various other purposes, and the reasons they do not work for forensic transcription stem from the nature of the recording conditions, as well as the nature of the forensic context.
  • Item
    Acoustic injustice: The experience of listening to indistinct covert recordings presented as evidence in court
    Fraser, H ; Loakes, D (University of Wollongong, 2020)
    Audio recorded by hidden listening devices can provide powerful evidence in criminal trials. Unfortunately these covert recordings are often indistinct, to the extent the court needs a transcript to understand the content. Australian law allows police to provide transcripts as ‘ad hoc experts’. Legal procedures incorporate safeguards intended to ensure the transcripts are not misleading. The problem is that these safeguards have been shown to be ineffective, with multiple examples of inaccurate transcripts being provided to ‘assist’ the jury in determining what is said and who is saying it. The present paper explains the problem, provides an accessible overview of the nature of speech and how speech perception works, and outlines the solution proposed by the Research Hub for Language in Forensic Evidence to the ‘acoustic injustice’ embodied in current legal procedures.
  • Item
    New insights into /el/-/æl/ merging in Australian English
    Schmidt, P ; Diskin-Holdaway, C ; Loakes, D (ROUTLEDGE JOURNALS, TAYLOR & FRANCIS LTD, 2021-04-21)
    A merger exists in Australian English in which /el/ is realized as [æl] for a number of speakers, particularly in Victoria. There have also been some observations of /æl/ raising to [el], termed “transposition”. Although thought to be characteristic of older speakers, empirical evidence for transposition is scant. Here we report the discovery of substantive degrees of merging in thirteen older speakers, aged between 51 and 80, from Ocean Grove, Victoria. Auditory and acoustic methods showed bidirectional vowel movement, with speakers converging on both the /æ/ and /e/ phonemes. Increasing velarization of the lateral has been posited as a factor in the development of the merger in Victoria, and thus /l/ quality was also investigated, with null results in terms of direct factors. The lateral, however, was shown to be dark in both syllable onset and coda positions, with evidence for /l/ being clearer in this age group when compared with younger speakers. Lexical frequency and orthography were also investigated as factors, the latter showing a significant effect and suggesting a role for velarization as a contrast maintenance strategy.
  • Item
    They Talk Muṯumuṯu: Variable Elision of Tense Suffixes in Contemporary Pitjantjatjara
    Wilmoth, S ; Defina, R ; Loakes, D (MDPI AG, 2021)
    Vowel elision is common in Pitjantjatjara and Yankunytjatjara connected speech. It also appears to be a locus of language change, with young people extending elision to new contexts, resulting in a distinctive style of speech which speakers refer to as muṯumuṯu (‘short’ speech). This study examines the productions of utterance-final past tense suffixes /-nu, -ɳu, -ŋu/ by four older and four younger Pitjantjatjara speakers in spontaneous speech. This is a context where elision tends not to be sociolinguistically or perceptually salient. We find extensive variance within and between speakers in the realization of both the vowel and nasal segments. We also find evidence of a change in progress, with a mixed effects model showing that among the older speakers, elision is associated with both the place of articulation of the nasal segment and the metrical structure of the verbal stem, while among the younger speakers, elision is associated with place of articulation but metrical structure plays little role. This is in line with a reanalysis of the conditions for elision by younger speakers based on the variability present in the speech of older people. Such a reanalysis would also account for many of the sociolinguistically marked extended contexts of elision.
  • Item
    Attitudes towards Indian English among young urban professionals in Hyderabad, India
    Maxwell, O ; Diskin-Holdaway, C ; Loakes, D (WILEY, 2021-04-01)
  • Item
    A sociophonetic analysis of vowels produced by female Irish migrants: Investigating second dialect contact in Melbourne
    Diskin, C ; Loakes, D ; Clothier, J ; Volchok, B ; Calhoun, S ; Escudero, P ; Tabain, M ; Warren, P (Australasian Speech Science and Technology Association, 2019-08)
    We present preliminary results of an acoustic analysis of monophthongal vowels produced by five female Irish migrants in Melbourne, with lengths of residence in Australia between 1.5 and 9.5 years. This sample is compared with five female Australian English (AusE) participants. Results show greater overall variability within the Irish group compared to the AusE group for the majority of vowels. Sociophonetic variability also emerged, for example with only two migrants producing an expected Irish English FOOT-STRUT merger. One ‘non-merger’ with the longest length of residence, and a social network comprised exclusively of Australians, also displayed initial signs of movement towards other AusE vowel targets, such as a fronted /ʉ:/. This research contributes to our understanding of the dynamics of dialect contact, indicating movement in the direction of AusE after approximately ten years of exposure.
  • Item
    Individual differences and sound change actuation: evidence from imitation and perception of English /str/
    Stevens, M ; Loakes, D ; Calhoun, S ; Escudero, P ; Tabain, M ; Warren, P (Australasian Speech Science and Technology Association Inc, 2019-08-05)
    This study investigates the role of individual differences in the earliest stages of sound change, taking as a case study /s/-retraction in /str/. Australian English speakers completed an auditory repetition task involving isolated words with word-initial sibilants. In /str/-words the sibilant was manipulated to resemble post-alveolar [ʃ]. The same participants also completed a forced choice perception task involving sibilants. The study tests two predictions: (1) the sibilant in /str/-words should shift in the direction [s] → [ʃ] during exposure; (2) the magnitude and direction of shifts within individuals should depend on their own phonetic repertoire (production and perception). Results did not support (1) but there was partial support for (2) from production. The implications of the results for models of sound change are discussed.
  • Item
    The /el/-/æl/ merger in Australian English: Acoustic and articulatory insights
    Diskin, C ; Loakes, D ; Billington, R ; Stoakes, H ; Gonzalez, S ; Kirkham, S ; Calhoun, S ; Escudero, P ; Tabain, M ; Warren, P (Australasian Speech Science and Technology Association, 2019)
    This paper investigates a merger-in-progress of /e/-/æ/ in prelateral contexts for speakers of Australian English in Victoria. Twelve participants (7F, 5M) were recorded producing a wordlist resulting in acoustic and concurrent articulatory data via stabilised mid-sagittal ultrasound tongue imaging. Focusing on a subset of the data comprising short front vowels /ɪ, e, æ/ in /hVt/ and /hVl/ contexts, findings show that there are robust acoustic differences between /e/ and /æ/ preceding /t/, as anticipated. However, individual differences emerge for /e/ and /æ/ preceding /l/, with highly gradient production patterns across the speakers, ranging from speakers who exhibit merger behaviour to those who maintain categorical distinctions. The evidence for merging behaviour across speakers is similar, but does not map directly, across both the acoustic and articulatory data, and illustrates the value of incorporating a range of data types in investigating a merger-in-progress.
  • Item
    Tracking vowel categorization behaviour longitudinally: a study across three x three year increments (2012, 2015, 2018)
    Loakes, D ; Escudero, P ; Clothier, J ; Hajek, J ; Calhoun, S ; Escudero, P ; Tabain, M ; Warren, P (Australasian Speech Science and Technology Association Inc., 2019)
    Longitudinal data provide a unique opportunity to address questions around language change, and speaker/listener behaviour. Processing behaviour is considered subject to change over time, but it remains an open question as to over what time period incremental changes might occur. This study compares responses to a forced-choice listening test over three x three-year increments (2012, 2015, 2018), from a set of the same ten mainstream Australian English listeners. The listeners are from a small town (Warrnambool, Australia), where crucially, a distinction between /el/-/æl/ is lost for many. Here we focus on the contrasts between /ɪ e æ/ in /hVt/ and /CVL/ environments. Despite our predictions, overall results show that the increments, which span six years in total, are too small for any changes to arise. This study contributes to our understanding of longitudinal processing behaviour, showing overall consistency across 2012-2018, even in the context of a merger-in-progress.
  • Item
    Varietal differences in categorisation of /ɪ e æ/: A case study of Irish and Australian English listeners in Melbourne
    Diskin, C ; Loakes, D ; Clothier, J ; Epps, J ; Wolfe, J ; Smith, J ; Jones, C (ASSTA, 2018)
    This paper presents results of a vowel categorisation task of front lax vowels in /hVt/, /hVl/ and /mVl/ contexts, by 12 native Australian English speakers and 10 Irish migrants residing in Melbourne. Results show significant differences in how listeners categorise these vowels, in five out of six phonetic contexts. Vowels suggested to be undergoing merger in Victoria, specifically /el-æl/, are not perceived as merged, indicating this phenomenon may be stratified and/or more age-graded than previously reported. Results show clear differences between listeners sharing an L1 but speaking different dialects, even when these dialects are in direct contact due to migration.