School of Languages and Linguistics - Research Publications

    Nyingarn: Supporting Australian Indigenous languages from textual sources
    Thieberger, N ; Lewincamp, S ; Rosa, ML (IEEE, 2023-01-01)
    For many Indigenous languages there are few records, and the earliest written sources are witnesses of the language as spoken before the major changes typically brought about by colonial expansion. In Australia, as a result of settler aggression, removal of Indigenous children, and displacement of the original inhabitants from their land, many Indigenous languages are no longer spoken. In this situation, the earliest sources that record aspects of these languages become all the more valuable as resources for relearning the languages. However, as paper documents held in a single repository, they can be difficult to access and to use. We report on our project to take such manuscripts, convert them to text, and create a platform in which they can be found and used. We built a new platform using current technology because we found no existing system that was suitable. We allow for various input formats for the text, and store the text and images as Research Object Crates (RO-Crates). We use Amazon Textract for Optical Character Recognition (OCR) of text in images and accept pre-existing transcriptions in CSV and TEI formats (and Word documents converted to TEI). We have also successfully used an existing crowdsourcing platform to have documents transcribed. Initial preparation is done in a workspace, with TEI as the encoding system. Finalised documents are pushed to a repository for exploration via geographic maps or text searching, and can then be downloaded in various formats for re-use. Once texts are prepared in this way, we can submit them to an algorithm that detects non-English items and tags that text as being in the Indigenous languages. The platform is live and currently has 400 manuscripts submitted to it: https://nyingarn.net.
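
The abstract does not describe how non-English items are detected, so the following is only a minimal sketch of one possible approach, not the project's algorithm. It assumes a simple word-list lookup: any alphabetic token missing from an English lexicon is wrapped in a TEI `<foreign>` element. The tiny `ENGLISH_WORDS` set, the function name `tag_non_english`, the sample sentence, and the `und` (undetermined) language code are all hypothetical placeholders.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical stand-in for a real English lexicon (assumption, not from the source).
ENGLISH_WORDS = {"the", "word", "for", "water", "is", "and", "a", "means"}

# Split a line into runs of letters and runs of everything else,
# so that spacing and punctuation are preserved exactly.
TOKEN_RE = re.compile(r"[^\W\d_]+|[\W\d_]+")


def tag_non_english(line, lexicon=ENGLISH_WORDS, lang="und"):
    """Wrap alphabetic tokens not found in `lexicon` in a TEI <foreign> element."""
    parts = []
    for token in TOKEN_RE.findall(line):
        if token[0].isalpha() and token.lower() not in lexicon:
            # Unknown word: tag it as a candidate non-English item.
            parts.append(f'<foreign xml:lang="{lang}">{escape(token)}</foreign>')
        else:
            # Known English word, whitespace, digits, or punctuation: keep as text.
            parts.append(escape(token))
    return "".join(parts)


if __name__ == "__main__":
    # Invented sample sentence, for illustration only.
    print(tag_non_english("The word for water is kapi."))
```

A lookup of this kind would over-tag proper names and unusual English spellings, so in practice its output would presumably be treated as candidates for human review rather than as final markup.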