
    Open source corpus analysis tools for Malay

    Author
    Baldwin, Timothy; Awab, Su'ad
    Date
    2006
    Source Title
    Proceedings, the 5th International Conference on Language Resources and Evaluation (LREC2006)
    University of Melbourne Author/s
    Baldwin, Timothy
    Affiliation
    Engineering: Department of Computer Science and Software Engineering
    Document Type
    Conference Paper
    Citations
    Baldwin, T., & Awab, S. (2006). Open source corpus analysis tools for Malay. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC2006), Genoa, Italy.
    Access Status
    Open Access
    URI
    http://hdl.handle.net/11343/33498
    Abstract
    Tokenisers, lemmatisers and POS taggers are vital to the linguistic and digital furtherment of any language. In this paper, we present an open source toolkit for Malay incorporating a word and sentence tokeniser, a lemmatiser and a partial POS tagger, based on heavy reuse of pre-existing language resources. We outline the software architecture of each component, and present an evaluation of each over a 26K word sample of Malay text.
    Keywords
    Malay; tokeniser; lemmatiser; morphological analyser
