
dc.contributor.author: BALDWIN, TIMOTHY
dc.contributor.author: Awab, Su'ad
dc.date.accessioned: 2014-05-22T09:15:18Z
dc.date.available: 2014-05-22T09:15:18Z
dc.date.issued: 2006 [en_US]
dc.date.submitted: 2007-02-02 [en_US]
dc.identifier.citation: Baldwin, T., & Awab, S. (2006). Open source corpus analysis tools for Malay. In, Proceedings, the 5th International Conference on Language Resources and Evaluation (LREC2006), Genoa, Italy. [en_US]
dc.identifier.uri: http://hdl.handle.net/11343/33498
dc.description.abstract: Tokenisers, lemmatisers and POS taggers are vital to the linguistic and digital furtherment of any language. In this paper, we present an open source toolkit for Malay incorporating a word and sentence tokeniser, a lemmatiser and a partial POS tagger, based on heavy reuse of pre-existing language resources. We outline the software architecture of each component, and present an evaluation of each over a 26K word sample of Malay text. [en_US]
dc.format: application/pdf [en_US]
dc.language: eng [en_US]
dc.subject: Malay [en_US]
dc.subject: tokeniser [en_US]
dc.subject: lemmatiser [en_US]
dc.subject: morphological analyser [en_US]
dc.title: Open source corpus analysis tools for Malay [en_US]
dc.type: Conference Paper [en_US]
melbourne.peerreview: Non Peer Reviewed [en_US]
melbourne.affiliation.department: Engineering: Department of Computer Science and Software Engineering [en_US]
melbourne.publication.status: Published [en_US]
melbourne.source.title: Proceedings, the 5th International Conference on Language Resources and Evaluation (LREC2006) [en_US]
melbourne.source.pages: 2212-2215 [en_US]
melbourne.source.locationconference: Genoa, Italy [en_US]
dc.description.sourcedate: May, 2006 [en_US]
melbourne.elementsid: NA
melbourne.contributor.author: Baldwin, Timothy
melbourne.accessrights: Open Access

