Extraction of neologisms from Japanese corpora
Affiliation: Computing and Information Systems
Document Type: PhD thesis
Access Status: Open Access
© 2017 Dr. James Breen
This thesis explores the application of natural-language processing techniques to the extraction of neologisms from Japanese corpora. The research aim was to establish techniques that can be developed and exploited to assist significantly in neologism extraction for compiling Japanese monolingual and bilingual dictionaries. The particular challenge of the task is the lack of word boundaries in Japanese text, which makes unrecorded words difficult to identify. Three broad approaches were explored, using a variety of language processing and artificial intelligence techniques and drawing on large-scale Japanese corpora and reference lexicons: (1) synthesis of possible Japanese words by mimicking Japanese morphological processes, followed by testing for the presence of the candidate words in Japanese corpora; (2) analysis of morpheme sequences in Japanese texts to detect potential new or unrecorded terms; and (3) analysis of language patterns that are often used in Japanese in association with new and emerging terms. The research has identified a number of processes that can assist lexicographers in identifying unrecorded lexical items in Japanese texts.