Computing and Information Systems - Research Publications
- Reconsidering language identification for written language resources
  HUGHES, BADEN ; BALDWIN, TIMOTHY ; BIRD, STEVEN ; NICHOLSON, JEREMY ; MACKINLAY, ANDREW (European Language Resources Association, 2006)
  The task of identifying the language in which a given document (ranging from a sentence to thousands of pages) is written has been relatively well studied over several decades. Automated approaches to written language identification are used widely throughout research and industrial contexts, over both oral and written source materials. Despite this widespread acceptance, a review of previous research in written language identification reveals a number of questions which remain open and ripe for further investigation.
- Feature-based Encoding and Querying Language Resources with Character Semantics
  HUGHES, B. ; Gibbon, D. ; Trippel, T. (European Language Resources Association, 2006)
  In this paper we discuss the explicit representation of character features pertaining to written language resources, which we argue are critically necessary in the long term of archiving language data. Much focus on the creation of language resources and their associated preservation is at the level of the corpus itself; however it is generally accepted that long term interpretation of these resources requires more than a best practice data format. In particular, where language resources are created in linguistic fieldwork, and especially for minority languages, the need for preservation not only of the resource itself, but of additional metadata which allows for the resource to be accurately interpreted in the future is becoming a topic of research in itself. In this paper we extend earlier work on semantically based character decomposition to include representation of character properties in a variety of models and a mechanism for exploiting these properties through queries.