- Computing and Information Systems - Research Publications
Computing and Information Systems - Research Publications
Permanent URI for this collection
Search Results
Now showing
1 - 3 of 3
-
ItemCollecting low-density language materials on the WebBaldwin, Timothy ; BIRD, STEPHEN ; HUGHES, BADEN (Southern Cross University, 2006)Most web content exists in a few dozen languages. Hundreds of other languages - the `low-density languages' - are only represented in scarce quantities on the web. How can we locate, store and describe these low-density resources? In particular, how can we identify linguistically interesting resources, such as translation sets and multilingual documents? In this paper we describe ongoing research in which we integrate a number of discrete systems (language data crawler, automated metadata generation tools, language data repositories and federated search services) to address the identification, retrieval, description, storage and access issues for low-density language materials from the web.
-
ItemAnalysis and prediction of user behaviour in a museum environmentGrieser, Karl ; Baldwin, Timothy ; Bird, Steven (Australasian Language Technology Association, 2006)N/A
-
ItemReconsidering language identification for written language resourcesHUGHES, BADEN ; BALDWIN, TIMOTHY ; BIRD, STEVEN ; NICHOLSON, JEREMY ; MACKINLAY, ANDREW (European Language Resources Association, 2006)The task of identifying the language in which a given document (ranging from a sentence to thousands of pages) is written has been relatively well studied over several decades. Automated approaches to written language identification are used widely throughout research and industrial contexts, over both oral and written source materials. Despite this widespread acceptance, a review of previous research in written language identification reveals a number of questions which remain open and ripe for further investigation.