Computing and Information Systems - Research Publications


Search Results

Now showing 1 - 10 of 29
  • Structuring Documents Efficiently
    Marshall, RGJ ; Bird, SG ; Stuckey, PJ (University of Sydney, 2005)
  • The ACL Anthology Reference Corpus: A Reference Dataset for Bibliographic Research in Computational Linguistics
    Bird, S ; Dale, R ; Dorr, BJ ; Gibson, B ; Joseph, MT ; Kan, M-Y ; Lee, D ; Powley, B ; Radev, DR ; Tan, YF (European Language Resources Association (ELRA), 2008)
  • Defining a core body of knowledge for the introductory computational linguistics curriculum
    Bird, Steven (2008)
    Discourse in and about computational linguistics depends on a shared body of knowledge. However, little content is shared across the introductory courses in this field. Instead, they typically cover a diverse assortment of topics tailored to the capabilities of the students and the interests of the instructor. If a core body of knowledge could be agreed upon and incorporated into introductory courses, several benefits would ensue, such as the proliferation of instructional materials, software support, and extension modules building on a common foundation. This paper argues that it is worthwhile to articulate a core body of knowledge, and proposes a starting point based on the ACM Computer Science Curriculum. A variety of issues specific to the multidisciplinary nature of computational linguistics are explored.
  • Multidisciplinary instruction with the Natural Language Toolkit
    Bird, S ; Klein, E ; Loper, E ; Baldridge, J (Association for Computational Linguistics, 2008)
  • Graphical query for linguistic treebanks
    Bird, Steven ; Lee, Haejoong (2007)
    Databases of hierarchically annotated text occupy a central place in linguistic research and language technology development. We describe a new approach to tree query which we call "Query by Annotation". Users express a query by annotating a tree, and the annotation is compiled into an expression in a path language. The result trees are overlaid with the original query, permitting the user to see why they match. Since queries and results are annotated trees, users can easily refine and resubmit their queries. The approach to Query by Annotation is motivated and exemplified using databases of linguistic trees, or treebanks.
  • Collecting low-density language materials on the Web
    Baldwin, Timothy ; Bird, Steven ; Hughes, Baden (Southern Cross University, 2006)
    Most web content exists in a few dozen languages. Hundreds of other languages, the 'low-density languages', are represented on the web only in scarce quantities. How can we locate, store and describe these low-density resources? In particular, how can we identify linguistically interesting resources, such as translation sets and multilingual documents? In this paper we describe ongoing research in which we integrate a number of discrete systems (a language data crawler, automated metadata generation tools, language data repositories and federated search services) to address the identification, retrieval, description, storage and access issues for low-density language materials from the web.
  • Analysis and prediction of user behaviour in a museum environment
    Grieser, Karl ; Baldwin, Timothy ; Bird, Steven (Australasian Language Technology Association, 2006)
  • Reconsidering language identification for written language resources
    Hughes, Baden ; Baldwin, Timothy ; Bird, Steven ; Nicholson, Jeremy ; MacKinlay, Andrew (European Language Resources Association, 2006)
    The task of identifying the language in which a given document (ranging from a sentence to thousands of pages) is written has been relatively well studied over several decades. Automated approaches to written language identification are used widely throughout research and industrial contexts, over both oral and written source materials. Despite this widespread acceptance, a review of previous research in written language identification reveals a number of questions which remain open and ripe for further investigation.
  • Building a Search Engine to Drive Problem-Based Learning
    Bird, Steven ; Curran, James (ACM, 2006)
    Search engines pervade the digital world, mediating most access to information instantaneously. We have found that students can build search engine components, and even entire search engines, in the context of problem-based learning in introductory and intermediate computer science courses. The courses cover a broad range of topics in algorithms, data structures, and web design, with a heavy emphasis on programming. Additionally, the syllabus connects to the internet at many points, from web design and HTML to graph algorithms and pattern matching. This connection enlivens the discussion of otherwise dry topics like searching, sorting, indexing and hashing. Moreover, the challenge of web-scale computing motivates continuing students in their later study of formal topics like algorithmic complexity, while non-continuing students acquire transferable analytical skills. We report on our experience using search engine projects to drive problem-based learning in computer science courses for both high school and university students. Our experience shows that such projects are effective in both introductory and intermediate courses, and readily accommodate student groups with diverse programming abilities.
  • TalkBank: Building an open unified multimodal database of communicative interaction
    MacWhinney, B ; Bird, S ; Cieri, C ; Martell, C (Evaluations and Language resources Distribution Agency, 2004)