Computing and Information Systems - Research Publications

Search Results

Now showing 1 - 10 of 102
  • Item
    Towards a Web search service for minority language communities
    HUGHES, BADEN (State Library of Victoria, 2006)
    Locating resources of interest on the web in the general case is at best a low precision activity owing to the large number of pages on the web (for example, Google covers more than 8 billion web pages). As language communities (at all points on the spectrum) increasingly self-publish materials on the web, so interested users are beginning to search for them in the same way that they search for general internet resources, using broad coverage search engines with typically simple queries. Given that language resources are a minority case on the web in general, finding relevant materials for low density or lesser used languages on the web is in general an increasingly inefficient exercise even for experienced searchers. Furthermore, the inconsistent coverage of web content between search engines serves to complicate matters even more. A number of previous research efforts have focused on using web data to create language corpora, mine linguistic data, build language ontologies, create thesauri etc. The work reported in this paper contrasts with previous research in that it is not specifically oriented towards creation of language resources from web data directly, but rather, increasing the likelihood that end users searching for resources in minority languages will actually find useful results from web searches. Similarly, it differs from earlier work by virtue of its focus on search optimization directly, rather than as a component of a larger process (other researchers use the seed URIs discovered via the mechanism described in this paper in their own varied work). The work here can be seen to contribute to a user-centric agenda for locating language resources for lesser-used languages on the web. (From Introduction)
  • Item
    Anti-personnel landmine detection based on GPR and IR imaging: a review
    Bhuiyan, Alauddin ; Nath, Baikunth (2006-04)
    Ground penetrating radar (GPR) and infrared (IR) cameras have become two established sensors for detecting buried anti-personnel mines (APM) which contain little or no metal. This report reviews the detection techniques of APM using GPR and IR, and describes particular situations where each technique is feasible. We provide an analysis for fusion-based detection and classification of APM. We discuss the GPR and IR data acquisition, signal processing and image processing methods. We also include a comparative study of these two sensors with respect to signal processing and target detection procedures. The report discusses the strengths and weaknesses of each of the sensors based on data capturing efficiency, overcoming environmental difficulties and sensor technology. Finally, we emphasize that a geometrical feature-based sensor fusion, combining GPR and IR, for detection and classification of APM may be the most effective technique.
  • Item
    An anthological review of research using MontyLingua: a python-based end-to-end text processor
    Ling, Maurice H. T. (CG Publisher, 2006-12)
    MontyLingua, an integral part of ConceptNet which is currently the largest commonsense knowledge base, is an English text processor developed using the Python programming language at the MIT Media Lab. The main feature of MontyLingua is its coverage of all aspects of English text processing from raw input text to semantic meanings and summary generation, yet the components of MontyLingua are loosely coupled to each other at the architectural and code level, which enables individual components to be used independently or substituted. However, there has been no review exploring the role of MontyLingua in recent research work utilizing it. This paper aims to review the use of and roles played by MontyLingua and its components in research work published in 19 articles between October 2004 and August 2006. We observed a diversified use of MontyLingua in many different areas, both generic and domain specific. Although the use of the text summarizing component had not been observed, we are optimistic that it will have a crucial role in managing the current trend of information overload in future research.
  • Item
    Searching for language resources on the Web: user behaviour in the Open Language Archives Community
    HUGHES, BADEN (European Language Resources Association, 2006)
    While much effort is expended in the curation of language resources, such investment is largely irrelevant if users cannot locate resources of interest. The Open Language Archives Community (OLAC) was established to define standards for the description of language resources and provide core infrastructure for a virtual digital library, thus addressing the resource discovery issue. In this paper we consider naturalistic user search behaviour in the Open Language Archives Community. Specifically, we have collected the query logs from the OLAC Search Engine over a 2 year period, collecting in excess of 1.2 million queries, in over 450K user search sessions. Subsequently we have mined these to discover user search patterns of various types, all pertaining to the discovery of language resources. A number of interesting observations can be made based on this analysis; in this paper we report on a range of properties and behaviours based on empirical evidence.
  • Item
    GOLD as a standard for linguistic data interoperation: a road map for development
    Simons, G. F. ; Hughes, B. (2006)
    GOLD, the General Ontology for Linguistic Description [1], has somewhat unexpectedly emerged from the EMELD project. Originally conceived of as a morphosyntactic annotation inventory and label mapping scheme, GOLD has now been formalized as an ontology by which disparate data sets can be integrated through a common representation of the basic linguistic features. The overall vision of the GOLD Community is that: "By agreeing on a shared ONTOLOGY of linguistic concepts and on a shared infrastructure for INTEROPERATION, the linguistics community will be able to produce RESOURCES that describe individual languages in a comparable way, to develop TOOLS that produce these comparable resources, and to query SERVICES that aggregate as many comparable resources as are available." [2] In the EMELD context, a significant amount of effort has been invested in the development of GOLD in the first dimension of this vision, namely a shared collection of linguistic concepts. Initial surveying work has been completed to glean linguistic concepts and their definitions from published materials. This survey work has been complemented by web data mining activities [3] to further increase the coverage of GOLD. GOLD has been instantiated in several formal versions, and a range of proof of concept implementations have featured at previous EMELD events [4, 5, 6] and other venues [7]. However, the latter four items from the GOLD Community vision (to achieve interoperation through resources, tools, and services) remain largely unaddressed, and thus there remains considerable effort to be expended in achieving the vision in its entirety. Upon reflection, we believe that there are presently three significant barriers to the widespread adoption of GOLD and subsequent realization of the interoperation goals, viz.: * the complexity of the dissemination format which in effect places the threshold for engagement with GOLD at too high a level; * the absence of a well defined change process
  • Item
    Metadata Challenges for Situational Properties of Learning Objects
    HUGHES, B ; FARMER, RA (IEEE Computer Society, 2006)
  • Item
    Collecting low-density language materials on the Web
    Baldwin, Timothy ; BIRD, STEPHEN ; HUGHES, BADEN (Southern Cross University, 2006)
    Most web content exists in a few dozen languages. Hundreds of other languages - the 'low-density languages' - are only represented in scarce quantities on the web. How can we locate, store and describe these low-density resources? In particular, how can we identify linguistically interesting resources, such as translation sets and multilingual documents? In this paper we describe ongoing research in which we integrate a number of discrete systems (language data crawler, automated metadata generation tools, language data repositories and federated search services) to address the identification, retrieval, description, storage and access issues for low-density language materials from the web.
  • Item
    Frontiers in Linguistic Annotation for Lower-Density Languages
    Maxwell, M. ; Hughes, B. (Association for Computational Linguistics, 2006)
    The languages that are most commonly subject to linguistic annotation on a large scale tend to be those with the largest populations or with recent histories of linguistic scholarship. In this paper we discuss the problems associated with lower-density languages in the context of the development of linguistically annotated resources. We frame our work with three key questions regarding the definition of lower-density languages, increasing available resources and reducing data requirements. A number of steps forward are identified for increasing the number of lower-density language corpora with linguistic annotations.
  • Item
    Analysis and prediction of user behaviour in a museum environment
    Grieser, Karl ; Baldwin, Timothy ; Bird, Steven (Australasian Language Technology Association, 2006)
    N/A
  • Item
    Towards interoperable secondary annotations in the e-social science domain
    HUGHES, BADEN ; SCHMIDT, D. ; Smith, Andrew E. (National Centre for E-Social Science / The University of Manchester, 2006)
    The sharing of data for secondary analysis has been very limited, especially in the social sciences. The reasons usually cited for this limited sharing are (1) strong privacy requirements on data and (2) lack of appropriate contextual knowledge by secondary investigators. We argue that a third reason, the lack of interoperability between software tools commonly used for data annotation and coding by social scientists, is critical even if the problems identified by earlier researchers are resolved. In this paper, we describe our work in attempting to address the data interoperability issue directly by developing standards for the syntactic expression of annotation, and core libraries which can be used to manipulate the annotations in their standard format, as well as the overall system architecture and examples of analytical applications which can be used for secondary coding and analysis.