Computing and Information Systems - Research Publications

  • Item
    A classification-based framework for learning object assembly
    Farmer, R. A. ; Hughes, B. (IEEE Computer Society Press, 2005)
    Relations between learning outcomes and the learning objects which are assembled to facilitate their achievement are the subject of increasingly prevalent investigation, particularly with approaches which advocate the aggregation of learning objects as complex constituencies for achieving learning outcomes. From the perspective of situated learning, we show how the CASE framework imbues learning objects with a closed set of properties which can be classified and aggregated into learning object assemblies in a principled fashion. We argue that the computational and pedagogical tractability of this model provides a new insight into learning object evaluation, and hence learning outcomes.
  • Item
    NICTA i2d2 at GeoCLEF 2005
    HUGHES, BADEN (2005)
    This paper describes the participation of the Interactive Information Discovery and Delivery (i2d2) project of National ICT Australia (NICTA) in the GeoCLEF track of the Cross Language Evaluation Forum 2005. We present some background information about the NICTA i2d2 project to motivate our involvement, and describe our systems and experimental interests. We review the design of our runs and the results of our submitted and subsequent experiments, and contribute a range of suggestions for future instantiations of a geospatial information retrieval track within a shared evaluation task framework.
  • Item
    Towards a Web search service for minority language communities
    HUGHES, BADEN (State Library of Victoria, 2006)
    Locating resources of interest on the web in the general case is at best a low precision activity owing to the large number of pages on the web (for example, Google covers more than 8 billion web pages). As language communities (at all points on the spectrum) increasingly self-publish materials on the web, interested users are beginning to search for them in the same way that they search for general internet resources, using broad coverage search engines with typically simple queries. Given that language resources are a minority case on the web in general, finding relevant materials for low density or lesser used languages on the web is an increasingly inefficient exercise even for experienced searchers. Furthermore, the inconsistent coverage of web content between search engines complicates matters even more. A number of previous research efforts have focused on using web data to create language corpora, mine linguistic data, build language ontologies, create thesauri, etc. The work reported in this paper contrasts with previous research in that it is not specifically oriented towards creation of language resources from web data directly, but rather towards increasing the likelihood that end users searching for resources in minority languages will actually find useful results from web searches. Similarly, it differs from earlier work by virtue of its focus on search optimization directly, rather than as a component of a larger process (other researchers use the seed URIs discovered via the mechanism described in this paper in their own varied work). The work here can be seen to contribute to a user-centric agenda for locating language resources for lesser-used languages on the web. (From Introduction)
  • Item
    Searching for language resources on the Web: user behaviour in the Open Language Archives Community
    HUGHES, BADEN (European Language Resources Association, 2006)
    While much effort is expended in the curation of language resources, such investment is largely irrelevant if users cannot locate resources of interest. The Open Language Archives Community (OLAC) was established to define standards for the description of language resources and provide core infrastructure for a virtual digital library, thus addressing the resource discovery issue. In this paper we consider naturalistic user search behaviour in the Open Language Archives Community. Specifically, we have collected the query logs from the OLAC Search Engine over a 2 year period, amounting to in excess of 1.2 million queries in over 450K user search sessions. Subsequently we have mined these to discover user search patterns of various types, all pertaining to the discovery of language resources. A number of interesting observations can be made based on this analysis; in this paper we report on a range of properties and behaviours based on empirical evidence.
  • Item
    Gold as a standard for linguistic data interoperation: a road map for development
    Simons, G. F. ; Hughes, B. (2006)
    GOLD, the General Ontology for Linguistic Description [1], has somewhat unexpectedly emerged from the EMELD project. Originally conceived of as a morphosyntactic annotation inventory and label mapping scheme, GOLD has now been formalized as an ontology by which disparate data sets can be integrated through a common representation of the basic linguistic features. The overall vision of the GOLD Community is that: "By agreeing on a shared ONTOLOGY of linguistic concepts and on a shared infrastructure for INTEROPERATION, the linguistics community will be able to produce RESOURCES that describe individual languages in a comparable way, to develop TOOLS that produce these comparable resources, and to query SERVICES that aggregate as many comparable resources as are available." [2] In the EMELD context, a significant amount of effort has been invested in the development of GOLD in the first dimension of this vision, namely a shared collection of linguistic concepts. Initial surveying work has been completed to glean linguistic concepts and their definitions from published materials. This survey work has been complemented by web data mining activities [3] to further increase the coverage of GOLD. GOLD has been instantiated in several formal versions, and a range of proof of concept implementations have featured at previous EMELD events [4, 5, 6] and other venues [7]. However, the latter four items from the GOLD Community vision (to achieve interoperation through resources, tools, and services) remain largely unaddressed, and thus there remains considerable effort to be expended in achieving the vision in its entirety. Upon reflection, we believe that there are presently three significant barriers to the widespread adoption of GOLD and subsequent realization of the interoperation goals, viz.: * the complexity of the dissemination format, which in effect places the threshold for engagement with GOLD at too high a level; * the absence of a well-defined change process
  • Item
    Metadata Challenges for Situational Properties of Learning Objects
    HUGHES, B ; FARMER, RA (IEEE Computer Society, 2006)
  • Item
    Collecting low-density language materials on the Web
    Baldwin, Timothy ; BIRD, STEPHEN ; HUGHES, BADEN (Southern Cross University, 2006)
    Most web content exists in a few dozen languages. Hundreds of other languages - the 'low-density languages' - are only represented in scarce quantities on the web. How can we locate, store and describe these low-density resources? In particular, how can we identify linguistically interesting resources, such as translation sets and multilingual documents? In this paper we describe ongoing research in which we integrate a number of discrete systems (language data crawler, automated metadata generation tools, language data repositories and federated search services) to address the identification, retrieval, description, storage and access issues for low-density language materials from the web.
  • Item
    Frontiers in Linguistic Annotation for Lower-Density Languages
    Maxwell, M. ; Hughes, B. (Association for Computational Linguistics, 2006)
    The languages that are most commonly subject to linguistic annotation on a large scale tend to be those with the largest populations or with recent histories of linguistic scholarship. In this paper we discuss the problems associated with lower-density languages in the context of the development of linguistically annotated resources. We frame our work with three key questions regarding the definition of lower-density languages, increasing available resources, and reducing data requirements. A number of steps forward are identified for increasing the number of lower-density language corpora with linguistic annotations.
  • Item
    Towards interoperable secondary annotations in the e-social science domain
    HUGHES, BADEN ; SCHMIDT, D. ; Smith, Andrew E. (National Centre for E-Social Science / The University of Manchester, 2006)
    The sharing of data for secondary analysis has been very limited, especially in the social sciences. The reasons usually cited for this limited sharing are (1) strong privacy requirements on data and (2) lack of appropriate contextual knowledge by secondary investigators. We argue that a third reason, the lack of interoperability between software tools commonly used for data annotation and coding by social scientists, is critical even if the problems identified by earlier researchers are resolved. In this paper, we describe our work in attempting to address the data interoperability issue directly by developing standards for the syntactic expression of annotation, and core libraries which can be used to manipulate the annotations in their standard format, as well as the overall system architecture and examples of analytical applications which can be used for secondary coding and analysis.
  • Item
    Link? Rot: URI citation durability in 10 years of AusWeb Proceedings
    HUGHES, BADEN (Southern Cross University, 2006)
    The AusWeb conference has played a significant role in promoting and advancing research in web technologies in Australia over the last decade. In addition, the AusWeb forum serves as a point of reflection for practitioners engaged in web infrastructure, content and policy development, allowing the distillation of best practice in the management of web services, particularly in higher educational institutions in the Australasian region. Papers contributed to AusWeb are highly connected to the web in general, and to digital libraries and other conference sites in particular, owing to the publication of proceedings in HTML and full support for hyperlink based referencing. Authors have exploited this medium progressively more effectively, with the vast majority of references for AusWeb papers now being URIs as opposed to more traditional citation forms. Although care has been taken by the proceedings editors to ensure that AusWeb authors adhere to a high degree of syntactic standards compliance, the editors are not responsible for paper content, or the citations made within them. As such, it is the responsibility of the authors (and arguably perhaps, the AusWeb reviewers) to ensure that the citations made in a given paper, particularly those cited by URI, are available for consideration by interested readers. The objective of this paper is to examine the reliability of URI citations in 10 years' worth of AusWeb proceedings, particularly to determine the durability of such references, and to classify their causes of unavailability.