Computing and Information Systems - Research Publications

Permanent URI for this collection

Search Results

Now showing 1 - 10 of 13
  • Item
    Thumbnail Image
    An empirical study of the effects of NLP components on Geographic IR performance
    Stokes, N ; Li, Y ; Moffat, A ; Rong, J (TAYLOR & FRANCIS LTD, 2008)
  • Item
    Thumbnail Image
    Index compression using 64-bit words
    Anh, VN ; Moffat, A (WILEY, 2010-02)
  • Item
    Thumbnail Image
    Decoding prefix codes
    Liddell, M ; Moffat, A (WILEY-BLACKWELL, 2006-12)
  • Item
    Thumbnail Image
    Block Merging for Off-Line Compression
    WAN, R. ; MOFFAT, A. ( 2007)
  • Item
    Thumbnail Image
    Inverted files for text search engines
    Zobel, J ; Moffat, A (ASSOC COMPUTING MACHINERY, 2006)
  • Item
    Thumbnail Image
    A pipelined architecture for distributed text query evaluation
    Moffat, A ; Webber, W ; Zobel, J ; Baeza-Yates, R (SPRINGER, 2007-06)
  • Item
    Thumbnail Image
    Word-based text compression using the Burrows-Wheeler transform
    Moffat, A ; Isal, RYK (ELSEVIER SCI LTD, 2005-09)
  • Item
    Thumbnail Image
    Inverted index compression using word-aligned binary codes
    Anh, VN ; Moffat, A (SPRINGER, 2005-01)
  • Item
    Thumbnail Image
    Efficient Set Intersection for Inverted Indexing
    Culpepper, JS ; Moffat, A (ASSOC COMPUTING MACHINERY, 2010-12)
    Conjunctive Boolean queries are a key component of modern information retrieval systems, especially when Web-scale repositories are being searched. A conjunctive query q is equivalent to a | q |-way intersection over ordered sets of integers, where each set represents the documents containing one of the terms, and each integer in each set is an ordinal document identifier. As is the case with many computing applications, there is tension between the way in which the data is represented, and the ways in which it is to be manipulated. In particular, the sets representing index data for typical document collections are highly compressible, but are processed using random access techniques, meaning that methods for carrying out set intersections must be alert to issues to do with access patterns and data representation. Our purpose in this article is to explore these trade-offs, by investigating intersection techniques that make use of both uncompressed “integer” representations, as well as compressed arrangements. We also propose a simple hybrid method that provides both compact storage, and also faster intersection computations for conjunctive querying than is possible even with uncompressed representations.
  • Item
    Thumbnail Image
    A Similarity Measure for Indefinite Rankings
    Webber, W ; Moffat, A ; Zobel, J (ASSOC COMPUTING MACHINERY, 2010-11)
    Ranked lists are encountered in research and daily life and it is often of interest to compare these lists even when they are incomplete or have only some members in common. An example is document rankings returned for the same query by different search engines. A measure of the similarity between incomplete rankings should handle nonconjointness, weight high ranks more heavily than low, and be monotonic with increasing depth of evaluation; but no measure satisfying all these criteria currently exists. In this article, we propose a new measure having these qualities, namely rank-biased overlap (RBO). The RBO measure is based on a simple probabilistic user model. It provides monotonicity by calculating, at a given depth of evaluation, a base score that is non-decreasing with additional evaluation, and a maximum score that is nonincreasing. An extrapolated score can be calculated between these bounds if a point estimate is required. RBO has a parameter which determines the strength of the weighting to top ranks. We extend RBO to handle tied ranks and rankings of different lengths. Finally, we give examples of the use of the measure in comparing the results produced by public search engines and in assessing retrieval systems in the laboratory.