Minerva Elements Records

  • Item
    Efficient plagiarism detection for large code repositories
    Burrows, S ; Tahaghoghi, SMM ; Zobel, J (WILEY, 2007-02)
  • Item
    Inverted files for text search engines
    Zobel, J ; Moffat, A (ASSOC COMPUTING MACHINERY, 2006)
  • Item
    A pipelined architecture for distributed text query evaluation
    Moffat, A ; Webber, W ; Zobel, J ; Baeza-Yates, R (SPRINGER, 2007-06)
  • Item
    A Similarity Measure for Indefinite Rankings
    Webber, W ; Moffat, A ; Zobel, J (ASSOC COMPUTING MACHINERY, 2010-11)
    Ranked lists are encountered in research and daily life and it is often of interest to compare these lists even when they are incomplete or have only some members in common. An example is document rankings returned for the same query by different search engines. A measure of the similarity between incomplete rankings should handle nonconjointness, weight high ranks more heavily than low, and be monotonic with increasing depth of evaluation; but no measure satisfying all these criteria currently exists. In this article, we propose a new measure having these qualities, namely rank-biased overlap (RBO). The RBO measure is based on a simple probabilistic user model. It provides monotonicity by calculating, at a given depth of evaluation, a base score that is nondecreasing with additional evaluation, and a maximum score that is nonincreasing. An extrapolated score can be calculated between these bounds if a point estimate is required. RBO has a parameter which determines the strength of the weighting to top ranks. We extend RBO to handle tied ranks and rankings of different lengths. Finally, we give examples of the use of the measure in comparing the results produced by public search engines and in assessing retrieval systems in the laboratory.
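The abstract above does not give a formula, so the following is only a minimal sketch of the idea. It assumes the commonly cited form of rank-biased overlap, in which the agreement at each depth d (the size of the overlap of the two prefixes, divided by d) is weighted by p^(d-1) and the sum is scaled by (1 - p). The function name `rbo_truncated` and its arguments are hypothetical, the lists are assumed to contain no duplicates, and the handling of ties, uneven lengths, and the extrapolated score mentioned in the abstract is not attempted here.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9, depth=None):
    """Truncated rank-biased overlap sum (illustrative sketch, not the paper's code).

    Computes (1 - p) * sum over d = 1..k of p**(d - 1) * overlap(d) / d,
    where overlap(d) is the number of items shared by the two depth-d prefixes.
    Assumes each ranking has no duplicate items.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    k = min(len(ranking_a), len(ranking_b))
    if depth is not None:
        k = min(k, depth)

    seen_a, seen_b = set(), set()
    overlap = 0      # size of the intersection of the two prefixes so far
    total = 0.0
    for d in range(1, k + 1):
        x, y = ranking_a[d - 1], ranking_b[d - 1]
        if x == y:
            overlap += 1
        else:
            if x in seen_b:
                overlap += 1
            if y in seen_a:
                overlap += 1
        seen_a.add(x)
        seen_b.add(y)
        total += (p ** (d - 1)) * overlap / d
    return (1.0 - p) * total
```

Because every term of the sum is nonnegative, this truncated value can only stay the same or grow as the evaluation depth increases, which illustrates the nondecreasing behaviour the abstract attributes to the base score. Smaller values of p concentrate the weight on the top ranks.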
  • Item
    Using query logs to establish vocabularies in distributed information retrieval
    Shokouhi, M ; Zobel, J ; Tahaghoghi, S ; Scholer, F (ELSEVIER SCI LTD, 2007-01)
  • Item
    Robust Result Merging Using Sample-Based Score Estimates
    Shokouhi, M ; Zobel, J (ASSOC COMPUTING MACHINERY, 2009)
    In federated information retrieval, a query is routed to multiple collections and a single answer list is constructed by combining the results. Such metasearch provides a mechanism for locating documents on the hidden Web and, by use of sampling, can proceed even when the collections are uncooperative. However, the similarity scores for documents returned from different collections are not comparable, and, in uncooperative environments, document scores are unlikely to be reported. We introduce a new merging method for uncooperative environments, in which similarity scores for the sampled documents held for each collection are used to estimate global scores for the documents returned per query. This method requires no assumptions about properties such as the retrieval models used. Using experiments on a wide range of collections, we show that in many cases our merging methods are significantly more effective than previous techniques.
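The abstract above does not specify how global scores are estimated from the sampled documents, so the sketch below illustrates only one possible approach and should not be read as the authors' method. It assumes that, for each collection, a few sampled documents already have a broker-side score and an estimated rank within that collection. It then fits a least-squares line of score against the natural logarithm of rank, and uses that line to assign scores to the documents a collection returns. The names `fit_score_curve` and `merge_results`, and the choice of a log-rank fit, are hypothetical.

```python
import math


def fit_score_curve(sample_ranks, sample_scores):
    """Least-squares fit of score = a + b * ln(rank); returns (a, b)."""
    xs = [math.log(r) for r in sample_ranks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(sample_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct sample ranks")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, sample_scores))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def merge_results(results_by_collection, curves):
    """Merge per-collection result lists using estimated scores.

    results_by_collection: {collection name: [doc ids in returned order]}
    curves: {collection name: (a, b)} as produced by fit_score_curve
    """
    scored = []
    for name, docs in results_by_collection.items():
        a, b = curves[name]
        for rank, doc in enumerate(docs, start=1):
            scored.append((a + b * math.log(rank), doc))
    # Highest estimated score first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]
```

In this sketch the collections never report scores themselves; only the order of their results is used, which matches the uncooperative setting the abstract describes.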
  • Item
    B-tries for disk-based string management
    Askitis, N ; Zobel, J (SPRINGER, 2009-01)
  • Item
    Exploring criteria for successful query expansion in the genomic domain
    Stokes, N ; Li, Y ; Cavedon, L ; Zobel, J (SPRINGER, 2009-02)
  • Item
    Efficient online index maintenance for contiguous inverted lists
    Lester, N ; Zobel, J ; Williams, H (ELSEVIER SCI LTD, 2006-07)
  • Item
    Detection of video sequences using compact signatures
    Hoad, TC ; Zobel, J (ASSOC COMPUTING MACHINERY, 2006-01)
    Digital representations are widely used for audiovisual content, enabling the creation of large online repositories of video, allowing access such as video on demand. However, the ease of copying and distribution of digital video makes piracy a growing concern for content owners. We investigate methods for identifying coderivative video content, that is, video clips that are derived from the same original source. By using dynamic programming to identify regions of similarity in video signatures, it is possible to efficiently and accurately identify coderivatives, even when these regions constitute only a small section of the clip being searched. We propose four new methods for producing compact video signatures, based on the way in which the video changes over time. The intuition is that such properties are likely to be preserved even when the video is badly degraded. We demonstrate that these signatures are insensitive to dramatic changes in video bitrate and resolution, two parameters that are often altered when reencoding. In the presence of mild degradations, our methods can accurately identify copies of clips that are as short as 5 s within a dataset 140 min long. These methods are much faster than previously proposed techniques; using a more compact signature, this query can be completed in a few milliseconds.
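The abstract above mentions dynamic programming over video signatures but does not say which formulation or signature encoding is used. As a rough illustration only, the sketch below applies a generic local alignment (in the style of Smith-Waterman) to two sequences of integers standing in for per-frame signature values. The function name `best_local_alignment`, the integer signatures, the `tolerance` parameter, and the scoring constants are all assumptions made for this example.

```python
def best_local_alignment(query, target, tolerance=0, match=1, mismatch=-1, gap=-1):
    """Find the best-scoring region of similarity between two signatures.

    Returns (score, query_end, target_end), where the end positions are
    1-based indexes of the last aligned element in each sequence, or
    (0, 0, 0) if no two elements match.
    """
    previous = [0] * (len(target) + 1)
    best = (0, 0, 0)
    for i in range(1, len(query) + 1):
        current = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            # Two signature values count as a match if they are close enough.
            close = abs(query[i - 1] - target[j - 1]) <= tolerance
            step = match if close else mismatch
            current[j] = max(
                0,                       # start a new region here
                previous[j - 1] + step,  # extend the region diagonally
                previous[j] + gap,       # skip an element of the query
                current[j - 1] + gap,    # skip an element of the target
            )
            if current[j] > best[0]:
                best = (current[j], i, j)
        previous = current
    return best
```

For example, `best_local_alignment([5, 6, 7], [1, 2, 5, 6, 7, 9])` locates the short query inside the longer sequence, which mirrors the situation in the abstract where the matching region is only a small part of the material being searched. Only two rows of the table are kept at a time, so memory grows with the length of the target rather than with the product of the two lengths.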