Chancellery Research - Research Publications

Search Results

  • Item
    Efficient plagiarism detection for large code repositories
    Burrows, S ; Tahaghoghi, SMM ; Zobel, J (WILEY, 2007-02)
  • Item
    Federating distributed clinical data for the prediction of adverse hypotensive events
    Stell, A ; Sinnott, R ; Jiang, J ; Donald, R ; Chambers, I ; Citerio, G ; Enblad, P ; Gregson, B ; Howells, T ; Kiening, K ; Nilsson, P ; Ragauskas, A ; Sahuquillo, J ; Piper, I (Royal Society, The, 2009)
    The ability to predict adverse hypotensive events, where a patient's arterial blood pressure drops to abnormally low (and dangerous) levels, would be of major benefit to the fields of primary and secondary health care, and especially to the traumatic brain injury domain. A wealth of data exist in health care systems providing information on the major health indicators of patients in hospitals (blood pressure, temperature, heart rate, etc.). It is believed that if enough of these data could be drawn together and analysed in a systematic way, then a system could be built that will trigger an alarm predicting the onset of a hypotensive event over a useful time scale, e.g. half an hour in advance. In such circumstances, avoidance measures can be taken to prevent such events arising. This is the basis for the Avert-IT project (http://www.avert-it.org), a collaborative EU-funded project involving the construction of a hypotension alarm system exploiting Bayesian neural networks using techniques of data federation to bring together the relevant information for study and system development.
  • Item
    Using query logs to establish vocabularies in distributed information retrieval
    Shokouhi, M ; Zobel, J ; Tahaghoghi, S ; Scholer, F (ELSEVIER SCI LTD, 2007-01)
  • Item
    Robust Result Merging Using Sample-Based Score Estimates
    Shokouhi, M ; Zobel, J (ASSOC COMPUTING MACHINERY, 2009)
    In federated information retrieval, a query is routed to multiple collections and a single answer list is constructed by combining the results. Such metasearch provides a mechanism for locating documents on the hidden Web and, by use of sampling, can proceed even when the collections are uncooperative. However, the similarity scores for documents returned from different collections are not comparable, and, in uncooperative environments, document scores are unlikely to be reported. We introduce a new merging method for uncooperative environments, in which similarity scores for the sampled documents held for each collection are used to estimate global scores for the documents returned per query. This method requires no assumptions about properties such as the retrieval models used. Using experiments on a wide range of collections, we show that in many cases our merging methods are significantly more effective than previous techniques.
  • Item
    B-tries for disk-based string management
    Askitis, N ; Zobel, J (SPRINGER, 2009-01)
  • Item
    Efficient online index maintenance for contiguous inverted lists
    Lester, N ; Zobel, J ; Williams, H (ELSEVIER SCI LTD, 2006-07)
  • Item
    Detection of video sequences using compact signatures
    Hoad, TC ; Zobel, J (ASSOC COMPUTING MACHINERY, 2006-01)
    Digital representations are widely used for audiovisual content, enabling the creation of large online repositories of video, allowing access such as video on demand. However, the ease of copying and distribution of digital video makes piracy a growing concern for content owners. We investigate methods for identifying coderivative video content, that is, video clips that are derived from the same original source. By using dynamic programming to identify regions of similarity in video signatures, it is possible to efficiently and accurately identify coderivatives, even when these regions constitute only a small section of the clip being searched. We propose four new methods for producing compact video signatures, based on the way in which the video changes over time. The intuition is that such properties are likely to be preserved even when the video is badly degraded. We demonstrate that these signatures are insensitive to dramatic changes in video bitrate and resolution, two parameters that are often altered when reencoding. In the presence of mild degradations, our methods can accurately identify copies of clips that are as short as 5 s within a dataset 140 min long. These methods are much faster than previously proposed techniques; using a more compact signature, this query can be completed in a few milliseconds.
  • Item
    Efficient query expansion with auxiliary data structures
    Billerbeck, B ; Zobel, J (PERGAMON-ELSEVIER SCIENCE LTD, 2006-11)
  • Item
    Sample sizes for query probing in uncooperative distributed information retrieval
    Shokouhi, M ; Scholer, F ; Zobel, J ; Zhou, XF ; Li, J ; Shen, HT ; Kitsuregawa, M ; Zhang, Y (SPRINGER-VERLAG BERLIN, 2006)
  • Item
    Does topic metadata help with Web search?
    Hawking, D ; Zobel, J (WILEY, 2007-03)
    It has been claimed that topic metadata can be used to improve the accuracy of text searches. Here, we test this claim by examining the contribution of metadata to effective searching within Web sites published by a university with a strong commitment to and substantial investment in metadata. The authors use four sets of queries, a total of 463, extracted from the university's official query logs and from the university's site map. The results are clear: The available metadata is of little value in ranking answers to those queries. A follow‐up experiment with the Web sites published in a particular government jurisdiction confirms that this conclusion is not specific to the particular university. Examination of the metadata present at the university reveals that, in addition to implementation deficiencies, there are inherent problems in trying to use subject and description metadata to enhance the searchability of Web sites. Our experiments show that link anchor text, which can be regarded as metadata created by others, is much more effective in identifying best answers to queries than other textual evidence. Furthermore, query‐independent evidence such as link counts and uniform resource locator (URL) length, unlike subject and description metadata, can substantially improve baseline performance.