Minerva Elements Records

Permanent URI for this collection

http://hdl.handle.net/11343/251382

Search Results

Now showing 1 - 10 of 32

Lossy Compression Options for Dense Index Retention

Mackenzie, J ; Moffat, A (Association for Computing Machinery, 2023)
Immediate-Access Indexing Using Space-Efficient Extensible Arrays

Moffat, A ; Mackenzie, J (ACM, 2022-12-15)
Users: Can't Work With Them, Can't Work Without Them?

Moffat, A (ACM, 2022-07-06)
A Flexible Framework for Offline Effectiveness Metrics

Moffat, A ; MacKenzie, J ; Thomas, P ; Azzopardi, L (Association for Computing Machinery, 2022-07-06)

The use of offline effectiveness metrics is one of the cornerstones of evaluation in information retrieval. Static resources that include test collections and sets of topics, the corresponding relevance judgments connecting them, and metrics that map document rankings from a retrieval system to numeric scores have been used for multiple decades as an important way of comparing systems. The basis behind this experimental structure is that the metric score for a system can serve as a surrogate measurement for user satisfaction. Here we introduce a user behavior framework that extends the C/W/L family. The essence of the new framework - which we call C/W/L/A - is that the user actions that are undertaken while reading the ranking can be considered separately from the benefit that each user will have derived as they exit the ranking. This split structure allows the great majority of current effectiveness metrics to be systematically categorized, and thus their relative properties and relationships to be better understood; and at the same time permits a wide range of novel combinations to be considered. We then carry out experiments using relevance judgments, document rankings, and user satisfaction data from two distinct sources, comparing the patterns of metric scores generated, and showing that those metrics vary quite markedly in terms of their ability to predict user satisfaction.
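
A minimal Python sketch of the C/W/L weighting that C/W/L/A extends. It assumes the usual formulation, in which a continuation function C(i) gives the probability that a user moves from rank i to rank i + 1, the weight W(i) is the normalized probability of viewing rank i, and a metric scores a ranking as the expected rate of gain, the sum over ranks of W(i) times the gain at rank i. The function names, the 0..1 gain scale and the `horizon` parameter are illustrative assumptions. The separate action/benefit split that the abstract introduces is not modelled here.

```python
from typing import Callable, List, Sequence

Continuation = Callable[[int], float]


def cwl_weights(cont: Continuation, depth: int) -> List[float]:
    """W(1..depth): reach probabilities prod_{j<i} C(j), normalized to sum to 1."""
    reach: List[float] = []
    p = 1.0
    for i in range(1, depth + 1):
        reach.append(p)   # probability the user views rank i
        p *= cont(i)      # probability of continuing past rank i
    total = sum(reach)
    return [r / total for r in reach]


def expected_rate_of_gain(gains: Sequence[float], cont: Continuation,
                          horizon: int = 1000) -> float:
    """Sum of W(i) * gain(i); ranks past the supplied ranking contribute zero gain."""
    depth = max(horizon, len(gains))
    padded = list(gains) + [0.0] * (depth - len(gains))
    weights = cwl_weights(cont, depth)
    return sum(w * g for w, g in zip(weights, padded))


def rbp(p: float) -> Continuation:
    # Rank-biased precision: constant continuation probability p.
    return lambda i: p


def precision_at(k: int) -> Continuation:
    # Precision@k: always continue until rank k, then stop.
    return lambda i: 1.0 if i < k else 0.0


print(expected_rate_of_gain([1, 0, 1], rbp(0.5)))         # 0.625
print(expected_rate_of_gain([1, 0, 1], precision_at(2)))  # 0.5
```

With C(i) fixed at p the weights reduce to rank-biased precision, (1 - p)p^(i-1); with C(i) = 1 below rank k and 0 from rank k onward they reduce to precision@k. Padding with zero gain up to a horizon, rather than renormalizing over a short ranking, keeps the RBP weights correct for truncated runs.
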
ERR is not C/W/L: Exploring the Relationship between Expected Reciprocal Rank and Other Metrics

Azzopardi, L ; MacKenzie, J ; Moffat, A (ACM, 2021-07-11)

We explore the relationship between expected reciprocal rank (ERR) and the metrics that are available under the C/W/L framework. On the surface, it appears that the user browsing model associated with ERR can be directly injected into a C/W/L arrangement, to produce system measurements equivalent to those generated from ERR. That assumption is now known to be invalid, and demonstration of the impossibility of ERR being described via C/W/L choices forms the first part of our work. Given that ERR cannot be accommodated within the C/W/L framework, we then explore the extent to which practical use of ERR correlates with metrics that do fit within the C/W/L user browsing model. In this part of the investigation we present a range of shallow-evaluation C/W/L variants that have very high correlation with ERR when compared in experiments involving a large number of TREC runs. That is, while ERR itself is not a C/W/L metric, there are other weighted-precision computations that fit with the user model assumed by C/W/L, and yield system comparisons almost indistinguishable from those generated via the use of ERR.
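
A short Python sketch of ERR in its commonly cited cascade form. Each relevance grade g becomes a stopping probability R = (2^g - 1) / 2^max_grade, and the contribution at rank r is discounted by 1/r and by the chance that the user has not already stopped higher up. The grade mapping and the `max_grade` parameter come from the usual definition of ERR, not from the abstract, and are assumptions here. Note that `p_reach` is updated from the grades already seen, so how far the modelled user reads depends on the documents above each rank.

```python
from typing import Sequence


def err(grades: Sequence[int], max_grade: int) -> float:
    """Expected reciprocal rank under a cascade user model."""
    score = 0.0
    p_reach = 1.0  # probability the user is still scanning at rank r
    for r, g in enumerate(grades, start=1):
        stop = (2 ** g - 1) / 2 ** max_grade  # chance this document satisfies the user
        score += p_reach * stop / r           # satisfied here, utility 1/r
        p_reach *= 1.0 - stop                 # otherwise continue to rank r + 1
    return score


print(err([1, 0, 1], max_grade=1))  # 0.5 + 0.25 / 3, about 0.583
```
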
CC-News-En: A Large English News Corpus

Mackenzie, J ; Benham, R ; Petri, M ; Trippas, JR ; Culpepper, JS ; Moffat, A (ACM, 2020-10-19)

We describe a static, open-access news corpus using data from the Common Crawl Foundation, who provide free, publicly available web archives, including a continuous crawl of international news articles published in multiple languages. Our derived corpus, CC-News-En, contains 44 million English documents collected between September 2016 and March 2018. The collection is comparable in size with the number of documents typically found in a single shard of a large-scale, distributed search engine, and is four times larger than the news collections previously used in offline information retrieval experiments. To complement the corpus, 173 topics were curated using titles from Reddit threads, forming a temporally representative sampling of relevant news topics over the 583-day collection window. Information needs were then generated using automatic summarization tools to produce textual and audio representations, and used to elicit query variations from crowdworkers, with a total of 10,437 queries collected against the 173 topics. Of these, 10,089 include key-stroke level instrumentation that captures the timings of character insertions and deletions made by the workers while typing their queries. These new resources support a wide variety of experiments, including large-scale efficiency exercises and query auto-completion synthesis, with scope for future addition of relevance judgments to support offline effectiveness experiments and hence batch evaluation campaigns.
Examining the Additivity of Top-k Query Processing Innovations

MacKenzie, J ; Moffat, A (ACM, 2020-10-19)

Research activity spanning more than five decades has led to index organizations, compression schemes, and traversal algorithms that allow extremely rapid response to ranked queries against very large text collections. However, little attention has been paid to the interactions between these many components, and the additivity of algorithmic improvements has not been explored. Here we examine the extent to which efficiency improvements add up. We employ four query processing algorithms, four compression codecs, and all possible combinations of four distinct further optimizations, and compare the performance of the 256 resulting systems to determine when and how different optimizations interact. Our results over two test collections show that efficiency enhancements are, for the most part, additive, and that there is little risk of negative interactions. In addition, our detailed profiling across this large pool of systems leads to key insights as to why the various individual enhancements work well, and indicates that optimizing "simpler" implementations can result in higher query throughput than is available from non-optimized versions of the more "complex" techniques, with clear implications for the choices needing to be made by practitioners.
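
The abstract does not name the four query processing algorithms, four codecs, or four further optimizations (the 256 systems follow from 4 x 4 x 2^4 on/off combinations). The Python sketch below is therefore only a generic, exhaustive document-at-a-time (DAAT) top-k loop over a hypothetical toy index of (doc_id, score) postings, not any of the paper's configurations. The size-k min-heap is the common core of top-k processing; dynamic-pruning methods such as WAND and MaxScore use its smallest score as a threshold for skipping documents, which this baseline does not do.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical toy index: term -> (doc_id, score contribution) pairs sorted by doc_id.
Postings = List[Tuple[int, float]]


def top_k_daat(index: Dict[str, Postings], query: List[str], k: int) -> List[Tuple[float, int]]:
    """Exhaustive disjunctive DAAT retrieval returning the k best (score, doc_id) pairs."""
    lists = [index[t] for t in query if t in index]
    cursors = [0] * len(lists)
    heap: List[Tuple[float, int]] = []  # min-heap; heap[0][0] is the entry threshold
    while True:
        # The next candidate is the smallest doc_id under any unexhausted cursor.
        current = [lst[c][0] for lst, c in zip(lists, cursors) if c < len(lst)]
        if not current:
            break
        doc = min(current)
        score = 0.0
        for i, lst in enumerate(lists):
            c = cursors[i]
            if c < len(lst) and lst[c][0] == doc:
                score += lst[c][1]  # sum the per-term contributions for this document
                cursors[i] += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, doc))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc))  # evict the current k-th best
    return sorted(heap, reverse=True)


toy_index = {"a": [(1, 1.0), (3, 2.0), (5, 0.5)], "b": [(3, 1.5), (4, 2.5)]}
print(top_k_daat(toy_index, ["a", "b"], k=2))  # [(3.5, 3), (2.5, 4)]
```
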
Different Keystrokes for Different Folks: Visualizing Crowdworker Querying Behavior

Benham, R ; MacKenzie, J ; Culpepper, JS ; Moffat, A (ACM, 2021-03-14)

Search engine users retrieve relevant information for an information need using keyword queries. Different users may have similar information needs, but use different query terms. The resulting user query variations can provide a wealth of useful information to IR researchers. Most recently, the keystroke-level telemetry data gathered as part of the CC-News-En collection provides important insights into how users create queries for a search task, at a level of detail not possible using a normal query log. In this demo, we present an interactive tool that enables practitioners to visualize users formulating queries. Our new tool is a temporal simulation of the typing behavior of crowdworkers, grouped by information need. It provides the ability to directly compare the cognitive behavior of multiple users simultaneously, and observe how query keyword selection and ordering happen before a final query is submitted to a search engine. To demonstrate the benefit of our tool, we include a qualitative study of four different user behavior patterns which were observed in the CC-News-En collection.
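
The abstract does not describe the format of the CC-News-En keystroke log, so this Python sketch assumes a hypothetical record of (time, operation, cursor position, character). It replays the insertions and deletions in time order to recover each intermediate query state, which is the kind of sequence a temporal typing simulation would animate.

```python
from typing import List, NamedTuple, Tuple


class Keystroke(NamedTuple):
    # Hypothetical record layout; the real CC-News-En schema may differ.
    time_ms: int     # milliseconds since the worker started typing
    op: str          # "ins" or "del"
    pos: int         # 0-based cursor position the edit applies to
    char: str = ""   # inserted character (empty for deletions)


def replay(events: List[Keystroke]) -> List[Tuple[int, str]]:
    """Return (time_ms, partial_query) after each keystroke, in time order."""
    buf: List[str] = []
    states: List[Tuple[int, str]] = []
    for ev in sorted(events, key=lambda e: e.time_ms):
        if ev.op == "ins":
            buf.insert(ev.pos, ev.char)
        elif ev.op == "del":
            del buf[ev.pos]
        else:
            raise ValueError(f"unknown op {ev.op!r}")
        states.append((ev.time_ms, "".join(buf)))
    return states


events = [Keystroke(0, "ins", 0, "c"), Keystroke(120, "ins", 1, "a"),
          Keystroke(250, "ins", 2, "t"), Keystroke(400, "del", 2),
          Keystroke(520, "ins", 2, "r")]
print(replay(events)[-1])  # (520, 'car')
```
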
Entropia: A family of entropy-based conformance checking measures for process mining

Polyvyanyy, A ; Alkhammash, H ; Di Ciccio, C ; García-Bañuelos, L ; Kalenkova, A ; Leemans, SJJ ; Mendling, J ; Moffat, A ; Weidlich, M (CEUR Workshop Proceedings, 2020-01-01)

This paper presents a command-line tool, called Entropia, that implements a family of conformance checking measures for process mining founded on the notion of entropy from information theory. The measures allow quantifying classical non-deterministic and stochastic precision and recall quality criteria for process models automatically discovered from traces executed by IT systems and recorded in their event logs. A process model has "good" precision with respect to the log it was discovered from if it does not encode many traces that are not part of the log, and has "good" recall if it encodes most of the traces from the log. By definition, the measures possess useful properties and can often be computed quickly.
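
The entropy-based measures themselves are not defined in the abstract, so this Python sketch illustrates only the precision and recall intuition it states, using plain set overlap between a log and a model given as finite sets of traces. It is not Entropia's method: simple counting like this only works when both trace sets are finite, and the trace sets below are made-up examples.

```python
from typing import Set, Tuple

Trace = Tuple[str, ...]


def naive_precision_recall(log: Set[Trace], model: Set[Trace]) -> Tuple[float, float]:
    """Set-based precision and recall for finite trace sets.

    precision: share of the model's traces that also appear in the log.
    recall:    share of the log's traces that the model also encodes.
    """
    common = log & model
    precision = len(common) / len(model) if model else 0.0
    recall = len(common) / len(log) if log else 0.0
    return precision, recall


log_traces = {("a", "b"), ("a", "c")}
model_traces = {("a", "b"), ("a", "b", "b"), ("a", "d")}
print(naive_precision_recall(log_traces, model_traces))  # (0.3333333333333333, 0.5)
```
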
An entropic relevance measure for stochastic conformance checking in process mining

Polyvyanyy, A ; Moffat, A ; García-Bañuelos, L ; van Dongen, B ; Montali, M ; Wynn, MT (IEEE, 2020-10-01)

Given an event log as a collection of recorded real-world process traces, process mining aims to automatically construct a process model that is both simple and provides a useful explanation of the traces. Conformance checking techniques are then employed to characterize and quantify commonalities and discrepancies between the log's traces and the candidate models. Recent approaches to conformance checking acknowledge that the elements being compared are inherently stochastic (for example, some traces occur frequently and others infrequently) and seek to incorporate this knowledge in their analyses. Here we present an entropic relevance measure for stochastic conformance checking, computed as the average number of bits required to compress each of the log's traces, based on the structure and information about relative likelihoods provided by the model. The measure penalizes traces from the event log not captured by the model and traces described by the model but absent in the event log, thus addressing both precision and recall quality criteria at the same time. We further show that entropic relevance is computable in time linear in the size of the log, and provide evaluation outcomes that demonstrate the feasibility of using the new approach in industrial settings.
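
A simplified Python sketch of the computation described above, under stated assumptions rather than the paper's exact definition. The model is given as a hypothetical dictionary from traces to probabilities; a log trace the model generates with probability p costs -log2 p bits; any other trace falls back to a uniform background code of log2(|A| + 1) bits per symbol over the activity alphabet A plus an end marker; and a binary-entropy selector term H0(rho), where rho is the fraction of log traces the model covers, pays for recording which code each trace uses. The main loop is a single pass over the distinct log traces, in line with the linear-time claim.

```python
import math
from typing import Dict, Tuple

Trace = Tuple[str, ...]


def binary_entropy(p: float) -> float:
    # H0(p) in bits; taken to be 0 when p is 0 or 1.
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def entropic_relevance(log: Dict[Trace, int], model: Dict[Trace, float]) -> float:
    """Average bits per log trace under a simplified two-part code.

    log:   hypothetical map from each distinct trace to its frequency.
    model: hypothetical map from trace to the probability the model assigns it.
    """
    total = sum(log.values())
    if total == 0:
        raise ValueError("event log is empty")
    alphabet = {a for t in log for a in t} | {a for t in model for a in t}
    background = math.log2(len(alphabet) + 1)  # bits per symbol, +1 for end marker
    fitting = 0
    bits = 0.0
    for trace, freq in log.items():  # one pass over the distinct log traces
        p = model.get(trace, 0.0)
        if p > 0.0:
            fitting += freq
            bits += freq * -math.log2(p)                  # coded with the model
        else:
            bits += freq * (len(trace) + 1) * background  # coded without it
    rho = fitting / total  # share of log traces the model can generate
    return binary_entropy(rho) + bits / total


log = {("a", "b"): 2, ("a", "c"): 1, ("x",): 1}
model = {("a", "b"): 0.5, ("a", "c"): 0.25}
print(round(entropic_relevance(log, model), 3))  # 2.972
```

In this toy example the unmatched trace ("x",) pays the background cost, which is how log behavior missing from the model is penalized. Probability mass the model spends on traces absent from the log lowers p for the traces that do occur and so raises their cost, which is how the precision side enters.
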
