Computing and Information Systems - Theses

  • Item
    Probabilistic models for aggregating crowdsourced annotations
    Li, Yuan (2019)
    This thesis explores aggregation methods for crowdsourced annotations. Crowdsourcing is a popular means of creating training and evaluation datasets for machine learning, e.g. for computer vision, natural language processing, and information retrieval, at low cost and in a timely manner. However, because individual workers may provide low-quality annotations, they cannot be wholly trusted, and consequently items are typically labelled redundantly by different workers, with the labels subsequently aggregated. Although many aggregation methods have been proposed to jointly learn non-uniform worker weights and infer the truth, the simplest aggregation method, majority voting (MV), which grants workers equal votes towards consensus, still predominates in practice. To explain the predominance of MV, we conduct an extensive evaluation on 19 datasets and identify two shortcomings that prevent existing methods from being applied in practice in place of simple MV. A key finding is that most methods don’t significantly outperform MV across all datasets: they may achieve higher mean accuracy than MV but are also outperformed by MV on several datasets. A secondary shortcoming is that several methods require slow and cumbersome inference, which doesn’t scale to the large datasets that are common in practice. To address these shortcomings, we propose two novel aggregation methods, both of which significantly outperform MV. The first is a Bayesian version of a weighted average model. It learns the unknown voting weights of workers in a principled way by estimating their posterior, unlike existing weighted average models that rely on heuristic update rules or on optimising handcrafted objectives. The second approach, complementary to the first, is another Bayesian model that captures the correlations between worker labels, which most existing models either ignore completely or assume not to exist. Learning the correlations also helps the method achieve the highest mean accuracy among all methods compared in our experiments.
    When applying aggregation methods in practice, it’s typically assumed that the only information available is the worker labels, but in many situations more information is available. For the setting where item content is available, e.g. as feature vectors, we propose a novel model for aggregating binary labels that uses a Boltzmann machine prior to bias similar instances towards sharing the same label. We also show further gains from integrating a proposed active learning heuristic. We also consider a second, related setting in which instances are sentences, the task is to annotate which words in each sentence denote a named entity, structured outputs from a few classifiers are given, and the goal is to ensemble those classifiers. We discuss a strategy for adapting crowdsourcing aggregation methods to this setting. Finally, we discuss the effect of a few gold labels on truth inference and approaches for leveraging gold labels effectively.
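
    For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of majority voting and of a generic weighted vote. It is not the thesis's Bayesian models: the function names, the (worker, item, label) triple format, and the fixed weights are assumptions made purely for illustration, whereas the thesis infers worker weights from the data.

```python
from collections import Counter, defaultdict


def majority_vote(annotations):
    """Aggregate (worker, item, label) triples by giving every worker one equal vote.

    Ties are broken deterministically in favour of the smallest label.
    """
    votes = defaultdict(Counter)
    for _worker, item, label in annotations:
        votes[item][label] += 1
    return {
        item: max(sorted(counts), key=counts.__getitem__)
        for item, counts in votes.items()
    }


def weighted_vote(annotations, weights):
    """Same aggregation, but each vote counts with a per-worker weight.

    `weights` is a hypothetical, hand-supplied mapping here; workers that are
    missing from it default to a weight of 1.0.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for worker, item, label in annotations:
        scores[item][label] += weights.get(worker, 1.0)
    return {
        item: max(sorted(totals), key=totals.__getitem__)
        for item, totals in scores.items()
    }


if __name__ == "__main__":
    # Toy data: three workers label two items with binary labels.
    toy = [
        ("w1", "a", 1), ("w2", "a", 0), ("w3", "a", 0),
        ("w1", "b", 1), ("w2", "b", 1), ("w3", "b", 0),
    ]
    print(majority_vote(toy))
    print(weighted_vote(toy, {"w1": 3.0}))
```

    With equal votes, item "a" receives the label chosen by two of its three workers. Giving worker "w1" a larger weight can overturn that consensus, which is the kind of effect that learned, non-uniform weights are meant to capture.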