Task assignment using worker cognitive ability and context to improve data quality in crowdsourcing
Hettiachchi Mudiyanselage, Danula Eranjith (2021)

While crowd work on crowdsourcing platforms is becoming prevalent, there is no widely accepted method for matching workers to different types of tasks. Previous work has considered worker demographics, behavioural traces, and prior task completion records to optimise task assignment. However, optimal task assignment remains a challenging research problem, because proposed approaches lack an awareness of workers' cognitive abilities and context. This thesis investigates how to use two key constructs for effective task assignment: workers' cognitive ability and an understanding of workers' context. Specifically, the thesis presents 'CrowdCog', a dynamic online system for task assignment and recommendation that uses fast-paced online cognitive tests to estimate worker performance across a variety of tasks. The proposed task assignment method achieves significant data quality improvements over a baseline in which workers select their preferred tasks. Next, the thesis investigates how worker context influences task acceptance, and presents 'CrowdTasker', a voice-based crowdsourcing platform that offers crowd workers an alternative form factor and modality. Our findings inform the design of crowdsourcing platforms that facilitate effective task assignment and recommendation, benefiting both workers and task requesters.
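The core idea of performance-based task assignment described above can be sketched in a few lines: route each worker to the task type where their predicted accuracy is highest. This is a minimal illustrative sketch, not CrowdCog itself; the worker names, task types, and accuracy scores are all hypothetical (in the thesis, such scores would come from fast-paced online cognitive tests).

```python
def assign_tasks(predicted_accuracy):
    """Assign each worker the task type with the highest predicted accuracy.

    predicted_accuracy maps worker id -> {task type -> predicted accuracy}.
    Scores here are hypothetical placeholders for estimates derived from
    cognitive test performance.
    """
    return {worker: max(scores, key=scores.get)
            for worker, scores in predicted_accuracy.items()}


# Hypothetical predicted accuracies for two workers on two task types.
scores = {
    "worker_a": {"sentiment": 0.9, "transcription": 0.6},
    "worker_b": {"sentiment": 0.5, "transcription": 0.8},
}
print(assign_tasks(scores))
# {'worker_a': 'sentiment', 'worker_b': 'transcription'}
```

A real system would also balance task supply against worker demand rather than greedily assigning every worker their single best task type.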
Probabilistic models for aggregating crowdsourced annotations
Li, Yuan (2019)

This thesis explores aggregation methods for crowdsourced annotations. Crowdsourcing is a popular means of creating training and evaluation datasets for machine learning (e.g. in computer vision, natural language processing, and information retrieval) at low cost and in a timely manner. However, individual workers cannot be wholly trusted to provide reliable annotations, so items are typically labelled redundantly by several workers and the labels are then aggregated. Although many aggregation methods have been proposed that jointly learn non-uniform worker weights and infer the truth, the simplest aggregation method, majority voting (MV), which grants workers equal votes towards consensus, still predominates in practice. To explain the predominance of MV, we conduct extensive evaluation experiments on 19 datasets and identify two shortcomings that prevent existing methods from displacing MV in practice. The key finding is that most methods do not significantly outperform MV across all datasets: they may achieve higher mean accuracy than MV, but they are also outperformed by MV on several datasets. A secondary shortcoming is that several methods require slow and cumbersome inference, which does not scale to the large datasets common in practice. To address these shortcomings, we propose two novel aggregation methods, both of which significantly outperform MV. The first is a Bayesian version of a weighted average model. It learns workers' unknown voting weights in a principled way by estimating their posterior, unlike existing weighted average models that rely on heuristic update rules or optimise handcrafted objectives.
The second approach, complementary to the first, is another Bayesian model that captures the correlations between worker labels, which most existing models ignore or assume do not exist. Learning these correlations also helps the method achieve the highest mean accuracy among all methods compared in our experiments. When applying aggregation methods in practice, it is typically assumed that worker labels are the only available information, but in many situations more is available. For the setting where item content is available, e.g. as feature vectors, we propose a novel model for aggregating binary labels that uses a Boltzmann-machine prior to bias similar items towards sharing the same label, and we show further gains by integrating a proposed active learning heuristic. We also consider a second, related setting in which instances are sentences, the task is to annotate which words in a sentence denote a named entity, structured outputs from a few classifiers are given, and the goal is to ensemble those classifiers. We discuss a strategy for adapting crowdsourcing aggregation methods to this setting, as well as the effect of a few gold labels on truth inference and approaches for leveraging gold labels effectively.
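The majority voting baseline that the abstract above repeatedly compares against is simple enough to sketch directly: each item receives redundant labels from several workers, every vote counts equally, and the most common label wins. This is a minimal sketch of MV only, with made-up item names and labels; it does not implement the thesis's Bayesian models.

```python
from collections import Counter

def majority_vote(labels_by_item):
    """Aggregate redundant crowd labels by majority voting (MV):
    every worker's vote counts equally and the most common label wins.

    labels_by_item maps item id -> list of labels from different workers.
    """
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in labels_by_item.items()}


# Hypothetical redundant labels from three workers per item.
labels = {
    "item_1": ["cat", "cat", "dog"],
    "item_2": ["dog", "dog", "dog"],
    "item_3": ["cat", "dog", "cat"],
}
print(majority_vote(labels))
# {'item_1': 'cat', 'item_2': 'dog', 'item_3': 'cat'}
```

Weighted-average models generalise this by replacing the equal votes with per-worker weights; the thesis's contribution is learning those weights as a posterior rather than by heuristic updates.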
Crowdsourcing lexical semantic judgements from bilingual dictionary users
Fothergill, Richard James (2017)

Words can take on many meanings, and collecting example usages representative of the full variety of those meanings is a bottleneck in the statistical study of lexical semantics. To perform supervised word sense disambiguation (WSD), or to evaluate knowledge-based methods, a corpus of texts annotated with senses from a dictionary may be constructed by paid experts; however, the cost usually limits the corpus to a small sample of words and senses. Crowdsourcing promises to acquire data more cheaply, albeit with greater quality-control challenges. Most crowdsourcing to date has incentivised participation through payment or by gamifying the resource construction task. However, with paid crowdsourcing the cost of human labour scales linearly with the output size, and while game-playing volunteers may be free, gamification studies must compete with a multi-billion-dollar games industry for players. In this thesis we develop and evaluate resources for computational semantics, working towards a crowdsourcing method that extracts information from naturally occurring human activities. A number of software products gloss Japanese text with dictionary entries for English-speaking students, but the most popular ones tend either to present an overwhelming amount of information, covering every sense of every word, or to hide too much information and risk removing senses particularly relevant to a specific text. By offering a glossing application with interactive features for exploring word senses, we create an opportunity to crowdsource human judgements about word senses and to record human interaction with semantic NLP.