Computing and Information Systems - Theses

Now showing 1 - 10 of 446
  • Item
    Lexical Semantics of the Long Tail
    Wada, Takashi ( 2023-12)
    Natural language data is characterised by containing a variety of long-tail instances. For instance, whilst there exists an abundance of text data on the web for major languages such as English, there is a dearth of data for a great number of minor languages. Furthermore, when we look at the corpus data in each language, it usually consists of a very small number of high-frequency words and a plethora of long-tail expressions that are not commonly used in text, such as scientific jargon and multiword expressions. Generally, those long-tail instances draw little attention from the research community, largely because they often have a biased interest in a handful of resource-rich languages and models' overall performance on a specific task, which is, in many cases, not heavily influenced by the long-tail instances in text. In this thesis, we aim to shed light on the long-tail instances in language and explore NLP models that represent their lexical semantics effectively. In particular, we focus on the three types of long-tail instances, namely, extremely low-resource languages, rare words, and multiword expressions. Firstly, for extremely low-resource languages, we propose a new cross-lingual word embedding model that works well with very limited data, and show its effectiveness on the task of aligning semantically equivalent words between high- and low-resource languages. For evaluation, we conduct experiments that involve three endangered languages, namely Yongning Na, Shipibo-Konibo and Griko, and demonstrate that our model performs well on real-world language data. Secondly, with regard to rare words, we first investigate how well recent embedding models can capture lexical semantics in general on lexical substitution, where given a target word in context, a model is tasked with retrieving its synonymous words. 
To this end, we propose a new lexical substitution method that effectively makes use of existing embedding models, and show that it performs very well on English and Italian, especially for retrieving low-frequency substitutes. We also reveal two limitations of current embedding models: (1) they are highly affected by morphophonetic and morphosyntactic biases, such as article–noun agreement in English and Italian; and (2) they often represent rare words poorly when they are segmented into multiple subwords. To address the second limitation, we propose a new method that performs very well in predicting synonyms of rare words, and demonstrate its effectiveness on lexical substitution and simplification. Lastly, to represent multiword expressions (MWEs) effectively, we propose a new method that paraphrases MWEs with more literal expressions that are easier to understand, e.g. swan song with final performance. Compared to previous approaches that resort to human-crafted resources such as dictionaries, our model is fully unsupervised and relies on monolingual data only, making it applicable to resource-poor languages. For evaluation, we perform experiments in two high-resource languages (English and Portuguese) and one low-resource language (Galician), and demonstrate that our model generates high-quality paraphrases of MWEs in all three languages and helps pre-trained sentence embedding models encode sentences that contain MWEs by paraphrasing them with literal expressions.
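The embedding-based substitute retrieval described above can be illustrated with a minimal sketch: score candidate substitutes by cosine similarity to the target word's vector and rank them. The hand-crafted 4-dimensional vectors below are illustrative stand-ins for real (contextualised) embeddings, not the thesis's models.

```python
import numpy as np

# Toy lexical substitution: rank candidate substitutes for a target word by
# cosine similarity in a shared vector space. Vectors are hand-crafted
# stand-ins for real embeddings; words and values are illustrative only.
embeddings = {
    "happy":  np.array([0.9, 0.1, 0.0, 0.2]),
    "glad":   np.array([0.85, 0.15, 0.05, 0.25]),
    "joyful": np.array([0.8, 0.2, 0.1, 0.3]),
    "sad":    np.array([-0.9, 0.1, 0.0, 0.2]),
    "table":  np.array([0.0, 0.9, 0.8, -0.1]),
}

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_substitutes(target, candidates):
    """Return candidates sorted by decreasing similarity to the target word."""
    t = embeddings[target]
    scored = [(c, cosine(t, embeddings[c])) for c in candidates]
    return sorted(scored, key=lambda x: x[1], reverse=True)

ranking = rank_substitutes("happy", ["glad", "sad", "table", "joyful"])
print([w for w, _ in ranking])
```

In a real system the vectors would come from a contextualised model, so the ranking for the same target word would change with the sentence it appears in.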
  • Item
    Computational modeling of the epidemiological dynamics of the skin pathogens Group A Streptococcus and Sarcoptes scabiei
    Tellioglu, Nefel ( 2023-11)
Sarcoptes scabiei is a skin pathogen that causes substantial health burdens in humans. An estimated 455 million people are affected by scabies annually, resulting in an estimated 3.8 million disability-adjusted life years. Scratching from scabies can result in further bacterial skin infections, including Group A Streptococcus (GAS) infections, which increase the burden of scabies. GAS infections can lead to severe health conditions such as acute rheumatic fever and rheumatic heart disease. Each year, around 18 million people worldwide suffer from severe GAS-related diseases, resulting in 500,000 deaths. Sarcoptes scabiei and Group A Streptococcus are endemic in many underprivileged populations such as Indigenous communities in Australia. A number of factors are likely to play a role in the high burden of skin pathogens in these settings, including heterogeneities in the pathogen population (pathogens having multiple strains with varying characteristics) and the host population (populations with varying disease prevalence and transmission rates). While these factors make it difficult to manage disease burden, computational models can help us to understand transmission mechanisms as well as control health burden. In this thesis, I focus on Sarcoptes scabiei and GAS and aim to understand the underlying transmission mechanisms of these skin pathogens and to provide insights into the efficacy of community-specific control strategies to reduce the disease burden using computational modelling. I focus on three key research questions in which I investigate the impact of pathogen and host heterogeneities on disease transmission and identify effective control strategies using computational models. Controlling the spread of pathogens with multiple strains can be challenging due to strain interactions. It is uncertain what role strain interactions play in the persistence of high strain diversity in endemic settings and what this implies for future interventions.
As my first research question, I focused on “What role do within-host dynamics play in maintaining high diversity of pathogen strains?”. I developed an individual-based model with a synthetic population representing the characteristics of Indigenous communities in Australia. I discovered that within-host competition among strains can impact epidemiological dynamics. My findings revealed that within-host and between-host competition each reduce strain diversity when operating independently; however, when they operate together, they can significantly increase the diversity of strains. My model suggested that an intervention that reduces the transmission of all strains has the potential to later increase the level of pathogen diversity, complicating the efficacy of further interventions. In addition, I discussed how this modelling framework can be adapted to investigate the impact of GAS strain interactions on population-level dynamics. To apply mass drug administrations in the areas that need them the most, it is essential to estimate the prevalence of scabies at the community level. Currently, there is no standardisation of approaches to estimating scabies prevalence. Given that prevalence and transmission mechanisms differ among communities, there is a need to thoroughly understand how sampling procedures aimed at assessing scabies prevalence interact with the underlying epidemiology. As my second research question, I focused on “Which sampling strategies - individual, household, or school-based - are most effective for estimating the prevalence of scabies in a population?”. I developed another computational model and explored the effectiveness of sampling methods for estimating the prevalence of scabies in remote Indigenous communities in Australia. I found that when there is an underlying household-specific heterogeneity in scabies prevalence, the household sampling approach introduces more variance than simple random sampling.
I concluded that while the simple random sampling approach seems to be more effective than other sampling methods in estimating scabies prevalence, the efficacy of surveillance strategies depends on how prevalence is distributed within the community. In addition, I built a table for use in future surveillance studies to estimate the required sampling percentage based on population size, an accuracy threshold, and a priori knowledge of scabies prevalence. To reduce the scabies burden in communities with high endemic levels of scabies, three to five annual mass drug administration (MDA) rounds are recommended by the experts convened by the World Health Organization (WHO). Because current guidelines are based only on expert opinion, the WHO recommends quantitative evaluations to assess the likely efficacy of MDA recommendations. As my third research question, I focused on “Which mass drug administration strategy is most effective for controlling scabies?”. I developed an individual-based model to evaluate the efficacy of MDA strategies in decreasing the burden of scabies in Liberia. I found that while MDAs can be effective in the short and medium term, prevalence will rise over longer time periods until it reaches pre-MDA levels. The modelling results also indicated that low levels of scabies prevalence can be sustained long-term when MDAs are combined with behavioural and systemic changes, such as improvements in education and access to the health care system, that shorten the time to effective scabies treatment. In this thesis, I conclude that understanding the complex dynamics of skin pathogens remains a challenging problem because of the heterogeneities in host and pathogen populations. While this thesis provides practical results for controlling skin pathogens, it also highlights the need to develop pathogen-specific and community-specific models to reduce the burden of skin pathogens.
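The household-sampling finding can be illustrated with a toy simulation (all parameters are hypothetical, not taken from the thesis): when prevalence is strongly clustered by household, surveying whole households yields a noisier prevalence estimate than a simple random sample of the same number of individuals.

```python
import random

random.seed(0)

HOUSEHOLDS = 200
HH_SIZE = 5

# Household-specific heterogeneity: a household is either high-prevalence
# (80% infected) or essentially unaffected (5% infected). Illustrative numbers.
def simulate_population():
    pop = []
    for _ in range(HOUSEHOLDS):
        p = 0.8 if random.random() < 0.2 else 0.05
        pop.append([1 if random.random() < p else 0 for _ in range(HH_SIZE)])
    return pop

def srs_estimate(pop, n):
    """Simple random sample of n individuals."""
    people = [x for hh in pop for x in hh]
    sample = random.sample(people, n)
    return sum(sample) / n

def household_estimate(pop, k):
    """Sample k whole households and survey everyone in them."""
    hhs = random.sample(pop, k)
    people = [x for hh in hhs for x in hh]
    return sum(people) / len(people)

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

pop = simulate_population()
n = 50  # individuals per survey; 10 households of size 5 for the clustered design
srs = [srs_estimate(pop, n) for _ in range(2000)]
hh = [household_estimate(pop, n // HH_SIZE) for _ in range(2000)]
print(variance(hh) > variance(srs))  # clustering inflates the estimator's variance
```

This is the classic design effect of cluster sampling: when individuals within a household are strongly correlated, each extra person from an already-sampled household adds little independent information.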
  • Item
    Generalization Lessons from Biomedical Relation Extraction using Pretrained Transformer Models
    Elangovan, Aparna ( 2023-12)
Curating structured knowledge for biomedical knowledge databases requires human experts to annotate relationships, making these databases expensive to maintain and difficult to scale to the large quantities of information presented in scientific publications. It is challenging to ensure that the information is comprehensive and up-to-date. Hence, we investigate the generalization capabilities of state-of-the-art natural language processing (NLP) techniques to automate relation extraction to aid human curation. In NLP, deep learning-based architectures, in particular pretrained transformer models with millions of parameters, have been dominating leaderboards on public benchmark datasets, achieving state-of-the-art (SOTA) performance usually by fine-tuning the pretrained models on the target dataset task. In our research, we investigate the generalizability of such SOTA models – fine-tuned pretrained transformer models – in biomedical relation extraction for real-world applications, where the performance expectations of these models need to be applicable beyond the official test sets. While our experiments focus on the current SOTA models, our findings have broader implications for the generalization of NLP models and their performance evaluations. We ask the following research questions: 1. How generalizable are fine-tuned pretrained transformer models in biomedical relation extraction? 2. What factors lead to poor generalizability despite high test set performance of fine-tuned pretrained transformer models? 3. How can we improve qualitative aspects of the training data to improve real-world generalization performance of fine-tuned pretrained transformer models? The contributions are: 1) We identify a large performance gap compared to the test set when a SOTA fine-tuned pretrained transformer model is applied at large scale.
This substantial generalization gap has neither been verified nor reported in prior large-scale biomedical relation extraction studies. 2) We identify that high similarity between training and test sets, even with random splits, can result in inflated performance measurements. We suggest stratifying the test set based on similarity to the training set to provide a more effective interpretation of the results and to understand the memorization versus generalization capabilities of a model. Furthermore, we also find that fine-tuned pretrained transformer models appear to rely on spurious correlations that are present in both training and test sets, obtaining inflated test set performance. 3) We also find that, for a given quantity of training data, qualitative aspects can boost performance when fine-tuning pretrained transformers. More specifically, we find that incorporating training samples that are quite similar to one another but have different ground truth labels – we call them human-adversarials – in low to moderate proportions can boost generalization performance by up to 20 points for fine-tuned pretrained transformer models such as BERT, BioBERT and RoBERTa. On the other hand, training samples that are quite similar to one another and have the same ground truth labels – we call them human-affables – can potentially degrade generalization performance. We thus demonstrate that merely aiming for higher quantities of training data is not sufficient to improve generalization. 4) As a result of our findings 1 and 2, we propose to the NLP community that confirming linguistic capabilities as the cause of performance gains, even within the context of the test set, is crucial to generalization, adapting generalization principles from clinical studies.
We thus advocate for effective test sets and evaluation strategies, including adapting concepts such as randomized controlled trials from clinical studies to NLP to establish causation, as our experiments demonstrate how a test set constructed using the standard practice of random splits may not be sufficient to measure the generalization capabilities of a model. Overall, in this thesis, we closely examine model generalization and aim to strengthen how machine learning models are evaluated. While we do so in the context of biomedical relation extraction, where generalizability is critical, our findings are applicable to the evaluation of machine learning models across NLP.
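The proposed similarity-based stratification can be sketched as follows (random vectors stand in for learned sentence representations, and the 0.95 threshold is an arbitrary illustrative choice): score each test example by its maximum cosine similarity to the training set, then report performance separately for "seen-like" and "novel-like" strata.

```python
import numpy as np

# Sketch of stratifying a test set by similarity to the training set,
# to separate memorization from generalization in evaluation.
rng = np.random.default_rng(42)
train = rng.normal(size=(100, 16))   # stand-ins for training representations
test = rng.normal(size=(20, 16))     # stand-ins for test representations
# Make half the test set near-duplicates of training examples.
test[:10] = train[:10] + 0.01 * rng.normal(size=(10, 16))

def max_train_similarity(x, train):
    """Maximum cosine similarity between x and any training example."""
    sims = train @ x / (np.linalg.norm(train, axis=1) * np.linalg.norm(x))
    return float(sims.max())

scores = np.array([max_train_similarity(x, train) for x in test])
seen_like = scores >= 0.95            # high-similarity ("seen-like") stratum
print(int(seen_like.sum()))
```

Accuracy on the low-similarity stratum is then a better proxy for generalization than the aggregate test-set score.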
  • Item
    Autonomous Resource Management for Serverless Computing
    Mampage, Anupama Ruwanthika ( 2023-11)
Serverless computing is gaining momentum as the latest cloud deployment model for applications, with many major global companies shifting towards complete adoption of this new computing paradigm. As the name implies, the burden of server management is non-existent for the end user under this model, with the cloud vendor taking full responsibility for infrastructure management. Users simply deploy the logic of their application in the form of code segments called 'functions', along with a rough estimate of the resource requirements, and the rest is taken care of by the serverless platform. This greatly reduces the time-to-market and upfront costs for client products, with no expertise required in initial server configurations and subsequent operations. The rapid auto-scalability feature allows customers to scale their businesses fast, without the need for any prior infrastructure requirement planning. The pay-per-use billing model enables a fair ground for applications with spontaneous fluctuations in traffic loads. However, the 'serverless' experience for the end user unequivocally leaves the entire set of end-to-end server management responsibilities with the cloud vendor. In contrast to the conventional Infrastructure-as-a-Service (IaaS) cloud model, where the provider only handles physical infrastructure maintenance, complete virtual machine maintenance is now also part of the provider's service offering. Moreover, unlike users managing their allocated resources for the execution of their own applications, the cloud provider has to undertake the same set of tasks with far less knowledge available to them. Not only do they have to successfully manage the infrastructure for applications belonging to separate parties, but they also need to accommodate the needs of thousands of user applications on the same shared platform, considering the impact of their co-resident behaviors as well.
Accomplishing this to the satisfaction of all parties involved is thus a tremendous feat for cloud vendors. Further, while auto-scaling and granular billing features are vastly favorable to end users, cloud vendors are at a grave disadvantage if measures are not taken to maintain high efficiency in their underlying resources. Hence, having proper resource management techniques in place that are capable of meeting all of these challenges is of utmost importance. Existing commercial and open-source serverless platforms mostly follow primitive resource management policies at the moment, which leave vast potential for optimization. Further, although there is huge interest in the research community in this area of study, most existing works are limited in their generalizability and adaptability to practical serverless computing environments, considering the multi-tenant and rapidly changing nature of these systems. Moreover, a vast majority of these works neglect the service provider's perspective on their offered solutions, which is a key factor determining the probable adoption of the same in vendor platforms. This thesis investigates novel techniques for the smooth running of all resource handling operations, including resource provisioning, resource scheduling and scaling, which are dynamic and intelligent enough to handle the complexities of this computing environment. The proposed approaches strive to gain a thorough understanding of the behavior of the serverless computing infrastructure, along with the different application workloads, in developing strategies beneficial for both end users and cloud vendors. In essence, this thesis advances the state-of-the-art by making the following contributions: - A comprehensive taxonomy and literature review on resource management in serverless computing environments, along with a discussion of identified research gaps, laying the groundwork for future research.
- A dynamic resource management and function request placement technique for meeting user-specific application requirements and maintaining high resource efficiency. - A Deep Reinforcement Learning (DRL) based workload- and system-aware technique for scheduling applications in resource-constrained, multi-tenant serverless computing environments, along with flexibility in achieving a desirable level of application performance and resource cost optimization. - A framework for horizontally and vertically scaling the resources allocated to function instances based on a multi-agent DRL model, elevating application performance while maintaining provider resource cost efficiency. - A deployment and scheduling approach for applications in a hybrid serverless and serverful environment based on DRL, with the aim of reducing application latency and incurred user cost. - A detailed discussion outlining challenges and the potential for future work on the efficient use of serverless computing environments.
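As a rough illustration of the function request placement problem underlying the contributions above (not the thesis's DRL-based technique), a scheduler must map incoming function invocations onto workers with finite resources. The worker capacities, request stream, and worst-fit policy below are all hypothetical:

```python
# Minimal sketch of a greedy function-request placement policy of the kind a
# serverless scheduler might use: place each incoming function on a worker
# that can fit it, queueing the request otherwise. All numbers are made up.

workers = {"w1": 2048, "w2": 1024, "w3": 512}   # free memory per worker (MB)
placements, queued = {}, []

def place(request_id, mem_mb):
    """Worst-fit placement: pick the worker with the most remaining headroom
    (spreading load), or queue the request if no worker can fit it."""
    candidates = [(free, w) for w, free in workers.items() if free >= mem_mb]
    if not candidates:
        queued.append(request_id)
        return None
    _, chosen = max(candidates)
    workers[chosen] -= mem_mb
    placements[request_id] = chosen
    return chosen

for rid, mem in [("f1", 1536), ("f2", 900), ("f3", 500), ("f4", 700)]:
    place(rid, mem)

print(placements, queued)
```

Real platforms add the dimensions the thesis targets: cold-start latency, per-tenant fairness, vertical scaling of running instances, and learned (rather than greedy) policies.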
  • Item
    Lazy Constraint Generation and Tractable Approximations for Large Scale Planning Problems
    Singh, Anubhav ( 2023-12)
In our research, we explore two orthogonal but related methodologies for solving planning instances: planning algorithms based on direct but lazy, incremental heuristic search over transition systems, and planning as satisfiability. We address numerous challenges associated with solving large planning instances within practical time and memory constraints. This is particularly relevant when solving real-world problems, which often have numeric domains and resources and, therefore, a large ground representation of the planning instance. Our first contribution is approximate novelty search, which introduces two novel methods: the first approximates novelty via sampling and Bloom filters, and the second approximates best-first search using an adaptive policy that decides whether to forgo the expansion of nodes in the open list. In our second work, we present an encoding of the partial order causal link (POCL) formulation of temporal planning problems into a constraint programming (CP) model that handles instances with required concurrency, which cannot be solved using sequential planners. Our third significant contribution is lifted sequential planning with lazy constraint generation, which scales very well to large instances with numeric domains and resources. Lastly, we propose a novel way of using novelty approximation as a polynomial reachability propagator, which we use to train the activity heuristics used by CP solvers.
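The Bloom-filter side of the novelty approximation can be sketched as follows: instead of storing every fact seen so far, hash facts into a fixed bit array, accepting occasional false positives (a truly novel fact judged "seen") in exchange for bounded memory. This is a generic novelty-1 test with illustrative parameters, not the thesis's exact method, which also approximates novelty via sampling.

```python
import hashlib

# A simple Bloom filter: k hash positions per item over a fixed bit array.
# Membership tests may return false positives but never false negatives,
# which makes the novelty check below approximate but memory-bounded.
class BloomFilter:
    def __init__(self, bits=1 << 16, hashes=4):
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray(bits // 8)

    def _positions(self, item):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.bits

    def add(self, item):
        for p in self._positions(item):
            self.array[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.array[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def is_novel(state_facts, seen):
    """A state has novelty 1 if it contains some fact not seen before."""
    novel = any(f not in seen for f in state_facts)
    for f in state_facts:
        seen.add(f)
    return novel

seen = BloomFilter()
first = is_novel({("at", "a")}, seen)    # first occurrence of the fact
second = is_novel({("at", "a")}, seen)   # repeated occurrence
print(first, second)
```

Pruning states that fail this test keeps the open list small; the false-positive rate (tunable via the bit-array size) is the price of the approximation.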
  • Item
    Robust and Trustworthy Machine Learning
    Huang, Hanxun ( 2024-01)
The field of machine learning (ML) has undergone rapid advancements in recent decades. The primary objective of ML models is to extract meaningful patterns from vast amounts of data. Among the most successful models are deep neural networks (DNNs), which have been deployed in many real-world applications, such as face recognition, medical image analysis, gaming agents, autonomous driving and chatbots. Current DNNs, however, are vulnerable to adversarial perturbations, where an adversary can craft malicious perturbations to manipulate these models. For example, they can inject backdoor patterns into the training data, allowing them to control the model’s prediction with the backdoor pattern (known as a backdoor attack). Also, an adversary can introduce imperceptible adversarial noise to an image and change the prediction of a trained DNN with high confidence (known as an adversarial attack). These vulnerabilities of DNNs raise security concerns, particularly if they are deployed in safety-critical applications. The current success of DNNs relies on the volume of “free” data on the internet. A recent news article revealed that a company trains large-scale commercial models using personal data obtained from social media, which raises serious privacy concerns. This has led to an open question regarding whether or not data can be made unlearnable for DNNs. Unlike backdoor attacks, unlearnable data does not seek to control the model maliciously but only to prevent the model from learning meaningful patterns in the data. Recent advancements in self-supervised learning (SSL) have shown promise in enabling models to learn from data without the need for human supervision. Annotating large-scale datasets can be time-consuming and expensive, making SSL an attractive alternative. However, one challenge with SSL is the potential for dimensional collapse in the learned representations.
This occurs when many features are highly correlated, giving rise to an “underfilling” phenomenon whereby the data spans only a lower-dimensional subspace. This can reduce the utility of a representation for downstream learning tasks. The first part of this thesis investigates defense strategies against backdoor attacks. Specifically, we develop a robust backdoor data detection method under the poisoning attack threat model. We introduce a novel backdoor sample detection method, Cognitive Distillation (CD). It extracts the minimal essence of features in the input image responsible for the model’s prediction; through an optimization process, features that are not important are removed. For data containing backdoor triggers, only a small set of semantically meaningless features is important for classification, while clean data contains a larger number of useful semantic features. Based on this characteristic, CD provides novel insights into existing attacks and can robustly detect backdoor samples. Additionally, CD reveals the connection between dataset bias and backdoor attacks. Through a case study, we show that CD can not only detect biases that match those reported in existing work but also discover several potential biases in a real-world dataset. The second part of this work examines defences against adversarial attacks. Adversarial training is one of the most effective defences. However, despite the preliminary understanding developed for adversarial training, it is still not clear, from the architectural perspective, what configurations lead to more robust DNNs. This work addresses this gap via a comprehensive investigation of the impact of network width and depth on the robustness of adversarially trained DNNs.
The theoretical and empirical analysis provides the following insights: (1) more parameters do not necessarily help adversarial robustness; (2) reducing capacity at the last stage (the last group of blocks) of the network can improve adversarial robustness; and (3) under the same parameter budget, there exists an optimal architectural configuration for adversarial robustness. These architectural insights can help in designing adversarially robust DNNs. The third part of this thesis addresses the question of whether or not data can be made unexploitable for DNNs. This work introduces a novel concept, unlearnable examples: data from which DNNs cannot learn useful features. Unlearnable examples are generated through error-minimizing noise, which intentionally reduces the error of one or more training examples to close to zero; consequently, DNNs believe there is “nothing” worth learning from these examples. The noise is restricted to be imperceptible to human eyes and thus does not affect normal data utility. This work demonstrates the flexibility of the approach under extensive experimental settings and its practicability in a case study of face recognition. The fourth part of this thesis studies robust regularization techniques to address dimensional collapse in SSL. Previous work has considered dimensional collapse at a global level. In this thesis, we demonstrate that learned representations can span a high-dimensional space globally but collapse locally. To address this, we propose a method called local dimensionality regularization (LDReg). Our formulation is based on the derivation of the Fisher-Rao metric to compare and optimize local distance distributions at an asymptotically small radius for each point. By increasing the local intrinsic dimensionality, we demonstrate through a range of experiments that LDReg improves the representation quality of SSL. The empirical results also show that LDReg can regularize dimensionality at both local and global levels.
In summary, this work contributes significantly toward robust and trustworthy machine learning. It includes the detection of backdoor samples, the development of robust architectures against adversarial examples, the introduction of unlearnable examples, and a robust regularization to prevent dimensional collapse in self-supervised learning.
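The error-minimizing noise idea behind unlearnable examples can be sketched in a toy setting, using a fixed linear classifier rather than a DNN (the data, weights, and noise budget are all illustrative): perturb an input, within a small bound, in the direction that minimises the model's loss on it, so the example appears to carry nothing left to learn.

```python
import numpy as np

# Toy error-minimizing noise: projected gradient DESCENT on the input of a
# fixed logistic-loss linear classifier, within an L-infinity budget eps.
rng = np.random.default_rng(0)
w = rng.normal(size=8)         # fixed classifier weights (illustrative)
x = rng.normal(size=8)         # one training example (illustrative)
y = 1.0                        # its label
eps = 0.5                      # per-coordinate noise budget

def logistic_loss(x):
    return float(np.log1p(np.exp(-y * (w @ x))))

delta = np.zeros_like(x)
for _ in range(50):
    margin = y * (w @ (x + delta))
    grad = -y * (1.0 / (1.0 + np.exp(margin))) * w   # dL/dx for logistic loss
    delta = np.clip(delta - 0.1 * grad, -eps, eps)   # descend, then project

print(logistic_loss(x + delta) < logistic_loss(x))
```

The thesis applies this idea bi-level (optimising the noise against a model that is itself being trained) on image data, but the mechanism is the same: loss-minimising, bounded, per-example noise.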
  • Item
    A Toolkit for Semantic Localisation Analysis
    Marini, Gabriele ( 2023-11)
While UbiComp research has steadily improved the performance of localisation systems, the analysis of the resulting datasets remains largely unaddressed. We present a tool to facilitate the querying and analysis of localisation time-series, with a focus on semantic localisation. We developed a conceptual framework based on the idea of strongly-typed spaces, represented as symbolic coordinates. We also demonstrate its power and flexibility through an implementation of the framework and its application to a real-life indoor localisation scenario.
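The strongly-typed-spaces idea can be sketched as follows; the space types, names, and query below are hypothetical illustrations of symbolic coordinates, not the toolkit's actual API:

```python
from dataclasses import dataclass

# A location is a path of typed spaces rather than an (x, y) pair, so a query
# such as "samples recorded anywhere inside building A" becomes a prefix
# match over the localisation time-series.

@dataclass(frozen=True)
class Space:
    kind: str   # e.g. "building", "floor", "room"
    name: str

def within(position, prefix):
    """True if a symbolic coordinate lies inside the given space prefix."""
    return position[: len(prefix)] == prefix

# (timestamp_seconds, symbolic coordinate) samples — toy data
trace = [
    (0,   (Space("building", "A"), Space("floor", "1"), Space("room", "101"))),
    (60,  (Space("building", "A"), Space("floor", "2"), Space("room", "201"))),
    (120, (Space("building", "B"), Space("floor", "1"), Space("room", "105"))),
    (180, (Space("building", "A"), Space("floor", "1"), Space("room", "102"))),
]

in_building_a = [t for t, pos in trace if within(pos, (Space("building", "A"),))]
print(in_building_a)
```

Because every level of the coordinate carries a type, queries can mix levels ("all kitchen-type rooms on any floor") without geometric computation.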
  • Item
    Explainable Computer Vision with Unsupervised Concept-based Explanations
    ZHANG, Ruihan ( 2023-10)
This thesis focuses on concept-based explanations for deep learning models in the computer vision domain with unsupervised concepts. The success of deep learning methods has significantly improved the performance of computer vision models. However, the quickly growing complexity of these models makes explainability a more important research focus. One of the major issues in computer vision explainability is that it is unclear what the appropriate features are for use in explanations. Pixels are less understandable features compared with those of other domains, such as natural language processing with words as features. In recent years, concepts, which refer to knowledge shared between humans and AI systems and are grounded in feature maps inside the deep learning model, have provided significant performance improvements as features in explanations. Concept-based explanations have therefore become a good choice for explainability in computer vision. In most tasks, supervised concepts are the standard choice, with better performance. Nevertheless, the concept learning task in supervised concept-based explanations additionally requires a dataset with a designed concept set and instance-level concept labels. Unsupervised concepts could reduce this manual workload. In this thesis, we aim to reduce the performance gap between unsupervised and supervised concepts for concept-based explanations in computer vision. Targeting the baseline of concept bottleneck models (CBM) with supervised concepts, and exploiting the fact that unsupervised concepts require no concept set design or labeling, the core contributions of this thesis make unsupervised concepts an attractive alternative for concept-based explanations. Our core contributions are as follows: 1) We propose a new concept learning algorithm, invertible concept-based explanations (ICE). Explanations with unsupervised concepts can be evaluated for fidelity to the original model, like explanations with supervised concepts.
The learned concepts are evaluated to be more understandable than those of baseline unsupervised concept learning methods, such as the k-means clustering methods from ACE; 2) We propose a general framework of concept-based interpretable models with built-in faithful explanations, similar to CBM. The framework makes comparison between supervised and unsupervised concepts possible. We show that unsupervised concepts provide competitive performance in model accuracy and concept interpretability; 3) We propose an example application using unsupervised concepts with counterfactual explanations, the fast concept-based counterfactual explanations (FCCE). In the ICE concept space, we derive the analytical solution to the counterfactual loss function; the calculation of counterfactual explanations in concept space takes less than 1e-5 seconds. FCCE is also evaluated to be more interpretable through a human survey. In conclusion, unsupervised concepts were previously not a viable choice for concept-based explanations, as they suffered from issues such as being less interpretable and less faithful than supervised concept-based explanations like CBM. With our core contributions, the accuracy and interpretability of unsupervised concepts for concept-based explanations become competitive with supervised concept-based explanations. Since no extra concept set design or labeling is required, unsupervised concepts are an attractive choice for concept-based explanations in computer vision, with performance competitive with supervised concepts and no associated manual workload.
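To illustrate why a closed-form counterfactual in concept space can be so fast, here is the textbook minimal-L2 construction for a linear scoring head (a simplification for illustration, not FCCE's exact loss or solution):

```python
import numpy as np

# Closed-form counterfactual for a linear score w·c + b: the smallest-norm
# change to a concept vector c that moves the score to a target value t is a
# single step along w — no iterative optimisation needed.
def counterfactual(c, w, b, t):
    """Minimal-norm delta such that w @ (c + delta) + b == t."""
    return (t - (w @ c + b)) / (w @ w) * w

w = np.array([1.0, -2.0, 0.5])     # linear head weights (illustrative)
b = -0.25
c = np.array([0.2, 0.4, 0.1])      # concept activations for one input (illustrative)
target = 1.0                        # push the score to the positive side
delta = counterfactual(c, w, b, target)
print(round(float(w @ (c + delta) + b), 6))
```

Because the answer is a single algebraic expression rather than a gradient loop, evaluation cost is one dot product per counterfactual, which is what makes sub-1e-5-second timings plausible.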
  • Item
    Thumbnail Image
    Explainable Computer Vision with Unsupervised Concept-based Explanations
    ZHANG, Ruihan ( 2023-10)
    This thesis focuses on concept-based explanations for deep learning models in the computer vision domain with unsupervised concepts. The success of deep learning methods significantly improves the performance of computer vision models. However, the quickly growing complexity of the models makes explainability a more important research focus. One of the major issues in computer vision explainability is that it is unclear what the appropriate features are that can be used in the explanations. Pixels are less understandable features compared with other domains like natural language processing with words as features. In recent years, concepts, that refer to the shared knowledge between human and AI systems with feature maps inside the deep learning model provide significant performance improvement as features in the explanations. Concept-based explanations become a good choice for explainability in computer vision. In most tasks, the supervised concept is the standard choice with better performance. Nevertheless, the concept learning task in supervised concept-based explanations additionally requires a dataset with a designed concept set and instance-level concept labels. Unsupervised concepts could reduce manual workload. In this thesis, we aim to reduce the performance gap between unsupervised and supervised concepts for concept-based explanations in computer vision. Targeting the baseline of concept bottleneck models (CBM) with supervised concepts, combined with the advances that unsupervised concepts do not require the concept set designing and labeling, the core contributions in this thesis make the unsupervised concepts an attractive alternative choice for concept-based explanations. Our core contributions are as follows: 1) We propose a new concept learning algorithm, invertible concept-based explanations (ICE). Explanations with unsupervised concepts can be evaluated with fidelity to the original model, like explanations with supervised concepts. 
Learned concepts are also evaluated as more understandable than those from baseline unsupervised concept learning methods, such as the k-means clustering used in ACE; 2) We propose a general framework for concept-based interpretable models with built-in faithful explanations, similar to CBM. The framework enables direct comparison between supervised and unsupervised concepts, and we show that unsupervised concepts deliver competitive model accuracy and concept interpretability; 3) We propose an application of unsupervised concepts to counterfactual explanations: fast concept-based counterfactual explanations (FCCE). In the ICE concept space, we derive an analytical solution to the counterfactual loss function, so computing a counterfactual explanation in concept space takes less than 1e-5 seconds; a human survey further shows FCCE to be more interpretable. In conclusion, unsupervised concepts were previously not a viable choice for concept-based explanations, as they were less interpretable and less faithful than supervised concept-based approaches such as CBM. With our core contributions, the accuracy and interpretability of unsupervised concepts for concept-based explanations become competitive with supervised concept-based explanations. Since unsupervised concepts require no concept set design or labelling, they are an attractive choice for concept-based explanations in computer vision.
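    The core idea of learning unsupervised concepts from feature maps can be sketched with a non-negative matrix factorisation of CNN activations. This is a minimal illustration, not the thesis's implementation: all shapes, the random data, and the choice of scikit-learn's NMF are assumptions, and a fidelity-style reconstruction error stands in for the evaluation described above.

    ```python
    import numpy as np
    from sklearn.decomposition import NMF

    rng = np.random.default_rng(0)

    # Stand-in for stacked CNN feature-map activations: one row per spatial
    # position across images, one column per channel. Non-negative, as after
    # a ReLU layer. Real pipelines would extract these from a trained model.
    activations = rng.random((500, 64))

    n_concepts = 10
    nmf = NMF(n_components=n_concepts, init="nndsvda", random_state=0, max_iter=500)
    weights = nmf.fit_transform(activations)   # per-position concept scores
    concepts = nmf.components_                 # concept directions in channel space

    # Fidelity check: how well the concept decomposition reconstructs the
    # original activations (an invertibility-style measure).
    reconstruction = weights @ concepts
    relative_error = (np.linalg.norm(activations - reconstruction)
                      / np.linalg.norm(activations))
    print(weights.shape, concepts.shape, round(relative_error, 3))
    ```

    Each row of `concepts` can then be visualised by finding the image patches whose activations score highest on it, which is how unsupervised concepts are typically inspected for understandability.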
  • Item
    Thumbnail Image
    Word Associations as a Source of Commonsense Knowledge
    Liu, Chunhua ( 2023-12)
    Commonsense knowledge helps individuals naturally make sense of everyday situations and is important for AI systems to truly understand and interact with humans. However, acquiring such knowledge is difficult due to its implicit nature and sheer size, causing existing large-scale commonsense resources to suffer from sparsity. This thesis addresses the challenge of acquiring commonsense knowledge by using word associations, a resource as yet untapped for this purpose in natural language processing (NLP). Word associations are spontaneous connections that individuals make between concepts (e.g., smile and happy), reflecting the human mental lexicon. The aim of this thesis is to complement existing resources such as commonsense knowledge graphs and pre-trained language models (PLMs), and to enhance models' ability to reason in a more intuitive and human-like manner. To achieve this aim, we explore three aspects of word associations: (1) understanding the relational knowledge they encode, (2) comparing large-scale word associations with widely used commonsense knowledge resources in terms of their content and their utility for downstream NLP tasks, and (3) improving knowledge extraction from PLMs with word associations. We introduce a crowd-sourced large-scale dataset of word association explanations, which is crucial for disambiguating the multiple reasons behind a word association. This resource fills a gap in the cognitive psychology community by providing a dataset for studying the rationales and structures underlying associations. By automating the process of labelling word associations with relevant relations, we demonstrate that these explanations enhance the performance of relation extractors. We conduct a comprehensive comparison between large-scale word association networks and the ConceptNet commonsense knowledge graph, analysing their structures, knowledge content, and benefits for commonsense reasoning tasks. 
Even though we identify systematic differences between the two resources, we find that both improve the performance of NLP models when incorporated. Finally, we propose a diagnostic framework for understanding the implicit knowledge encoded in PLMs and identifying effective strategies for knowledge extraction, and we show that word associations can enhance the quality of knowledge extracted from PLMs. The contributions of this thesis highlight the value of word associations in acquiring commonsense knowledge, offering insights into their utility for both cognitive psychology and NLP research.
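    The structural comparison described above can be sketched by reducing both resources to word pairs and measuring their overlap. This is a hypothetical illustration with invented toy data; the thesis compares full-scale resources (word association networks vs. ConceptNet), and the pair-level Jaccard overlap here is only one simple way such a comparison could be framed.

    ```python
    # Word associations: cue -> response (e.g. "smile" evokes "happy").
    associations = [
        ("smile", "happy"),
        ("doctor", "nurse"),
        ("rain", "umbrella"),
        ("coffee", "morning"),
    ]

    # Knowledge-graph triples in a ConceptNet-like (head, relation, tail) form.
    triples = [
        ("smile", "CausesDesire", "happy"),
        ("doctor", "RelatedTo", "nurse"),
        ("coffee", "AtLocation", "cup"),
    ]

    # Reduce both resources to undirected word pairs and compare.
    assoc_pairs = {frozenset(pair) for pair in associations}
    kg_pairs = {frozenset((head, tail)) for head, _, tail in triples}

    overlap = assoc_pairs & kg_pairs
    jaccard = len(overlap) / len(assoc_pairs | kg_pairs)
    print(len(overlap), round(jaccard, 2))  # → 2 0.4
    ```

    Even on this toy example the two resources only partially overlap, which mirrors the finding that word associations and ConceptNet encode systematically different, complementary knowledge.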