Computing and Information Systems - Theses

Search Results

Now showing 1 - 5 of 5
  • Item
    Word Associations as a Source of Commonsense Knowledge
    Liu, Chunhua ( 2023-12)
    Commonsense knowledge helps individuals make sense of everyday situations and is important for AI systems that aim to truly understand and interact with humans. However, acquiring such knowledge is difficult due to its implicit nature and sheer scale, and existing large-scale commonsense resources consequently suffer from sparsity. This thesis addresses the challenge of acquiring commonsense knowledge by using word associations, a resource as yet untapped for this purpose in natural language processing (NLP). Word associations are the spontaneous connections individuals make between concepts (e.g., smile and happy), reflecting the human mental lexicon. The aim of this thesis is to complement existing resources such as commonsense knowledge graphs and pre-trained language models (PLMs), and to enhance models’ ability to reason in a more intuitive and human-like manner. To achieve this aim, we explore three aspects of word associations: (1) understanding the relational knowledge they encode, (2) comparing large-scale word associations with widely used commonsense knowledge resources in terms of their content and their utility for downstream NLP tasks, and (3) improving knowledge extraction from PLMs with word associations. We introduce a crowd-sourced large-scale dataset of word association explanations, which is crucial for disambiguating the multiple reasons that may underlie a word association. This resource fills a gap in the cognitive psychology community by providing a dataset for studying the rationales and structures underlying associations. By automating the process of labelling word associations with relevant relations, we demonstrate that these explanations enhance the performance of relation extractors. We conduct a comprehensive comparison between large-scale word association networks and the ConceptNet commonsense knowledge graph, analysing their structures, knowledge content, and benefits for commonsense reasoning tasks. Even though we identify systematic differences between the two resources, we find that both yield improvements when incorporated into NLP models. Finally, we propose a diagnostic framework for understanding the implicit knowledge encoded in PLMs and for identifying effective strategies for knowledge extraction. We show that word associations can enhance the quality of knowledge extracted from PLMs. The contributions of this thesis highlight the value of word associations in acquiring commonsense knowledge, offering insights into their utility in cognitive psychology and NLP research.
  • Item
    Adapting Clinical Natural Language Processing to Contexts: Task, Framework, and Data Bias
    Liu, Jinghui ( 2023-04)
    Clinical texts contain rich amounts of valuable information about real-world patients and clinical practices that can be utilized to improve clinical care. Mining information from clinical text through Natural Language Processing (NLP) is a promising research field and has attracted much attention. Recent NLP approaches usually treat clinical texts as mere corpora from just “another” textual domain. However, clinical text is generated to serve multiple purposes in the healthcare setting and encodes variations and biases from clinical practice that are often not obvious to NLP researchers. This leads to three types of unsatisfactory applications of clinical NLP. First, some clinical NLP tasks provide solutions with limited applicability to existing clinical decision-making and clinical workflows, and they often target individual patients instead of a patient cohort. Second, the output of many clinical NLP models is a single number or label, presenting a framework that tends to replace rather than augment clinical reasoning in the care process. Third, most recent clinical NLP systems are trained end-to-end to manage the complexity of human language, which neglects the various biases that exist in clinical text. This thesis aims to address these three aspects of clinical NLP through three case studies: 1) proposing a prediction task to support clinical resource management at the cohort level, 2) examining the feasibility of patient retrieval as supplementary output for predictive analysis, and 3) evaluating the impact of clinical documentation practices on NLP modeling. The results of these studies demonstrate the importance of taking the clinical context into consideration when designing tasks, developing models, and preparing data for effective and reliable clinical NLP systems.
  • Item
    Anaphora Resolution in Procedural Text - from Domain to Domain
    Fang, Biaoyan ( 2022)
    Anaphora is an important and frequent phenomenon in any form of discourse. It describes the use of expressions that refer back to expressions used earlier in the text, to avoid repetition. Anaphora resolution aims to resolve these reference relations in discourse and forms a core task in natural language understanding. It comprises two main anaphoric types: coreference and bridging. While much effort has been targeted at anaphora resolution, most research has focused on these two types separately. Specifically, anaphora research mostly focuses on coreference, modeling it from different perspectives across various resources. Bridging, on the other hand, has not been studied comprehensively. Different studies analyze bridging differently, leading to inconsistent definitions. The lack of attention to bridging also makes it difficult to capture anaphora phenomena comprehensively in discourse -- modeling coreference alone is not sufficient to capture the complex anaphoric relations in text. It is becoming increasingly important to have both coreference and bridging annotated. Additionally, most existing anaphora research is based on declarative text. Procedural text, a common type of text, has received limited attention despite the richness and importance of the anaphora phenomena it contains, leaving much room for further exploration. In this thesis, we focus on anaphora resolution in procedural text, studying both coreference and bridging in two common types of procedural text, chemical patents and recipes, and show that our proposed anaphora frameworks are well suited to procedural text. The four research questions we address in this thesis are: (1) How to model anaphora resolution in chemical patents? (2) How to combine different types of anaphora resolution? (3) How to incorporate external knowledge into anaphora resolution? (4) How to generalize our anaphora resolution model to domains beyond the biochemical domain?
    We address the first research question by proposing domain-specific anaphora annotation guidelines for chemical patents, targeting both coreference and bridging and incorporating general and domain-specific knowledge via in-depth investigations. We resolve ambiguities in bridging definitions by limiting the anaphoric relations to four specific subtypes relevant to the chemical domain while maintaining high coverage of anaphora phenomena. We achieve high inter-annotator agreement (IAA) on the created ChEMU-Ref corpus, well above that of existing bridging corpora, demonstrating the reliability of the created dataset. To address the second research question, we propose an end-to-end jointly trained anaphora resolution model for coreference and bridging, adopting an end-to-end coreference resolution framework (Lee et al., 2017, 2018). Through empirical experiments on off-the-shelf anaphora corpora, we show the benefits of joint training for bridging. However, the impact on coreference is unclear; we argue that this could be due to ambiguity in the definition of bridging. To validate our hypothesis, we further experiment on two high-quality anaphora corpora with clear anaphora definitions, the ChEMU-Ref and RecipeRef (detailed under the last research question) datasets, and show the potential to improve both tasks through joint training, indicating the benefits of jointly learning coreference and bridging on high-quality anaphora corpora. Next, we address the third research question from the perspective of utilizing pretrained language models within the proposed end-to-end joint training framework, experimenting on the ChEMU-Ref corpus. We show that simply replacing generic language models (e.g. ELMo (Peters et al., 2018)) with domain-specific pretrained language models (e.g. CHELMO (Zhai et al., 2019)) yields better performance, suggesting the potential of incorporating external knowledge into domain-specific anaphora resolution.
    Further explorations of recurrent neural network based and transformer based language models provide deeper insights, and suggest that different approaches might be needed to fully exploit different types of pretrained language models. For the last research question, we generalize the anaphora annotation framework developed for chemical patents to recipes, making domain adjustments informed by a detailed analysis of the similarities and differences between these two types of procedural text. Through this in-depth comparison, we propose a more generic anaphora annotation framework for procedural text, designed as a hierarchy based on the state of entities. Based on the proposed annotation framework, we create the RecipeRef corpus, which captures rich anaphora phenomena in recipes while maintaining high IAA scores, suggesting the feasibility of generalizing this framework to other procedural text. We observe further improvement from transfer learning, i.e. pretraining on the ChEMU-Ref dataset and fine-tuning on the RecipeRef dataset, indicating the transfer of general procedural knowledge between these domains. In summary, this thesis studies anaphora resolution in procedural text, particularly chemical patents and recipes, two common types of procedural text, and fills the gap in modeling and resolving anaphora in this area. This establishes a firm base and contributes towards further research in anaphora resolution over procedural text.
  • Item
    Multi-Granular Webpage Information Extraction and Analysis via Deep Joint Learning
    Dai, Yimeng ( 2020)
    The number of webpages is growing exponentially, resulting in a great volume of unstructured information on the web. It takes time either to fully comprehend a webpage or to retrieve relevant information from a complex webpage. Automatically analyzing unstructured webpages and extracting structured information from them is therefore crucial. In this study, we aim to develop algorithms for multi-granular webpage information extraction and analysis to facilitate webpage information understanding. We investigate the problem at three levels of granularity, i.e., the micro, meso, and macro levels. For each level, we focus on one extraction and analysis task, although the algorithms we develop are general and can be applied to many other similar tasks. At the micro level, we aim to extract webpage entities that have diverse forms, focusing on the application of person name recognition. We propose a fine-grained annotation scheme based on anthroponymy and create the first dataset for fine-grained name recognition. We propose a joint model that learns the different name form classes with two sub-neural networks while fusing the learned signals through co-attention and gated fusion mechanisms. Experimental results show that our annotations can be utilised in different ways to improve recognition performance. At the meso level, we study the relationships between webpage entities and blocks, with a focus on the application of jointly recognizing names and publications. We address the person name recognition and publication string recognition tasks in academic homepages jointly, based on the insight that the two tasks are inherently correlated. We propose a joint model to capture the interdependencies between entities. We also capture global position patterns of blocks and local position patterns of entities in the model learning process.
    Empirical results on real datasets show that our model outperforms the state-of-the-art publication string recognition model and person name recognition model. Experimental results also show that our model outperforms baseline joint models. At the macro level, we aim to provide hierarchical analysis for webpages from diverse domains. We introduce the Webpage Briefing (WB) task, which aims to generate a summary of a webpage in a hierarchical manner, starting with an abstract, general description of the webpage's topic, followed by high-level key attributes extracted from the webpage, and then lower-level key attributes containing concrete and specific key information. We propose to perform webpage briefing by identifying and summarizing the informative contents, which mimics the human behaviour of understanding a complex webpage. We propose a novel Dual Distillation method that has a teacher-student architecture with dual distillation. We further propose a Triple Distillation method to better exploit the inherent correlation between specific key attributes and general topics of webpages. We finally propose a novel Triple Joint model that has a triple joint learning architecture with signal exchange and enhancement mechanisms. Experimental results show the superiority of the Dual Distillation and Triple Distillation methods over baseline methods. Experimental results also show that the Triple Joint model outperforms baseline single-task models and baseline jointly trained models.
  • Item
    Pushing the boundaries of deep parsing
    Mackinlay, Andrew ( 2012)
    I examine the application of deep parsing techniques to a range of Natural Language Processing (NLP) tasks, as well as methods to improve their performance. Focussing specifically on the English Resource Grammar, a hand-crafted grammar of English based on the Head-Driven Phrase Structure Grammar formalism, I examine techniques for improving parsing accuracy in diverse domains and methods for evaluating these improvements. I also evaluate the utility of the in-depth linguistic analyses available from this grammar for specific NLP applications such as biomedical information extraction, and investigate other applications of the semantic output the grammar provides.