Computing and Information Systems - Theses

Search Results

Now showing 1 - 6 of 6
  • Item
    Enhancing Deep Multimodal Representation: Online, Noise-robust and Unsupervised Learning
    Silva, Dadallage Amila Ruwansiri (2022)
    Information that is generated and shared today involves data of different modalities. These modalities are not limited to the well-known sensory media (e.g., text, image, video, and audio), but could be any abstract or inferred form of encoded information (e.g., the propagation network of a news article or the sentiment of a text) that represents a different viewpoint of the same object. For machine learning models to be competitive with humans, they should be able to extract and combine information from these modalities. Thus, multimodal representation learning has emerged as a broad research domain that aims to understand complex multimodal environments while narrowing the heterogeneity gap among different modalities. Due to their potential to represent latent information in complex data structures, deep learning-based techniques have recently attracted much attention for multimodal representation learning. Nevertheless, most existing deep multimodal representation learning techniques lack the following: (1) the ability to continuously learn and update representations in a memory-efficient manner while being recency-aware and avoiding catastrophic forgetting of historical knowledge; (2) the ability to learn unsupervised representations for under-exploited multimodalities with complex data structures (i.e., temporally evolving networks) and high diversity (cross-domain multimodal data); and (3) the ability to directly serve as features for various real-world applications without fine-tuning on an application-specific labelled dataset. This thesis aims to bridge these research gaps in deep multimodal representation learning. In addition, it addresses real-world applications involving multimodal data, such as misinformation detection, spatiotemporal activity modeling, and online market basket analysis.
The main contributions of this thesis include: (1) proposing two novel online learning strategies for learning deep multimodal representations, and proposing two frameworks that use these strategies to address two real-world applications -- i.e., user-guided spatiotemporal activity modeling (USTAR) and online market basket analysis (OMBA); (2) proposing METEOR, a memory- and time-efficient online representation learning algorithm that makes deep multimodal representations compact and scalable to cope with the different data rates of real-world multimodal data streams; (3) developing an unsupervised framework to capture and preserve domain-specific and domain-shared knowledge in cross-domain data streams, and applying it to cross-domain fake news detection; (4) proposing an unsupervised model that learns representations for temporally evolving graphs by mimicking the future knowledge of an evolving graph at an early timestep, and developing a new framework called Propagation2Vec with the help of the proposed objective functions for early fake news detection; and (5) developing a theoretically motivated, noise-robust unsupervised learning framework that can filter out the noise in multimodal representations learned with general pretraining objective functions (i.e., fine-tune them) without requiring a labelled dataset, and applying the findings to unsupervised fake news detection.
  • Item
    Improving Agile Sprint Planning Through Empirical Studies of Documented Information and Story Points Estimation
    Pasuksmit, Jirat (2022)
    In Agile iterative development (e.g., Scrum), effort estimation is an integral part of development iteration planning (i.e., sprint planning). Unlike traditional software development teams, an Agile team relies on a lightweight estimation method based on team consensus (e.g., Planning Poker), and the estimated effort is continuously refined (or changed) to improve estimation accuracy. However, such lightweight estimation methods are prone to inaccuracy, and late changes to the estimated effort may cause the sprint plan to become unreliable. Despite a large body of research, only a few studies have reviewed the reasons for inaccurate estimations and the approaches to improve effort estimation. We conducted a systematic literature review and found that the quality of the available information is one of the most common reasons for inaccurate estimations. We found several manual approaches that aim to help the team improve information quality and manage uncertainty in effort estimation. However, prior work reported that practitioners were reluctant to use them because they add overhead to the development process. The goal of this thesis is to better understand effort estimation and to propose approaches that help the team achieve accurate estimation without introducing additional overhead. To achieve this goal, we conducted studies in two broad areas. We first conducted two empirical studies to investigate the importance of documented information for effort estimation and the impact of estimation changes in a project. In the first empirical study, we investigated the importance and quality of documented information for effort estimation. We conducted a survey of 121 Agile practitioners from 25 countries and found that documented information is considered important for effort estimation.
We also found that the documented information that is useful for effort estimation often changes, and that practitioners would re-estimate effort when such changes occurred, even after the work had started. In the second empirical study, we aimed to better understand changes of effort (measured in Story Points; SP). We examined the prevalence of SP changes, the accuracy of changed SP, and the impact of information changes on SP changes. We found that SP were not often changed after sprint planning. However, when SP were changed, the size of the change was relatively large and the changed SP could still be inaccurate. We also found that SP changes often occurred along with information changes related to scope modification. These findings suggest that a change of documented information can lead to a change of effort, and the changed effort can have a large impact on the sprint plan. To mitigate the risk of an unreliable sprint plan, the documented information and the estimated effort should be verified and stabilized before finalizing the sprint plan. Otherwise, the team may have to re-estimate the effort and adjust the sprint plan. However, revisiting all documented information and estimated SP would be labor-intensive and may not comply with Agile principles. To help the team manage these uncertainties without introducing additional overhead, we proposed two automated approaches, DocWarn and SPWarn, to predict the documentation changes and SP changes that may occur after sprint planning. We built DocWarn and SPWarn using machine learning and deep learning techniques based on metrics that measure the characteristics of work items. We evaluated DocWarn and SPWarn on work items extracted from open-source projects. Our empirical evaluations show that DocWarn achieved an average AUC of 0.75 and SPWarn achieved an average AUC of 0.73, both significantly higher than baseline models.
These results suggest that our approaches can predict future changes of documented information and SP based on currently available information. With our approaches, the team will be better aware of potential documentation and SP changes during sprint planning and can pay closer attention to them. Thus, the team can manage uncertainty and reduce the risk of unreliable effort estimation and sprint planning without additional overhead.
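The evaluation above is reported in terms of AUC. As a hedged illustration only (not the thesis's actual pipeline, and with invented labels and scores), AUC can be computed directly from the fraction of concordant positive/negative pairs:

```python
def auc_score(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count concordant pairs; ties contribute 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities that a work item's SP will change
# after sprint planning, with the observed outcomes (1 = changed).
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.5, 0.6, 0.3, 0.7]
print(auc_score(labels, scores))  # → 0.8888888888888888
```

An AUC of 0.5 corresponds to random guessing, so values around 0.73-0.75 indicate a useful, though imperfect, early-warning signal.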
  • Item
    A Novel Perspective on Robustness in Deep Learning
    Mohaghegh Dolatabadi, Hadi (2022)
    Nowadays, machine learning plays a crucial role in our path toward automated decision-making. Traditional machine learning algorithms require careful, often manual, feature engineering to deliver satisfactory results. Deep Neural Networks (DNNs) have shown great promise in automating this process. Today, DNNs are the primary candidate for various applications, from object detection to high-dimensional density estimation and beyond. Despite their impressive performance, DNNs are vulnerable to different security threats. For instance, in adversarial attacks, an adversary can alter the output of a DNN for its own benefit by adding carefully crafted yet imperceptible distortions to clean samples. As another example, in backdoor (Trojan) attacks, an adversary intentionally plants a loophole in the DNN during the learning process. This is often done by attaching specific triggers to benign samples during training so that the model creates an association between the trigger and a particular intended output. Once such a loophole is planted, the attacker can activate the backdoor with the learned triggers and subvert the model's predictions. These examples demonstrate the fragility of DNNs in their decision-making, which calls into question their widespread use in safety-critical applications such as autonomous driving. This thesis studies these vulnerabilities in DNNs from novel perspectives. To this end, we identify two key challenges in previous studies of the robustness of neural networks. First, while a plethora of existing algorithms can robustify DNNs against attackers to some extent, these methods often lack the efficiency required for use in real-world applications. Second, the true nature of these adversaries has been less studied, leading to unrealistic assumptions about their behavior.
This is particularly crucial as building defense mechanisms using such assumptions would fail to address the underlying threats and create a false belief in the security of DNNs. This thesis studies the first challenge in the context of robust DNN training. In particular, we leverage the theory of coreset selection to form informative weighted subsets of data. We use this framework in two different settings. First, we develop an online algorithm for filtering poisonous data to prevent backdoor attacks. Specifically, we identify two critical properties of poisonous samples based on their gradient space and geometrical representation and define an appropriate selection objective based on these criteria to select clean samples. Second, we extend the idea of coreset selection to adversarial training of DNNs. Although adversarial training is one of the most effective methods in defending DNNs against adversarial attacks, it requires generating costly adversarial examples for each training sample iteratively. To ease the computational burden of various adversarial training methods in a unified manner, we build a weighted subset of the training data that can faithfully approximate the DNN gradient. We show how our proposed solution can lead to robust neural network training more efficiently in both of these scenarios. Then, we touch upon the second challenge and question the validity of one of the widely used assumptions around adversarial attacks. More precisely, it is often assumed that adversarial examples stem from an entirely different distribution than clean data. To challenge this assumption, we resort to generative modeling, particularly Normalizing Flows (NF). Using an NF model pre-trained on clean data, we demonstrate how one can create adversarial examples closely following the clean data distribution. 
We then use our approach against state-of-the-art adversarial example detection methods to show that methods which explicitly assume a difference between the distributions of adversarial and clean data can suffer greatly. Our study reveals the importance of correct assumptions in treating adversarial threats. Finally, we extend the distribution modeling component of our adversarial attacker to increase its density estimation capabilities. In summary, this thesis advances the current state of robustness in deep learning by i) proposing more effective training algorithms against backdoor and adversarial attacks and ii) challenging a fundamental, prevalent misconception about the distributional properties of adversarial threats. Through these contributions, we aim to help create more robust neural networks, which is crucial before their deployment in real-world applications. Our work is supported by theoretical analysis and experimental investigation, and is based on our publications.
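The coreset idea above, a weighted subset of the training data whose gradients faithfully approximate the full-data gradient, can be sketched with a naive greedy selector. This is a toy illustration on invented synthetic per-sample gradients, not the thesis's algorithm (which operates on DNN gradients and learns subset weights):

```python
import numpy as np

def greedy_coreset(grads, k):
    """Greedily pick k per-sample gradients whose mean tracks the full mean.

    grads: (n, d) array of per-sample gradient vectors.
    Returns the indices of the selected subset.
    """
    target = grads.mean(axis=0)              # full-data mean gradient
    chosen, running = [], np.zeros(grads.shape[1])
    for _ in range(k):
        # Pick the sample that most reduces the gap to the target mean.
        best, best_err = None, np.inf
        for i in range(len(grads)):
            if i in chosen:
                continue
            cand = (running + grads[i]) / (len(chosen) + 1)
            err = np.linalg.norm(cand - target)
            if err < best_err:
                best, best_err = i, err
        chosen.append(best)
        running = running + grads[best]
    return chosen

rng = np.random.default_rng(0)
grads = rng.normal(size=(200, 5))            # synthetic per-sample gradients
idx = greedy_coreset(grads, k=20)
print(np.linalg.norm(grads[idx].mean(axis=0) - grads.mean(axis=0)))  # small residual
```

Training on the selected 10% of samples then approximates the full-batch gradient step at a fraction of the cost, which is the efficiency argument the thesis develops for both backdoor filtering and adversarial training.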
  • Item
    Anaphora Resolution in Procedural Text - from Domain to Domain
    Fang, Biaoyan (2022)
    Anaphora is an important and frequent phenomenon in any form of discourse. It describes the use of expressions that refer back to expressions used earlier in the text, to avoid repetition. Anaphora resolution aims at resolving these reference relations in discourse and forms a core task in natural language understanding. Anaphora mainly comprises two types: coreference and bridging. While much effort has been targeted at anaphora resolution, most research has focused on these two types separately. Specifically, anaphora research mostly focuses on coreference, modeling it from different perspectives across various resources. Bridging, on the other hand, has not been studied comprehensively. Different studies analyze bridging differently, leading to inconsistencies in bridging definitions. The lack of attention to bridging also makes it challenging to capture comprehensive anaphora phenomena in discourse: modeling coreference alone is not sufficient to capture complex anaphoric relations in text. It is becoming increasingly important to have both coreference and bridging annotated. Additionally, most existing anaphora research is based on declarative text. Procedural text, a common type of text, has received limited attention despite the richness and importance of anaphora phenomena in it, leaving much room for further exploration. In this thesis, we focus on anaphora resolution in procedural text, studying both coreference and bridging in two common types of procedural text, chemical patents and recipes, and show that our proposed anaphora frameworks are well suited to procedural text. The four research questions we address in this thesis are: (1) How can anaphora resolution be modeled in chemical patents? (2) How can different types of anaphora resolution be combined? (3) How can external knowledge be incorporated into anaphora resolution? (4) How can our anaphora resolution model be generalized to domains beyond the biochemical domain?
We address the first research question by proposing domain-specific anaphora annotation guidelines for chemical patents, targeting both coreference and bridging and incorporating general and domain-specific knowledge through in-depth investigation. We resolve ambiguities in bridging definitions by limiting the anaphoric relations to four specific subtypes related to the chemical domain while maintaining high coverage of anaphora phenomena. We achieve high inter-annotator agreement (IAA) on the created ChEMU-Ref corpus, well above that of existing bridging corpora, demonstrating the reliability of the dataset. To address the second research question, we propose an end-to-end jointly trained anaphora resolution model for coreference and bridging, adopting an end-to-end coreference resolution framework (Lee et al., 2017, 2018). Through empirical experiments on off-the-shelf anaphora corpora, we show the benefits of joint training for bridging. However, the impact on coreference is not clear. We argue that this could be due to ambiguity in the definition of bridging. To validate our hypothesis, we further experiment on two high-quality anaphora corpora with clear anaphora definitions, the ChEMU-Ref and RecipeRef datasets (details in the last research question), and show the potential to improve both tasks through joint training, indicating the benefits of jointly learning coreference and bridging on high-quality anaphora corpora. Next, we address the third research question from the perspective of utilizing pretrained language models within the proposed end-to-end joint training framework, experimenting on the ChEMU-Ref corpus. We show that simply replacing generic language models (e.g., ELMo (Peters et al., 2018)) with domain-pretrained language models (e.g., CHELMO (Zhai et al., 2019)) yields better performance, suggesting the potential of incorporating external knowledge for domain-specific anaphora resolution.
Further explorations of recurrent neural network based and transformer based language models provide deeper insights, and suggest that different approaches might be needed to fully utilize different types of pretrained language models. For the last research question, we generalize the anaphora annotation framework developed for chemical patents to recipes, making domain adjustments informed by a detailed analysis of the similarities and differences between these two types of procedural text. Through this in-depth comparison, we propose a more generic anaphora annotation framework for procedural text, designed as a hierarchy based on the state of entities. Based on the proposed annotation framework, we create the RecipeRef corpus, which captures rich anaphora phenomena in recipes while maintaining high IAA scores, suggesting the feasibility of generalizing this framework to other procedural text. We observe further improvement from transfer learning, i.e., pretraining on the ChEMU-Ref dataset and fine-tuning on the RecipeRef dataset, indicating the transfer of general procedural knowledge between these domains. In summary, this thesis studies anaphora resolution in procedural text, particularly chemical patents and recipes, two common types of procedural text, and fills the gap in modeling and resolving anaphora in this area. This establishes a firm base and contributes towards further research on anaphora resolution over procedural text.
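To make the coreference half of the task concrete, a crude string-match baseline can cluster mentions that share a head word. This is a toy sketch with invented recipe mentions, not the thesis's approach, which scores mention pairs with learned neural representations and also handles bridging:

```python
def exact_match_coref(mentions):
    """Cluster mentions whose head word (here naively the last token,
    lowercased) matches. Returns clusters with more than one mention."""
    clusters = {}
    for i, m in enumerate(mentions):
        head = m.split()[-1].lower()
        clusters.setdefault(head, []).append(i)
    return [ids for ids in clusters.values() if len(ids) > 1]

# Hypothetical mentions drawn from a sequence of recipe steps.
mentions = ["the dough", "a baking tray", "the risen dough", "the tray"]
print(exact_match_coref(mentions))  # → [[0, 2], [1, 3]]
```

Such surface matching fails exactly where procedural text is hard, e.g. when "the mixture" refers back to ingredients that were combined in an earlier step, which is why state-aware annotation and learned models are needed.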
  • Item
    Explicit Feature Interaction Modeling for Recommender Systems
    Su, Yixin (2022)
    Recommender systems play a key role in addressing information overload in many Web applications, such as e-commerce, social media platforms, and lifestyle apps. For example, the Amazon shopping website lists items that a user might be interested in at the top of the page. The core function of recommender systems is to predict how likely a user is to select an item (e.g., purchase, click). Typical recommender systems leverage users' historically selected items (i.e., user-item interactions) to infer users' interests. For example, matrix factorization (MF), one of the most common recommendation algorithms, learns user and item representations by factorizing the user-item interaction matrix as the product of a user matrix and an item matrix. Besides user-item interactions, studies show that features (e.g., user/item attributes, contexts), and especially feature interactions (e.g., the co-occurrences of features), are important side information that can significantly enhance recommendation accuracy. Deep neural networks (DNNs) have achieved great success in many recent studies due to their powerful information analysis ability. As a result, recent recommendation models seek to leverage DNNs for better feature interaction modeling. However, they model feature interactions implicitly (i.e., they model all feature interactions together in a black-box DNN without knowing which interactions are modeled or how they are modeled). Recent studies have shown that implicit interaction learning is less effective in extracting useful information about feature interactions for accurate recommendations. In this thesis, we explore how to effectively leverage feature interactions to improve the performance of recommender systems. Specifically, we focus on modeling feature interactions in an explicit manner.
Unlike implicit modeling methods, explicit methods model each feature interaction individually, allowing one to choose which feature interactions to model and to decide how to model each of them for more accurate recommendations. We focus on three challenging research questions in improving recommender systems through explicit feature interaction modeling. The first research question explores how to improve the performance of MF by enabling it to explicitly model feature interactions. MF learns user and item representations from user-item interactions, which has a drawback: it cannot consider feature interactions. This limits MF's potential to perform fine-grained analysis for more accurate predictions. Meanwhile, MF may encounter the cold-start problem (i.e., an item has too few historical user-item interactions for effective analysis). To address this drawback, instead of factorizing the user-item interaction matrix, we propose to factorize multiple user-attribute interaction matrices to learn attribute representations. The final prediction is an aggregation of all the ratings predicted from the user-attribute matrices. Our proposed method achieves higher accuracy than MF-based methods while alleviating the cold-start problem. The second research question explores which feature interactions should be modeled in recommender systems. Existing recommendation algorithms consider all feature interactions when generating predictions. However, not all feature interactions are relevant to the recommendation prediction, and capturing irrelevant feature interactions may introduce noise and decrease prediction accuracy. Therefore, we propose to detect a set of most relevant feature interactions (which we formally define as beneficial feature interactions in terms of prediction accuracy) and to model only the beneficial ones for more accurate recommendations.
We propose novel frameworks that leverage the relational reasoning ability of graph neural networks (GNNs) to achieve more effective explicit feature interaction modeling. Under these frameworks, beneficial feature interaction detection and recommendation prediction are achieved via an edge prediction task and a graph classification task, respectively. The third research question explores how to model different feature interactions in recommender systems. Existing recommendation algorithms model every feature interaction in the same way, neglecting their different impacts on the prediction. We explore how feature interactions can be categorized and modeled to fit their roles in recommendation. More specifically, for user attributes and item attributes (e.g., user gender, item color), we define two types of interactions: inner interactions for profile learning and cross interactions for preference matching. We propose a neural graph matching method, based on our GNN-based interaction modeling framework, to model the two types of interactions so that we can better analyze their impacts on recommendation predictions. For context features (e.g., weather, time), inspired by psychology, we leverage them to learn intrinsic and extrinsic factors that jointly influence users' selections. Contrastive learning and disentanglement learning algorithms are leveraged to learn these factors. In summary, this thesis makes several contributions to explicit feature interaction modeling for improving recommender systems: enabling feature interaction modeling in matrix factorization, detecting beneficial feature interactions, and categorizing and modeling different types of feature interactions to fit their roles in recommendation. All of this work is supported by theoretical analysis and empirical results.
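The matrix factorization baseline this thesis builds upon, approximating the user-item matrix by the product of two low-rank matrices, can be sketched with plain gradient descent. This is a minimal toy with an invented rating matrix, not the thesis's method (which factorizes user-attribute matrices and models feature interactions on top):

```python
import numpy as np

def factorize(R, rank=2, lr=0.01, steps=5000, seed=0):
    """Factor a user-item rating matrix R ≈ U @ V.T by gradient descent
    on squared error over the observed (non-zero) entries only."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = 0.1 * rng.normal(size=(n_users, rank))
    V = 0.1 * rng.normal(size=(n_items, rank))
    mask = R > 0                          # observed entries only
    for _ in range(steps):
        err = mask * (U @ V.T - R)        # residual on observed cells
        U, V = U - lr * err @ V, V - lr * err.T @ U
    return U, V

# Toy user-item rating matrix (0 = unobserved).
R = np.array([[5, 3, 0],
              [4, 0, 1],
              [1, 1, 5]], dtype=float)
U, V = factorize(R)
print(np.round(U @ V.T, 1))  # observed cells are closely reconstructed
```

The zero cells of `R` receive predicted scores as a by-product, which is exactly how MF generates recommendations; the cold-start problem arises when a row or column has too few observed entries for this fit to be meaningful.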
  • Item
    Generating Deep Network Explanations with Robust Attribution Alignment
    Zeng, Guohang (2021)
    Deep Neural Networks (DNNs) have achieved impressive success in many fields, yet their black-box nature hinders their application in critical domains, such as medicine. To this end, Interpretable Machine Learning (IML) is a research field that aims to understand the mechanisms behind DNNs via interpretability methods, which provide explanations to human users and help them understand how black-box models make decisions. Current IML methods produce post-hoc attribution maps on pre-trained models. However, recent studies have shown that most of these methods yield unfaithful and noisy explanations. In this study, we present a new paradigm of interpretability methods to improve the quality of explanations. We treat a model's explanations as part of the network's outputs and generate attribution maps from the underlying deep network. The generated attribution maps are up-sampled from the last convolutional layer of the network to obtain localization information about the target to be explained. Another intuition behind this study is to leverage the connection between interpretability and adversarial machine learning to improve the quality of explanations. Inspired by recent studies showing that adversarially robust models' saliency aligns well with human perception, we use attribution maps from a robust model to supervise the learned attributions. Our proposed method produces visually plausible explanations along with the prediction in the inference phase. Experiments on real datasets show that our proposed method yields more faithful explanations than post-hoc attribution methods at a lower computational cost.
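Up-sampling attributions from the last convolutional layer resembles class-activation mapping. A minimal sketch of that step follows, with invented shapes and random activations standing in for a trained network's feature maps (not the thesis's actual model or training scheme):

```python
import numpy as np

def cam_map(activations, weights, out_size):
    """Class-activation-style attribution: take a weighted sum of the last
    conv layer's channel activations, clip negative evidence, and
    nearest-neighbour upsample the small map to image resolution.

    activations: (C, h, w) feature maps; weights: (C,) per-channel weights.
    """
    cam = np.tensordot(weights, activations, axes=1)   # (h, w) weighted sum
    cam = np.maximum(cam, 0)                           # keep positive evidence
    h, w = cam.shape
    rows = (np.arange(out_size) * h) // out_size       # nearest-neighbour indices
    cols = (np.arange(out_size) * w) // out_size
    return cam[np.ix_(rows, cols)]

rng = np.random.default_rng(1)
acts = rng.random((8, 7, 7))      # hypothetical 7x7 feature maps, 8 channels
w = rng.random(8)                 # hypothetical per-channel weights
heat = cam_map(acts, w, out_size=224)
print(heat.shape)  # → (224, 224)
```

Because the heat map originates from the network's own spatial features, it localizes the target at prediction time without a separate post-hoc attribution pass, which is the source of the computational saving the abstract claims.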