Computing and Information Systems - Theses

Search Results

Now showing 1 - 2 of 2
  • Item
    Robust and Trustworthy Machine Learning
    Huang, Hanxun (2024-01)
    The field of machine learning (ML) has undergone rapid advancements in recent decades. The primary objective of ML models is to extract meaningful patterns from vast amounts of data. One of the most successful classes of models, deep neural networks (DNNs), has been deployed in many real-world applications, such as face recognition, medical image analysis, gaming agents, autonomous driving and chatbots. Current DNNs, however, are vulnerable to adversarial perturbations, where an adversary crafts malicious perturbations to manipulate these models. For example, an adversary can inject backdoor patterns into the training data, allowing them to control the model's predictions with the backdoor pattern (known as a backdoor attack). An adversary can also introduce imperceptible adversarial noise to an image and change the prediction of a trained DNN with high confidence (known as an adversarial attack). These vulnerabilities raise security concerns, particularly when DNNs are deployed in safety-critical applications.
    The current success of DNNs relies on the volume of "free" data on the internet. A recent news article revealed that a company trains large-scale commercial models using personal data obtained from social media, which raises serious privacy concerns. This has led to an open question: can data be made unlearnable for DNNs? Unlike backdoor attacks, unlearnable data do not seek to control the model maliciously but only to prevent the model from learning meaningful patterns from the data.
    Recent advancements in self-supervised learning (SSL) have shown promise in enabling models to learn from data without the need for human supervision. Annotating large-scale datasets can be time-consuming and expensive, making SSL an attractive alternative. However, one challenge with SSL is the potential for dimensional collapse in the learned representations. This occurs when many features are highly correlated, giving rise to an "underfilling" phenomenon whereby the data spans only a lower-dimensional subspace. This can reduce the utility of a representation for downstream learning tasks.
    The first part of this thesis investigates defence strategies against backdoor attacks. Specifically, we develop a robust backdoor data detection method under the data-poisoning threat model. We introduce a novel backdoor sample detection method, Cognitive Distillation (CD), which extracts the minimal essence of features in the input image responsible for the model's prediction. Through an optimization process, unimportant features are removed (a simplified sketch of this mask optimization follows this abstract). For data containing backdoor triggers, only a small set of semantically meaningless features is responsible for the classification, while clean data contains a larger number of useful semantic features. Based on this characteristic, CD provides novel insights into existing attacks and can robustly detect backdoor samples. Additionally, CD reveals the connection between dataset bias and backdoor attacks. Through a case study, we show that CD can not only detect biases that match those reported in existing work but also uncover several potential biases in a real-world dataset.
    The second part of this work examines defences against adversarial attacks. Adversarial training is one of the most effective defences. However, despite the preliminary understanding developed for adversarial training, it is still not clear, from an architectural perspective, which configurations lead to more robust DNNs. This work addresses this gap via a comprehensive investigation of the impact of network width and depth on the robustness of adversarially trained DNNs. The theoretical and empirical analysis provides the following insights: (1) more parameters do not necessarily help adversarial robustness; (2) reducing capacity at the last stage (the last group of blocks) of the network can improve adversarial robustness; and (3) under the same parameter budget, there exists an optimal architectural configuration for adversarial robustness. These architectural insights can help in designing adversarially robust DNNs.
    The third part of this thesis addresses the question of whether data can be made unexploitable for DNNs. This work introduces a novel concept, unlearnable examples, from which DNNs cannot learn useful features. Unlearnable examples are generated through error-minimizing noise, which intentionally reduces the training error of one or more examples to near zero (this error-minimizing optimization is also sketched below); consequently, DNNs believe there is "nothing" worth learning from these examples. The noise is restricted to be imperceptible to the human eye and thus does not affect normal data utility. This work demonstrates the flexibility of the approach under extensive experimental settings and its practicality in a case study on face recognition.
    The fourth part of this thesis studies robust regularization techniques to address dimensional collapse in SSL. Previous work has considered dimensional collapse at a global level. In this thesis, we demonstrate that learned representations can span a high-dimensional space globally yet collapse locally. To address this, we propose a method called local dimensionality regularization (LDReg). Our formulation is based on a derivation of the Fisher-Rao metric to compare and optimize local distance distributions at an asymptotically small radius around each point. By increasing the local intrinsic dimensionality (a basic LID estimator is sketched below), we demonstrate through a range of experiments that LDReg improves the representation quality of SSL. The empirical results also show that LDReg can regularize dimensionality at both local and global levels.
    In summary, this work has contributed significantly toward robust and trustworthy machine learning. It includes the detection of backdoor samples, the development of robust architectures against adversarial examples, the introduction of unlearnable examples, and a robust regularization to prevent dimensional collapse in self-supervised learning.
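    Sketch 1 (illustrative, not the thesis code): the mask-optimization idea behind Cognitive Distillation, as referenced in the first part above. Written as a PyTorch sketch; the function name cognitive_pattern, the MSE fidelity term, the sparsity weight lam, and the single-channel soft mask are assumptions made for illustration.

        import torch
        import torch.nn.functional as F

        def cognitive_pattern(model, images, steps=100, lr=0.1, lam=0.01):
            """Optimize a soft input mask so the masked image alone reproduces the
            model's original logits; a very sparse surviving mask suggests that a
            small, semantically meaningless region (e.g. a trigger) drives the prediction."""
            model.eval()
            target = model(images).detach()                                   # logits on the clean input
            mask = torch.full_like(images[:, :1], 0.5, requires_grad=True)    # one soft mask per image
            optimizer = torch.optim.Adam([mask], lr=lr)
            for _ in range(steps):
                masked = images * mask.clamp(0, 1)                            # keep only masked-in features
                loss = F.mse_loss(model(masked), target) + lam * mask.abs().mean()  # fidelity + sparsity
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            return mask.detach().clamp(0, 1)

    In a detection setting like CD's, the L1 norm of the optimized mask would then serve as the score: inputs carrying backdoor triggers tend to yield much sparser masks than clean inputs.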
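    Sketch 2 (illustrative): the error-minimizing noise behind unlearnable examples, referenced in the third part above. It assumes a PyTorch classifier and an L-infinity budget; the PGD-style update, the default eps/alpha/steps values, and the omission of the alternating noise/model schedule are simplifications.

        import torch
        import torch.nn.functional as F

        def error_minimizing_noise(model, images, labels, eps=8 / 255, alpha=2 / 255, steps=20):
            """Find a small perturbation that drives the training loss toward zero, so the
            perturbed samples appear 'already learned' and contribute nothing to training."""
            delta = torch.zeros_like(images, requires_grad=True)
            for _ in range(steps):
                loss = F.cross_entropy(model(images + delta), labels)
                grad, = torch.autograd.grad(loss, delta)
                with torch.no_grad():
                    delta -= alpha * grad.sign()                   # descend the loss (opposite of an attack)
                    delta.clamp_(-eps, eps)                        # imperceptibility budget
                    delta.add_(images).clamp_(0, 1).sub_(images)   # keep the perturbed image in [0, 1]
            return (images + delta).detach()                       # the 'unlearnable' version of the batch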
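    Sketch 3 (illustrative): a basic maximum-likelihood estimator of local intrinsic dimensionality (LID), the quantity that LDReg, referenced in the fourth part above, seeks to increase. The thesis' formulation compares local distance distributions under the Fisher-Rao metric; the estimator below is the standard MLE variant, included only to make the notion of local dimensionality concrete.

        import torch

        def lid_mle(features: torch.Tensor, k: int = 20) -> torch.Tensor:
            """MLE estimate of local intrinsic dimensionality for each row of `features`,
            using distances to its k nearest neighbours in the batch (requires k < batch size)."""
            dists = torch.cdist(features, features)             # pairwise Euclidean distances
            knn, _ = dists.topk(k + 1, largest=False)           # k nearest neighbours plus self
            knn = knn[:, 1:].clamp_min(1e-12)                   # drop the zero self-distance
            ratios = torch.log(knn / knn[:, -1:])               # log(r_i / r_k), all <= 0
            return -1.0 / ratios.mean(dim=1).clamp_max(-1e-12)  # higher value = locally higher-dimensional

    A regularizer in the spirit of LDReg could then subtract a multiple of lid_mle(z).log().mean() from the SSL objective to push representations toward higher local dimensionality; the exact averaging and weighting used in the thesis differ.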
  • Item
    A Novel Perspective on Robustness in Deep Learning
    Mohaghegh Dolatabadi, Hadi (2022)
    Nowadays, machine learning plays a crucial role in our path toward automated decision-making. Traditional machine learning algorithms require careful, often manual, feature engineering to deliver satisfactory results. Deep Neural Networks (DNNs) have shown great promise in automating this process. Today, DNNs are the primary candidate for various applications, from object detection to high-dimensional density estimation and beyond. Despite their impressive performance, DNNs are vulnerable to different security threats. For instance, in adversarial attacks, an adversary can alter the output of a DNN to its benefit by adding carefully crafted yet imperceptible distortions to clean samples. As another example, in backdoor (Trojan) attacks, an adversary intentionally plants a loophole in the DNN during the learning process. This is often done by attaching specific triggers to benign samples during training such that the model creates an association between the trigger and a particular intended output. Once such a loophole is planted, the attacker can activate the backdoor with the learned triggers and bypass the model's intended behaviour. These examples demonstrate the fragility of DNNs in their decision-making, which calls into question their widespread use in safety-critical applications such as autonomous driving.
    This thesis studies these vulnerabilities in DNNs from novel perspectives. To this end, we identify two key challenges in previous studies of neural network robustness. First, while a plethora of existing algorithms can robustify DNNs against attackers to some extent, these methods often lack the efficiency required for use in real-world applications. Second, the true nature of these adversaries has been less studied, leading to unrealistic assumptions about their behavior. This is particularly crucial, as building defense mechanisms on such assumptions would fail to address the underlying threats and would create a false sense of security around DNNs.
    This thesis studies the first challenge in the context of robust DNN training. In particular, we leverage the theory of coreset selection to form informative weighted subsets of the data. We use this framework in two different settings. First, we develop an online algorithm for filtering poisoned data to prevent backdoor attacks. Specifically, we identify two critical properties of poisoned samples based on their gradient-space and geometric representations, and define an appropriate selection objective based on these criteria to select clean samples. Second, we extend the idea of coreset selection to adversarial training of DNNs. Although adversarial training is one of the most effective methods for defending DNNs against adversarial attacks, it requires iteratively generating costly adversarial examples for each training sample. To ease the computational burden of various adversarial training methods in a unified manner, we build a weighted subset of the training data that faithfully approximates the DNN gradient (a simplified gradient-matching sketch follows this abstract). We show how our proposed solution leads to more efficient robust neural network training in both of these scenarios.
    We then turn to the second challenge and question the validity of one of the widely used assumptions about adversarial attacks. More precisely, it is often assumed that adversarial examples stem from an entirely different distribution than clean data. To challenge this assumption, we resort to generative modeling, particularly Normalizing Flows (NF). Using an NF model pre-trained on clean data, we demonstrate how one can create adversarial examples that closely follow the clean data distribution. We then use our approach against state-of-the-art adversarial example detection methods to show that detectors which explicitly assume a distributional difference between adversarial and clean data can suffer greatly. Our study reveals the importance of correct assumptions in treating adversarial threats. Finally, we extend the distribution modeling component of our adversarial attacker to increase its density estimation capabilities.
    In summary, this thesis advances the current state of robustness in deep learning by i) proposing more effective training algorithms against backdoor and adversarial attacks and ii) challenging a prevalent misconception about the distributional properties of adversarial threats. Through these contributions, we aim to help create more robust neural networks, which is crucial before their deployment in real-world applications. Our work is supported by theoretical analysis and experimental investigation, and is based on our publications.
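    Sketch (illustrative, not the thesis algorithm): the gradient-matching coreset idea referenced above. A weighted subset is chosen so that its average gradient approximates the full-set average gradient; expensive steps such as adversarial example generation then run only on the subset. The greedy loop, the uniform reweighting, and the assumed per_sample_grads input (e.g. last-layer gradients) are simplifications for illustration.

        import torch

        def greedy_gradient_coreset(per_sample_grads: torch.Tensor, k: int):
            """per_sample_grads: (n, d) per-example gradient features.
            Greedily picks k examples whose mean best matches the full mean gradient,
            and returns their indices together with simple reweighting weights."""
            n, _ = per_sample_grads.shape
            full_mean = per_sample_grads.mean(dim=0)
            selected, running_sum = [], torch.zeros_like(full_mean)
            for _ in range(k):
                best_idx, best_err = None, float("inf")
                for i in range(n):
                    if i in selected:
                        continue
                    # Mean gradient of the subset if candidate i were added.
                    cand_mean = (running_sum + per_sample_grads[i]) / (len(selected) + 1)
                    err = torch.linalg.vector_norm(cand_mean - full_mean).item()
                    if err < best_err:
                        best_idx, best_err = i, err
                selected.append(best_idx)
                running_sum = running_sum + per_sample_grads[best_idx]
            weights = torch.full((k,), n / k)   # uniform reweighting back to full-set scale
            return selected, weights

        # Example with random features: select 10 out of 1,000 examples.
        grads = torch.randn(1000, 64)
        idx, w = greedy_gradient_coreset(grads, k=10)

    The returned indices and weights would then drive a weighted training loss in which, for adversarial training, the costly inner attack (e.g. PGD) is run only for the selected points.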