Computing and Information Systems - Theses

  • Item
    Generating Deep Network Explanations with Robust Attribution Alignment
    Zeng, Guohang (2021)
    Deep Neural Networks (DNNs) have achieved impressive success in many fields, yet their black-box nature hinders their application in critical domains such as the medical domain. To this end, Interpretable Machine Learning (IML) is a research field that aims to understand the mechanisms behind DNNs through interpretability methods, which provide explanations to human users and help them understand how black-box models make decisions. Current IML methods produce post-hoc attribution maps on pre-trained models. However, recent studies have shown that most of these methods yield unfaithful and noisy explanations. In this study, we present a new paradigm of interpretability methods to improve the quality of explanations. We treat a model’s explanations as part of the network’s outputs and generate attribution maps from the underlying deep network. The generated attribution maps are up-sampled from the last convolutional layer of the network to obtain localization information about the target to be explained. Another intuition behind this study is to leverage the connection between interpretability and adversarial machine learning to improve the quality of explanations. Inspired by recent studies showing that the saliency maps of adversarially robust models align well with human perception, we use attribution maps from a robust model to supervise the learned attributions. Our proposed method produces visually plausible explanations alongside its predictions at inference time. Experiments on real datasets show that it yields more faithful explanations than post-hoc attribution methods at a lower computational cost.
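    Below is a minimal PyTorch sketch of the general idea the abstract describes, not the thesis implementation: a classifier that also emits an attribution map up-sampled from its last convolutional layer and is trained so that the map aligns with a precomputed attribution from an adversarially robust model. The network architecture, the attribution_head, the joint_loss weighting, and the stand-in robust-model attribution tensor are all illustrative assumptions.

# Minimal sketch (assumed design, not the thesis code): a network that outputs
# an attribution map alongside its prediction, supervised by a robust model's
# attribution map.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExplainableNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Small convolutional backbone; the last conv layer's features carry
        # the spatial information used to build the attribution map.
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64, num_classes)
        # 1x1 conv collapses the last conv features into a single-channel map.
        self.attribution_head = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, x):
        feats = self.features(x)                          # (B, 64, H/4, W/4)
        logits = self.classifier(feats.mean(dim=(2, 3)))  # global-average-pooled features
        # Up-sample the coarse map from the last conv layer to input resolution.
        attr = F.interpolate(self.attribution_head(feats),
                             size=x.shape[-2:], mode="bilinear",
                             align_corners=False)
        return logits, attr.squeeze(1)                    # (B, C), (B, H, W)


def joint_loss(logits, attr, labels, robust_attr, alpha=1.0):
    """Classification loss plus alignment to a robust model's attribution map."""
    return F.cross_entropy(logits, labels) + alpha * F.mse_loss(attr, robust_attr)


if __name__ == "__main__":
    model = ExplainableNet()
    x = torch.randn(4, 3, 32, 32)           # dummy images
    labels = torch.randint(0, 10, (4,))      # dummy labels
    robust_attr = torch.rand(4, 32, 32)      # stand-in for robust-model attributions
    logits, attr = model(x)
    loss = joint_loss(logits, attr, labels, robust_attr)
    loss.backward()
    print(loss.item())

    At inference, a single forward pass returns both the prediction and its attribution map, which is the property the abstract contrasts with post-hoc attribution methods that require extra computation on a pre-trained model.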