|dc.description.abstract||Interpretability has been recognized as an important property of machine learning models. Lack of interpretability hinders the deployment of many black-box models such as random forests, support vector machines (SVMs) and neural networks. One aspect of interpretability is the ability to provide explanations for the predictions of a model; explanations help users understand the reasoning behind a model and thus give them greater confidence to accept or reject its predictions. Explanations are useful, and sometimes even mandatory, in domains like medical analysis, marketing, and criminal investigations, where decisions based on predictions may have severe consequences.
Traditional classifiers can be categorized into interpretable models (or white-box models) and non-interpretable models (or black-box models). Interpretable models have simple internal structures or parameters that can be easily explained; examples include decision trees, linear models and logistic regression models. Non-interpretable models are complex and difficult to explain; examples include random forests, support vector machines and neural networks. Though white-box models are intrinsically easy to interpret, they usually fail to achieve accuracy comparable to black-box models. To facilitate the successful deployment of machine learning models when both interpretability and accuracy are desired, there are two directions of research: (1) increasing the accuracy of white-box models, and (2) increasing the interpretability of black-box models.
Patterns are conjunctions of feature-value conditions, which are intrinsically easy to comprehend, and they have been shown to have good predictive power as well. The objective of this thesis is to propose methods that utilize patterns to increase the accuracy of white-box models, through interpretable feature engineering and the construction of interpretable models, and to help black-box models provide explanations.
First, we discuss pattern-based interpretable feature engineering. Pattern-based feature engineering extracts patterns from data, selects representative patterns from the extracted candidates, and then projects instances from the original feature space into a new pattern-based feature space with a mapping function. The new pattern-based features can be more discriminative while remaining interpretable.
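The mapping step above can be sketched in a few lines. This is a minimal illustration, not the thesis's actual mapping function: here a pattern is a dictionary of feature-value conditions, and each instance is projected to a binary vector indicating which patterns it satisfies (all names are hypothetical).

```python
def matches(instance, pattern):
    """True if the instance satisfies every feature-value condition in the pattern."""
    return all(instance.get(feat) == val for feat, val in pattern.items())

def to_pattern_features(instance, patterns):
    """Project an instance from the original feature space into the
    pattern-based feature space: one binary indicator per pattern."""
    return [1 if matches(instance, p) else 0 for p in patterns]

# toy patterns and instance (illustrative only)
patterns = [
    {"color": "red", "shape": "round"},  # pattern 1: color=red AND shape=round
    {"size": "small"},                   # pattern 2: size=small
]
x = {"color": "red", "shape": "round", "size": "large"}
print(to_pattern_features(x, patterns))  # [1, 0]
```

Real pattern-based feature engineering would also mine and select the patterns from data; this sketch only shows why the resulting features stay interpretable: each new feature is a readable conjunction of conditions.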
Second, we propose a method to explain any classifier using contrast patterns. Given a model and a query instance to be explained, the proposed method first generates a synthetic neighborhood around the query using random perturbations, then labels the synthetic instances using the model, and finally mines contrast patterns from the synthetic neighborhood and selects the top K patterns as the final explanations. The experiments show that the method achieves high faithfulness, such that the explanations truly reveal how a model ``thinks'', and moreover that it supports scenarios where multiple possible explanations exist.
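The perturb-label-mine loop can be sketched as follows. This is a simplified stand-in, not the thesis's algorithm: it mines only single-condition "patterns" and scores them by the support contrast between the two classes in the synthetic neighborhood (all helper names and parameters are hypothetical).

```python
import random

def explain(model, query, n_samples=500, k=2, noise=0.3, seed=0):
    """Sketch of pattern-based local explanation: perturb the query, label
    the neighbourhood with the black-box model, then rank feature-value
    conditions by their class-frequency contrast."""
    rng = random.Random(seed)
    # 1. synthetic neighbourhood via random perturbation of each feature
    neighbours = [
        {f: (v if rng.random() > noise else rng.choice([0, 1]))
         for f, v in query.items()}
        for _ in range(n_samples)
    ]
    # 2. label the synthetic instances with the model
    target = model(query)
    pos = [x for x in neighbours if model(x) == target]
    neg = [x for x in neighbours if model(x) != target]
    # 3. score each condition by support in pos minus support in neg
    def support(group, cond):
        f, v = cond
        return sum(1 for x in group if x[f] == v) / max(len(group), 1)
    conds = {(f, v) for x in neighbours for f, v in x.items()}
    ranked = sorted(conds, key=lambda c: support(pos, c) - support(neg, c),
                    reverse=True)
    return ranked[:k]  # top-k contrast conditions as the explanation

# toy black box: predicts 1 iff feature "a" equals 1
model = lambda x: int(x["a"] == 1)
print(explain(model, {"a": 1, "b": 0}))
```

On this toy model the condition `("a", 1)` gets the maximal contrast, so the explanation correctly identifies the feature the model actually uses, which is the faithfulness property the abstract describes.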
Third, we analyse why some instances are difficult to explain. We investigate the crucial process of generating synthetic neighbors for local explanation methods, as different synthetic neighbors can result in explanations of different quality, and in many cases random perturbation does not work well. We analyze the relationship between local intrinsic dimensionality (LID) and the quality of explanations, and propose an LID-based method to generate synthetic neighbors that are more effective than those generated by baseline methods.
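For readers unfamiliar with LID: it can be estimated from a point's nearest-neighbor distances alone. A common maximum-likelihood estimator (in the style of Levina and Bickel; this is a generic sketch, not necessarily the estimator used in the thesis) is LID ≈ −k / Σᵢ log(rᵢ / r_k), where r₁ ≤ … ≤ r_k are the distances to the k nearest neighbors:

```python
import math

def lid_mle(distances, k=None):
    """Maximum-likelihood LID estimate from a point's distances to its
    k nearest neighbours (zero distances are dropped)."""
    d = sorted(x for x in distances if x > 0)
    k = k or len(d)
    d = d[:k]
    r_k = d[-1]  # distance to the k-th neighbour
    # log(r_k / r_k) = 0, so the last term contributes nothing
    return -k / sum(math.log(r / r_k) for r in d[:-1])

# neighbours spread evenly along a line look roughly one-dimensional
print(round(lid_mle([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]), 2))  # 1.26
```

Intuitively, a high LID around a query means the data looks locally high-dimensional, which is one plausible reason randomly perturbed neighbors fail to produce good explanations there.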
Fourth, we propose an interpretable model that achieves accuracy comparable to state-of-the-art baselines using patterns. The proposed method is a pattern-based partition-wise linear model that can be trained together with expert explanations. It divides the data space into several partitions, each of which is represented by a pattern and associated with a local linear model. The overall prediction for a query is a linear combination of the local linear models in the activated partitions. The model is interpretable and is able to work with expert explanations, as a loss term over explanations is part of the overall loss function. The results show that the proposed method makes reliable predictions and achieves competitive accuracy compared with the baseline methods.
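The prediction step of such a model can be sketched as follows. This is an illustrative simplification (here the activated local models are simply averaged; the actual combination weights would be learned), with hypothetical data structures: each partition is a `(pattern, weights, bias)` triple.

```python
def predict(x, partitions):
    """Partition-wise linear prediction sketch: evaluate every local
    linear model whose pattern the query activates, then combine them."""
    total, active = 0.0, 0
    for pattern, w, b in partitions:
        if all(x[f] == v for f, v in pattern.items()):  # is the partition activated?
            total += sum(wi * x[fi] for fi, wi in w.items()) + b
            active += 1
    return total / max(active, 1)  # average of the activated local models

# two toy partitions, each with its own local linear model (illustrative)
partitions = [
    ({"region": 0}, {"x1": 2.0}, 1.0),   # in region 0: y = 2*x1 + 1
    ({"region": 1}, {"x1": -1.0}, 0.5),  # in region 1: y = -x1 + 0.5
]
print(predict({"region": 0, "x1": 3.0}, partitions))  # 7.0
```

The interpretability claim is visible in the structure: to explain a prediction, one reads off which patterns fired and the coefficients of the corresponding local linear models.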
Finally, we show how to construct a model that makes both accurate and reliable predictions by jointly learning explanations and class labels using multi-task learning in neural networks. We propose a neural network structure that jointly learns the class labels and the explanations, where the explanations are treated as additional label information. We fit a neural network in the framework of multi-task learning: the network starts with a set of shared layers and then splits into two separate branches, one for the class label and the other for the explanations. The experiments suggest that the proposed method makes reliable predictions.
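The shared-then-split structure and the joint loss can be sketched with a minimal forward pass. This is a toy illustration under assumed shapes (one shared ReLU layer, a scalar label head, a vector explanation head), not the architecture or loss weighting used in the thesis:

```python
def forward(x, shared_w, label_w, expl_w):
    """Two-head network sketch: a shared layer feeds both a class-label
    head and an explanation head."""
    # shared ReLU layer
    h = [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in shared_w]
    y_label = sum(wi * hi for wi, hi in zip(label_w, h))                 # label head
    y_expl = [sum(wi * hi for wi, hi in zip(row, h)) for row in expl_w]  # explanation head
    return y_label, y_expl

def joint_loss(y_label, t_label, y_expl, t_expl, alpha=0.5):
    """Multi-task objective: label loss plus an alpha-weighted explanation
    loss, so both labels and explanations drive the shared layers."""
    label_loss = (y_label - t_label) ** 2
    expl_loss = sum((p - t) ** 2 for p, t in zip(y_expl, t_expl))
    return label_loss + alpha * expl_loss

y, e = forward([1.0, 2.0], [[1, 0], [0, 1]], [1.0, 1.0], [[1, 0], [0, 1]])
print(joint_loss(y, 3.0, e, [1.0, 1.0]))  # 0.5
```

Because both heads backpropagate through the same shared layers during training, the explanation targets act as extra supervision for the label task, which is the mechanism behind the reliability claim above.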
In summary, this work recognizes the importance of the interpretability of machine learning models, and it utilizes patterns to improve interpretability through: interpretable feature generation, pattern-based model-agnostic local explanation extraction, pattern-based partition-wise linear models, and a joint learning framework over explanations and class labels. We also investigate, using local intrinsic dimensionality, why a particular instance is difficult to explain. All work is supported by theoretical analysis and empirical evaluations.||