Computing and Information Systems - Research Publications

Permanent URI for this collection

http://hdl.handle.net/11343/350

Exploiting patterns to explain individual predictions

Jia, Y ; Bailey, J ; Ramamohanarao, K ; Leckie, C ; Ma, X (Springer London, 2020-03)

Users need to understand the predictions of a classifier, especially when decisions based on the predictions can have severe consequences. The explanation of a prediction reveals the reason why a classifier makes a certain prediction, and it helps users to accept or reject the prediction with greater confidence. This paper proposes an explanation method called Pattern Aided Local Explanation (PALEX) to provide instance-level explanations for any classifier. PALEX takes a classifier, a test instance and a frequent pattern set summarizing the training data of the classifier as inputs, and then outputs the supporting evidence that the classifier considers important for the prediction of the instance. To study the local behavior of a classifier in the vicinity of the test instance, PALEX uses the frequent pattern set from the training data as an extra input to guide generation of new synthetic samples in the vicinity of the test instance. Contrast patterns are also used in PALEX to identify locally discriminative features in the vicinity of a test instance. PALEX is particularly effective for scenarios where there exist multiple explanations. In our experiments, we compare PALEX to several state-of-the-art explanation methods over a range of benchmark datasets and find that it can identify explanations with both high precision and high recall.
Improving the quality of explanations with local embedding perturbations

Jia, Y ; Bailey, J ; Ramamohanarao, K ; Leckie, C ; Houle, ME (ACM, 2019-07-25)

Classifier explanations have been identified as a crucial component of knowledge discovery. Local explanations evaluate the behavior of a classifier in the vicinity of a given instance. A key step in this approach is to generate synthetic neighbors of the given instance. This neighbor generation process is challenging, and it has considerable impact on the quality of explanations. To assess the quality of generated neighborhoods, we propose a locality constraint based on local intrinsic dimensionality (LID). Based on this, we then propose a new neighborhood generation method. Our method first fits a local embedding/subspace around a given instance using the LID of the test instance as the target dimensionality, then generates neighbors in the local embedding and projects them back to the original space. Experimental results show that our method generates more realistic neighborhoods and consequently better explanations. It can be used in combination with existing local explanation algorithms.
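As an illustration of the quantity this abstract relies on, the sketch below estimates the LID of a query point from its nearest-neighbor distances. It uses a widely used maximum-likelihood estimator; the abstract does not say which estimator the authors use, so this choice, the function name `lid_mle`, and the parameter `k` are assumptions made for the example only.

```python
import numpy as np

def lid_mle(query, data, k=20):
    """Maximum-likelihood LID estimate at `query` from its k nearest neighbors.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    # Euclidean distances from the query to every data point.
    dists = np.linalg.norm(data - query, axis=1)
    # Keep the k smallest non-zero distances (drops exact duplicates of the query).
    dists = np.sort(dists[dists > 0])[:k]
    r_max = dists[-1]
    # Estimator: -(mean of log(r_i / r_k))^-1 over the k neighbors.
    return -1.0 / np.mean(np.log(dists / r_max))

# Synthetic check: points on a 2-D plane embedded in 10-D space.
rng = np.random.default_rng(0)
data = np.zeros((5000, 10))
data[:, :2] = rng.uniform(-1.0, 1.0, size=(5000, 2))
query = np.zeros(10)

estimate = lid_mle(query, data, k=200)
print(f"estimated LID: {estimate:.2f}")  # expected to be close to 2
```

On this synthetic data the estimate should land near 2, the dimension of the plane, even though the ambient space has 10 dimensions. A value like this would then serve as the target dimensionality of the local embedding.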
Discovering outlying aspects in large datasets

Nguyen, XV ; Chan, J ; Romano, S ; Bailey, J ; Leckie, C ; Ramamohanarao, K ; Pei, J (SPRINGER, 2016-11)
Training Robust Models with Random Projection

Nguyen, XV ; Monazam Erfani, S ; Paisitkriangkrai, S ; Bailey, J ; Leckie, C ; Ramamohanarao, K (IEEE, 2016)

Regularization plays an important role in machine learning systems. We propose a novel methodology for model regularization using random projection. We demonstrate the technique on neural networks, since such models usually comprise a very large number of parameters, calling for strong regularizers. It has been shown recently that neural networks are sensitive to two kinds of samples: (i) adversarial samples, which are generated by imperceptible perturbations of previously correctly classified samples, yet which the network misclassifies; and (ii) fooling samples, which are completely unrecognizable, yet which the network classifies with extremely high confidence. In this paper, we show how robust neural networks can be trained using random projection. We show that random projection acts as a strong regularizer, boosting model accuracy much as other regularizers such as weight decay and dropout do, while being far more robust to adversarial noise and fooling samples. We further show that random projection also helps to improve the robustness of traditional classifiers, such as Random Forest and Gradient Boosting Machines.
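For readers unfamiliar with the basic operation, the sketch below shows a plain Gaussian random projection in NumPy. It illustrates only the projection step and how well it preserves distances; it is not the training procedure from the paper, and the function name `gaussian_random_projection` and all dimensions are assumptions chosen for the example.

```python
import numpy as np

def gaussian_random_projection(X, out_dim, seed=0):
    """Project the rows of X to out_dim dimensions with a random Gaussian matrix.

    Hypothetical helper for illustration; not the authors' method.
    """
    rng = np.random.default_rng(seed)
    in_dim = X.shape[1]
    # Entries drawn from N(0, 1/out_dim) so squared norms are preserved in expectation.
    R = rng.normal(0.0, 1.0 / np.sqrt(out_dim), size=(in_dim, out_dim))
    return X @ R

# 50 points in 1000 dimensions, projected down to 200.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 1000))
Z = gaussian_random_projection(X, out_dim=200)

# Ratio of one pairwise distance after and before projection.
ratio = np.linalg.norm(Z[0] - Z[1]) / np.linalg.norm(X[0] - X[1])
print(Z.shape, f"distance ratio: {ratio:.2f}")  # ratio is expected to be near 1
```

The projection matrix is fixed once and does not depend on the data. A classifier could then be trained on `Z` rather than `X`; how the paper integrates the projection into network training is not specified in the abstract.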
A fast indexing approach for protein structure comparison

Zhang, L ; Bailey, J ; Konagurthu, AS ; Ramamohanarao, K (BMC, 2010)

BACKGROUND: Protein structure comparison is a fundamental task in structural biology. While the number of known protein structures has grown rapidly over the last decade, searching a large database of protein structures is still relatively slow using existing methods. There is a need for new techniques which can rapidly compare protein structures, whilst maintaining high matching accuracy. RESULTS: We have developed IR Tableau, a fast protein comparison algorithm, which leverages the tableau representation to compare protein tertiary structures. IR Tableau compares tableaux using information retrieval style feature indexing techniques. Experimental analysis on the ASTRAL SCOP protein structural domain database demonstrates that IR Tableau achieves two orders of magnitude speedup over the search times of existing methods, while producing search results of comparable accuracy. CONCLUSION: We show that it is possible to obtain very significant speedups for the protein structure comparison problem by employing an information retrieval style approach for indexing proteins. The comparison accuracy achieved is also strong, thus opening the way for large scale processing of very large protein structure databases.
A voting approach to identify a small number of highly predictive genes using multiple classifiers

Hassan, MR ; Hossain, MM ; Bailey, J ; Macintyre, G ; Ho, JWK ; Ramamohanarao, K (BMC, 2009-01-30)

BACKGROUND: Microarray gene expression profiling has provided extensive datasets that can describe characteristics of cancer patients. An important challenge for this type of data is the discovery of gene sets which can be used as the basis of developing a clinical predictor for cancer. It is desirable that such gene sets be compact, give accurate predictions across many classifiers, be biologically relevant and have good biological process coverage. RESULTS: By using a new type of multiple classifier voting approach, we have identified gene sets that can predict breast cancer prognosis accurately, for a range of classification algorithms. Unlike a wrapper approach, our method is not specialised towards a single classification technique. Experimental analysis demonstrates higher prediction accuracies for our sets of genes compared to previous work in the area. Moreover, our sets of genes are generally more compact than those previously proposed. From a biological viewpoint, most of the genes in our sets are known from the literature to be strongly related to cancer. CONCLUSION: We show that it is possible to obtain superior classification accuracy with our approach and obtain a compact gene set that is also biologically relevant and has good coverage of different biological processes.
Semantic-compensation-based recovery in multi-agent systems

Unruh, A ; Harjadi, H ; Bailey, J ; Ramamohanarao, K (IEEE, 2005)
Mining Distribution Change in Stock Order Streams

Liu, X ; Wu, X ; Wang, H ; Zhang, R ; Bailey, J ; Ramamohanarao, K ; Li, F (IEEE COMPUTER SOC, 2010)
Feature weighted SVMs using receiver operating characteristics

Zhang, S ; Hossain, MM ; Hassan, MR ; Bailey, J ; Ramamohanarao, K (SIAM Publications, 2009-12-01)
Improving k-Nearest Neighbour Classification with Distance Functions Based on Receiver Operating Characteristics

Hassan, MR ; Hossain, MM ; Bailey, J ; Ramamohanarao, K ; Daelemans, W ; Goethals, B ; Morik, K (SPRINGER-VERLAG BERLIN, 2008)
