Computing and Information Systems - Research Publications

Search Results

  • Item
    Exploiting patterns to explain individual predictions
    Jia, Y ; Bailey, J ; Ramamohanarao, K ; Leckie, C ; Ma, X (Springer London, 2020-03)
    Users need to understand the predictions of a classifier, especially when decisions based on the predictions can have severe consequences. The explanation of a prediction reveals the reason why a classifier makes a certain prediction, and it helps users to accept or reject the prediction with greater confidence. This paper proposes an explanation method called Pattern Aided Local Explanation (PALEX) to provide instance-level explanations for any classifier. PALEX takes a classifier, a test instance and a frequent pattern set summarizing the training data of the classifier as inputs, and then outputs the supporting evidence that the classifier considers important for the prediction of the instance. To study the local behavior of a classifier in the vicinity of the test instance, PALEX uses the frequent pattern set from the training data as an extra input to guide generation of new synthetic samples in the vicinity of the test instance. Contrast patterns are also used in PALEX to identify locally discriminative features in the vicinity of a test instance. PALEX is particularly effective for scenarios where there exist multiple explanations. In our experiments, we compare PALEX to several state-of-the-art explanation methods over a range of benchmark datasets and find that it can identify explanations with both high precision and high recall.
  • Item
    Improving the quality of explanations with local embedding perturbations
    Jia, Y ; Bailey, J ; Ramamohanarao, K ; Leckie, C ; Houle, ME (ACM, 2019-07-25)
    Classifier explanations have been identified as a crucial component of knowledge discovery. Local explanations evaluate the behavior of a classifier in the vicinity of a given instance. A key step in this approach is to generate synthetic neighbors of the given instance. This neighbor generation process is challenging and has a considerable impact on the quality of explanations. To assess the quality of generated neighborhoods, we propose a local intrinsic dimensionality (LID) based locality constraint. Based on this, we then propose a new neighborhood generation method. Our method first fits a local embedding/subspace around a given instance using the LID of the test instance as the target dimensionality, then generates neighbors in the local embedding and projects them back to the original space. Experimental results show that our method generates more realistic neighborhoods and consequently better explanations. It can be used in combination with existing local explanation algorithms.
  • Item
    Discovering outlying aspects in large datasets
    Nguyen, XV ; Chan, J ; Romano, S ; Bailey, J ; Leckie, C ; Ramamohanarao, K ; Pei, J (SPRINGER, 2016-11)
  • Item
    Training Robust Models with Random Projection
    Nguyen, XV ; Monazam Erfani, S ; Paisitkriangkrai, S ; Bailey, J ; Leckie, C ; Ramamohanarao, K (IEEE, 2016)
    Regularization plays an important role in machine learning systems. We propose a novel methodology for model regularization using random projection. We demonstrate the technique on neural networks, since such models usually comprise a very large number of parameters, calling for strong regularizers. It has been shown recently that neural networks are sensitive to two kinds of samples: (i) adversarial samples, which are generated by imperceptible perturbations of previously correctly-classified samples - yet the network will misclassify them; and (ii) fooling samples, which are completely unrecognizable, yet the network will classify them with extremely high confidence. In this paper, we show how robust neural networks can be trained using random projection. We show that while random projection acts as a strong regularizer, boosting model accuracy similar to other regularizers, such as weight decay and dropout, it is far more robust to adversarial noise and fooling samples. We further show that random projection also helps to improve the robustness of traditional classifiers, such as Random Forest and Gradient Boosting Machines.
  • Item
    A fast indexing approach for protein structure comparison
    Zhang, L ; Bailey, J ; Konagurthu, AS ; Ramamohanarao, K (BMC, 2010)
    BACKGROUND: Protein structure comparison is a fundamental task in structural biology. While the number of known protein structures has grown rapidly over the last decade, searching a large database of protein structures is still relatively slow using existing methods. There is a need for new techniques which can rapidly compare protein structures, whilst maintaining high matching accuracy. RESULTS: We have developed IR Tableau, a fast protein comparison algorithm, which leverages the tableau representation to compare protein tertiary structures. IR Tableau compares tableaux using information retrieval style feature indexing techniques. Experimental analysis on the ASTRAL SCOP protein structural domain database demonstrates that IR Tableau achieves two orders of magnitude speedup over the search times of existing methods, while producing search results of comparable accuracy. CONCLUSION: We show that it is possible to obtain very significant speedups for the protein structure comparison problem, by employing an information retrieval style approach for indexing proteins. The comparison accuracy achieved is also strong, thus opening the way for large scale processing of very large protein structure databases.
  • Item
    A voting approach to identify a small number of highly predictive genes using multiple classifiers
    Hassan, MR ; Hossain, MM ; Bailey, J ; Macintyre, G ; Ho, JWK ; Ramamohanarao, K (BMC, 2009-01-30)
    BACKGROUND: Microarray gene expression profiling has provided extensive datasets that can describe characteristics of cancer patients. An important challenge for this type of data is the discovery of gene sets which can be used as the basis of developing a clinical predictor for cancer. It is desirable that such gene sets be compact, give accurate predictions across many classifiers, be biologically relevant and have good biological process coverage. RESULTS: By using a new type of multiple classifier voting approach, we have identified gene sets that can predict breast cancer prognosis accurately, for a range of classification algorithms. Unlike a wrapper approach, our method is not specialised towards a single classification technique. Experimental analysis demonstrates higher prediction accuracies for our sets of genes compared to previous work in the area. Moreover, our sets of genes are generally more compact than those previously proposed. Taking a biological viewpoint, from the literature, most of the genes in our sets are known to be strongly related to cancer. CONCLUSION: We show that it is possible to obtain superior classification accuracy with our approach and obtain a compact gene set that is also biologically relevant and has good coverage of different biological processes.
  • Item
    Semantic-compensation-based recovery in multi-agent systems
    Unruh, A ; Harjadi, H ; Bailey, J ; Ramamohanarao, K (IEEE, 2005)
  • Item
    Mining Distribution Change in Stock Order Streams
    Liu, X ; Wu, X ; Wang, H ; Zhang, R ; Bailey, J ; Ramamohanarao, K ; Li, F (IEEE COMPUTER SOC, 2010)
  • Item
    Feature weighted SVMs using receiver operating characteristics
    Zhang, S ; Hossain, MM ; Hassan, MR ; Bailey, J ; Ramamohanarao, K (SIAM Publications, 2009-12-01)
  • Item
    Improving k-Nearest Neighbour Classification with Distance Functions Based on Receiver Operating Characteristics
    Hassan, MR ; Hossain, MM ; Bailey, J ; Ramamohanarao, K ; Daelemans, W ; Goethals, B ; Morik, K (SPRINGER-VERLAG BERLIN, 2008)