Computing and Information Systems - Research Publications

  • Item
    Exploiting patterns to explain individual predictions
    Jia, Y ; Bailey, J ; Ramamohanarao, K ; Leckie, C ; Ma, X (Springer London, 2020-03)
    Users need to understand the predictions of a classifier, especially when decisions based on the predictions can have severe consequences. The explanation of a prediction reveals the reason why a classifier makes a certain prediction, and it helps users to accept or reject the prediction with greater confidence. This paper proposes an explanation method called Pattern Aided Local Explanation (PALEX) to provide instance-level explanations for any classifier. PALEX takes a classifier, a test instance and a frequent pattern set summarizing the training data of the classifier as inputs, and then outputs the supporting evidence that the classifier considers important for the prediction of the instance. To study the local behavior of a classifier in the vicinity of the test instance, PALEX uses the frequent pattern set from the training data as an extra input to guide generation of new synthetic samples in the vicinity of the test instance. Contrast patterns are also used in PALEX to identify locally discriminative features in the vicinity of a test instance. PALEX is particularly effective for scenarios where there exist multiple explanations. In our experiments, we compare PALEX to several state-of-the-art explanation methods over a range of benchmark datasets and find that it can identify explanations with both high precision and high recall.
  • Item
    On the effectiveness of isolation-based anomaly detection in cloud data centers
    Calheiros, RN ; Ramamohanarao, K ; Buyya, R ; Leckie, C ; Versteeg, S (WILEY, 2017-09-25)
    The high volume of monitoring information generated by large‐scale cloud infrastructures challenges the capacity of cloud providers to detect anomalies in the infrastructure. Traditional anomaly detection methods are resource‐intensive and computationally complex for training and/or detection, which is undesirable in very dynamic and large‐scale environments such as clouds. Isolation‐based methods have the advantage of low complexity for training and detection and are optimized for detecting failures. In this work, we explore the feasibility of Isolation Forest, an isolation‐based anomaly detection method, to detect anomalies in large‐scale cloud data centers. We propose a method to encode time‐series information as extra attributes that enable temporal anomaly detection, and show that it can feasibly adapt to seasonality and trends in the time series and be applied online and in real time.
  • Item
    Exponentially Weighted Ellipsoidal Model for Anomaly Detection
    Moshtaghi, M ; Erfani, SM ; Leckie, C ; Bezdek, JC (WILEY, 2017-09)
  • Item
    Comparative evaluation of performance measures for shading correction in time-lapse fluorescence microscopy
    Liu, L ; Kan, A ; Leckie, C ; Hodgkin, PD (WILEY, 2017-04)
    Time-lapse fluorescence microscopy is a valuable technology in cell biology, but it suffers from the inherent problem of intensity inhomogeneity due to uneven illumination or camera nonlinearity, known as shading artefacts. These artefacts lead to inaccurate estimates of single-cell features such as average and total intensity. Numerous shading correction methods have been proposed to remove this effect, and many quantitative performance measures have been developed to compare them. However, there is little discussion about which performance measure should generally be applied for evaluation on real data, where the ground truth is absent. In this paper, the state-of-the-art shading correction methods and performance evaluation methods are reviewed. We implement 10 popular shading correction methods on two artificial datasets and four real ones. To make an objective comparison between those methods, we employ a number of quantitative performance measures. Extensive validation demonstrates that the coefficient of joint variation (CJV) is the most applicable measure in time-lapse fluorescence images. Based on this measure, we have proposed a novel shading correction method that outperforms well-established methods on the range of real data tested.
  • Item
    Online cluster validity indices for performance monitoring of streaming data clustering
    Moshtaghi, M ; Bezdek, JC ; Erfani, SM ; Leckie, C ; Bailey, J (WILEY-HINDAWI, 2019-04)
  • Item
    A time decoupling approach for studying forum dynamics
    Kan, A ; Chan, J ; Hayes, C ; Hogan, B ; Bailey, J ; Leckie, C (SPRINGER, 2013-11)
  • Item
    A visual-numeric approach to clustering and anomaly detection for trajectory data
    Kumar, D ; Bezdek, JC ; Rajasegarar, S ; Leckie, C ; Palaniswami, M (SPRINGER, 2017-03)
  • Item
    Discovering outlying aspects in large datasets
    Nguyen, XV ; Chan, J ; Romano, S ; Bailey, J ; Leckie, C ; Ramamohanarao, K ; Pei, J (SPRINGER, 2016-11)
  • Item
    Meta-analysis of gene expression microarrays with missing replicates
    Shi, F ; Abraham, G ; Leckie, C ; Haviv, I ; Kowalczyk, A (BMC, 2011-03-24)
    BACKGROUND: Many different microarray experiments are publicly available today. It is natural to ask whether different experiments for the same phenotypic conditions can be combined using meta-analysis, in order to increase the overall sample size. However, some genes are not measured in all experiments, hence they cannot be included or their statistical significance cannot be appropriately estimated in traditional meta-analysis. Nonetheless, these genes, which we refer to as incomplete genes, may also be informative and useful. RESULTS: We propose a meta-analysis framework, called "Incomplete Gene Meta-analysis", which can include incomplete genes by imputing the significance of missing replicates, and computing a meta-score for every gene across all datasets. We demonstrate that the incomplete genes are worthy of being included and our method is able to appropriately estimate their significance in two groups of experiments. We first apply the Incomplete Gene Meta-analysis and several comparable methods to five breast cancer datasets with an identical set of probes. We simulate incomplete genes by randomly removing a subset of probes from each dataset and demonstrate that our method consistently outperforms two other methods in terms of their false discovery rate. We also apply the methods to three gastric cancer datasets for the purpose of discriminating diffuse and intestinal subtypes. CONCLUSIONS: Meta-analysis is an effective approach that identifies more robust sets of differentially expressed genes from multiple studies. The incomplete genes that mainly arise from the use of different platforms may also have statistical and biological importance but are ignored or are not appropriately involved by previous studies. Our Incomplete Gene Meta-analysis is able to incorporate the incomplete genes by estimating their significance. The results on both breast and gastric cancer datasets suggest that the highly ranked genes and associated GO terms produced by our method are more significant and biologically meaningful according to the previous literature.