Computing and Information Systems - Research Publications

Search Results

Now showing 1 - 10 of 43
  • Item
    Summarizing Significant Changes in Network Traffic Using Contrast Pattern Mining
    Chavary, EA ; Erfani, SM ; Leckie, C (Association for Computing Machinery, 2017)
    Extracting knowledge from the massive volumes of network traffic is an important challenge in network and security management. In particular, network managers require concise reports about significant changes in their network traffic. While most existing techniques focus on summarizing a single traffic dataset, the problem of finding significant differences between multiple datasets is an open challenge. In this paper, we focus on finding important differences between network traffic datasets, and preparing a summarized and interpretable report for security managers. We propose the use of contrast pattern mining, which finds patterns whose support differs significantly from one dataset to another. We show that contrast patterns are highly effective at extracting meaningful changes in traffic data. We also propose several evaluation metrics that reflect the interpretability of patterns for security managers. Our experimental results show that with the proposed unsupervised approach, the vast majority of extracted patterns are pure, i.e., most changes are either attack traffic or normal traffic, but not a mixture of both.
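    As a rough illustration of the core idea in this abstract, the sketch below enumerates short patterns over itemized traffic records and keeps those whose support grows sharply from a baseline dataset to a target dataset. The transaction encoding, thresholds, and brute-force enumeration are illustrative assumptions rather than the paper's implementation.
    ```python
    # Illustrative sketch (not the paper's code): brute-force contrast pattern
    # mining over two sets of itemized traffic records.
    from itertools import combinations

    def support(pattern, transactions):
        """Fraction of transactions that contain every item in the pattern."""
        pattern = set(pattern)
        return sum(pattern <= t for t in transactions) / len(transactions)

    def contrast_patterns(baseline, target, min_support=0.05, min_growth=5.0, max_len=3):
        """Return patterns whose support grows by at least `min_growth` from `baseline` to `target`."""
        items = sorted({item for t in target for item in t})
        results = []
        for k in range(1, max_len + 1):
            for pattern in combinations(items, k):
                s_target = support(pattern, target)
                if s_target < min_support:
                    continue
                s_base = support(pattern, baseline)
                growth = s_target / s_base if s_base > 0 else float("inf")
                if growth >= min_growth:
                    results.append((pattern, s_base, s_target, growth))
        return sorted(results, key=lambda r: -r[3])

    # Each transaction is a set of discretized flow attributes (hypothetical values).
    baseline = [{"proto=tcp", "port=80", "pkts=low"}, {"proto=udp", "port=53", "pkts=low"}]
    target = [{"proto=tcp", "port=23", "pkts=high"}, {"proto=tcp", "port=23", "pkts=high"}]
    for pattern, s_b, s_t, growth in contrast_patterns(baseline, target, min_support=0.5):
        print(pattern, f"support {s_b:.2f} -> {s_t:.2f} (growth {growth:.1f})")
    ```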
  • Item
    Anomalous Behavior Detection in Crowded Scenes Using Clustering and Spatio-Temporal Features
    Yang, M ; Rajasegarar, S ; Rao, AS ; Leckie, C ; Palaniswami, M ; Shi, Z ; Vadera, S ; Li, G (Springer, 2016)
    Anomalous behavior detection in crowded scenes is an important problem in real-life applications. The detection of anomalous behaviors, such as people standing statically or loitering around a place, is the focus of this paper. To detect anomalous events and objects, ViBe was first used for background modeling and object detection. Then, a Kalman filter and the Hungarian cost algorithm were implemented to track people and generate their trajectories. Next, spatio-temporal features were extracted and represented. Finally, hyperspherical clustering was used for anomaly detection in an unsupervised manner. We investigate three different approaches to extracting and representing spatio-temporal features, and we demonstrate the effectiveness of our proposed feature representation on a standard benchmark dataset and in a real-life video surveillance environment.
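    A minimal sketch of the final unsupervised step described above: cluster per-track spatio-temporal features and flag tracks that fall outside their cluster's hypersphere or belong to a very small cluster. The feature names, cluster count, radius rule, and the use of scikit-learn's KMeans are assumptions for illustration; the paper's hyperspherical clustering differs in detail.
    ```python
    # Illustrative sketch of hypersphere-based anomaly scoring over per-track
    # spatio-temporal features; feature names and parameters are hypothetical.
    import numpy as np
    from sklearn.cluster import KMeans

    def hyperspherical_anomalies(features, n_clusters=5, radius_factor=2.0, min_cluster_frac=0.05):
        """Flag rows that lie outside their cluster's hypersphere or sit in a tiny cluster."""
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(features)
        dists = np.linalg.norm(features - km.cluster_centers_[km.labels_], axis=1)
        # Per-cluster hypersphere radius: mean + radius_factor * std of member distances.
        radii = np.array([
            dists[km.labels_ == c].mean() + radius_factor * dists[km.labels_ == c].std()
            for c in range(n_clusters)
        ])
        sizes = np.bincount(km.labels_, minlength=n_clusters)
        small = sizes < min_cluster_frac * len(features)
        return (dists > radii[km.labels_]) | small[km.labels_]

    # Synthetic per-track features: (mean speed, direction variance, dwell time).
    rng = np.random.default_rng(0)
    walking = rng.normal([1.2, 0.3, 5.0], 0.2, size=(200, 3))
    loitering = rng.normal([0.05, 0.1, 60.0], 0.2, size=(5, 3))   # static/loitering tracks
    flags = hyperspherical_anomalies(np.vstack([walking, loitering]))
    print("anomalous track indices:", np.flatnonzero(flags))
    ```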
  • Item
    On the effectiveness of isolation-based anomaly detection in cloud data centers
    Calheiros, RN ; Ramamohanarao, K ; Buyya, R ; Leckie, C ; Versteeg, S (WILEY, 2017-09-25)
    The high volume of monitoring information generated by large-scale cloud infrastructures poses a challenge to the capacity of cloud providers to detect anomalies in the infrastructure. Traditional anomaly detection methods are resource-intensive and computationally complex for training and/or detection, which is undesirable in very dynamic and large-scale environments such as clouds. Isolation-based methods have the advantage of low complexity for training and detection and are optimized for detecting failures. In this work, we explore the feasibility of Isolation Forest, an isolation-based anomaly detection method, for detecting anomalies in large-scale cloud data centers. We propose a method to encode time-series information as extra attributes that enable temporal anomaly detection, and we establish its feasibility to adapt to seasonality and trends in the time series and to be applied online and in real time.
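    The following sketch illustrates the general approach in a hedged way: it augments host metrics with simple time-series attributes (lags and a rolling mean) and runs scikit-learn's IsolationForest over them. The attribute choice, window size, and synthetic data are assumptions, not the paper's exact encoding.
    ```python
    # Illustrative sketch: Isolation Forest over host metrics augmented with
    # lagged and rolling-window attributes to give the detector temporal context.
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import IsolationForest

    def add_temporal_attributes(df, cols, lags=(1, 2), window=12):
        """Append lagged values and a rolling mean for each metric column."""
        out = df.copy()
        for c in cols:
            for lag in lags:
                out[f"{c}_lag{lag}"] = out[c].shift(lag)
            out[f"{c}_roll{window}"] = out[c].rolling(window).mean()
        return out.dropna()

    # Synthetic per-minute utilisation for one host, with a daily cycle and an injected spike.
    rng = np.random.default_rng(1)
    n = 24 * 60
    df = pd.DataFrame({
        "cpu": 40 + 10 * np.sin(np.arange(n) * 2 * np.pi / n) + rng.normal(0, 2, n),
        "mem": 60 + rng.normal(0, 1.5, n),
    })
    df.loc[1000:1010, "cpu"] += 45   # injected anomaly

    features = add_temporal_attributes(df, ["cpu", "mem"])
    model = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
    labels = model.fit_predict(features)          # -1 marks isolated (anomalous) samples
    print("flagged minutes:", features.index[labels == -1][:10].tolist())
    ```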
  • Item
    Exponentially Weighted Ellipsoidal Model for Anomaly Detection
    Moshtaghi, M ; Erfani, SM ; Leckie, C ; Bezdek, JC (WILEY, 2017-09)
  • Item
    Comparative evaluation of performance measures for shading correction in time-lapse fluorescence microscopy
    Liu, L ; Kan, A ; Leckie, C ; Hodgkin, PD (WILEY, 2017-04)
    Time-lapse fluorescence microscopy is a valuable technology in cell biology, but it suffers from the inherent problem of intensity inhomogeneity due to uneven illumination or camera nonlinearity, known as shading artefacts. This leads to inaccurate estimates of single-cell features such as average and total intensity. Numerous shading correction methods have been proposed to remove this effect. In order to compare the performance of different methods, many quantitative performance measures have been developed. However, there is little discussion about which performance measure should be generally applied for evaluation on real data, where the ground truth is absent. In this paper, state-of-the-art shading correction methods and performance evaluation methods are reviewed. We implement 10 popular shading correction methods on two artificial datasets and four real ones. In order to make an objective comparison between those methods, we employ a number of quantitative performance measures. Extensive validation demonstrates that the coefficient of joint variation (CJV) is the most applicable measure for time-lapse fluorescence images. Based on this measure, we propose a novel shading correction method that outperforms well-established methods on the range of real data tested.
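    For readers unfamiliar with the measure, the coefficient of joint variation between foreground and background pixels is CJV = (sigma_fg + sigma_bg) / |mu_fg - mu_bg|, with lower values indicating less residual shading. The sketch below computes it on a synthetic image before and after an idealized correction; the split into foreground and background and the synthetic shading model are illustrative assumptions.
    ```python
    # Illustrative sketch of the coefficient of joint variation (CJV) as a
    # reference-free shading-quality measure; the synthetic image and the
    # foreground mask are hypothetical.
    import numpy as np

    def cjv(image, foreground_mask):
        """CJV = (std_fg + std_bg) / |mean_fg - mean_bg|; lower means less shading."""
        fg = image[foreground_mask]
        bg = image[~foreground_mask]
        return (fg.std() + bg.std()) / abs(fg.mean() - bg.mean())

    rng = np.random.default_rng(2)
    h, w = 256, 256
    shading = 1.0 + 0.4 * np.tile(np.arange(w) / w, (h, 1))   # smooth illumination gradient
    cells = rng.random((h, w)) < 0.05                         # sparse "cell" foreground
    truth = np.where(cells, 200.0, 50.0) + rng.normal(0, 5, (h, w))
    observed = truth * shading
    corrected = observed / shading                            # idealized correction
    print("CJV before:", round(cjv(observed, cells), 3),
          "after:", round(cjv(corrected, cells), 3))
    ```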
  • Item
    A time decoupling approach for studying forum dynamics
    Kan, A ; Chan, J ; Hayes, C ; Hogan, B ; Bailey, J ; Leckie, C (SPRINGER, 2013-11)
  • Item
    A visual-numeric approach to clustering and anomaly detection for trajectory data
    Kumar, D ; Bezdek, JC ; Rajasegarar, S ; Leckie, C ; Palaniswami, M (SPRINGER, 2017-03)
  • Item
    Discovering outlying aspects in large datasets
    Nguyen, XV ; Chan, J ; Romano, S ; Bailey, J ; Leckie, C ; Ramamohanarao, K ; Pei, J (SPRINGER, 2016-11)
  • Item
    Training Robust Models with Random Projection
    Nguyen, XV ; Monazam Erfani, S ; Paisitkriangkrai, S ; Bailey, J ; Leckie, C ; Ramamohanarao, K (IEEE, 2016)
    Regularization plays an important role in machine learning systems. We propose a novel methodology for model regularization using random projection. We demonstrate the technique on neural networks, since such models usually comprise a very large number of parameters, calling for strong regularizers. It has been shown recently that neural networks are sensitive to two kinds of samples: (i) adversarial samples, which are generated by imperceptible perturbations of previously correctly classified samples, yet the network will misclassify them; and (ii) fooling samples, which are completely unrecognizable, yet the network will classify them with extremely high confidence. In this paper, we show how robust neural networks can be trained using random projection. We show that while random projection acts as a strong regularizer, boosting model accuracy similarly to other regularizers such as weight decay and dropout, it is far more robust to adversarial noise and fooling samples. We further show that random projection also helps to improve the robustness of traditional classifiers, such as Random Forest and Gradient Boosting Machines.
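    As a hedged illustration of the mechanics only, the sketch below applies a fixed Gaussian random projection to the inputs before training a downstream classifier and then scores clean and noise-perturbed test data. The dataset, projection dimension, and choice of classifier are assumptions; the paper integrates random projection into neural-network training rather than using it as a simple preprocessing step.
    ```python
    # Illustrative sketch: a fixed random projection of the inputs, reused for
    # training and inference; dataset, dimensions, and classifier are hypothetical.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.random_projection import GaussianRandomProjection

    X, y = make_classification(n_samples=2000, n_features=200, n_informative=30, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    # Fit the projection once so the same random matrix is used everywhere.
    proj = GaussianRandomProjection(n_components=64, random_state=0).fit(X_tr)
    clf = GradientBoostingClassifier(random_state=0).fit(proj.transform(X_tr), y_tr)

    # Score clean and noise-perturbed test inputs through the same projection.
    noise = np.random.default_rng(0).normal(0, 0.5, X_te.shape)
    print("clean accuracy:    ", clf.score(proj.transform(X_te), y_te))
    print("perturbed accuracy:", clf.score(proj.transform(X_te + noise), y_te))
    ```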
  • Item
    Meta-analysis of gene expression microarrays with missing replicates
    Shi, F ; Abraham, G ; Leckie, C ; Haviv, I ; Kowalczyk, A (BMC, 2011-03-24)
    BACKGROUND: Many different microarray experiments are publicly available today. It is natural to ask whether different experiments for the same phenotypic conditions can be combined using meta-analysis, in order to increase the overall sample size. However, some genes are not measured in all experiments, hence they cannot be included, or their statistical significance cannot be appropriately estimated, in traditional meta-analysis. Nonetheless, these genes, which we refer to as incomplete genes, may also be informative and useful. RESULTS: We propose a meta-analysis framework, called "Incomplete Gene Meta-analysis", which can include incomplete genes by imputing the significance of missing replicates and computing a meta-score for every gene across all datasets. We demonstrate that the incomplete genes are worth including and that our method is able to appropriately estimate their significance in two groups of experiments. We first apply Incomplete Gene Meta-analysis and several comparable methods to five breast cancer datasets with an identical set of probes. We simulate incomplete genes by randomly removing a subset of probes from each dataset and demonstrate that our method consistently outperforms two other methods in terms of false discovery rate. We also apply the methods to three gastric cancer datasets for the purpose of discriminating diffuse and intestinal subtypes. CONCLUSIONS: Meta-analysis is an effective approach that identifies more robust sets of differentially expressed genes from multiple studies. The incomplete genes that mainly arise from the use of different platforms may also have statistical and biological importance, but they have been ignored or not appropriately handled by previous studies. Our Incomplete Gene Meta-analysis is able to incorporate incomplete genes by estimating their significance. The results on both breast and gastric cancer datasets suggest that the highly ranked genes and associated GO terms produced by our method are more significant and biologically meaningful according to the previous literature.
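    A minimal sketch of the general idea, under stated assumptions: per-dataset gene p-values are combined into a meta-score, and a gene missing from a dataset has a neutral significance imputed for it. The imputation rule (the dataset-wide median p-value) and the Stouffer-style combination are illustrative choices, not the paper's exact procedure.
    ```python
    # Illustrative sketch: combine per-dataset gene p-values into a meta p-value,
    # imputing a neutral significance for genes missing from a dataset.
    import numpy as np
    from scipy import stats

    def meta_scores(pvalue_tables):
        """pvalue_tables: list of dicts mapping gene -> p-value, one dict per dataset."""
        genes = sorted(set().union(*pvalue_tables))
        combined = {}
        for gene in genes:
            pvals = []
            for table in pvalue_tables:
                if gene in table:
                    pvals.append(table[gene])
                else:
                    # Incomplete gene: impute the dataset-wide median p-value.
                    pvals.append(float(np.median(list(table.values()))))
            z = stats.norm.isf(np.clip(pvals, 1e-300, 1 - 1e-16))              # p-values -> z-scores
            combined[gene] = float(stats.norm.sf(z.sum() / np.sqrt(len(z))))   # Stouffer combination
        return combined

    # Three hypothetical studies; GENE_C is missing from the second one.
    studies = [
        {"GENE_A": 0.001, "GENE_B": 0.40, "GENE_C": 0.002},
        {"GENE_A": 0.004, "GENE_B": 0.55},
        {"GENE_A": 0.010, "GENE_B": 0.35, "GENE_C": 0.008},
    ]
    for gene, p in sorted(meta_scores(studies).items(), key=lambda kv: kv[1]):
        print(gene, f"meta p = {p:.2e}")
    ```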