Computing and Information Systems - Theses

  • Item
    Robust and efficient unsupervised anomaly detection in complex and dynamic environments
    Ghafoori, Zahra (2018)
    Creating value from data is the ultimate goal of all sensing platforms within the Internet of Things (IoT). This value is derived from data analytics that can support effective and timely decision making for a variety of business applications. An important part of this process is anomaly detection, which is typically used for two main purposes. First, anomaly detection can be used to clean data prior to applying pattern classification or prediction techniques. This purpose is often referred to as outlier removal or data cleansing instead of anomaly detection. Second, and more importantly, anomaly detection can be used to detect events that have practical impact, such as detecting intrusions or identifying faults. Anomaly detection for the second purpose, which targets events with usually negative effects, has attracted considerable attention in the last decade. Such events should be detected and dealt with as soon as possible to minimise their impact. The wide use of sensors that collect data, from our smartphones to advanced monitoring systems, has created new applications, which introduce new challenges for machine learning and data mining techniques. The sensor data is being generated at increasingly higher rates and in higher dimensional data streams. In this context, the unavailability of labelled training data, changes in normal behaviour over time, and time and memory constraints demand new methods for unsupervised anomaly detection. In this thesis, we look at the challenges for anomaly detection in this context, and propose novel methods to cope with these emerging challenges. We first motivate the need for unsupervised model-based anomaly detection and provide a comprehensive survey of methods from this view. We then enhance a widely used semi-supervised model-based anomaly detection technique to be able to perform unsupervised anomaly detection in a robust and efficient manner.
This method builds the foundations of our unsupervised model-based anomaly detection framework, which combines two similarity measures to effectively and efficiently detect unknown anomalies that appear in clusters or as individual instances in a training set. Based on our unsupervised model-based anomaly detection framework, we propose an unsupervised adaptive anomaly detection technique for non-stationary environments. This technique is capable of detecting unknown change points in normal behaviour using an unsupervised change point detection module. In addition, it can automatically select relevant instances from recent instances in a sliding window for learning new patterns. Our technique is ensemble-based and manages the ensemble of experts by autonomously removing obsolete models. Learning new models is performed using our unsupervised model-based anomaly detection framework. Finally, this technique is efficient for non-stationary environments, because it does not require continuous learning. We propose an unsupervised dimensionality reduction technique to learn data representations that preserve or enlarge the dissimilarity of normal data and anomalies. This method increases the robustness of our unsupervised model-based anomaly detection framework, especially when there are clusters of anomalies in a training set. We finally propose a novel unsupervised anomaly detection technique based on estimating data density, sampling and cluster validity checking. This new method has constant training time and labels each new instance in (near) real-time. In addition, it is robust to the presence of a high fraction of anomalies in its training set, and handles various types of anomalies. Finally, if a limited budget is available to boost the accuracy of unsupervised learning on a dataset by requesting a few data labels from an expert, our new method can make smart decisions and ask for labels of critical patterns.
In summary, we propose efficient model-based and unsupervised dimensionality reduction and anomaly detection methods. The proposed methods can be used for anomaly detection in general, when the environment is non-stationary, or for (near) real-time decision making. We show that the proposed methods achieve higher or comparable accuracy in detecting anomalies compared to existing state-of-the-art techniques.
  • Item
    Anomaly detection in participatory sensing networks
    MONAZAM ERFANI, SARAH (2015)
    Anomaly detection or outlier detection aims to identify unusual values in a given dataset. In particular, there is growing interest in collaborative anomaly detection, where multiple data sources submit their data to an online data mining service, in order to detect anomalies with respect to the wider population. By combining data from multiple sources, collaborative anomaly detection aims to improve detection accuracy through the construction of a more robust model of normal behaviour. Cloud-based collaborative architectures such as Participatory Sensing Networks (PSNs) provide an open distributed platform that enables participants to share and analyse their local data on a large scale. Two major issues with collaborative anomaly detection are how to ensure the privacy of participants’ data, and how to efficiently analyse the large-scale high-dimensional data collected in these networks. The first problem we address is the issue of data privacy in PSNs, by introducing a framework for privacy-preserving collaborative anomaly detection with efficient local data perturbation at participating nodes, and global processing of the perturbed records at a data mining server. The data perturbation scheme that we propose enables the participants to perturb their data independently without requiring the cooperation of other parties. As a result our privacy-preservation approach is scalable to large numbers of participants and is computationally efficient. By collecting the participants’ data, the PSN server can generate a global anomaly detection model from these locally perturbed records. The global model identifies interesting measurements or unusual patterns in participants’ data without revealing the true values of the measurements. In terms of privacy, the proposed scheme thwarts several major types of attacks, namely, the Independent Component Analysis (ICA), Distance-inference, Maximum a Posteriori (MAP), and Collusion attacks. 
We further improve the privacy of our data perturbation scheme by: (i) redesigning the nonlinear transformation to better defend against MAP estimation attacks for normal and anomalous records, and (ii) supporting individual random linear transformations for each participant in order to provide the system with greater resistance to malicious collusion. A notable advantage of our perturbation scheme is that it preserves participants’ privacy while achieving comparable accuracy to non-privacy preserving anomaly detection techniques. The second problem we address in the thesis is how to model and interpret the large volumes of high-dimensional data that are generated in participatory domains by using One-class Support Vector Machines (1SVMs). While 1SVMs are effective at producing decision surfaces for anomaly detection from well-behaved feature vectors, they can be inefficient at modelling the variations in large, high-dimensional datasets. We overcome this challenge by taking two different approaches. The first approach is an unsupervised hybrid architecture, in which a Deep Belief Network (DBN) is used to extract generic underlying features, in combination with a 1SVM that uses the features learned by the DBN. DBNs have important advantages as feature detectors for anomaly detection, as DBNs use unlabelled data to capture higher-order correlations among features. Furthermore, using a DBN to reduce the number of irrelevant and redundant features improves the scalability of a 1SVM for use with large training datasets containing high-dimensional records. Our hybrid approach is able to generate an accurate anomaly detection model with lower computational and memory complexity compared to a 1SVM on its own. Alternatively, to overcome the shortcomings of 1SVMs in processing high-dimensional datasets, in our second approach we calculate a lower rank approximation of the optimisation problem that underlies the 1SVM training task. 
Instead of performing the optimisation in a high-dimensional space, the optimisation is conducted in a space of reduced dimension but on a larger neighbourhood. We leverage the theory of nonlinear random projections and propose the Reduced 1SVM (R1SVM), which is an efficient and scalable anomaly detection technique that can be trained on large-scale datasets. The main objective of R1SVM is to replace a nonlinear machine with randomised features and a linear machine. In summary, we have proposed efficient privacy-preserving anomaly detection approaches for PSNs, and scalable data modelling approaches for high-dimensional datasets, which lower the computational and memory complexity compared to traditional anomaly detection techniques. We have shown that the proposed methods achieve higher or comparable accuracy in detecting anomalies compared to existing state-of-the-art techniques.
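
As a rough illustration of replacing a nonlinear machine with randomised features and a linear machine, the sketch below uses random Fourier features, one well-known form of nonlinear random projection. This is an assumption made for illustration only: the abstract does not specify how R1SVM constructs its features, and the function names, the value of gamma, and the number of features are hypothetical. The sketch shows only that an inner product of randomised features can approximate an RBF kernel, which is what would let a linear one-class machine stand in for a kernelised one. The linear machine itself is omitted.

```python
import math
import random


def make_random_features(dim, n_features, gamma, seed=0):
    """Build a map z(x) such that z(x) . z(y) approximates exp(-gamma * ||x - y||^2).

    Assumption: random Fourier features stand in for the thesis's
    "nonlinear random projections"; the thesis may use a different construction.
    """
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * gamma)  # frequencies are drawn from N(0, 2 * gamma)
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)]
               for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    norm = math.sqrt(2.0 / n_features)

    def transform(x):
        # One cosine feature per random direction.
        return [norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return transform


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel, used here only as a reference value."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical example values.
GAMMA = 0.5
transform = make_random_features(dim=3, n_features=4000, gamma=GAMMA, seed=0)
x = [0.2, -0.1, 0.4]
y = [0.0, 0.3, 0.1]

exact = rbf_kernel(x, y, GAMMA)
approx = dot(transform(x), transform(y))
print(f"exact kernel: {exact:.4f}, randomised approximation: {approx:.4f}")
```

Once records are mapped through such a transform, any linear one-class model can be trained on the mapped vectors. The training cost then depends on the number of random features, not on a full kernel matrix over all training records.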