Robust and efficient unsupervised anomaly detection in complex and dynamic environments
Affiliation: Computing and Information Systems
Document Type: PhD thesis
Access Status: Open Access
© 2018 Dr. Zahra Ghafoori
Creating value from data is the ultimate goal of all sensing platforms within the Internet of Things (IoT). This value is derived from data analytics that can support effective and timely decision making for a variety of business applications. An important part of this process is anomaly detection, which is typically used for two main purposes. First, anomaly detection can be used to clean data before pattern classification or prediction techniques are applied; this use is often referred to as outlier removal or data cleansing rather than anomaly detection. Second, and more importantly, anomaly detection can be used to detect events that have practical impact, such as intrusions or faults. Anomaly detection for this second purpose has attracted considerable attention in the last decade, because such events usually have negative effects and should be detected and dealt with as soon as possible to minimise their impact. The widespread use of sensors, from those in our smartphones to those in advanced monitoring systems, has created new applications that introduce new challenges for machine learning and data mining techniques. Sensor data is being generated at ever higher rates and in higher-dimensional data streams. In this context, the unavailability of labelled training data, changes in normal behaviour over time, and time and memory constraints demand new methods for unsupervised anomaly detection. In this thesis, we examine the challenges for anomaly detection in this context and propose novel methods to cope with them. We first motivate the need for unsupervised model-based anomaly detection and provide a comprehensive survey of methods from this viewpoint. We then enhance a widely used semi-supervised model-based anomaly detection technique so that it can perform unsupervised anomaly detection in a robust and efficient manner.
This method forms the foundation of our unsupervised model-based anomaly detection framework, which combines two similarity measures to effectively and efficiently detect unknown anomalies that appear either in clusters or as individual instances in a training set. Building on this framework, we propose an unsupervised adaptive anomaly detection technique for non-stationary environments. This technique detects unknown change points in normal behaviour using an unsupervised change point detection module, and it automatically selects relevant instances from a sliding window of recent instances for learning new patterns. The technique is ensemble-based and manages its ensemble of experts by autonomously removing obsolete models; new models are learned using our unsupervised model-based anomaly detection framework. Because it does not require continuous learning, this technique is efficient for non-stationary environments. Next, we propose an unsupervised dimensionality reduction technique that learns data representations which preserve or enlarge the dissimilarity between normal data and anomalies. This method increases the robustness of our unsupervised model-based anomaly detection framework, especially when there are clusters of anomalies in a training set. Finally, we propose a novel unsupervised anomaly detection technique based on data density estimation, sampling and cluster validity checking. This method has constant training time and labels each new instance in (near) real-time. In addition, it is robust to a high fraction of anomalies in its training set and handles various types of anomalies. Moreover, if a limited budget is available to boost the accuracy of unsupervised learning on a dataset by requesting a few data labels from an expert, the method can make smart decisions and ask for the labels of critical patterns.
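The abstract does not give the details of the density-and-sampling technique, so the following Python sketch only illustrates the general idea behind "constant training time" and per-instance labelling. It is not the thesis's method. It assumes a Gaussian kernel density estimate over a fixed-size random sample, a hand-picked bandwidth, and a threshold set at a low quantile of leave-one-out sample scores. `SampledKDEDetector`, `gaussian_kernel_mean` and all parameter values are hypothetical. The cluster validity checking and expert-labelling components are not modelled.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def gaussian_kernel_mean(x: Point, points: Sequence[Point], bandwidth: float) -> float:
    """Mean Gaussian kernel value between x and each point (a KDE density up to a constant)."""
    total = 0.0
    for p in points:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / len(points)


class SampledKDEDetector:
    """Hypothetical sketch: density estimation over a fixed-size random sample of unlabelled data."""

    def __init__(self, sample_size: int = 200, bandwidth: float = 1.0,
                 quantile: float = 0.05, seed: int = 0) -> None:
        self.sample_size = sample_size  # fixed k, so the O(k^2) kernel work does not grow with n
        self.bandwidth = bandwidth      # assumed; would normally be tuned or estimated
        self.quantile = quantile        # assumed fraction of the sample treated as low density
        self.seed = seed
        self.sample: List[Point] = []
        self.threshold = 0.0

    def fit(self, data: Sequence[Point]) -> "SampledKDEDetector":
        rng = random.Random(self.seed)
        k = min(self.sample_size, len(data))
        if k < 2:
            raise ValueError("need at least two training points")
        self.sample = rng.sample(data, k)
        # Leave-one-out scores, so a sample point is not credited with its own kernel.
        loo_scores = sorted(
            gaussian_kernel_mean(p, self.sample[:i] + self.sample[i + 1:], self.bandwidth)
            for i, p in enumerate(self.sample)
        )
        self.threshold = loo_scores[int(self.quantile * (k - 1))]
        return self

    def score(self, x: Point) -> float:
        return gaussian_kernel_mean(x, self.sample, self.bandwidth)

    def is_anomaly(self, x: Point) -> bool:
        # O(k) per query, independent of how much data was available for training.
        return self.score(x) < self.threshold


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
    detector = SampledKDEDetector(sample_size=100, bandwidth=0.5).fit(data)
    print(detector.is_anomaly((0.0, 0.0)), detector.is_anomaly((10.0, 10.0)))  # False True
```

Fixing the sample size bounds both the training cost and the per-query cost regardless of the dataset size, which is the property the abstract highlights; the trade-off is that a small sample may miss sparse regions of normal behaviour.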
In summary, we propose efficient model-based and unsupervised methods for dimensionality reduction and anomaly detection. The proposed methods can be used for anomaly detection in general, in non-stationary environments, or for (near) real-time decision making. We show that the proposed methods achieve higher or comparable anomaly detection accuracy compared with existing state-of-the-art techniques.
Keywords: anomaly detection; outlier detection; unsupervised learning; one-class support vector machine; parameter estimation; concept drift; dimensionality reduction; Restricted Boltzmann Machine; kernel density estimation