Computing and Information Systems - Theses

    Anomaly detection in heterogeneous sensed data
    MOSHTAGHI, MASUD (2013)
    Wireless Sensor Networks (WSNs) provide a cost-effective platform for monitoring and data collection in environments where the deployment of wired sensing infrastructure is too expensive or impractical. Many applications of WSNs involve detecting an event in the environment. Gathering all the data from the sensors and analyzing it to find events is cumbersome, because the target event usually happens infrequently. Given the energy-intensive nature of radio transmissions, the limited energy resources of the network can quickly become depleted if the raw data from the nodes has to be transmitted to a single location. Therefore, a major challenge is how to detect interesting or abnormal measurements in the large volume of temporally and spatially correlated data. This research has developed efficient anomaly detection algorithms through modeling the normal behavior of the measurements in wireless sensor networks. These algorithms are able to identify events and faults in the monitored environment while reducing communication between the nodes, thus saving the limited energy of the nodes. We first introduce a framework for anomaly detection with efficient local processing using hyperellipsoidal summaries of the data at the nodes, and global processing of these local summaries. The global processing provides us with an understanding of the network as a whole and helps us model different characteristics of the data in non-homogeneous networks. In contrast, the local processing helps to identify interesting measurements or patterns locally at each node. We show that this framework can significantly reduce the communication overhead of a centralized scheme where all the data is transmitted to the sink. The rest of this research is focused on improving the global and local data processing aspects of this framework. We propose an efficient clustering algorithm that can be executed with the limited computational capabilities of sensor nodes.
This algorithm allows the nodes to build multiple hyperellipsoidal summaries of their local data, which can then be forwarded to the base station. This local data processing method can be used when multiple distributions may appear in the data of a single node. The base station compares and clusters the elliptical summaries from the nodes to find a global model for the network. Therefore, the accuracy of the global model largely relies on the definition of (dis)similarity between the hyperellipsoids. We introduce three similarity measures for pairs of ellipsoids that take the shape, orientation and location of the hyperellipsoids into consideration. First, an underlying theory and proof is provided for each of these measures, and then the measures are compared and evaluated on synthetic and real-life datasets. We then present an adaptive method that allows the model to change after the training period. It starts with a small batch of data for the initialization, and then incrementally updates the parameters of the global hyperellipsoidal decision boundaries using the available data at the base station. The sink uses the anomaly messages sent by the nodes to adapt the model. We finally propose two incremental data modeling approaches, which are designed to suit the data streaming nature of WSNs. The first model is called Incremental Data Capture Anomaly Detection (IDCAD), which iteratively calculates an elliptical boundary for anomaly detection at a node. This model is able to detect changes and anomalies in the long-term characteristics of the data. The second model is a predictive dynamic model called an Iterative Fuzzy Regression Model (iFRM). This model benefits from the IDCAD model and can detect long-term anomalies, while its prediction capability gives it the ability to detect dynamic anomalies as well. These two approaches provide real-time decision making at the node level.
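The idea of iteratively maintaining an elliptical boundary at a node can be illustrated with a minimal sketch. This is not the IDCAD algorithm from the thesis, whose update rules are not given in this abstract; it is a generic stand-in that assumes two-dimensional readings, a Welford-style running mean and covariance, and a squared Mahalanobis distance threshold taken from the chi-square distribution with 2 degrees of freedom. The class name `IncrementalEllipse2D`, its methods, and the `coverage` parameter are hypothetical names chosen for illustration.

```python
import math


class IncrementalEllipse2D:
    """Illustrative running elliptical boundary for 2-D readings.

    Assumption: a generic Welford-style estimator, not the thesis's IDCAD.
    """

    def __init__(self, coverage=0.98):
        self.n = 0
        self.mean = [0.0, 0.0]
        # Running sums of products of deviations (scatter matrix entries).
        self.sxx = 0.0
        self.sxy = 0.0
        self.syy = 0.0
        # For 2-D Gaussian data the squared Mahalanobis distance follows a
        # chi-square distribution with 2 degrees of freedom, whose quantile
        # has the closed form -2 * ln(1 - coverage).
        self.threshold = -2.0 * math.log(1.0 - coverage)

    def update(self, x, y):
        """Fold one new reading into the mean and scatter matrix."""
        self.n += 1
        dx = x - self.mean[0]  # deviation from the old mean
        dy = y - self.mean[1]
        self.mean[0] += dx / self.n
        self.mean[1] += dy / self.n
        # Old deviation times deviation from the updated mean.
        self.sxx += dx * (x - self.mean[0])
        self.sxy += dx * (y - self.mean[1])
        self.syy += dy * (y - self.mean[1])

    def distance_sq(self, x, y):
        """Squared Mahalanobis distance of (x, y) from the current ellipse."""
        if self.n < 3:
            raise ValueError("need at least 3 readings")
        cxx = self.sxx / (self.n - 1)
        cxy = self.sxy / (self.n - 1)
        cyy = self.syy / (self.n - 1)
        det = cxx * cyy - cxy * cxy
        if det <= 0.0:
            raise ValueError("degenerate covariance")
        dx = x - self.mean[0]
        dy = y - self.mean[1]
        # Closed-form inverse of the 2x2 covariance matrix.
        return (cyy * dx * dx - 2.0 * cxy * dx * dy + cxx * dy * dy) / det

    def is_anomaly(self, x, y):
        """True if the reading falls outside the elliptical boundary."""
        return self.distance_sq(x, y) > self.threshold
```

In this sketch a node would call `update` on each reading it accepts as normal and `is_anomaly` on each new reading before deciding whether to report it, so only the ellipse parameters or anomaly messages, rather than raw data, would need to be transmitted.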
In summary, we have proposed efficient data modeling approaches for anomaly detection in WSNs and a framework for distributed decision making which lowers the communication overhead in the network compared to a communication-intensive centralized scheme. We have shown that the proposed methods detect anomalies with accuracy higher than or comparable to that of existing state-of-the-art techniques.