Anomaly detection in large evolving graphs
AffiliationComputing and Information Systems
Document TypePhD thesis
Access StatusOpen Access
© 2017 Dr. Lida Rashidi
Anomaly detection plays a vital role in various application domains including network intrusion detection, environmental monitoring and road traffic analysis. However a major challenge in anomaly detection is how to mine datasets where objects possess causal/non-causal relationships such as friendship, citation and communication relationships. This type of relational data can be represented as a graph, and raises the challenges of how to extend anomaly detection to the domain of relational datasets such as graphs. Anomalies often provide valuable insights into the correlation between an abnormal pattern and a real world phenomenon. Therefore, there has been growing attention towards anomaly detection schemes in dynamic networks. Although these evolving networks impose a curse of dimensionality on the learning models, they usually contain structural properties that anomaly detection schemes can exploit. The major challenge is finding a feature extraction technique that preserves graph structure while balancing the accuracy of the model against its scalability. Another challenge in graph anomaly detection is interpretability and visualization. In this thesis, we tackle these challenges and propose multiple approaches, where each approach addresses several or all of these challenges. Our first scheme is the use of a scalable technique known as random projection as a method for structure-aware embedding. We show that this scheme can preserve useful relational properties of the network, and present an analytical proof of this claim. We also analyse the effect of embedding on the accuracy of one-class support vector machines for anomaly detection on real and synthetic graph datasets. We demonstrate that the embedding can be effective in terms of scalability without detrimental influence on the accuracy of the learned model. Our next approach focuses on community memberships and how they can change during graph evolution. We devise a graph specific non-negative matrix factorization technique, which takes into account the influence that vertices have on each other. This method is interpretable in terms of event attribution and can provide a low-rank approximation of the original graph. Moreover, this approach is applicable in community detection and monitoring community evolution throughout time. Finally, we propose a more generalized graph pre-processing technique that can both extract structural features from the networks and expedite graph mining techniques such as anomaly detection. We focus on detecting anomalies in a sequence of graphs based on rank correlations of the reordered nodes. The merits of our approach lie in its simplicity and resilience to challenges such as unsupervised input, large volumes and high velocities of data. We evaluate the scalability and accuracy of our method on real graphs, where our method facilitates graph processing while producing more deterministic orderings. We show that the proposed approach is capable of revealing anomalies in a more efficient manner based on node rankings. Furthermore, our method can produce visual representations of graphs that are useful for graph compression and can be used as a pre-processing step to expedite common graph mining algorithms such as PageRank.
Keywordsanomaly detection; graph mining; matrix permutation; graph embedding; dynamic graphs
- Click on "Export Reference in RIS Format" and choose "open with... Endnote".
- Click on "Export Reference in RIS Format". Login to Refworks, go to References => Import References