Computing and Information Systems - Theses

    Scalable contrast pattern mining for network traffic analysis
    Alipourchavary, Elaheh (2021)
    Contrast pattern mining is a data mining technique that characterises significant changes between datasets. Contrast patterns identify nontrivial differences between the classes of a dataset, interesting changes between multiple datasets, or emerging trends in a data stream. In this thesis, we focus on characterizing changes in Internet traffic using contrast pattern mining. For example, network managers require a compact yet informative report of significant changes in network traffic for security and performance management. In this context, contrast pattern mining is a promising approach to provide a concise and meaningful summary of significant changes in the network. However, the volume and high dimensionality of network traffic records introduce a range of challenges for contrast pattern mining. In particular, these challenges include the combinatorial search space for contrast patterns, the need to mine contrast patterns over data streams, and the need to distinguish new changes from rare recurring changes. In this thesis, we introduce a range of novel contrast mining algorithms to address these challenges. We first introduce the use of contrast pattern mining in network traffic analysis. We show that contrast patterns have strong discriminative power that makes them suitable for data summarization and for finding meaningful and important changes between different traffic datasets. We also propose several evaluation metrics that reflect the interpretability of patterns for security managers. In addition, we demonstrate on real-life datasets that the vast majority of extracted patterns are pure, i.e., most change patterns correspond to either attack traffic or normal traffic, but not a mixture of both.
We propose a method to efficiently extract contrast patterns between two static datasets. We extract a high-quality set of contrast patterns by using only the most specific patterns to generate a compact and informative report of significant changes for network managers. By eliminating minimal patterns, we considerably reduce the overlap between generated patterns, and by reducing redundant patterns, we substantially improve the scalability of contrast pattern mining and achieve a significant speed-up. We also propose a novel approach to discriminate between new events and rare recurring events. Some changes in network traffic occur on a regular basis and show periodic behaviour, and these are already known to network analysts. Thus, network managers may want to filter out these known recurring changes and prioritize their focus on new changes, given their limited time and resources. Our approach to this problem is based on second-order contrast pattern mining. Our work demonstrates the importance of higher-order contrast pattern mining in practice, and provides an effective method for finding such higher-order patterns in large datasets. Finally, based on the approaches that we introduced for contrast pattern mining in static datasets, we propose two novel methods to extract contrast patterns over high-dimensional data streams. We consider two incremental scenarios: (i) when one dataset changes over time and the other is a static reference dataset, and (ii) when both datasets change over a data stream. In our approach, instead of regenerating all patterns from scratch, we reuse the previously generated contrast patterns wherever possible to mine the new set of patterns. Using this technique, we avoid expensive incremental computation and increase the efficiency and scalability of mining on dense and high-dimensional data streams.
As a result of this scalability, our method can also find solutions for datasets on which the other algorithms cannot. In addition to the substantial improvements in the performance and scalability of our algorithm, we demonstrate that the quality of the patterns it generates is comparable to that of the other algorithms. In summary, we propose several efficient contrast pattern mining approaches to extract significant changes between two static datasets or over a data stream. We also introduce a novel approach to distinguish new changes from recurring changes. Our experimental results on different real-life datasets demonstrate the improvements in performance of our proposed algorithms compared to the existing approaches.
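
To make the notion of a contrast pattern concrete, the following is a minimal illustrative sketch and not the thesis's algorithm. It assumes traffic records are small dictionaries of attribute values and treats a pattern as a set of (attribute, value) pairs whose support differs between two datasets by at least a threshold. The function names, the `min_diff` and `max_len` parameters, and the toy records are all hypothetical, and the search is a naive brute-force enumeration of the kind whose cost the abstract describes as a challenge.

```python
from itertools import combinations


def support(pattern, records):
    """Fraction of records containing every (attribute, value) pair in pattern."""
    if not records:
        return 0.0
    hits = sum(1 for r in records if all(r.get(a) == v for a, v in pattern))
    return hits / len(records)


def contrast_patterns(before, after, min_diff=0.3, max_len=2):
    """Naive search for patterns whose support changes by at least min_diff."""
    # Enumerate every combination of up to max_len items seen in any record.
    candidates = set()
    for r in before + after:
        items = sorted(r.items())
        for k in range(1, max_len + 1):
            candidates.update(combinations(items, k))

    results = []
    for p in candidates:
        s_before, s_after = support(p, before), support(p, after)
        if abs(s_after - s_before) >= min_diff:
            results.append((p, s_before, s_after))
    # Largest change first; ties broken by the pattern itself.
    return sorted(results, key=lambda t: (-abs(t[2] - t[1]), t[0]))


# Hypothetical toy flow records (not taken from the thesis datasets).
before = [
    {"proto": "tcp", "dport": 80},
    {"proto": "tcp", "dport": 443},
    {"proto": "udp", "dport": 53},
    {"proto": "tcp", "dport": 80},
]
after = [
    {"proto": "udp", "dport": 53},
    {"proto": "udp", "dport": 53},
    {"proto": "udp", "dport": 53},
    {"proto": "tcp", "dport": 80},
]

found = contrast_patterns(before, after)
for pattern, s_before, s_after in found:
    print(dict(pattern), s_before, s_after)
```

On this toy input, both the single item `dport=53` and the more specific pattern `dport=53, proto=udp` are reported with the same change in support. This is a small instance of the overlap between generated patterns that the abstract says is reduced by keeping only the most specific patterns.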