Computing and Information Systems

This thesis deals with the use of Conditional Random Fields (CRFs; Lafferty et al., 2001) for Natural Language Processing (NLP). CRFs are probabilistic models for sequence labelling which are particularly well suited to NLP. They have many compelling advantages over other popular models such as Hidden Markov Models and Maximum Entropy Markov Models (Rabiner, 1990; McCallum et al., 2001), and have been applied to a number of NLP tasks with considerable success (e.g., Sha and Pereira, 2003; Smith et al., 2005). Despite their apparent success, CRFs suffer from two main failings. Firstly, they often over-fit the training sample. This is a consequence of their considerable expressive power, and can be limited by placing a prior over the model parameters (Sha and Pereira, 2003; Peng and McCallum, 2004). Secondly, the standard methods for CRF training are often very slow, sometimes requiring weeks of processing time. This efficiency problem is largely ignored in the current literature, although in practice the cost of training prevents the application of CRFs to many new, more complex tasks, and also prevents the use of densely connected graphs, which would allow for much richer feature sets. (For the complete abstract, open the document.)

Intrusion detection systems are now an essential component of the overall network and data security arsenal. With rapid advances in network technologies, including higher bandwidths and the ease of connecting wireless and mobile devices, the focus of intrusion detection has shifted from simple signature matching to detecting attacks by analyzing contextual information which may be specific to individual networks and applications. As a result, anomaly and hybrid intrusion detection approaches have gained significance. However, present anomaly and hybrid detection approaches suffer from three major setbacks: limited attack detection coverage, a large number of false alarms, and inefficiency in operation. In this thesis, we address these three issues by introducing efficient intrusion detection frameworks and models which are effective in detecting a wide variety of attacks and which result in very few false alarms. Additionally, using our approach, attacks can not only be detected accurately but also identified, which helps to initiate effective intrusion response mechanisms in real time. Experiments on the benchmark KDD 1999 data set and two additional data sets collected locally confirm that layered conditional random fields are particularly well suited to detecting attacks at the network level, and that user session modeling using conditional random fields can effectively detect attacks at the application level. We first introduce the layered framework with conditional random fields as the core intrusion detector. Layered conditional random fields can be used to build scalable and efficient network intrusion detection systems which are highly accurate in attack detection. We show that our systems can operate either at the network level or at the application level and perform better than other well-known approaches for intrusion detection.
Experimental results further demonstrate that our system is robust to noise in the training data and handles noise better than other systems such as decision trees and naive Bayes. We then introduce our unified logging framework for audit data collection and perform user session modeling using conditional random fields to build real-time application intrusion detection systems. We demonstrate that our system can effectively detect attacks even when they are disguised within normal events in a single user session. Our user session modeling approach based on conditional random fields also results in early attack detection. This is desirable since intrusion response mechanisms can then be initiated in real time, thereby minimizing the impact of an attack.

Computing and Information Systems - Theses
