Computing and Information Systems - Theses

  • Item
    Digital forensics: increasing the evidential weight of system activity logs
    AHMAD, ATIF (2007)
    The application of investigative techniques within digital environments has led to the emergence of a new field of specialization that may be termed ‘digital forensics’. Perhaps the primary challenge concerning digital forensic investigations is how to preserve evidence of system activity given the volatility of digital environments and the delay between the time of the incident and the start of the forensic investigation. This thesis hypothesizes that system activity logs present in modern operating systems may be used for digital forensic evidence collection. This is particularly true in modern organizations where there is growing recognition that forensic readiness may have considerable benefits in case of future litigation. This thesis investigates the weighting of evidence produced by system activity logs present in modern operating systems. The term ‘evidential weight’ is used loosely as a measure of the suitability of system activity logs to digital forensic investigations. This investigation is approached from an analytical perspective. The first contribution of this thesis is to determine the evidence collection capability of system activity logs by a simple model of the logging mechanism. The second contribution is the development of evidential weighting criteria that can be applied to system activity logs. A unique and critical role for system activity logs by which they establish the reliability of other kinds of computer-derived evidence from hard disk media is also identified. The primary contribution of this thesis is the identification of a comprehensive range of forensic weighting issues arising from the use of log evidence that concern investigators and legal authorities. This contribution is made in a comprehensive analytical discussion utilizing both the logging model and the evidential weighting criteria.
The practical usefulness of the resulting evidential weighting framework is demonstrated by rigorous and systematic application to a real-world logging system.
  • Item
    Hierarchical clustering and summarization of network traffic data
    Mahmood, Abdun Naser (2008)
    An important task in managing IP networks is understanding the different types of traffic that are utilizing a network, based on a given trace of the packets or flows in the network. One of the key challenges in this task is the volume and complexity of the data that is available in traffic traces. What is needed by network managers in this context is a concise report of the significant traffic patterns that are present in the network. In this thesis, we address the problem of how to generate a succinct traffic report that contains a set of aggregated traffic flows, such that each aggregate flow corresponds to a significant traffic pattern in the network. We view the problem of generating a report of the significant traffic patterns in a network as a form of clustering problem. In particular, some distance-based hierarchical clustering techniques have advantages in terms of scalability when analyzing the types of large traffic traces that arise in this context. However, there are several important problems that need to be addressed before we can effectively use these types of clustering techniques on network traffic traces. The first research problem we address is how to handle non-numeric attributes that appear in network traffic data, such as attributes with a categorical or hierarchical structure. We have proposed a hierarchical similarity measure that is suitable for comparing hierarchical attributes in network traffic data. We have then developed a one-pass, hierarchical clustering scheme that can exploit the structure of hierarchical attributes in combination with categorical and numerical attributes. We demonstrate that our clustering scheme achieves significant improvements in both accuracy and execution time on a standard benchmark dataset, compared to an existing approach based on frequent itemset clustering. 
The second research problem we address is how to improve the scalability of our hierarchical clustering scheme when computing resources are limited. We propose an adaptive, two-stage sampling technique, which controls the rate at which records from frequently seen patterns are received by our clustering scheme. This enables more computational resources to be allocated to clustering new or unusual traffic patterns. We demonstrate that our two-stage sampling technique can identify less frequent traffic patterns with greater accuracy than when traditional systematic sampling is used. The third research problem we address is how to generate a concise yet accurate summary report from the results of our hierarchical clustering. We present two approaches to summarization, based on the size and the homogeneity of the clusters in the hierarchical cluster tree. We demonstrate that these approaches to summarization can substantially reduce the final report size with little impact on the accuracy of the report.
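
To illustrate what a similarity measure over a hierarchical attribute can look like, the sketch below scores two IPv4 addresses by the length of their shared bit prefix. This is an assumed stand-in for illustration only, not the measure proposed in the thesis; the function names `common_prefix_len` and `prefix_similarity` are hypothetical.

```python
import ipaddress


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses (0 to 32)."""
    # XOR leaves a 1 at every bit position where the addresses differ;
    # the highest such position marks the end of the common prefix.
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def prefix_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1.0 for identical addresses, 0.0 if the first bit differs."""
    # Assumption: deeper shared ancestry in the address hierarchy means more similar.
    return common_prefix_len(a, b) / 32
```

Under this sketch, two hosts in the same /24 subnet score higher than two hosts that share only a /8, which is the kind of structure a purely categorical comparison of addresses would miss.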
  • Item
    Statistical modeling of multiword expressions
    Su, Kim Nam (2008)
    In natural languages, words can occur in single units called simplex words or in a group of simplex words that function as a single unit, called multiword expressions (MWEs). Although MWEs are similar to simplex words in their syntax and semantics, they pose their own sets of challenges (Sag et al. 2002). MWEs are arguably one of the biggest roadblocks in computational linguistics due to the bewildering range of syntactic, semantic, pragmatic and statistical idiomaticity they are associated with, and their high productivity. In addition, the large numbers in which they occur demand specialized handling. Moreover, dealing with MWEs has a broad range of applications, from syntactic disambiguation to semantic analysis in natural language processing (NLP) (Wacholder and Song 2003; Piao et al. 2003; Baldwin et al. 2004; Venkatapathy and Joshi 2006). Our goals in this research are: to use computational techniques to shed light on the underlying linguistic processes giving rise to MWEs across constructions and languages; to generalize existing techniques by abstracting away from individual MWE types; and finally to exemplify the utility of MWE interpretation within general NLP tasks. In this thesis, we target English MWEs due to resource availability. In particular, we focus on noun compounds (NCs) and verb-particle constructions (VPCs) due to their high productivity and frequency. Challenges in processing noun compounds are: (1) interpreting the semantic relation (SR) that represents the underlying connection between the head noun and modifier(s); (2) resolving syntactic ambiguity in NCs comprising three or more terms; and (3) analyzing the impact of word sense on noun compound interpretation. 
Our basic approach to interpreting NCs relies on the semantic similarity of the NC components using firstly a nearest-neighbor method (Chapter 5), then verb semantics based on the observation that it is often an underlying verb that relates the nouns in NCs (Chapter 6), and finally semantic variation within NC sense collocations, in combination with bootstrapping (Chapter 7). Challenges in dealing with verb-particle constructions are: (1) identifying VPCs in raw text data (Chapter 8); and (2) modeling the semantic compositionality of VPCs (Chapter 5). We place particular focus on identifying VPCs in context, and measuring the compositionality of unseen VPCs in order to predict their meaning. Our primary approach to the identification task is to adapt localized context information derived from linguistic features of VPCs to distinguish between VPCs and simple verb-PP combinations. To measure the compositionality of VPCs, we use semantic similarity among VPCs by testing the semantic contribution of each component. Finally, we conclude the thesis with a chapter-by-chapter summary and outline of the findings of our work, suggestions of potential NLP applications, and a presentation of further research directions (Chapter 9).
  • Item
    Automatic instant messaging dialogue using statistical models and dialogue acts
    Ivanovic, Edward (2008)
    Instant messaging dialogue is used for communication by hundreds of millions of people worldwide, but has received relatively little attention in computational linguistics. We describe methods aimed at providing a shallow interpretation of messages sent via instant messaging. This is done by assigning labels known as dialogue acts to utterances within messages. Since messages may contain more than one utterance, we explore automatic message segmentation using combinations of parse trees and various statistical models to achieve high accuracy for both classification and segmentation tasks. Finally, we gauge the immediate usefulness of dialogue acts in conversation management by presenting a dialogue simulation program that uses dialogue acts to predict utterances during a conversation. The predictions are evaluated via qualitative means where we obtain very encouraging results.
  • Item
    The effects of decision aid structural restrictiveness on decision-making outcomes
    Seow, Poh-Sun (2008)
    This study examines the effects of structural restrictiveness embedded within a decision aid on users’ decision-making outcomes. Structural restrictiveness is determined by the rules embedded within computerized decision aids that restrict how users interact with the decision aid. For example, a structurally-restrictive decision aid might force users to consider information and answer specific questions in a prescribed sequence. In contrast, a less structurally-restrictive decision aid would be designed so that users are free to consider information in whatever sequence they desire. The more structurally-restrictive design imposes more limits on users’ decision-making process because they are forced to adapt their decision-making process to match the decision aid. However, it is unclear whether restricting how users interact with decision aids affects their decision-making outcomes. The results indicate that the more structurally-restrictive decision aid did not help participants identify more prompted items than the less structurally-restrictive decision aid. However, it increased the decision-making bias in recalling non-prompted items. The results contribute to the decision aid literature by highlighting the cost of increasing the degree of structural restrictiveness embedded within decision aids.
  • Item
    QoS-based scheduling of workflows on global grids
    YU, JIA (2007-10)
    Grid computing has emerged as a global cyber-infrastructure for the next generation of e-Science applications by integrating large-scale, distributed and heterogeneous resources. Scientific communities are utilizing Grids to share, manage and process large data sets. In order to support complex scientific experiments, distributed resources such as computational devices, data, applications, and scientific instruments need to be orchestrated while managing the application workflow operations within Grid environments. This thesis investigates properties of Grid workflow management systems and presents a workflow engine and algorithms for mapping scientific workflow applications to Grid resources based on specified QoS (Quality of Service) constraints. (For complete abstract open document)
  • Item
    Coordinated resource provisioning in federated grids
    RANJAN, RAJIV (2007-07)
    A fundamental problem in building a large-scale Grid resource sharing system is the need for efficient and scalable techniques for discovery and provisioning of resources for delivering expected Quality of Service (QoS) to users’ applications. The current approaches to Grid resource sharing based on resource brokers are non-coordinated since these brokers make scheduling-related decisions independently of the others in the system. Clearly, this worsens the load-sharing and utilisation problems of distributed Grid resources as sub-optimal schedules are likely to occur. Further, existing brokering systems rely on centralised information services for resource discovery. Centralised or hierarchical resource discovery systems are prone to single-point failure and lack scalability and fault tolerance. In the centralised model, the network links leading to the server are critical to the overall functionality of the system, as their failure might halt the entire distributed system operation.
  • Item
    Local search methods for constraint problems
    Muhammad, Muhammad Rafiq Bin (2008-02)
    This thesis investigates the use of local search methods in solving constraint problems. Such problems are very hard in general and local search offers a useful and successful alternative to existing techniques. The focus of the thesis is to analyze the techniques of invariants used in local search. The use of invariants has recently become the cornerstone of local search technology as they provide a declarative way to specify incremental algorithms. We have produced a series of program libraries in C++ known as the One-Way-Solver. The One-Way-Solver includes the implementation of incremental data structures and is a useful tool for the implementation of local search. The One-Way-Solver is applied to two challenging constraint problems, the Boolean Satisfiability Testing (SAT) and university course timetabling problems.
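
The idea of an invariant as a declaratively stated quantity that is maintained incrementally can be sketched as follows. This is a minimal illustration in Python under assumed names; `SumInvariant` and `assign` are hypothetical and do not reflect the One-Way-Solver's C++ interface.

```python
class SumInvariant:
    """Keeps `total` equal to sum(values) without recomputing the whole sum."""

    def __init__(self, values):
        self.values = list(values)
        # Computed once in full; every later change is applied incrementally.
        self.total = sum(self.values)

    def assign(self, index, new_value):
        """Change one variable and update the invariant in O(1)."""
        # Only the difference between the old and new value affects the sum.
        self.total += new_value - self.values[index]
        self.values[index] = new_value
```

A local search move typically changes one or two variables at a time, so updating by the difference avoids re-evaluating the whole expression after every move.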
  • Item
    Stigmergic collaboration: a theoretical framework for mass collaboration
    Elliott, Mark Alan (2007-12)
    This thesis presents an application-oriented theoretical framework for generalised and specific collaborative contexts with a special focus on Internet-based mass collaboration. The proposed framework is informed by the author’s many years of collaborative arts practice and the design, building and moderation of a number of online collaborative environments across a wide range of contexts and applications. The thesis provides a transdisciplinary architecture for describing the underlying mechanisms that have enabled the emergence of mass collaboration and other activities associated with ‘Web 2.0’ by incorporating a collaboratively developed definition and general framework for collaboration and collective activity, as well as theories of swarm intelligence, stigmergy, and distributed cognition. (For complete abstract open document)
  • Item
    Developing systems for gene normalisation
    Goudey, Benjamin (2007-10)
    The rapid growth of biomedical literature has attracted interest from the text mining community to develop methods to help manage the ever-increasing amounts of data. Initiatives such as the BioCreative challenge (Hirschman et al. 2005b) have created standard corpora and tasks in which to evaluate a variety of systems in a common framework. One such task is gene normalisation, in which the problems of synonymy and polysemy in gene name identification are overcome by mapping each mention back to a unique identifier, unambiguously identifying that gene. This task is one of the foundations required for any kind of text mining system working with biomedical literature, where we must be very certain of which genes are being discussed in the text. (For complete abstract open document)