Computing and Information Systems - Theses

Search Results

Now showing 1 - 7 of 7
  • Item
    Automatic instant messaging dialogue using statistical models and dialogue acts
    Ivanovic, Edward ( 2008)
    Instant messaging dialogue is used for communication by hundreds of millions of people worldwide, but has received relatively little attention in computational linguistics. We describe methods aimed at providing a shallow interpretation of messages sent via instant messaging. This is done by assigning labels known as dialogue acts to utterances within messages. Since messages may contain more than one utterance, we explore automatic message segmentation using combinations of parse trees and various statistical models to achieve high accuracy for both classification and segmentation tasks. Finally, we gauge the immediate usefulness of dialogue acts in conversation management by presenting a dialogue simulation program that uses dialogue acts to predict utterances during a conversation. The predictions are evaluated qualitatively, with very encouraging results.
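As an illustration of assigning dialogue act labels to utterances, here is a minimal multinomial Naive Bayes labeller. This is a generic sketch with an invented toy label set, not the thesis's actual models, features, or dialogue act taxonomy:

```python
from collections import Counter, defaultdict
import math

# Toy training data: utterances labelled with dialogue acts.
# Labels and examples are illustrative only, not from the thesis.
TRAIN = [
    ("hi there", "GREET"),
    ("hello how are you", "GREET"),
    ("what time is the meeting", "QUESTION"),
    ("where do you live", "QUESTION"),
    ("i live in melbourne", "STATEMENT"),
    ("the meeting is at noon", "STATEMENT"),
]

def train_nb(data):
    """Multinomial Naive Bayes with add-one smoothing."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in data:
        words = text.split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return word_counts, label_counts, vocab

def classify(text, model):
    """Return the label maximising log P(label) + sum log P(word | label)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train_nb(TRAIN)
print(classify("hello there", model))  # GREET
```

A real system would, as the abstract notes, first segment a message into utterances before labelling each one.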
  • Item
    Local search methods for constraint problems
    Muhammad, Muhammad Rafiq Bin ( 2008-02)
    This thesis investigates the use of local search methods in solving constraint problems. Such problems are very hard in general, and local search offers a useful and successful alternative to existing techniques. The focus of the thesis is to analyze the technique of invariants used in local search. The use of invariants has recently become a cornerstone of local search technology, as invariants provide a declarative way to specify incremental algorithms. We have produced a series of program libraries in C++ known as the One-Way-Solver. The One-Way-Solver includes the implementation of incremental data structures and is a useful tool for the implementation of local search. The One-Way-Solver is applied to two challenging constraint problems: Boolean satisfiability testing (SAT) and university course timetabling.
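The idea of an invariant as a declaratively specified quantity that is maintained incrementally under moves can be sketched as follows. This is an illustrative Python analogue only; the One-Way-Solver is a C++ library and its actual API is not shown here:

```python
class SumInvariant:
    """A one-way invariant: total = sum(values).

    Declared once, then maintained incrementally as local search
    reassigns variables, instead of being recomputed from scratch.
    """

    def __init__(self, values):
        self.values = list(values)
        self.total = sum(self.values)  # established once, O(n)

    def assign(self, i, new_value):
        # Incremental O(1) update instead of an O(n) recomputation.
        self.total += new_value - self.values[i]
        self.values[i] = new_value

inv = SumInvariant([3, 1, 4])
inv.assign(1, 10)          # a local search "move"
print(inv.total)           # 3 + 10 + 4 = 17
```

In local search, such invariants typically track constraint violation counts, so each candidate move can be evaluated cheaply.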
  • Item
    Capturing the semantics of change: operation augmented ontologies
    Newell, Gavan John ( 2009)
    As information systems become more complex it is infeasible for a non-expert to understand how the information system has evolved. Accurate models of these systems and the changes occurring to them are required for interpreters to understand, reason over, and learn from evolution of these systems. Ontologies purport to model the semantics of the domain encapsulated in the system. Existing approaches to using ontologies do not capture the rationale for change but instead focus on the direct differences between one version of a model and the subsequent version. Some changes to ontologies are caused by a larger context or goal that is temporally separated from each specific change to the ontology. Current approaches to supporting change in ontologies are insufficient for reasoning over changes and allow changes that lead to inconsistent ontologies. In this thesis we examine the existing approaches and their limitations and present a four-level classification system for models representing change. We address the shortcomings in current techniques by introducing a new approach, augmenting ontologies with operations for capturing and representing change. In this approach changes are represented as a series of connected, related and non-sequential smaller changes. The new approach improves on existing approaches by capturing root causes of change, by representing causal relationships between changes linking temporally disconnected changes to a root cause and by preventing inconsistencies in the evolution of the ontology. The new approach also explicitly links changes in an ontology to the motivating real-world changes. We present an abstract machine that defines the execution of operations on ontologies. A case study is then used to explain the new approach and to demonstrate how it improves on existing ways of supporting change in ontologies. 
The new approach is an important step towards providing ontologies with the capacity to go beyond representing an aspect of a domain to include ways in which that representation can change.
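A hypothetical sketch of the operation-augmented idea: changes are recorded as operations carrying causal links back to a root cause, rather than as diffs between versions. The class and field names here are invented for illustration and do not reproduce the thesis's abstract machine:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Operation:
    """One change to an ontology, optionally linked to a root cause."""
    name: str                         # e.g. "add_class", "remove_class"
    args: tuple
    cause: Optional["Operation"] = None  # causal link to a root-cause change

# One real-world event motivates several temporally separated edits.
root = Operation("merge_departments", ("Sales", "Marketing"))
ops = [
    Operation("add_class", ("SalesAndMarketing",), cause=root),
    Operation("remove_class", ("Sales",), cause=root),
    Operation("remove_class", ("Marketing",), cause=root),
]

# All three edits trace back to the same motivating change, which a
# version diff alone could not express.
print(all(op.cause is root for op in ops))  # True
```

A diff between ontology versions would show only the added and removed classes; the operation log additionally preserves why they changed together.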
  • Item
    A dynamic approximate representation scheme for streaming time series
    Zhou, Pu ( 2009)
    The huge volume of time series data generated in many applications poses new challenges in the techniques of data storage, transmission, and computation. Furthermore, when the time series are in the form of streaming data, new problems emerge and new techniques are required because of the streaming characteristics, e.g. high volume, high speed, and continuous flow. Approximate representation is one of the most efficient and effective solutions to the large-volume, high-speed problem. In this thesis, we propose a dynamic representation scheme for streaming time series. Existing methods use a single function form for the entire approximation task. In contrast, our method adopts a set of candidate functions: linear, polynomial (degree ≥ 2), and exponential. We provide a novel segmenting strategy to generate subsequences and dynamically choose candidate functions to approximate them. Since we are dealing with streaming time series, the segmenting points and the corresponding approximating functions are produced incrementally. For a given function form, we use a buffer window to find the locally farthest possible segmenting point under a user-specified error tolerance threshold. To achieve this goal, we define a feasible space for the coefficients of the function and show that we can indirectly find the locally best segmenting point by calculation in the coefficient space. Given the error tolerance threshold, the candidate function representing more information per unit parameter is chosen as the approximating function. Therefore, our representation scheme is more flexible and compact. We provide two dynamic algorithms, PLQS and PLQES, which involve two and three candidate functions, respectively. We also present the general strategy of function selection when more candidate functions are considered.
In the experimental evaluation, we examine the effectiveness of our algorithms on both synthetic and real time series data sets. We compare our method with the piecewise linear approximation method, and the experimental results demonstrate the clear superiority of our dynamic approach under the same error tolerance threshold.
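The general flavour of error-bounded segmentation can be illustrated with a much-simplified greedy variant. This is not PLQS or PLQES: it uses only a single function form (the line through the current segment's endpoints) and a maximum-deviation test, assuming strictly increasing x values:

```python
def segment_stream(points, eps):
    """Greedy segmentation sketch: extend the current segment while every
    point stays within eps of the line through the segment's endpoints.

    points: list of (x, y) with strictly increasing x.
    Returns a list of (start_index, end_index) segments.
    """
    segments = []
    start, end = 0, 1
    while end < len(points):
        x0, y0 = points[start]
        x1, y1 = points[end]
        # Maximum vertical deviation from the chord over the window.
        err = max(
            abs(y - (y0 + (y1 - y0) * (x - x0) / (x1 - x0)))
            for x, y in points[start:end + 1]
        )
        if err > eps:
            # Close the segment just before the offending point;
            # the breakpoint is shared with the next segment.
            segments.append((start, end - 1))
            start = end - 1
        else:
            end += 1
    segments.append((start, len(points) - 1))
    return segments

# Two linear pieces: rising to (2, 2), then falling.
print(segment_stream([(0, 0), (1, 1), (2, 2), (3, 0), (4, -2)], 0.1))
# [(0, 2), (2, 4)]
```

The thesis's algorithms additionally search the coefficient space of several candidate function forms to push each breakpoint as far as possible.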
  • Item
    A study of recursive partitioning technique for indexing spatial objects and trajectories
    Antoine, Elizabeth Arockiarani ( 2009)
    The requirement to store and manipulate data that represents the location and extent of objects, like roads, cities, rivers, etc. (spatial data), led to the evolution of spatial database systems. The domains that led to an increased interest in spatial database systems are earth science, robotics, resource management, urban planning, autonomous navigation, and geographic information systems (GIS). To handle spatial data efficiently, spatial database systems require indexing mechanisms that can retrieve spatial objects based on their locations using direct look-ups as opposed to sequential search. Indexing structures designed for relational database systems cannot be used for objects with non-zero size. The fact that there is no total ordering of objects in space makes conventional indexes, such as the B+ tree, incapable of handling spatial data. Extensive work has been done on spatial indexing, and indexing methods are categorized in terms of their efficiency for the type of spatial objects or the type of queries. Queries in spatial database systems are classified as single-scan and multi-scan queries. Spatial join is the most important multi-scan query in a spatial database system, and the execution time of such queries is super-linear in the number of objects. Among the indexing structures available for spatial join queries, Filter trees perform better than their counterparts, such as Hilbert R-trees. The Filter tree join algorithm outperforms the R-tree join algorithm by reading each block of entities at most once. Filter trees combine recursive partitioning, size separation, and space-filling curves to achieve this efficiency. However, for data sets of low join selectivity, the number of blocks processed by Filter trees is excessive compared to the number of blocks that have intersecting entities. The goal of this work is to provide a method for accelerating spatial join operations by using Spatial Join Bitmap (SJB) indices.
The file organization is based on the concepts introduced in Filter trees. The SJB indices keep track of blocks that have intersecting entities and make the algorithm process only those blocks. We provide algorithms for generating SJB indices dynamically and for maintaining SJB indices when the data sets are updated. Although maintaining SJB indices for updates increases the cost in terms of response time, the cost saving in the join operation is substantial, and this makes the overall behaviour of the spatial system very efficient. We have performed an extensive study using both real and synthetic data sets of various data distributions. The results show that the use of SJB indices produces a substantial speed-up, ranging from 25% to 150%, when compared to Filter trees. This method is highly beneficial in a real-world scenario, as the number of times the data set is updated is fairly low compared to the number of times join processing is done on the data sets. Spatial indexing structures can be extended to handle data of higher dimensions, including time. The position of geometries, like points, lines, areas, or volumes, changing over time represents moving objects. The need for storing and processing moving object data arises in a wide range of applications, including digital battlefields (battlefield simulators), air-traffic control, and mobile communication systems. The successive locations of the object are gathered as the object moves around the space, and the locations, ordered in time, are interpolated to obtain the movement of the object: this is called the trajectory of the object. R-tree variations, such as the three-dimensional R-trees, TB-trees, FNR trees, STR trees, MON trees, and SETI trees, are found to be effective for storing and manipulating past locations of moving objects.
The SETI tree is a combination of the R-tree in the time dimension and a partition-based technique in the space dimension, and outperforms the other R-tree indexing structures in handling coordinate-based queries. However, SETI increases the computational time when handling trajectory queries that retrieve the whole or part of a trajectory. We propose a methodology for using the recursive partitioning technique for indexing trajectories, called the Recursively Partitioned Trajectory Index (RPTI). RPTI uses a two-level indexing structure that is similar to SETI and maintains separate indices for the space and time dimensions. However, the splitting of trajectory segments in SETI, which increases the computational time, does not arise in RPTI. We present the algorithms for constructing the RPTI and the algorithms for updates, which include insertion and deletion. We have conducted an experimental study of this method and have demonstrated that RPTI is better than SETI in handling trajectory queries and is competitive with SETI in handling coordinate-based queries. Unlike the SETI structure, RPTI recursively partitions the space and avoids the splitting of line segments, making it efficient for query processing. Deletion is often ignored when proposing a trajectory index, on the assumption that deleting the trajectory of a moving object is meaningless after the transmitted positions are recorded. However, deletions are necessary when the trajectory of a moving object is no longer useful. We have also shown that deletion of a trajectory can be done efficiently using the RPTI structure. The structure of RPTI can be easily implemented using any of the existing spatial indexing structures. The only design parameters required are the standard disk page size and the maximum level of recursive partitioning.
However, in SETI, the number of spatial partitions, which is a crucial parameter in any spatial partitioning strategy, is highly dependent on the distribution of data sets.
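The core idea behind the SJB indices, recording which block pairs actually contain intersecting entities so that the join touches only those blocks, can be sketched roughly as follows. The function names and flat bitmap layout are invented for illustration; the thesis's actual file organization and algorithms differ:

```python
def boxes_intersect(a, b):
    """Axis-aligned bounding boxes (x0, y0, x1, y1) overlap test."""
    (ax0, ay0, ax1, ay1), (bx0, by0, bx1, by1) = a, b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

def build_bitmap(blocks_r, blocks_s):
    """One bit per (r, s) block pair: set iff the block regions intersect."""
    return [[boxes_intersect(r, s) for s in blocks_s] for r in blocks_r]

def join_with_bitmap(blocks_r, blocks_s, bitmap):
    """Only block pairs whose bit is set are ever read - the point of
    the index is to skip the rest entirely."""
    return [(i, j)
            for i in range(len(blocks_r))
            for j in range(len(blocks_s))
            if bitmap[i][j]]

R = [(0, 0, 2, 2), (10, 10, 12, 12)]
S = [(1, 1, 3, 3), (20, 20, 22, 22)]
bm = build_bitmap(R, S)
print(join_with_bitmap(R, S, bm))  # [(0, 0)]
```

With low join selectivity, most bits are clear, so the join skips most block pairs; the maintenance cost on updates is the price of keeping the bitmap accurate.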
  • Item
    Shifting in the n-Cube: online mistake bounds and the sample compression conjecture
    Rubinstein, Benjamin I. P. ( 2009)
    This thesis explores near-optimal bounds on worst-case expected risk in supervised classification, and the Sample Compressibility Conjecture of Littlestone and Warmuth. These topics are related by the one-inclusion graph, an important data structure in learning theory.
  • Item
    On the Pareto-Following Variation Operator for fast converging Multiobjective Evolutionary Algorithms
    Talukder, A. K. M. K. A. ( 2008)
    The focus of this research is to provide an efficient approach to dealing with computationally expensive Multiobjective Optimization Problems (MOPs). Typically, approximation or surrogate-based techniques are adopted to reduce the computational cost. In such cases, the original expensive objective function is replaced by a cheaper mathematical model that mimics the input-output (i.e. design variable to objective value) relationship. However, it is difficult to model an exact substitute of the targeted objective function. Furthermore, if this kind of approach is used in an evolutionary search, the number of function evaluations is not actually reduced (each original function evaluation is simply replaced by a surrogate/approximate function evaluation). Moreover, if a large number of individuals is considered, the surrogate model fails to offer a smaller computational cost. To tackle this problem, we have reformulated the concept of surrogate modeling in a different way, one more suitable for the Multiobjective Evolutionary Algorithm (MOEA) paradigm. In our approach, we do not approximate the objective function; rather, we model the input-output behavior of the underlying MOEA itself. The model attempts to identify the search path (in both design-variable and objective spaces), and from this trajectory the model is guaranteed to generate non-dominated solutions (especially during the initial iterations of the underlying MOEA, with respect to the current solutions) for the next iterations of the MOEA. Therefore, the MOEA can avoid re-evaluating dominated solutions and thus save a large amount of computational cost due to expensive function evaluations. We have designed our approximation model as a variation operator that follows the trajectory of the fronts and can be “plugged in” to any MOEA in which non-domination-based selection is used. Hence it is termed the “Pareto-Following Variation Operator (PFVO)”.
This approach also provides the added advantage that we can still use the original objective function, so the search procedure remains robust and suitable, especially for dynamic problems. We have integrated the model into three baseline MOEAs: the “Non-dominated Sorting Genetic Algorithm II (NSGA-II)”, the “Strength Pareto Evolutionary Algorithm II (SPEA-II)”, and the recently proposed “Regularity Model Based Estimation of Distribution Algorithm (RM-MEDA)”. We have also conducted an exhaustive simulation study using several benchmark MOPs. Detailed performance and statistical analysis reveals promising results. As an extension, we have implemented our idea for dynamic MOPs. We have also integrated PFVO into a diffusion-based/cellular MOEA in a distributed/Grid environment. Most experimental results and analyses reveal that PFVO can be used as a performance enhancement tool for any kind of MOEA.
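The non-domination-based selection that PFVO plugs into can be illustrated with a minimal Pareto filter (minimisation assumed throughout). This shows only the generic dominance machinery, not PFVO itself:

```python
def dominates(a, b):
    """a dominates b (minimisation): a is no worse in every objective
    and strictly better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def non_dominated(points):
    """Keep only the points not dominated by any other point:
    the current approximation of the Pareto front."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Objective vectors (f1, f2) for five candidate solutions.
pts = [(1, 5), (2, 2), (3, 4), (4, 1), (5, 5)]
print(non_dominated(pts))  # [(1, 5), (2, 2), (4, 1)]
```

A dominance-based MOEA re-evaluates the objective functions to make exactly this kind of comparison; the abstract's point is that predicting non-dominated offspring lets it skip many of those expensive evaluations.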