Computing and Information Systems - Theses

Permanent URI for this collection

Search Results

Now showing 1 - 2 of 2
  • Item
    Thumbnail Image
    Towards small-effort adaptions to off-the-shelf spatial and temporal indexes for modern database applications
    Stradling, Martin James ( 2012)
    Modern database applications demand high throughput of queries on complex data, such as the increasingly common types of spatial and temporal data. Many structures have been proposed for indexing spatial and temporal data and as standalone implementations they often achieve high levels of performance. However they are generally difficult to implement in an existing DBMS, which makes them very expensive to adopt. For example, Oracle took more than 5 years to implement the R-tree spatial index structure in their commercial DBMS. This thesis examines the techniques that exist in off-the-shelf spatial and temporal indexes with an emphasis on achieving large performance gains with minimal changes. In the domain of spatial data we propose a new index structure called Size Separation Indexing (SSI). This structure builds on the B+-tree, which is present in almost all DBMSs. Through extensive experimentation we show that SSI performs at least as well as all current spatial indexes and better than most on both a flat filesystem and on top of a DBMS. In the domain of temporal data we extend the TSB-tree, which is also based on the B+ -tree. In recent work the TSB-tree has been integrated into Microsoft SQL Server and we introduce the memory Hierarchy-aware Version tree which significantly improves on the performance of the TSB-tree for almost all query types and is scalable to huge datasets. Because our structures build upon existing indexes, we argue that they are better suited for implementation in current systems than other structures, reducing costs and increasing performance.
  • Item
    Thumbnail Image
    Multi-resolution indexing method for time series
    Ma, Mei ( 2010)
    Time series datasets are useful in a wide range of diverse real world applications. Retrieving or querying from a collection of time series is a fundamental task, with a key example being the similarity query. A similarity query returns all time series from the collection that are similar to a given reference time series. This type of query is particularly useful in prediction and forecasting applications. A key challenge for similarity queries is efficiency and for large datasets, it is important to develop efficient indexing techniques. Existing approaches in this area are mainly based on the Generic Multimedia Indexing Method (GEMINI), which is a framework that uses spatial indexes such as the R-tree to index reduced time series. For processing a similarity query, the index is first used to prune candidate time series using a lower bounding distance. Then, all remaining time series are compared using the original similarity measure, to derive the query result. Performance within this framework depends on the tightness of the lower bounding distance with respect to the similarity measure. Indeed much work has been focused on representation and dimensionality reduction, in order to provide a tighter lower bounding distance. Existing work, however, has not used employed dimensionality reduction in a flexible way, requiring all time series to be reduced to have the same dimension. In contrast, in this thesis, we investigate the possibility of allowing a variable dimension reduction. To this end, we develop a new and more flexible tree based indexing structure called the Multi-Resolution Index (MR-Index), which allows dimensionality to vary across different levels of the tree. We provide efficient algorithms for querying, building and maintaining this structure. Through an experimental analysis, we show that the MR-Index can deliver improved query efficiency compared to the traditional R-tree index, using both the Euclidean and dynamic time warping similarity measures.