Computing and Information Systems - Theses

Permanent URI for this collection

Search Results

Now showing 1 - 3 of 3
  • Item
    Thumbnail Image
    Online time series approximation and prediction
    Xu, Zhenghua ( 2012)
    In recent years, there are rapidly increasing research interests in the management of time series data due to its importance in a variety of applications such as network traffic management, telecommunications, finance, sensor network and location based services. In this thesis, we focus on two important problems of the management of time series data: the online time series approximation problem and the online time series prediction problem. The time series approximation can reduce the space and the computational cost of storing and transmitting time series data, and also reduce the workload of the data processing. Segmentation is one of the most commonly used methods to meet this requirement. However, while most of the current segmentation methods aim to minimize the holistic error between the approximation and the original time series, few works try to represent time series as compact as possible with an error bound guarantee on each data point. Moreover, in many real world situations, the patterns of the time series do not follow a constant rule such that using only one type of functions may not yield the best compaction. Motivated by these observations, we propose an online segmentation algorithm which approximates time series by a set of different types of candidate functions (poly nomials of different orders, exponential functions, etc.) and adaptively chooses the most compact one as the pattern of the time series changes. A challenge in this approach is to determine the approximation function on the fly (“online”). Thereby, we further present a novel method to efficiently generate the compact approximation of a time series in an online fashion for several types of candidate functions. This method incrementally narrows the feasible coefficient spaces of candidate functions in coefficient coordinate systems such that it can make each segment as long as possible given an error bound on each data point. Extensive experimental results show that our algorithm generates more compact approximations of the time series with lower average errors than the state-of-the-art algorithm. The time series prediction aims to predict future values of the time series according to their previously observed values. In this thesis, we focus our work on a specific branch of the time series prediction: the online locational time series prediction problem, i.e, predicting the final locations of the locational time series according to their current observed locational time series values on the fly. This problem is also called the destination prediction problem and the systems used to solve this problem are the called destination prediction systems. However, most of current destination prediction systems are based on the traveling histories of specific users so they are too ad hoc to accurately predict other users’ destination. In our work, we propose the first generic destination prediction solution to provide the accurate destination prediction services to all users based on the given user queries (i.e., the users’ current partial traveling trajectories) without knowing any traveling histories of the users. Extensive experiments validate that the prediction result of our method is more accurate than that of the competing method.
  • Item
    Thumbnail Image
    Multi-resolution indexing method for time series
    Ma, Mei ( 2010)
    Time series datasets are useful in a wide range of diverse real world applications. Retrieving or querying from a collection of time series is a fundamental task, with a key example being the similarity query. A similarity query returns all time series from the collection that are similar to a given reference time series. This type of query is particularly useful in prediction and forecasting applications. A key challenge for similarity queries is efficiency and for large datasets, it is important to develop efficient indexing techniques. Existing approaches in this area are mainly based on the Generic Multimedia Indexing Method (GEMINI), which is a framework that uses spatial indexes such as the R-tree to index reduced time series. For processing a similarity query, the index is first used to prune candidate time series using a lower bounding distance. Then, all remaining time series are compared using the original similarity measure, to derive the query result. Performance within this framework depends on the tightness of the lower bounding distance with respect to the similarity measure. Indeed much work has been focused on representation and dimensionality reduction, in order to provide a tighter lower bounding distance. Existing work, however, has not used employed dimensionality reduction in a flexible way, requiring all time series to be reduced to have the same dimension. In contrast, in this thesis, we investigate the possibility of allowing a variable dimension reduction. To this end, we develop a new and more flexible tree based indexing structure called the Multi-Resolution Index (MR-Index), which allows dimensionality to vary across different levels of the tree. We provide efficient algorithms for querying, building and maintaining this structure. Through an experimental analysis, we show that the MR-Index can deliver improved query efficiency compared to the traditional R-tree index, using both the Euclidean and dynamic time warping similarity measures.
  • Item
    Thumbnail Image
    A dynamic approximate representation scheme for streaming time series
    Zhou, Pu ( 2009)
    The huge volume of time series data generated in many applications poses new challenges in the techniques of data storage, transmission, and computation. Further more, when the time series are in the form of streaming data, new problems emerge and new techniques are required because of the streaming characteristics, e.g. high volume, high speed and continuous flowing. Approximate representation is one of the most efficient and effective solutions to address the large-volume-high-speed problem. In this thesis, we propose a dynamic representation scheme for streaming time series. Existing methods use a unitary function form for the entire approximation task. In contrast, our method adopts a set of function candidates such as linear function, polynomial function(degree ≥ 2), and exponential function. We provide a novel segmenting strategy to generate subsequences and dynamically choose candidate functions to approximate the subsequences. Since we are dealing with streaming time series, the segmenting points and the corresponding approximate functions are incrementally produced. For a certain function form, we use a buffer window to find the local farthest possible segmenting point under a user specified error tolerance threshold. To achieve this goal, we define a feasible space for the coefficients of the function and show that we can indirectly find the local best segmenting point by the calculation in the coefficient space. Given the error tolerance threshold, the candidate function representing more information by unit parameter is chosen as the approximate function. Therefore, our representation scheme is more flexible and compact. We provide two dynamic algorithms, PLQS and PLQES, which involve two and three candidate functions, respectively. We also present the general strategy of function selection when more candidate functions are considered. In the experimental test, we examine the effectiveness of our algorithms with synthetic and real time series data sets. We compare our method with the piecewise linear approximation method and the experimental results demonstrate the evident superiority of our dynamic approach under the same error tolerance threshold.