Computing and Information Systems - Theses


Search Results

Now showing 1 - 5 of 5
  • Item
    Autoregressive generative models and multi-task learning with convolutional neural networks
    Schimbinschi, Florin ( 2018)
    At a high level, sequence modelling problems are of the form where the model aims to predict the next element of a sequence based on neighbouring items. Common applications include time-series forecasting, language modelling, machine translation and, more recently, adversarial learning. A central assumption of such models is that there is an underlying learnable structure behind the data generation process, as there is for language. The models used therefore have to go beyond traditional linear or discrete hidden state models. Convolutional Neural Networks (CNNs) are the de facto state of the art in computer vision. Conversely, for sequence modelling and multi-task learning (MTL) problems, the most common choice is Recurrent Neural Networks (RNNs). In this thesis I show that causal CNNs can be successfully and efficiently used for a broad range of sequence modelling and multi-task learning problems. This is supported by applying CNNs to two very different domains, which highlight their flexibility and performance: 1) traffic forecasting in the context of highly dynamic road conditions, with non-stationary data, a normal granularity (sampling rate) and a high spatial volume of related tasks; 2) learning musical instrument synthesisers from stationary data with a very high granularity (raw waveforms at a high sampling rate), and thus a high temporal volume, together with conditional side information. In the first case, the challenge is to leverage the complex interactions between tasks while keeping the streaming (online) forecasting process tractable and robust to faults and changes (adding or removing tasks). In the second case, the problem is closely related to language modelling, although considerably harder since, unlike words, multiple musical notes can be played at the same time. With the rise of the Internet of Things (IoT) and the growing prevalence of Big Data, new challenges arise. The four V's of Big Data (Volume, Velocity, Variety and Veracity) are studied in the context of multi-task learning for spatio-temporal (ST) prediction problems. These aspects are studied in the first part of this thesis. Traditionally, such problems are addressed with static, non-modular linear models that do not leverage Big Data. I discuss what the four V's imply for multi-task ST problems and finally show how CNNs can be set up as efficient classifiers for such problems, provided the quantization is properly set up for non-stationary data. While the first part is predominantly data-centric, focused on aspects such as Volume (is it useful?) and Veracity (how to deal with missing data?), the second part of the thesis addresses the Velocity and Variety challenges. I also show that even for prediction problems set up as regression, causal CNNs remain the best-performing model compared to state-of-the-art algorithms such as SVRs and more traditional methods such as ARIMA. I introduce TRU-VAR (Topologically Regularized Universal Vector AutoRegression), which, as I show, is a robust and versatile real-time multi-task forecasting framework that leverages domain-specific knowledge (task topology), Variety (task diversity) and Velocity (online training). Finally, the last part of this thesis is focused on generative CNN models. The main contribution is the SynthNet architecture, which is the first capable of learning musical instrument synthesisers end-to-end.
The architecture is derived by following a parsimonious approach (reducing complexity) and via an in-depth analysis of the learned representations of the baseline architectures. I show that the 2D projection of each layer's Gram activations can correspond to resonating frequencies (which give each musical instrument its timbre). SynthNet trains much faster and its generation accuracy is much higher than that of the baselines. The generated waveforms are almost identical to the ground truth. This has implications in other domains where the goal is to generate data with properties similar to those of the data generation process (e.g. adversarial examples). In summary, this thesis makes contributions towards multi-task spatio-temporal time series problems with causal CNNs (set up as both classification and regression) and generative CNN models. The achievements of this thesis are supported by publications which contain an extensive set of experiments and theoretical foundations.
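    The causal-CNN idea this abstract builds on can be illustrated with a short sketch. The block below is a minimal dilated causal convolution stack in PyTorch, not the thesis's architecture; the layer widths, kernel sizes and the CausalConv1d name are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1D convolution that only sees past samples (left padding only)."""
    def __init__(self, channels_in, channels_out, kernel_size, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation  # amount of left padding
        self.conv = nn.Conv1d(channels_in, channels_out, kernel_size,
                              dilation=dilation)

    def forward(self, x):
        # x: (batch, channels, time); prepend zeros so the output at step t
        # depends only on inputs at steps <= t.
        x = nn.functional.pad(x, (self.pad, 0))
        return self.conv(x)

# A small dilated causal stack, e.g. for next-step traffic forecasting.
model = nn.Sequential(
    CausalConv1d(1, 16, kernel_size=2, dilation=1), nn.ReLU(),
    CausalConv1d(16, 16, kernel_size=2, dilation=2), nn.ReLU(),
    CausalConv1d(16, 16, kernel_size=2, dilation=4), nn.ReLU(),
    nn.Conv1d(16, 1, kernel_size=1),  # per-step prediction head
)

x = torch.randn(8, 1, 128)   # batch of 8 univariate series, 128 time steps
y_hat = model(x)             # same length; y_hat[..., t] uses only x[..., :t+1]
```

    Doubling the dilation at each layer grows the receptive field exponentially, which is what lets such models cover long histories at high sampling rates (as in the raw-waveform setting) without recurrence.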
  • Item
    Scalable and accurate forecasting for smart cities
    Karunaratne, Pasan Manura ( 2018)
    Cities are getting bigger, better and smarter. The increased connectivity of people and devices and the availability of cheap sensors have led to a surge in public and government interest in smart city initiatives. This public interest, along with the recent increased interest in machine learning techniques, has led to a growing research focus on the mining and analysis of data in smart city settings. Much of the analysis in smart city settings is based on forecasting from time series data recorded by smart sensors for planning purposes. For example, utility companies can use electricity load forecasting on smart meter data for capacity planning, and prediction of pedestrian counts and passenger flow in public transportation systems can help in planning to reduce traffic congestion. Though forecasting in smart city settings yields such benefits, it also entails unique challenges: multi-step prediction; low-quality training data caused by sensor vandalism, malfunction or communication failures; and maintaining predictive throughput in systems involving increasingly large numbers of smart sensors. Improving accuracy is a primary goal in any forecasting task, and is especially challenging in multi-step prediction scenarios. We address this challenge by providing new methods to incorporate prior knowledge uniquely relevant to smart cities, such as the periodic behaviour of sensor time series data over the Monday-Friday working week. Specifically, we propose novel kernel function compositions which incorporate such prior knowledge into kernel-based Bayesian forecasting techniques, with the goal of improving prediction accuracy and robustness to spurious data. We develop our kernel compositions for the state-of-the-art Gaussian Process Regression technique. The new kernel compositions we develop enable prior knowledge relating to multiple periodic effects of the working week (e.g. daily, weekly and holiday effects) and their interactions to be incorporated in the same model. We also provide methods to mitigate the effects of convergence to local optima in the optimisation process over the hyperparameters used in the Gaussian Process models. We address the challenges relating to missing training data in smart city settings by making use of data from other related sensors (which may have more complete data) to mitigate the impact that low-quality data has on prediction accuracy. To this end, we develop multi-task learning methods (which are able to learn joint representations from multiple sensors) to improve Gaussian Process Regression prediction accuracy with missing training data values. We also provide equivalent expressions for our multi-task learning methods as combinations of commonly used kernel functions in Gaussian Processes. This enables the straightforward implementation of these methods in popular machine learning toolkits. We address the scalability challenge of large volumes of sensor data in two steps. First, we focus on an interpretable label-based forecasting algorithm which allows for high-throughput predictions due to the minimal number of operations required in the forecasting stage. We perform numerous enhancements on this algorithm in order to improve its prediction accuracy, including filtering, windowing and ensembling methods as well as methods of incorporating exogenous variables. Our scalable forecasting methods are then developed using this enhanced base algorithm.
Second, we develop methods which enable the initial step of the algorithm to be performed using algorithms developed for stream processing, which not only allows the algorithm to be parallelised across multiple machines, but also enables it to run on real-time data streams. We address the scalability challenges in scenarios with both a single fast stream and a large number of streams, especially with regard to synchronisation issues between multiple machines. We demonstrate the effectiveness of our methods on multiple real-world, publicly available datasets to illustrate the potential generalisability of our techniques.
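    A minimal sketch of the kind of periodic kernel composition described above, using scikit-learn's Gaussian process regression; the specific periods (hourly data, daily and weekly cycles) and the kernel mix are illustrative assumptions rather than the exact compositions developed in the thesis.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

# Time measured in hours; encode daily and weekly periodic structure,
# modulated by a slowly varying trend, plus observation noise.
daily = ExpSineSquared(length_scale=1.0, periodicity=24.0)
weekly = ExpSineSquared(length_scale=1.0, periodicity=24.0 * 7)
trend = RBF(length_scale=24.0 * 30)
noise = WhiteKernel(noise_level=1.0)

kernel = trend * (daily + weekly) + noise

# Four weeks of synthetic hourly observations with daily and weekly cycles.
t = np.arange(0, 24 * 28, dtype=float).reshape(-1, 1)
y = (np.sin(2 * np.pi * t[:, 0] / 24)
     + 0.5 * np.sin(2 * np.pi * t[:, 0] / (24 * 7))
     + 0.1 * np.random.default_rng(0).normal(size=len(t)))

gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(t, y)

# Multi-step forecast for the following day, with predictive uncertainty.
t_future = np.arange(24 * 28, 24 * 29, dtype=float).reshape(-1, 1)
mean, std = gp.predict(t_future, return_std=True)
```

    Summing the two ExpSineSquared terms lets daily and weekly effects be modelled jointly, while multiplying by the slow RBF term allows their amplitude to drift over time.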
  • Item
    Volatility homogenisation and machine learning for time series forecasting
    Kowalewski, Adam Waldemar ( 2016)
    Volatility homogenisation is a technique for observing a process at regular points in space. In other words, we are only interested in the process when it moves by a certain quantum. The intuition and empirical evidence behind this is that smaller movements are just noise and can be ignored, while larger movements carry the information from the underlying process. In this vein, we derive theoretical results showing volatility homogenisation to be a means of estimating the drift and volatility of theoretical processes, and verify these results by simulation. This demonstrates the ability of a “homogenised” process to retain salient information about the underlying process. Volatility homogenisation is then coupled, as a preprocessing step, with various machine learning techniques, which yields greater forecasting accuracy than when the machine learning techniques are used without volatility homogenisation preprocessing. In addition, we develop volatility homogenisation kernels for kernel-based machine learning techniques such as support vector machines, relevance vector machines and Gaussian processes. The volatility homogenisation kernel causes a kernel-based machine learning technique to utilise volatility homogenisation internally and, with it, obtain better predictions when forecasting the direction of a financial time series. In order to create and use the volatility homogenisation kernel, we develop a solution to the problem of a kernel taking inputs of differing dimension while still maintaining a convex solution to the model for techniques such as support vector machines, for a given set of parameters. Furthermore, we demonstrate the efficacy of volatility homogenisation as a way of investing successfully using a Kelly criterion strategy. The strategy makes use of the information inherent in a support vector machine model with a volatility homogenisation kernel in order to calculate the necessary parameters for the Kelly betting strategy. We also develop strategies which select additional features for the support vector machine through the use of a nearest neighbour strategy based on various measures of association. Overall, volatility homogenisation is a robust strategy for the decomposition of a process which allows various machine learning techniques to discern the main driving process inherent in a financial time series, leading to better forecasts and investment strategies.
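    The basic homogenisation step described above can be sketched in a few lines; the function name, the fixed quantum and the random-walk example are illustrative assumptions, and the kernel constructions built on top of it are not reproduced here.

```python
import numpy as np

def homogenise(prices, quantum):
    """Keep only the points where the price has moved by at least `quantum`
    since the last retained point; smaller moves are treated as noise."""
    kept = [0]
    last = prices[0]
    for i in range(1, len(prices)):
        if abs(prices[i] - last) >= quantum:
            kept.append(i)
            last = prices[i]
    idx = np.asarray(kept)
    return idx, prices[idx]

# Example: a noisy random walk; small fluctuations are dropped and only
# quantum-sized moves (the informative part) are retained.
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0.0, 0.2, size=1000))
idx, homogenised = homogenise(prices, quantum=1.0)
```

    The retained series can then be passed to any downstream learner as a preprocessing step; the kernel variant described in the abstract instead applies this idea inside the kernel itself.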
  • Item
    Online time series approximation and prediction
    Xu, Zhenghua ( 2012)
    In recent years, there has been rapidly increasing research interest in the management of time series data due to its importance in a variety of applications such as network traffic management, telecommunications, finance, sensor networks and location-based services. In this thesis, we focus on two important problems in the management of time series data: the online time series approximation problem and the online time series prediction problem. Time series approximation can reduce the space and computational cost of storing and transmitting time series data, and also reduce the data processing workload. Segmentation is one of the most commonly used methods to meet this requirement. However, while most current segmentation methods aim to minimize the holistic error between the approximation and the original time series, few works try to represent time series as compactly as possible with an error bound guarantee on each data point. Moreover, in many real-world situations, the patterns of the time series do not follow a constant rule, so using only one type of function may not yield the best compaction. Motivated by these observations, we propose an online segmentation algorithm which approximates time series by a set of different types of candidate functions (polynomials of different orders, exponential functions, etc.) and adaptively chooses the most compact one as the pattern of the time series changes. A challenge in this approach is to determine the approximation function on the fly (“online”). We therefore present a novel method to efficiently generate the compact approximation of a time series in an online fashion for several types of candidate functions. This method incrementally narrows the feasible coefficient spaces of candidate functions in coefficient coordinate systems so that each segment can be made as long as possible given an error bound on each data point. Extensive experimental results show that our algorithm generates more compact approximations of the time series with lower average errors than the state-of-the-art algorithm. Time series prediction aims to predict future values of a time series according to its previously observed values. In this thesis, we focus on a specific branch of time series prediction: the online locational time series prediction problem, i.e., predicting the final locations of locational time series according to their currently observed values on the fly. This problem is also called the destination prediction problem, and the systems used to solve it are called destination prediction systems. However, most current destination prediction systems are based on the traveling histories of specific users, so they are too ad hoc to accurately predict other users’ destinations. In our work, we propose the first generic destination prediction solution that provides accurate destination prediction services to all users based on the given user queries (i.e., the users’ current partial traveling trajectories), without knowing any traveling histories of the users. Extensive experiments validate that the prediction results of our method are more accurate than those of the competing method.
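    A simplified sketch of online segmentation under a per-point error bound. The thesis narrows the feasible coefficient spaces of several candidate function types incrementally; the version below uses only straight lines and refits at each step, which is less efficient but illustrates the same contract (segments made as long as possible subject to a per-point bound). Function and variable names are assumptions.

```python
import numpy as np

def fit_line(t, y):
    """Least-squares straight line through (t, y); a constant if only one point."""
    if len(t) < 2:
        return np.array([0.0, float(y[0])])
    return np.polyfit(t, y, deg=1)

def segment_online(values, eps):
    """Greedily grow each segment one point at a time and close it as soon as
    no straight line keeps every point within eps of the data."""
    segments, start = [], 0
    for end in range(2, len(values) + 1):
        t, y = np.arange(start, end), values[start:end]
        coeffs = fit_line(t, y)
        if np.max(np.abs(np.polyval(coeffs, t) - y)) > eps:
            # The prefix ending at end-1 was the longest valid segment.
            segments.append((start, end - 1,
                             fit_line(np.arange(start, end - 1), values[start:end - 1])))
            start = end - 1
    segments.append((start, len(values),
                     fit_line(np.arange(start, len(values)), values[start:])))
    return segments

# Example: each returned triple is (start index, end index, line coefficients).
series = np.sin(np.linspace(0, 6, 300)) + 0.01 * np.random.default_rng(0).normal(size=300)
print(len(segment_online(series, eps=0.05)), "segments")
```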
  • Item
    Multi-resolution indexing method for time series
    Ma, Mei ( 2010)
    Time series datasets are useful in a wide range of diverse real-world applications. Retrieving or querying from a collection of time series is a fundamental task, with a key example being the similarity query. A similarity query returns all time series from the collection that are similar to a given reference time series. This type of query is particularly useful in prediction and forecasting applications. A key challenge for similarity queries is efficiency, and for large datasets it is important to develop efficient indexing techniques. Existing approaches in this area are mainly based on the Generic Multimedia Indexing Method (GEMINI), a framework that uses spatial indexes such as the R-tree to index reduced time series. To process a similarity query, the index is first used to prune candidate time series using a lower bounding distance. Then, all remaining time series are compared using the original similarity measure to derive the query result. Performance within this framework depends on the tightness of the lower bounding distance with respect to the similarity measure. Indeed, much work has focused on representation and dimensionality reduction in order to provide a tighter lower bounding distance. Existing work, however, has not employed dimensionality reduction in a flexible way, requiring all time series to be reduced to the same dimension. In contrast, in this thesis, we investigate the possibility of allowing variable dimensionality reduction. To this end, we develop a new and more flexible tree-based indexing structure called the Multi-Resolution Index (MR-Index), which allows dimensionality to vary across different levels of the tree. We provide efficient algorithms for querying, building and maintaining this structure. Through an experimental analysis, we show that the MR-Index delivers improved query efficiency compared to the traditional R-tree index, using both the Euclidean and dynamic time warping similarity measures.
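    A minimal sketch of the GEMINI-style filter-and-refine query that the MR-Index builds on, using piecewise aggregate approximation (PAA) as the reduction and a flat scan in place of a tree; the variable-resolution index itself is not reproduced, and the function names are assumptions.

```python
import numpy as np

def paa(series, n_segments):
    """Piecewise aggregate approximation: mean of each equal-width segment.
    Assumes len(series) is divisible by n_segments."""
    return np.array([chunk.mean() for chunk in np.array_split(series, n_segments)])

def lower_bound(q_red, s_red, seg_len):
    """Distance in the reduced space; lower-bounds Euclidean distance on the originals."""
    return np.sqrt(seg_len * np.sum((q_red - s_red) ** 2))

def similarity_query(query, collection, radius, n_segments):
    seg_len = len(query) / n_segments
    q_red = paa(query, n_segments)
    results = []
    for series in collection:
        # Filter: prune with the cheap lower bound (no false dismissals).
        if lower_bound(q_red, paa(series, n_segments), seg_len) > radius:
            continue
        # Refine: confirm with the true Euclidean distance.
        if np.linalg.norm(query - series) <= radius:
            results.append(series)
    return results

# Example: 50 noisy sine waves queried with a clean reference series.
rng = np.random.default_rng(1)
data = [np.sin(np.linspace(0, 10, 128)) + 0.1 * rng.normal(size=128) for _ in range(50)]
query = np.sin(np.linspace(0, 10, 128))
hits = similarity_query(query, data, radius=2.0, n_segments=16)
```

    Because the reduced-space distance never exceeds the true Euclidean distance, the filter step can prune aggressively without false dismissals; the MR-Index organises such reduced representations at multiple resolutions within a single tree rather than scanning them flat.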