Autoregressive generative models and multi-task learning with convolutional neural networks
Affiliation: Computing and Information Systems
Document Type: PhD thesis
Access Status: Open Access
© 2018 Dr Florin Schimbinschi
At a high level, sequence modelling problems are those in which the model aims to predict the next element of a sequence based on neighbouring items. Common applications include time-series forecasting, language modelling, machine translation and, more recently, adversarial learning. One main characteristic of such models is that they assume there is an underlying learnable structure behind the data generation process, as is the case for language. Therefore, the models used have to go beyond traditional linear or discrete hidden state models. Convolutional Neural Networks (CNNs) are the de facto state of the art in computer vision. In contrast, for sequence modelling and multi-task learning (MTL) problems, the most common choice is Recurrent Neural Networks (RNNs). In this thesis I show that causal CNNs can be successfully and efficiently used for a broad range of sequence modelling and multi-task learning problems. This is supported by applying CNNs to two very different domains, which highlight their flexibility and performance: 1) traffic forecasting in the context of highly dynamic road conditions, with non-stationary data, normal granularity (sampling rate) and a high spatial volume of related tasks; 2) learning musical instrument synthesisers, with stationary data, a very high granularity (high sampling rate raw waveforms) and thus a high temporal volume, and conditional side information. In the first case, the challenge is to leverage the complex interactions between tasks while keeping the streaming (online) forecasting process tractable and robust to faults and changes (adding or removing tasks). In the second case, the problem is closely related to language modelling but much more challenging since, unlike words, multiple musical notes can be played at the same time. With the rise of the Internet of Things (IoT) and Big Data becoming more common, new challenges arise.
In the first part of this thesis, the four Vs of Big Data (Volume, Velocity, Variety and Veracity) are studied in the context of multi-task learning for spatio-temporal (ST) prediction problems. Traditionally such problems are addressed with static, non-modular linear models that do not leverage Big Data. I discuss what the four Vs imply for multi-task ST problems and show how CNNs can be set up as efficient classifiers for such problems, provided the quantization is properly set up for non-stationary data. While the first part is predominantly data-centric, focused on aspects such as Volume (is it useful?) and Veracity (how to deal with missing data?), the second part of the thesis addresses the Velocity and Variety challenges. I also show that even for prediction problems set up as regression, causal CNNs are still the best performing model compared with state-of-the-art algorithms such as SVRs and more traditional methods such as ARIMA. I introduce TRU-VAR (Topologically Regularized Universal Vector AutoRegression) which, as I show, is a robust, versatile real-time multi-task forecasting framework that leverages domain-specific knowledge (task topology), Variety (task diversity) and Velocity (online training). Finally, the last part of this thesis is focused on generative CNN models. The main contribution is the SynthNet architecture, which is the first capable of learning musical instrument synthesisers end-to-end. The architecture is derived by following a parsimonious approach (reducing complexity) and via an in-depth analysis of the learned representations of the baseline architectures. I show that the 2D projection of each layer's Gram activations can correspond to resonating frequencies (which give each musical instrument its timbre). SynthNet trains much faster and its generation accuracy is much higher than that of the baselines. The generated waveforms are almost identical to the ground truth.
This has implications in other domains where the goal is to generate data with properties similar to those of the data generation process (i.e. adversarial examples). In summary, this thesis makes contributions towards multi-task spatio-temporal time series problems with causal CNNs (set up as both classification and regression) and generative CNN models. The achievements of this thesis are supported by publications which contain an extensive set of experiments and theoretical foundations.
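As a rough illustration of what "causal" means for the CNNs discussed in this abstract, the sketch below computes a one-dimensional convolution whose output at time t depends only on the current and earlier samples. The function name `causal_conv1d`, the NumPy implementation and the `dilation` parameter are assumptions made for this example; they are not taken from the thesis and do not reproduce the TRU-VAR or SynthNet architectures.

```python
import numpy as np


def causal_conv1d(x, w, dilation=1):
    """Hypothetical helper: y[t] = sum_k w[k] * x[t - k * dilation].

    Samples before the start of the sequence are treated as zeros, so the
    output has the same length as the input and never looks at the future.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    pad = (len(w) - 1) * dilation          # left padding only
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros(len(x))
    for k, wk in enumerate(w):
        start = pad - k * dilation         # shift the input k * dilation steps back
        y += wk * xp[start:start + len(x)]
    return y


x = np.array([1.0, 2.0, 3.0, 4.0])
y1 = causal_conv1d(x, [1.0, 1.0])              # x[t] + x[t-1]
y2 = causal_conv1d(x, [1.0, 1.0], dilation=2)  # x[t] + x[t-2]
```

Because the padding is applied only on the left, changing a later sample cannot affect earlier outputs, which is what makes such a layer usable for next-step prediction. Stacking layers of kernel size 2 with dilations 1, 2 and 4 would give a receptive field of 8 samples.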
Keywords: deep learning; neural networks; generative models; autoregressive generative; time series; multi-task learning; convolutional neural networks; multivariate regression; vector autoregression; traffic forecasting; audio synthesisers; music; traffic; cars; urban mobility