Computing and Information Systems - Theses

    Distribution enhanced data mining methods for time series forecasting
    Ristanoski, Goce (2014)
    Time series forecasting is an exciting research area whose challenge is to discover patterns in data that have been observed over time. Whether the series are stock market variables describing price behaviour, weather measurements used to issue a timely warning of an approaching cyclone, or radiation level measurements in a power plant that can help prevent a harmful disaster and save hundreds of lives, time series find useful applications in many diverse sciences and disciplines. The behaviour of a time series is susceptible to change, often reflected in distribution features such as the mean and variance. Although changes in a series are to be expected, they are mostly continuous in nature, which means that predictions up to a certain point in the future can be made with a high degree of accuracy. Understanding how these changes affect prediction accuracy is crucial to minimizing the forecast error. This thesis investigates how information about changes in a series can be utilised, and presents novel modifications to the learning process based on the knowledge gained about these changes. The thesis makes four main contributions, each delivering an innovative technique for time series analysis that incorporates distribution characteristics. In the first part of the thesis, we develop a pre-processing algorithm that uses distribution information and can easily accompany any prediction model. Our algorithm performs a fast and efficient reduction of the training samples based on changes in the mean of the distribution, leaving a set of samples with a higher concentration of useful information and less noise. In the next part of our work, we introduce an intelligent group-based error minimization algorithm, which simultaneously reduces both the mean and the variance of the forecast errors associated with groups of observations that have similar distributions. 
    We demonstrate how this sensitive grouping of samples reduces both the error and the variance of the error per group, as embodied in a modified linear regression algorithm. We then introduce a modified form of Support Vector Regression that detects samples likely to produce large errors and penalizes the loss for these samples using a time-sensitive loss, directly targeting a reduction of the variance of the forecast errors. This new approach achieves a competitive reduction in error variance and produces more accurate predictions, outperforming several state-of-the-art methods. Finally, we apply our technical methodologies to discrimination-aware learning, demonstrating how they can be modified and used in another research context.
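    The abstract does not give the formulation of the group-based linear regression, so the following is only a hedged sketch of one way a linear model can be made to penalise both the overall error and the spread of errors within groups. It assumes group labels are already known and adds a within-group error variance term weighted by a hypothetical parameter `lam`; the objective, the function names `group_variance_regression` and `group_error_variance`, and the closed-form solution are illustrative assumptions, not the thesis's algorithm.

```python
from typing import Dict, List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def _members(groups: Sequence[str]) -> Dict[str, List[int]]:
    """Map each (hypothetical) group label to the row indices it contains."""
    out: Dict[str, List[int]] = {}
    for i, g in enumerate(groups):
        out.setdefault(g, []).append(i)
    return out


def group_variance_regression(X, y, groups, lam):
    """Illustrative objective (assumption, not the thesis's method):
    minimise ||e||^2 + lam * sum_g sum_{i in g} (e_i - mean_g(e))^2, e = Xw - y.
    With C the block-diagonal within-group centring matrix (symmetric and
    idempotent), this equals e^T (I + lam*C) e, so w solves
    (X^T M X) w = X^T M y with M = I + lam*C."""
    n, d = len(X), len(X[0])
    M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for idx in _members(groups).values():
        k = len(idx)
        for i in idx:
            for j in idx:
                # entry of the centring matrix I_k - (1/k) * ones
                M[i][j] += lam * ((1.0 if i == j else 0.0) - 1.0 / k)
    MX = [[sum(M[i][t] * X[t][j] for t in range(n)) for j in range(d)] for i in range(n)]
    My = [sum(M[i][t] * y[t] for t in range(n)) for i in range(n)]
    A = [[sum(X[t][i] * MX[t][j] for t in range(n)) for j in range(d)] for i in range(d)]
    b = [sum(X[t][i] * My[t] for t in range(n)) for i in range(d)]
    return solve(A, b)


def group_error_variance(X, y, groups, w):
    """Sum over groups of squared deviations of errors from the group mean error."""
    e = [sum(xi * wi for xi, wi in zip(row, w)) - yi for row, yi in zip(X, y)]
    total = 0.0
    for idx in _members(groups).values():
        mean = sum(e[i] for i in idx) / len(idx)
        total += sum((e[i] - mean) ** 2 for i in idx)
    return total


# Usage: intercept column plus a time index, two groups of observations.
X = [[1.0, float(t)] for t in range(6)]
y = [0.1, 1.2, 1.9, 3.5, 3.8, 5.4]
groups = ["a", "a", "a", "b", "b", "b"]
w_ols = group_variance_regression(X, y, groups, 0.0)
w_grp = group_variance_regression(X, y, groups, 10.0)
print(group_error_variance(X, y, groups, w_ols), group_error_variance(X, y, groups, w_grp))
```

Setting `lam=0` recovers ordinary least squares; larger values trade some overall fit for errors that are more uniform within each group. The dense n-by-n matrix `M` is used only for readability; a practical version would exploit its block structure.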