TY - THES
AU - Kwok, Chun Fung
Y2 - 2019/10/15
Y1 - 2019
UR - http://hdl.handle.net/11343/228925
AB - This thesis examines three problems in statistics: the missing data problem in the context of extracting trends from time series data, the combinatorial model selection problem in regression analysis, and the structure learning problem in graphical modelling / system identification.
The first problem studies how uncertainty in the missing data affects trend extraction. This work derives an analytical bound that characterises the error of the estimated trend in terms of the error of the imputation. The bound holds for any imputation method and for various trend-extraction methods, including a large subclass of linear filters and the Seasonal-Trend decomposition based on Loess (STL).
The second problem concerns the combinatorial complexity that arises in best-subset selection in regression analysis. Given p variables, a model can be formed by taking any subset of the variables, so the total number of models is $2^p$. This work shows that if a hierarchical structure can be established on the model space, then the proposed algorithm, Gibbs Stochastic Search (GSS), can recover the true model with probability one in the limit and with high probability in finite samples. The core idea is that, when a hierarchical structure exists, every evaluation of a wrong model gives information about the correct model. By aggregating this information, one may recover the correct model without exhausting the model space. As an extension, parallelisation of the algorithm is also considered.
The third problem concerns inferring, from data, the systemic relationships among a set of variables. This work proposes a flexible class of multivariate distributions in the form of a directed acyclic graphical model, which uses a graph and models each node conditional on the rest using a Generalised Linear Model (GLM). It shows that, while the number of possible graphs is $\Omega(2^{p \choose 2})$, a hierarchical structure exists and the GSS algorithm applies. Hence, systemic relationships may be recovered from the data. Other applications, such as imputing missing data and simulating data with complex covariance structure, are also investigated.
KW - trend extraction
KW - STL
KW - Loess
KW - linear filters
KW - time series analysis
KW - missing data analysis
KW - combinatorial model selection
KW - stochastic search
KW - generalised linear model
KW - GLM
KW - best subset model selection
KW - Gibbs sampler
KW - Markov chain Monte Carlo
KW - structure learning
KW - graphical models
T1 - Missing data analysis, combinatorial model selection and structure learning
L1 - /bitstream/handle/11343/228925/c8ffd773-bd55-e911-949d-0050568d0279_thesis_final_submission.pdf?sequence=1&isAllowed=y
ER -