## Missing data analysis, combinatorial model selection and structure learning

##### Author

Kwok, Chun Fung

##### Date

2019

##### Affiliation

School of Mathematics and Statistics

##### Document Type

PhD thesis

##### Access Status

**Open Access**

##### Description

© 2019 Chun Fung Kwok

##### Abstract

This thesis examines three problems in statistics: the missing data problem in the context of extracting trends from time series data, the combinatorial model selection problem in regression analysis, and the structure learning problem in graphical modelling / system identification.
The goal of the first problem is to study how uncertainty in the missing data affects trend extraction. This work derives an analytical bound that characterises the error of the estimated trend in terms of the error of the imputation. The bound holds for any imputation method and for various trend-extraction methods, including a large subclass of linear filters and the Seasonal-Trend decomposition based on Loess (STL).
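
As a rough illustration of how imputation error can propagate into a trend estimate, the sketch below uses a centred moving average, one simple linear filter. It is not the bound derived in the thesis: the function `moving_average`, the window length and the data are all hypothetical, and the inequality checked is only the elementary fact that an average with non-negative weights summing to one cannot amplify the worst-case error.

```python
def moving_average(series, window=3):
    """Centred moving average; endpoints use the part of the window that fits."""
    half = window // 2
    trend = []
    for t in range(len(series)):
        lo, hi = max(0, t - half), min(len(series), t + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    return trend


# Hypothetical data: the true series and an imputed copy in which
# positions 2 and 5 were missing and were filled in with some error.
truth = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
imputed = [1.0, 2.0, 3.6, 4.0, 5.0, 5.5, 7.0]

# Worst-case error of the imputation and of the resulting trend estimate.
imputation_error = max(abs(a - b) for a, b in zip(truth, imputed))
trend_error = max(
    abs(a - b) for a, b in zip(moving_average(truth), moving_average(imputed))
)

# Each filtered value is a weighted average with non-negative weights that
# sum to one, so the trend error cannot exceed the imputation error.
assert trend_error <= imputation_error + 1e-12
print(f"imputation error: {imputation_error:.3f}, trend error: {trend_error:.3f}")
```
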
The second problem is to tackle the combinatorial complexity that arises from best-subset selection in regression analysis. Given $p$ variables, a model can be formed by taking a subset of the variables, and the total number of models is $2^p$. This work shows that if a hierarchical structure can be established on the model space, then the proposed algorithm, Gibbs Stochastic Search (GSS), can recover the true model with probability one in the limit and with high probability with finite samples. The core idea is that when a hierarchical structure exists, every evaluation of a wrong model gives information about the correct model. By aggregating this information, one may recover the correct model without exhausting the model space. As an extension, parallelisation of the algorithm is also considered.
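
To make the size of the model space concrete, the following sketch enumerates every subset of a small, hypothetical set of variables and confirms the $2^p$ count. It illustrates only the exhaustive search that best-subset selection would require; it is not an implementation of GSS, and the variable names are placeholders.

```python
from itertools import combinations


def all_subsets(variables):
    """Yield every subset of the variables, from the empty model to the full model."""
    for size in range(len(variables) + 1):
        yield from combinations(variables, size)


variables = ["x1", "x2", "x3", "x4"]  # hypothetical variable names
models = list(all_subsets(variables))

# With p variables there are 2**p candidate models.
assert len(models) == 2 ** len(variables)

# The count doubles with every added variable, so exhaustive
# evaluation becomes infeasible quickly as p grows.
for p in (10, 20, 30, 40):
    print(f"p = {p}: {2 ** p} candidate models")
```
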
The third problem is about inferring from data the systemic relationship between a set of variables. This work proposes a flexible class of multivariate distributions in the form of a directed acyclic graphical model, in which each node of the graph is modelled conditionally on the rest using a Generalised Linear Model (GLM). It shows that although the number of possible graphs is $\Omega(2^{p \choose 2})$, a hierarchical structure exists and the GSS algorithm applies. Hence, a systemic relationship may be recovered from the data. Other applications, such as imputing missing data and simulating data with complex covariance structure, are also investigated.
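
The lower bound on the number of graphs can be motivated by a standard counting argument, sketched here under the assumption that the graphs are directed acyclic graphs on $p$ labelled nodes (this is not necessarily the derivation used in the thesis). Fix the ordering $1, 2, \dots, p$ and allow only edges $i \to j$ with $i < j$. Any such graph is acyclic, and each of the ${p \choose 2}$ pairs can independently carry an edge or not, so

$$
\#\{\text{DAGs on } p \text{ labelled nodes}\} \;\ge\; 2^{p \choose 2} = 2^{p(p-1)/2}.
$$

For $p = 3$ this gives at least $2^3 = 8$ graphs (the exact count is 25), and the bound already exceeds $10^{13}$ at $p = 10$, since $2^{45} \approx 3.5 \times 10^{13}$.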

##### Keywords

trend extraction; STL; Loess; linear filters; time series analysis; missing data analysis; combinatorial model selection; stochastic search; generalised linear model; GLM; best subset model selection; Gibbs sampler; Markov chain Monte Carlo; structure learning; graphical models
