Spline techniques for incomplete and complex data
Affiliation: School of Mathematics and Statistics
Document Type: PhD thesis
Access Status: This item is embargoed and will be available on 2021-03-07.
© 2018 Dr. Wei Huang
We consider incomplete data problems in two complex data contexts: group testing data and functional data. In the group testing context, we consider estimating the conditional prevalence of a disease from data pooled according to the group testing mechanism. Consistent estimators have been proposed in the literature, but they rely on the data being available for all individuals. In infectious disease studies, where group testing is frequently applied, the covariate is often missing for some individuals. In that case, unless the data are missing completely at random, applying the existing techniques to the complete cases without adjusting for missingness does not generally provide consistent estimators, and finding appropriate modifications is challenging. We develop a consistent adjusted spline estimator, derive its theoretical properties, and show how to adapt local polynomial and likelihood estimators to the missing data problem. We illustrate the numerical performance of our methods on simulated and real examples.
In the functional data context, we consider the problem of estimating the covariance function of functional data that are observed only on a subset of their domain, for example in the form of fragments observed on a small interval. Typically in this setting, no curve is observed on the entire domain, so the empirical covariance function, or smooth versions of it, can be computed only on a subset of its domain, which typically consists of a diagonal band. We show that estimating the covariance function consistently outside that subset is possible, and we introduce conditions under which the covariance function is identifiable on its entire domain from the incomplete data. We propose to estimate the covariance on the observed subdomain first and then extrapolate it to the entire domain by a tensor product series approximation.
While implementing our idea for covariance estimation from incomplete functional data, we found that the final extrapolated estimator was sensitive to the covariance estimator on the observed subdomain and that some smoothing over the subdomain was sometimes needed. However, smoothing over those subdomains is not straightforward, since they are irregularly shaped and can even have interior gaps, where conventional smoothing methods are usually not applicable. We propose a tensor product spline technique adapted to irregularly shaped domains with interior gaps, and give a detailed account of how to construct the estimator in practice. We also review smoothing methods for irregularly shaped domains from the literature, and investigate and compare the finite sample properties of these techniques.