Evaluation of multiple imputation methods for dealing with missing longitudinal data
Author
De Silva, Anurika Priyanjali
Date
2018
Affiliation
Melbourne School of Population and Global Health
Document Type
PhD thesis
Access Status
This item is embargoed and will be available on 2021-02-13. This item is currently available to University of Melbourne staff and students only, login required.
Description
© 2018 Dr. Anurika Priyanjali De Silva
Abstract
Background: Missing data are a common problem in epidemiological studies and are especially prominent in longitudinal cohorts, as these studies require the participation of respondents at multiple waves. The statistical literature contains extensive research on handling missing data at a single time point, with multiple imputation (MI) being a widely used approach. However, there is limited guidance on using MI in complex longitudinal settings. My PhD focuses on the evaluation of MI methods in three specific settings commonly encountered in practice: 1) a time-dependent exposure with a non-linear trajectory over time, 2) a longitudinal categorical exposure with restrictions on transitions over time, and 3) a time-dependent outcome variable in a longitudinal study with sampling weights.
Methods: I evaluated three MI methods currently available in the Stata statistical software: multivariate normal imputation (MVNI), fully conditional specification (FCS) and the two-fold fully conditional specification (two-fold FCS) algorithm. When handling missing longitudinal data, MVNI and FCS treat repeated measurements of the same variable as distinct variables, and they face convergence issues when there are many time points or many incomplete variables. The two-fold FCS algorithm was introduced to overcome these limitations, as it uses only information from the current and adjacent time points for imputation.
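The difference between the wide-format approach and the two-fold approach can be pictured as a choice of which waves contribute to the imputation model for a given wave. The sketch below is only an illustration in Python, not the Stata implementation evaluated in the thesis; the function name `predictor_waves` and the `window` parameter are assumptions introduced here.

```python
def predictor_waves(target_wave, n_waves, window=None):
    """Return the waves whose data enter the imputation model for target_wave.

    window=None mimics the wide-format approach, where repeated measurements
    from every wave are treated as distinct variables and all are used.
    window=w keeps only the current wave and waves within w of it, in the
    spirit of a two-fold scheme. Both options are illustrative assumptions.
    """
    waves = range(1, n_waves + 1)
    if window is None:
        return list(waves)
    return [w for w in waves if abs(w - target_wave) <= window]


# Wave 4 of a hypothetical 7-wave study.
all_waves = predictor_waves(4, 7)           # every wave contributes
adjacent = predictor_waves(4, 7, window=1)  # current and adjacent waves only
wider = predictor_waves(4, 7, window=2)     # a wider time window
print(all_waves, adjacent, wider)
```

With many waves, the windowed version keeps the number of predictors per imputation model roughly constant, which is the motivation the abstract gives for the two-fold algorithm.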
In each scenario, various versions of the MI methods were evaluated using comprehensive simulation studies based on the Longitudinal Study of Australian Children (LSAC). The performance of these methods was evaluated for varying percentages of missing data where data were either missing completely at random or missing at random. The methods were also compared using case studies from LSAC.
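As a rough illustration of the two missingness mechanisms, the following Python sketch generates missing-data indicators under each. It is not the simulation design used in the thesis: the sample size, the logistic form of the missing-at-random model, and the coefficient values are all assumptions chosen for the example.

```python
import math
import random


def mcar_mask(n, prob, rng):
    """Missing completely at random: every value has the same chance of being missing."""
    return [rng.random() < prob for _ in range(n)]


def mar_mask(covariate, intercept, slope, rng):
    """Missing at random: the chance of missingness depends on an observed covariate."""
    mask = []
    for x in covariate:
        prob = 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
        mask.append(rng.random() < prob)
    return mask


def rate(mask, keep):
    """Proportion missing among the records flagged in keep."""
    chosen = [m for m, k in zip(mask, keep) if k]
    return sum(chosen) / len(chosen)


rng = random.Random(2018)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # hypothetical fully observed covariate

mcar = mcar_mask(len(x), 0.3, rng)
mar = mar_mask(x, intercept=-1.0, slope=1.5, rng=rng)

high = [xi > 0 for xi in x]
low = [xi <= 0 for xi in x]
print("MCAR missing rate (high x, low x):", rate(mcar, high), rate(mcar, low))
print("MAR missing rate (high x, low x):", rate(mar, high), rate(mar, low))
```

Under the first mechanism the missing rate should be similar in both halves of the covariate, whereas under the second it differs markedly between them.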
Results: MVNI and FCS performed adequately when handling the incomplete time-dependent exposure with a non-linear trajectory, demonstrating the importance of including as much information as possible in the imputation model. If faced with convergence problems, the two-fold FCS may be used as long as there is a sufficiently large time window to capture the non-linear trajectory.
Predictive mean matching within the FCS framework performed best for imputing an incomplete categorical variable with restrictions over time, while all other implementations of FCS faced convergence problems. MVNI followed by rounding to transform non-integer imputed values into the original categories resulted in biased estimates. Accounting for the restrictions within the imputation procedure was found to be important.
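One way to see why rounding can conflict with restrictions on transitions is to round each wave independently and then check the resulting sequence. The Python sketch below uses a made-up restriction (the category may never decrease between waves) and made-up imputed values; neither is taken from the thesis, and the sketch does not claim to explain the bias reported above.

```python
def round_to_category(value, categories):
    """Map a continuous imputed value to the nearest permitted category code."""
    return min(categories, key=lambda c: abs(c - value))


def respects_restriction(sequence):
    """Hypothetical restriction: the category may never decrease between waves."""
    return all(a <= b for a, b in zip(sequence, sequence[1:]))


categories = [0, 1, 2]
# Made-up continuous draws for one participant across four waves.
imputed = [0.4, 1.6, 1.2, 2.7]

rounded = [round_to_category(v, categories) for v in imputed]
print(rounded)                        # [0, 2, 1, 2]
print(respects_restriction(rounded))  # False: wave 3 drops from 2 to 1
```

Because each wave is rounded without reference to its neighbours, nothing prevents a sequence that the restriction rules out.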
All implementations of FCS incorporating sampling weights faced convergence issues. In contrast, MVNI performed well when the design variables used to generate the sampling weights were included in the imputation model. If these variables are unknown, information from the sampling weights can still be incorporated by including either the sampling weights or the design stratum indicator as a fixed effect in the imputation model.
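Including the stratum indicator or the sampling weight as a fixed effect amounts to adding extra columns to the predictors of the imputation model. The Python sketch below shows that bookkeeping only; the function `imputation_row`, the stratum labels, and the covariate values are hypothetical and do not come from the thesis or from LSAC.

```python
def imputation_row(covariates, stratum=None, strata=(), weight=None):
    """Build one row of predictors for a hypothetical imputation model.

    If a stratum is given, it enters as dummy indicators, with the first
    entry of strata as the reference level. If a weight is given, it is
    appended as one extra continuous covariate.
    """
    row = list(covariates)
    if stratum is not None:
        row += [1.0 if stratum == s else 0.0 for s in strata[1:]]
    if weight is not None:
        row.append(float(weight))
    return row


strata = ["A", "B", "C"]  # hypothetical design strata
base = [0.7, 12.0]        # hypothetical covariates for one participant

row_stratum = imputation_row(base, stratum="B", strata=strata)
row_weight = imputation_row(base, weight=1.8)
print(row_stratum)  # [0.7, 12.0, 1.0, 0.0]
print(row_weight)   # [0.7, 12.0, 1.8]
```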
Conclusions: While the existing MI methods, MVNI and FCS, can be used to handle incomplete longitudinal data, their performance varies depending on the scenario, suggesting that MI should be customised to the given setting. This research provides guidance on how to do so in specific scenarios, contributing to the literature on missing data methodology. Researchers are encouraged to be aware of these developments to ensure that missing data are handled effectively.
Keywords
multiple imputation; missing longitudinal data; multivariate normal imputation; fully conditional specification; sampling weights; restricted categorical variables; non-linear trajectories