We review a number of issues regarding missing data treatments for intervention and prevention researchers. Many of the common missing data practices in prevention research are still, unfortunately, ill-advised (e.g., use of listwise and pairwise deletion, insufficient use of auxiliary variables). Our goal is to promote better practice in the handling of missing data. We review the current state of missing data methodology and recent missing data reporting in prevention research. We describe antiquated, ad hoc missing data treatments and discuss their limitations. We discuss two modern, principled missing data treatments: multiple imputation and full information maximum likelihood, and we offer practical tips on how to best employ these methods in prevention research. The principled missing data treatments that we discuss are couched in terms of how they improve causal and statistical inference in the prevention sciences. Our recommendations are firmly grounded in missing data theory and well-validated statistical principles for handling the missing data issues that are ubiquitous in biosocial and prevention research. We augment our broad survey of missing data analysis with references to more exhaustive resources.
Keywords: Missing data, Multiple imputation, Full information maximum likelihood, Auxiliary variables, Intent-to-treat, Statistical inference
The authors wish to acknowledge the diligent assistance of Jacob Curtis, Brooke Bell, Naomi Norwid, Virginia Stokes, and Jacquelyn Wall in preparing the systematic literature review presented in this article.
Compliance with Ethical Standards
Conflict of Interest
Todd D. Little owns and receives remuneration from Yhat Enterprises (yhatenterprises.com), which runs educational workshops such as Stats Camp (statscamp.org), and processes his royalties and his fees for consulting on statistics and methods with life science researchers.
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed consent was obtained from all individual participants included in the study.
This study was supported by grant NSF 1053160 (Wei Wu and Todd D. Little, co-PIs) and by the Institute for Measurement, Methodology, Analysis, and Policy (Todd D. Little, Director) at Texas Tech University.