Principled Missing Data Treatments

Lang, Kyle M.; Little, Todd D.

doi:10.1007/s11121-016-0644-5

Published: 04 April 2016

Prevention Science, volume 19, pages 284–294 (2018)

Kyle M. Lang¹ & Todd D. Little¹

Abstract

We review a number of issues regarding missing data treatments for intervention and prevention researchers. Many of the common missing data practices in prevention research are still, unfortunately, ill-advised (e.g., use of listwise and pairwise deletion, insufficient use of auxiliary variables). Our goal is to promote better practice in the handling of missing data. We review the current state of missing data methodology and recent missing data reporting in prevention research. We describe antiquated, ad hoc missing data treatments and discuss their limitations. We discuss two modern, principled missing data treatments: multiple imputation and full information maximum likelihood, and we offer practical tips on how to best employ these methods in prevention research. The principled missing data treatments that we discuss are couched in terms of how they improve causal and statistical inference in the prevention sciences. Our recommendations are firmly grounded in missing data theory and well-validated statistical principles for handling the missing data issues that are ubiquitous in biosocial and prevention research. We augment our broad survey of missing data analysis with references to more exhaustive resources.

Acknowledgments

The authors wish to acknowledge the diligent assistance of Jacob Curtis, Brooke Bell, Naomi Norwid, Virginia Stokes, and Jacquelyn Wall in preparing the systematic literature review presented in this article.

Author information

Authors and Affiliations

Institute for Measurement, Methodology, Analysis, and Policy, Texas Tech University, Lubbock, USA
Kyle M. Lang & Todd D. Little

Corresponding authors

Correspondence to Kyle M. Lang or Todd D. Little.

Ethics declarations

Conflict of Interest

Todd D. Little owns and receives remuneration from Yhat Enterprises (yhatenterprises.com), which runs educational workshops such as Stats Camp (statscamp.org), and processes his royalties and his fees for consulting on statistics and methods with life science researchers.

Ethical Approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed Consent

Informed consent was obtained from all individual participants included in the study.

Funding

This study was supported by grant NSF 1053160 (Wei Wu and Todd D. Little, co-PIs) and by the Institute for Measurement, Methodology, Analysis, and Policy (Todd D. Little, Director) at Texas Tech University.

About this article

Cite this article

Lang, K.M., Little, T.D. Principled Missing Data Treatments. Prev Sci 19, 284–294 (2018). https://doi.org/10.1007/s11121-016-0644-5

Issue Date: April 2018
DOI: https://doi.org/10.1007/s11121-016-0644-5
