Missing Data

  • Guangyu Tong
  • Fan Li
  • Andrew S. Allen
Living reference work entry


Missing data are common in randomized clinical trials. When missingness is not completely at random, a complete-case analysis that ignores the missing data process often yields biased estimates of the average treatment effect. This chapter defines the different missing data mechanisms, discusses their impact on inference, and presents statistical methods that address missing data, including likelihood-based analysis, inverse probability weighting, and imputation. Each of these methods models either the missingness process or the observed outcome distribution. A more robust approach that combines the strengths of both modeling strategies is also introduced. This approach is doubly robust: it yields a consistent estimate of the average treatment effect if either the missingness model or the outcome model is correctly specified, and the two need not both be correct. The chapter concludes with a brief discussion of sensitivity analyses used to assess the impact of unmeasured factors that affect both missingness and outcomes. Throughout, statistical and practical considerations are discussed in the context of randomized clinical trials in which the primary analysis compares two treatments and estimates the average comparative effect in the enrolled population.
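
The contrast between a complete-case analysis, inverse probability weighting (IPW), and a doubly robust estimator can be illustrated with a small simulation. The sketch below is not taken from the chapter. It assumes a hypothetical two-arm trial with one baseline covariate, a true average treatment effect of 2.0, and outcomes that are missing at random given the covariate, with a missingness pattern that differs by arm. All function names are illustrative. The IPW estimate uses normalized weights, and the doubly robust estimate is the augmented IPW (AIPW) form. Both working models are correctly specified here, so the sketch shows only that IPW and AIPW can recover the truth where the complete-case estimate does not; it does not test the doubly robust property under misspecification.

```python
import math
import random


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(xs, rs, n_iter=25):
    """Newton-Raphson fit of logit P(R=1 | x) = a + b*x; returns (a, b)."""
    a = b = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, r in zip(xs, rs):
            p = expit(a + b * x)
            w = p * (1.0 - p)
            g0 += r - p
            g1 += (r - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def fit_line(xs, ys):
    """Least-squares fit of E[Y | x] = c + d*x; returns (c, d)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    d = sxy / sxx
    return my - d * mx, d


def arm_means(xs, ys, rs):
    """Complete-case, IPW, and AIPW estimates of one arm's mean outcome.

    ys[i] is never used when rs[i] == 0 (outcome missing).
    """
    obs = [(x, y) for x, y, r in zip(xs, ys, rs) if r == 1]
    cc = sum(y for _, y in obs) / len(obs)
    a, b = fit_logistic(xs, rs)  # missingness model, fit within the arm
    c, d = fit_line([x for x, _ in obs], [y for _, y in obs])  # outcome model
    num = den = aug = 0.0
    for x, y, r in zip(xs, ys, rs):
        pi = expit(a + b * x)  # estimated probability of being observed
        m = c + d * x          # predicted outcome
        if r == 1:
            num += y / pi
            den += 1.0 / pi
            aug += m + (y - m) / pi
        else:
            aug += m
    return cc, num / den, aug / len(xs)


def simulate(n=20000, seed=0):
    """Assumed data-generating process; the true treatment effect is 2.0."""
    rng = random.Random(seed)
    arms = {0: ([], [], []), 1: ([], [], [])}
    for _ in range(n):
        z = rng.randint(0, 1)                 # randomized treatment
        x = rng.gauss(0.0, 1.0)               # baseline covariate
        y = 1.0 + 2.0 * z + 1.5 * x + rng.gauss(0.0, 1.0)
        slope = -1.0 if z == 1 else 1.0       # missingness differs by arm
        r = 1 if rng.random() < expit(0.5 + slope * x) else 0
        xs, ys, rs = arms[z]
        xs.append(x)
        ys.append(y)  # kept for simplicity; ignored by estimators when r == 0
        rs.append(r)
    return arms


arms = simulate()
treated = arm_means(*arms[1])
control = arm_means(*arms[0])
cc_ate, ipw_ate, aipw_ate = (t - c for t, c in zip(treated, control))
print(f"complete-case: {cc_ate:.2f}  IPW: {ipw_ate:.2f}  "
      f"AIPW: {aipw_ate:.2f}  (truth: 2.00)")
```

Under these assumptions, observed control participants tend to have higher covariate values and observed treated participants lower ones, so the complete-case difference falls below the true effect. The two adjusted estimators use the covariate to correct for that imbalance.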


Average treatment effect; Randomized clinical trials; Doubly robust; Inverse probability weighting; Likelihood; Missing at random; Markov chain Monte Carlo; Multiple imputation; Sensitivity analysis



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Sociology, Duke University, Durham, USA
  2. Department of Biostatistics, Yale University, School of Public Health, New Haven, USA
  3. Department of Biostatistics and Bioinformatics, Duke University, School of Medicine, Durham, USA

Section editors and affiliations

  • Stephen George (1)
  1. Department of Biostatistics and Bioinformatics, Duke University, School of Medicine, Durham, USA
