Missing Data

Tong, Guangyu; Li, Fan; Allen, Andrew S.

doi:10.1007/978-3-319-52677-5_117-1

Abstract

Missing data are common in randomized clinical trials. When data are not missing completely at random, a complete-case analysis that ignores the missing data process often yields biased estimates of the average treatment effect. This chapter defines the different missing data mechanisms, discusses their impact on inference, and presents statistical methods that address missing data, including likelihood-based analysis, inverse probability weighting, and imputation. Each of these methods models either the missingness process or the observed outcome distribution. A more robust approach that combines the strengths of both modeling strategies is also introduced. This approach is doubly robust: it yields a consistent estimate of the average treatment effect if either the missingness model or the outcome model is correctly specified, without requiring both to be correct. The chapter concludes with a brief discussion of sensitivity analyses used to assess the impact of unmeasured factors that affect both missingness and outcomes. Throughout, statistical and practical considerations are discussed in the context of randomized clinical trials in which the primary analysis compares two treatments and estimates the average comparative effect in the enrolled population.
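
The contrast between complete-case analysis, inverse probability weighting, and a doubly robust estimator can be illustrated with a small simulation. The sketch below is not taken from the chapter: the data-generating model, the sample size, and every function name (`fit_logistic`, `arm_means`, `estimate_ate`) are assumptions chosen for illustration. It simulates a two-arm trial with true average treatment effect 2, makes the outcome missing with a probability that depends on a baseline covariate differently in each arm, and then applies an augmented inverse probability weighted (AIPW) estimator, one common doubly robust form, within each arm.

```python
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logistic(X, r, n_iter=25):
    """Logistic regression of r on the columns of X by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        grad = X.T @ (r - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def arm_means(y, r, X_miss, X_out):
    """Estimate one arm's mean outcome three ways.

    y is NaN where the outcome is missing; r is 1 if observed, 0 otherwise.
    X_miss / X_out are design matrices for the missingness and outcome models.
    """
    obs = r == 1
    y0 = np.where(obs, y, 0.0)                       # placeholder 0 for missing y
    pi = expit(X_miss @ fit_logistic(X_miss, r.astype(float)))
    coef, *_ = np.linalg.lstsq(X_out[obs], y[obs], rcond=None)
    m = X_out @ coef                                 # predicted outcome for everyone
    return {
        "cc": y[obs].mean(),                         # complete-case mean
        "ipw": np.sum(y0 / pi) / np.sum(obs / pi),   # normalized IPW mean
        "aipw": np.mean(m + obs * (y0 - m) / pi),    # doubly robust mean
    }

def design(x, use_x):
    """Intercept-only design, or intercept plus covariate."""
    ones = np.ones_like(x)
    return np.column_stack([ones, x]) if use_x else ones[:, None]

def estimate_ate(y, r, z, x, miss_uses_x=True, out_uses_x=True):
    """Difference in estimated arm means (treated minus control)."""
    arms = [
        arm_means(y[z == a], r[z == a],
                  design(x[z == a], miss_uses_x),
                  design(x[z == a], out_uses_x))
        for a in (1, 0)
    ]
    return {k: arms[0][k] - arms[1][k] for k in arms[0]}

# Hypothetical trial: all numbers below are illustrative assumptions.
rng = np.random.default_rng(0)
n = 20000
z = rng.integers(0, 2, n)                            # randomized treatment
x = rng.normal(size=n)                               # baseline covariate
y_full = 1 + 2 * z + 1.5 * x + rng.normal(size=n)    # true effect is 2
# Missing at random given (z, x): the covariate effect differs by arm.
p_obs = expit(0.5 + np.where(z == 1, 1.5, -1.5) * x)
r = rng.binomial(1, p_obs)
y = np.where(r == 1, y_full, np.nan)

both = estimate_ate(y, r, z, x)                      # both models use x
only_miss = estimate_ate(y, r, z, x, out_uses_x=False)   # outcome model wrong
only_out = estimate_ate(y, r, z, x, miss_uses_x=False)   # missingness model wrong

for label, est in [("both", both), ("only_miss", only_miss), ("only_out", only_out)]:
    print(label, {k: round(float(v), 3) for k, v in est.items()})
```

In this setup the complete-case contrast is biased because the observed participants in the two arms differ systematically in the covariate. The AIPW contrast stays close to the true effect when either working model omits the covariate, which is the doubly robust property described above; when both models are misspecified, no such protection should be expected.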

Author information

Authors and Affiliations

Department of Sociology, Duke University, Durham, North Carolina, USA
Guangyu Tong
Department of Biostatistics, Yale University, School of Public Health, New Haven, Connecticut, USA
Fan Li
Department of Biostatistics and Bioinformatics, Duke University, School of Medicine, Durham, North Carolina, USA
Andrew S. Allen

Corresponding author

Correspondence to Andrew S. Allen.

Editor information

Editors and Affiliations

Samuel Oschin Comprehensive Cancer Insti, West Hollywood, CA, USA
Steven Piantadosi
Bloomberg School of Public Health, Johns Hopkins Center for Clinical Trials Bloomberg School of Public Health, Baltimore, MD, USA
Curtis L. Meinert

Section Editor information

Department of Biostatistics and Bioinformatics, Duke University, School of Medicine, Durham, North Carolina, USA
Stephen George

About this entry

Cite this entry

Tong, G., Li, F., Allen, A.S. (2020). Missing Data. In: Piantadosi, S., Meinert, C. (eds) Principles and Practice of Clinical Trials. Springer, Cham. https://doi.org/10.1007/978-3-319-52677-5_117-1

DOI: https://doi.org/10.1007/978-3-319-52677-5_117-1
Received: 24 April 2019
Accepted: 10 October 2019
Published: 21 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52677-5
Online ISBN: 978-3-319-52677-5
eBook Packages: Springer Reference Mathematics; Reference Module Computer Science and Engineering
