Abstract
A general framework for describing and handling missing data is presented. Methodology is categorized according to its validity under various assumptions about the missing data mechanism. Considerable attention is given to direct-likelihood approaches, weighted generalized estimating equations, and multiple imputation. The value of sensitivity analysis for examining the stability of inferences under untestable assumptions is discussed. A running example illustrates the methodology.
Acknowledgments
We gratefully acknowledge support from FWO-Vlaanderen Research Project G.0002.98: “Sensitivity Analysis for Incomplete and Coarse Data” and from Belgian IUAP/PAI network “Statistical Techniques and Modeling for Complex Substantive Questions with Complex Data.”
Copyright information
© 2023 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Molenberghs, G., Beunckens, C., Jansen, I., Thijs, H., Verbeke, G., Kenward, M.G. (2023). Missing Data. In: Ahrens, W., Pigeot, I. (eds) Handbook of Epidemiology. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6625-3_20-1
DOI: https://doi.org/10.1007/978-1-4614-6625-3_20-1
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6625-3
Online ISBN: 978-1-4614-6625-3
eBook Packages: Springer Reference Medicine, Reference Module Medicine