Abstract
The problem of dealing with missing values is common throughout statistical work and is present whenever human subjects are enrolled. Respondents may refuse participation or may be unreachable. Patients in clinical and epidemiological studies may withdraw their initial consent without further explanation. Early work on missing values was largely concerned with algorithmic and computational solutions to the induced lack of balance or deviations from the intended study design (Afifi and Elashoff 1966; Hartley and Hocking 1971). More recently, general algorithms such as Expectation-Maximization (EM) (Dempster et al. 1977) and data imputation and augmentation procedures (Rubin 1987; Tanner and Wong 1987), combined with powerful computing resources, have largely provided a solution to this aspect of the problem. There remains the very difficult and important question of assessing the impact of missing data on subsequent statistical inference. Conditions can be formulated under which an analysis that proceeds as if the missing data are missing by design, that is, ignoring the missing value process, can provide valid answers to study questions. While such an approach is attractive from a pragmatic point of view, the difficulty is that such conditions can rarely be assumed to hold with full certainty. Indeed, assumptions will be required that cannot be assessed from the data under analysis. Hence in this setting there cannot be anything that could be termed a definitive analysis, and any preferred analysis should ideally be supplemented with a so-called sensitivity analysis.
Keywords
Generalized Linear Mixed Model, Generalized Estimating Equation, American Statistical Association, Last Observation Carried Forward, Royal Statistical Society Series
References
- Aerts M, Geys H, Molenberghs G, Ryan LM (2002) Topics in Modelling of Clustered Binary Data. Chapman & Hall, London
- Afifi A, Elashoff R (1966) Missing observations in multivariate statistics I: Review of the literature. Journal of the American Statistical Association 61:595–604
- Amemiya T (1984) Tobit models: a survey. Journal of Econometrics 24:3–61
- Ashford JR, Sowden RR (1970) Multi-variate probit analysis. Biometrics 26:535–546
- Baker SG (1995) Marginal regression for repeated binary data with outcome subject to non-ignorable non-response. Biometrics 51:1042–1052
- Bahadur RR (1961) A representation of the joint distribution of responses to n dichotomous items. In: Solomon H (ed) Studies in Item Analysis and Prediction. Stanford Mathematical Studies in the Social Sciences VI. Stanford University Press, Stanford, CA
- Beckman RJ, Nachtsheim CJ, Cook RD (1987) Diagnostics for mixed-model analysis of variance. Technometrics 29:413–426
- Breslow NE, Clayton DG (1993) Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88:9–25
- Buck SF (1960) A method of estimation of missing values in multivariate data suitable for use with an electronic computer. Journal of the Royal Statistical Society Series B 22:302–306
- Chatterjee S, Hadi AS (1988) Sensitivity Analysis in Linear Regression. John Wiley & Sons, New York
- Cook RD (1977) Detection of influential observations in linear regression. Technometrics 19:15–18
- Cook RD (1979) Influential observations in linear regression. Journal of the American Statistical Association 74:169–174
- Cook RD (1986) Assessment of local influence. Journal of the Royal Statistical Society Series B 48:133–169
- Cook RD, Weisberg S (1982) Residuals and Influence in Regression. Chapman & Hall, London
- Dale JR (1986) Global cross-ratio models for bivariate, discrete, ordered responses. Biometrics 42:909–917
- Dempster AP, Rubin DB (1983) Overview. Incomplete Data in Sample Surveys, Vol. II: Theory and Annotated Bibliography, Madow WG, Olkin I, Rubin DB (eds). Academic Press, New York, pp 3–10
- Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society Series B 39:1–38
- Diggle PJ, Kenward MG (1994) Informative drop-out in longitudinal data analysis (with discussion). Applied Statistics 43:49–93
- Diggle PJ, Heagerty P, Liang K-Y, Zeger SL (2002) Analysis of Longitudinal Data. Oxford University Press, New York
- Draper D (1995) Assessment and propagation of model uncertainty (with discussion). Journal of the Royal Statistical Society Series B 57:45–97
- Ekholm A (1991) Algorithms versus models for analyzing data that contain misclassification errors. Biometrics 47:1171–1182
- Fahrmeir L, Tutz G (2001) Multivariate Statistical Modelling Based on Generalized Linear Models. Springer-Verlag, Heidelberg
- Fitzmaurice GM, Molenberghs G, Lipsitz SR (1995) Regression models for longitudinal binary responses with informative dropouts. Journal of the Royal Statistical Society Series B 57:691–704
- Fitzmaurice GM, Heath G, Clifford P (1996a) Logistic regression models for binary panel data with attrition. Journal of the Royal Statistical Society Series A 159:249–264
- Fitzmaurice GM, Laird NM, Zahner GEP (1996b) Multivariate logistic models for incomplete binary response. Journal of the American Statistical Association 91:99–108
- George EO, Bowman D (1995) A saturated model for analyzing exchangeable binary data: Applications to clinical and developmental toxicity studies. Journal of the American Statistical Association 90:871–879
- Geys H, Molenberghs G, Lipsitz SR (1998) A note on the comparison of pseudolikelihood and generalized estimating equations for marginal odds ratio models. Journal of Statistical Computation and Simulation 62:45–72
- Glonek GFV, McCullagh P (1995) Multivariate logistic models. Journal of the Royal Statistical Society Series B 81:477–482
- Goss PE, Winer EP, Tannock IF, Schwartz LH, Kremer AB (1999) Breast cancer: randomized phase III trial comparing the new potent and selective third-generation aromatase inhibitor vorozole with megestrol acetate in postmenopausal advanced breast cancer patients. Journal of Clinical Oncology 17:52–63
- Hartley HO, Hocking R (1971) The analysis of incomplete data. Biometrics 27:783–808
- Heckman JJ (1976) The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement 5:475–492
- Hogan JW, Laird NM (1997) Mixture models for the joint distribution of repeated measures and event times. Statistics in Medicine 16:239–258
- Kenward MG, Molenberghs G (1998) Likelihood based frequentist inference when data are missing at random. Statistical Science 12:236–247
- Kenward MG, Molenberghs G, Thijs H (2003) Pattern-mixture models with proper time dependence. Biometrika 90:53–71
- Laird NM (1994) Discussion to Diggle PJ, Kenward MG: Informative dropout in longitudinal data analysis. Applied Statistics 43:84
- Lang JB, Agresti A (1994) Simultaneously modeling joint and marginal distributions of multivariate categorical responses. Journal of the American Statistical Association 89:625–632
- le Cessie S, van Houwelingen JC (1994) Logistic regression for correlated binary data. Applied Statistics 43:95–108
- Lesaffre E, Verbeke G (1998) Local influence in linear mixed models. Biometrics 54:570–582
- Liang K-Y, Zeger SL (1986) Longitudinal data analysis using generalized linear models. Biometrika 73:13–22
- Liang K-Y, Zeger SL, Qaqish B (1992) Multivariate regression analyses for categorical data. Journal of the Royal Statistical Society Series B 54:3–40
- Lipsitz SR, Laird NM, Harrington DP (1991) Generalized estimating equations for correlated binary data: using the odds ratio as a measure of association. Biometrika 78:153–160
- Little RJA (1986) A note about models for selectivity bias. Econometrica 53:1469–1474
- Little RJA (1993) Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association 88:125–134
- Little RJA (1994) A class of pattern-mixture models for normal incomplete data. Biometrika 81:471–483
- Little RJA (1995) Modeling the drop-out mechanism in repeated measures studies. Journal of the American Statistical Association 90:1112–1121
- Little RJA, Rubin DB (1987) Statistical Analysis with Missing Data. John Wiley & Sons, New York
- Mallinckrodt CH, Clark WS, Stacy RD (2001a) Type I error rates from mixed-effects model repeated measures versus fixed effects analysis of variance with missing values imputed via last observation carried forward. Drug Information Journal 35:1215–1225
- Mallinckrodt CH, Clark WS, Stacy RD (2001b) Accounting for dropout bias using mixed-effects models. Journal of Biopharmaceutical Statistics 11(1 & 2):9–21
- Mallinckrodt CH, Clark WS, Carroll RJ, Molenberghs G (2003a) Assessing response profiles from incomplete longitudinal clinical trial data under regulatory considerations. Journal of Biopharmaceutical Statistics 13:179–190
- Mallinckrodt CH, Sanger TM, Dube S, Debrota DJ, Molenberghs G, Carroll RJ, Zeigler Potter WM, Tollefson GD (2003b) Assessing and interpreting treatment effects in longitudinal clinical trials with missing data. Biological Psychiatry 53:754–760
- McCullagh P, Nelder JA (1989) Generalized Linear Models. Chapman & Hall, London
- Michiels B, Molenberghs G, Bijnens L, Vangeneugden T, Thijs H (2002) Selection models and pattern-mixture models to analyze longitudinal quality of life data subject to dropout. Statistics in Medicine 21:1023–1041
- Molenberghs G, Lesaffre E (1994) Marginal modelling of correlated ordinal data using a multivariate Plackett distribution. Journal of the American Statistical Association 89:633–644
- Molenberghs G, Lesaffre E (1999) Marginal modelling of multivariate categorical data. Statistics in Medicine 18:2237–2255
- Molenberghs G, Kenward MG, Lesaffre E (1997) The analysis of longitudinal ordinal data with non-random dropout. Biometrika 84:33–44
- Molenberghs G, Michiels B, Kenward MG, Diggle PJ (1998) Missing data mechanisms and pattern-mixture models. Statistica Neerlandica 52:153–161
- Murray GD, Findlay JG (1988) Correcting for the bias caused by drop-outs in hypertension trials. Statistics in Medicine 7:941–946
- Nelder JA, Mead R (1965) A simplex method for function minimisation. The Computer Journal 7:303–313
- Neuhaus JM (1992) Statistical methods for longitudinal and clustered designs with binary responses. Statistical Methods in Medical Research 1:249–273
- Neuhaus JM, Kalbfleisch JD, Hauck WW (1991) A comparison of cluster-specific and population-averaged approaches for analyzing correlated binary data. International Statistical Review 59:25–35
- Plackett RL (1965) A class of bivariate distributions. Journal of the American Statistical Association 60:516–522
- Prentice RL (1988) Correlated binary regression with covariates specific to each binary observation. Biometrics 44:1033–1048
- Robins JM, Rotnitzky A, Zhao LP (1995) Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association 90:106–121
- Robins JM, Rotnitzky A, Scharfstein DO (1998) Semiparametric regression for repeated outcomes with non-ignorable non-response. Journal of the American Statistical Association 93:1321–1339
- Rubin DB (1976) Inference and missing data. Biometrika 63:581–592
- Rubin DB (1987) Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons, New York
- Rubin DB (1994) Discussion to Diggle PJ, Kenward MG: Informative dropout in longitudinal data analysis. Applied Statistics 43:80–82
- Schafer JL (1997) Analysis of Incomplete Multivariate Data. Chapman & Hall, London
- Schipper H, Clinch J, McMurray A (1984) Measuring the quality of life of cancer patients: the Functional-Living Index-Cancer: development and validation. Journal of Clinical Oncology 2:472–483
- Sheiner LB, Beal SL, Dunne A (1997) Analysis of nonrandomly censored ordered categorical longitudinal data from analgesic trials. Journal of the American Statistical Association 92:1235–1244
- Siddiqui O, Ali MW (1998) A comparison of the random-effects pattern-mixture model with last-observation-carried-forward (LOCF) analysis in longitudinal clinical trials with dropouts. Journal of Biopharmaceutical Statistics 8:545–563
- Skellam JG (1948) A probability distribution derived from the binomial distribution by regarding the probability of success as variable between the sets of trials. Journal of the Royal Statistical Society Series B 10:257–261
- Smith DM, Robertson B, Diggle PJ (1996) Object-oriented Software for the Analysis of Longitudinal Data in S. Technical Report MA 96/192. Department of Mathematics and Statistics, University of Lancaster, LA1 4YF, United Kingdom
- Stiratelli R, Laird N, Ware J (1984) Random effects models for serial observations with dichotomous response. Biometrics 40:961–972
- Tanner MA, Wong WH (1987) The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association 82:528–550
- Thijs H, Molenberghs G, Michiels B, Verbeke G, Curran D (2002) Strategies to fit pattern-mixture models. Biostatistics 3:245–265
- Verbeke G, Molenberghs G (1997) Linear Mixed Models in Practice: A SAS-Oriented Approach. Lecture Notes in Statistics 126. Springer-Verlag, New York
- Verbeke G, Molenberghs G (2000) Linear Mixed Models for Longitudinal Data. Springer-Verlag, New York
- Verbeke G, Molenberghs G, Thijs H, Lesaffre E, Kenward MG (2001) Sensitivity analysis for non-random dropout: a local influence approach. Biometrics 57:7–14
- Wedderburn RWM (1974) Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika 61:439–447
- Wu MC, Bailey KR (1989) Estimation and comparison of changes in the presence of informative right censoring: conditional linear model. Biometrics 45:939–955