Statistics and Computing, Volume 21, Issue 1, pp 31–43

Missing data mechanisms and their implications on the analysis of categorical data

  • Frederico Z. Poleto
  • Julio M. Singer
  • Carlos Daniel Paulino


We review some issues related to the implications of different missing data mechanisms on statistical inference for contingency tables and consider simulation studies to compare the results obtained under such models to those where the units with missing data are disregarded. We confirm that although, in general, analyses under the correct missing at random and missing completely at random models are more efficient even for small sample sizes, there are exceptions where they may not improve the results obtained by ignoring the partially classified data. We show that under the missing not at random (MNAR) model, estimates on the boundary of the parameter space as well as lack of identifiability of the parameters of saturated models may be associated with undesirable asymptotic properties of maximum likelihood estimators and likelihood ratio tests; even in standard cases the bias of the estimators may be low only for very large samples. We also show that the probability of a boundary solution obtained under the correct MNAR model may be large even for large samples and that, consequently, we may not always conclude that a MNAR model is misspecified because the estimate is on the boundary of the parameter space.
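The contrast between disregarding partially classified units and fitting a missing at random (MAR) model can be illustrated with a minimal simulation sketch. It is not the authors' simulation design: the 2×2 cell probabilities `THETA`, the missingness probabilities `MISS_PROB`, and all function names are illustrative assumptions. The first variable is always observed and the second is missing with a probability that depends only on the first, so the MAR maximum likelihood estimate has the well-known closed form for this monotone pattern (row margin from all units, conditional distribution from the fully classified units). With these assumed values the true P(Y2 = 1) is 0.5, while the complete-case proportion converges to 0.25/0.65 ≈ 0.385.

```python
import random

# Assumed cell probabilities P(Y1 = a, Y2 = b); illustrative values only.
THETA = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
# Assumed probability that Y2 is missing given Y1 (MAR: depends on Y1 only).
MISS_PROB = {0: 0.1, 1: 0.6}


def simulate(n, theta, miss_prob, rng):
    """Draw n units; return fully classified counts and Y1-only counts."""
    cells = list(theta)
    weights = [theta[c] for c in cells]
    full = {c: 0 for c in cells}   # both Y1 and Y2 observed
    partial = {0: 0, 1: 0}         # only Y1 observed
    for y1, y2 in rng.choices(cells, weights=weights, k=n):
        if rng.random() < miss_prob[y1]:
            partial[y1] += 1
        else:
            full[(y1, y2)] += 1
    return full, partial


def complete_case(full):
    """Estimate P(Y2 = 1) from fully classified units only."""
    total = sum(full.values())
    return (full[(0, 1)] + full[(1, 1)]) / total


def mar_ml(full, partial):
    """Closed-form ML estimate of P(Y2 = 1) under MAR, monotone pattern."""
    n = sum(full.values()) + sum(partial.values())
    estimate = 0.0
    for y1 in (0, 1):
        row_full = full[(y1, 0)] + full[(y1, 1)]
        row_all = row_full + partial[y1]
        # If a row has no fully classified units, P(Y2 | Y1) is not
        # estimable; this sketch simply skips that row.
        if row_full > 0:
            estimate += (row_all / n) * (full[(y1, 1)] / row_full)
    return estimate


if __name__ == "__main__":
    rng = random.Random(2009)
    full, partial = simulate(5000, THETA, MISS_PROB, rng)
    print("complete-case:", complete_case(full))
    print("ML under MAR: ", mar_ml(full, partial))
```

Setting both entries of `MISS_PROB` to the same value gives a missing completely at random mechanism, under which the two estimators target the same quantity and differ only in efficiency.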


Categorical data; Missing or incomplete data; MAR, MCAR and MNAR; Ignorable and non-ignorable mechanism; Selection models





Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Frederico Z. Poleto (1)
  • Julio M. Singer (1, corresponding author)
  • Carlos Daniel Paulino (2)

  1. Departamento de Estatística, Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil
  2. Departamento de Matemática, Instituto Superior Técnico, Universidade Técnica de Lisboa (and CEAUL-FCUL), Lisboa, Portugal
