Volume 57, Issue 1, pp 29–42

A general approach to categorical data analysis with missing data, using generalized linear models with composite links

  • David Rindskopf


A general approach for analyzing categorical data when there are missing data is described and illustrated. The method is based on generalized linear models with composite links. Among other applications, the approach can be used to fill in contingency tables with supplementary margins, fit loglinear models when data are missing, fit latent class models (with or without missing data on observed variables), fit models with fused cells (including many models from genetics), and fill in tables or fit models to data when variables are more finely categorized for some cases than for others. Both Newton-like and EM methods are easy to implement for parameter estimation.
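As a rough illustration of the "fill in a contingency table with a supplementary margin" case, the following minimal sketch applies EM to a two-way table in which some cases are classified on the row variable only. It is not taken from the article: the function name `em_fill_in`, the saturated model, and the assumption of ignorable missingness are all illustrative choices. The composite-link idea appears in the E-step, where each supplementary row count is treated as the sum of the unobserved cells in that row.

```python
def em_fill_in(full, row_only, iters=200, tol=1e-10):
    """Estimate cell probabilities of a two-way table by EM (illustrative sketch).

    full     -- list of rows of counts classified on both variables
    row_only -- counts classified on the row variable only (column missing)

    Assumptions (not from the article): saturated model, ignorable
    missingness, and every row has at least one observed count.
    """
    n_rows, n_cols = len(full), len(full[0])
    total = sum(map(sum, full)) + sum(row_only)

    # Start from a uniform table of cell probabilities.
    p = [[1.0 / (n_rows * n_cols)] * n_cols for _ in range(n_rows)]

    for _ in range(iters):
        # E-step: spread each supplementary row count over the columns
        # in proportion to the current conditional column probabilities.
        filled = []
        for i in range(n_rows):
            row_p = sum(p[i])
            filled.append([full[i][j] + row_only[i] * p[i][j] / row_p
                           for j in range(n_cols)])

        # M-step: under the saturated model the estimate is simply
        # the filled-in count divided by the total sample size.
        new_p = [[count / total for count in row] for row in filled]

        change = max(abs(new_p[i][j] - p[i][j])
                     for i in range(n_rows) for j in range(n_cols))
        p = new_p
        if change < tol:
            break
    return p


# Hypothetical data: 100 fully classified cases plus 50 with the column missing.
estimates = em_fill_in([[30, 10], [20, 40]], [20, 30])
```

For these made-up counts the estimates settle near 0.3, 0.1, 0.2, and 0.4, which matches the closed-form answer for this simple pattern (the estimated row margin times the within-row proportions from the fully classified cases). A non-saturated loglinear model would replace the M-step with a model fit to the filled-in table.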

Key words

missing data, generalized linear models, categorical data





Copyright information

© The Psychometric Society 1992

Authors and Affiliations

  • David Rindskopf (1)
  1. Educational Psychology Program, CUNY Graduate Center, New York
