Psychometrika, Volume 57, Issue 4, pp 567–580

Imputation of missing categorical data by maximizing internal consistency

  • Stef van Buuren
  • Jan L. A. van Rijckevorsel


This paper suggests a method to supplant missing categorical data by “reasonable” replacements. These replacements will maximize the consistency of the completed data as measured by Guttman's squared correlation ratio. The text outlines a solution of the optimization problem, describes relationships with the relevant psychometric theory, and studies some properties of the method in detail. The main result is that the average correlation should be at least 0.50 before the method becomes practical. At that point, the technique gives reasonable results up to 10–15% missing data.
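The idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not spell out. It is a naive greedy search, written under the assumption that the internal consistency of a completed data set can be measured by the largest nontrivial eigenvalue of a homogeneity analysis (taken here as Guttman's squared correlation ratio). The function names `consistency` and `impute_by_consistency`, the missing-value code `-1`, and the toy data are all hypothetical.

```python
import numpy as np


def consistency(data):
    """Largest nontrivial eigenvalue of the averaged category projectors.

    `data` is an (n objects x m variables) integer array WITHOUT missing
    codes. The value lies in [0, 1]; 1 means all variables induce the
    same partition of the objects.
    """
    n, m = data.shape
    P = np.zeros((n, n))
    for j in range(m):
        cats = np.unique(data[:, j])
        # Indicator matrix G_j: one column per observed category.
        G = (data[:, j][:, None] == cats[None, :]).astype(float)
        # Projector onto the category subspace: G D^{-1} G'.
        P += (G / G.sum(axis=0)) @ G.T
    P /= m
    # Center to remove the trivial constant solution (eigenvalue 1).
    J = np.eye(n) - np.ones((n, n)) / n
    return np.linalg.eigvalsh(J @ P @ J)[-1]


def impute_by_consistency(data, missing=-1, max_sweeps=10):
    """Greedy sketch: fill each missing cell with the category that
    maximizes `consistency` of the completed data, sweeping until no
    cell changes. Observed cells are never modified."""
    mask = data == missing
    filled = data.copy()
    m = data.shape[1]
    # Start from the modal observed category of each variable.
    for j in range(m):
        vals, counts = np.unique(data[~mask[:, j], j], return_counts=True)
        filled[mask[:, j], j] = vals[np.argmax(counts)]
    for _ in range(max_sweeps):
        changed = False
        for i, j in zip(*np.nonzero(mask)):
            best_cat, best_val = filled[i, j], consistency(filled)
            for c in np.unique(data[~mask[:, j], j]):
                trial = filled.copy()
                trial[i, j] = c
                val = consistency(trial)
                if val > best_val + 1e-12:
                    best_cat, best_val = c, val
            if best_cat != filled[i, j]:
                filled[i, j] = best_cat
                changed = True
        if not changed:
            break
    return filled


# Toy data: three perfectly consistent variables, one missing cell (-1).
toy = np.array([[0, 0, -1], [0, 0, 0], [0, 0, 0], [0, 0, 0],
                [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1]])
completed = impute_by_consistency(toy)
```

In the toy data the modal starting value for the missing cell is category 1, and the greedy sweep replaces it with category 0 because that choice makes all three variables agree. Each trial recomputes a full eigendecomposition, so the sketch is only practical for very small data sets.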

Key words

missing data, correlation ratio, optimal scaling


References


  1. Dear, R. E. (1959). A principal component missing data method for multiple regression models (SP-86). Santa Monica, CA: System Development Corporation.
  2. Fisher, W. D. (1958). On grouping for maximum homogeneity. Journal of the American Statistical Association, 53, 789–798.
  3. Gifi, A. (1990). Nonlinear multivariate analysis. Chichester: Wiley.
  4. Gleason, T. C., & Staelin, R. (1975). A proposal for handling missing data. Psychometrika, 40, 229–252.
  5. Greenacre, M. J. (1984). Theory and applications of correspondence analysis. New York: Academic Press.
  6. Guttman, L. (1941). The quantification of a class of attributes: A theory and method of scale construction. In P. Horst et al. (Eds.), The prediction of personal adjustment (pp. 319–348). New York: Social Science Research Council.
  7. Hartigan, J. A. (1975). Clustering algorithms. New York: Wiley.
  8. Hartley, H. O., & Hocking, R. R. (1971). The analysis of incomplete data. Biometrics, 27, 783–808.
  9. Kalton, G., & Kasprzyk, D. (1982). Imputing for missing survey responses. Proceedings of the Section of Survey Research Methods, 1982 (pp. 22–23). Alexandria, VA: American Statistical Association.
  10. Little, R. J. A., & Rubin, D. B. (1990). The analysis of social science data with missing values. In J. Fox & T. Scott Long (Eds.), Modern methods of data analysis (pp. 374–409). London: Sage.
  11. Madow, W. G., Olkin, I., & Rubin, D. B. (Eds.). (1983). Incomplete data in sample surveys (Vols. 1–3). New York: Academic Press.
  12. Meulman, J. (1982). Homogeneity analysis of incomplete data. Leiden: DSWO Press.
  13. Milligan, G. W. (1980). An examination of the effect of six types of error perturbation on fifteen clustering algorithms. Psychometrika, 45, 325–342.
  14. Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. Toronto: University of Toronto Press.
  15. Nishisato, S., & Ahn, H. (in press). When not to analyze data: Decision making on missing responses in dual scaling. Annals of Operations Research.
  16. Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
  17. Rubin, D. B. (1991). EM and beyond. Psychometrika, 56, 241–254.
  18. Scheibler, D., & Schneider, W. (1985). Monte Carlo tests of the accuracy of cluster analysis algorithms. Multivariate Behavioral Research, 20, 283–304.
  19. Späth, H. (1985). Cluster dissection and analysis. Chichester: Ellis Horwood.
  20. Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82, 528–550.
  21. van Buuren, S., & Heiser, W. J. (1989). Clustering n objects into k groups under optimal scaling of variables. Psychometrika, 54, 699–706.
  22. van Buuren, S., & van Rijckevorsel, J. L. A. (1992). Data augmentation and optimal scaling. In R. Steyer, K. F. Wender, & K. F. Widaman (Eds.), Psychometric Methodology. Proceedings of the 7th European Meeting of the Psychometric Society in Trier (pp. 80–84). Stuttgart and New York: Gustav Fischer Verlag.
  23. van der Heijden, P. G. M., & Escofier, B. (1989). Multiple correspondence analysis with missing data. Unpublished manuscript, University of Leiden, Department of Psychometrics and Research Methods.
  24. van Rijckevorsel, J. L. A., & de Leeuw, J. (1992). Some results about the importance of knot selection in nonlinear multivariate analysis. Statistica Applicata: Italian Journal of Applied Statistics, 4.

Copyright information

© The Psychometric Society 1992

Authors and Affiliations

  • Stef van Buuren (1)
  • Jan L. A. van Rijckevorsel (1)
  1. TNO Institute of Preventive Health Care, Leiden, The Netherlands
