Clustering the rows and columns of a contingency table

Abstract

A number of ways of investigating heterogeneity in a two-way contingency table are reviewed. In particular, we consider chi-square decompositions of the Pearson chi-square statistic with respect to the nodes of a hierarchical clustering of the rows and/or the columns of the table. A cut-off point which indicates “significant clustering” may be defined on the binary trees associated with the respective row and column cluster analyses. This approach provides a simple graphical procedure which is useful in interpreting a significant chi-square statistic of a contingency table.
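
As an illustration of the kind of decomposition described above, the following is a minimal sketch and not the paper's own procedure, which the abstract does not spell out. It assumes a greedy strategy: at each step it pools the two row clusters whose merging loses the least Pearson chi-square. Because pooling all rows into one leaves a chi-square of zero, the losses recorded at the nodes of the resulting binary tree add up to the chi-square of the original table. The function names and the example counts are hypothetical, and all row and column totals are assumed to be positive.

```python
from itertools import combinations


def pearson_chi2(table):
    """Pearson chi-square statistic of a two-way table (list of rows of counts)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n  # expected count under independence
            chi2 += (obs - exp) ** 2 / exp
    return chi2


def cluster_rows(table):
    """Greedy row clustering (an assumed strategy, for illustration only).

    Repeatedly pools the pair of row clusters whose merging loses the least
    chi-square. Returns the total chi-square and one entry per tree node:
    (members of cluster a, members of cluster b, chi-square lost by merging).
    """
    rows = [list(r) for r in table]
    members = [(i,) for i in range(len(rows))]
    total = pearson_chi2(rows)
    current = total
    nodes = []
    while len(rows) > 1:
        best = None
        for a, b in combinations(range(len(rows)), 2):
            pooled = [x + y for x, y in zip(rows[a], rows[b])]
            trial = [r for k, r in enumerate(rows) if k not in (a, b)] + [pooled]
            trial_chi2 = pearson_chi2(trial)
            loss = current - trial_chi2
            if best is None or loss < best[0]:
                best = (loss, a, b, trial, trial_chi2)
        loss, a, b, trial, trial_chi2 = best
        nodes.append((members[a], members[b], loss))
        members = [m for k, m in enumerate(members) if k not in (a, b)]
        members.append(nodes[-1][0] + nodes[-1][1])
        rows, current = trial, trial_chi2
    return total, nodes


# Hypothetical 4 x 3 table of counts, used only to exercise the sketch.
table = [[20, 10, 5], [18, 12, 6], [5, 10, 20], [4, 9, 25]]
total, nodes = cluster_rows(table)
print("total chi-square:", round(total, 4))
for left, right, loss in nodes:
    print(left, "+", right, "loses", round(loss, 4))
```

The same function can be applied to the columns by passing the transposed table. The cut-off for "significant clustering" is not implemented here, since the abstract does not state how the critical value is obtained.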

Additional information

The author gratefully acknowledges the constructive comments of the referees and the editor.

About this article

Cite this article

Greenacre, M.J. Clustering the rows and columns of a contingency table. Journal of Classification 5, 39–51 (1988). https://doi.org/10.1007/BF01901670

Keywords

  • Chi-square statistic
  • Cluster analysis
  • Contingency tables
  • Correspondence analysis
  • Multiple comparisons
  • Wishart distribution