Psychometrika, Volume 82, Issue 1, pp. 158–185

Cluster Correspondence Analysis

  • M. van de Velden
  • A. Iodice D’Enza
  • F. Palumbo

Abstract

A method is proposed that combines dimension reduction and cluster analysis for categorical data by simultaneously assigning individuals to clusters and optimal scaling values to categories in such a way that a single between-cluster variance maximization objective is achieved. A brief review of alternative methods is provided within a unified framework, and we show that the proposed method is equivalent to GROUPALS applied to categorical data. The performance of the methods is appraised by means of a simulation study. The results of the joint dimension reduction and clustering methods are compared with those of the so-called tandem approach, a sequential analysis of dimension reduction followed by cluster analysis. The tandem approach is conjectured to perform worse when variables unrelated to the cluster structure are added. Our simulation study confirms this conjecture. Moreover, the results of the simulation study indicate that the proposed method also consistently outperforms alternative joint dimension reduction and clustering methods.
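The tandem approach mentioned in the abstract (dimension reduction first, clustering second) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation and not the proposed joint method: the dimension-reduction step is assumed to be multiple correspondence analysis (MCA), computed as a correspondence analysis of the indicator matrix via an SVD, and the clustering step is assumed to be plain k-means with a deterministic farthest-first initialization. The function names `indicator_matrix`, `mca_row_scores`, `kmeans`, and `tandem` are hypothetical.

```python
import numpy as np


def indicator_matrix(data):
    """One-hot (indicator) coding of an n x p array of categorical codes."""
    blocks = []
    for j in range(data.shape[1]):
        cats = np.unique(data[:, j])
        blocks.append((data[:, j][:, None] == cats[None, :]).astype(float))
    return np.hstack(blocks)


def mca_row_scores(data, k):
    """Assumed MCA step: principal row coordinates on the first k dimensions."""
    Z = indicator_matrix(data)
    P = Z / Z.sum()
    r = P.sum(axis=1)  # row masses
    c = P.sum(axis=0)  # category (column) masses
    # Standardized residuals; centering removes the trivial CA dimension.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    return (U[:, :k] * s[:k]) / np.sqrt(r)[:, None]


def kmeans(X, n_clusters, n_iter=100):
    """Plain k-means (Lloyd iterations) with farthest-first initialization."""
    centers = [X[0]]
    for _ in range(n_clusters - 1):
        d = np.min([((X - ctr) ** 2).sum(axis=1) for ctr in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Empty clusters keep their previous center.
        new = np.array([X[labels == g].mean(axis=0) if np.any(labels == g)
                        else centers[g] for g in range(n_clusters)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def tandem(data, n_clusters, k):
    """Sequential analysis: reduce with MCA first, then cluster the row scores."""
    return kmeans(mca_row_scores(data, k), n_clusters)


if __name__ == "__main__":
    toy = np.array([[0, 0, 0], [0, 0, 0], [0, 0, 0],
                    [1, 1, 1], [1, 1, 1], [1, 1, 1]])
    print(tandem(toy, n_clusters=2, k=2))
```

The two steps optimize different criteria: the MCA step retains the dimensions with the most total variance, whether or not they relate to any cluster structure, and k-means then only sees those dimensions. One intuition for the conjecture in the abstract is that adding variables unrelated to the clusters can shift which dimensions the first step retains; the joint methods instead tie the scaling of categories to the clustering objective.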

Keywords

correspondence analysis; cluster analysis; dimension reduction; categorical data

Supplementary material

Supplementary material 1: 11336_2016_9514_MOESM1_ESM.zip (zip, 1.1 MB)

Copyright information

© The Psychometric Society 2016

Authors and Affiliations

  • M. van de Velden (1)
  • A. Iodice D’Enza (2)
  • F. Palumbo (3)
  1. Econometric Institute, Erasmus University Rotterdam, Rotterdam, The Netherlands
  2. Università di Cassino e del Lazio Meridionale, Cassino, Italy
  3. Università degli Studi di Napoli Federico II, Naples, Italy
