Cluster Correspondence Analysis

Abstract

A method is proposed that combines dimension reduction and cluster analysis for categorical data by simultaneously assigning individuals to clusters and optimal scaling values to categories in such a way that a single between-variance maximization objective is achieved. In a unified framework, we briefly review alternative methods and show that the proposed method is equivalent to GROUPALS applied to categorical data. The performance of the methods is appraised by means of a simulation study. The results of the joint dimension reduction and clustering methods are compared with the so-called tandem approach, a sequential analysis of dimension reduction followed by cluster analysis. The tandem approach is conjectured to perform worse when variables are added that are unrelated to the cluster structure. Our simulation study confirms this conjecture. Moreover, the results of the simulation study indicate that the proposed method also consistently outperforms alternative joint dimension reduction and clustering methods.



Notes

  1. The presented results are based on the solution using 100 random starts. Increasing the number of random starts to 1000 led to small changes in the configuration that had no effect on the interpretation. The congruence indices with the current solution were 0.997 for the attributes and 0.999 for the subjects.


Author information

Corresponding author

Correspondence to M. van de Velden.

Electronic supplementary material

Supplementary material 1 (zip 1126 KB)

Appendix: GROUPALS and Cluster CA

To show the relationship between GROUPALS and cluster CA, we consider the GROUPALS objective function for the case where all variables are categorical.

$$\begin{aligned} \min \phi _{\text {groupals}}\left( {\mathbf {B},\mathbf {Z}_{K},\mathbf {G}}\right) =\frac{1}{p}\sum _{j=1}^{p}\left\| \mathbf {Z}_{K}\mathbf {G}-\mathbf {Z}_{j}\mathbf {B}_{j}\right\| ^{2}, \end{aligned}$$

subject to

$$\begin{aligned} \sum \limits _{j=1}^{p} \mathbf {B}_{j}^{^{\prime }}\mathbf {Z}_{j}^{^{\prime }}\mathbf {Z}_{j} \mathbf {B}_{j}=\mathbf {I}_{k}. \end{aligned}$$
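
To make the quantities concrete, here is a minimal numpy sketch that builds the indicator matrices and evaluates the objective. The function names and the integer coding of the categories are illustrative assumptions, not part of the original method.

```python
import numpy as np

def indicator(x):
    """Indicator (one-hot) matrix Z_j for one categorical variable,
    coded as integers 0, ..., q_j - 1."""
    x = np.asarray(x)
    Z = np.zeros((x.size, x.max() + 1))
    Z[np.arange(x.size), x] = 1.0
    return Z

def groupals_loss(ZK, G, Zs, Bs):
    """phi_groupals = (1/p) * sum_j ||Z_K G - Z_j B_j||^2."""
    p = len(Zs)
    return sum(np.linalg.norm(ZK @ G - Zj @ Bj, "fro") ** 2
               for Zj, Bj in zip(Zs, Bs)) / p
```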

We can solve this problem by deriving the first-order conditions, which can be used to formulate an alternating least-squares algorithm. Thus, we fix \(\mathbf {Z}_{K}\) and solve for \(\mathbf {B}_{j}\) and \(\mathbf {G}\) by setting up the Lagrangian:

$$\begin{aligned} \psi&=\frac{1}{p}\sum _{j=1}^{p}\hbox {trace }\left( \mathbf {Z} _{K}\mathbf {G-Z}_{j}\mathbf {B}_{j}\right) ^{^{\prime }}\left( \mathbf {Z} _{K}\mathbf {G-Z}_{j}\mathbf {B}_{j}\right) +\hbox {trace } \mathbf {L}\left( \sum _{j=1}^{p}\mathbf {B}_{j}^{^{\prime }}\mathbf {D} _{j}\mathbf {B}_{j}-\mathbf {I}_{k}\right) \\&=\hbox {trace }\mathbf {G}^{^{\prime }}\mathbf {Z}_{K}^{^{\prime } }\mathbf {Z}_{K}\mathbf {G}+\frac{1}{p}\sum _{j=1}^{p}\hbox {trace } \mathbf {B}_{j}^{^{\prime }}\mathbf {Z}_{j}^{^{\prime }}\mathbf {Z}_{j} \mathbf {B}_{j}-\frac{2}{p}\sum _{j=1}^{p}\hbox {trace }\mathbf {G} ^{^{\prime }}\mathbf {Z}_{K}{}^{^{\prime }}\mathbf {Z}_{j}\mathbf {B} _{j}\\&\quad +\hbox {trace }\mathbf {L}\left( \sum _{j=1}^{p}\mathbf {B} _{j}^{^{\prime }}\mathbf {D}_{j}\mathbf {B}_{j}-\mathbf {I}_{k}\right) \\&=\hbox {trace }\mathbf {G}^{^{\prime }}\mathbf {Z}_{K}{}^{^{\prime } }\mathbf {Z}_{K}\mathbf {G}+\frac{k}{p}-\frac{2}{p}\sum _{j=1}^{p} \hbox {trace }\mathbf {G}^{^{\prime }}\mathbf {Z}_{K}{}^{^{\prime } }\mathbf {Z}_{j}\mathbf {B}_{j}+\hbox {trace }\mathbf {L}\left( \sum _{j=1}^{p}\mathbf {B}_{j}^{^{\prime }}\mathbf {D}_{j}\mathbf {B} _{j}-\mathbf {I}_{k}\right) , \end{aligned}$$

where \(\mathbf {L}\) is the matrix of Lagrange multipliers and \(\mathbf {D}_j = \mathbf {Z}_j^{\prime }\mathbf {Z}_j\). Taking derivatives and equating to zero yields the first-order conditions.

For \(\mathbf {G}\):

$$\begin{aligned} 2\,\hbox {trace }\mathbf {G}^{\prime }\mathbf {Z}_{K}^{\prime }\mathbf {Z}_{K}\,d\mathbf {G}&=\frac{2}{p}\sum _{j=1}^{p}\hbox {trace }\mathbf {B}_{j}^{\prime }\mathbf {Z}_{j}^{\prime }\mathbf {Z}_{K}\,d\mathbf {G}\\ \mathbf {G}^{\prime }\mathbf {Z}_{K}^{\prime }\mathbf {Z}_{K}&=\frac{1}{p}\sum _{j=1}^{p}\mathbf {B}_{j}^{\prime }\mathbf {Z}_{j}^{\prime }\mathbf {Z}_{K}\\ \mathbf {G}&=\frac{1}{p}\left( \mathbf {Z}_{K}^{\prime }\mathbf {Z}_{K}\right) ^{-1}\mathbf {Z}_{K}^{\prime }\sum _{j=1}^{p}\mathbf {Z}_{j}\mathbf {B}_{j}. \end{aligned}$$

For \(\mathbf {B}_{j}\):

$$\begin{aligned} \frac{2}{p}\hbox {trace }\mathbf {G}^{^{\prime }}\mathbf {Z}_{K} {}^{^{\prime }}\mathbf {Z}_{j}d\mathbf {B}_{j}&=2\hbox { trace } \mathbf {LB}_{j}^{^{\prime }}\mathbf {D}_{j}d\mathbf {B}_{j}\\ \frac{1}{p}\mathbf {Z}_{j}^{^{\prime }}\mathbf {Z}_{K}\mathbf {G}&=\mathbf {D}_{j}\mathbf {B}_{j}\mathbf {L}. \end{aligned}$$

Inserting the solution for \(\mathbf {G}\) we obtain

$$\begin{aligned} \frac{1}{p^{2}}\mathbf {Z}_{j}^{^{\prime }}\mathbf {Z}_{K}\left( \mathbf {Z} _{K}{}^{^{\prime }}\mathbf {Z}_{K}\right) ^{-1}\mathbf {Z}_{K}{}^{^{\prime }} \sum _{j=1}^{p}\mathbf {Z}_{j}\mathbf {B}_{j}=\mathbf {D}_{j}\mathbf {B} _{j}\mathbf {L}. \end{aligned}$$

Note that, as the constraints are symmetric, \(\mathbf {L}\) is also symmetric. Furthermore, since this condition holds for each \(j=1,\ldots ,p\), there are p such equations. Defining \(\mathbf {Z}=\left[ \mathbf {Z}_{1},\ldots ,\mathbf {Z}_{p}\right] \) and \(\mathbf {B}=\left[ \mathbf {B}_{1}^{\prime },\ldots ,\mathbf {B}_{p}^{\prime }\right] ^{\prime }\), however, the p equations can be expressed jointly as

$$\begin{aligned} \frac{1}{p^{2}}\mathbf {Z}^{^{\prime }}\mathbf {Z}_{K}\left( \mathbf {Z}_{K} {}^{^{\prime }}\mathbf {Z}_{K}\right) ^{-1}\mathbf {Z}_{K}{}^{^{\prime } }\mathbf {ZB}=\mathbf {DBL}, \end{aligned}$$

where \(\mathbf {D}\) is a block-diagonal matrix with diagonal blocks \(\mathbf {D}_{1},\ldots ,\mathbf {D}_{p}\).

Premultiplying both sides by \(\mathbf {D}^{-1/2}\) and writing \(\mathbf {B}=\mathbf {D}^{-1/2}\left( \mathbf {D}^{1/2}\mathbf {B}\right) \), we get

$$\begin{aligned} \frac{1}{p^{2}}\mathbf {D}^{-1/2}\mathbf {Z}^{^{\prime }}\mathbf {Z}_{K}\left( \mathbf {Z}_{K}{}^{^{\prime }}\mathbf {Z}_{K}\right) ^{-1}\mathbf {Z}_{K} {}^{^{\prime }}\mathbf {Z}\mathbf {D}^{-1/2}{\mathbf {D}}^{1/2} \mathbf {B}=\mathbf {D}^{1/2}\mathbf {BL}\text {.} \end{aligned}$$

Without loss of generality we can replace \(\mathbf {L}\) by its eigendecomposition \(\mathbf {U}\varvec{\Lambda }\mathbf {U}^{\prime }\) to get

$$\begin{aligned} \frac{1}{p^{2}}\mathbf {D}^{-1/2}\mathbf {Z}^{^{\prime }}\mathbf {Z}_{K}\left( \mathbf {Z}_{K}{}^{^{\prime }}\mathbf {Z}_{K}\right) ^{-1}\mathbf {Z}_{K} {}^{^{\prime }}\mathbf {Z}\mathbf {D}^{-1/2}\mathbf {D}^{1/2}\mathbf {B}=\mathbf {D} ^{1/2}\mathbf {BU}{\varvec{\Lambda }} \mathbf {U}^{^{\prime }} \end{aligned}$$

so that

$$\begin{aligned} \frac{1}{p^{2}}\mathbf {D}^{-1/2}\mathbf {Z}^{^{\prime }}\mathbf {Z}_{K}\left( \mathbf {Z}_{K}{}^{^{\prime }}\mathbf {Z}_{K}\right) ^{-1}\mathbf {Z} _{K}^{^{\prime }}\mathbf {Z}\mathbf {D}^{-1/2}\mathbf {D}^{1/2}\mathbf {B}\mathbf {U} =\mathbf {D}^{1/2}\mathbf {BU}\varvec{\Lambda }. \end{aligned}$$

Hence, letting

$$\begin{aligned} \mathbf {B}^{*}={\mathbf {D}}^{1/2}\mathbf {B}\mathbf {U} \end{aligned}$$

we see that \(\mathbf {B}^{*}\) can be obtained by taking the first k orthonormal eigenvectors (corresponding to the k largest eigenvalues) of

$$\begin{aligned} \frac{1}{p^{2}}\mathbf {D}^{-1/2}\mathbf {Z}^{^{\prime }}\mathbf {Z}_{K}\left( \mathbf {Z}_{K}{}^{^{\prime }}\mathbf {Z}_{K}\right) ^{-1}\mathbf {Z}_{K} {}^{^{\prime }}\mathbf {Z}\mathbf {D}^{-1/2}. \end{aligned}$$
(23)

The appropriately standardized category quantifications become

$$\begin{aligned} \mathbf {B}=\mathbf {D}^{-1/2}\mathbf {B}^{*} \end{aligned}$$
(24)

and \(\mathbf {G}\) is obtained by inserting this into the first-order condition for \(\mathbf {G}\), that is,

$$\begin{aligned} \mathbf {G}=\frac{1}{p}\left( \mathbf {Z}_{K}^{\prime }\mathbf {Z}_{K}\right) ^{-1}\mathbf {Z}_{K}^{\prime }\mathbf {ZB}. \end{aligned}$$
(25)
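
Computationally, the updates (23)–(25) amount to a single symmetric eigendecomposition. The numpy sketch below assumes, as before, that \(\mathbf {Z}\) is the stacked indicator matrix and \(\mathbf {Z}_{K}\) the cluster indicator matrix; update_B_G is an illustrative name, and the code is a sketch under these assumptions, not the authors' implementation.

```python
import numpy as np

def update_B_G(Z, ZK, p, k):
    """Given cluster memberships Z_K, compute the category quantifications
    B via (23)-(24) and the cluster means G via (25)."""
    # For indicator blocks, D = blockdiag(Z_j' Z_j) is diagonal and holds
    # the category frequencies, i.e. the column sums of Z.
    d = Z.sum(axis=0)
    dmh = 1.0 / np.sqrt(d)                          # diagonal of D^{-1/2}
    PK = ZK @ np.linalg.solve(ZK.T @ ZK, ZK.T)      # projector onto col(Z_K)
    # Matrix (23): (1/p^2) D^{-1/2} Z' Z_K (Z_K'Z_K)^{-1} Z_K' Z D^{-1/2}.
    M = (dmh[:, None] * (Z.T @ PK @ Z) * dmh[None, :]) / p**2
    vals, vecs = np.linalg.eigh(M)                  # eigenvalues ascending
    # With uncentered Z, the leading eigenvector is the trivial solution
    # D^{1/2} 1 (see the note further below), so we skip it and keep the
    # next k eigenvectors.
    Bstar = vecs[:, ::-1][:, 1:k + 1]
    B = dmh[:, None] * Bstar                        # (24): B = D^{-1/2} B*
    G = np.linalg.solve(ZK.T @ ZK, ZK.T @ (Z @ B)) / p   # (25)
    return B, G
```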

To find \(\mathbf {Z}_{K}\), recall the original objective function:

$$\begin{aligned} \min \phi _{\text {groupals}}\left( {\mathbf {B},\mathbf {Z}_{K},\mathbf {G}}\right) =\frac{1}{p}\sum _{j=1}^{p}\left\| \mathbf {Z}_{K}\mathbf {G-Z}_{j}\mathbf {B}_{j}\right\| ^{2}. \end{aligned}$$

For fixed \(\mathbf {B}_{j}\), minimizing this over \(\mathbf {Z}_{K}\) and \(\mathbf {G}\) is equivalent, up to an additive term that does not depend on \(\mathbf {Z}_{K}\) and \(\mathbf {G}\), to considering

$$\begin{aligned} \min \phi ^{\prime }_{\text {groupals}}\left( \mathbf {Z}_{K},\mathbf {G}\right) =\left\| \frac{1}{p}\sum _{j=1} ^{p}\mathbf {Z}_{j}\mathbf {B}_{j}-\mathbf {Z}_{K}\mathbf {G}\right\| ^{2}. \end{aligned}$$

Hence, to find \(\mathbf {Z}_{K}\) we can apply K-means to the “average configuration”: \(\frac{1}{p}\sum _{j=1}^{p}\mathbf {Z}_{j}\mathbf {B}_{j}\).
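
The \(\mathbf {Z}_{K}\) update is thus an ordinary K-means run on this average configuration, warm-started at the current cluster means. A sketch using scikit-learn (any K-means implementation would do; update_ZK is again an illustrative name):

```python
import numpy as np
from sklearn.cluster import KMeans

def update_ZK(Z, B, G, p):
    """Update Z_K by K-means on the average configuration, initialized
    at the current cluster means G."""
    Y = (Z @ B) / p     # (1/p) sum_j Z_j B_j, since Z B = sum_j Z_j B_j
    km = KMeans(n_clusters=G.shape[0], init=G, n_init=1).fit(Y)
    ZK = np.zeros((Y.shape[0], G.shape[0]))
    ZK[np.arange(Y.shape[0]), km.labels_] = 1.0
    return ZK, km.cluster_centers_
```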

Note: it is easily verified that \(\mathbf {D}^{1/2}\mathbf {1}\) is an eigenvector of (23), with eigenvalue \(1/p\) under the scaling used here. Hence, as in CA and MCA, there is a so-called trivial first solution. This solution can be discarded by centering \(\mathbf {Z}\).
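
This is easy to check numerically; the following self-contained sketch (trivial_check is an illustrative name) verifies the trivial eigenvector under the same coding assumptions as above.

```python
import numpy as np

def trivial_check(Z, ZK, p):
    """With uncentered Z, D^{1/2} 1 is an eigenvector of the matrix in (23),
    with eigenvalue 1/p under the scaling used here."""
    d = Z.sum(axis=0)
    dmh = 1.0 / np.sqrt(d)
    PK = ZK @ np.linalg.solve(ZK.T @ ZK, ZK.T)
    M = (dmh[:, None] * (Z.T @ PK @ Z) * dmh[None, :]) / p**2
    v = np.sqrt(d)                                  # D^{1/2} 1
    return np.allclose(M @ v, v / p)
```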

We can summarize the resulting GROUPALS algorithm as follows (a compact sketch of the full loop is given after the list):

  1. Generate an initial cluster allocation \(\mathbf {Z}_{K}\) (e.g., by randomly assigning subjects to clusters).

  2. Use (23), (24) and (25) to obtain \(\mathbf {B}\) and \(\mathbf {G}\).

  3. Apply the K-means algorithm to the average configuration \(\frac{1}{p}\sum _{j=1}^{p}\mathbf {Z}_{j}\mathbf {B}_{j}\), using \(\mathbf {G}\) for the initial cluster means, to update \(\mathbf {Z}_{K}\) and \(\mathbf {G}\).

  4. Return to step 2 and repeat until convergence.
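
Putting the steps together, and reusing the illustrative helpers sketched above, the loop can be written as follows. This is a sketch under the stated assumptions: empty clusters and multiple random starts are deliberately not handled.

```python
import numpy as np

def groupals(Zs, n_clusters, k, max_iter=100, tol=1e-8, seed=None):
    """Alternating least squares for the GROUPALS objective.
    Zs: list of n x q_j indicator matrices; k: number of dimensions."""
    rng = np.random.default_rng(seed)
    Z = np.hstack(Zs)
    n, p = Z.shape[0], len(Zs)
    # Step 1: random initial cluster allocation.
    ZK = np.zeros((n, n_clusters))
    ZK[np.arange(n), rng.integers(n_clusters, size=n)] = 1.0
    prev = np.inf
    for _ in range(max_iter):
        B, G = update_B_G(Z, ZK, p, k)      # step 2: eqs. (23)-(25)
        ZK, G = update_ZK(Z, B, G, p)       # step 3: K-means update
        # Step 4: stop once the objective no longer decreases.
        splits = np.cumsum([Zj.shape[1] for Zj in Zs])[:-1]
        loss = groupals_loss(ZK, G, Zs, np.split(B, splits))
        if prev - loss < tol:
            break
        prev = loss
    return ZK, B, G
```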

Comparing this algorithm to the cluster CA algorithm in Sect. 3 shows that despite the different objectives, the two approaches lead to the same algorithm when all variables are categorical.


Cite this article

van de Velden, M., D’Enza, A.I. & Palumbo, F. Cluster Correspondence Analysis. Psychometrika 82, 158–185 (2017). https://doi.org/10.1007/s11336-016-9514-0


Keywords

  • correspondence analysis
  • cluster analysis
  • dimension reduction
  • categorical data