Abstract
A method is proposed that combines dimension reduction and cluster analysis for categorical data by simultaneously assigning individuals to clusters and optimal scaling values to categories, in such a way that a single between-variance maximization objective is achieved. In a unified framework, we briefly review alternative methods and show that the proposed method is equivalent to GROUPALS applied to categorical data. The performance of the methods is appraised by means of a simulation study. The results of the joint dimension reduction and clustering methods are compared with those of the so-called tandem approach, a sequential analysis of dimension reduction followed by cluster analysis. The tandem approach is conjectured to perform worse when variables are added that are unrelated to the cluster structure. Our simulation study confirms this conjecture. Moreover, the results of the simulation study indicate that the proposed method also consistently outperforms alternative joint dimension reduction and clustering methods.
Notes
The presented results are based on the solution using 100 random starts. Increasing the number of random starts to 1000 led to small changes in the configuration that had no effect on the interpretation. The congruence indices with the current solution were 0.997 for the attributes and 0.999 for the subjects.
References
Bäck, T. (1996). Evolutionary algorithms in theory and practice: Evolution strategies, evolutionary programming, genetic algorithms. Oxford: Oxford University Press.
Borg, I., & Groenen, P. J. (2005). Modern multidimensional scaling: Theory and applications. New York: Springer.
De Soete, G., & Carroll, J. D. (1994). K-means clustering in a low-dimensional Euclidean space. In E. Diday, Y. Lechevallier, M. Schader, P. Bertrand, & B. Burtschy (Eds.), New approaches in classification and data analysis (pp. 212–219). Berlin: Springer.
Gifi, A. (1990). Nonlinear multivariate analysis. Chichester: Wiley.
Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27, 623–637.
Gower, J. C., Lubbe, S. G., & Le Roux, N. J. (2011). Understanding biplots. New York: Wiley.
Gower, J. C., Groenen, P. J. F., & van de Velden, M. (2010). Area biplots. Journal of Computational and Graphical Statistics, 19(1), 46–61.
Gower, J. C., & Hand, D. J. (1996). Biplots. London: Chapman and Hall.
Greenacre, M. J. (1984). Theory and applications of correspondence analysis. London: Academic Press.
Greenacre, M. J. (1993). Biplots in correspondence analysis. Journal of Applied Statistics, 20(2), 251–269.
Greenacre, M. J. (2007). Correspondence analysis in practice. Boca Raton: CRC Press.
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218. doi:10.1007/BF01908075
Hwang, H., Dillon, W. R., & Takane, Y. (2006). An extension of multiple correspondence analysis for identifying heterogeneous subgroups of respondents. Psychometrika, 71, 161–171.
Iodice D’Enza, A., & Palumbo, F. (2013). Iterative factor clustering of binary data. Computational Statistics, 28(2), 789–807. doi:10.1007/s00180-012-0329-x
Iodice D’Enza, A., van de Velden, M., & Palumbo, F. (2014). On joint dimension reduction and clustering of categorical data. In D. Vicari, A. Okada, G. Ragozini, & C. Weihs (Eds.), Analysis and modeling of complex data in behavioral and social sciences. Berlin: Springer.
Jolliffe, I. T. (2002). Principal component analysis. New York: Springer.
Kroonenberg, P. M., & Lombardo, R. (1999). Nonsymmetric correspondence analysis: A tool for analysing contingency tables with a dependence structure. Multivariate Behavioral Research, 34, 367–396.
Lauro, N., & D’Ambra, L. (1984). L’analyse non symétrique des correspondances [Nonsymmetric correspondence analysis]. In E. Diday, L. Lebart, M. Jambu, & R. Tomassone (Eds.), Data analysis and informatics III (pp. 433–446). Amsterdam: Elsevier.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In L. M. Le Cam & J. Neyman (Eds.), Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1, pp. 281–297). Berkeley: University of California Press.
Martin, R. A., Puhlik-Doris, P., Larsen, G., Gray, J., & Weir, K. (2003). Individual differences in uses of humor and their relation to psychological well-being: Development of the humor styles questionnaire. Journal of Research in Personality, 37(1), 48–75.
Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. Toronto: University of Toronto Press.
Nishisato, S. (1994). Elements of dual scaling: An introduction to practical data analysis. Hillsdale, NJ: Lawrence Erlbaum Associates.
van de Velden, M., & Bijmolt, T. (2006). Generalized canonical correlation analysis of matrices with missing rows: A simulation study. Psychometrika, 71(2), 323–331.
van de Velden, M., & Takane, Y. (2012). Generalized canonical correlation analysis with missing values. Computational Statistics, 27(3), 551–571.
Van Buuren, S., & Heiser, W. (1989). Clustering n objects into k groups under optimal scaling of variables. Psychometrika, 54, 699–706.
Vichi, M., & Kiers, H. A. L. (2001). Factorial k-means analysis for two-way data. Computational Statistics and Data Analysis, 37, 49–64.
Vichi, M., Vicari, D., & Kiers, H. (2009). Clustering and dimensional reduction for mixed variables. (Unpublished manuscript)
Yamamoto, M., & Hwang, H. (2014). A general formulation of cluster analysis with dimension reduction and subspace separation. Behaviormetrika, 41, 115–129.
Appendix: GROUPALS and Cluster CA
To show the relationship between GROUPALS and cluster CA, we consider the GROUPALS objective function for the case where all variables are categorical:
\[ \min_{\mathbf{Z}_{K},\mathbf{G},\mathbf{B}_{1},\ldots ,\mathbf{B}_{p}}\ \phi \left( \mathbf{Z}_{K},\mathbf{G},\mathbf{B}_{1},\ldots ,\mathbf{B}_{p}\right) =\sum_{j=1}^{p}\left\Vert \mathbf{Z}_{K}\mathbf{G}-\mathbf{Z}_{j}\mathbf{B}_{j}\right\Vert ^{2} \]
subject to
\[ \sum_{j=1}^{p}\mathbf{B}_{j}^{\prime }\mathbf{Z}_{j}^{\prime }\mathbf{Z}_{j}\mathbf{B}_{j}=\mathbf{I}_{k}. \]
We can solve this problem by deriving the first-order conditions. These first-order conditions can be used to formulate an alternating least-squares algorithm. Thus, we fix \(\mathbf {Z}_{K}\) and solve for \(\mathbf {B}_{j}\) and \(\mathbf {G}\) by setting up the Lagrangian:
\[ \mathcal{L}\left( \mathbf{G},\mathbf{B}_{1},\ldots ,\mathbf{B}_{p},\mathbf{L}\right) =\sum_{j=1}^{p}\operatorname{tr}\left( \mathbf{Z}_{K}\mathbf{G}-\mathbf{Z}_{j}\mathbf{B}_{j}\right) ^{\prime }\left( \mathbf{Z}_{K}\mathbf{G}-\mathbf{Z}_{j}\mathbf{B}_{j}\right) -\operatorname{tr}\,\mathbf{L}\left( \sum_{j=1}^{p}\mathbf{B}_{j}^{\prime }\mathbf{D}_{j}\mathbf{B}_{j}-\mathbf{I}_{k}\right), \]
where \(\mathbf {L}\) is the matrix of Lagrange multipliers and \(\mathbf {D}_j = \mathbf {Z}_j^{\prime }\mathbf {Z}_j\). Taking derivatives and equating them to zero yields the first-order conditions.
For \(\mathbf {G}\):
\[ \mathbf{Z}_{K}^{\prime }\mathbf{Z}_{K}\mathbf{G}=\frac{1}{p}\mathbf{Z}_{K}^{\prime }\sum_{j=1}^{p}\mathbf{Z}_{j}\mathbf{B}_{j},\quad \text{so that}\quad \mathbf{G}=\frac{1}{p}\left( \mathbf{Z}_{K}^{\prime }\mathbf{Z}_{K}\right) ^{-1}\mathbf{Z}_{K}^{\prime }\sum_{j=1}^{p}\mathbf{Z}_{j}\mathbf{B}_{j}. \]
For \(\mathbf {B}_{j}\):
\[ \mathbf{Z}_{j}^{\prime }\mathbf{Z}_{K}\mathbf{G}=\mathbf{D}_{j}\mathbf{B}_{j}\mathbf{L}. \]
Inserting the solution for \(\mathbf {G}\) we obtain
\[ \frac{1}{p}\mathbf{Z}_{j}^{\prime }\mathbf{Z}_{K}\left( \mathbf{Z}_{K}^{\prime }\mathbf{Z}_{K}\right) ^{-1}\mathbf{Z}_{K}^{\prime }\sum_{l=1}^{p}\mathbf{Z}_{l}\mathbf{B}_{l}=\mathbf{D}_{j}\mathbf{B}_{j}\mathbf{L}. \]
Note that, as the constraints are symmetric, \(\mathbf {L}\) is also symmetric. Furthermore, as \(j=1,\ldots ,p\), we have p equations. However, defining \(\mathbf {Z}=\left[ \mathbf {Z}_{1},\ldots ,\mathbf {Z}_{p}\right] \) and \(\mathbf {B}=\left[ \mathbf {B}_{1}^{\prime },\ldots ,\mathbf {B}_{p}^{\prime }\right] ^{\prime }\), the p equations can be expressed as
\[ \frac{1}{p}\mathbf{Z}^{\prime }\mathbf{Z}_{K}\left( \mathbf{Z}_{K}^{\prime }\mathbf{Z}_{K}\right) ^{-1}\mathbf{Z}_{K}^{\prime }\mathbf{Z}\mathbf{B}=\mathbf{D}\mathbf{B}\mathbf{L}, \]
where \(\mathbf {D}\) is a block-diagonal matrix with diagonal blocks \(\mathbf {D}_{1},\ldots ,\mathbf {D}_{p}\).
Premultiplying both sides by \(\mathbf {D}^{-1/2}\) we get
\[ \frac{1}{p}\mathbf{D}^{-1/2}\mathbf{Z}^{\prime }\mathbf{Z}_{K}\left( \mathbf{Z}_{K}^{\prime }\mathbf{Z}_{K}\right) ^{-1}\mathbf{Z}_{K}^{\prime }\mathbf{Z}\mathbf{D}^{-1/2}\mathbf{D}^{1/2}\mathbf{B}=\mathbf{D}^{1/2}\mathbf{B}\mathbf{L}. \]
Without loss of generality we can replace \(\mathbf {L}\) by its eigendecomposition \(\mathbf {U}\varvec{\Lambda }\mathbf {U}^{\prime }\) to get
\[ \frac{1}{p}\mathbf{D}^{-1/2}\mathbf{Z}^{\prime }\mathbf{Z}_{K}\left( \mathbf{Z}_{K}^{\prime }\mathbf{Z}_{K}\right) ^{-1}\mathbf{Z}_{K}^{\prime }\mathbf{Z}\mathbf{D}^{-1/2}\mathbf{D}^{1/2}\mathbf{B}=\mathbf{D}^{1/2}\mathbf{B}\mathbf{U}\varvec{\Lambda }\mathbf{U}^{\prime }, \]
so that
\[ \frac{1}{p}\mathbf{D}^{-1/2}\mathbf{Z}^{\prime }\mathbf{Z}_{K}\left( \mathbf{Z}_{K}^{\prime }\mathbf{Z}_{K}\right) ^{-1}\mathbf{Z}_{K}^{\prime }\mathbf{Z}\mathbf{D}^{-1/2}\left( \mathbf{D}^{1/2}\mathbf{B}\mathbf{U}\right) =\left( \mathbf{D}^{1/2}\mathbf{B}\mathbf{U}\right) \varvec{\Lambda }. \]
Hence, letting
\[ \mathbf{B}^{*}=\mathbf{D}^{1/2}\mathbf{B}\mathbf{U}, \]
we see that \(\mathbf {B}^{*}\) can be obtained by taking the first k orthonormal eigenvectors (corresponding to the k largest eigenvalues) of
\[ \frac{1}{p}\mathbf{D}^{-1/2}\mathbf{Z}^{\prime }\mathbf{Z}_{K}\left( \mathbf{Z}_{K}^{\prime }\mathbf{Z}_{K}\right) ^{-1}\mathbf{Z}_{K}^{\prime }\mathbf{Z}\mathbf{D}^{-1/2}. \tag{23} \]
The appropriately standardized category quantifications become
\[ \mathbf{B}=\mathbf{D}^{-1/2}\mathbf{B}^{*}, \tag{24} \]
and \(\mathbf {G}\) is obtained by inserting this into the first-order condition for \(\mathbf {G}\), that is,
\[ \mathbf{G}=\frac{1}{p}\left( \mathbf{Z}_{K}^{\prime }\mathbf{Z}_{K}\right) ^{-1}\mathbf{Z}_{K}^{\prime }\mathbf{Z}\mathbf{B}. \tag{25} \]
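As an illustration, the following minimal numpy sketch (ours, not the authors' code) implements the update in (23)–(25) for a fixed allocation. It assumes \(\mathbf {Z}\) is the \(n\times Q\) concatenated indicator matrix \(\left[ \mathbf {Z}_{1},\ldots ,\mathbf {Z}_{p}\right] \) with every category observed at least once, \(\mathbf {Z}_{K}\) an \(n\times K\) cluster indicator matrix with no empty clusters, and all function and variable names are our own:

```python
import numpy as np

def groupals_update(Z, Zk, p, k):
    """Update B and G for fixed cluster allocation Zk, cf. (23)-(25)."""
    d = Z.sum(axis=0)                       # category frequencies: diagonal of D
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^{-1/2}; D is diagonal for indicator data
    Zc = Z - Z.mean(axis=0)                 # centring Z discards the trivial solution
    # P_K Zc, with P_K = Z_K (Z_K' Z_K)^{-1} Z_K' the projector on the cluster space
    PZc = Zk @ np.linalg.solve(Zk.T @ Zk, Zk.T @ Zc)
    M = D_inv_sqrt @ Zc.T @ PZc @ D_inv_sqrt / p      # the matrix in (23)
    eigvals, eigvecs = np.linalg.eigh(M)              # eigenvalues in ascending order
    B_star = eigvecs[:, -k:][:, ::-1]                 # first k orthonormal eigenvectors
    B = D_inv_sqrt @ B_star                           # quantifications, cf. (24)
    G = np.linalg.solve(Zk.T @ Zk, Zk.T @ (Zc @ B)) / p   # cluster means, cf. (25)
    return B, G
```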
To find \(\mathbf {Z}_{K}\), recall the original objective function:
\[ \min \sum_{j=1}^{p}\left\Vert \mathbf{Z}_{K}\mathbf{G}-\mathbf{Z}_{j}\mathbf{B}_{j}\right\Vert ^{2}. \]
For fixed \(\mathbf {B}_{j}\), this is equivalent to considering
\[ \min_{\mathbf{Z}_{K},\mathbf{G}}\left\Vert \mathbf{Z}_{K}\mathbf{G}-\frac{1}{p}\sum_{j=1}^{p}\mathbf{Z}_{j}\mathbf{B}_{j}\right\Vert ^{2}. \]
Hence, to find \(\mathbf {Z}_{K}\) we can apply K-means to the “average configuration”: \(\frac{1}{p}\sum _{j=1}^{p}\mathbf {Z}_{j}\mathbf {B}_{j}\).
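This equivalence follows from a standard decomposition. Writing \(\bar{\mathbf{Y}}=\frac{1}{p}\sum _{j=1}^{p}\mathbf {Z}_{j}\mathbf {B}_{j}\), we have
\[ \sum_{j=1}^{p}\left\Vert \mathbf{Z}_{K}\mathbf{G}-\mathbf{Z}_{j}\mathbf{B}_{j}\right\Vert ^{2}=p\left\Vert \mathbf{Z}_{K}\mathbf{G}-\bar{\mathbf{Y}}\right\Vert ^{2}+\sum_{j=1}^{p}\left\Vert \mathbf{Z}_{j}\mathbf{B}_{j}-\bar{\mathbf{Y}}\right\Vert ^{2}, \]
because the cross term vanishes by the definition of \(\bar{\mathbf{Y}}\). As the second term does not depend on \(\mathbf {Z}_{K}\) or \(\mathbf {G}\), minimizing over \(\mathbf {Z}_{K}\) and \(\mathbf {G}\) amounts to a K-means problem on \(\bar{\mathbf{Y}}\).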
Note: It can easily be verified that \(\mathbf {D}^{1/2}\mathbf {1}\) is an eigenvector of (23) corresponding to the eigenvalue 1. Hence, as in CA and MCA, there is a so-called trivial first solution. Discarding this solution can be achieved by centering \(\mathbf {Z}\).
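Indeed, since every row of \(\mathbf {Z}\) contains exactly p ones, \(\mathbf {Z}\mathbf {1}=p\mathbf {1}_{n}\); the projector \(\mathbf {Z}_{K}\left( \mathbf {Z}_{K}^{\prime }\mathbf {Z}_{K}\right) ^{-1}\mathbf {Z}_{K}^{\prime }\) leaves \(\mathbf {1}_{n}\) unchanged; and \(\mathbf {Z}^{\prime }\mathbf {1}_{n}=\mathbf {D}\mathbf {1}\). Hence
\[ \frac{1}{p}\mathbf{D}^{-1/2}\mathbf{Z}^{\prime }\mathbf{Z}_{K}\left( \mathbf{Z}_{K}^{\prime }\mathbf{Z}_{K}\right) ^{-1}\mathbf{Z}_{K}^{\prime }\mathbf{Z}\mathbf{D}^{-1/2}\left( \mathbf{D}^{1/2}\mathbf{1}\right) =\frac{1}{p}\mathbf{D}^{-1/2}\mathbf{Z}^{\prime }\left( p\mathbf{1}_{n}\right) =\mathbf{D}^{1/2}\mathbf{1}. \]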
We can summarize the resulting GROUPALS algorithm as follows (a numerical sketch of the complete loop is given after the list):

1. Generate an initial cluster allocation \(\mathbf {Z}_{K}\) (e.g., by randomly assigning subjects to clusters).
2. Use (23), (24) and (25) to obtain \(\mathbf {B}\) and \(\mathbf {G}\).
3. Apply the K-means algorithm to the average configuration \(\frac{1}{p}\sum _{j=1}^{p}\mathbf {Z}_{j}\mathbf {B}_{j}\), using \(\mathbf {G}\) for the initial cluster means, to update \(\mathbf {Z}_{K}\) and \(\mathbf {G}\).
4. Return to step 2 and repeat until convergence.
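A compact sketch of the full alternating scheme, under the same assumptions as before and reusing the groupals_update function from the sketch above, could read as follows; for brevity, step 3 here performs a single K-means allocation pass (a full K-means run, as the algorithm prescribes, could be substituted), and empty clusters are not handled:

```python
import numpy as np

def groupals(Z, p, K, k, max_iter=100, seed=0):
    """Alternating least-squares GROUPALS sketch: Z is the n x Q
    concatenated indicator matrix, K the number of clusters, k the
    dimensionality of the solution."""
    rng = np.random.default_rng(seed)
    n = Z.shape[0]
    labels = rng.integers(K, size=n)          # step 1: random initial allocation
    Zc = Z - Z.mean(axis=0)                   # centred Z, as in the update step
    for _ in range(max_iter):
        Zk = np.eye(K)[labels]                # cluster indicator matrix Z_K
        B, G = groupals_update(Z, Zk, p, k)   # step 2: updates via (23)-(25)
        Y = Zc @ B / p                        # average configuration
        # step 3: reassign each subject to the nearest cluster mean in G
        dist2 = ((Y[:, None, :] - G[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                             # step 4: stop at convergence
        labels = new_labels
    return labels, B, G
```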
Comparing this algorithm to the cluster CA algorithm in Sect. 3 shows that despite the different objectives, the two approaches lead to the same algorithm when all variables are categorical.