Abstract
In the two-step sequential approach known as tandem analysis, a clustering algorithm is applied to object scores estimated after the dimensionality of the variables has been reduced. In this approach, the reduction step may obscure or mask taxonomic information (Arabie and Hubert in Handbook of marketing research. Blackwell, Oxford, 1994). As an alternative to tandem analysis, Hwang et al. (Psychometrika 71:161–171, 2006) proposed an approach that combines two methods for categorical data; however, their method does not remove the trivial solution in the first dimension, in which the object scores are estimated as a vector of \(1\)s that carries no information. In this study, we propose a method for clustering objects described by categorical variables in a low-dimensional space. The proposed method performs multi-dimensional nonmetric principal component analysis and \(k\)-means clustering for categorical data simultaneously; that is, it reduces dimensions through category quantifications while clustering the object scores. Because object scores and variable categories are displayed together, the relationships between objects and categories can be interpreted for each cluster. The method is compared with tandem clustering using simulated data and is applied to real-world data.
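The contrast between two-step tandem analysis and simultaneous reduction and clustering can be illustrated on indicator-coded categorical data. The following is a minimal sketch assuming NumPy, not the authors' algorithm: the MCA-style standardization, the reduced \(k\)-means loss \(\|Z - UMA'\|^2\) in the spirit of De Soete and Carroll (1994), and all function names (`indicator_matrix`, `standardize`, `tandem`, `reduced_kmeans`) are illustrative assumptions. Centering the indicator columns is one simple way to remove the trivial vector-of-1s solution before any dimension is extracted.

```python
import numpy as np


def indicator_matrix(data):
    """Code an n x m array of category labels as an n x q indicator matrix G."""
    data = np.asarray(data)
    blocks = []
    for j in range(data.shape[1]):
        cats = np.unique(data[:, j])                 # categories of variable j
        blocks.append((data[:, [j]] == cats[None, :]).astype(float))
    return np.hstack(blocks)


def standardize(G):
    """Center each indicator column and divide it by sqrt(category count).

    Centering makes every column orthogonal to the vector of 1s, so the
    trivial constant solution cannot be extracted as a dimension.
    """
    return (G - G.mean(axis=0)) / np.sqrt(G.sum(axis=0))


def nearest(F, centroids):
    """Index of the closest centroid (squared Euclidean distance) per row."""
    d = ((F[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def tandem(Z, k, dim, n_iter=100, seed=0):
    """Tandem analysis: reduce Z by PCA first, then run k-means on the scores."""
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    F = Z @ vt[:dim].T                               # scores, fixed from here on
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(F)))]                # maximin seeding
    for _ in range(1, k):
        d = ((F[:, None, :] - F[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    centroids = F[idx].copy()
    for _ in range(n_iter):                          # Lloyd iterations
        labels = nearest(F, centroids)
        new = np.array([F[labels == c].mean(axis=0) if np.any(labels == c)
                        else centroids[c] for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, F


def reduced_kmeans(Z, k, dim, n_iter=100, seed=0):
    """Simultaneous reduction and clustering in the reduced k-means style.

    Alternately decreases ||Z - U M A'||^2 over memberships U, centroids M
    (k x dim) and column-orthonormal loadings A (q x dim).
    """
    n = Z.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.permutation(np.arange(n) % k)       # balanced random start
    for _ in range(n_iter):
        U = np.eye(k)[labels]                        # n x k membership matrix
        sizes = np.maximum(U.sum(axis=0), 1.0)       # guard against empty clusters
        means = (U.T @ Z) / sizes[:, None]           # cluster means in full space
        B = means.T @ (sizes[:, None] * means)       # between-cluster scatter Z'PZ
        w, V = np.linalg.eigh(B)
        A = V[:, np.argsort(w)[::-1][:dim]]          # leading dim eigenvectors
        F = Z @ A                                    # object scores
        M = means @ A                                # centroids in reduced space
        new_labels = nearest(F, M)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, F, A


if __name__ == "__main__":
    # toy data: 10 objects share one response pattern, 11 share another
    data = [("a", "x", "p")] * 10 + [("b", "y", "q")] * 11
    Z = standardize(indicator_matrix(data))
    print("tandem:", tandem(Z, k=2, dim=2)[0])
    print("reduced k-means:", reduced_kmeans(Z, k=2, dim=2)[0])
```

In `tandem` the loadings come from the data alone and never change, whereas in `reduced_kmeans` they are re-estimated from the current partition at every step, which is what "simultaneous" means here. Both functions use a single start; because the alternating updates only reach a local minimum, reduced \(k\)-means is usually run from several random starts, keeping the partition with the smallest loss.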
References
Adachi K, Murakami T (2011) Nonmetric multivariate analysis. Asakura-Shoten, Tokyo (in Japanese)
Arabie P, Hubert L (1994) Cluster analysis in marketing research. In: Bagozzi RP (ed) Handbook of marketing research. Blackwell, Oxford
De Soete G, Carroll JD (1994) K-means clustering in a low-dimensional Euclidean space. In: Diday E, Lechevallier Y, Schader M, Bertrand P, Burtschy B (eds) New approaches in classification and data analysis. Springer, Heidelberg, pp 212–219
Gifi A (1990) Nonlinear multivariate analysis. Wiley, Chichester
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2:193–218
Hwang H, Dillon WR (2010) Simultaneous two-way clustering of multiple correspondence analysis. Multivar Behav Res 45:186–208
Hwang H, Dillon WR, Takane Y (2006) An extension of multiple correspondence analysis for identifying heterogeneous subgroups of respondents. Psychometrika 71:161–171
Hwang H, Dillon WR, Takane Y (2010) Fuzzy cluster multiple correspondence analysis. Behaviormetrika 37:111–133
Iodice D’Enza A, Palumbo F (2013) Iterative factor clustering of binary data. Comput Stat 28:1–19
Iodice D’Enza A, Van de Velden M, Palumbo F (2014) On joint dimension reduction and clustering of categorical data. In: Vicari D, Okada A, Ragozini G, Weihs C (eds) Analysis and modeling of complex data in behavioral and social sciences. Springer, Switzerland, pp 161–169
Lincoff GH (1981) The Audubon Society field guide to North American mushrooms. Alfred A. Knopf, New York
MacQueen J (1967) Some methods for classification and analysis of multivariate observations. Proc Fifth Berkeley Symp Math Stat Probab 1:281–297
Rocci R, Gattone SA, Vichi M (2011) A new dimension reduction method: factor discriminant \(k\)-means. J Classif 28:210–226
ten Berge JM (1993) Least squares optimization in multivariate analysis. DSWO Press, Leiden University, Leiden
Timmerman ME, Ceulemans E, Kiers HAL, Vichi M (2010) Factorial and reduced \(K\)-means reconsidered. Comput Stat Data Anal 54:1858–1871
Van Buuren S, Heiser WJ (1989) Clustering \(N\) objects into \(K\) groups under optimal scaling of variables. Psychometrika 54:699–706
Van de Velden M, Iodice D’Enza A, Palumbo F (2012) On joint dimension reduction and clustering. In: JCS-CLADAG, analysis and modeling of complex data in behavioural and social sciences, September 3–4, 2012, Capri, Italy
Vichi M, Kiers HAL (2001) Factorial \(k\)-means analysis for two-way data. Comput Stat Data Anal 37:49–64
Cite this article
Mitsuhiro, M., Yadohisa, H. Reduced \(k\)-means clustering with MCA in a low-dimensional space. Comput Stat 30, 463–475 (2015). https://doi.org/10.1007/s00180-014-0544-8
DOI: https://doi.org/10.1007/s00180-014-0544-8