Abstract
In data analytics, objects and variables of multivariate data are often displayed in a reduced space that is easy to interpret, so that features of the data can be extracted. When clustering is combined with such displays, the cluster structure of the original data is often obtained with “simultaneous analysis,” an approach that combines two multivariate analysis methods. Stratifying objects can also bring out features of the variables that we would like to interpret. Simultaneous analysis methods for these tasks estimate the unknown parameters of the two methods jointly and can find a low-dimensional subspace that reflects the cluster structure. However, although these methods have much in common, the method must be changed depending on the aim of the analysis and the data type, which makes them inconvenient in practice. To address this shortcoming, we propose a simultaneous analysis framework that integrates several possible reduction methods with clustering methods. The unified framework is applicable to numerical, categorical, and mixed data. Using it, we can display objects and variables in a low-dimensional subspace that reflects the cluster structure. Moreover, we discuss extensions of the framework and how it relates to several previously proposed simultaneous analysis methods.
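To make the idea of estimating reduction and clustering parameters jointly more concrete, the following is a minimal sketch of one well-known simultaneous method for numerical data, reduced k-means, fitted by alternating least squares. It is an illustration only and not the unified framework proposed in the article; the function name `reduced_kmeans`, its parameters, and the toy data are assumptions introduced here.

```python
import numpy as np


def reduced_kmeans(X, n_clusters, n_dims, n_iter=50, seed=0):
    """Illustrative reduced k-means: minimise ||X - U F A'||^2 by alternating updates.

    U is the (implicit) cluster membership matrix encoded by `labels`,
    A (p x n_dims) holds orthonormal loadings, and F (n_clusters x n_dims)
    holds the cluster centroids in the reduced space.
    """
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                      # column-centre the data
    n, p = X.shape
    labels = rng.integers(n_clusters, size=n)   # random initial partition

    for _ in range(n_iter):
        # Cluster means in the full variable space.
        counts = np.bincount(labels, minlength=n_clusters)
        means = np.zeros((n_clusters, p))
        for k in range(n_clusters):
            if counts[k] > 0:
                means[k] = X[labels == k].mean(axis=0)
            else:
                means[k] = X[rng.integers(n)]   # reseed an empty cluster

        # Loadings: leading eigenvectors of the between-cluster scatter,
        # which equals X' U (U'U)^{-1} U' X for the current partition.
        between = (means * counts[:, None]).T @ means
        _, eigvecs = np.linalg.eigh(between)    # eigenvalues in ascending order
        A = eigvecs[:, ::-1][:, :n_dims]

        # Centroids in the reduced space and their images in the full space.
        F = means @ A
        recon = F @ A.T

        # Reassign each object to the closest reconstructed centroid.
        dist2 = ((X[:, None, :] - recon[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels

    return labels, A, F


# Toy data (assumed for illustration): two groups separated on the first variable.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(60, 5))
X_demo[:30, 0] += 8.0
labels_demo, A_demo, F_demo = reduced_kmeans(X_demo, n_clusters=2, n_dims=1)
```

Because the loadings are re-estimated from the current partition and the partition is re-estimated from the current subspace, the two sets of parameters are fitted against a single least-squares criterion rather than in sequence. Like ordinary k-means, this kind of procedure can stop at a local optimum, so several random starts are usually advisable.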
Acknowledgements
The authors would like to thank the reviewers and editors for their helpful comments on this manuscript.
Cite this article
Mitsuhiro, M., Yadohisa, H. A unified representation of simultaneous analysis methods of reduction and clustering. Jpn J Stat Data Sci 1, 393–412 (2018). https://doi.org/10.1007/s42081-018-0022-6
DOI: https://doi.org/10.1007/s42081-018-0022-6