A unified representation of simultaneous analysis methods of reduction and clustering

Japanese Journal of Statistics and Data Science

Abstract

In data analytics, objects and variables are often displayed in an easily interpretable reduced space to extract features from multivariate data. When clustering is applied to such displays, the cluster structure of the original data is often obtained with “simultaneous analysis,” an approach that combines two multivariate analysis methods. Stratifying the objects in this way can also reveal features of the variables that we would like to interpret. Simultaneous analysis methods estimate the unknown parameters of both methods at once and can therefore find a low-dimensional subspace that reflects the cluster structure. However, although these methods share many common components, the analyst must switch between them depending on the aim of the analysis and the data type, which makes them inconvenient in practice. To address this shortcoming, we propose a simultaneous analysis framework that integrates several possible reduction methods with clustering methods. The unified framework is applicable to numerical, categorical, and mixed data. Using this framework, we can display objects and variables in a low-dimensional subspace that reflects the cluster structure. Moreover, we discuss the framework’s extensions and how it relates to several other proposed simultaneous analysis methods.
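The abstract does not spell out how the unknown parameters of the reduction and clustering parts are estimated "simultaneously." As a point of reference only, here is a minimal sketch of reduced k-means (De Soete and Carroll, 1994), one well-known simultaneous method for numerical data. It is not the authors' unified framework, and every function name in it (`reduced_kmeans`, `rkm_once`, and so on) is hypothetical. It minimizes ||X - U C A^T||^2 over a binary membership matrix U, centroids C, and column-orthonormal loadings A (q columns) by alternating least squares: for fixed U, A spans the top-q eigenvectors of the between-cluster scatter X^T P_U X (approximated here by orthogonal iteration); for fixed A, U is a k-means assignment of the scores XA. It assumes column-centered data, 1 <= q <= p, and usually q < k. Categorical and mixed data, which the proposed framework also covers, are not handled.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors: List[Vector], rng: random.Random) -> List[Vector]:
    """Gram-Schmidt; a numerically dependent vector is replaced by random noise."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        while True:
            for b in basis:
                d = dot(w, b)
                w = [wi - d * bi for wi, bi in zip(w, b)]
            norm = math.sqrt(dot(w, w))
            if norm > 1e-10:
                break
            w = [rng.gauss(0.0, 1.0) for _ in v]
        basis.append([wi / norm for wi in w])
    return basis


def cluster_means(X: List[Vector], labels: List[int]) -> Dict[int, Tuple[Vector, int]]:
    """Mean vector and size of every non-empty cluster (empty clusters simply vanish)."""
    acc: Dict[int, Tuple[Vector, int]] = {}
    for x, c in zip(X, labels):
        s, n = acc.get(c, ([0.0] * len(x), 0))
        acc[c] = ([si + xi for si, xi in zip(s, x)], n + 1)
    return {c: ([si / n for si in s], n) for c, (s, n) in acc.items()}


def scatter_times(means: Dict[int, Tuple[Vector, int]], a: Vector) -> Vector:
    """B a, where B = sum_c n_c m_c m_c^T = X^T P_U X (between-cluster scatter)."""
    out = [0.0] * len(a)
    for m, nc in means.values():
        w = nc * dot(m, a)
        out = [o + w * mj for o, mj in zip(out, m)]
    return out


def rkm_loss(X: List[Vector], labels: List[int], A: List[Vector]) -> float:
    """||X - U C A^T||^2 with C set to the cluster means of the scores XA."""
    means = cluster_means(X, labels)
    total = 0.0
    for x, c in zip(X, labels):
        cent = [dot(means[c][0], a) for a in A]
        recon = [sum(ck * a[j] for ck, a in zip(cent, A)) for j in range(len(x))]
        total += sum((xj - rj) ** 2 for xj, rj in zip(x, recon))
    return total


def rkm_once(X, k, q, rng, max_iter=100, power_iter=50):
    """One random start of the alternating least-squares algorithm."""
    n, p = len(X), len(X[0])
    labels = [rng.randrange(k) for _ in range(n)]
    A = orthonormalize([[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(q)], rng)
    for _ in range(max_iter):
        means = cluster_means(X, labels)
        # A-step: orthogonal iteration towards the top-q eigenvectors of B
        for _ in range(power_iter):
            A = orthonormalize([scatter_times(means, a) for a in A], rng)
        # U-step: assign each score vector (row of XA) to the nearest projected centroid
        C = {c: [dot(m, a) for a in A] for c, (m, _) in means.items()}
        F = [[dot(x, a) for a in A] for x in X]
        new_labels = [min(C, key=lambda c: sum((fi - ci) ** 2 for fi, ci in zip(f, C[c])))
                      for f in F]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, A, rkm_loss(X, labels, A)


def reduced_kmeans(X, k, q, n_starts=10, seed=0):
    """Column-center X, run several random starts, and keep the lowest loss."""
    if not 1 <= q <= len(X[0]):
        raise ValueError("need 1 <= q <= number of variables")
    rng = random.Random(seed)
    mu = [sum(col) / len(X) for col in zip(*X)]
    Xc = [[xj - mj for xj, mj in zip(x, mu)] for x in X]
    best = None
    for _ in range(n_starts):
        result = rkm_once(Xc, k, q, rng)
        if best is None or result[2] < best[2]:
            best = result
    return best
```

Because k-means-type algorithms can stall in local optima, the sketch keeps the best of several random starts. On data whose clusters differ in only some of the variables, the returned loadings A should concentrate on those variables, which is the "subspace that reflects the cluster structure" the abstract refers to.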

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Acknowledgements

The authors would like to thank the reviewers and editors for their helpful comments on this manuscript.

Author information

Correspondence to Masaki Mitsuhiro.

About this article

Cite this article

Mitsuhiro, M., Yadohisa, H. A unified representation of simultaneous analysis methods of reduction and clustering. Jpn J Stat Data Sci 1, 393–412 (2018). https://doi.org/10.1007/s42081-018-0022-6
