
On Joint Dimension Reduction and Clustering of Categorical Data

  • Alfonso Iodice D’Enza
  • Michel Van de Velden
  • Francesco Palumbo
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

There exist several methods for clustering high-dimensional data. One popular approach is a two-step procedure: in the first step, a dimension reduction technique is used to reduce the dimensionality of the data; in the second step, cluster analysis is applied to the data in the reduced space. This procedure is known as the tandem approach. An important drawback of the tandem approach is that the dimension reduction may distort or hide the cluster structure, because the dimensions retained in the first step are chosen to explain as much of the total variability as possible rather than to separate clusters. As an alternative, various authors have proposed methods that perform dimension reduction and clustering jointly. In this paper we review some of these existing joint dimension reduction and clustering methods for categorical data in a unified framework that facilitates comparison.
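The contrast between the tandem approach and a joint approach can be sketched as follows. This is a minimal, hypothetical Python/NumPy illustration, not the authors' code. It assumes that the dimension reduction step is multiple correspondence analysis, computed as an SVD of the standardized residuals of the indicator matrix, and that the clustering step is plain k-means. The joint variant is a reduced k-means style alternation, chosen here only as a simple stand-in: it is not necessarily any of the specific methods reviewed in the paper. All function names (`indicator_matrix`, `mca_residuals`, `kmeans`, `tandem`, `joint_reduced_kmeans`) are illustrative assumptions.

```python
import itertools

import numpy as np


def indicator_matrix(X):
    """Indicator (one-hot) coding of an n x p array of categorical codes."""
    blocks = []
    for j in range(X.shape[1]):
        levels = np.unique(X[:, j])
        blocks.append((X[:, j][:, None] == levels[None, :]).astype(float))
    return np.hstack(blocks)


def mca_residuals(Z):
    """Standardized residuals of the indicator matrix, as analysed by MCA."""
    P = Z / Z.sum()
    rc = np.outer(P.sum(axis=1), P.sum(axis=0))  # row mass x column mass
    return (P - rc) / np.sqrt(rc)


def kmeans(Y, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point start."""
    centers = [Y[0]]
    for _ in range(1, k):
        gap = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[gap.argmax()])
    centers = np.array(centers)
    labels = None
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new, labels):
            break
        labels = new
        centers = np.array([Y[labels == g].mean(axis=0) if np.any(labels == g)
                            else centers[g] for g in range(k)])
    return labels


def tandem(X, k, d):
    """Step 1: MCA via SVD; step 2: k-means on the first d row scores."""
    U, s, _ = np.linalg.svd(mca_residuals(indicator_matrix(X)),
                            full_matrices=False)
    # Rows of an indicator matrix have equal mass, so U * s is proportional
    # to the MCA row principal coordinates; the scale does not change k-means.
    return kmeans(U[:, :d] * s[:d], k)


def joint_reduced_kmeans(X, k, d, n_iter=50):
    """Alternate between an orthonormal subspace B and the cluster labels,
    so that the subspace is fitted to the cluster centroids (hypothetical)."""
    assert d < k, "the weighted centroid matrix has rank at most k - 1"
    S = mca_residuals(indicator_matrix(X))
    labels = tandem(X, k, d)               # start from the tandem solution
    for _ in range(n_iter):
        G = np.eye(k)[labels]              # n x k membership matrix
        sizes = G.sum(axis=0)
        M = (G.T @ S) / np.maximum(sizes, 1)[:, None]  # full-space centroids
        # B = top-d eigenvectors of S' G (G'G)^-1 G' S = M' diag(sizes) M
        _, _, Vt = np.linalg.svd(np.sqrt(sizes)[:, None] * M,
                                 full_matrices=False)
        B = Vt[:d].T
        Y, C = S @ B, M @ B                # objects and centroids in subspace
        new = ((Y[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels, B


if __name__ == "__main__":
    # Toy data: 8 respondents use categories {0, 1}, 8 use {2, 3}, on 3 items.
    X = np.array(list(itertools.product([0, 1], repeat=3))
                 + list(itertools.product([2, 3], repeat=3)))
    print(tandem(X, k=2, d=1))
    print(joint_reduced_kmeans(X, k=2, d=1)[0])
```

The deterministic farthest-point start only keeps the toy run reproducible; multiple random starts are the more usual choice in practice. The point of the joint variant is that the subspace `B` is re-estimated from the between-cluster centroids at every iteration, so dimensions that separate the clusters are kept even if they explain little of the total variability, which is exactly the failure mode of the tandem approach described above.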

Keywords

Cluster analysis; Correspondence analysis; Homogeneity analysis

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Alfonso Iodice D’Enza (1), corresponding author
  • Michel Van de Velden (2)
  • Francesco Palumbo (3)
  1. Università di Cassino, Cassino, Italy
  2. Erasmus University of Rotterdam, PA Rotterdam, The Netherlands
  3. Università degli Studi di Napoli Federico II, Napoli, Italy
