Abstract

In previous papers, we proposed a generalized principal component analysis (GPCA) designed to display salient features of a multidimensional data set, in particular the existence of clusters. Using an example, this article shows how GPCA and clustering methods complement each other. The projections provided by GPCA, together with the sequence of eigenvalues, give useful indications of the number and type of clusters to expect; submitting the GPCA principal components, rather than the raw data, to a clustering algorithm can improve the classification. The use of a suitable robustification of GPCA is also mentioned.
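The workflow described above (GPCA projection, inspection of the eigenvalues, then clustering of the leading components instead of the raw data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that the cluster-oriented GPCA of Caussinus and Ruiz-Gazen (1993, 1995) is a PCA of the centred data using the inverse of a "local" variance W as metric. Here W is taken to be a weighted mean of the outer products of pairwise differences x_i - x_j, with weights exp(-beta/2 * d_ij^2), where d_ij is the Mahalanobis distance under the usual covariance V. The principal axes then solve V u = lambda W u. The helper names `local_variance`, `gpca` and `kmeans`, the default beta = 2 and the synthetic data are all hypothetical. The clustering step is plain Lloyd k-means, not the Hartigan and Wong (1979) algorithm listed in the references.

```python
import numpy as np


def local_variance(X, beta=2.0):
    """Hypothetical helper: kernel-weighted 'local' variance W from pairwise differences.

    Assumed form: W = sum_{i<j} w_ij (x_i - x_j)(x_i - x_j)^T / sum_{i<j} w_ij,
    with w_ij = exp(-beta/2 * d_ij^2) and d_ij the Mahalanobis distance under V.
    """
    n = X.shape[0]
    V = np.cov(X, rowvar=False, bias=True)           # usual covariance, divisor n
    V_inv = np.linalg.inv(V)
    i, j = np.triu_indices(n, k=1)                    # every pair i < j
    D = X[i] - X[j]                                   # pairwise differences
    d2 = np.einsum("pk,kl,pl->p", D, V_inv, D)        # squared Mahalanobis distances
    w = np.exp(-0.5 * beta * d2)                      # close pairs get large weights
    W = (D * w[:, None]).T @ D / w.sum()              # weighted mean of outer products
    return V, W


def gpca(X, beta=2.0):
    """Hypothetical GPCA sketch: PCA of centred X with metric W^{-1}.

    Solves V u = lam W u with u^T W u = 1; returns the eigenvalues in
    decreasing order and the principal components Xc @ u.
    """
    Xc = X - X.mean(axis=0)
    V, W = local_variance(X, beta)
    s, Q = np.linalg.eigh(W)                          # W is symmetric positive definite
    W_mhalf = Q @ np.diag(1.0 / np.sqrt(s)) @ Q.T     # W^{-1/2}
    lam, Z = np.linalg.eigh(W_mhalf @ V @ W_mhalf)    # equivalent symmetric problem
    order = np.argsort(lam)[::-1]                     # largest eigenvalues first
    lam, U = lam[order], W_mhalf @ Z[:, order]        # back-transform: u = W^{-1/2} z
    return lam, Xc @ U


def kmeans(Y, k, n_iter=100):
    """Plain Lloyd k-means with farthest-point initialisation (not Hartigan-Wong)."""
    centres = [Y[np.argmax(np.linalg.norm(Y - Y.mean(axis=0), axis=1))]]
    while len(centres) < k:                           # add the point farthest from chosen centres
        d = np.min([np.linalg.norm(Y - c, axis=1) for c in centres], axis=0)
        centres.append(Y[np.argmax(d)])
    C = np.array(centres)
    for _ in range(n_iter):
        labels = np.argmin(((Y[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1), axis=1)
        new_C = np.array([Y[labels == c].mean(axis=0) if np.any(labels == c) else C[c]
                          for c in range(k)])         # keep a centre if its cluster empties
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels


# Hypothetical example: three groups in the first two coordinates, two pure-noise coordinates.
rng = np.random.default_rng(0)
centres_true = np.array([[0, 0, 0, 0], [10, 0, 0, 0], [5, 9, 0, 0]], dtype=float)
truth = np.repeat([0, 1, 2], 60)
X = centres_true[truth] + rng.standard_normal((180, 4))

lam, comps = gpca(X, beta=2.0)
labels = kmeans(comps[:, :2], k=3)                    # cluster the leading components, not X
print(np.round(lam, 2))
```

On these three well-separated synthetic groups, the first two eigenvalues should stand clearly above the others (three groups span at most two directions of between-group variation), and k-means on the first two components should recover the groups up to a relabelling. The value of beta controls how local the weighting is; the value used here is only an illustration.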

References

  • ART, D., GNANADESIKAN, R. and KETTENRING, J.R. (1982): Data-based metrics for cluster analysis. Utilitas Mathematica, 21A, 75–99.

  • BOCK, H.H. (1987): On the interface between cluster analysis, principal component analysis, and multidimensional scaling. In: H. Bozdogan and A.K. Gupta (Eds.): Multivariate Statistical Modeling and Data Analysis. D. Reidel Publishing Company, 17–34.

  • CAUSSINUS, H., FEKRI, M., HAKAM, S. and RUIZ-GAZEN, A. (2003a): A monitoring display of multivariate outliers. Computational Statistics and Data Analysis 44, 237–252.

  • CAUSSINUS, H., HAKAM, S. and RUIZ-GAZEN, A. (2003b): Projections révélatrices contrôlées: groupements et structures diverses. Revue de Statistique Appliquée 51(1), 37–58.

  • CAUSSINUS, H. and RUIZ-GAZEN, A. (1993): Projection pursuit and generalized principal component analyses. In: S. Morgenthaler, E. Ronchetti, W.A. Stahel (Eds.): New directions in statistical data analysis and robustness. Birkhäuser Verlag, Basel, Boston, Berlin, 35–46.

  • CAUSSINUS, H. and RUIZ-GAZEN, A. (1995): Metrics for finding typical structures by means of principal component analysis. In: Y. Escoufier and C. Hayashi (Eds.): Data Science and its Applications. Academic Press, Tokyo, 177–192.

  • CAUSSINUS, H. and RUIZ-GAZEN, A. (2006): Projection pursuit approach for categorical data. In: M. Greenacre and J. Blasius (Eds.): Multiple Correspondence Analysis and Related Methods. Chapman and Hall/CRC, London, 405–418.

  • CHAE, S. S. and WARDE, W.D. (2006): Effect of using principal coordinates and principal components on retrieval of clusters. Computational Statistics and Data Analysis 50, 1407–1417.

  • CHAVENT, M., LACOMBLEZ, C. and PATOUILLE, B. (2001): Critère de Rand asymétrique. Huitièmes rencontres de la Société Francophone de Classification, Pointe-à-Pitre, 82–88.

  • COOK, D., CARAGEA, D. and HONAVAR, H. (2004): Visualization in classification problems. In: J. Antoch (Ed.): Proceedings in Computational Statistics (COMPSTAT 2004), Springer, Berlin, 799–806.

  • DIDAY, E. and collaborators (1979): Optimisation en classification automatique. INRIA, Rocquencourt.

  • FORINA, M., ARMANINO, C., LANTERI, S. and TISCORNIA, E. (1983): Classification of olive oils from their fatty acid composition. In: H. Martens and H. Russwurm Jr. (Eds.): Food Research and Data Analysis. Applied Science Publishers, London, 189–214.

  • GABRIEL, K.R. (1971): The biplot: graphical display of matrices with application to principal component analysis. Biometrika 58, 453–467.

  • GABRIEL, K.R. (2002): Le biplot: outil d’exploration des données multidimensionnelles. Journal de la Société Française de Statistique 143(3–4), 5–55.

  • GLOVER, D.M. and HOPKE, P.K. (1992): Exploration of multivariate chemical data by projection pursuit. Chemometrics and Intelligent Laboratory Systems 16, 45–59.

  • HARTIGAN, J. and WONG, M.A. (1979): A k-means clustering algorithm. Applied Statistics, 28, 100–108.

  • RAND, W.M. (1971): Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association 66(336), 846–850.

  • RUIZ-GAZEN, A. (1996): A very simple robust estimator of a dispersion matrix. Computational Statistics and Data Analysis 21, 149–162.

  • STUTE, W. and ZHU, L.X. (1995): Asymptotics of k-means clustering based on projection pursuit. Sankhya 57, series A(3), 462–471.

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Caussinus, H., Ruiz-Gazen, A. (2007). Classification and Generalized Principal Component Analysis. In: Brito, P., Cucumel, G., Bertrand, P., de Carvalho, F. (eds) Selected Contributions in Data Analysis and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73560-1_50
