Classification and Generalized Principal Component Analysis

Caussinus, Henri; Ruiz-Gazen, Anne

doi:10.1007/978-3-540-73560-1_50

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

In previous papers, we proposed a generalized principal component analysis (GPCA) designed to display salient features of a multidimensional data set, in particular the existence of clusters. Using an example, this article shows how GPCA and clustering methods complement each other. The projections provided by GPCA and the sequence of eigenvalues give useful indications of the number and type of clusters to be expected; submitting GPCA principal components to a clustering algorithm instead of the raw data can improve the classification. The use of a suitable robust version of GPCA is also mentioned.
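
The workflow summarized in the abstract (compute GPCA components, inspect the eigenvalues, then cluster the components rather than the raw data) can be illustrated with a minimal sketch. The abstract does not state the GPCA formula, so the form used here is an assumption: an eigen-analysis of V T^{-1}, where V is the usual empirical covariance matrix and T is a scatter matrix of pairwise differences weighted by exp(-beta * d^2 / 2), with d the V^{-1}-distance between two observations. The value beta = 2, the function names `gpca` and `kmeans`, the basic Lloyd-style k-means used as a stand-in clustering step, and the synthetic two-cluster data are all hypothetical choices made for illustration only.

```python
import numpy as np

def gpca(X, beta=2.0, n_components=2):
    """Sketch of a generalized PCA: eigen-analysis of V T^{-1} (assumed form)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    V = Xc.T @ Xc / n                          # usual empirical covariance
    i, j = np.triu_indices(n, k=1)
    D = X[i] - X[j]                            # all pairwise differences
    # Squared V^{-1}-distance for each pair of observations.
    d2 = np.einsum("ij,ij->i", D @ np.linalg.inv(V), D)
    w = np.exp(-0.5 * beta * d2)               # close pairs get the largest weights
    T = (D * w[:, None]).T @ D / w.sum()       # weighted "local" scatter matrix
    # Solve V u = lambda T u through a Cholesky factor of T (T = L L^T).
    L = np.linalg.cholesky(T)
    L_inv = np.linalg.inv(L)
    eigvals, Y = np.linalg.eigh(L_inv @ V @ L_inv.T)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalues first
    U = L_inv.T @ Y[:, order[:n_components]]   # projection directions
    return Xc @ U, eigvals[order]

def kmeans(Z, k, n_init=10, n_iter=100, seed=0):
    """Basic Lloyd-style k-means with random restarts (stand-in clustering step)."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = Z[rng.choice(len(Z), size=k, replace=False)]
        for _ in range(n_iter):
            dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            new_centers = np.array([
                Z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                for c in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        inertia = ((Z - centers[labels]) ** 2).sum()
        if inertia < best_inertia:             # keep the restart with the lowest inertia
            best_labels, best_inertia = labels, inertia
    return best_labels

# Synthetic example (hypothetical data): two groups separated along the first variable.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 60)
X = rng.normal(size=(120, 4))
X[:, 0] += np.where(truth == 0, -5.0, 5.0)

scores, eigvals = gpca(X, beta=2.0, n_components=2)
labels = kmeans(scores, k=2)                   # cluster the GPCA components, not X
```

In this sketch, `eigvals` plays the role of the sequence of eigenvalues mentioned in the abstract, and `scores` holds the projections that would be plotted or passed to the clustering step. How many components to retain, and which value of beta to use, are left open here because the abstract does not specify them.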

Author information

Authors and Affiliations

Laboratoire de Statistique et Probabilités, U.M.R. - C.N.R.S. C5583, Université Paul Sabatier, 118, route de Narbonne, 31062, Toulouse cedex 4, France
Henri Caussinus
LSP, Université Paul Sabatier, France
Anne Ruiz-Gazen
GREMAQ, Université Toulouse 1, 21, allée de Brienne, 31000, Toulouse, France
Anne Ruiz-Gazen

Editor information

Editors and Affiliations

Faculty of Economics, University of Porto, Rua Dr. Roberto Frias, 4200-464, Porto, Portugal
Paula Brito
ESG UQAM, 315 East, Sainte-Catherine Street, Montreal, Quebec, H2X 3X2, Canada
Guy Cucumel
Department Lussi, ENST Bretagne, 2 rue de la Châtaigneraie, CS 17607, 35576, Cesson-Sévigné Cedex, France
Patrice Bertrand
Centre of Computer Science (CIn), Federal University of Pernambuco (UFPE), Av. Prof. Luiz Freire s/n Cidade Universitária, CEP 50740-540, Recife-PE, Brazil
Francisco de Carvalho

About this chapter

Cite this chapter

Caussinus, H., Ruiz-Gazen, A. (2007). Classification and Generalized Principal Component Analysis. In: Brito, P., Cucumel, G., Bertrand, P., de Carvalho, F. (eds) Selected Contributions in Data Analysis and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73560-1_50

DOI: https://doi.org/10.1007/978-3-540-73560-1_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73558-8
Online ISBN: 978-3-540-73560-1
eBook Packages: Mathematics and Statistics (R0)
