Abstract
In previous papers, we proposed a generalized principal component analysis (GPCA) designed to display salient features of a multidimensional data set, in particular the existence of clusters. Using an example, this article shows how GPCA and clustering methods complement each other. The projections provided by GPCA and the sequence of eigenvalues give useful indications of the number and type of clusters to expect; submitting the GPCA principal components, rather than the raw data, to a clustering algorithm can improve the classification. The use of a suitable robust version of GPCA is also discussed.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Caussinus, H., Ruiz-Gazen, A. (2007). Classification and Generalized Principal Component Analysis. In: Brito, P., Cucumel, G., Bertrand, P., de Carvalho, F. (eds) Selected Contributions in Data Analysis and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73560-1_50
DOI: https://doi.org/10.1007/978-3-540-73560-1_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73558-8
Online ISBN: 978-3-540-73560-1
eBook Packages: Mathematics and Statistics (R0)