Abstract
In principal component analysis (PCA), it is crucial to know how many principal components (PCs) should be retained in order to account for most of the data variability. Cross-validation (CV) methods form one class of “objective” rules for determining this number. In this work we compare three CV techniques and show how their performance depends on the structure of the covariance matrix. Finally, we propose a rule for choosing the “best” CV method and give an application to real data.
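To make the idea of a CV rule for the number of PCs concrete, the following is a minimal sketch of one generic scheme and is not claimed to be any of the three techniques compared in the article. It assumes a row-wise K-fold split in which, for each held-out row, every variable is predicted from the remaining variables through the training loadings, and the squared prediction errors are accumulated into a PRESS value per rank. The names `press_per_rank`, `n_folds` and `seed`, and the simulated data, are hypothetical and chosen only for illustration.

```python
import numpy as np


def press_per_rank(X, max_rank, n_folds=5, seed=0):
    """Mean cross-validated squared prediction error for ranks 1..max_rank.

    Assumes max_rank <= p - 1 so each left-out variable can be predicted
    from the other p - 1 variables.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    folds = np.array_split(rng.permutation(n), n_folds)
    press = np.zeros(max_rank)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        mean = X[train_mask].mean(axis=0)
        # Loadings (rows of Vt) from the SVD of the centered training rows.
        _, _, Vt = np.linalg.svd(X[train_mask] - mean, full_matrices=False)
        X_test = X[test_idx] - mean
        for k in range(1, max_rank + 1):
            V = Vt[:k].T  # p x k loading matrix
            for j in range(p):
                others = np.delete(np.arange(p), j)
                # Estimate scores without variable j, then predict variable j.
                scores, *_ = np.linalg.lstsq(
                    V[others], X_test[:, others].T, rcond=None
                )
                pred_j = V[j] @ scores
                press[k - 1] += np.sum((X_test[:, j] - pred_j) ** 2)
    return press / (n * p)


# Simulated data (an assumption for illustration): two strong latent
# components in six variables, plus small isotropic noise.
rng = np.random.default_rng(1)
W = rng.normal(size=(2, 6))
Z = rng.normal(size=(200, 2)) * np.array([3.0, 2.0])
X = Z @ W + 0.1 * rng.normal(size=(200, 6))

press = press_per_rank(X, max_rank=4)
for k, value in enumerate(press, start=1):
    print(f"rank {k}: PRESS = {value:.4f}")
```

Under a scheme of this kind, one would typically retain the rank at which PRESS stops decreasing appreciably. Predicting each variable from the others matters here: simply reconstructing held-out rows from all of their own variables yields an error that decreases with every added component and therefore cannot indicate where to stop.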
Cite this article
Diana, G., Tommasi, C. Cross-validation methods in principal component analysis: A comparison. Statistical Methods & Applications 11, 71–82 (2002). https://doi.org/10.1007/BF02511446