
Cross-validation methods in principal component analysis: A comparison

  • Statistical Methods
Statistical Methods and Applications

Abstract

In principal component analysis (PCA), it is crucial to know how many principal components (PCs) should be retained in order to account for most of the data variability. One class of “objective” rules for finding this number is the class of cross-validation (CV) methods. In this work we compare three CV techniques, showing how their performance depends on the structure of the covariance matrix. Finally, we propose a rule for choosing the “best” CV method and give an application to real data.

Diana, G., Tommasi, C. Cross-validation methods in principal component analysis: A comparison. Statistical Methods & Applications 11, 71–82 (2002). https://doi.org/10.1007/BF02511446
