Analytical and Bioanalytical Chemistry, Volume 390, Issue 5, pp 1241–1251

Cross-validation of component models: A critical look at current methods

  • R. Bro
  • K. Kjeldahl
  • A. K. Smilde
  • H. A. L. Kiers


In regression, cross-validation is an effective and popular approach that is used to decide, for example, the number of underlying features, and to estimate the average prediction error. The basic principle of cross-validation is to leave out part of the data, build a model, and then predict the left-out samples. While such an approach can also be envisioned for component models such as principal component analysis (PCA), most current implementations do not comply with the essential requirement that the predictions should be independent of the entity being predicted. Further, these methods have not been properly reviewed in the literature. In this paper, we review the most commonly used generic PCA cross-validation schemes and assess how well they work in various scenarios.
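The leave-out principle, and the way a naive PCA version breaks the independence requirement, can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The simulated rank-2 data, the leave-one-row-out splitting and the function names (`pca_loadings`, `press_naive`, `press_leave_element_out`) are all hypothetical. `press_naive` fits loadings without the left-out row but then computes that row's scores from the row itself. `press_leave_element_out` shows one way, assumed here for illustration, to restore independence: each element of the left-out row is predicted from the row's other elements only.

```python
import numpy as np


def pca_loadings(Xc, n_comp):
    """First n_comp PCA loadings (as columns) of a column-centered matrix Xc."""
    # The rows of Vt are the right singular vectors, i.e. the PCA loadings.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_comp].T  # shape (J, n_comp)


def press_naive(X, n_comp):
    """Hypothetical helper: naive row-wise leave-one-out PRESS.

    The scores of the left-out row are computed from that row itself
    (t = x P), so each prediction depends on the values it is predicting.
    """
    press = 0.0
    for i in range(X.shape[0]):
        train = np.delete(X, i, axis=0)          # leave row i out
        mean = train.mean(axis=0)
        P = pca_loadings(train - mean, n_comp)   # model built without row i
        x = X[i] - mean
        x_hat = (x @ P) @ P.T                    # project onto loadings and back
        press += float(np.sum((x - x_hat) ** 2))
    return press


def press_leave_element_out(X, n_comp):
    """Hypothetical helper: row-wise PRESS where x_j is never used to predict x_j.

    For each variable j, scores are estimated by least squares from the other
    variables only (x_{-j} ~ P_{-j} t), and x_j is then predicted as P_j t.
    """
    n, J = X.shape
    press = 0.0
    for i in range(n):
        train = np.delete(X, i, axis=0)
        mean = train.mean(axis=0)
        P = pca_loadings(train - mean, n_comp)
        x = X[i] - mean
        for j in range(J):
            keep = np.arange(J) != j
            t, *_ = np.linalg.lstsq(P[keep], x[keep], rcond=None)
            press += float((x[j] - P[j] @ t) ** 2)
    return press


# Simulated data (an assumption for illustration): rank 2 plus small noise.
rng = np.random.default_rng(0)
n, J, true_rank = 50, 10, 2
Q, _ = np.linalg.qr(rng.standard_normal((J, true_rank)))  # orthonormal loadings
T = rng.standard_normal((n, true_rank)) * np.array([3.0, 2.0])
X = T @ Q.T + 0.1 * rng.standard_normal((n, J))

ranks = list(range(1, 6))
naive = [press_naive(X, k) for k in ranks]
corrected = [press_leave_element_out(X, k) for k in ranks]
for k, a, b in zip(ranks, naive, corrected):
    print(f"k={k}  naive PRESS={a:10.3f}  leave-element-out PRESS={b:10.3f}")
```

Because the loadings for k and k + 1 components are nested, the naive PRESS cannot increase as k grows, so it cannot indicate a rank. The leave-element-out variant is expected to reach its minimum near the simulated rank of 2, although the exact values depend on the seed and the noise level chosen here.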


Keywords: Overfitting, PRESS, Cross-validation, PCA, Rank estimation



Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  • R. Bro (1; corresponding author)
  • K. Kjeldahl (1)
  • A. K. Smilde (2)
  • H. A. L. Kiers (3)

  1. Chemometrics Group, Faculty of Life Sciences, University of Copenhagen, Frederiksberg C, Denmark
  2. Biosystems Data Analysis (BDA), Swammerdam Institute for Life Sciences, Amsterdam, The Netherlands
  3. Heymans Institute (DPMG), University of Groningen, Groningen, The Netherlands
