Cross-validation of component models: A critical look at current methods

  • Review
Analytical and Bioanalytical Chemistry

Abstract

In regression, cross-validation is an effective and popular approach that is used to decide, for example, the number of underlying features, and to estimate the average prediction error. The basic principle of cross-validation is to leave out part of the data, build a model, and then predict the left-out samples. While such an approach can also be envisioned for component models such as principal component analysis (PCA), most current implementations do not comply with the essential requirement that the predictions should be independent of the entity being predicted. Further, these methods have not been properly reviewed in the literature. In this paper, we review the most commonly used generic PCA cross-validation schemes and assess how well they work in various scenarios.
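The independence requirement mentioned above can be illustrated with a small numerical sketch. It is not taken from the paper: the simulated data, the function names `naive_rowwise_press` and `elementwise_press`, and the choice of a leave-one-variable-out least-squares estimate of the scores are all assumptions made for illustration. The first function leaves out one sample at a time but obtains its scores by projecting the sample itself onto the loadings, so the prediction depends on the values being predicted. The second estimates the scores for each element from the remaining variables only, which is one possible way of removing that dependence.

```python
import numpy as np


def naive_rowwise_press(X, max_comp):
    """Leave-one-row-out PCA in which the left-out row is projected onto the
    loadings using all of its own elements, so the predictions are NOT
    independent of the data being predicted."""
    n, _ = X.shape
    press = np.zeros(max_comp)
    for i in range(n):
        train = np.delete(X, i, axis=0)
        mean = train.mean(axis=0)
        # Loadings: right singular vectors of the centered training rows.
        _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
        x = X[i] - mean
        for k in range(1, max_comp + 1):
            p = vt[:k].T                 # shape (variables, k)
            x_hat = p @ (p.T @ x)        # scores computed from x itself
            press[k - 1] += np.sum((x - x_hat) ** 2)
    return press


def elementwise_press(X, max_comp):
    """Same row split, but each element x[j] is predicted from scores that
    were estimated without using x[j] (hypothetical illustrative scheme)."""
    n, m = X.shape
    press = np.zeros(max_comp)
    for i in range(n):
        train = np.delete(X, i, axis=0)
        mean = train.mean(axis=0)
        _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
        x = X[i] - mean
        for k in range(1, max_comp + 1):
            p = vt[:k].T
            for j in range(m):
                keep = np.arange(m) != j
                # Least-squares scores from the other variables only.
                t, *_ = np.linalg.lstsq(p[keep], x[keep], rcond=None)
                press[k - 1] += (x[j] - p[j] @ t) ** 2
    return press


# Simulated data (an assumption): two underlying components plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(30, 6))

naive = naive_rowwise_press(X, 4)
elem = elementwise_press(X, 4)
print("naive PRESS:       ", np.round(naive, 3))
print("element-wise PRESS:", np.round(elem, 3))
```

Because the loadings for k components span a subspace of those for k + 1 components, the naive error can never increase as components are added, so it cannot by itself indicate the number of components. The element-wise error is free to rise again once additional components begin to fit noise.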

Author information

Corresponding author

Correspondence to R. Bro.

About this article

Cite this article

Bro, R., Kjeldahl, K., Smilde, A.K. et al. Cross-validation of component models: A critical look at current methods. Anal Bioanal Chem 390, 1241–1251 (2008). https://doi.org/10.1007/s00216-007-1790-1

  • DOI: https://doi.org/10.1007/s00216-007-1790-1
