Abstract
In regression, cross-validation is an effective and popular way to decide, for example, the number of underlying features and to estimate the average prediction error. Its basic principle is to leave out part of the data, build a model on the remainder, and then predict the left-out samples. While such an approach can also be envisioned for component models such as principal component analysis (PCA), most current implementations do not comply with the essential requirement that the predictions be independent of the entity being predicted. Further, these methods have not been properly reviewed in the literature. In this paper, we review the most commonly used generic PCA cross-validation schemes and assess how well they work in various scenarios.
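The leave-out, build, predict principle and the independence requirement can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic data, the helper names `kfold_indices`, `cv_regression_mse` and `naive_rowwise_pca_press`, and the choice of five folds. The first function cross-validates a least-squares regression, where the left-out responses play no part in the fit. The second is a hypothetical naive row-wise PCA scheme in which each left-out row is projected onto the loadings and then reconstructed, so the row contributes to its own prediction.

```python
import numpy as np

def kfold_indices(n_samples, n_folds, rng):
    """Split shuffled sample indices into n_folds roughly equal test sets."""
    return np.array_split(rng.permutation(n_samples), n_folds)

def cv_regression_mse(X, y, n_features, folds):
    """Cross-validated mean squared prediction error of a least-squares
    model that uses only the first n_features columns of X."""
    sq_err = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        # Build the model without the left-out samples.
        coef, *_ = np.linalg.lstsq(X[train, :n_features], y[train], rcond=None)
        # Predict the left-out samples; y[test] was never used in the fit.
        pred = X[test, :n_features] @ coef
        sq_err += np.sum((y[test] - pred) ** 2)
    return sq_err / len(y)

def naive_rowwise_pca_press(X, n_components, folds):
    """Hypothetical naive scheme: reconstruct each left-out row from its own
    projection onto loadings estimated from the remaining rows."""
    press = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(X.shape[0]), test)
        mean = X[train].mean(axis=0)
        # Loadings (right singular vectors) from the training rows only.
        _, _, vt = np.linalg.svd(X[train] - mean, full_matrices=False)
        P = vt[:n_components].T
        Xc = X[test] - mean
        # The scores Xc @ P are computed from the very values being predicted.
        resid = Xc - Xc @ P @ P.T
        press += np.sum(resid ** 2)
    return press

# Synthetic data (assumed): only the first two columns of X drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
y = X[:, :2] @ np.array([1.5, -2.0]) + 0.1 * rng.normal(size=40)
folds = kfold_indices(40, 5, rng)

mse = [cv_regression_mse(X, y, k, folds) for k in range(1, 7)]
press = [naive_rowwise_pca_press(X, k, folds) for k in range(0, 7)]
print("regression CV error by number of features:", np.round(mse, 3))
print("naive PCA PRESS by number of components:  ", np.round(press, 3))
```

In the naive PCA variant the residual is a projection onto a shrinking orthogonal complement, so its sum of squares cannot increase as components are added and gives no signal about when to stop. The regression error is under no such constraint, because its predictions never see the left-out responses.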
Cite this article
Bro, R., Kjeldahl, K., Smilde, A.K. et al. Cross-validation of component models: A critical look at current methods. Anal Bioanal Chem 390, 1241–1251 (2008). https://doi.org/10.1007/s00216-007-1790-1
DOI: https://doi.org/10.1007/s00216-007-1790-1