Abstract
Ten techniques for selecting useful predictors in multivariate calibration and in other cases of multivariate regression are described and compared in terms of their performance (ability to detect useless predictors, predictive power, number of retained predictors) on real and artificial data. The techniques studied include classical stepwise ordinary least-squares (SOLS), techniques based on genetic algorithms, and a family of methods based on partial least-squares (PLS) regression and on the optimization of predictive ability. A short introduction presents the evaluation strategies, the quantities used to evaluate the regression model, and the criteria used to define the complexity of PLS models. The selection techniques can be divided into conservative techniques, which try to retain all the informative, useful predictors, and parsimonious techniques, whose objective is to select a minimum but sufficient number of useful predictors. Some combined techniques, in which a conservative technique performs a preliminary selection before a parsimonious technique is applied, are also presented. Among the conservative techniques, the Westad–Martens uncertainty test (MUT) used in Unscrambler and uninformative variable elimination (UVE), developed by Massart et al., appear to be the most efficient. The classical SOLS can be improved to become the most efficient parsimonious technique by plotting the F-statistic values of the entered predictors and comparing them with parallel results obtained from a data matrix of random data. This procedure correctly indicates how many predictors can be accepted and substantially reduces the risk of overfitting. A possible alternative to SOLS is iterative predictor weighting (IPW), which automatically selects a minimum set of informative predictors. The use of an external evaluation set, with objects never used in the elimination of predictors, or of “complete validation” is suggested to avoid overestimating the predictive ability.
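The comparison of SOLS F-statistics with those from random data can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the synthetic data, the choice of standard-normal random matrices with the same shape as the real one, and the acceptance rule (keep entering predictors while the real F-to-enter exceeds the largest F seen at the same step over all random runs) are assumptions made for illustration only.

```python
import numpy as np

def forward_f_to_enter(X, y, max_steps):
    """Forward stepwise OLS (with intercept, via centering).

    Returns the indices of the entered predictors and the F-to-enter
    value recorded at each step.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    selected, f_values = [], []
    rss_old = float(yc @ yc)
    for _ in range(max_steps):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            A = Xc[:, selected + [j]]
            coef, *_ = np.linalg.lstsq(A, yc, rcond=None)
            resid = yc - A @ coef
            rss = float(resid @ resid)
            if rss < best_rss:
                best_j, best_rss = j, rss
        # Residual degrees of freedom after entry: n - (k + 1) - 1,
        # where k predictors were already in the model.
        dof = n - len(selected) - 2
        f_values.append((rss_old - best_rss) / (best_rss / dof))
        selected.append(best_j)
        rss_old = best_rss
    return selected, f_values

def count_accepted(f_real, f_reference):
    """Number of leading steps whose real F exceeds the random-data reference."""
    k = 0
    for f_r, f_z in zip(f_real, f_reference):
        if f_r <= f_z:
            break
        k += 1
    return k

# Synthetic example (assumed data): only predictors 0 and 5 are informative.
rng = np.random.default_rng(0)
n, p, max_steps = 60, 30, 6
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 5] + 0.5 * rng.normal(size=n)

selected, f_real = forward_f_to_enter(X, y, max_steps)

# Parallel runs on random matrices of the same shape, same response.
n_random_runs = 20
f_random = np.array([
    forward_f_to_enter(rng.normal(size=(n, p)), y, max_steps)[1]
    for _ in range(n_random_runs)
])
f_reference = f_random.max(axis=0)  # assumed rule: largest random F per step

n_accept = count_accepted(f_real, f_reference)
print("entered predictors:", selected)
print("accepted predictors:", selected[:n_accept])
```

Plotting `f_real` and `f_reference` against the step number gives the kind of diagnostic plot the abstract describes; a percentile of the random F values could be used in place of the maximum if a less strict reference is wanted.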
Acknowledgements
Study developed with funds from the University of Genova.
Cite this article
Forina, M., Lanteri, S., Oliveros, M.C.C. et al. Selection of useful predictors in multivariate calibration. Anal Bioanal Chem 380, 397–418 (2004). https://doi.org/10.1007/s00216-004-2768-x
DOI: https://doi.org/10.1007/s00216-004-2768-x


