
Selection of useful predictors in multivariate calibration

  • Review
  • Analytical and Bioanalytical Chemistry

Abstract

Ten techniques for selecting useful predictors in multivariate calibration and in other cases of multivariate regression are described and discussed in terms of their performance (ability to detect useless predictors, predictive power, number of retained predictors) on real and artificial data. The techniques studied include classical stepwise ordinary least squares (SOLS), techniques based on genetic algorithms, and a family of methods based on partial least-squares (PLS) regression and on the optimization of predictive ability. A short introduction presents the evaluation strategies, the quantities used to evaluate the regression model, and the criteria used to define the complexity of PLS models. The selection techniques can be divided into conservative techniques, which try to retain all the informative, useful predictors, and parsimonious techniques, whose objective is to select a minimum but sufficient number of useful predictors. Some combined techniques, in which a conservative technique performs a preliminary selection before a parsimonious technique is applied, are also presented. Among the conservative techniques, the Westad–Martens uncertainty test (MUT) used in Unscrambler and uninformative variables elimination (UVE), developed by Massart et al., seem the most efficient. The old SOLS can be improved to become the most efficient parsimonious technique by plotting the F-statistic values of the entered predictors and comparing them with parallel results obtained from a data matrix of random data. This procedure correctly indicates how many predictors can be accepted and substantially reduces the possibility of overfitting. A possible alternative to SOLS is iterative predictors weighting (IPW), which automatically selects a minimum set of informative predictors.
The use of an external evaluation set, with objects never used in the elimination of predictors, or of “complete validation” is suggested to avoid overestimating the prediction ability.
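
The SOLS comparison against random data described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `forward_f_to_enter`, the simulated data, the noise level, and the choice of five steps are all assumptions made for illustration. The sketch runs forward stepwise ordinary least squares, records the F-to-enter value of each predictor as it enters, and repeats the same run on a matrix of random predictors of the same size so the two sequences of F values can be compared.

```python
import numpy as np

def forward_f_to_enter(X, y, n_steps):
    """Forward stepwise OLS (illustrative helper, not from the article).

    At each step the predictor with the largest F-to-enter joins the model.
    Returns the entered column indices and their F-to-enter values.
    Assumes n_steps is no larger than the number of predictors.
    """
    n, p = X.shape
    design = np.ones((n, 1))  # intercept-only starting model
    rss_old = float(np.sum((y - y.mean()) ** 2))
    selected, f_values = [], []
    for _ in range(n_steps):
        best = None  # (F, column index, RSS) of the best candidate so far
        for j in range(p):
            if j in selected:
                continue
            trial = np.column_stack([design, X[:, j]])
            coef, *_ = np.linalg.lstsq(trial, y, rcond=None)
            rss_new = float(np.sum((y - trial @ coef) ** 2))
            df_resid = n - trial.shape[1]
            # Partial F: drop in residual sum of squares per unit residual variance.
            f_enter = (rss_old - rss_new) / (rss_new / df_resid)
            if best is None or f_enter > best[0]:
                best = (f_enter, j, rss_new)
        f_enter, j, rss_old = best
        selected.append(j)
        f_values.append(f_enter)
        design = np.column_stack([design, X[:, j]])
    return selected, f_values

# Simulated data (assumption): only predictors 2 and 7 carry information.
rng = np.random.default_rng(0)
n_objects, n_predictors = 60, 20
X = rng.normal(size=(n_objects, n_predictors))
y = 3.0 * X[:, 2] - 2.0 * X[:, 7] + 0.5 * rng.normal(size=n_objects)

# Parallel run: same response, predictors replaced by pure random numbers.
X_random = rng.normal(size=X.shape)

selected_real, f_real = forward_f_to_enter(X, y, n_steps=5)
selected_rand, f_rand = forward_f_to_enter(X_random, y, n_steps=5)

for step, (fr, fn) in enumerate(zip(f_real, f_rand), start=1):
    print(f"step {step}: F real = {fr:8.1f}   F random = {fn:8.1f}")
```

In this sketch, the steps at which the F value for the real data stands clearly above the F value from the random matrix suggest how many predictors can reasonably be accepted; once the two sequences become comparable, further predictors are likely to be fitting noise.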



Acknowledgements

Study developed with funds from the University of Genova.

Author information

Corresponding author

Correspondence to M. Forina.


About this article

Cite this article

Forina, M., Lanteri, S., Oliveros, M.C.C. et al. Selection of useful predictors in multivariate calibration. Anal Bioanal Chem 380, 397–418 (2004). https://doi.org/10.1007/s00216-004-2768-x


  • DOI: https://doi.org/10.1007/s00216-004-2768-x
