A variable importance criterion for variable selection in near-infrared spectral analysis

Science China Chemistry

Abstract

Variable selection is a universal problem in building multivariate calibration models, such as quantitative structure-activity relationship (QSAR) models and models relating a quantity or property to spectral data. The prediction ability of such models can be improved significantly by reducing the bias induced by uninformative variables. In this study, a new criterion, named C, is proposed to evaluate the importance of the variables in a model. The value of C is defined as the average contribution of a variable to the model and is calculated from the statistics of models built with different combinations of the variables. In the calculation, a large number of partial least squares (PLS) models are built, each using a subset of variables selected by random resampling. This yields a vector of prediction errors, expressed as the root mean squared error of cross validation (RMSECV), and a matrix of 1s and 0s indicating the selected and unselected variables in each model. When multiple linear regression (MLR) is used to model the relationship between the RMSECVs and this matrix, the coefficients of the MLR model serve as a criterion for evaluating the contribution of each variable to the RMSECV. To enhance the efficiency of the method, a multi-step shrinkage strategy is used. The method was compared with Monte Carlo uninformative variable elimination (MC-UVE), the randomization test (RT) and competitive adaptive reweighted sampling (CARS) using three near-infrared (NIR) benchmark datasets. The results show that the proposed criterion is effective for selecting informative variables from the spectra and improving the prediction ability of the models.
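The calculation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`pls1_fit`, `rmsecv`, `importance_c`), the number of models, the sampling fraction, the number of PLS components, the fold layout and the synthetic data are all assumptions made for illustration, and the multi-step shrinkage strategy is omitted because the abstract does not specify it. The sketch also assumes that a more negative MLR coefficient marks a more informative variable, since including that variable lowers the RMSECV.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with NIPALS; return (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    n_components = min(n_components, X.shape[1], X.shape[0] - 1)
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                      # weight vector: covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left to explain
            break
        w = w / norm
        t = Xr @ w                         # scores
        tt = t @ t
        p = Xr.T @ t / tt                  # X loadings
        c = yr @ t / tt                    # y loading
        Xr = Xr - np.outer(t, p)           # deflate X
        yr = yr - c * t                    # deflate y
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # regression vector in X space
    return coef, x_mean, y_mean

def rmsecv(X, y, n_components=3, n_folds=5):
    """Root mean squared error of cross validation with interleaved folds."""
    folds = np.arange(len(y)) % n_folds
    press = 0.0
    for k in range(n_folds):
        train, test = folds != k, folds == k
        coef, xm, ym = pls1_fit(X[train], y[train], n_components)
        pred = (X[test] - xm) @ coef + ym
        press += np.sum((y[test] - pred) ** 2)
    return np.sqrt(press / len(y))

def importance_c(X, y, n_models=200, frac=0.5, seed=0):
    """Regress RMSECV on a 0/1 selection matrix; return one coefficient per variable."""
    rng = np.random.default_rng(seed)
    n_vars = X.shape[1]
    S = (rng.random((n_models, n_vars)) < frac).astype(float)  # 1 = selected
    S[S.sum(axis=1) == 0, 0] = 1.0         # avoid empty subsets
    errors = np.array([rmsecv(X[:, row.astype(bool)], y) for row in S])
    design = np.column_stack([np.ones(n_models), S])           # intercept + S
    beta, *_ = np.linalg.lstsq(design, errors, rcond=None)
    return beta[1:]                        # drop the intercept

# Synthetic example (not data from the paper): only variables 0 and 1 carry signal.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(60, 10))
y_demo = 2.0 * X_demo[:, 0] - X_demo[:, 1] + 0.1 * rng.normal(size=60)
c_demo = importance_c(X_demo, y_demo)
print(np.argsort(c_demo)[:2])              # indices of the two most negative coefficients
```

In this sketch each coefficient estimates how much the RMSECV changes, on average, when the corresponding variable is included in a model, which is one reading of "average contribution" in the abstract.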

Acknowledgements

This work was supported by the National Natural Science Foundation of China (21475068, 21775076).

Author information

Corresponding author

Correspondence to Xueguang Shao.

About this article

Cite this article

Zhang, J., Cui, X., Cai, W. et al. A variable importance criterion for variable selection in near-infrared spectral analysis. Sci. China Chem. 62, 271–279 (2019). https://doi.org/10.1007/s11426-018-9368-9

  • DOI: https://doi.org/10.1007/s11426-018-9368-9
