Abstract
Variable selection is a universal problem in building multivariate calibration models, such as quantitative structure-activity relationship (QSAR) models and models relating a quantity or property to spectral data. Significant improvement in the prediction ability of the models can be achieved by reducing the bias induced by uninformative variables. In this study, a new criterion, named C, is proposed to evaluate the importance of the variables in a model. The value of C is defined as the average contribution of a variable to the model, and it is calculated from the statistics of models built with different combinations of the variables. In the calculation, a large number of partial least squares (PLS) models are built, each using a subset of variables selected by random re-sampling. This yields a vector of prediction errors, in terms of the root mean squared error of cross validation (RMSECV), and a binary matrix whose entries of 1 and 0 indicate the selected and unselected variables, respectively. When multiple linear regression (MLR) is employed to model the relationship between the RMSECVs and this matrix, the coefficients of the MLR model can be used as a criterion to evaluate the contribution of each variable to the RMSECV. To enhance the efficiency of the method, a multi-step shrinkage strategy is used. The method was compared with Monte Carlo uninformative variable elimination (MC-UVE), the randomization test (RT) and competitive adaptive reweighted sampling (CARS) using three NIR benchmark datasets. The results show that the proposed criterion is effective for selecting informative variables from spectra and improves the prediction ability of the models.
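The procedure summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes NumPy, a NIPALS PLS1 regression, 5-fold interleaved cross validation, independent random selection of each variable with probability 0.5, a sign convention in which a more negative MLR coefficient marks a more useful variable, and a simple halving schedule for the multi-step shrinkage. The function names (`pls1_fit`, `rmsecv`, `c_values`, `select_variables`) and all default parameter values are hypothetical.

```python
import numpy as np

# Illustrative sketch only. Assumptions (not taken from the paper): NIPALS PLS1,
# 5-fold interleaved CV, Bernoulli(0.5) variable sampling, halving shrinkage,
# and "more negative coefficient = more informative variable".


def pls1_fit(X, y, n_comp):
    """NIPALS PLS1. Returns (x_mean, y_mean, b) so that y_hat = (X - x_mean) @ b + y_mean."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xc.T @ yc                      # weights: covariance of each variable with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left to explain
            break
        w = w / norm
        t = Xc @ w                         # scores
        tt = t @ t
        p, qa = Xc.T @ t / tt, (yc @ t) / tt
        Xc, yc = Xc - np.outer(t, p), yc - qa * t   # deflate X and y
        W.append(w)
        P.append(p)
        q.append(qa)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, q)  # b = W (P'W)^-1 q


def rmsecv(X, y, n_comp, n_folds=5):
    """Root mean squared error of k-fold cross validation for a PLS1 model."""
    folds = np.arange(len(y)) % n_folds    # interleaved fold assignment
    sq_err = 0.0
    for k in range(n_folds):
        test = folds == k
        x_mean, y_mean, b = pls1_fit(X[~test], y[~test], n_comp)
        pred = (X[test] - x_mean) @ b + y_mean
        sq_err += np.sum((y[test] - pred) ** 2)
    return np.sqrt(sq_err / len(y))


def c_values(X, y, n_models=500, p_select=0.5, n_comp=3, rng=None):
    """MLR coefficients of the RMSECVs regressed on the 0/1 selection matrix."""
    rng = np.random.default_rng(rng)
    n_vars = X.shape[1]
    S = np.zeros((n_models, n_vars))       # row i: 1 = variable used in model i
    e = np.empty(n_models)                 # RMSECV of model i
    for i in range(n_models):
        mask = rng.random(n_vars) < p_select
        while not mask.any():              # redraw empty subsets
            mask = rng.random(n_vars) < p_select
        S[i] = mask
        e[i] = rmsecv(X[:, mask], y, min(n_comp, int(mask.sum())))
    A = np.column_stack([np.ones(n_models), S])   # intercept + selection matrix
    coef, *_ = np.linalg.lstsq(A, e, rcond=None)
    return coef[1:]  # negative: including the variable lowers RMSECV on average


def select_variables(X, y, n_keep, keep_frac=0.5, seed=0, **kwargs):
    """Multi-step shrinkage: repeatedly keep the variables with the most negative C."""
    rng = np.random.default_rng(seed)
    active = np.arange(X.shape[1])
    while len(active) > n_keep:
        c = c_values(X[:, active], y, rng=rng, **kwargs)
        n_next = max(n_keep, int(len(active) * keep_frac))
        active = np.sort(active[np.argsort(c)[:n_next]])
    return active


if __name__ == "__main__":
    # Synthetic data: only variables 3 and 17 carry information about y.
    gen = np.random.default_rng(1)
    X = gen.standard_normal((60, 40))
    y = 2.0 * X[:, 3] - 1.5 * X[:, 17] + 0.05 * gen.standard_normal(60)
    print(select_variables(X, y, n_keep=4, n_models=200))
```

Selecting each variable independently gives subsets of varying size, which keeps the intercept column of the MLR design matrix from being collinear with the 0/1 selection columns; with a fixed subset size, the coefficients would be identified only up to a common shift, although their ranking would be unchanged.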
Acknowledgements
This work was supported by the National Natural Science Foundation of China (21475068, 21775076).
About this article
Cite this article
Zhang, J., Cui, X., Cai, W. et al. A variable importance criterion for variable selection in near-infrared spectral analysis. Sci. China Chem. 62, 271–279 (2019). https://doi.org/10.1007/s11426-018-9368-9
DOI: https://doi.org/10.1007/s11426-018-9368-9