Abstract
Informative variable selection, or wavelength selection, plays an important role in the quantitative analysis of near-infrared (NIR) spectra because modern spectroscopic instruments usually have high resolution, so the resulting spectral data sets may contain thousands of variables and hundreds or thousands of samples. In this study, a new combination of Monte Carlo–uninformative variable elimination (MC-UVE) and the successive projections algorithm (SPA), denoted MC-UVE-SPA, was proposed to select the most effective variables. MC-UVE was first used to eliminate the uninformative variables in the raw spectral data. SPA was then applied to select the variables with the least collinearity. A case study was carried out using NIR spectroscopy for the non-destructive determination of soluble solids content (SSC) in ‘Ya’ pear. A total of 160 samples were divided into calibration (n = 120) and prediction (n = 40) sets. Three calibration algorithms, namely the linear methods partial least squares regression (PLS) and multiple linear regression (MLR) and the nonlinear method least-squares support vector machine (LS-SVM), were used to build models on the variables selected by SPA, UVE, MC-UVE, UVE-SPA, and MC-UVE-SPA, respectively. The results indicated that the linear models (PLS and MLR) were more effective than the nonlinear LS-SVM model in predicting the SSC of ‘Ya’ pear. Among the linear models, the different variable selection methods gave similar results, with root mean square error of prediction (RMSEP) values ranging from 0.2437 to 0.2830. However, combining MC-UVE with SPA helped to obtain a more parsimonious and efficient model for predicting SSC in ‘Ya’ pear. The MLR model built on the 22 effective variables selected by MC-UVE-SPA (MC-UVE-SPA-MLR) was the optimal model among all those developed, balancing model accuracy against model complexity. The coefficient of determination (r²), RMSEP, and residual predictive deviation (RPD) of the MC-UVE-SPA-MLR model were 0.9271, 0.2522, and 3.7037, respectively.
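The two-stage selection described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the synthetic data, the function names (`pls1_coefs`, `mc_uve_stability`, `spa`), the number of PLS components, the number of Monte Carlo runs, the 80% subsampling fraction, and the fixed cut-offs (keep 8 variables, then pick 4) are all assumptions chosen for illustration. In particular, a full SPA would choose the starting variable and the number of variables by validation error, whereas this sketch fixes both.

```python
import numpy as np

def pls1_coefs(X, y, n_comp):
    """Regression coefficients of a PLS1 model fitted with the NIPALS algorithm."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_comp))   # weight vectors
    P = np.zeros((n_vars, n_comp))   # X loadings
    q = np.zeros(n_comp)             # y loadings
    for a in range(n_comp):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w                   # score vector
        tt = t @ t
        p_a = Xk.T @ t / tt
        q[a] = yk @ t / tt
        Xk = Xk - np.outer(t, p_a)   # deflate X
        yk = yk - q[a] * t           # deflate y
        W[:, a], P[:, a] = w, p_a
    return W @ np.linalg.solve(P.T @ W, q)

def mc_uve_stability(X, y, n_comp=2, n_runs=100, frac=0.8, rng=None):
    """Stability (mean / std) of each PLS coefficient over random sample subsets."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    m = int(frac * n)
    coefs = np.empty((n_runs, X.shape[1]))
    for r in range(n_runs):
        idx = rng.choice(n, size=m, replace=False)
        coefs[r] = pls1_coefs(X[idx], y[idx], n_comp)
    return coefs.mean(axis=0) / coefs.std(axis=0)

def spa(X, k, start=0):
    """Simplified SPA: pick k columns, each with the largest norm after
    projecting out the previously selected column (least collinearity)."""
    Xp = X - X.mean(axis=0)
    selected = [start]
    for _ in range(k - 1):
        v = Xp[:, selected[-1]]
        Xp = Xp - np.outer(v, v @ Xp) / (v @ v)  # orthogonal projection
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0                   # never re-select a column
        selected.append(int(np.argmax(norms)))
    return selected

# Synthetic stand-in for spectra (assumption): only variables 3 and 7 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.05 * rng.normal(size=200)

# Stage 1 (MC-UVE): keep the variables with the largest absolute stability.
stability = mc_uve_stability(X, y, rng=rng)
retained = np.argsort(-np.abs(stability))[:8]

# Stage 2 (SPA): among the retained variables, pick a low-collinearity subset,
# starting from the most stable variable (position 0 in `retained`).
selected = retained[spa(X[:, retained], k=4, start=0)]
print("retained by MC-UVE:", sorted(retained.tolist()))
print("selected by SPA:   ", selected.tolist())
```

In this sketch the first stage ranks variables by how consistently their PLS coefficients behave across random subsets of the samples, and the second stage works only on the survivors, which mirrors the order of operations given in the abstract. The selected columns would then be passed to a calibration model such as MLR.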
Acknowledgments
The authors gratefully acknowledge the financial support provided by the Young Scientist Fund of the National Natural Science Foundation of China (Project No. 31301236), the China Postdoctoral Science Foundation (Project No. 2012M520193), and the Postdoctoral Science Foundation of Beijing, China (Project No. 2013ZZ-70).
Conflict of Interest
Jiangbo Li, Wenqian Huang, Liping Chen, Shuxiang Fan, Baohua Zhang, Zhiming Guo, and Chunjiang Zhao declare that they have no financial relationship with the organization that sponsored the research and have no conflict of interest. This article does not contain any studies with human or animal subjects.
Cite this article
Li, J., Huang, W., Chen, L. et al. Variable Selection in Visible and Near-Infrared Spectral Analysis for Noninvasive Determination of Soluble Solids Content of ‘Ya’ Pear. Food Anal. Methods 7, 1891–1902 (2014). https://doi.org/10.1007/s12161-014-9832-8
DOI: https://doi.org/10.1007/s12161-014-9832-8