Variable Selection in Visible and Near-Infrared Spectral Analysis for Noninvasive Determination of Soluble Solids Content of ‘Ya’ Pear

Food Analytical Methods

Abstract

Informative variable selection (wavelength selection) plays an important role in the quantitative analysis of near-infrared (NIR) spectra because modern spectroscopic instruments usually have a high resolution, so the resulting spectral data sets may contain thousands of variables and hundreds or thousands of samples. In this study, a new combination of Monte Carlo–uninformative variable elimination (MC-UVE) and the successive projections algorithm (SPA), denoted MC-UVE-SPA, was proposed to select the most effective variables. MC-UVE was first used to eliminate the uninformative variables in the raw spectral data. Then, SPA was applied to select the variables with the least collinearity. A case study was carried out on the non-destructive determination of soluble solids content (SSC) in ‘Ya’ pear by NIR spectroscopy. A total of 160 samples were divided into calibration (n = 120) and prediction (n = 40) sets. Three calibration algorithms, namely the linear methods partial least squares (PLS) regression and multiple linear regression (MLR) and the nonlinear method least-squares support vector machine (LS-SVM), were used to build models on the variables selected by SPA, UVE, MC-UVE, UVE-SPA, and MC-UVE-SPA, respectively. The results indicated that the linear models (PLS and MLR) were more effective than the nonlinear LS-SVM model in predicting the SSC of ‘Ya’ pear. Among the linear models, the different variable selection methods gave similar results, with root mean square error of prediction (RMSEP) values ranging from 0.2437 to 0.2830. However, combining MC-UVE with SPA helped to obtain a more parsimonious and efficient model for predicting the SSC of ‘Ya’ pear. The 22 effective variables selected by MC-UVE-SPA yielded the optimal linear model, MC-UVE-SPA-MLR, which offered the best balance between model accuracy and model complexity among all the models developed.
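
As a rough illustration of the two-stage selection described above, the following Python sketch (NumPy only) ranks variables by an MC-UVE-style stability score and then runs a single SPA chain on the retained variables. It is a simplified sketch under stated assumptions, not the authors' implementation: the synthetic data, the function names, the number of PLS components, the number of Monte Carlo runs, the sampling ratio, the top-10 cutoff, and the fixed chain length of 4 are all hypothetical choices. A full SPA would also compare chains from every starting variable by validation error.

```python
import numpy as np


def pls1_coefficients(X, y, n_components):
    """Regression vector of a PLS1 model (NIPALS) fitted on mean-centred data."""
    Xr = X - X.mean(axis=0)
    yr = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)      # weight vector
        t = Xr @ w                     # scores
        tt = t @ t
        p = Xr.T @ t / tt              # X loadings
        c = (yr @ t) / tt              # y loading
        Xr = Xr - np.outer(t, p)       # deflate X
        yr = yr - c * t                # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)


def mc_uve_stability(X, y, n_components=3, n_runs=200, ratio=0.8, seed=0):
    """Stability = mean / std of PLS coefficients over random calibration subsets."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    coefs = np.empty((n_runs, X.shape[1]))
    for i in range(n_runs):
        idx = rng.choice(n, size=int(ratio * n), replace=False)
        coefs[i] = pls1_coefficients(X[idx], y[idx], n_components)
    return coefs.mean(axis=0) / coefs.std(axis=0)


def spa_chain(X, start, n_select):
    """One SPA chain: repeatedly project out the last selected column and
    pick the remaining column with the largest norm (least collinear)."""
    Xp = X - X.mean(axis=0)
    selected = [start]
    for _ in range(n_select - 1):
        v = Xp[:, selected[-1]].copy()
        Xp = Xp - np.outer(v, v @ Xp) / (v @ v)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0         # never re-select a chosen column
        selected.append(int(np.argmax(norms)))
    return selected


# Hypothetical synthetic "spectra": 120 samples, 30 variables, 2 informative.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 30))
y = 2.0 * X[:, 3] - 1.0 * X[:, 7] + 0.1 * rng.normal(size=120)

# Stage 1 (MC-UVE): keep the 10 variables with the largest |stability|.
stability = mc_uve_stability(X, y)
retained = np.argsort(-np.abs(stability))[:10]

# Stage 2 (SPA): choose 4 weakly collinear variables among the retained ones,
# starting from the most stable variable (an assumed starting rule).
chain = spa_chain(X[:, retained], start=0, n_select=4)
selected = retained[chain]

# MLR on the selected variables (ordinary least squares with intercept).
A = np.column_stack([np.ones(len(y)), X[:, selected]])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
rmsec = float(np.sqrt(np.mean((y - A @ beta) ** 2)))
print("selected variables:", selected.tolist(), "calibration RMSE:", rmsec)
```

Running the elimination step first means SPA only has to search among variables that already carry stable regression information, which is the motivation the abstract gives for combining the two methods.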
The coefficient of determination (r²), root mean square error of prediction, and residual predictive deviation (RPD) of the MC-UVE-SPA-MLR model were 0.9271, 0.2522, and 3.7037, respectively.
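
The three figures of merit quoted above can be computed from reference and predicted values as in the minimal Python sketch below. The definitions are assumptions, since the abstract does not spell them out: r² is taken as 1 − SSE/SST, RMSEP as the root of the mean squared prediction error, and RPD as the standard deviation of the reference values divided by RMSEP, with the same divisor n used throughout. Under these assumptions RPD equals 1/√(1 − r²), and the reported values satisfy that relation to four decimal places. The function name and the toy numbers are hypothetical.

```python
import math


def prediction_metrics(y_true, y_pred):
    """Return (r2, rmsep, rpd) under the assumed definitions in the lead-in."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # error sum of squares
    sst = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    r2 = 1.0 - sse / sst
    rmsep = math.sqrt(sse / n)
    sd = math.sqrt(sst / n)        # population SD of the reference values
    rpd = sd / rmsep
    return r2, rmsep, rpd


# Toy reference and predicted values (hypothetical, not data from the study).
r2_toy, rmsep_toy, rpd_toy = prediction_metrics(
    [10.0, 11.0, 12.0, 13.0], [10.1, 10.9, 12.2, 12.8]
)

# Consistency check on the values reported in the abstract.
r2_reported, rpd_reported = 0.9271, 3.7037
rpd_from_r2 = 1.0 / math.sqrt(1.0 - r2_reported)
print(round(rpd_from_r2, 4), rpd_reported)
```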

Acknowledgments

The authors gratefully acknowledge the financial support provided by the Young Scientist Fund of the National Natural Science Foundation of China (Project No. 31301236), the China Postdoctoral Science Foundation (Project No. 2012M520193), and the Postdoctoral Science Foundation of Beijing of China (Project No. 2013ZZ-70).

Conflict of Interest

Jiangbo Li, Wenqian Huang, Liping Chen, Shuxiang Fan, Baohua Zhang, Zhiming Guo, and Chunjiang Zhao declare that they have no financial relationship with the organization that sponsored the research and have no conflict of interest. This article does not contain any studies with human or animal subjects.

Author information

Corresponding authors

Correspondence to Jiangbo Li or Chunjiang Zhao.

About this article

Cite this article

Li, J., Huang, W., Chen, L. et al. Variable Selection in Visible and Near-Infrared Spectral Analysis for Noninvasive Determination of Soluble Solids Content of ‘Ya’ Pear. Food Anal. Methods 7, 1891–1902 (2014). https://doi.org/10.1007/s12161-014-9832-8

  • DOI: https://doi.org/10.1007/s12161-014-9832-8
