Soil Organic Carbon Prediction Using Vis-NIR Spectroscopy with a Large Dataset

  • Yang Shi
  • Rujing Wang
  • Yubing Wang
Conference paper
Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 545)


Estimating soil properties with visible and near-infrared (Vis-NIR) reflectance spectroscopy is an alternative to traditional laboratory analysis. The calibration model is a major factor in predictive performance. In this study, a large-scale soil database of 19,036 samples was compared with its subsets to assess the effect of sample size on predictive performance. Four linear regression techniques, namely multiple linear regression (MLR), principal components regression (PCR), partial least squares regression (PLSR), and stepwise regression (SR), were compared to identify suitable models for predicting the organic carbon content of soil samples. The effects of using derivatives or raw spectra as predictor variables, and of the spectral sampling interval, were also studied. The best predictions were obtained using SR and MLR on raw spectra, yielding root mean square error of cross-validation (RMSECV) values of 25.3912 and 25.4254 and coefficient of determination (R²) values of 0.9227 and 0.9225, respectively, indicating excellent models.
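The comparison described above can be illustrated with a minimal sketch that scores two of the four methods (MLR and PCR) by k-fold cross-validation and reports RMSECV and R². This is not the authors' implementation: the spectra are synthetic, and the helper names (`fit_mlr`, `fit_pcr`, `cross_validate`), the number of folds, and the number of principal components are all assumptions made for illustration. PLSR and SR are omitted to keep the example short.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def fit_pcr(X, y, n_components):
    """Regress y on the leading principal component scores of centered X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    loadings = vt[:n_components].T
    inner = fit_mlr((X - mean) @ loadings, y)
    return lambda X_new: inner((X_new - mean) @ loadings)

def cross_validate(fit, X, y, k=5, seed=0):
    """Return (RMSECV, R2) computed from out-of-fold predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    pred = np.empty(len(y), dtype=float)
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        pred[test] = fit(X[train], y[train])(X[test])
    residual = y - pred
    rmsecv = float(np.sqrt(np.mean(residual ** 2)))
    r2 = float(1.0 - np.sum(residual ** 2) / np.sum((y - y.mean()) ** 2))
    return rmsecv, r2

# Synthetic stand-in for spectra: 200 samples, 50 bands, 5 latent factors.
# These values are hypothetical and unrelated to the paper's database.
rng = np.random.default_rng(42)
scores = rng.normal(size=(200, 5))
loadings = rng.normal(size=(5, 50))
X = scores @ loadings + 0.01 * rng.normal(size=(200, 50))
y = scores @ np.array([1.0, -0.5, 0.8, 0.3, -1.2]) + 0.1 * rng.normal(size=200)

results = {
    "MLR": cross_validate(fit_mlr, X, y),
    "PCR": cross_validate(lambda Xt, yt: fit_pcr(Xt, yt, n_components=5), X, y),
}
for name, (rmsecv, r2) in results.items():
    print(f"{name}: RMSECV={rmsecv:.4f}, R2={r2:.4f}")
```

Both metrics are computed from out-of-fold predictions only, so a model that overfits the calibration samples is penalized. The effect of sample size could be explored in the same way by calling `cross_validate` on random subsets of the rows.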


Keywords: Soil organic carbon; MLR; PCR; PLSR; Stepwise regression



This work was supported by the National Science Foundation of China under Grant No. 31671586.



Copyright information

© IFIP International Federation for Information Processing 2019

Authors and Affiliations

  1. Institute of Intelligent Machines, Chinese Academy of Science, Hefei, China
  2. Department of Automation, University of Science and Technology of China, Hefei, China
