Accuracy and uncertainty of geostatistical models versus machine learning for digital mapping of soil calcium and potassium

Environmental Monitoring and Assessment

Abstract

Accuracy and uncertainty of models used for digital soil mapping are important for assessing the confidence of predictions and for reliable land use planning and management. In this study, two approaches, geostatistical (spatial) models and machine learning (ML) models, were evaluated for predictive mapping of soil calcium (Ca) and potassium (K). Two spatial models, empirical Bayesian kriging (EBK) and sequential Gaussian simulation (SGS), were compared with three ML models, Cubist, random forest (RF) and support vector machine (SVM), in terms of their accuracy and uncertainty for mapping soil Ca and K. The study area is in Nowley, New South Wales, Australia; it covers 2083 ha and includes a variety of soil types and farming systems. Data from 240 soil samples were used for model training, and data from 102 independent samples were used for validation. Accuracy was assessed with R², root mean square error (RMSE), concordance and bias, and uncertainty was assessed with confidence limits. In addition, to compare the outcomes for the two soil properties, which have different measurement units, mean absolute percentage error (MAPE) and relative uncertainty (RU) were evaluated as accuracy and uncertainty measures, respectively. Results showed that for the K map, SGS had the highest R² (0.74) and lowest RMSE (1.96), followed by EBK with R² = 0.72 and RMSE = 2.02. For the Ca map, the EBK model showed the highest accuracy (R² = 0.46; RMSE = 3.21), followed by SVM and SGS with comparable accuracies. Comparing the two soil properties, the Ca map showed higher MAPE and RU than the K map. The lowest MAPE values were obtained for the EBK model (for K = 39) and the SGS model (for K = 44). The lowest RU values were also found for the EBK and SGS models. Among the ML models, SVM showed lower sensitivity to higher variance in the input data. In general, the spatial models outperformed the ML models with regard to both accuracy and uncertainty. An additional conclusion is that, considering the data variance in the two soil properties, the geostatistical models, with lower RU and MAPE, were relatively less susceptible to data variance than the ML models.
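The accuracy measures named in the abstract (R², RMSE, concordance, bias and MAPE) can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas used in the study, so the definitions below are assumptions: R² is taken as the squared Pearson correlation between observed and predicted values, concordance as Lin's concordance correlation coefficient, bias as the mean of predicted minus observed, and MAPE as a percentage. The function name `accuracy_metrics` and the sample values are hypothetical and are not data from the study.

```python
import math


def accuracy_metrics(observed, predicted):
    """Validation metrics for paired observed/predicted values.

    Assumed definitions (not confirmed by the source):
    R2 = squared Pearson correlation, concordance = Lin's CCC,
    bias = mean(predicted - observed), MAPE expressed in percent.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n

    # Population (1/n) variances and covariance, as in Lin's CCC
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    if var_o == 0 or var_p == 0:
        raise ValueError("R2 and concordance are undefined for constant input")

    errors = [p - o for o, p in zip(observed, predicted)]

    return {
        "r2": cov ** 2 / (var_o * var_p),
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "bias": sum(errors) / n,
        "concordance": 2 * cov / (var_o + var_p + (mean_p - mean_o) ** 2),
        "mape": 100 * sum(abs(e / o) for e, o in zip(errors, observed)) / n,
    }


if __name__ == "__main__":
    # Made-up values for illustration only (not from the study)
    obs = [10.0, 20.0, 30.0, 40.0]
    pred = [12.0, 18.0, 33.0, 39.0]
    for name, value in accuracy_metrics(obs, pred).items():
        print(f"{name}: {value:.3f}")
```

Because MAPE divides each error by the observed value, it is unit-free, which is why a measure of this kind allows the Ca and K maps to be compared even though the two properties are measured in different units.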

Data availability statement

The datasets generated during and/or analyzed during the current study are not publicly available due to third party restrictions, but are available from the author on reasonable request.

Acknowledgements

I would like to acknowledge and thank Prof. Budiman Minasny from the University of Sydney for providing the data used in this study.

Funding

The author declares that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Corresponding author

Correspondence to Amin Sharififar.

Ethics declarations

Competing interests

The author has no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sharififar, A. Accuracy and uncertainty of geostatistical models versus machine learning for digital mapping of soil calcium and potassium. Environ Monit Assess 194, 760 (2022). https://doi.org/10.1007/s10661-022-10434-9

  • DOI: https://doi.org/10.1007/s10661-022-10434-9

Keywords