Abstract
The accuracy and uncertainty of models used for digital soil mapping are important for assessing confidence in predictions and for reliable land use planning and management. In this study, geostatistical (spatial) and machine learning (ML) approaches were evaluated for predictive mapping of soil calcium (Ca) and potassium (K). Two spatial models, empirical Bayesian kriging (EBK) and sequential Gaussian simulation (SGS), were compared with three ML models, Cubist, random forest (RF) and support vector machine (SVM), in terms of their accuracy and uncertainty. The study area is in Nowley, New South Wales, Australia; it covers 2083 ha and includes a variety of soil types and farming systems. The models were trained on data from 240 soil samples and validated on data from 102 independent samples. Accuracy was assessed using R², root mean square error (RMSE), concordance and bias, and uncertainty was assessed using confidence limits. In addition, to compare outcomes for the two soil properties, which have different measurement units, mean absolute percentage error (MAPE) and relative uncertainty (RU) were evaluated as measures of accuracy and uncertainty, respectively. For the K map, SGS had the highest R² (0.74) and the lowest RMSE (1.96), followed by EBK with R² = 0.72 and RMSE = 2.02. For the Ca map, EBK showed the highest accuracy (R² = 0.46; RMSE = 3.21), followed by SVM and SGS with comparable accuracies. Of the two soil properties, the Ca map showed higher MAPE and RU than the K map. The lowest MAPE was obtained for the EBK model (for K = 39) and the SGS model (for K = 44). The lowest RU values were likewise found for the EBK and SGS models. Among the ML models, SVM showed lower sensitivity to higher variance in the input data. In general, the spatial models outperformed the ML models with regard to both accuracy and uncertainty.
A further conclusion is that, given the difference in data variance between the two soil properties, the geostatistical models, with their lower RU and MAPE, were less susceptible to data variance than the ML models.
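The abstract names the accuracy measures but does not give their formulas. As a minimal sketch, assuming paired observed and predicted values from a validation set, they could be computed along the following lines. This is not the study's own code: the function name `accuracy_metrics` is hypothetical, R² is taken here as the squared Pearson correlation (one common convention, and an assumption), and concordance follows the usual form of Lin's concordance correlation coefficient. RU is left out because its definition is not stated in the abstract.

```python
import math


def accuracy_metrics(observed, predicted):
    """Return R2, RMSE, bias, Lin's concordance and MAPE for paired values.

    Hypothetical helper for illustration only. Assumes no observed value
    is zero (MAPE divides by it) and that neither series is constant.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n

    # Population (divide-by-n) variances and covariance.
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n

    # Error is defined as predicted minus observed, so positive bias
    # means over-prediction.
    errors = [p - o for o, p in zip(observed, predicted)]

    return {
        # Assumption: R2 as squared Pearson correlation.
        "r2": cov ** 2 / (var_o * var_p),
        "rmse": math.sqrt(sum(e ** 2 for e in errors) / n),
        "bias": sum(errors) / n,
        # Lin's concordance penalises both scatter and systematic shift.
        "concordance": 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2),
        # MAPE is unit-free, which allows comparison across Ca and K.
        "mape": 100 * sum(abs(e / o) for e, o in zip(errors, observed)) / n,
    }


if __name__ == "__main__":
    # Made-up numbers, not data from the study.
    print(accuracy_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))
```

In this toy case every prediction is shifted by one unit, so R² stays at 1 while concordance falls below 1. That difference is one reason to report concordance and bias alongside R².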





Data availability statement
The datasets generated during and/or analyzed during the current study are not publicly available due to third party restrictions, but are available from the author on reasonable request.
Acknowledgements
I would like to acknowledge and thank Prof. Budiman Minasny from the University of Sydney for providing the data used in this study.
Funding
The author declares that no funds, grants, or other support were received during the preparation of this manuscript.
Ethics declarations
Competing interests
The author has no relevant financial or non-financial interests to disclose.
About this article
Cite this article
Sharififar, A. Accuracy and uncertainty of geostatistical models versus machine learning for digital mapping of soil calcium and potassium. Environ Monit Assess 194, 760 (2022). https://doi.org/10.1007/s10661-022-10434-9


