Digital soil mapping: a predictive performance assessment of spatial linear regression, Bayesian and ML-based models

  • Original Article
  • Modeling Earth Systems and Environment

Abstract

Information on the spatial distribution of soil properties is a key input to environmental research and to agricultural planning and decision-making, for example for monitoring soil conditions and designing agricultural policies. Fitting models to spatial data is straightforward, but obtaining reliable predictions from them can be difficult because of the characteristics of the data. Using simulated data and data from the WoSIS-ISRIC SoilGrids 250 m product, we compared the predictive performance of five models: Spatial Linear Regression fitted by restricted maximum likelihood (SLR-REML); two machine learning (ML)-based models, Random Forest (RF) and Random Forest Residual Kriging (RFRK); and two Bayesian models, Integrated Nested Laplace Approximation with Stochastic Partial Differential Equations (INLA-SPDE) and spBayes. Accounting for data characteristics such as spatial autocorrelation, the range parameter, and the strength and type of relationship between the response variable and the covariates, we cross-validated the models and assessed their precision, unbiasedness, and uncertainty using the root mean squared error (RMSE), the coefficient of determination (\(R^{2}\)), Lin's concordance correlation coefficient (\(\rho_{c}\)), and the prediction interval coverage probability (PICP). The results showed that SLR-REML was highly precise, with small bias, when spatial autocorrelation was low. The ML models (RF and RFRK) stood out for their ability to account for nonlinearities, and RFRK in particular was flexible enough to handle high spatial autocorrelation. The INLA-SPDE model was robust to all data characteristics. Despite its longer computation time, SLR-REML, by casting the regression as a linear mixed model fitted with REML (REML-LMM), relaxed the minimum number of observations that classical regression requires to make better predictions in Digital Soil Mapping (DSM). In addition to commonly used ML techniques, INLA-SPDE and SLR could be suitable for understanding, characterizing, and mapping soil properties and environmental variables through spatiotemporal modeling.
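The four validation criteria named above can be computed from paired observed and predicted values at the validation locations. The following is a minimal plain-Python sketch, not the authors' code: it assumes \(R^{2}\) is taken as 1 - SSE/SST (one common convention; some DSM studies instead report the squared Pearson correlation), that Lin's \(\rho_{c}\) uses population moments, and that PICP receives a lower and upper bound for each validation point, for example from a 90% prediction interval. The function names are hypothetical.

```python
import math
from statistics import fmean


def rmse(obs, pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(fmean((o - p) ** 2 for o, p in zip(obs, pred)))


def r_squared(obs, pred):
    """Coefficient of determination as 1 - SSE/SST (assumed convention)."""
    mean_obs = fmean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def lin_ccc(obs, pred):
    """Lin's concordance correlation coefficient (population moments).

    Unlike Pearson's r, it penalises any systematic shift between
    observations and predictions through the (mean difference)^2 term.
    """
    n = len(obs)
    mo, mp = fmean(obs), fmean(pred)
    var_o = sum((o - mo) ** 2 for o in obs) / n
    var_p = sum((p - mp) ** 2 for p in pred) / n
    cov_op = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / n
    return 2.0 * cov_op / (var_o + var_p + (mo - mp) ** 2)


def picp(obs, lower, upper):
    """Share of observations lying inside their prediction interval."""
    inside = sum(lo <= o <= hi for o, lo, hi in zip(obs, lower, upper))
    return inside / len(obs)


obs = [1.0, 2.0, 3.0, 4.0]
shifted = [o + 1.0 for o in obs]  # perfectly correlated but biased predictions
print(rmse(obs, shifted), lin_ccc(obs, shifted))
```

The shifted predictions in the usage lines have a Pearson correlation of 1 with the observations, yet \(\rho_{c}\) drops to 2.5/3.5 because of the constant bias, which is why concordance is a useful complement to \(R^{2}\) when unbiasedness matters.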

Figs. 1–7

Availability of data and materials

The simulated data used in this study are available from the corresponding author on request. The data used in the application come from the dataset websites cited in the References.

References

  • Amare T, Hergarten C, Hurni H, Wolfgramm B, Yitaferu B, Selassie YG (2013) Prediction of soil organic carbon for Ethiopian highlands using soil spectroscopy. International Scholarly Research Notices 2013

  • Arshad M, Li N, Bella LD, Triantafilis J (2020) Field-scale digital soil mapping of clay: Combining different proximal sensed data and comparing various statistical models. Soil Sci Soc Am J 84(2):314–330

  • Bahri H, Raclot D, Barbouchi M, Lagacherie P, Annabi M (2022) Mapping soil organic carbon stocks in Tunisian topsoils. Geoderma Reg 30:e00561

  • Beguin J, Fuglstad GA, Mansuy N, Paré D (2017) Predicting soil properties in the Canadian boreal forest with limited data: Comparison of spatial and non-spatial statistical approaches. Geoderma 306:195–205. https://doi.org/10.1016/j.geoderma.2017.06.016

  • Berger JO, De Oliveira V, Sansó B (2001) Objective Bayesian Analysis of Spatially Correlated Data. J Am Stat Assoc 96(456):1361–1374. https://doi.org/10.1198/016214501753382282

  • Bivand R, Gómez-Rubio V, Rue H (2015) Spatial data analysis with R-INLA with some extensions. American Statistical Association

  • Blangiardo M, Cameletti M, Baio G, Rue H (2013) Spatial and spatio-temporal models with R-INLA. Spatial Spatio-temp Epidemiol 4:33–49

  • Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140. https://doi.org/10.1007/BF00058655

  • Breiman L (2001) Random forests. Mach Learn 45(1):5–32

  • Busetto L, Ranghetti L (2016) MODIStsp: An R package for automatic preprocessing of MODIS land products time series. Computers & Geosciences 97:40–48

  • Cameletti M, Lindgren F, Simpson D, Rue H (2013) Spatio-temporal modeling of particulate matter concentration through the SPDE approach. AStA Advances in Statistical Analysis 97(2):109–131

  • Chen L, Ren C, Li L, Wang Y, Zhang B, Wang Z, Li L (2019) A comparative assessment of geostatistical, machine learning, and hybrid approaches for mapping topsoil organic carbon content. ISPRS Int J Geo Inf 8(4):174

  • Cosandey-Godin A, Krainski ET, Worm B, Flemming JM (2015) Applying Bayesian spatiotemporal models to fisheries bycatch in the Canadian Arctic. Can J Fish Aquat Sci 72(2):186–197

  • Cressie N (1993) Statistics for spatial data. Wiley, Amsterdam

  • Cressie N (2015) Statistics for spatial data. Wiley, Amsterdam

  • Doetterl S, Stevens A, Van Oost K, Quine TA, Van Wesemael B (2013) Spatially-explicit regional-scale prediction of soil organic carbon stocks in cropland using environmental variables and mixed model approaches. Geoderma 204:31–42

  • Eldeiry AA, Garcia LA (2010) Comparison of ordinary kriging, regression kriging, and cokriging techniques to estimate soil salinity using Landsat images. J Irrig Drain Eng 136(6):355–364

  • Fayad I, Baghdadi N, Bailly JS, Barbier N, Gond V, Hérault B, El Hajj M, Fabre F, Perrin J (2016) Regional Scale Rain-Forest Height Mapping Using Regression-Kriging of Spaceborne and Airborne LiDAR Data: Application on French Guiana. Remote Sens 8(3):240. https://doi.org/10.3390/rs8030240

  • Fick SE, Hijmans RJ (2017) WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. Int J Climatol 37(12):4302–4315

  • Finley AO, Banerjee S (2020) Bayesian spatially varying coefficient models in the spBayes R package. Environ Model Softw 125:104608. https://doi.org/10.1016/j.envsoft.2019.104608

  • Finley AO, Banerjee S, Gelfand AE (2013) spBayes for large univariate and multivariate point-referenced spatio-temporal data models. arXiv:1310.8192 [stat]

  • Folly CL, Konstantinoudis G, Mazzei-Abba A, Kreis C, Bucher B, Furrer R, Spycher BD (2021) Bayesian spatial modelling of terrestrial radiation in Switzerland. J Environ Radioact 233:106571. https://doi.org/10.1016/j.jenvrad.2021.106571

  • Fox EW, Hoef JMV, Olsen AR (2020) Comparing spatial regression to random forests for large environmental data sets. PLoS ONE 15(3):e0229509. https://doi.org/10.1371/journal.pone.0229509

  • Fuglstad GA, Simpson D, Lindgren F, Rue H (2019) Constructing priors that penalize the complexity of Gaussian random fields. J Am Stat Assoc 114(525):445–452

  • Gilks WR, Richardson S, Spiegelhalter D (1995) Markov chain Monte Carlo in practice. CRC Press, New York

  • Guo PT, Li MF, Luo W, Tang QF, Liu ZW, Lin ZM (2015) Digital mapping of soil organic matter for rubber plantation at regional scale: An application of random forest plus residuals kriging approach. Geoderma 237:49–59

  • Hanks EM, Schliep EM, Hooten MB, Hoeting JA (2015) Restricted spatial regression in practice: geostatistical models, confounding, and robustness under model misspecification. Environmetrics 26(4):243–254. https://doi.org/10.1002/env.2331

  • Harville DA (1977) Maximum likelihood approaches to variance component estimation and to related problems. J Am Stat Assoc 72(358):320–338

  • Hengl T, Heuvelink GB, Kempen B, Leenaars JG, Walsh MG, Shepherd KD, Sila A, MacMillan RA, Mendes de Jesus J, Tamene L et al (2015) Mapping soil properties of Africa at 250 m resolution: Random forests significantly improve current predictions. PLoS ONE 10(6):e0125814

  • Hengl T, Mendes de Jesus J, Heuvelink GB, Ruiperez Gonzalez M, Kilibarda M, Blagotić A, Shangguan W, Wright MN, Geng X, Bauer-Marschallinger B et al (2017) SoilGrids250m: Global gridded soil information based on machine learning. PLoS ONE 12(2):e0169748

  • Huang J, Malone BP, Minasny B, McBratney AB, Triantafilis J (2017) Evaluating a Bayesian modelling approach (INLA-SPDE) for environmental mapping. Sci Total Environ 609:621–632

  • James G, Witten D, Hastie T, Tibshirani R (2013) An introduction to statistical learning, vol 112. Springer, New York

  • Kaya F, Keshavarzi A, Francaviglia R, Kaplan G, Başayiğit L, Dedeoğlu M (2022) Assessing machine learning-based prediction under different agricultural practices for digital mapping of soil organic carbon and available phosphorus. Agriculture 12(7):1062

  • Keskin H, Grunwald S (2018) Regression kriging as a workhorse in the digital soil mapper’s toolbox. Geoderma 326:22–41. https://doi.org/10.1016/j.geoderma.2018.04.004

  • Khan K, Calder CA (2022) Restricted Spatial Regression Methods: Implications for Inference. J Am Stat Assoc 117(537):482–494. https://doi.org/10.1080/01621459.2020.1788949

  • Krainski E, Lindgren F, Simpson D, Rue H (2016) The R-INLA tutorial on SPDE models. Journal of Geographical Systems. http://www.math.ntnu.no/inla/r-inla.org/tutorials/spde/spde-tutorial.pdf

  • Li N, Zare E, Huang J, Triantafilis J (2018) Mapping soil cation-exchange capacity using Bayesian modeling and proximal sensors at the field scale. Soil Sci Soc Am J 82(5):1203–1216

  • Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22

  • Lin L (1989) A concordance correlation coefficient to evaluate reproducibility. Biometrics 45(1):255–268

  • Lindgren F, Rue H (2015) Bayesian spatial modelling with R-INLA. J Stat Softw 63:1–25

  • Lindgren F, Rue H, Lindström J (2011) An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73(4):423–498

  • Lombardo L, Opitz T, Ardizzone F, Guzzetti F, Huser R (2020) Space-time landslide predictive modelling. Earth-Sci Rev 209:103318

  • Makungwe M, Chabala LM, Chishala BH, Lark RM (2021) Performance of linear mixed models and random forests for spatial prediction of soil pH. Geoderma 397:115079

  • Malone BP, McBratney AB, Minasny B (2011) Empirical estimates of uncertainty for mapping continuous depth functions of soil attributes. Geoderma 160(3):614–626. https://doi.org/10.1016/j.geoderma.2010.11.013

  • Malone BP, Minasny B, McBratney AB et al (2017) Using R for digital soil mapping, vol 35. Springer, New York

  • Mansuy N, Thiffault E, Paré D, Bernier P, Guindon L, Villemaire P, Poirier V, Beaudoin A (2014) Digital mapping of soil properties in Canadian managed forests at 250 m of resolution using the k-nearest neighbor method. Geoderma 235:59–73

  • Marchant BP (2018) Model-based soil geostatistics. Pedometrics: 341–371

  • McBratney AB, Minasny B, Stockmann U et al (2018) Pedometrics. Springer, New York

  • Meinshausen N, Meinshausen MN (2017) Package 'quantregForest': Quantile Regression Forests. R package version 1.3-7

  • Minasny B, McBratney AB (2005) The Matérn function as a general model for soil variograms. Geoderma 128(3–4):192–207

  • Minasny B, McBratney AB (2007) Spatial prediction of soil properties using EBLUP with the Matérn covariance function. Geoderma 140(4):324–336

  • Moraga P (2021) Species distribution modeling using spatial point processes: a case study of sloth occurrence in Costa Rica. The R Journal 12(2):293–310

  • Moraga P, Baker L (2022) rspatialdata: a collection of data sources and tutorials on downloading and visualising spatial data using R. F1000Research 11

  • Moraga P, Cano J, Baggaley RF, Gyapong JO, Njenga SM, Nikolay B, Davies E, Rebollo MP, Pullan RL, Bockarie MJ et al (2015) Modelling the distribution and transmission intensity of lymphatic filariasis in sub-Saharan Africa prior to scaling up interventions: integrated use of geostatistical and mathematical modelling. Parasites & Vectors 8(1):1–16

  • Moraga P, Dean C, Inoue J, Morawiecki P, Noureen SR, Wang F (2021) Bayesian spatial modelling of geostatistical data using INLA and SPDE methods: A case study predicting malaria risk in Mozambique. Spatial and Spatio-temporal Epidemiology 39:100440

  • Ottoy S, De Vos B, Sindayihebura A, Hermy M, Van Orshoven J (2017) Assessing soil organic carbon stocks under current and potential forest cover using digital soil mapping and spatial generalisation. Ecol Ind 77:139–150

  • Padarian J, Minasny B, McBratney AB (2020) Machine learning and soil sciences: A review aided by machine learning tools. Soil 6(1):35–52

  • Paradis E, Blomberg S, Bolker B, Brown J, Claude J, Cuong HS, Desper R, Didier G (2019) Package ‘ape’. Analyses of phylogenetics and evolution, version 2(4):47

  • Piikki K, Wetterlind J, Söderström M, Stenberg B (2021) Perspectives on validation in digital soil mapping of continuous attributes-a review. Soil Use Manag 37(1):7–21

  • Poggio L, Gimona A, Spezia L, Brewer MJ (2016) Bayesian spatial modelling of soil properties and their uncertainty: the example of soil organic matter in Scotland using R-INLA. Geoderma 277:69–82. https://doi.org/10.1016/j.geoderma.2016.04.026

  • Pollice A, Bilancia M (2002) Kriging with mixed effects models. Statistica (Bologna) 62(3):405–429

  • QGIS Development Team (2019) QGIS Geographic Information System. Open Source Geospatial Foundation

  • R Core Team (2022) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria

  • Rue H, Martino S, Chopin N (2009) Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71(2):319–392

  • Saha A, Basu S, Datta A (2021) Random Forests for Spatially Dependent Data. Journal of the American Statistical Association 1–19. https://doi.org/10.1080/01621459.2021.1950003

  • Saha A, Datta A (2018) BRISC: bootstrap for rapid inference on spatial covariances. Stat 7(1):e184

  • Saha A, Datta A (2018b) BRISC: Fast inference for large spatial datasets using BRISC. R package version 0.1.0

  • Stein ML (1999) Interpolation of spatial data: some theory for kriging. Springer, New York

  • Stroup WW (2002) Power analysis based on spatial effects mixed models: A tool for comparing design and analysis strategies in the presence of spatial variability. J Agric Biol Environ Stat 7(4):491–511. https://doi.org/10.1198/108571102780

  • Sun XL, Yang Q, Wang HL, Wu YJ (2019) Can regression determination, nugget-to-sill ratio and sampling spacing determine relative performance of regression kriging over ordinary kriging? CATENA 181:104092. https://doi.org/10.1016/j.catena.2019.104092

  • Takoutsing B, Heuvelink GB, Stoorvogel JJ, Shepherd KD, Aynekulu E (2022) Accounting for analytical and proximal soil sensing errors in digital soil mapping. Eur J Soil Sci 73(2):e13226

  • Ver Hoef JM, Cressie NA, Glenn-Lewin DC (1993) Spatial models for spatial statistics: some unification. J Veg Sci 4(4):441–452

  • Vrugt JA (2016) Markov chain Monte Carlo simulation using the DREAM software package: theory, concepts, and MATLAB implementation. Environ Model Softw 75:273–316

  • Wadoux AMC, Minasny B, McBratney AB (2020) Machine learning for digital soil mapping: applications, challenges and suggested solutions. Earth Sci Rev 210:103359

  • Webster R, Oliver MA (2007) Geostatistics for environmental scientists. Wiley, Amsterdam

  • Wetschoreck F, Krabel T, Krishnamurthy S (2020) 8080labs/ppscore: zenodo release

  • Zhang S, Huang Y, Shen C, Ye H, Du Y (2012) Spatial prediction of soil organic matter using terrain indices and categorical variables as auxiliary information. Geoderma 171:35–43

  • Zimmerman DL, Ver Hoef JM (2021) On Deconfounding Spatial Confounding in Linear Models. The American Statistician 1–9. https://doi.org/10.1080/00031305.2021.1946149

Funding

This research was supported by the Deutscher Akademischer Austauschdienst (German Academic Exchange Service, DAAD), Germany, through the In-Country/In-Region Scholarship Programme at the Laboratoire de Biomathématiques & d'Estimations Forestières (LABEF), FSA/UAC, Benin, 2020 (Grant number 57546598/Ref: 91786413).

Author information

Contributions

Conceptualization, A.K.M. and R.G.K.; formal analysis, A.K.M.; data curation, A.K.M.; writing (original draft preparation), A.K.M. and E.E.G.; writing (review and editing), R.G.K.; supervision, R.G.K. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Alain Kangela Matazi.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A Simulated data on the 16 scenarios considered

See Fig. 8.

Fig. 8 Simulated data (n = 1500)
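The simulation design behind these scenarios is not spelled out in this section. As one common way to generate data whose spatial autocorrelation, range parameter, and covariate relationship can be varied, the sketch below draws a Gaussian random field with an assumed exponential covariance sigma2 * exp(-d / phi), in which phi plays the role of the range parameter, adds a single standard-normal covariate with either a linear or a quadratic effect, and adds an iid Gaussian nugget. All parameter names and default values are hypothetical illustrations, not the authors' settings.

```python
import numpy as np


def simulate_spatial_data(n=200, sigma2=1.0, phi=0.2, tau2=0.1,
                          beta=(1.0, 2.0), relationship="linear", seed=0):
    """Simulate y(s) = b0 + f(x) + w(s) + e on the unit square (hypothetical design).

    w(s): zero-mean Gaussian random field, covariance sigma2 * exp(-d / phi);
          a larger range parameter phi gives stronger spatial autocorrelation.
    e:    iid Gaussian nugget with variance tau2.
    f(x): b1 * x for a linear relationship, b1 * x**2 for a nonlinear one.
    """
    rng = np.random.default_rng(seed)
    coords = rng.uniform(0.0, 1.0, size=(n, 2))
    # Pairwise Euclidean distances between the n sampling locations
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    cov = sigma2 * np.exp(-d / phi)
    # Tiny diagonal jitter guards the Cholesky factorisation against round-off
    chol = np.linalg.cholesky(cov + 1e-8 * np.eye(n))
    w = chol @ rng.standard_normal(n)
    x = rng.standard_normal(n)
    if relationship == "linear":
        trend = beta[1] * x
    elif relationship == "nonlinear":
        trend = beta[1] * x ** 2
    else:
        raise ValueError("relationship must be 'linear' or 'nonlinear'")
    e = rng.normal(0.0, np.sqrt(tau2), size=n)
    y = beta[0] + trend + w + e
    return coords, x, y, w


# Two contrasting hypothetical scenarios: weak vs strong spatial autocorrelation
_, _, y_weak, _ = simulate_spatial_data(phi=0.02, seed=1)
_, _, y_strong, _ = simulate_spatial_data(phi=0.5, relationship="nonlinear", seed=1)
```

Crossing a few levels of phi, the ratio tau2/sigma2, and the relationship type yields a grid of scenarios of the kind shown in Fig. 8. The dense Cholesky factorisation costs O(n³) operations, which is still manageable at n = 1500.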

Appendix B Predictive Power Scores Matrix

Fig. 9 Predictive power scores matrix

See Fig. 9.
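The reference list cites the ppscore package (Wetschoreck et al. 2020) for the predictive power score (PPS). For a numeric target, ppscore's score is, roughly, 1 minus the ratio of a cross-validated model's mean absolute error to that of a naive baseline that always predicts the target's median, floored at zero. The NumPy sketch below follows that idea but, as a deliberate simplification, replaces ppscore's decision tree with a cross-validated binned-mean (piecewise-constant) model, so its scores will not match the package exactly. The function names and the bin/fold settings are hypothetical.

```python
import numpy as np


def pps_regression(x, y, n_bins=10, n_folds=4, seed=0):
    """Simplified predictive power score of feature x for numeric target y.

    PPS = max(0, 1 - MAE_model / MAE_naive), where the naive model always
    predicts median(y). ASSUMPTION: a cross-validated binned-mean model
    stands in for the decision tree used by the ppscore package.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    abs_err = np.empty_like(y)
    for test_idx in folds:
        train = np.setdiff1d(np.arange(len(y)), test_idx)
        # Bin edges taken from quantiles of the training feature values
        edges = np.quantile(x[train], np.linspace(0.0, 1.0, n_bins + 1))
        train_bin = np.clip(np.searchsorted(edges, x[train], side="right") - 1, 0, n_bins - 1)
        test_bin = np.clip(np.searchsorted(edges, x[test_idx], side="right") - 1, 0, n_bins - 1)
        # Mean target per bin; empty bins fall back to the overall training mean
        bin_means = np.full(n_bins, y[train].mean())
        for b in range(n_bins):
            mask = train_bin == b
            if mask.any():
                bin_means[b] = y[train][mask].mean()
        abs_err[test_idx] = np.abs(y[test_idx] - bin_means[test_bin])
    mae_model = abs_err.mean()
    mae_naive = np.abs(y - np.median(y)).mean()
    if mae_naive == 0:
        return 0.0  # a constant target cannot be predicted "better" than the baseline
    return float(max(0.0, 1.0 - mae_model / mae_naive))


def pps_matrix(columns):
    """PPS for every ordered (feature, target) pair in a dict of equal-length arrays."""
    return {(f, t): (1.0 if f == t else pps_regression(columns[f], columns[t]))
            for f in columns for t in columns}
```

Unlike a correlation matrix, the PPS matrix is asymmetric: if y = x², then x predicts y well, but y predicts x poorly because the sign of x is lost. It also picks up nonlinear relationships such as this one, where Pearson's correlation would be close to zero.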

Appendix C Individual relationships between soil properties and environmental covariates

See Fig. 10.

Fig. 10 Relationships between soil properties and environmental covariates

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Matazi, A.K., Gognet, E.E. & Kakaï, R.G. Digital soil mapping: a predictive performance assessment of spatial linear regression, Bayesian and ML-based models. Model. Earth Syst. Environ. 10, 595–618 (2024). https://doi.org/10.1007/s40808-023-01788-1

  • DOI: https://doi.org/10.1007/s40808-023-01788-1
