Abstract
Information on the spatial distribution of soil properties is a key element of environmental research and of agricultural planning and decision-making, for example to monitor soil conditions and inform agricultural policies. Building models for spatial data is straightforward, but obtaining reliable predictions from them can be challenging because of the characteristics of the data. Using simulation and data from the WoSI-ISRIC SoilGrid 250 m, we compared the predictive performance of five models: Spatial Linear Regression (SLR-REML), machine learning (ML)-based models (Random Forest, RF, and Random Forest Residual Kriging, RFRK), and Bayesian models (Integrated Nested Laplace Approximation-Stochastic Partial Differential Equations, INLA-SPDE, and spBayes). Considering data characteristics such as spatial autocorrelation, the range parameter, and the strength and type of relationship between the response variable and the covariates, we cross-validated the models’ results against three criteria: precision, unbiasedness, and uncertainty, measured with RMSE, the coefficient of determination (R\(^{2}\)), Lin’s concordance coefficient (\(\rho _{c}\)), and the prediction interval coverage probability (PICP). The results revealed the high precision of SLR-REML, with a small bias, in the case of low spatial autocorrelation. The ML models (RF and RFRK) stood out for their ability to account for nonlinearities, and RFRK in particular for its flexibility in handling high spatial autocorrelation. The INLA-SPDE model was robust to all data characteristics. Despite the drawback of its observed computation time, the SLR-REML model, through linear mixed modeling (REML-LMM), relaxed the minimum number of observations that classical regression requires to make better predictions in Digital Soil Mapping (DSM). In addition to the commonly used ML techniques, INLA-SPDE and SLR could be suitable for understanding, characterizing, and mapping soil properties and environmental variables through spatiotemporal modeling.
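The four validation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy numbers, and the choice of R\(^{2}\) as one minus the ratio of residual to total sum of squares are assumptions made for illustration. Lin's coefficient uses the population (1/n) moments, and PICP is returned as a proportion rather than a percentage.

```python
import math


def rmse(obs, pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Coefficient of determination, assumed here to be 1 - SSE/SST."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def lin_ccc(obs, pred):
    """Lin's concordance correlation coefficient with 1/n moments."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    var_o = sum((o - mean_o) ** 2 for o in obs) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / n
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


def picp(obs, lower, upper):
    """Share of observations falling inside their prediction interval."""
    inside = sum(1 for o, lo, hi in zip(obs, lower, upper) if lo <= o <= hi)
    return inside / len(obs)


if __name__ == "__main__":
    # Hypothetical hold-out values, not taken from the study.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.5, 1.5, 3.5, 3.5]
    lower = [1.2, 1.0, 3.0, 3.0]
    upper = [1.8, 2.0, 4.0, 4.0]
    print("RMSE:", rmse(observed, predicted))
    print("R2:", r_squared(observed, predicted))
    print("CCC:", round(lin_ccc(observed, predicted), 4))
    print("PICP:", picp(observed, lower, upper))
```

In a cross-validation setting, these functions would be applied to the held-out observations of each fold and the results averaged across folds. A PICP close to the nominal level of the interval (for example 0.90 for a 90% interval) suggests well-calibrated uncertainty.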
Availability of data and materials
The simulated data used in this study are available on request from the corresponding author. The data used for the application come from the dataset websites cited in the article.
Funding
This research was supported by the Deutscher Akademischer Austauschdienst (DAAD, German Academic Exchange Service), Germany, under the In-Country/In-Region Scholarship Programme through the Laboratoire de Biomathématiques & d’Estimations Forestières (LABEF), FSA/UAC/Benin, 2020 (Grant number 57546598/Ref:91786413).
Author information
Contributions
Conceptualization, A.K.M. and R.G.K.; formal analysis, A.K.M.; data curation, A.K.M.; writing (original draft preparation), A.K.M. and E.E.G.; writing (review and editing), R.G.K.; supervision, R.G.K. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Matazi, A.K., Gognet, E.E. & Kakaï, R.G. Digital soil mapping: a predictive performance assessment of spatial linear regression, Bayesian and ML-based models. Model. Earth Syst. Environ. 10, 595–618 (2024). https://doi.org/10.1007/s40808-023-01788-1
DOI: https://doi.org/10.1007/s40808-023-01788-1