Exploring prediction uncertainty of spatial data in geostatistical and machine learning approaches

  • Francky Fouedjio (corresponding author)
  • Jens Klump
Thematic Issue
Part of the following topical collections:
  1. Learning from spatial data: unveiling the geo-environment through quantitative approaches

Abstract

Geostatistical methods such as kriging with external drift (KED) and machine learning techniques such as quantile regression forest (QRF) have been used extensively to model and predict spatially distributed continuous variables when auxiliary information is available everywhere within the study region. In addition to providing predictions, both methods can quantify the uncertainty associated with each prediction. In this paper, kriging with external drift and quantile regression forest are compared with respect to their ability to deliver reliable predictions and prediction uncertainties for spatial data. The comparison is carried out on both synthetic and real-world spatial data. The results indicate that KED can be expected to outperform QRF when the relationship between the variable of interest and the auxiliary variables is linear, whether the variable of interest shows a strong or a weak spatial correlation. On the other hand, QRF can be expected to outperform KED when the relationship between the variable of interest and the auxiliary variables is non-linear and the variable of interest exhibits a weak spatial correlation. Moreover, when the relationship is non-linear and the variable of interest shows a strong spatial correlation, QRF can be expected to outperform KED in terms of prediction accuracy but not in terms of prediction uncertainty accuracy.
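As an illustration of what "prediction uncertainty accuracy" can mean in practice, the following minimal sketch checks how often observed values fall inside prediction intervals built from a predictive distribution at each location. It is not the authors' procedure: the function names (`empirical_quantile`, `prediction_interval`, `coverage`) and the toy numbers are hypothetical, and the per-location predictive samples are simply assumed to be available (for example, from a QRF conditional distribution or from draws of a kriging predictive distribution).

```python
def empirical_quantile(samples, p):
    """Linearly interpolated empirical quantile of `samples` at probability p in [0, 1]."""
    xs = sorted(samples)
    if not xs:
        raise ValueError("samples must be non-empty")
    pos = p * (len(xs) - 1)          # fractional index into the sorted samples
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] * (1 - frac) + xs[hi] * frac


def prediction_interval(samples, level):
    """Central prediction interval with nominal probability `level` (e.g. 0.9)."""
    alpha = (1 - level) / 2
    return empirical_quantile(samples, alpha), empirical_quantile(samples, 1 - alpha)


def coverage(observed, predictive_samples, level):
    """Fraction of observed values that fall inside their prediction interval."""
    hits = 0
    for y, samples in zip(observed, predictive_samples):
        low, high = prediction_interval(samples, level)
        hits += low <= y <= high
    return hits / len(observed)


# Toy usage (hypothetical numbers): two validation locations that happen to
# share the same predictive samples, evaluated at the 50% nominal level.
samples = [0, 1, 2, 3, 4]
print(coverage([2, 5], [samples, samples], 0.5))
```

For a method whose uncertainty estimates are well calibrated, the empirical coverage should be close to the nominal level across a range of levels; large departures in either direction indicate intervals that are too narrow or too wide.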

Keywords

Auxiliary information; Prediction uncertainty; Kriging with external drift; Quantile regression forest; Spatial data

Notes

Acknowledgements

The authors are grateful to the anonymous reviewers for their helpful and constructive comments on earlier versions of the manuscript. The authors would like to thank Charlie Kirkwood at the British Geological Survey for providing the real-world spatial data set used in this paper.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Geological Sciences, Stanford University, Stanford, USA
  2. CSIRO Mineral Resources, Kensington, Australia
