A Comparative Assessment Between Three Machine Learning Models and Their Performance Comparison by Bivariate and Multivariate Statistical Methods in Groundwater Potential Mapping

Water Resources Management

Abstract

As worldwide demand for fresh groundwater increases, delineation of groundwater spring potential zones becomes an increasingly important tool for implementing successful groundwater determination, protection, and management programs. The objective of the current study is therefore to evaluate the capability of three machine learning models, namely boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF), and to compare their performance with that of a bivariate (evidential belief function (EBF)) and a multivariate (general linear model (GLM)) statistical method in groundwater potential mapping. This study was carried out in the Beheshtabad Watershed, Chaharmahal-e-Bakhtiari Province, Iran. In total, 1425 spring locations were detected in the study area. Seventy percent of the spring locations were used for model training and 30 % for validation. Fourteen conditioning factors were considered in this investigation: slope angle, slope aspect, altitude, plan curvature, profile curvature, slope length (LS), stream power index (SPI), topographic wetness index (TWI), distance from rivers, distance from faults, river density, fault density, lithology, and land use. Using these conditioning factors and the different algorithms, groundwater potential maps were generated, and the results were plotted in ArcGIS 9.3. According to the success rate curves (SRC), the area under the curve (AUC) for the five models varies from 0.692 to 0.975. In contrast, the AUC for the prediction rate curves (PRC) ranges from 77.26 to 86.39 %. The CART, BRT, and RF machine learning techniques showed very good performance in groundwater potential mapping, with AUC values of 86.39, 86.12, and 86.05 %, respectively. The GLM and EBF models showed weaker performance than the machine learning models in spring groundwater potential mapping, with AUC values of 77.26 and 67.72 %, respectively. The proposed methods provided rapid, accurate, and cost-effective results. Furthermore, the analysis may be transferable to other watersheds with similar topographic and hydrogeological characteristics.
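The 70/30 partition of spring locations and the AUC comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not say how the split was drawn or how the SRC and PRC areas were computed, so the random shuffle, the integer rounding of the 70 % cut, and the pairwise-ranking definition of AUC used here are all assumptions, and the function names `split_70_30` and `auc` are hypothetical.

```python
import random


def split_70_30(points, seed=0):
    """Shuffle a copy of the points and split it 70 % / 30 %.

    Assumption: a simple random split with the cut rounded down;
    the abstract does not state how the partition was made.
    """
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    cut = (7 * len(shuffled)) // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def auc(scores_pos, scores_neg):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    Assumption: this is the standard ROC-style definition, used here as a
    stand-in for the success/prediction rate curves named in the abstract.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical identifiers standing in for the 1425 spring locations.
spring_ids = list(range(1425))
train_ids, valid_ids = split_70_30(spring_ids)

# Hypothetical model scores at spring (positive) and non-spring (negative) cells.
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

With 1425 locations, this rounding yields 997 training and 428 validation points. The quadratic pairwise loop is kept for clarity; a rank-based formulation would be preferable for large rasters.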

Acknowledgments

The authors would like to thank the editor for editorial comments and the anonymous reviewers for their helpful comments on the previous version of the manuscript.

Author information

Corresponding author

Correspondence to Hamid Reza Pourghasemi.

About this article

Cite this article

Naghibi, S.A., Pourghasemi, H.R. A Comparative Assessment Between Three Machine Learning Models and Their Performance Comparison by Bivariate and Multivariate Statistical Methods in Groundwater Potential Mapping. Water Resour Manage 29, 5217–5236 (2015). https://doi.org/10.1007/s11269-015-1114-8

  • DOI: https://doi.org/10.1007/s11269-015-1114-8
