Comparison of gradient boosted decision trees and random forest for groundwater potential mapping in Dholpur (Rajasthan), India

  • Original Paper
  • Stochastic Environmental Research and Risk Assessment

Abstract

In the drought-prone district of Dholpur in Rajasthan, India, groundwater is a lifeline for the inhabitants. With population explosion and rapid urbanization, groundwater is being critically over-exploited. The present groundwater potential mapping study was therefore undertaken to distinguish the areas that are more likely to yield a larger volume of groundwater from those with poor groundwater potential, and thereby to support much-needed remedial measures. Thematic layers for 14 groundwater-influencing factors were considered for the study region: elevation, slope, aspect, plan curvature, profile curvature, topographic wetness index (TWI), geology, soil, land use, normalized difference vegetation index (NDVI), surface temperature, precipitation, distance from roads, and distance from rivers. These layers were overlaid with the groundwater inventory, which comprised the locations of observational groundwater wells. The resulting geospatial database was used to train two decision-tree-based ensemble models: gradient boosted decision trees (GBDT) and random forest (RF). The predictive performance of the two models was compared using metrics such as the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, and accuracy. GBDT (AUC: 0.79) outperformed RF (AUC: 0.71). The validated GBDT model was then used to construct the groundwater potential zonation map. The map shows that about 20.2% of the region has very high potential and 22.6% has high potential to yield groundwater, while approximately 19.9% and 17.5% of the study region have very low and low groundwater potential, respectively.
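The comparison metrics named in the abstract (AUC, sensitivity, accuracy) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the numbers are unrelated to the Dholpur dataset. The AUC is computed here through its rank interpretation, i.e. the probability that a randomly chosen well location scores higher than a randomly chosen non-well location.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_and_accuracy(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity (true positive rate) and accuracy at a fixed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    correct = sum(1 for y, p in zip(labels, preds) if y == p)
    return tp / (tp + fn), correct / len(labels)


# Toy data (hypothetical): 1 = well present, 0 = well absent.
toy_labels = [1, 1, 1, 0, 0, 0]
# Hypothetical predicted probabilities from two models, "A" and "B".
toy_scores_a = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
toy_scores_b = [0.9, 0.5, 0.3, 0.6, 0.4, 0.2]

auc_a = roc_auc(toy_labels, toy_scores_a)
auc_b = roc_auc(toy_labels, toy_scores_b)
sens_a, acc_a = sensitivity_and_accuracy(toy_labels, toy_scores_a)
sens_b, acc_b = sensitivity_and_accuracy(toy_labels, toy_scores_b)

print(f"model A: AUC={auc_a:.3f} sensitivity={sens_a:.3f} accuracy={acc_a:.3f}")
print(f"model B: AUC={auc_b:.3f} sensitivity={sens_b:.3f} accuracy={acc_b:.3f}")
```

In this toy case both models have the same sensitivity and accuracy at the 0.5 threshold but different AUC values, which shows why a threshold-independent measure such as AUC is useful alongside the thresholded metrics when comparing two classifiers.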

Author information

Corresponding author

Correspondence to Shruti Sachdeva.

Appendix

The performance of the RF and GBDT models at various training/testing splits of the dataset (50/50, 60/40, 70/30, 80/20, and 90/10) is summarized in Table 6.

Table 6 Performance statistics for various training–testing splits

Table 6 shows that, during testing, both models perform best when 80% of the dataset is used for training, i.e. developing the model, and the remaining 20% is used for testing. Because a classifier's true performance is gauged by how well it performs on unseen data, the accuracy attained in the testing phase was taken as the criterion for choosing the split ratio. Hence, a ratio of 80:20 was employed to generate the training and testing datasets.
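The selection procedure described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `split_records`, `best_split`, the fixed random seed, and the 100-record stand-in dataset are hypothetical, and the accuracy values are placeholders chosen for the example, not the values reported in Table 6.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_records(
    records: Sequence[int], train_fraction: float, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle the records reproducibly and cut them into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def best_split(test_accuracy_by_split: Dict[str, float]) -> str:
    """Return the split whose testing-phase accuracy is highest."""
    return max(test_accuracy_by_split, key=test_accuracy_by_split.get)


# The five training/testing ratios considered in the appendix.
SPLITS = {"50/50": 0.5, "60/40": 0.6, "70/30": 0.7, "80/20": 0.8, "90/10": 0.9}

# Stand-in for the inventory: 100 record ids (hypothetical size).
records = list(range(100))
sizes = {
    name: tuple(len(part) for part in split_records(records, fraction))
    for name, fraction in SPLITS.items()
}

# PLACEHOLDER testing accuracies, NOT the Table 6 values. In practice each
# entry would come from training a model on the training part and scoring
# it on the held-out testing part of the corresponding split.
placeholder_test_accuracy = {
    "50/50": 0.70,
    "60/40": 0.72,
    "70/30": 0.74,
    "80/20": 0.78,
    "90/10": 0.75,
}
chosen = best_split(placeholder_test_accuracy)

print(sizes)
print("chosen split:", chosen)
```

Selecting on testing accuracy rather than training accuracy follows the reasoning in the text: only performance on unseen data indicates how the model will generalize.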

About this article

Cite this article

Sachdeva, S., Kumar, B. Comparison of gradient boosted decision trees and random forest for groundwater potential mapping in Dholpur (Rajasthan), India. Stoch Environ Res Risk Assess 35, 287–306 (2021). https://doi.org/10.1007/s00477-020-01891-0

  • DOI: https://doi.org/10.1007/s00477-020-01891-0
