Abstract
In the drought prone district of Dholpur in Rajasthan, India, groundwater is a lifeline for its inhabitants. With population explosion and rapid urbanization, the groundwater is being critically over-exploited. Hence the current groundwater potential mapping study was undertaken to ascertain the areas that are more likely to yield a larger volume of groundwater against those areas that have poor groundwater potential and accordingly perpetuate the much needed damage control. Thematic layers for 14 groundwater influencing factors were considered for the study region, including elevation, slope, aspect, plan curvature, profile curvature, topographic wetness index (TWI), geology, soil, land use, normalized difference vegetation index (NDVI), surface temperature, precipitation, distance from roads, and distance from rivers. These were then subjected to an overlay operation, with the groundwater inventory which comprised of the locations of observational groundwater wells. The resulting geospatial database was then used to train two decision tree based ensemble models: gradient boosted decision trees (GBDT) and random forest (RF). The predictive performance of these models was then compared using various performance metrics such as area under curve (AUC) of receiver operating characteristics (ROC), sensitivity, accuracy, etc. It was found that GBDT (AUC: 0.79) outperformed RF (AUC: 0.71). The validated GBDT model was then used to construct the groundwater potential zonation map. The generated map showed that about 20.2% of the region has very high potential, while 22.6% has high potential to yield groundwater, and approximately 19.9–17.5% of the study region has very low to low groundwater potential.
Similar content being viewed by others
References
Abedi Gheshlaghi H, Feizizadeh B, Blaschke T (2020) GIS-based forest fire risk mapping using the analytical network process and fuzzy logic. J Environ Plan Manag 63(3):481–499
Al-Abadi AM, Shahid S (2015) A comparison between index of entropy and catastrophe theory methods for mapping groundwater potential in an arid region. Environ Monit Assess 187(9):576
Alam MZ, Rahman MS, Rahman MS (2019) A Random Forest based predictor for medical data classification using feature ranking. Inform Med Unlocked 15:100180
Althuwaynee OF, Pradhan B, Lee S (2012) Application of an evidential belief function model in landslide susceptibility mapping. Comput Geosci 44:120–135
Althuwaynee OF, Pradhan B, Park HJ, Lee JH (2014) A novel ensemble bivariate statistical evidential belief function with knowledge-based analytical hierarchy process and multivariate statistical logistic regression for landslide susceptibility mapping. CATENA 114:21–36
Arabameri A, Pradhan B, Rezaei K, Sohrabi M, Kalantari Z (2019a) GIS-based landslide susceptibility mapping using numerical risk factor bivariate model and its ensemble with linear multivariate regression and boosted regression tree algorithms. J Mt Sci 16(3):595–618
Arabameri A, Pradhan B, Lombardo L (2019b) Comparative assessment using boosted regression trees, binary logistic regression, frequency ratio and numerical risk factor for gully erosion susceptibility modelling. CATENA 183:104223
Avand M, Janizadeh S, Naghibi SA, Pourghasemi HR, Khosrobeigi Bozchaloei S, Blaschke T (2019) A comparative assessment of Random Forest and k-Nearest Neighbor classifiers for gully erosion susceptibility mapping. Water 11(10):2076
Banks D, Robins N, Robins N (2002) An introduction to groundwater in crystalline bedrock. Norges geologiske undersøkelse, Trondheim
Beaudoin A, Bernier PY, Guindon L, Villemaire P, Guo XJ, Stinson G, Hall RJ (2014) Mapping attributes of Canada’s forests at moderate resolution through kNN and MODIS imagery. Can J For Res 44(5):521–532
Bragagnolo L, da Silva RV, Grzybowski JMV (2020a) Artificial neural network ensembles applied to the mapping of landslide susceptibility. CATENA 184:104240
Bragagnolo L, da Silva RV, Grzybowski JMV (2020b) Landslide susceptibility mapping with r landslide: a free open-source GIS-integrated tool based on Artificial Neural Networks. Environ Model Softw 123:104565
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Bui QT, Nguyen QH, Nguyen XL, Pham VD, Nguyen HD, Pham VM (2020) Verification of novel integrations of swarm intelligence algorithms into deep learning neural network for flood susceptibility mapping. J Hydrol 581:124379
Carranza EJM, Hale M (2003) Evidential belief functions for data-driven geologically constrained mapping of gold potential, Baguio district, Philippines. Ore Geol Rev 22(1–2):117–132
Central Ground Water Board (CGWB), Ministry of Jal Shakti, Department of Water Resources, River Development and Ganga Rejuvenation, Government of India, Assesment of Ground Water (2018). http://cgwb.gov.in/. Accessed 18 Jan 2020
Chen W, Xie X, Wang J, Pradhan B, Hong H, Bui DT, Ma J (2017) A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. CATENA 151:147–160
Chen W, Zhang S, Li R, Shahabi H (2018) Performance evaluation of the GIS-based data mining techniques of best-first decision tree, random forest, and naïve Bayes tree for landslide susceptibility modeling. Sci Total Environ 644:1006–1018
Chen J, Li Q, Wang H, Deng M (2020a) A machine learning ensemble approach based on random forest and radial basis function neural network for risk evaluation of regional flood disaster: a case study of the Yangtze River Delta, China. Int J Environ Res Public Health 17(1):49
Chen W, Li Y, Xue W, Shahabi H, Li S, Hong H, Ahmad BB (2020b) Modeling flood susceptibility using data-driven approaches of naïve bayes tree, alternating decision tree, and random forest methods. Sci Total Environ 701:134979
Choubin B, Moradi E, Golshan M, Adamowski J, Sajedi-Hosseini F, Mosavi A (2019) An ensemble prediction of flood susceptibility using multivariate discriminant analysis, classification and regression trees, and support vector machines. Sci Total Environ 651:2087–2096
Çolak E, Sunar F (2020) Evaluation of forest fire risk in the Mediterranean Turkish forests: a case study of Menderes region, Izmir. Int J Disaster Risk Reduct 45:101479
Corsini A, Cervi F, Ronchetti F (2009) Weight of evidence and artificial neural networks for potential groundwater spring mapping: an application to the Mt. Modino area (Northern Apennines, Italy). Geomorphology 111(1–2):79–87
Costache R, Bui DT (2020) Identification of areas prone to flash-flood phenomena using multiple-criteria decision-making, bivariate statistics, machine learning and their ensembles. Sci Total Environ 712:136492
de Quadros TF, Koppe JC, Strieder AJ, Costa JF (2006) Mineral-potential mapping: a comparison of weights-of-evidence and fuzzy methods. Nat Resour Res 15(1):49–65
Díaz-Alcaide S, Martínez-Santos P (2019) Advances in groundwater potential mapping. Hydrogeol J 27(7):2307–2324
Dou J, Yunus AP, Bui DT, Merghadi A, Sahana M, Zhu Z, Pham BT (2019) Assessment of advanced random forest and decision tree algorithms for modeling rainfall-induced landslide susceptibility in the Izu-Oshima Volcanic Island, Japan. Sci Total Environ 662:332–346
Feloni E, Mousadis I, Baltas E (2020) Flood vulnerability assessment using a GIS-based multi-criteria approach—the case of Attica region. J Flood Risk Manag 13:e12563
Feng B, Wang J, Zhang Y, Hall B, Zeng C (2020) Urban flood hazard mapping using a hydraulic–GIS combined model. Nat Hazards 100:1089–1104
Fitts CR (2002) Groundwater science. Elsevier, Amsterdam
Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38(4):367–378
Garosi Y, Sheklabadi M, Pourghasemi HR, Besalatpour AA, Conoscenti C, Van Oost K (2018) Comparison of differences in resolution and sources of controlling factors for gully erosion susceptibility mapping. Geoderma 330:65–78
Gayen A, Pourghasemi HR, Saha S, Keesstra S, Bai S (2019) Gully erosion susceptibility assessment and management of hazard-prone areas in India using different machine learning algorithms. Sci Total Environ 668:124–138
Gjertsen AK (2007) Accuracy of forest mapping based on Landsat TM data and a kNN-based method. Remote Sens Environ 110(4):420–430
Hosseinalizadeh M, Kariminejad N, Chen W, Pourghasemi HR, Alinejad M, Behbahani AM, Tiefenbacher JP (2019) Gully headcut susceptibility modeling using functional trees, naïve Bayes tree, and random forest models. Geoderma 342:1–11
Hu Q, Zhou Y, Wang S, Wang F (2020) Machine learning and fractal theory models for landslide susceptibility mapping: case study from the Jinsha River Basin. Geomorphology 351:106975
Jha MK, Chowdhury A, Chowdary VM, Peiffer S (2007) Groundwater management and development by integrated remote sensing and geographic information systems: prospects and constraints. Water Resour Manag 21(2):427–467
Kaur L, Rishi MS, Singh G, Thakur SN (2020) Groundwater potential assessment of an alluvial aquifer in Yamuna sub-basin (Panipat region) using remote sensing and GIS techniques in conjunction with analytical hierarchy process (AHP) and catastrophe theory (CT). Ecol Ind 110:105850
Kayastha P, Dhital MR, De Smedt F (2012) Landslide susceptibility mapping using the weight of evidence method in the Tinau watershed, Nepal. Nat Hazards 63(2):479–498
Khosravi K, Pham BT, Chapi K, Shirzadi A, Shahabi H, Revhaug I, Bui DT (2018) A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran. Sci Total Environ 627:744–755
Kim JC, Lee S, Jung HS, Lee S (2018) Landslide susceptibility mapping using random forest and boosted tree models in Pyeong-Chang, Korea. Geocarto Int 33(9):1000–1015
Kuhnert PM, Henderson AK, Bartley R, Herr A (2010) Incorporating uncertainty in gully erosion calculations using the random forests modelling approach. Environmetrics 21(5):493–509
Lee S, Choi J (2004) Landslide susceptibility mapping using GIS and the weight-of-evidence model. Int J Geogr Inf Sci 18(8):789–814
Lee S, Pradhan B (2007) Landslide hazard mapping at Selangor, Malaysia using frequency ratio and logistic regression models. Landslides 4(1):33–41
Lee S, Song KY, Kim Y, Park I (2012) Regional groundwater productivity potential mapping using a geographic information system (GIS) based artificial neural network model. Hydrogeol J 20(8):1511–1527
Lee S, Kim JC, Jung HS, Lee MJ, Lee S (2017) Spatial prediction of flood susceptibility using random-forest and boosted-tree models in Seoul metropolitan city, Korea. Geomat Nat Hazards Risk 8(2):1185–1203
Liaw A, Wiener M (2002) Classification and regression by random forest. R News 2(3):18–22
Lombardo L, Cama M, Conoscenti C, Märker M, Rotigliano EJNH (2015) Binary logistic regression versus stochastic gradient boosted decision trees in assessing landslide susceptibility for multiple-occurring landslide events: application to the 2009 storm event in Messina (Sicily, southern Italy). Nat Hazards 79(3):1621–1648
Mastere M (2020) Mass movement hazard assessment at a medium scale using weight of evidence model and neo-predictive variables creation. In: Mapping and spatial analysis of socio-economic and environmental indicators for sustainable development, pp 73–85. Springer, Cham
Miraki S, Zanganeh SH, Chapi K, Singh VP, Shirzadi A, Shahabi H, Pham BT (2019) Mapping groundwater potential using a novel hybrid intelligence approach. Water Resour Manag 33(1):281–302
Mishra K, Sinha R (2020) Flood risk assessment in the Kosi megafan using multi-criteria decision analysis: a hydro-geomorphic approach. Geomorphology 350:106861
Moghaddam DD, Rahmati O, Panahi M, Tiefenbacher J, Darabi H, Haghizadeh A, Bui DT (2020) The effect of sample size on different machine learning models for groundwater potential mapping in mountain bedrock aquifers. CATENA 187:104421
Mukherjee P, Singh CK, Mukherjee S (2012) Delineation of groundwater potential zones in arid region of India—a remote sensing and GIS approach. Water Resour Manag 26(9):2643–2672
Naghibi SA, Pourghasemi HR, Pourtaghi ZS, Rezaei A (2015) Groundwater qanat potential mapping using frequency ratio and Shannon’s entropy models in the Moghan watershed, Iran. Earth Sci Inform 8(1):171–186
Naghibi SA, Pourghasemi HR, Dixon B (2016) GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran. Environ Monit Assess 188(1):44
Naghibi SA, Ahmadi K, Daneshi A (2017) Application of support vector machine, random forest, and genetic algorithm optimized random forest models in groundwater potential mapping. Water Resour Manag 31(9):2761–2775
Naghibi SA, Pourghasemi HR, Abbaspour K (2018) A comparison between ten advanced and soft computing models for groundwater qanat potential assessment in Iran using R and GIS. Theoret Appl Climatol 131(3–4):967–984
Nampak H, Pradhan B, Manap MA (2014) Application of GIS based data driven evidential belief function model to predict groundwater potential zonation. J Hydrol 513:283–300
Ozdemir A (2011) GIS-based groundwater spring potential mapping in the Sultan Mountains (Konya, Turkey) using frequency ratio, weights of evidence and logistic regression methods and their comparison. J Hydrol 411(3–4):290–308
Pham BT, Jaafari A, Prakash I, Singh SK, Quoc NK, Bui DT (2019) Hybrid computational intelligence models for groundwater potential mapping. CATENA 182:104101
Porwal A, Carranza EJM, Hale M (2006) Bayesian network classifiers for mineral potential mapping. Comput Geosci 32(1):1–16
Pourghasemi HR, Pradhan B, Gokceoglu C (2012) Application of fuzzy logic and analytical hierarchy process (AHP) to landslide susceptibility mapping at Haraz watershed, Iran. Nat Hazards 63(2):965–996
Pourghasemi HR, Termeh SVR, Kariminejad N, Hong H, Chen W (2020) An assessment of metaheuristic approaches for flood assessment. J Hydrol 582:124536
Pradhan B (2013) A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Comput Geosci 51:350–365
Rahimi I, Azeez SN, Ahmed IH (2020) Mapping forest-fire potentiality using remote sensing and GIS, case study: Kurdistan Region-Iraq. In: Environmental remote sensing and GIS in Iraq, pp 499–513. Springer, Cham
Rahmati O, Samani AN, Mahdavi M, Pourghasemi HR, Zeinivand H (2015) Groundwater potential mapping at Kurdistan region of Iran using analytic hierarchy process and GIS. Arab J Geosci 8(9):7059–7071
Rahmati O, Pourghasemi HR, Zeinivand H (2016) Flood susceptibility mapping using frequency ratio and weights-of-evidence models in the Golastan Province, Iran. Geocarto Int 31(1):42–70
Razandi Y, Pourghasemi HR, Neisani NS, Rahmati O (2015) Application of analytical hierarchy process, frequency ratio, and certainty factor models for groundwater potential mapping using GIS. Earth Sci Inf 8(4):867–883
Rodriguez-Galiano V, Chica-Olmo M (2012) Land cover change analysis of a Mediterranean area in Spain using different sources of data: multi-seasonal Landsat images, land surface temperature, digital terrain models and texture. Appl Geogr 35(1–2):208–218
Sameen MI, Sarkar R, Pradhan B, Drukpa D, Alamri AM, Park HJ (2020) Landslide spatial modelling using unsupervised factor optimisation and regularised greedy forests. Comput Geosci 134:104336
Sander P, Chesley MM, Minor TB (1996) Groundwater assessment using remote sensing and GIS in a rural groundwater project in Ghana: lessons learned. Hydrogeol J 4(3):40–49
Sansare DA, Mhaske SY (2020) Natural hazard assessment and mapping using remote sensing and QGIS tools for Mumbai city, India. Nat Hazards 100:1117–1136
Sarkar D, Mondal P (2020) Flood vulnerability mapping using frequency ratio (FR) model: a case study on Kulik river basin, Indo-Bangladesh Barind region. Appl Water Sci 10(1):17
Sevinc V, Kucuk O, Goltas M (2020) A Bayesian network model for prediction and analysis of possible forest fire causes. For Ecol Manag 457:117723
Tang RX, Kulatilake PH, Yan EC, Cai JS (2020) Evaluating landslide susceptibility based on cluster analysis, probabilistic methods, and artificial neural networks. Bull Eng Geol Environ 79:2235–2254. https://doi.org/10.1007/s10064-019-01684-y
Tehrany MS, Pradhan B, Jebur MN (2015) Flood susceptibility analysis and its verification using a novel ensemble support vector machine and frequency ratio method. Stoch Environ Res Risk Assess 29(4):1149–1165
Thai Pham B, Tien Bui D, Prakash I (2018) Landslide susceptibility modelling using different advanced decision trees methods. Civ Eng Environ Syst 35(1–4):139–157
Tien Bui D, Pradhan B, Lofman O, Revhaug I (2012) Landslide susceptibility assessment in Vietnam using support vector machines, decision tree, and Naive Bayes Models. Math Probl Eng 2012:974638. https://doi.org/10.1155/2012/974638
Van Dao D, Jaafari A, Bayat M, Mafi-Gholami D, Qi C, Moayedi H, Luu C (2020) A spatially explicit deep learning neural network model for the prediction of landslide susceptibility. CATENA 188:104451
Venkatesh K, Preethi K, Ramesh H (2020) Evaluating the effects of forest fire on water balance using fire susceptibility maps. Ecol Ind 110:105856
Wang Y, Feng L, Li S, Ren F, Du Q (2020) A hybrid model considering spatial heterogeneity for landslide susceptibility mapping in Zhejiang Province, China. CATENA 188:104425
Wu Y, Ke Y, Chen Z, Liang S, Zhao H, Hong H (2020) Application of alternating decision tree with AdaBoost and bagging ensembles for landslide susceptibility mapping. CATENA 187:104396
Yalcin A (2008) GIS-based landslide susceptibility mapping using analytical hierarchy process and bivariate statistics in Ardesen (Turkey): comparisons of results and confirmations. CATENA 72(1):1–12
Yalcin A, Reis S, Aydinoglu AC, Yomralioglu T (2011) A GIS-based comparative study of frequency ratio, analytical hierarchy process, bivariate statistics and logistics regression methods for landslide susceptibility mapping in Trabzon, NE Turkey. CATENA 85(3):274–287
Yariyan P, Janizadeh S, Van Phong T, Nguyen HD, Costache R, Van Le H, Tiefenbacher JP (2020) Improvement of best first decision trees using bagging and dagging ensembles for flood probability mapping. Water Resour Manag 34:3037–3053
Yilmaz I (2010) Comparison of landslide susceptibility mapping methodologies for Koyulhisar, Turkey: conditional probability, logistic regression, artificial neural networks, and support vector machine. Environ Earth Sci 61(4):821–836
Zabihi M, Pourghasemi HR, Pourtaghi ZS, Behzadfar M (2016) GIS-based multivariate adaptive regression spline and random forest models for groundwater potential mapping in Iran. Environ Earth Sci 75(8):665
Zabihi M, Pourghasemi HR, Motevalli A, Zakeri MA (2019) Gully erosion modeling using GIS-based data mining techniques in Northern Iran: a comparison between boosted regression tree and multivariate adaptive regression spline. In: Natural hazards GIS-based spatial modeling using data mining techniques, pp. 1–26. Springer, Cham
Zaheer M, Zaheer A, Hamza A (2020) Use of geoinformatics for landslide susceptibility mapping: a case study of Murree, Northern Area, Pakistan. In: Transportation soil engineering in cold regions, vol 2, pp 191–199. Springer, Singapore
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Appendix
The performances of RF and GBDT models at various dataset splits of (50/50, 60/40, 70/30, 80/20, and 90/10) are summarized in Table 6.
From Table 6, it can be observed that during testing, the best performance for both the models is delivered when 80% of the dataset were being used for training, i.e. developing the model, while the rest 20% of the dataset is used for testing purpose. Since a classifier’s true performance is gauged by how well the model performs on unseen data, hence the accuracy attained by the model at testing phase is taken as the true metric for determining the ratio for splitting the dataset. Hence, the ratio of 80:20 was employed for generating training and testing data sets.
Rights and permissions
About this article
Cite this article
Sachdeva, S., Kumar, B. Comparison of gradient boosted decision trees and random forest for groundwater potential mapping in Dholpur (Rajasthan), India. Stoch Environ Res Risk Assess 35, 287–306 (2021). https://doi.org/10.1007/s00477-020-01891-0
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00477-020-01891-0