Groundwater potential mapping using C5.0, random forest, and multivariate adaptive regression spline models in GIS

Environmental Monitoring and Assessment

Abstract

The ever-increasing demand for water for different purposes makes a better understanding of water resources essential. Groundwater is one of the main water sources, especially in countries with arid climates. This study therefore seeks to produce groundwater potential maps (GPMs) using new algorithms. Specifically, it evaluates the performance of the C5.0, random forest (RF), and multivariate adaptive regression splines (MARS) algorithms for generating GPMs in the eastern part of the Mashhad Plain, Iran. For this purpose, a dataset was built with spring locations as the indicator and groundwater-conditioning factors (GCFs) as inputs. Thirteen GCFs were selected: altitude, slope aspect, slope angle, plan curvature, profile curvature, topographic wetness index (TWI), slope length, distance from rivers, distance from faults, river density, fault density, land use, and lithology. The dataset was divided into training and validation sets containing 70% and 30% of the springs, respectively. The C5.0, RF, and MARS algorithms were then run in the R statistical software, and their outputs were converted into GPMs. Finally, two evaluation criteria were calculated: Kappa and the area under the receiver operating characteristic curve (AUC-ROC). MARS performed best, with an AUC-ROC of 84.2%, followed by RF and C5.0 with AUC-ROC values of 79.7% and 77.3%, respectively. All three models exceeded an AUC-ROC of 70%, which indicates acceptable performance. The methodology could be applied in other geographical areas, and the resulting GPMs could help water resource managers and related organizations accelerate and facilitate water resource exploitation.
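As a rough illustration of the evaluation workflow described in the abstract (a 70/30 split of the springs, then Kappa and AUC-ROC on the validation set), the following is a minimal Python sketch. It is not the authors' implementation: the study used R, and the function names, the 0.5 classification threshold, and the toy labels and scores below are all assumptions made for illustration only.

```python
import random


def split_springs(spring_ids, train_fraction=0.7, seed=0):
    """Shuffle spring IDs and split them into training and validation lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(spring_ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary labels (0/1): (p_o - p_e) / (1 - p_e)."""
    y_true, y_pred = list(y_true), list(y_pred)
    n = len(y_true)
    # Observed agreement: share of cells where prediction matches the label.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected chance agreement from the marginal class frequencies.
    p_e = sum(
        (y_true.count(label) / n) * (y_pred.count(label) / n)
        for label in (0, 1)
    )
    return (p_o - p_e) / (1 - p_e)


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a random positive outscores a
    random negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical data: 1 = spring present, 0 = spring absent.
    labels = [1, 1, 1, 0, 0, 0]
    model_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]  # made-up model outputs
    predicted = [1 if s >= 0.5 else 0 for s in model_scores]  # assumed cutoff

    train_ids, valid_ids = split_springs(range(10))
    print("training springs:", len(train_ids), "validation springs:", len(valid_ids))
    print("kappa:", cohen_kappa(labels, predicted))
    print("AUC-ROC:", auc_roc(labels, model_scores))
```

Kappa depends on the threshold used to turn scores into classes, whereas AUC-ROC is computed from the raw scores, which is one reason the two criteria are often reported together.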

Funding

This research was funded by the Vice President for Research and Technology of Ferdowsi University of Mashhad (FUM) under project No. 41538.

Author information

Corresponding author

Correspondence to Ali Golkarian.

About this article

Cite this article

Golkarian, A., Naghibi, S.A., Kalantar, B. et al. Groundwater potential mapping using C5.0, random forest, and multivariate adaptive regression spline models in GIS. Environ Monit Assess 190, 149 (2018). https://doi.org/10.1007/s10661-018-6507-8

  • DOI: https://doi.org/10.1007/s10661-018-6507-8
