Abstract
This study applied two bivariate statistical models (frequency ratio and information value), one multivariate statistical model (logistic regression), and two supervised statistical learning models (boosted regression trees and classification and regression trees) to map flood proneness in an arid region of southern Iraq. For this purpose, ten flood-causative factors were chosen based on data availability and local conditions, along with the spatial extent of the large flood that affected the study area on 13 May 2013. The factors comprised topography-related factors (elevation, slope, curvature, topographic wetness index, and stream power index), lithology, soil, land use/land cover, average annual rainfall, and distance to rivers. A multicollinearity test showed no multicollinearity problem among the factors used. Ranking the factors by information gain ratio showed that the most important factors controlling flood proneness were elevation, followed by average annual rainfall, distance to rivers, land use/land cover, lithology, and soil. The models were run using these most important factors to produce flood proneness maps. The flood proneness values were categorized into five classes using a quantile classification scheme. The models were validated using the area under the receiver operating characteristic curve (AUC). The AUC for the prediction data set was 0.793, 0.786, 0.779, 0.754, and 0.753 for classification and regression trees, boosted regression trees, logistic regression, information value, and frequency ratio, respectively. For the best-performing model (classification and regression trees), the areas occupied by the flood proneness zones were 2735 km², 2809 km², 2816 km², 2732 km², and 2801 km² for the very low, low, moderate, high, and very high flood proneness zones, respectively.
The main conclusion is that the machine learning models performed best in mapping flood proneness in the study area, followed by the multivariate and bivariate models. Decision makers and hydrologists can adopt the flood proneness maps developed in this study to improve the management of excess floodwater and to prevent flood-related damage.
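As a minimal illustration of the two bivariate models and the AUC validation named above, the sketch below computes frequency ratio and information value weights for one categorical factor and then scores them with a rank-based AUC. The abstract does not give the formulas, so the sketch assumes the commonly used definitions: the frequency ratio of a class is its share of flooded cells divided by its share of all cells, and the information value is the natural logarithm of that ratio. The function names, the factor classes, and the ten-cell data set are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def frequency_ratio(classes, flooded):
    """Frequency ratio per class of one factor (assumed common definition).

    classes: class label of each cell; flooded: 1 if the cell flooded, else 0.
    FR = (flooded cells in class / all flooded cells) / (cells in class / all cells)
    """
    total = len(classes)
    total_flood = sum(flooded)
    cells = Counter(classes)
    flood_cells = Counter(c for c, f in zip(classes, flooded) if f)
    return {c: (flood_cells[c] / total_flood) / (cells[c] / total) for c in cells}


def information_value(fr):
    """Information value as ln(FR); None where no flooded cell falls in the class."""
    return {c: (math.log(v) if v > 0 else None) for c, v in fr.items()}


def auc(scores, labels):
    """AUC as the probability that a flooded cell outscores a non-flooded one.

    Ties count as one half (the Mann-Whitney formulation of the ROC area).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: ten cells of an elevation factor with three classes.
classes = ["low"] * 4 + ["mid"] * 4 + ["high"] * 2
flooded = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

fr = frequency_ratio(classes, flooded)
iv = information_value(fr)

# Score each cell by the FR weight of its class and evaluate against the labels.
scores = [fr[c] for c in classes]
toy_auc = auc(scores, flooded)

print(fr)
print(iv)
print(round(toy_auc, 3))
```

In a full workflow the per-class weights of all selected factors would be combined for each cell, and the AUC would be computed on a held-out prediction set rather than on the cells used to derive the weights, as this toy example does.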
Cite this article
Al-Abadi, A.M., Al-Najar, N.A. Comparative assessment of bivariate, multivariate and machine learning models for mapping flood proneness. Nat Hazards 100, 461–491 (2020). https://doi.org/10.1007/s11069-019-03821-y
DOI: https://doi.org/10.1007/s11069-019-03821-y