Abstract
Flash floods are a substantial hazard linked to climate change and pose a severe threat to both human life and built structures. This study aims to assess and compare the effectiveness of four distinct machine learning (ML) methodologies in the production of flood susceptibility maps (FSMs) in Ibaraki prefecture, Japan. Additionally, the investigation examines the influence of excluding the plan and profile curvature factors on the accuracy of the resulting maps. The dataset comprised 224 points, consisting of 112 flooded and 112 non-flooded locations, and 11 environmental factors. The models were trained using 70% of the dataset, while the remaining 30% was used for model evaluation using the ROC curve method. The results indicated that both the ANN-MLP and SVR models achieved notable accuracy, with area under curve values of 95.23% and 95.83%, respectively. Notably, excluding the plan and profile curvature factors improved the accuracy of the ANN-MLP model to 96.7%. Furthermore, the generated FSMs were classified into five distinct hazard levels. The northern region of the maps predominantly exhibited very low and low hazard levels, while areas located in the southern region, closer to main streams, demonstrated considerably higher hazard levels categorized as very high and high. Ultimately, this study marks a novel endeavor to investigate the impact of the curvature factor on the precision of machine learning algorithms in the creation of FSMs, which serve as fundamental tools for subsequent investigations.
Introduction
In recent years, flash floods have become a significant and disastrous type of natural calamity, posing considerable dangers to human life, infrastructure, and ecological systems (Elsadek et al. 2023; Wahba et al. 2024b). The global vulnerability to flooding has increased by more than 40% over the past two decades and is projected to continue to escalate in the future; this trend is primarily attributable to urbanization and global climate change. Significant increases in the susceptibility to flooding have been noted in the Asia-Pacific, Europe, and North America regions (Alfieri et al. 2017; Vaghefi et al. 2019). The increase in the mean surface temperature, widely known as “global warming”, arises from the emission of greenhouse gases produced by human activities, in conjunction with subsequent physical feedback mechanisms operating within the complex systems of the Earth. In contrast, climate change refers to the process through which alterations in the typical patterns of climate variables occur at the local level (Janizadeh et al. 2021).
Meanwhile, with the capacity to cause extensive destruction in a matter of minutes, these sudden and powerful floods manifest a swift initiation and considerable intensity. Flash floods, which have been documented across different historical periods, are becoming more frequent and severe. This worrying trend has raised concerns about the potential for climate change to exacerbate the detrimental effects of flash floods, as accumulating evidence indicates that both their frequency and severity are increasing. Therefore, ensuring accurate prediction of flood hazards is crucial for promoting effective flood management and preventative actions (Wahba et al. 2022).
Moreover, floods have emerged as a profoundly concerning issue in numerous regions across the globe, with over a million people having lost their lives to storms and flooding events (Malik and Pal 2021). Climate change has driven a notable escalation in flood hazards through various mechanisms, including intensified precipitation patterns leading to heavier rainfall and more frequent storm surges. This complex phenomenon emerges from an interaction of several elements: adverse hydrological circumstances, changed meteorological patterns, alterations in landforms (geomorphology), and inadequacies in flood protection systems and infrastructures (Pal et al. 2022). The compounding effects of these elements heighten the vulnerability to flooding events, necessitating comprehensive and adaptive measures to mitigate the risks effectively.
In Japan, between June 28 and July 8, 2018, the western part of the country experienced a series of torrential rains that resulted in numerous floods and landslides. The magnitude of devastation caused by this natural disaster was ranked as the second most significant event of its kind in the 21st century, following closely after the Great East Japan Earthquake of 2011. As of April 2019, the death toll stood at 263 individuals, with 8 persons still reported missing and an additional 484 individuals injured. The impact on infrastructure was profound, with 6783 houses completely destroyed and 11,346 structures experiencing partial damage (Okazaki et al. 2022).
Generating an FSM is essential, as it enables the identification of areas most vulnerable to flash floods, thereby facilitating the implementation of appropriate mitigation strategies. In this study, the FSM has been produced using a variety of machine learning techniques, namely ANN-MLP reg, SVR, GBR, and LASSO. Moreover, the study examines how excluding the curvature factor affects the accuracy of these machine learning (ML) techniques. Scrutinizing the impact of curvature on the FSM represents one of the key novelties of this research. By exploring the predictive capabilities of these diverse ML models in the absence of the curvature factor, the research aims to shed light on their respective strengths and limitations in accurately predicting flood-prone areas. The omission of the curvature factor serves as a critical variable to analyze the robustness and reliability of the chosen models in producing flood susceptibility maps. The outcomes of this research will provide valuable insights into the performance of each ML model and contribute to the advancement of flood susceptibility mapping methodologies. Ultimately, this study’s findings hold substantial implications for flood risk management and decision-making processes, facilitating more effective and targeted mitigation strategies to safeguard vulnerable communities and infrastructures from the adverse impacts of flooding events.
Related work
FSMs serve as a vital instrument for decision-makers in formulating effective mitigation plans for regions impacted by the adverse consequences of flash floods. Consequently, the precision of these maps holds immense significance in determining the precise level of hazard. In this context, various methodologies have been employed to generate these maps, encompassing bivariate statistical approaches, multi-criteria decision-making analyses, and more recently, machine learning techniques. The present study employs machine learning techniques, as they exhibit considerable potential in achieving high performance and accuracy in forecasting flood hazard maps. Moreover, artificial neural networks (ANN), decision trees, logistic regression, random forest (RF), regression trees, and support vector machines (SVM) are among the extensively employed machine learning models for conducting flood risk assessments (Roozbeh Hasanzadeh and Tuan 2018; Kourgialas and Karatzas 2017; Gotham et al. 2018; Mojaddadi et al. 2017). Although numerous studies have focused on evaluating flood damage prediction, none of them have specifically investigated the impact of incorporating the curvature factor into machine learning models on the accuracy of the resulting FSMs.
Prior investigations concerning flood vulnerability mapping have accentuated the need for more precise and reliable models. Consequently, a number of researchers have expressed a preference for deep learning neural networks or hybrid ensemble models, as these sophisticated approaches demonstrate superior capabilities in accurately evaluating the susceptibility to flooding events (Abu Reza et al. 2021; Talukdar et al. 2020). Concurrently, hydrodynamic models can be employed to simulate runoff within a small-scale catchment area. For instance, Wahba et al. (2024a) simulated the runoff generated by the maximum predicted storm with a 100-year return period to assess the impact of flood hazards on buildings.
A wide array of methodologies has been employed to create spatial flood hazard maps. These methods encompass statistical indices, frequency ratios, Shannon’s entropy, generalized linear models, logistic regression, weights-of-evidence, multivariate discriminant analysis, weighting factors, flexible discriminant analysis, generalized additive models, multivariate logistic regression, and other multivariate statistical approaches, as documented in previous studies. Additionally, multi-criteria decision-making analysis has been applied in studies by Giovannettone et al. (2018) and Youssef et al. (2016). Furthermore, machine learning techniques, including support vector machines (SVM), artificial neural networks (ANNs), least squares SVM (LSSVM), backpropagation ANNs, classification and regression trees (CART), and random forest (RF), have been leveraged by numerous researchers to develop Flood Susceptibility Maps (FSM), as evidenced in studies by Haoyuan et al. (2018) and Darabi et al. (2019).
Regarding SVR, Panahi et al. (2021) conducted an investigation into the performance of a standalone SVR model in generating Flood Susceptibility Maps (FSM). The study involved training the model using 9 environmental factors. The findings indicated that the accuracy of the standalone SVR model, as measured by the area under the ROC curve, reached 87%. However, this relatively lower accuracy might be attributed to the utilization of a smaller set of environmental factors. Nonetheless, the research also demonstrated that employing an ensemble model of SVR, which incorporated the Grasshopper Optimization Algorithm (GOA) and Particle Swarm Optimization (PSO), resulted in a significant improvement in the accuracy of the generated FSM. The use of these optimization techniques in conjunction with SVR contributed to enhancing the predictive capability of the model and, consequently, the accuracy of the produced Flood Susceptibility Maps.
On the other hand, GBR is an ensemble learning model founded on the principles of boosting, which minimizes prediction loss by iteratively fitting the residuals. Wu et al. (2022) successfully applied GBR and highlighted its efficacy as an efficient downscaling approach for generating high spatial resolution precipitation data.
Additionally, Gaagai et al. (2023) conducted an evaluation of groundwater quality employing a comprehensive approach, which encompassed two machine learning models, namely Artificial Neural Network (ANN) and Gradient Boosting Regressor (GBR), in conjunction with multivariate statistical analysis and Geographic Information System (GIS). The findings of their study revealed that the ANN model exhibited superior predictive capabilities over the GBR in terms of groundwater quality assessment. Moreover, they underscored the significance of leveraging physicochemical parameters and water quality indices, supported by GIS techniques, machine learning, and multivariate modeling, as a valuable and pragmatic strategy for both assessing the quality and facilitating sustainable development of groundwater resources.
In parallel, Linh et al. (2022) generated FSMs using the K-nearest neighbor (KNN) and Extreme Gradient Boosting (XGB) machine learning models, along with a hybrid of a genetic algorithm (GA) and the XGB model (GA-XGB); the GA-XGB model demonstrated the highest accuracy among these approaches. Furthermore, Pandey et al. (2021) integrated frequency ratio (FR) and evidential belief function (EBF) with classification and regression tree (CART) models, yielding the CART-FR and CART-EBF models, respectively. Their comparative analyses revealed that the CART-EBF model slightly outperforms the CART-FR model. Additionally, a novel Flash-Flood Propagation Susceptibility Index (FFPSI) was calculated using a combination of Weights of Evidence (WOE), Analytical Hierarchy Process (AHP), Logistic Regression (LR), Classification and Regression Trees (CART), and Radial Basis Function Neural Network-Weights of Evidence (RBFN-WOE). The study was conducted in the Zabala river basin located in the mountainous region of the central-southeastern part of Romania. The LR-WOE and AHP-WOE models showed the highest performance among the evaluated models (Costache et al. 2022).
Furthermore, the selection of causative factors is crucial in estimating the FSM and can significantly influence the overall accuracy of the resulting map. Consequently, this study aims to evaluate the impact of the chosen causative factors on the accuracy of FSM estimation, both with and without the curvature factor.
Study area
Ibaraki Prefecture is located in the Kanto region of Japan. It is situated on the eastern coast of Honshu, the main island of Japan. The prefecture borders Fukushima Prefecture to the north, Tochigi Prefecture to the northwest, Saitama Prefecture to the southwest, Chiba Prefecture to the south, and the Pacific Ocean to the east. The geographical extent of Ibaraki prefecture covers approximately 6100 square kilometers, accommodating an estimated population of around 2.87 million individuals. Figure 1 illustrates the geographical position of Ibaraki prefecture. In 2019, the prefecture encountered a notable flood incident triggered by Typhoon Hagibis, leading to substantial destruction of both properties and infrastructure, as well as the loss of at least one life.
Material and data
In this study, a Digital Elevation Model (DEM) with a 30-m resolution was employed. The DEM is an essential element for flood susceptibility mapping and plays a significant role in hydrological and hydraulic simulations (Kepeng et al. 2021). The DEM data for Ibaraki Prefecture was sourced from the Yamazaki Lab website and underwent a correction process. The boundaries of Ibaraki were delineated using a polygon shapefile obtained from the DIVA-GIS website.
Land cover data used in the study was extracted from the ESRI website. Additionally, shapefiles for roads and rivers were sourced from the Geofabrik website. These maps are needed to calculate distances to roads and rivers within ArcMap. Moreover, the identification of flooded and non-flooded locations was based on a survey conducted on the hazard map portal of the Disaster Prevention Division, Ministry of Land, Infrastructure, Transport and Tourism of Japan. This spatial data is critical for generating the flood inventory map. Table 1 summarizes the data used.
Methodology
This study comprises four primary stages: preparatory processing, derivation of the environmental factors, training of the machine learning models, and model validation. In the first stage, ArcMap is used to process the Digital Elevation Model (DEM) and determine the flow direction, which is needed to delineate potential streamlines and basins. Subsequently, the environmental factors are estimated and visualized. They comprise elevation, slope, distance to stream, distance to river, distance to road, Topographic Wetness Index (TWI), Stream Power Index (SPI), aspect, plan curvature, profile curvature, and land cover.
Moreover, the flooded and non-flooded points, along with the environmental factors, are merged and then divided into a 70% portion for training the machine learning models and a 30% portion reserved for testing their performance. All the selected machine learning models, namely Artificial Neural Network-Multi-Layer Perceptron (ANN-MLP), Support Vector Regression (SVR), Gradient Boosting Regressor (GBR), and Lasso regression, are applied as regression models in Python.
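The split-and-train stage described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the study's dataset or code: the random features, the stratified split, the feature standardization, and all hyperparameters other than those reported later for the ANN-MLP and LASSO models are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the inventory (NOT the study data):
# 112 flooded (1) and 112 non-flooded (0) points with 11 factors each.
rng = np.random.default_rng(0)
y = np.repeat([1.0, 0.0], 112)
X = rng.normal(size=(224, 11))
X[:, :4] += 1.5 * y[:, None]  # make the first four factors informative

# 70% training / 30% testing; stratification is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Standardizing the factors is an assumption; the paper does not state it.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

models = {
    "ANN-MLP": MLPRegressor(hidden_layer_sizes=(100, 50, 30), activation="tanh",
                            solver="adam", max_iter=300, random_state=0),
    "SVR": SVR(kernel="rbf"),
    "GBR": GradientBoostingRegressor(random_state=0),
    "LASSO": Lasso(alpha=0.1),
}

# Each regressor outputs a continuous susceptibility score per test point.
predictions = {name: model.fit(X_train_s, y_train).predict(X_test_s)
               for name, model in models.items()}
```

Treating the 0/1 flood label as a regression target yields a continuous score per location, which is what allows the output to be mapped and later binned into hazard classes.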
Following the training of the models, each model generated a Flood Susceptibility Map (FSM). The FSM was produced twice for each model: the first version included all 11 environmental factors, while the second version excluded the plan and profile curvature of the flood conditioning factors.
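A rough sketch of this two-run comparison is given below. The factor names, the synthetic data, and the choice of SVR with default settings are illustrative assumptions; the sketch only shows the mechanics of dropping the two curvature columns and re-evaluating the AUC.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Hypothetical column names for the 11 factors.
FACTORS = ["elevation", "slope", "dist_stream", "dist_river", "dist_road",
           "twi", "spi", "aspect", "plan_curvature", "profile_curvature",
           "land_cover"]

# Synthetic stand-in data (NOT the study data).
rng = np.random.default_rng(1)
y = np.repeat([1, 0], 112)
X = rng.normal(size=(224, len(FACTORS)))
X[:, :3] += 1.5 * y[:, None]  # make three factors informative


def auc_for(columns):
    """Train on the chosen factor columns and return the test AUC."""
    idx = [FACTORS.index(c) for c in columns]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[:, idx], y, test_size=0.3, stratify=y, random_state=0
    )
    model = SVR(kernel="rbf").fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict(X_te))


no_curvature = [c for c in FACTORS if "curvature" not in c]
auc_all = auc_for(FACTORS)           # run 1: all 11 factors
auc_reduced = auc_for(no_curvature)  # run 2: 9 factors, curvature excluded
```

Using the same `random_state` for both runs keeps the train/test partition identical, so any change in AUC is attributable to the removed columns rather than to a different split.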
To evaluate the accuracy of the models, the area under the Receiver Operating Characteristic (ROC) curve was calculated. Additionally, a residual analysis was conducted, and performance measures such as R-squared, mean absolute error (MAE), and mean square error (MSE) were estimated. These measures were utilized to assess and compare the performance of the models. Figure 2 illustrates the methodological framework.
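As an illustration of how these measures relate to one another, the following sketch evaluates them on six made-up points using scikit-learn's metric functions. The labels and scores are invented for the example and carry no relation to the study's results.

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             r2_score, roc_auc_score)

# Toy labels (1 = flooded) and predicted susceptibility scores.
y_true = np.array([1, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.6, 0.4, 0.2, 0.3, 0.7])

mse_val = mean_squared_error(y_true, y_score)
metrics = {
    "AUC": roc_auc_score(y_true, y_score),   # ranking quality
    "R2": r2_score(y_true, y_score),         # explained variance
    "MAE": mean_absolute_error(y_true, y_score),
    "MSE": mse_val,
    "RMSE": float(np.sqrt(mse_val)),
}
```

The AUC depends only on how the scores rank flooded against non-flooded points, whereas R-squared, MAE, and MSE depend on the numeric distance between score and label, so the two groups of metrics can rank models differently.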
Artificial neural network-multi layer perceptron
Artificial Neural Networks (ANN) offer a straightforward approach to emulating the neural architecture of the human brain. By utilizing training samples, these networks enable the recognition of previously unseen data and facilitate decision-making and problem-solving related to the spatial correlation between input variables and the presence or absence of a specific phenomenon (Gomez and Kavzoglu 2005; Taravat et al. 2016). In addition, Multi-Layer Perceptron (MLP) is widely recognized as one of the most prominent types of Artificial Neural Networks (ANNs). Functioning as a robust modeling tool, the MLP employs a supervised training procedure that relies on data examples containing known outputs (Bishop 1995). The multilayer perceptron (MLP) represents a variant of a feed-forward neural network distinguished by the presence of a solitary output, numerous inputs, and the inclusion of one or more hidden layers (Murtagh 1991).
In this study, the ANN-MLP regressor model is used to predict the degree of hazard in a previously identified flooded area. The model has been tuned over several hyperparameters, including the sizes of the hidden layers, the maximum number of iterations, the activation function, and the solver. Through optimization, the following parameter values were selected: three hidden layers with 100, 50, and 30 nodes respectively, a maximum of 300 iterations, the “tanh” activation function, and the “adam” solver. Figure 3 illustrates a conceptual depiction of the employed ANN-MLP regressor model. The first layer corresponds to the input layer, comprising a collection of neurons that encode the 11 input flood conditioning factors. The neurons within the hidden layers transform the weighted sum of their inputs \((w_{1} x_{1} + w_{2} x_{2} + \cdots + w_{n} x_{n})\) by means of an activation function.
The loss curve was determined by computing the Mean Squared Error (MSE) at each iteration. By evaluating the average squared discrepancy between predicted and actual values, the MSE served as a metric to gauge the model’s performance. Figure 4 illustrates the estimated loss curve.
Tracking the loss curve provides insight into the convergence behavior and the quality of predictions throughout the training iterations. The MSE is given in Eq. (1):
\[ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - {\hat{y}}_i \right)^{2} \tag{1} \]
where \(n\) is the total number of samples, \(y_i\) represents the true values, and \({\hat{y}}_i\) denotes the predicted values.
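The reported configuration and the MSE definition can be sketched as follows, assuming scikit-learn's `MLPRegressor` and synthetic data in place of the study's points. One caveat worth noting: scikit-learn's `loss_curve_` stores the library's own training loss (half the mean squared error plus an L2 penalty term), so it tracks, but is not numerically identical to, the MSE defined above.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared differences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


# Synthetic stand-in data (NOT the study data): 156 points, 11 factors.
rng = np.random.default_rng(0)
y = np.repeat([1.0, 0.0], 78)
X = rng.normal(size=(156, 11))
X[:, :4] += 1.5 * y[:, None]

# Hyperparameters as reported in the text; random_state is an assumption.
mlp = MLPRegressor(hidden_layer_sizes=(100, 50, 30), activation="tanh",
                   solver="adam", max_iter=300, random_state=0)
mlp.fit(X, y)

loss_curve = mlp.loss_curve_          # one training-loss value per iteration
train_mse = mse(y, mlp.predict(X))    # MSE on the training data
```

A curve that flattens well before the iteration cap suggests convergence, while a curve still falling at the final iteration suggests that a larger `max_iter` may be worth trying.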
Support vector regression
Support vector regression (SVR) is a supervised machine learning algorithm that can serve both as a prediction model and as a tool for pattern recognition problems (Vapnik 1999). The training dataset is a collection of paired input and target instances \(\{(x_1, y_1), \ldots , (x_i, y_i)\}\), and the corresponding predicted values (\(y_{i} \in R^{n}\)) can be determined by means of a linear approximation function \(f(x)\), as given in Eq. (2). The precision of the fit is judged by the \(\varepsilon\)-deviation, which measures the discrepancy between the predicted and actual outputs for each pair of training samples.
\[ f(x) = w^{T} \varphi (x) + b \tag{2} \]
where \(f(x)\) is the predicted output, \(w\) represents the weight vector, \(\varphi (x)\) denotes the feature transformation function applied to the input \(x\), and \(b\) is the bias term. According to the theory of structural risk minimization, the values of \(w\) and \(b\) can be determined using the following formula:
\[ \min_{w,\, b,\, \xi ,\, \xi ^{*}} \; \frac{1}{2} \Vert w \Vert ^{2} + C \sum _{i} \left( \xi _{i} + \xi _{i}^{*} \right) \quad \text{subject to} \quad \begin{cases} y_{i} - w^{T} \varphi (x_{i}) - b \le \varepsilon + \xi _{i} \\ w^{T} \varphi (x_{i}) + b - y_{i} \le \varepsilon + \xi _{i}^{*} \\ \xi _{i},\, \xi _{i}^{*} \ge 0 \end{cases} \]
where \(\xi _{i}\) and \(\xi _{i}^{*}\) are slack variables that capture deviations beyond the \(\varepsilon\) margin and are penalized in the objective function (Rezaie et al. 2022). In addition, some essential parameters of the SVR model are depicted in Fig. 5.
The constant “C” in support vector regression (SVR) serves as a regularization parameter that governs the balance between minimizing training error and allowing model flexibility. By controlling the penalty for margin violations and deviation from the desired regression fit, “C” influences the complexity of the SVR model and its tolerance towards outliers. Higher “C” values lead to tighter margins and potentially improved training fit, prioritizing accuracy over generalization. Conversely, lower “C” values result in wider margins, allowing for greater tolerance of violations and potentially enhanced generalization. Optimal “C” selection involves considering dataset characteristics, problem complexity, and the desired trade-off between training accuracy and generalization. The mathematical formulation of support vector regression can be expressed as Eq. (3), according to Huaizhi et al. (2018):
\[ f(x) = \sum _{i} \left( \alpha _{i} - \alpha _{i}^{*} \right) k(x, x_{i}) + b \tag{3} \]
where \(\alpha _{i},\alpha _{i}^{*}\) are the Lagrange multipliers and \(k(x,x_{{\text {i}}} ) = \langle \varphi (x),\varphi (x_{{\text {i}}} )\rangle\) is the kernel function. In this study, the Radial Basis Function (RBF) was chosen as the kernel function due to its favorable characteristics, as suggested by Huang et al. (2020). With this kernel, Eq. (3) can be rewritten as Eq. (4):
\[ f(x) = \sum _{i} \left( \alpha _{i} - \alpha _{i}^{*} \right) \exp \left( - \frac{\Vert x - x_{i} \Vert ^{2}}{2 \sigma ^{2}} \right) + b \tag{4} \]
where \(\sigma\) is the width parameter of the RBF kernel.
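To make the kernel expansion concrete, the sketch below fits scikit-learn's `SVR` on synthetic data and then rebuilds its predictions by hand as a weighted sum of RBF kernel values over the support vectors plus a bias. The data, the value of `sigma`, and the settings for `C` and `epsilon` are illustrative assumptions; scikit-learn parameterizes the RBF kernel through `gamma`, which corresponds to \(1/(2\sigma^{2})\) in the notation used here.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic regression data (NOT the study data).
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=80)

sigma = 1.5                        # hypothetical kernel width
gamma = 1.0 / (2.0 * sigma ** 2)   # scikit-learn's equivalent parameter
svr = SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma=gamma).fit(X, y)


def rbf(a, b):
    """RBF kernel matrix between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


# dual_coef_ holds the differences (alpha_i - alpha_i*) for the support
# vectors, and intercept_ holds the bias term b.
X_new = rng.normal(size=(5, 3))
K = rbf(X_new, svr.support_vectors_)
manual = K @ svr.dual_coef_.ravel() + svr.intercept_[0]
library = svr.predict(X_new)
```

Only the support vectors enter the sum, which is why training points lying inside the \(\varepsilon\) margin have no influence on the fitted function.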
Gradient boosting regressor
This machine learning model employs the technique of "boosting" to generate predictions by combining an ensemble of weak prediction models, commonly decision trees as implemented in this study, in order to construct a more resilient and reliable model (Rao et al. 2019). The Gradient Boosting Regression (GBR) model with \(M\) trees can be represented mathematically as described in Eq. (5) (Otchere et al. 2022):
\[ {\hat{y}} = \sum _{m=1}^{M} \beta _m h_m(x) \tag{5} \]
where \({\hat{y}}\) is the predicted value, \(\beta _m\) is the weight or contribution of the mth tree, and \(h_m(x)\) is the prediction made by the mth tree for input feature vector \(x\).
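The additive structure can be checked numerically with scikit-learn's `GradientBoostingRegressor`, as sketched below on synthetic data. In that implementation, which is assumed here and may differ in detail from the authors' setup, every tree shares the same weight (the learning rate) and the sum starts from a constant initial prediction, so it is a special case of the weighted sum of trees described above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic regression data (NOT the study data).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = X[:, 0] ** 2 + X[:, 1] + 0.1 * rng.normal(size=120)

gbr = GradientBoostingRegressor(n_estimators=50, learning_rate=0.1,
                                max_depth=3, random_state=0).fit(X, y)

# Start from the constant initial prediction (the training mean for
# squared-error loss), then add each tree's learning-rate-scaled output.
manual = gbr.init_.predict(X).ravel().astype(float)
for tree in gbr.estimators_[:, 0]:
    manual += gbr.learning_rate * tree.predict(X)

library = gbr.predict(X)
```

Each tree is fitted to the residuals left by the trees before it, so later trees contribute progressively smaller corrections.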
Least absolute shrinkage and selection operator (LASSO)
This technique is a linear regression model that also serves as a variable selection method, effectively reducing the number of factors included in the final model (Hastie et al. 2009). The LASSO model can be written as Eqs. (6) and (7):
\[ \min _{\beta _0,\, \beta } \; \sum _{i=1}^{n} \Bigl( y_i - \beta _0 - \sum _{j=1}^{p} \beta _j x_{ij} \Bigr)^{2} \tag{6} \]
subject to the constraint:
\[ \sum _{j=1}^{p} | \beta _j | \le t \tag{7} \]
where \(\beta _0\) is the y-intercept or bias term, \(\beta _j\) represents the coefficients for the input features \(x_j\), \(p\) is the number of input features, \(x_j\) represents the j-th input feature (with \(x_{ij}\) its value for sample \(i\)), and \(t\) is the maximum allowed sum of the absolute values of the coefficients. LASSO requires the specification of a regularization parameter \(\alpha\), which controls the strength of the penalty term. To explore the impact of varying penalty strengths, multiple \(\alpha\) values were tested in this study, including 0, 0.1, 0.5, 1, and 10. Based on the evaluation metrics of Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared (R2), an \(\alpha\) value of 0.1 was chosen as it yielded the highest accuracy. The LASSO model was implemented using the scikit-learn library in the Python programming language.
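A sweep over penalty strengths of the kind described above might look like the following sketch. The data are synthetic and the split settings are assumptions. The value \(\alpha = 0\) is left out here because scikit-learn's documentation advises against using `Lasso` with `alpha=0` for numerical reasons; that case is equivalent to ordinary least squares.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the study data).
rng = np.random.default_rng(0)
y = np.repeat([1.0, 0.0], 112)
X = rng.normal(size=(224, 11))
X[:, :4] += 1.5 * y[:, None]
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

results = {}
for alpha in (0.1, 0.5, 1.0, 10.0):
    model = Lasso(alpha=alpha).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    results[alpha] = {
        "MSE": mean_squared_error(y_te, pred),
        "MAE": mean_absolute_error(y_te, pred),
        "R2": r2_score(y_te, pred),
        # Number of factors LASSO kept (non-zero coefficients).
        "n_nonzero": int(np.count_nonzero(model.coef_)),
    }

# Pick the penalty strength with the lowest test MSE.
best_alpha = min(results, key=lambda a: results[a]["MSE"])
```

As the penalty grows, more coefficients are driven exactly to zero; with a large enough \(\alpha\) every factor is dropped and the model predicts a constant, which is the variable selection behavior noted above taken to its extreme.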
Flood inventory map
The flood inventory map is an essential tool in the assessment of flood hazards, enabling the identification of areas vulnerable to potential flooding (Rahmati et al. 2016; Tehrany et al. 2014). In addition, increasing the number of flooded areas accurately marked on the inventory map improves the map’s efficacy in delineating the regions prone to flooding hazards (Tien Bui and Nhat-Duc 2017).
In this study, a comprehensive sample of 224 locations was selected within the designated study region. Among these locations, 112 were classified as flooded points, while the remaining sites were identified as non-flooded areas. The spatial distribution of these flooded and non-flooded points is visualized in Fig. 6.
Moreover, the utilization of both flooded and non-flooded points in machine learning approaches holds promise for training models that possess the capability to accurately forecast the incidence of floods or their impacts on diverse systems. For instance, a machine learning model could be trained to anticipate the timing and spatial extent of flood events based on input data that encompasses information from both flooded and non-flooded points. By incorporating such data, these models can enhance their predictive accuracy and contribute to more effective flood risk management strategies.
Variables selection
The flood environmental factors play a crucial role in the formulation of the flood hazard map. The selection of these factors was conducted through comprehensive investigations into the correlation between past flood occurrences and the localized geo-environmental characteristics (Costache 2019; Tehrany et al. 2015; Luu et al. 2018). In this research, 11 flood environmental variables were adopted as described in Figs. 8 and 9. The first chosen factor is elevation. An inverse relationship exists between elevation and floods, which serves as a fundamental factor in assessing flood vulnerability (Bui et al. 2016). Additionally, it has been observed that higher elevations experience relatively less severe impacts from flash floods (Khosravi et al. 2016). The second factor, slope, has a direct impact on the flow velocity of floodwaters, as it is intimately linked to the geomorphological process of flooding (Rahmati et al. 2016). Moreover, in the determination of flood risk zones, the proximity to river networks plays a crucial role (Osman and Das 2023).
Plan curvature refers to the curvature that occurs perpendicular to the direction of the steepest slope. It characterizes the convergence and divergence of flow across a surface. A positive value indicates that the surface is laterally convex at the respective cell, while a negative value suggests lateral concavity. A value of 0 indicates a flat surface. Profile curvature, by contrast, refers to the curvature along the direction of the steepest slope and indicates how the steepness of the terrain changes. It affects the velocity of flow across the ground surface. A negative value signifies that the surface of the cell is convex upwards, resulting in a delayed flow. Conversely, a positive value indicates that the cell’s surface is concave upwards, promoting increased flow. A value of zero suggests a linear terrain (see Fig. 7).
It has been observed that areas in close proximity to river networks experience the highest levels of flood hazard (Fernández and Lutz 2010). In terms of the distance to streams, streams serve as the primary channels for floodwaters, and areas in close proximity to streams are more vulnerable to flooding (Opperman et al. 2009). Regarding the distance to roads, the presence of man-made roads significantly influences flooding as they can impede the natural flow of water. Highways and urban development, in particular, have been observed to decrease the infiltration rate of a region, consequently increasing its susceptibility to flooding (Tehrany et al. 2019).
Land cover, on the other hand, plays a significant role in influencing the rates of runoff, infiltration, interception, and evaporation (Yalcin et al. 2011). Therefore, the land cover map serves as a critical factor in determining flood hazard (Komolafe et al. 2018). Furthermore, aspect, as a parameter, contributes to climatic characteristics such as the direction of rainfall and the intensity of sunshine, both of which have an impact on natural phenomena occurring on the Earth’s surface (Mohamed Wahba et al. 2023). Additionally, the Topographic Wetness Index (TWI) quantifies the relative wetness or moisture content of a landscape based on topographic characteristics. The TWI reflects the tendency of gravity to move water downslope within a watershed, indicating areas of higher moisture or wetness. It takes into account the relationship between slope and contributing area, providing valuable information about water flow and potential water accumulation in a given landscape (Mudashiru et al. 2022). The TWI is estimated using Eq. (8):
\[ \text{TWI} = \ln \left( \frac{{\varphi }_{s}}{\tan \alpha } \right) \tag{8} \]
where \({\varphi }_{s}\) is the accumulation of flow in a specific watershed region, and \(\alpha\) is the slope in degrees.
Likewise, the Stream Power Index (SPI) is a measure that assesses the erosive potential of surface runoff. It is commonly employed to identify effective soil conservation strategies aimed at mitigating the damage caused by excessive runoff. The SPI can be calculated using Eq. (9), as described by Burrough et al. (2015):
\[ \text{SPI} = {\varphi }_{s} \tan \alpha \tag{9} \]
with \({\varphi }_{s}\) and \(\alpha\) as defined for the TWI. Figures 8 and 9 show the 11 environmental factors.
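The two indices can be computed cell by cell from a flow accumulation raster and a slope raster. The sketch below assumes the common forms TWI = ln(flow accumulation / tan(slope)) and SPI = flow accumulation × tan(slope), with slope supplied in degrees; the small `eps` floor that guards against flat cells and zero accumulation is an assumption added for numerical safety, not something stated in the paper.

```python
import numpy as np


def twi(flow_acc, slope_deg, eps=1e-6):
    """Topographic Wetness Index: ln(flow accumulation / tan(slope))."""
    flow_acc = np.maximum(np.asarray(flow_acc, dtype=float), eps)
    tan_slope = np.maximum(np.tan(np.radians(slope_deg)), eps)
    return np.log(flow_acc / tan_slope)


def spi(flow_acc, slope_deg):
    """Stream Power Index: flow accumulation * tan(slope)."""
    flow_acc = np.asarray(flow_acc, dtype=float)
    return flow_acc * np.tan(np.radians(slope_deg))


# Tiny 2 x 2 example rasters (illustrative values only).
flow_acc = np.array([[10.0, 200.0], [50.0, 1000.0]])
slope_deg = np.array([[30.0, 5.0], [15.0, 1.0]])

twi_grid = twi(flow_acc, slope_deg)
spi_grid = spi(flow_acc, slope_deg)
```

Cells with large contributing flow and gentle slope receive high TWI values (water tends to accumulate), whereas cells with large flow and steep slope receive high SPI values (runoff has more erosive power).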
Results and discussion
Flood susceptibility map (FSM)
The FSM was generated employing four machine learning techniques. Subsequently, the output generated by each method was systematically categorized into five distinct classes, utilizing the equal interval tool available within the ArcMap software. Figure 10 illustrates the generated FSMs using the four adopted ML techniques.
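Outside ArcMap, an equal-interval classification into five classes can be approximated as in the sketch below. The class labels follow the five hazard levels named in the paper, but the function itself and the convention that a value lying exactly on a break belongs to the lower class are assumptions.

```python
import numpy as np

LABELS = ["very low", "low", "moderate", "high", "very high"]


def equal_interval_classes(values, n_classes=5):
    """Split the value range into n_classes bins of equal width.

    Returns an integer class index per value (0 = lowest class).
    """
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_classes + 1)
    # Use interior breaks only, so the maximum falls in the top class.
    return np.digitize(values, edges[1:-1], right=True)


# Illustrative susceptibility scores for six map cells.
scores = np.array([0.0, 0.1, 0.3, 0.5, 0.7, 1.0])
classes = equal_interval_classes(scores)
labels = [LABELS[c] for c in classes]

# Share of cells per hazard class, in percent.
share_pct = 100.0 * np.bincount(classes, minlength=5) / classes.size
```

Because the break values depend on each map's own minimum and maximum, the same score can land in different classes for different models, which should be kept in mind when comparing class shares across models.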
In relation to the FSMs generated through ANN-MLP, SVR, and GBR methods, the susceptibility classification indicates moderate and low levels of flood susceptibility in the northern region of Ibaraki Prefecture. This area is characterized by high altitude and extensive tree coverage. Furthermore, the vegetation in the north plays a significant role in reducing flood susceptibility, as it contributes to high infiltration rates, which in turn diminish surface runoff volumes. Conversely, the southern urban area exhibits a significant vulnerability to flooding. This observation highlights the critical influence of urbanization on flood susceptibility, suggesting that the transformation of substantial expanses of permeable land into impervious surfaces is a key factor contributing to the region’s heightened flood risk. Furthermore, the FSM generated by the LASSO model indicates that the majority of Ibaraki Prefecture is categorized as having high to very high flood susceptibility, with the exception of a small northern area that demonstrates moderate to low levels of flood susceptibility. Figure 11 illustrates the relative distribution of each hazard class within the employed machine learning approaches.
Statistical analysis
The performance of an ML model can be assessed using several statistical metrics that quantify its predictive skill. Table 2 reports the Mean Square Error (MSE), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R-squared for the employed machine learning approaches. Notably, the ANN-MLP reg method achieves the best score on every metric except MAE, for which the GBR method yields superior results.
In addition, a residual analysis was conducted on the four adopted models. Figure 12 illustrates the residual distribution (Gaussian distribution) for the ML models. The Gaussian distribution is described by Eq. (10):
\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \qquad (10) \]
where \(\mu\) represents the mean of the distribution, and \(\sigma\) represents the standard deviation. The mean specifies the center of the distribution, while the standard deviation controls the spread or dispersion of the distribution. The square of the standard deviation, \(\sigma ^2\), is known as the variance.
Following the evaluation of the residual distribution, it has been noted that the ANN-MLP reg, SVR, and GBR models exhibit a tendency towards symmetry around their mean values and demonstrate a predominantly normal distribution. Noteworthy is the fact that among these models, the ANN-MLP reg model stands out for its ability to present the most precise normal distribution, evident in its remarkably low standard deviation. This finding substantiates the superior performance of the ANN-MLP reg model relative to the other models employed in the study. Conversely, the LASSO model demonstrates a non-uniform distribution, which serves as an indication of its diminished accuracy in prediction.
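The residual analysis above can be illustrated with a short sketch that computes the residuals, their mean \(\mu\) and standard deviation \(\sigma\), and the Gaussian density with those parameters. The function names and sample values are assumptions for illustration, and the use of the population standard deviation (dividing by n) is also an assumption, since the study does not state which estimator was used.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def residual_summary(y_true, y_pred):
    """Return the residuals with their mean and population standard deviation."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mu = sum(residuals) / n
    # Population standard deviation (divides by n); an assumption of this sketch.
    sigma = math.sqrt(sum((r - mu) ** 2 for r in residuals) / n)
    return residuals, mu, sigma


# Illustrative values only, not the study's data.
residuals, mu, sigma = residual_summary([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.1])
peak_density = gaussian_pdf(mu, mu, sigma)  # density at the centre of the curve
```

A smaller \(\sigma\) gives a taller, narrower curve, which is why a low residual standard deviation is read as a sign of more precise predictions.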
Variable importance
Variable importance measures how much each input variable contributes to the predicted flood hazard in the FSM. Analysing it identifies the variables that have the greatest influence on the predicted hazard levels. In this study, the significance of each environmental factor was assessed for each of the employed machine learning techniques. Figure 13 shows the relative importance percentage attributed to each environmental factor in the generation of the FSMs.
The analysis reveals the prominent role of the elevation factor in the generation of the FSMs across all utilized machine learning techniques. Elevation consistently exhibits a substantial influence, constituting approximately 60% of the overall importance percentage within each ML approach. Notably, the GBR method assigns even greater significance to elevation, which accounts for slightly over 80% of the total importance in that model. The second most influential factor in the generation of the FSMs is slope, which contributes approximately 10% of the overall importance. In two of the models, however, the distance to the river factor plays a more significant role, accounting for around 30% of the importance in the LASSO approach and approximately 15% in the SVR approach.
By identifying the most important variables, planners and decision-makers can prioritize interventions and mitigation measures in flood-prone areas. This knowledge helps allocate resources effectively and develop targeted strategies to reduce the impact of floods on vulnerable communities.
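The study does not state how the importance percentages were derived. One model-agnostic option is permutation importance, sketched below under that assumption: each feature column is shuffled in turn, and the resulting increase in prediction error is normalized to a percentage. The function names, the toy model `toy_predict`, and the synthetic data are all hypothetical.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, n_repeats=20, seed=0):
    """Return one importance percentage per feature column."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(r) for r in rows])
    n_features = len(rows[0])
    raw = []
    for j in range(n_features):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [r[:j] + [c] + r[j + 1:] for r, c in zip(rows, column)]
            error = mean_squared_error(y, [predict(r) for r in shuffled])
            increases.append(error - baseline)
        # A negative mean increase is treated as zero importance.
        raw.append(max(sum(increases) / n_repeats, 0.0))
    total = sum(raw)
    if total == 0:
        return [0.0] * n_features
    return [100.0 * v / total for v in raw]


# Hypothetical stand-in model: depends strongly on feature 0, weakly on
# feature 1, and not at all on feature 2.
def toy_predict(row):
    return 0.8 * row[0] + 0.2 * row[1]


data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = [toy_predict(r) for r in rows]
importance = permutation_importance(toy_predict, rows, y)
```

Because the percentages are normalized to sum to 100, they describe relative influence within one model and are not directly comparable in absolute terms across models.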
Models validation
The employed models have undergone validation using the Receiver Operating Characteristic (ROC) curve method. Figure 14 illustrates the ROC curves for all the adopted methods. The area under the curve (AUC) has been estimated and the results are presented in Table 3. It is noteworthy that the ANN-MLP reg and SVR models exhibit higher AUC values, indicating their superior accuracy in predicting the FSMs. On the other hand, the LASSO model exhibits the lowest AUC, suggesting comparatively lower accuracy.
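As an illustration of the validation metric, the AUC equals the probability that a randomly chosen flooded point receives a higher score than a randomly chosen non-flooded point, with ties counted as one half. The sketch below computes it directly from that definition; the function name `roc_auc` and the sample labels and scores are assumptions for the example.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Illustrative values: 1 = flooded, 0 = non-flooded.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ranked correctly
```

The pairwise form is quadratic in the number of samples, which is acceptable for a test set of a few dozen points; a rank-based computation is preferable for large datasets.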
After excluding the plan and profile curvatures, the ANN-MLP reg model attains a higher AUC of 96.7%. In contrast, the SVR, GBR, and LASSO models attain lower AUC values of 95.74%, 91.05%, and 92.53%, respectively. These results suggest that including the plan and profile curvatures has a negative effect on the accuracy of the FSM generated by ANN-MLP reg, whereas incorporating these factors into the other three models can improve the accuracy of the FSM. Figure 15 shows the ROC curve for the ANN-MLP reg model after excluding the plan and profile curvature factors.
Excluding the curvature factor
An alternative version of the FSM was developed using the ANN-MLP reg model after excluding the plan and profile curvatures from the set of environmental factors. This additional step aimed to investigate the impact of including or excluding these specific factors on the performance and accuracy of the FSMs. Figure 16 shows the FSM produced by the ANN-MLP reg algorithm after excluding the plan and profile curvatures.
The results indicated that the FSM can be developed without the necessity for estimating the plan and profile curvature when employing ANN-MLP, which additionally enhances the precision of the predicted map.
Mitigation measures
Mitigation measures for flash floods comprise strategies and actions aimed at reducing the likelihood and impact of flash floods. These measures can be implemented at the individual, community, and governmental levels. Effective mitigation measures include the following:
(1) Enhance early warning systems to deliver prompt alerts to inhabitants of flood-vulnerable zones. In Ibaraki prefecture, implementing and refining these systems is of paramount importance, particularly in the southern regions and in areas close to rivers. Such systems can combine weather forecasting, rainfall monitoring, and river level gauges to identify potential flash floods and disseminate timely warnings to the populace.
(2) Formulate and rigorously enforce zoning regulations that restrict construction within flood-prone regions, with emphasis on avoiding building in vulnerable areas such as low-lying terrain, riverbanks, or steep slopes that could exacerbate the risks associated with flash floods.
(3) Build retention and detention basins to capture excess rainfall and release it slowly, reducing the intensity of flash floods downstream.
(4) Construct flood control structures, such as levees, embankments, and flood walls, to protect communities from flash floods and redirect water away from vulnerable areas.
(5) Preserve and promote the growth of forests and other natural vegetation, as they help absorb rainfall, reduce surface runoff, and stabilize soil, thus mitigating flash flood impacts.
(6) Implement sustainable stormwater management practices, such as green roofs, rain gardens, and permeable pavements, to reduce surface runoff and allow water to infiltrate into the ground.
(7) Elevate critical infrastructure and buildings in flood-prone areas above potential flood levels to minimize damage during flash floods.
Conclusion
This research evaluates the performance of four machine learning (ML) approaches in generating flood susceptibility maps (FSMs) and investigates the impact of excluding plan and profile curvature factors on the accuracy of the generated maps in Ibaraki prefecture, Japan. The four selected models, namely ANN-MLP reg, SVR, GBR, and LASSO, were trained using 70% of the dataset, which consisted of 112 flooded and 112 non-flooded spots, along with 11 environmental factors. The predictive capability of the models was assessed using the remaining 30% of the input data and validated using the ROC curve method. The results revealed that both the ANN-MLP reg and SVR models achieved high accuracy, with area under the curve (AUC) values of 95.23% and 95.83%, respectively.
Interestingly, upon excluding the plan and profile curvature factors, the accuracy of the ANN-MLP reg model improved, reaching an AUC of 96.7%. Additionally, the generated FSMs were categorized into five hazard levels. The northern region of the maps exhibited predominantly very low and low hazard levels. In contrast, areas in the southern region and closer to main streams demonstrated considerably higher hazard levels, categorized as high and very high. The generated FSMs can provide valuable insights into regions that exhibit higher susceptibility to flash floods. These insights can be presented to decision-makers, enabling them to deliberate and adopt appropriate protective measures.
Moreover, it is highly recommended to implement mitigation measures within the high and very high hazard classifications to align with the social, environmental, and economic goals set forth by the sustainable development pillars.
Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
Abu Reza Md, Islam T, Talukdar S, Mahato S, Kundu S, Eibek KU, Pham QB, Kuriqi A, Linh NTT (2021) Flood susceptibility modelling using advanced ensemble machine learning models. Geosci Front 12(3):101075
Alfieri L, Bisselink B, Dottori F, Gustavo Naumann A, de Roo P, Salamon KW, Feyen L (2017) Global projections of river flood risk in a warmer world. Earth’s Future 5(2):171–182
Bishop CM et al (1995) Neural networks for pattern recognition. Oxford University Press, Oxford
Bui DT, Pradhan B, Nampak H, Bui Q-T, Tran Q-A, Nguyen Q-P (2016) Hybrid artificial intelligence approach based on neural fuzzy inference model and metaheuristic optimization for flood susceptibilitgy modeling in a high-frequency tropical cyclone area using gis. J Hydrol 540:317–330
Burrough PA, McDonnell RA, Lloyd CD (2015) Principles of geographical information systems. Oxford University Press, Oxford
Costache R (2019) Flood susceptibility assessment by using bivariate statistics and machine learning models-a useful tool for flood risk management. Water Resour Manag 33(9):3239–3256
Costache R, Pham QB, Arabameri A, Diaconu DC, Costache I, Crăciun A, Ciobotaru N, Pandey M, Arora A, Ali SA et al (2022) Flash-flood propagation susceptibility estimation using weights of evidence and their novel ensembles with multicriteria decision making and machine learning. Geocarto Int 37(25):8361–8393
Darabi H, Choubin B, Rahmati O, Haghighi AT, Pradhan B, Kløve B (2019) Urban flood risk mapping using the garp and quest models: a comparative study of machine learning techniques. J Hydrol 569:142–154
Elsadek Wael M, Mohamed W, Nassir A-A, Shinjiro K, Mustafa E-R (2023) Scrutinizing the performance of gis-based analytical hierarchical process approach and frequency ratio model in flood prediction-case study of Kakegawa, Japan. Ain Shams Eng J 15:102453
Fernández DS, Lutz MA (2010) Urban flood hazard zoning in tucumán province, Argentina, using gis and multicriteria decision analysis. Eng Geol 111(1–4):90–98
Gaagai A, Aouissi HA, Bencedira S, Hinge G, Athamena A, Heddam S, Gad M, Elsherbiny O, Elsayed S, Eid MH et al (2023) Application of water quality indices, machine learning approaches, and gis to identify groundwater quality for irrigation purposes: a case study of Sahara aquifer, Doucen plain, Algeria. Water 15(2):289
Giovannettone J, Copenhaver T, Burns M, Choquette S (2018) A statistical approach to mapping flood susceptibility in the lower Connecticut river valley region. Water Resour Res 54(10):7603–7618
Gomez H, Kavzoglu T (2005) Assessment of shallow landslide susceptibility using artificial neural networks in Jabonosa river basin, Venezuela. Eng Geol 78(1–2):11–27
Gotham KF, Campanella R, Lauve-Moon K, Powers B (2018) Hazard experience, geophysical vulnerability, and flood risk perceptions in a postdisaster city, the case of new Orleans. Risk Anal 38(2):345–356
Haoyuan H, Mahdi P, Ataollah S, Tianwu M, Junzhi L, Zhu A-X, Chen W, Kougias I, Kazakis N (2018) Flood susceptibility assessment in hengfeng area coupling adaptive neuro-fuzzy inference system with genetic algorithm and differential evolution. Sci Total Environ 621:1124–1141
Hastie T, Tibshirani R, Friedman JH (2009) The elements of statistical learning: data mining, inference, and prediction, vol 2. Springer, Berlin
Huaizhi S, Li X, Yang B, Wen Z (2018) Wavelet support vector machine-based prediction model of dam deformation. Mech Syst Signal Process 110:412–427
Huang Y, Zhang J, Ann FT, Ma G (2020) Intelligent mixture design of steel fibre reinforced concrete using a support vector regression and firefly algorithm based multi-objective optimization model. Constr Build Mater 260:120457
Janizadeh S, Pal SC, Saha A, Chowdhuri I, Ahmadi K, Mirzaei S, Mosavi AH, Tiefenbacher JP (2021) Mapping the spatial and temporal variability of flood hazard affected by climate and land-use changes in the future. J Environ Manag 298:113551
Kepeng X, Fang J, Fang Y, Sun Q, Chengbo W, Liu M (2021) The importance of digital elevation model selection in flood simulation and a proposed method to reduce dem errors: a case study in shanghai. Int J Disaster Risk Sci 12:890–902
Khosravi K, Nohani E, Maroufinia E, Pourghasemi HR (2016) A gis-based flood susceptibility assessment and its mapping in Iran: a comparison between frequency ratio and weights-of-evidence bivariate statistical models with multi-criteria decision-making technique. Nat Hazards 83:947–987
Komolafe AA, Herath S, Avtar R (2018) Methodology to assess potential flood damages in urban areas under the influence of climate change. Nat Hazards Rev 19(2):05018001
Kourgialas NN, Karatzas GP (2017) A national scale flood hazard mapping methodology: the case of Greece-protection and adaptation policy approaches. Sci Total Environ 601:441–452
Linh NTT, Pandey M, Janizadeh S, Bhunia GS, Norouzi A, Ali S, Pham QB, Anh DT, Ahmadi K (2022) Flood susceptibility modeling based on new hybrid intelligence model: optimization of xgboost model using ga metaheuristic algorithm. Adv Space Res 69(9):3301–3318
Luu C, Von Meding J, Kanjanabootra S (2018) Assessing flood hazard using flood marks and analytic hierarchy process approach: a case study for the 2013 flood event in quang nam, vietnam. Nat Hazards 90:1031–1050
Malik S, Pal SC (2021) Potential flood frequency analysis and susceptibility mapping using cmip5 of miroc5 and hec-ras model: a case study of lower Dwarkeswar river, eastern India. SN Appl Sci 3:1–22
Mohamed Wahba H, Hassan S, Elsadek WM, Kanae S, Sharaan M (2023) Novel utilization of simulated runoff as causative parameter to predict the hazard of flash floods. Environ Earth Sci 82(13):333
Mojaddadi H, Pradhan B, Nampak H, Ahmad N, Ghazali AH (2017) Ensemble machine-learning-based geospatial approach for flood risk assessment using multi-sensor remote-sensing data and gis. Geomat Nat Hazards Risk 8(2):1080–1102
Mudashiru RB, Sabtu N, Abdullah R, Saleh A, Abustan I (2022) Optimality of flood influencing factors for flood hazard mapping: an evaluation of two multi-criteria decision-making methods. J Hydrol 612:128055
Murtagh F (1991) Multilayer perceptrons for classification and regression. Neurocomputing 2(5–6):183–197
Okazaki Y, Yoshida S, Kashima S, Koike S, Matsumoto M (2022) Impact of the 2018 japan floods on prescriptions for migraine: a longitudinal analysis using the national database of health insurance claims. Headache J Head Face Pain 62(6):657–667
Opperman JJ, Galloway GE, Fargione J, Mount JF, Richter BD, Secchi S (2009) Sustainable floodplains through large-scale reconnection to rivers. Science 326(5959):1487–1488
Osman SA, Das J (2023) Gis-based flood risk assessment using multi-criteria decision analysis of Shebelle river basin in southern Somalia. SN Appl Sci 5(5):134
Otchere DA, Ganat TOA, Ojero JO, Tackie-Otoo BN, Taki MY (2022) Application of gradient boosting regression model for the evaluation of feature selection techniques in improving reservoir characterisation predictions. J Petrol Sci Eng 208:109244
Pal SC, Chowdhuri I, Das B, Chakrabortty R, Roy P, Saha A, Shit M (2022) Threats of climate change and land use patterns enhance the susceptibility of future floods in India. J Environ Manag 305:114317
Panahi M, Dodangeh E, Rezaie F, Khosravi K, Van Le H, Lee M-J, Lee S, Pham BT (2021) Flood spatial prediction modeling using a hybrid of meta-optimization and support vector regression modeling. Catena 199:105114
Pandey M, Arora A, Arabameri A, Costache R, Kumar N, Mishra VN, Nguyen H, Mishra J, Siddiqui MA, Ray Y et al (2021) Flood susceptibility modeling in a subtropical humid low-relief alluvial plain environment: application of novel ensemble machine learning approach. Front Earth Sci 9:659296
Rahmati O, Pourghasemi HR, Zeinivand H (2016) Flood susceptibility mapping using frequency ratio and weights-of-evidence models in the Golastan province, Iran. Geocarto Int 31(1):42–70
Rao H, Shi X, Rodrigue AK, Feng J, Xia Y, Elhoseny M, Yuan X, Lichuan G (2019) Feature selection based on artificial bee colony and gradient boosting decision tree. Appl Soft Comput 74:634–642
Rezaie F, Panahi M, Bateni SM, Jun C, Neale CMU, Lee S (2022) Novel hybrid models by coupling support vector regression (svr) with meta-heuristic algorithms (woa and gwo) for flood susceptibility mapping. Nat Hazards 114(2):1247–1283
Roozbeh HN, Tuan N (2018) Predictive applications of Australian flood loss models after a temporal and spatial transfer. Geomat Nat Hazards Risk 9(1):416–430
Talukdar S, Ghose B, Shahfahad RS, Mahato S, Pham QB, Linh NTT, Costache R, Avand M (2020) Flood susceptibility modeling in Teesta river basin, Bangladesh using novel ensembles of bagging algorithms. Stoch Environ Res Risk Assess 34:2277–2300
Taravat A, Rajaei M, Emadodin I, Hasheminejad H, Mousavian R, Biniyaz E (2016) A spaceborne multisensory, multitemporal approach to monitor water level and storage variations of lakes. Water 8(11):478
Tehrany MS, Pradhan B, Jebur MN (2014) Flood susceptibility mapping using a novel ensemble weights-of-evidence and support vector machine models in gis. J Hydrol 512:332–343
Tehrany MS, Pradhan B, Mansor S, Ahmad N (2015) Flood susceptibility assessment using gis-based support vector machine model with different kernel types. Catena 125:91–101
Tehrany MS, Jones S, Shabani F (2019) Identifying the essential flood conditioning factors for flood prone area mapping using machine learning techniques. Catena 175:174–192
Tien Bui D, Nhat-Duc H (2017) A bayesian framework based on a gaussian mixture model and radial-basis-function fisher discriminant analysis (baygmmkda v1. 1) for spatial prediction of floods. Geosci Model Dev 10(9):3391–3409
Vaghefi SA, Keykhai M, Jahanbakhshi F, Sheikholeslami J, Ahmadi A, Yang H, Abbaspour KC (2019) The future of extreme climate in Iran. Sci Rep 9(1):1464
Vapnik V (1999) The nature of statistical learning theory. Springer science & business media, Berlin
Wahba M, Mahmoud H, Elsadek WM, Shinjiro Kanae H, Hassan S (2022) Alleviation approach for flash flood risk reduction in urban dwellings: a case study of fifth district, Egypt. Urban Clim 42:101130
Wahba M, El-Rawy M, Al-Arifi N (2024a) Integrating geographic information systems and hydrometric analysis for assessing and mitigating building vulnerability to flash flood risks. Water 16(3):434
Wahba M, Sharaan M, Elsadek WM, Kanae S, Shokry HH (2024b) Building information modeling integrated with environmental flood hazard to assess the building vulnerability to flash floods. Stoch Environ Res Risk Assess 1–21. https://doi.org/10.1007/s00477-023-02640-9
Wu Y, Zhang Z, Crabbe M, James C, Chandra DL (2022) Statistical learning-based spatial downscaling models for precipitation distribution. Adv Meteorol 2022:3140872
Yalcin A, Selçuk Reis AC, Aydinoglu TY (2011) A gis-based comparative study of frequency ratio, analytical hierarchy process, bivariate statistics and logistics regression methods for landslide susceptibility mapping in trabzon, ne turkey. Catena 85(3):274–287
Youssef AM, Pradhan B, Sefry SA (2016) Flash flood susceptibility assessment in Jeddah city (Kingdom of Saudi Arabia) using bivariate and multivariate statistical models. Environ Earth Sci 75(1):12
Acknowledgements
The primary author would like to express gratitude to the Egyptian Ministry of Higher Education (MoHE) for their support in the form of a PhD fellowship. Additionally, appreciation is extended to E-JUST for generously supplying the essential equipment and software required for the successful execution of this study. Their invaluable contributions have significantly enriched the research endeavor and are greatly acknowledged.
Funding
Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB). This research has been funded by the Egyptian Ministry of Higher Education (MoHE).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wahba, M., Sharaan, M., Elsadek, W.M. et al. Examination of the efficacy of machine learning approaches in the generation of flood susceptibility maps. Environ Earth Sci 83, 429 (2024). https://doi.org/10.1007/s12665-024-11696-x