1 Introduction

The threat that weather- and climate-related natural disasters pose to society is growing and leads to massive loss of life and property (Glade and Nadim 2014). From 1970 to 2019, climate disasters accounted for 50% of all recorded disasters, 74% of all reported economic damages and 45% of all reported deaths; more than 91% of these deaths and injuries occurred in developing countries, as defined by the United Nations Country Classification (Mandyam et al. 2022).

Flash floods are common in arid and semiarid regions during rain events because of the scarcity of vegetation and the low rate of infiltration in mountainous regions. Flash floods are a complex phenomenon that is influenced mainly by regional geology and the morphometric characteristics of the watershed (Subyani and Al-Dakheel 2009). Other important factors that influence flash floods are rainfall intensity and duration, surface runoff, evaporation and infiltration rates (Nouh 2006). Flash floods have become a significant concern in urban areas worldwide. This is primarily due to increasing population pressure around floodplains, which drives urbanization, reduces the space available for infiltration and exacerbates surface runoff, often exceeding the water-bearing capacity of the catchment (Zaidi 2012). Even in very dry regions like the Arabian Peninsula, flash floods have caused a large number of fatalities (Youssef et al. 2016), billions of dollars of business losses, damage to facilities (Al Saud 2010) and unnominated impacts (Kumar 2013). As a result, many countries have implemented early warning systems aiming to minimize the impact of natural hazards on human life and assets (Billa et al. 2004, 2011; Villagrán de León et al. 2006; Pengel et al. 2013).

Due to its multifaceted nature, the problem of warning communities of impending disasters quickly becomes complex. The main difficulty is having enough time to predict the impact of a flood and to estimate the affected area through accurate maps before the flood occurs. This is followed by reporting the prediction to the appropriate authorities, warning affected communities and evacuating those communities (Basha and Rus 2007). Thus, understanding flood risk is essential for controlling its socioeconomic and environmental effects.

The purpose of flood inundation maps is to show, at a certain area, the impact of a certain rainfall storm event. These maps support the decision-makers to minimize damages to flooded areas and assess the risks. Models of developing flood hazard maps could be classified into three types: empirical methods, hydrodynamic models and simplified conceptual models (McGrath et al. 2018). Empirical methods can only provide guidance for real-time monitoring and post-event evaluation (Liu et al. 2018). Hydrodynamic models, such as HEC-RAS, deduce inundation maps by solving one-dimensional or two-dimensional equations. These hydrodynamic models take a long running time for specific flood events to deduce high-resolution flood extent maps based on a mesh cell size and a simulation time step (Alipour et al. 2022). On the contrary, simplified conceptual models can generate flood inundation maps in a relatively shorter time (Coulthard et al. 2013).

Many researchers have worked on simplified methods to generate flood maps. Leopold and Maddock (1953) presented quantitative equations that determine natural stream properties such as depth, width, velocity and suspended load from the discharge using power functions. In this approach, the discharge at a certain location is considered as a function of its upstream catchment area.

One of the main concerns of simplified models is to determine the starting point of a stream. There are two widely used approaches to tackle this issue: Tarboton et al. (1991) recommended to consider the channel from the location where the flow accumulation is higher than a specific threshold; while Giannoni et al. (2005) developed a newer method to specify the first point of the channel where the quantity \(AS^{k}\) (A = contributing area, S = local slope, k = drainage basin-related exponent) is higher than a specific threshold.

Various methods were developed based on simplified concepts to generate flood plain maps for areas with lack of high-resolution digital elevation model (DEM) or rain data. One of these methods is generating flood-prone maps using classifiers by developing low-cost geomorphic flood plain algorithms. Many researchers worked to develop methods to define hydrogeomorphic stream networks to simplify the algorithm for generating flood plain maps based on catchment characteristics for a specific flood event (Nardi et al. 2006). Later, Nardi et al. (2008) extended the outcomes and checked the influence of generating floodplain maps in flat areas and tested various treatment methods including DEM treatment methods. Manfreda et al. (2011) developed a simple procedure for detecting areas at risk using only data of catchment topography (DEM) which are globally available. The data contained in a DEM could be summarized by a topographic index which is a good descriptor of the inundation maps, noting that the used DEMs describe the ground surface without taking into account the presence of manmade structures.

Subsequently, Degiorgis et al. (2012) developed a classifier which depends on two main variables chosen among five available ones which could be deduced from the utilized DEM. The first selected variable is the length of the stream that hydrologically links the study element to the nearest element of the drainage network, and the second selected variable is the difference in elevation between the cell under investigation and the end point of the same drainage network.

Different flood descriptors and classifiers were developed and evaluated in terms of their suitability to generate flood plain maps based on DEM and catchment characteristics (Manfreda et al. 2015). After testing many areas, Samela et al. (2016) found that the Geomorphic Flood Index (GFI) is the best classifier for defining flood extent across the utilized DEM. The inundation maps generated by the GFI were compared with FEMA flood plain maps, and it was found that, to obtain the optimum threshold, the classifier should first be calibrated against a reference flood map, obtained from a hydrodynamic model or from a regulatory authority, covering more than 2% of the study area (Samela et al. 2017). A QGIS tool called the Geomorphic Flood Area tool (GFA tool) was developed based on the GFI classifier to detect flood-prone areas at the continental scale while reducing computational time and costs, opening up new possibilities for flood risk assessment and management at large scales (Samela et al. 2018). The method was later improved to also predict flood depth, which helps estimate how much damage a flood causes (Manfreda and Samela 2019). According to Tavares da Costa et al. (2020), optimal GFI thresholds were also positively correlated with flood extents associated with specific return periods. The developed tool provides preliminary hydrologic and hydraulic modeling in regions where there is an obvious shortage of rainfall data.

In this paper, we investigate the use of the GFA tool in arid regions where flooding is rare and thus not carving the terrain. Further, we investigate the use of high- and low-resolution terrains, and test the tool in different locations with different topographic characteristics. Thus, the main objectives of this study can be summarized in the following:

  • Evaluate the flood inundation maps generated using the GFA tool in arid regions by comparing them with those generated using hydrodynamic model results.

  • Evaluate the effect of changing GFI parameters and the impact of various rain depths at different return periods and DEM resolutions on the output inundation maps in arid regions.

  • Enhance flood inundation maps generated using GFA tool by evaluating different calibration criteria / methods.

  • Develop relationships between the output threshold and the rainfall depth to help rapidly obtain a flood plain map for an uncalibrated rain event, which could be used in early warning systems.

The paper is organized as follows. The current introduction and literature review are followed by a detailed methodology (Sect. 2) as well as the performance and evaluation criteria. Section 3 presents the study areas. The paper results and their discussion are provided in Sect. 4 and followed by the main conclusions and recommendations for future research (Sect. 5).

2 Methodology

The current section describes the methodology used to generate accurate inundation maps based on the flood descriptor GFI. The parameters of the GFI are calibrated against flood maps generated using the two-dimensional hydrodynamic HEC-RAS model. The main investigated parameter is the threshold value of the GFI, which determines whether a certain location is prone to flooding for a specific rain event. In this paper, we extend the capability of the GFA tool to map uncalibrated rainfall events by establishing relationships that predict the threshold values based on DEM characteristics of the catchment. Moreover, the GFA tool was tested on different digital elevation models to check the impact of DEM resolution on the output flood plain maps. Figure 1 describes how the hydrogeomorphic models and the linear binary classifier GFI can be used to generate floodplain maps. Hereafter, we present the main components of each model/loop and how we checked the output for uncalibrated rainfall events.

Fig. 1 Process of the geomorphic model

2.1 Development of a hydrodynamic model using HEC-RAS

HEC-RAS, the River Analysis System developed by the Hydrologic Engineering Center, is freely available and commonly used software that supports modeling from 1D steady flow to 2D unsteady flow, dam break analysis and sediment transport, as well as the generation of floodplain maps.

In this study, two flow input options were utilized: (1) inflows as the upstream boundary condition for hydraulic simulation of part of the watercourse and (2) precipitation on the entire catchment area within the model area. In our models, the shallow water equations (SWE), based on the Saint Venant equations, were used (Horváth et al. 2015). The solution uses a finite-volume algorithm for more stability and accuracy than traditional finite-difference or finite-element algorithms. The 2D simulation algorithm starts with an initial depth and then calculates the movement of water between cells based on the starting depth and the cell face rating curve. After that, the new cell volume is calculated based on the flux over the computational time step. Finally, the depth in each cell is extracted from the volume-depth rating curves. This computational algorithm is robust and allows 2D cells to wet and dry. Two-dimensional flow areas can start completely dry and can handle a sudden rush of water into them. The spread of water over a surface depends on the roughness of the surface. Manning’s n value is used to define the roughness of the main watercourse and its floodplain or of the entire catchment. HEC-RAS outputs are used as the benchmark flood maps.

2.2 Simplified model: the GFA tool

The GFA is a tool that utilizes the GFI to generate inundation maps based on geomorphic indices extracted from terrains and existing inundation maps obtained for a portion of the area of interest. This is accomplished by subdividing the study area into small cells utilizing the DEM and then categorizing each cell inside the area of interest into one of two groups: flooded areas and non-flooded areas (Samela et al. 2017).

The GFA tool requires geomorphic data to calculate the GFI to derive floodplain maps. The geomorphic data could be extracted from terrain. The required input files are:

  • DEM with voids and/or local low points (sinks): the original digital elevation model from any source, either SRTM, ALOS, LIDAR or contour map information.

  • DEM fill: the raster of DEM is filled using a suitable tool (ArcGIS, QGIS, etc.) by removing local sinks and peaks in the data, which are spurious cells with elevations greater or lower than what would be expected.

  • Flow direction: the raster of flow directions from each pixel to its steepest downslope neighbor. Using the D8 direction method by comparing the elevation relative to 8 (eight) surrounding cells, flow directions are determined based on the direction of the steepest slope.

  • Flow accumulation: the raster of accumulated flow into each pixel, as determined by accumulating the number of all pixels that flow into each downslope cell.

  • Part of the existing floodplain map: to calibrate the GFI classifier and produce maps for the sub-scheme, a reference flood map for a minimum of 2% of the domain is used to estimate floodplain maps for the remaining areas (Samela et al. 2017). The reference floodplain map could be obtained from FEMA, NOAA or USGS in the USA, from other authorities that have already developed floodplain maps, or from any hydrodynamic model (HEC-RAS, InfoWorks ICM, Urban flow), as is the case in this research.
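The flow direction and flow accumulation inputs listed above can be illustrated with a minimal pure-Python sketch. It assumes a small, already filled DEM stored as a list of rows; the function names and the sample elevations are hypothetical and are not part of the GFA tool, which in practice receives these rasters from GIS software.

```python
# Offsets (d_row, d_col) to the 8 neighbours used by the D8 method.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]


def d8_direction(dem, row, col, cell_size=1.0):
    """Return the offset of the steepest downslope neighbour,
    or None when no neighbour is lower (sink or flat cell)."""
    best_offset, best_slope = None, 0.0
    for d_row, d_col in NEIGHBOURS:
        r, c = row + d_row, col + d_col
        if r not in range(len(dem)) or c not in range(len(dem[0])):
            continue  # neighbour lies outside the raster
        # Diagonal neighbours are sqrt(2) cell sizes away.
        distance = cell_size * (2 ** 0.5 if d_row and d_col else 1.0)
        slope = (dem[row][col] - dem[r][c]) / distance
        if slope > best_slope:
            best_offset, best_slope = (d_row, d_col), slope
    return best_offset


def flow_accumulation(dem, cell_size=1.0):
    """Count, for every cell, the number of upstream cells draining into it."""
    rows, cols = len(dem), len(dem[0])
    acc = [[0] * cols for _ in range(rows)]
    # Visit cells from highest to lowest so that each cell's count is
    # final before it is passed on to its downslope neighbour.
    cells = sorted(((dem[r][c], r, c) for r in range(rows) for c in range(cols)),
                   reverse=True)
    for _, r, c in cells:
        direction = d8_direction(dem, r, c, cell_size)
        if direction is not None:
            acc[r + direction[0]][c + direction[1]] += acc[r][c] + 1
    return acc


# Hypothetical 3 x 3 filled DEM (elevations in m), draining to the lower-right cell.
dem = [[10.0, 9.0, 8.0],
       [9.0, 7.0, 6.0],
       [8.0, 6.0, 3.0]]
centre_direction = d8_direction(dem, 1, 1)
acc = flow_accumulation(dem)
```

In this example the outlet cell collects all eight other cells. Multiplying the accumulated cell count by the cell area gives the contributing area used later in the GFI.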

2.3 Preprocessing steps

After preparing the input data, the next step in the GFA Tool is to select one of the following methods to identify the starting point of each stream inside the study area.

  • \(AS^{k}\) method: the channel starts from the location where the quantity \(AS^{k}\) (A = contributing area, S = local slope, k = drainage basin-related exponent) exceeds a specific threshold (Giannoni et al. 2005).

  • Flow accumulation method: the channel starts from the location where the flow accumulation exceeds a specific threshold (Tarboton et al. 1991).

This is needed as the main GFA process (described in the next subsections) relies on the calculation of the catchment area upstream of a certain cell.
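The two channel-initiation criteria can be sketched as simple cell-wise tests. The snippet below is an illustration only: the function names, the exponent k, the thresholds and the sample rasters are hypothetical values chosen for the example, not settings taken from the GFA tool or from this study.

```python
def channel_cells_by_accumulation(area, area_threshold):
    """Flow accumulation criterion: a cell belongs to the channel
    network when its contributing area exceeds the threshold."""
    return [[a > area_threshold for a in row] for row in area]


def channel_cells_by_ask(area, slope, k, ask_threshold):
    """A * S^k criterion: a cell belongs to the channel network when
    contributing area times local slope to the power k exceeds the threshold."""
    return [[a * s ** k > ask_threshold for a, s in zip(area_row, slope_row)]
            for area_row, slope_row in zip(area, slope)]


# Hypothetical 2 x 2 rasters: contributing area (km^2) and local slope (m/m).
area = [[0.1, 2.0],
        [5.0, 0.5]]
slope = [[0.20, 0.01],
         [0.05, 0.30]]

by_accumulation = channel_cells_by_accumulation(area, area_threshold=1.0)
by_ask = channel_cells_by_ask(area, slope, k=2.0, ask_threshold=0.01)
```

The two criteria can disagree: a steep cell with a small contributing area may pass the \(AS^{k}\) test while failing the flow accumulation test, and a flat cell with a large contributing area may do the opposite.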

2.4 The GFA process

The value of GFI at each cell is estimated as Eq. (1):

$${\text{GFI}} = \ln \frac{{h_{r} }}{H} = \ln \frac{{a A_{r}^{n} }}{H} = \ln \left( a \right) + n\ln \left( {A_{r} } \right) - \ln \left( H \right)$$
(1)

where \(h_{r}\) is the difference between the water level and the stream bed level [m], \(A_{r}\) is the contributing area of water runoff to the point of interest [km²], a is a scale factor, n is the exponent (dimensionless) and H [m] is the difference between the elevation of each cell and the elevation of the nearest stream point of the identified path, as shown in Fig. 2. The water depth (\(WD\)) could be calculated for every cell of the floodplain areas as follows (Manfreda and Samela 2019):

$$WD = h_{r} {-}H$$
(2)

Fig. 2 Main components of the GFI classifier [after Manfreda et al. (2015)]
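As a worked illustration of Eqs. (1) and (2), the following sketch evaluates the GFI and the water depth for a single cell. The values of a, n, \(A_{r}\) and H are hypothetical and chosen only to show the arithmetic; they are not calibrated values from this study.

```python
import math


def gfi(contributing_area_km2, height_above_stream_m, a, n):
    """Eq. (1): GFI = ln(h_r / H), with h_r = a * A_r^n."""
    h_r = a * contributing_area_km2 ** n
    return math.log(h_r / height_above_stream_m)


def water_depth(contributing_area_km2, height_above_stream_m, a, n):
    """Eq. (2): WD = h_r - H."""
    h_r = a * contributing_area_km2 ** n
    return h_r - height_above_stream_m


# Hypothetical cell: A_r = 100 km^2, H = 0.5 m, with assumed a = 0.1 and n = 0.4.
a, n = 0.1, 0.4
area_km2, height_m = 100.0, 0.5

gfi_value = gfi(area_km2, height_m, a, n)
wd_value = water_depth(area_km2, height_m, a, n)

# The expanded form of Eq. (1) gives the same value.
gfi_expanded = math.log(a) + n * math.log(area_km2) - math.log(height_m)
```

Since GFI = ln(h_r / H), a positive GFI corresponds to \(h_{r} > H\) and therefore to a positive water depth in Eq. (2), provided H is greater than zero.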

If the hydraulic model under study does not cover the entire catchment area and an inflow hydrograph is imposed at the upstream boundary of the hydrodynamic model, the total contributing area used in Eq. (1) is in fact subdivided into two sub-areas. Sub-area 1 (shown hatched in Fig. 2) comprises the area upstream of the hydrodynamic model. It is determined via catchment delineation software, using the common procedure (flow direction, flow accumulation, etc.). The remaining Sub-area 2 (not hatched in Fig. 2) accounts for the part within the hydrodynamic model boundary. Moving downstream, the contributing area within the model boundary increases and is added to Sub-area 1. It should also be highlighted that the catchment area at the most downstream point of the model is equal to Sub-area 1 plus the model area, and as such, the part that is modeled via the hydrodynamic model (as well as the GFA tool) constitutes a complete sub-catchment area.

The GFA tool then normalizes the GFI values and uses as many iterations as needed to calibrate the optimal threshold (t) that produces the flood potential at a required location (i.e., if WD ≥ t, the cell is considered flooded). Moreover, it extends the classification into flood-prone and non-flood-prone locations to cover the entire study area.

The GFA uses a confusion matrix as shown in Table 1A, where each column represents the reference classification and each row represents the predicted classification. Table 1B illustrates four different performance measures which are used to calibrate the optimal threshold (t). The optimal threshold (t) for a specific inundation map corresponding to a certain return period is calculated by testing classifier thresholds in the range from −1 to 1 and then comparing the resulting GFI classification with the floodplain maps used for calibration.

Table 1 Confusion matrix with the performance measures to assess the calibration
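The calibration loop described above can be sketched as a scan over candidate thresholds. This is a simplified illustration, not the GFA tool's implementation: it assumes a min-max rescaling of the GFI to the range from −1 to 1, treats a cell as flooded when its normalized GFI is at or above the threshold, and uses the F1 score as the example performance measure. All function names and sample values are hypothetical.

```python
def normalise(values):
    """Min-max rescale values to [-1, 1] (assumed normalisation)."""
    lo, hi = min(values), max(values)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def confusion_counts(predicted, reference):
    """Return (TP, FP, FN, TN) for two equally long binary sequences."""
    tp = fp = fn = tn = 0
    for p, r in zip(predicted, reference):
        if p and r:
            tp += 1
        elif p:
            fp += 1
        elif r:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn, tn):
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def calibrate_threshold(gfi_norm, reference, metric=f1_score, steps=200):
    """Scan thresholds from -1 to 1 and keep the first one with the best score."""
    best_t, best_score = None, float("-inf")
    for i in range(steps + 1):
        t = -1.0 + 2.0 * i / steps
        predicted = [g >= t for g in gfi_norm]
        score = metric(*confusion_counts(predicted, reference))
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical calibration cells: raw GFI values and reference flood flags.
raw_gfi = [-3.0, -1.0, 0.0, 1.0, 2.0, 5.0]
reference = [0, 0, 0, 1, 1, 1]

best_t, best_score = calibrate_threshold(normalise(raw_gfi), reference)
```

Any of the four performance measures can be passed as `metric`, as long as it accepts the four confusion-matrix counts in the order (TP, FP, FN, TN).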

2.5 Performance criteria

Four performance criteria were investigated to compare each cell in the reference maps with the corresponding cell in the predicted maps for each GFI value using the confusion matrix (Fathi et al. 2021), in order to obtain the optimum GFA threshold:

  • The receiver operating characteristic (ROC) graph is a method to quantify the performance of a binary classifier as its discrimination threshold is varied (Fawcett 2006). The ROC curve in Fig. 3 is obtained by plotting the true positive rate (\(R_{{{\text{TP}}}}\)) against the false positive rate (\(R_{{{\text{FP}}}}\)), as defined in Table 1B. Equation (3) presents the calculation of the area under the curve (Bradley 1997).

    $${\text{ROC}} = \frac{{{ }1 + R_{{{\text{TP}}}} - R_{{{\text{FP}}}} }}{2}$$
    (3)
  • The Matthews correlation coefficient (MCC) is a method to assess binary (two-class) classifications (Matthews et al. 1979). This measure is similar to the Pearson correlation coefficient in its interpretation. The coefficient takes into account true and false positives and negatives. It is generally regarded as a balanced measure which can be used even if the classes are of very different sizes. The MCC can be calculated directly from the confusion matrix using the formula:

    $${\text{MCC}} = { }\frac{{{\text{TP}}*{\text{TN}} - {\text{FP}}*{\text{FN}}}}{{\sqrt {\left( {{\text{TP}} + {\text{FP}}} \right)*\left( {{\text{TP}} + {\text{FN}}} \right)*\left( {{\text{TN}} + {\text{FP}}} \right)*\left( {{\text{TN}} + {\text{FN}}} \right)} }}$$
    (4)
  • The Fowlkes–Mallows index is the third performance criterion. A higher value of this index indicates a greater similarity between the predicted classification and the benchmark classification. The Fowlkes–Mallows index is written as (Halkidi et al. 2001):

    $${\text{FM}} = { }\sqrt {\frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FP}}}}{ }*{ }\frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}}}$$
    (5)
  • F-score index is calculated using the precision and recall, where the precision is the number of true positive results divided by the number of all positive results, including those not identified correctly, while the recall is the number of true positive results divided by the number of all samples that should have been identified as positive. It is formulated as follows (Sasaki 2007):

    $$F1 = { }\frac{{2{\text{ TP}}}}{{2{\text{ TP}} + {\text{FP}} + {\text{FN}}}}$$
    (6)

Fig. 3 Receiver operating characteristic (ROC) graph, drawn by CMG Lee based on http://commons.wikimedia.org/wiki/File:roc-draft-xkcd-style.svg
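Equations (3) to (6) can be evaluated directly from the four confusion-matrix counts, as in the sketch below. The true and false positive rates follow the standard definitions, TP/(TP + FN) and FP/(FP + TN), which are assumed here to match Table 1B. The counts in the example are hypothetical.

```python
import math


def roc_area(tp, fp, fn, tn):
    """Eq. (3): area under the ROC curve for a single operating point."""
    r_tp = tp / (tp + fn)  # true positive rate (assumed standard definition)
    r_fp = fp / (fp + tn)  # false positive rate (assumed standard definition)
    return (1 + r_tp - r_fp) / 2


def mcc(tp, fp, fn, tn):
    """Eq. (4): Matthews correlation coefficient."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def fowlkes_mallows(tp, fp, fn, tn):
    """Eq. (5): geometric mean of precision and recall."""
    return math.sqrt(tp / (tp + fp) * tp / (tp + fn))


def f1(tp, fp, fn, tn):
    """Eq. (6): harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical counts of raster cells: TP, FP, FN, TN.
counts = (80, 10, 20, 90)
scores = {
    "ROC": roc_area(*counts),
    "MCC": mcc(*counts),
    "FM": fowlkes_mallows(*counts),
    "F1": f1(*counts),
}
```

For these counts the ROC value is 0.85 and the MCC is about 0.70, which illustrates that the four criteria are not on a common scale and can select different optimum thresholds.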

2.6 Assessment methods

After obtaining the optimum threshold (t), three independent measures were calculated to further assess the correspondence between the inundation maps generated by the hydrodynamic models and those by the GFA tool using the following equations as shown in Table 2.

Table 2 Assessment methods for the obtained threshold

As the objective is to predict a flood extent envelope, overpredicting the flood‐prone areas might benefit the TP and inflate the \(R_{{{\text{TP}}}}\) and accuracy. This inflation would be of concern if there was no other reported measure that would give an alternative account of the performance. By reporting the \(R_{{{\text{FP}}}}\), an account of the percentage of cells that are overpredicted is given. At the same time, the critical success index extends the \(R_{{{\text{TP}}}}\) by including the FP, accounting for both underprediction and overprediction. The error bias gives the ratio between the FP and the FN, indicating whether there is a tendency for underprediction or overprediction. The reporting of these three measures should give a reasonable overall account of the performance.
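The three assessment measures can be sketched as follows. The error bias follows the description in the text (ratio between FP and FN). The accuracy and the critical success index use their standard definitions, (TP + TN) / total and TP / (TP + FP + FN), which are assumed here to correspond to Table 2. The counts are hypothetical.

```python
def accuracy(tp, fp, fn, tn):
    """Share of cells classified correctly (assumed standard definition)."""
    return (tp + tn) / (tp + fp + fn + tn)


def critical_success_index(tp, fp, fn, tn):
    """Hits divided by hits, false alarms and misses (assumed standard definition)."""
    return tp / (tp + fp + fn)


def error_bias(tp, fp, fn, tn):
    """Ratio of overpredicted (FP) to underpredicted (FN) cells."""
    return fp / fn


# Hypothetical counts of raster cells: TP, FP, FN, TN.
counts = (80, 10, 20, 90)
acc_value = accuracy(*counts)
csi_value = critical_success_index(*counts)
bias_value = error_bias(*counts)
```

An error bias below 1, as in this example (0.5), indicates a tendency toward underprediction, while a value above 1 indicates overprediction.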

2.7 Regression analysis

Regression analysis is a technique used to develop relations between two or more variables. In the context of assessing flood hazard using the GFI, regression analysis was employed in this research to establish a quantitative relationship between GFI values and rainfall depth, DEM cell size or other variables, with the purpose of understanding the influence of rainfall on the flood extent maps produced by the GFA tool.

The regression analysis allows deriving the mathematical relationship between GFI values and rainfall depth. This equation can then be used to predict the flood potential based on an uncalibrated rainfall of 24-h depth. By establishing such a relationship, decision-makers and urban planners can make informed decisions regarding flood management strategies, land-use planning and disaster preparedness. Additionally, it can aid in the development of early warning systems that utilize 24-h rainfall depth to anticipate potential flood events and take appropriate preventive measures.
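A minimal sketch of such a regression is given below, assuming a linear relation between the threshold and the square root of the rainfall depth (Sect. 4.4 reports that a square-root transformation gave the highest R²). The data points are synthetic, generated from an arbitrary line purely to demonstrate the fitting procedure; they are not results from this study, and the function names are hypothetical.

```python
import math


def fit_sqrt_regression(rain_depths, thresholds):
    """Ordinary least squares fit of t = intercept + slope * sqrt(P)."""
    x = [math.sqrt(p) for p in rain_depths]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(thresholds) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, thresholds))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_threshold(rain_depth, intercept, slope):
    """Predict the GFI threshold for an uncalibrated rainfall depth."""
    return intercept + slope * math.sqrt(rain_depth)


# Synthetic calibration points lying exactly on t = -0.1 - 0.05 * sqrt(P).
rain_depths = [25.0, 36.0, 49.0, 64.0, 81.0]
thresholds = [-0.35, -0.40, -0.45, -0.50, -0.55]

intercept, slope = fit_sqrt_regression(rain_depths, thresholds)
predicted_t = predict_threshold(72.25, intercept, slope)
```

The predicted threshold can then be passed to the classifier in place of a calibrated one, which is the step that removes the need for a hydrodynamic run for the new rainfall event.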

3 Study areas

In recent years, Saudi Arabia has suffered from flash floods which caused major loss of life and damage. This study encompasses three areas as shown in Fig. 4 and Table 3. The first study area, catchment (A), is located in Al-Baha province, with steep terrain upstream and flat terrain downstream, which makes it a good setting to evaluate the GFA tool. The second study area (B) covers sections (not an entire catchment) of Wadi Hanifa in Riyadh to test the ability of the tool to generate flood maps for part of a stream. The third study area (C) is a portion of Wadi Bayer (Al-Jouf, Saudi Arabia) in a flat area at the downstream end of the wadi. In previous studies, the GFA tool was tested on low-resolution DEMs (30 m) for large streams. In the current study, the high-resolution DEMs were resampled to different resolutions from 1.0 to 90.0 m and then utilized in generating flood plain maps using the GFA tool. The tool was also tested for various rainfall depths corresponding to return periods of 5, 10, 25, 50 and 100 yr. It is worth mentioning that the area denoted as model area in Table 3 is the one where HEC-RAS and the GFA tool are applied, while the catchment area upstream is the one upstream of the model boundary, used in the calculation of \(A_{r}\) in Eq. (1).

Fig. 4 Study areas and the utilized DEM

Table 3 Summary of the characteristics of the 3 study areas

4 Results and discussion

The methodology described in Sect. 2 was followed to assess the performance of the GFA tool and to deduce flood extents for uncalibrated rainfall events. First, the benchmark flood maps for the study areas were generated using HEC-RAS 2D models, with various DEM resolutions (cell sizes) and for different terrain types. The 2D models use two types of boundary conditions: flow hydrographs upstream of the 2D flow area and rain-on-grid using excess rainfall for the entire 2D catchment area. The input hydrographs and excess rainfall depths were used for rainfall events corresponding to various return periods (5, 10, 25, 50 and 100 yr).

The assessment of the performance of the GFA tool was analyzed using the following process:

4.1 Benchmark flood extent maps

The obtained benchmark maps, at the 1.0-m DEM resolution, illustrate the maximum flood extents using the input hyetographs or upstream hydrographs. For study areas (B) and (C), an inflow hydrograph is added as the upstream boundary condition to account for the flow from the contributing upstream catchment of the modeled area. As shown in Fig. 5 for study area (A), the flood extent for the 100-yr return period is narrow upstream and gets wider downstream, where the small streams join to form a large wadi. For area (B), the flood extent for the 100-yr return period is well defined, as the wadi cross section is V-shaped and the water depth is high. For area (C), the flood extent for the 100-yr return period is wide as the area is almost flat, and similar results were obtained for the resampled resolutions of the DEM.

Fig. 5 Maximum flood extent of the 100-yr return period for the three study areas

4.2 Sensitivity of the results to the used evaluation criteria in the calibration

The GFA tool uses a minimum of 2% of the area of the input flood maps generated using HEC-RAS 2D models as a reference for the calibration process. The tool converts the reference floodplain map into a binary map, where “1” represents flooded areas and “0” represents unflooded areas. To correctly simulate the flood extent, the threshold parameter must be calibrated. The threshold ranges from − 1 to 1 where (− 1) means maximum width of streams and (1) means minimum width of streams. For study areas (B) and (C), the total contributing area is considered in the calculation of GFI values and output thresholds.

Four evaluation criteria were tested (ROC, MCC, F1 and FM) to obtain the optimum threshold and detect the flooded areas at different return periods. Figure 6 indicates that the GFA tool correctly classifies the raster cells as flood-prone and non-flood-prone areas with a probability of success that depends on the terrain type. For study area (A), Fig. 6 shows that the GFA tool has a high success rate and accurately interpolates and estimates the missing flood extent. However, in a few locations, the GFA tool was not able to simulate the extent accurately. For well-defined wadi topography such as study area (B), the GFA tool could generate highly accurate flood maps that are very similar to the reference maps. For study area (C), the flood extent maps generated using the GFA tool are wider than the reference maps because of the flat topography.

Fig. 6 Flood inundation maps for 100-yr return period for the study areas

The obtained values of the thresholds for study area (A), shown in Table 4A, range between − 0.259 and − 0.747. In general, the absolute value of threshold increases with the increase in the rainfall and flood extent. For low-rainfall events, the four evaluation methods resulted in similar threshold values and same extents. The values obtained from the MCC and ROC methods are very similar across return periods. However, the values of thresholds obtained using F1 and FM methods are almost equal for return periods less than 25 yr. Comparing the four methods for the 100-yr return period, as shown in Table 4B, the results show that MCC and ROC methods give better results with accuracy higher than 85%, and critical success index associated with using MCC is about 0.89, which indicates a high similarity between the obtained flood extent maps and the reference maps. Moreover, the error bias is about 0.80 which also indicates that the obtained flood extent maps show a number of underestimated cells higher than the number of overestimated cells as compared to the reference maps.

Table 4 Obtained threshold values and comparison between calibration methods for study area (A)

The obtained threshold values for study area (B) are shown in Table 5A. As the wadi is well defined, the absolute values of thresholds are convergent and range from 0.5 to 0.75. The output thresholds using F1 and FM methods are almost equal. Table 5B shows that the investigated four methods are good for calibration for study area (B) with an accuracy higher than ninety percent. Similar to study area (A), the MCC is the best method among the four methods to obtain the optimum threshold with a high accuracy of 97% and flood extent maps similar to flood maps generated using hydraulic models (HEC-RAS). Furthermore, critical success index associated with using MCC is about 0.97 and the error bias is about 1.04 which also indicates that the obtained flood extent maps are neither underestimating nor overestimating the reference flood extents.

Table 5 Output threshold values and comparison between calibration methods for study area (B)

Table 6A presents the obtained values of threshold using the four calibration methods for study area (C). The study area has flat slopes with a wide flood extent. Thus, the absolute values of thresholds are high and range between 0.68 and 0.75. Moreover, the difference between the values of thresholds is small for various return periods. Table 6B shows that the four methods are nearly acceptable to obtain the optimum thresholds with accuracy higher than 75%. However, the MCC gives the highest accuracy of 85% and the critical success index associated with using MCC is about 0.81 while the error bias is about 1.24 which indicates that the obtained flood extent maps are overestimated compared to the reference maps.

Table 6 Output threshold values and comparison between calibration methods for study area (C)

From the above investigation and the obtained evaluation criteria values, ROC and MCC perform best as calibration criteria. MCC has the advantage in all three terrain types (study areas), as it produces better accuracy and critical success index values with an error bias close to unity.

4.3 Sensitivity analysis of GFI parameters

Various values of parameters a and n were examined to check the sensitivity of changing each parameter on the obtained threshold. For the same study areas, results show that different values of parameter (a) with a fixed value of (n) have no impact on the obtained values of threshold (t). On the contrary, changing the value of parameter (n) has a significant effect on the threshold. The optimum values of n range from 0.1 to 0.5 to get the optimum values of threshold with high accuracy as shown in Table 7. For defined wadis with suitable slopes, such as study area (B), the value of parameter n has no impact on the obtained optimum value of threshold.

Table 7 Obtained threshold values for each n value for three study areas

4.4 Relationships between rainfall depths and threshold values

Figure 7 presents an overall visualization of the relationship between rainfall depth and threshold. In general, the absolute value of the threshold increases with the rainfall depth. The regression analysis shows that a square root transformation of the rainfall depth provides the highest R² between rainfall depth and threshold (generated for the thresholds related to the 5-yr to 100-yr return periods, obtained using MCC as the calibration criterion).

Fig. 7
figure 7

Relations between rainfall depths and thresholds for the study areas

Table 8 shows that there are strong relationships between rainfall depths and thresholds, with low standard error, significant p-values for all parameters and high adjusted R². A rainfall depth between those of the 50- and 100-yr return periods is utilized to test the thresholds predicted by the developed relations.

Table 8 Results of regression analysis between rainfall depths and thresholds for the three study areas

As shown in Table 9, the predicted threshold shows a high skill in obtaining flood extent maps similar to the generated flood extent maps using hydrodynamic models. The accuracy of flood extent maps obtained from predicted threshold is from 87 to 98%. Furthermore, the critical success index is very high which indicates high rate of true positive values. Moreover, the error bias is close to unity which means that number of underestimated cells is almost equal to the number of overestimated cells. Figure 8 shows flood extent maps generated using HEC-RAS and flood extent maps generated based on the value of threshold calculated from the developed regression equations.

Table 9 Predicted threshold values for uncalibrated rainfall depth

Fig. 8 Flood plain maps for tested rainfall events for the study areas
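
The skill scores discussed with Table 9 can be computed from a cell-by-cell comparison of two binary flood maps. The sketch below uses the usual contingency-table definitions (error bias taken as the ratio of overestimated to underestimated cells), which are assumed to match those used in the study; the function name and the small example maps are hypothetical.

```python
import math
import numpy as np

def flood_map_scores(predicted, reference):
    """Compare a predicted binary flood map with a reference map."""
    predicted = np.asarray(predicted, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    tp = float(np.sum(predicted & reference))    # flooded in both maps
    fp = float(np.sum(predicted & ~reference))   # overestimated cells
    fn = float(np.sum(~predicted & reference))   # underestimated cells
    tn = float(np.sum(~predicted & ~reference))  # dry in both maps
    nan = float("nan")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "csi": tp / (tp + fp + fn) if (tp + fp + fn) > 0 else nan,
        "error_bias": fp / fn if fn > 0 else nan,
        "mcc": (tp * tn - fp * fn) / denom if denom > 0 else nan,
    }

# Hypothetical 8-cell example: 1 = flooded, 0 = dry
predicted_map = [1, 1, 1, 0, 0, 0, 1, 0]
reference_map = [1, 1, 0, 1, 0, 0, 1, 0]
scores = flood_map_scores(predicted_map, reference_map)
print(scores)
```

In this example one cell is overestimated and one is underestimated, so the error bias equals one even though the two maps do not agree perfectly; this is why the error bias is read together with the accuracy and the critical success index.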

4.4.1 Sensitivity to the DEM resolution

Finally, to investigate the sensitivity of the GFA tool results to DEM resolution, the digital elevation model of study area (A) (originally at 1.0 m resolution) was resampled to coarser cell sizes (3.0, 5.0, 10.0, 30.0 and 90.0 m). Figure 9 shows that the relationship between DEM cell size and the obtained threshold values follows an asymptotic pattern.

Fig. 9 Threshold values with various DEM resolutions for different return periods

As shown in Fig. 9, at a DEM resolution of 90 m the threshold is the same for all return periods. As a general trend, the threshold tends to be lower (more negative) at coarser resolutions. The relationship between DEM resolution and the obtained threshold values shows similar patterns for DEMs of 1 m to 10 m resolution, and the threshold values converge from the 30-m-resolution DEM onward.
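
The resampling step can be sketched as a block average of the fine-resolution grid. The resampling method actually used is not stated here, so block averaging is an assumption, and the function name and the synthetic DEM are hypothetical.

```python
import numpy as np

def coarsen_dem(dem, factor):
    """Aggregate a DEM by averaging non-overlapping factor x factor blocks."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    dem = np.asarray(dem, dtype=float)
    rows = (dem.shape[0] // factor) * factor  # drop incomplete edge blocks
    cols = (dem.shape[1] // factor) * factor
    blocks = dem[:rows, :cols].reshape(rows // factor, factor,
                                       cols // factor, factor)
    return blocks.mean(axis=(1, 3))

# Synthetic 1 m DEM: a sloping plane with a 4 m wide, 2 m deep channel
y, x = np.mgrid[0:180, 0:180]
dem_1m = 100.0 + 0.05 * y
dem_1m[:, 88:92] -= 2.0

# Cell sizes of 3, 5, 10, 30 and 90 m correspond to these factors
for factor in (3, 5, 10, 30, 90):
    coarse = coarsen_dem(dem_1m, factor)
    channel_depth = np.median(coarse, axis=1).mean() - coarse.min(axis=1).mean()
    print(f"{factor:>2} m cells: shape {coarse.shape}, "
          f"mean channel depth {channel_depth:.2f} m")
```

In this synthetic case the narrow channel is progressively smoothed out as the cell size grows, which is consistent with the loss of sensitivity to rainfall depth reported for the coarser DEMs.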

5 Conclusions

The GFA tool is valuable for assessing flood risk and understanding the potential impacts of flooding in a given area. The tool utilizes advanced geomorphic analysis techniques to evaluate the likelihood and severity of flooding. One of the key strengths of the GFA tool is its ability to integrate multiple data sources and generate comprehensive flood risk maps, as demonstrated by the current study. However, the accuracy and reliability of the GFA tool depend on the quality and availability of the input digital elevation model data.

We have investigated the capabilities of the GFA tool to generate inundation maps and detect flood-prone areas in various locations in arid regions, using high-resolution digital elevation models and rainfall depths corresponding to different return periods. The main conclusions drawn from this research can be summarized as follows:

  • In the GFA method, the resulting threshold is not affected by changing the value of parameter (a) while fixing the value of parameter (n) (i.e., there is no impact on flood extent). On the other hand, the obtained threshold (t) changes noticeably when the value of parameter (n) is changed. Values of parameter (n) from 0.1 to 0.5 give the optimum threshold and increase the accuracy of the flood extent maps.

  • To obtain more accurate inundation maps in arid regions, the MCC evaluation criterion was compared with the ROC, F1 and FM evaluation criteria. The outputs show that MCC is the best calibration criterion for obtaining optimum threshold values for different types of topography. Moreover, for large, well-defined wadis, the MCC and ROC methods can yield similar optimum threshold values.

  • The GFA tool is best suited to predicting flood extent in areas with high slopes and defined streams. Conversely, generating flood extent maps in flat areas with low slopes requires further enhancement and research.

  • We found that the threshold value has a strong relationship with the square root of the rainfall depth, which extends the application of the GFA tool to generating inundation maps for uncalibrated rainfall events. Nevertheless, the application of the derived equation is limited to return periods from 10 yr to 100 yr (or even above).

  • For areas with narrow streams, especially in mountainous areas, it is recommended to use high-resolution digital elevation models (cell size of 10 m or less) to generate accurate flood extent maps. For coarser resolutions, the threshold values are similar for various rainfall depths (i.e., the same flood extent is predicted regardless of the input rainfall depth).