Introduction

Floods are the most frequent type of natural disaster worldwide. Their severity is most visible in countries that cannot afford sufficient structural protection because of financial limitations (Chen et al. 2014; Abaya et al. 2009). River floods are currently a global issue that causes serious problems for residents living along riversides (Cirella and Iyalomhe 2018). In Africa, the number of households displaced and left without shelter by this disaster is increasing dramatically (Thiemig et al. 2011; Moges 2007; Dessalegn et al. 2017), and Ethiopia is no exception. According to the International Disaster Data (IDD) reports for 2017 and 2018 (Ababa 2018), flooding incidents were frequent in many parts of Africa (Thiemig et al. 2011), particularly in East Africa. Ethiopia is one of the East African countries where the severity is relatively high (Haile et al. 2013; Tarekegn 2009; Desalegn et al. 2016). Topographical conditions, heavy rainfall, river bank overflow, sudden destruction of river banks, inadequate urban drainage systems, steep slopes in channel design, and land use and land cover change have made the country more vulnerable to floods (Lamichhane and Sharma 2018; Mosavi et al. 2018). According to the national disaster report obtained from FDPPA (2007) (Mengistu et al. 2016), the historical flood events recorded in this river resulted in the loss of life (Imanshoar et al. 2014; Ho and Lee 2015; Desalegn et al. 2016), left residents of the area without shelter, destroyed infrastructure and livelihoods, and spread transmissible diseases (Broxton et al. 2014). Flood risk is increasing in floodplain areas due to growth in population and property (Abaya et al. 2009), and the problem is aggravated by the impact of climate change.

To minimize the impacts of this natural disaster, researchers worldwide have implemented different hydrological models (physically based, conceptual, empirical, and probabilistic) for flood forecasting (Shibuo et al. 2016; Devia et al. 2015; Siccardi et al. 2005; Shamseldin and O’Connor 2010). Based on how they describe the connection between input and output, flood forecasting models can be categorized as physically based models, conceptual models, and black-box models (Shamseldin et al. 1999). Black-box models are purely empirical: spatial and physical processes are excluded (Mengistu et al. 2016; Goswami and O’Connor 2005), and the model result is governed entirely by the numerical relationship between the input and output parameters. The other commonly used flood forecasting model is the physically based model (Shibuo et al. 2016), which considers the complex physical characteristics and dynamic nature of a watershed. This model is more appropriate when the inputs for hydrologic processes are large and a high temporal resolution is required in the computation. The complex nature of the hydrological process and the non-linearity of the input parameters make it difficult to select an appropriate model for flood forecasting. The nature of the watershed, the purpose of modeling, the appropriateness of the model, and the quality of input parameters such as rainfall, temperature, humidity, and land use and land cover, together with the spatiotemporal variability of the inputs, can affect the reliability of a flood forecasting model (Toth et al. 2000; Ateeq-ur-Rauf et al. 2016; Lateef Ahmad Dar 2017).

In recent years, Artificial Neural Networks (ANNs) have been developed as an alternative method for hydrological modeling of stream flows (Grimes et al. 2003; Chang et al. 2007; Ateeq-ur-Rauf et al. 2016; Shamseldin, 2010). A neural network is a machine learning method: an information processing algorithm that captures the non-linear nature of the hydrologic process (Ateeq-ur-Rauf et al. 2016; Shamseldin and O’Connor 2010; Campolo et al. 2003; Barbetta et al. 2016) by linking input parameters through weights in the network. ANN is a data-driven model developed in recent years, and its application in hydrological modeling has reduced uncertainty in space and time. The magnitude of an incoming flood peak and its probable time of occurrence can be estimated by several models (Ligaray et al. 2015; Asadi 2013; Dogan et al. 2007), and the selection of a specific model and its accuracy are generally governed by factors such as the availability of input parameters, the skills of the forecaster, and the forecaster's knowledge of the watershed.

The integration of hydrologic and hydraulic models is receiving global attention and plays a paramount role in flood risk management strategies (Chang et al. 2007; Abaya 2008). Flood inundation mapping is a difficult task that needs high-quality observed data to verify the performance of the models (Lohani et al. 2012; Seenu 2019; Duvvuri and Narasimhan 2013). The application of machine learning (ANN) to hydrologic processes is a recently evolving approach and has been applied in rainfall-runoff modeling (Riad et al. 2004), daily water supply–demand (Akhtar et al. 2009), streamflow computation (Veintimilla-Reyes et al. 2016; Poonia 2018), extreme hydrologic event analysis, and generation of the unit hydrograph (Mengistu et al. 2016). A feedforward ANN structure is commonly used for one-way computation of the hydrologic process, in which inputs are pushed forward until a rough result is obtained. The main objective of this study is to map flood-prone areas using ANN as the hydrological model and HEC-RAS as the hydraulic model. The two models are integrated to reduce the spatiotemporal uncertainties of traditional flood forecasting models. Thus, the improved accuracy in space and time is presented as the novelty of the integrated ANN and HEC-RAS models.

Materials and methods

Study area

The Baro Akobo basin is located in the southwestern part of Ethiopia. Geographically, it lies between latitudes 5° 31′ and 10° 54′ north and longitudes 33° and 36° 17′ east. The river basin (Fig. 1) is the fourth largest basin in the country, covering an approximate area of 74,100 km². The western, northwestern, and southwestern sides of the basin are bordered by South Sudan; the northern and northeastern sides are bordered by the Abay river basin; and the eastern and southeastern sides are bordered by the Omo-Gibe river basin. The river originates in the highlands of southwestern Ethiopia and flows across the low-lying plains. The most recent (2015) flood event in the river basin forced around 2,000 people out of their homes (Alemayehu 2016; Thiemig et al. 2013; Woube 1999; Abaya 2008).

Fig. 1 Map of rainfall stations, river gauging stations, and the Baro Akobo Basin in Ethiopia

Data and software used

In this study, ArcGIS (ver. 10.4), RStudio, and HEC-RAS (ver. 5.0.1) were used to prepare the inundation map, to develop the ANN predictive hydrological model, and to model river flow in the natural channel, respectively. All packages were used under student licenses or open-source terms. For training the predictive ANN hydrological model, both spatial (Topographical Wetness Index) and temporal (7-year daily rainfall and temperature) data (Table 1) were used. A spatial resolution of 30 m × 30 m was used, and all input parameters were prepared on this fixed grid.
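Point records from rainfall and temperature stations have to be spread over the fixed 30 m grid before they can feed the ANN, and the hydrologic modeling section names Inverse Distance Weighted (IDW) interpolation for this step. The following is a minimal sketch of IDW for a single grid cell. The power of 2, the function name `idw_estimate`, and the station coordinates and values are illustrative assumptions and are not taken from the study.

```python
import math

def idw_estimate(stations, target, power=2.0):
    """Estimate a value at `target` (x, y) from point stations.

    stations: list of ((x, y), value) tuples.
    power: distance exponent (assumed to be 2 here; the study does not state it).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (sx, sy), value in stations:
        distance = math.hypot(sx - target[0], sy - target[1])
        if distance == 0.0:
            # The cell coincides with a station: use the measured value.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical stations: coordinates in metres, daily rainfall in mm.
stations = [((0.0, 0.0), 10.0), ((2000.0, 0.0), 20.0), ((0.0, 3000.0), 5.0)]
print(idw_estimate(stations, (1000.0, 0.0)))
```

Repeating the call for every cell centre of the grid, and for every day of record, would yield the gridded rainfall and temperature layers described in the text.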

Table 1 Point rainfall and temperature stations in the study area

Hydrologic modeling

Feedforward artificial neural network (ANN) model

To begin the modeling, the input parameters for hydrological modeling were prepared based on their spatiotemporal variations. Daily rainfall (R) and temperature (T) data for the same period (1999–2005) were first distributed at a spatial resolution of 30 m × 30 m. Inverse Distance Weighted (IDW) interpolation was used to convert the point climate data (rainfall and temperature) into spatial data, and the Topographical Wetness Index (TWI) was prepared at the same spatial resolution. The prepared spatiotemporal data were normalized to the range between 0 and 1. A feedforward network was selected in this study based on the studies conducted by de Vos and Rientjes (2005), Poonia (2018), Abhishek et al. (2012), Hung et al. (2009), and Arun and Baskaran (2013). For this class of ANN architecture, R, T, and TWI were assigned to the network as inputs (Dolling and Varas 2002; Tayebiyan et al. 2016). Random initial weights were generated and assigned to the links between the input and hidden nodes and between the hidden and output nodes. The input nodes, labeled 1, 2, and 3, receive the normalized (Eq. 2) input parameters and are connected to the hidden nodes, labeled 4, 5, and 6. The synaptic links between the input and hidden nodes were assigned the weights in the 1st, 2nd, and 3rd rows of the weight matrix (Eq. 1), and the links between the hidden and output nodes were assigned the weights in the 4th row of the matrix (Amengual et al. 2007). Once the input nodes receive the normalized input parameters (rainfall, temperature, and TWI), the weighted sum of the input parameters and initial weights (Eq. 1 and Fig. 2) reaches the hidden nodes and is activated using a sigmoid activation function (Veintimilla-Reyes et al. 2016; Napolitano 2011; Abdulkadir et al. 2012). The sigmoid function (Arun and Baskaran 2013; Šimor et al. 2012; Agatonovic-Kustrin and Beresford 2000) (Eq. 3) activates the values in the hidden nodes, which are then multiplied by the random weights assigned between the hidden and output layers and summed at the output node (labeled as 8):

$${\text{Weights}} = \begin{bmatrix} w_{11} & w_{12} & w_{13} & w_{14} \\ w_{21} & w_{22} & w_{23} & w_{24} \\ w_{31} & w_{32} & w_{33} & w_{34} \\ w_{41} & w_{42} & w_{43} & w_{44} \end{bmatrix}$$
(1)
$${\text{Normalization}} = \frac{{X - X_{{{\text{min}}}} }}{{X_{{{\text{max}}}} - X_{{{\text{min}}}} }}$$
(2)
$${\text{Activation function}} = \frac{1}{{1 + e^{ - x} }}$$
(3)
Fig. 2 Assigned initial weights in the ANNs
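The feedforward pass described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' RStudio implementation: it uses a 3-3-1 layout without bias terms, the standard logistic sigmoid 1/(1 + e^(-x)), a sigmoid on the output node, and weights drawn uniformly from [-1, 1] with a fixed seed. All function names and input values are hypothetical.

```python
import math
import random

def min_max_normalize(values):
    """Scale a list of numbers to [0, 1] (min-max normalization, Eq. 2)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_input_hidden, w_hidden_output):
    """One forward pass; w_input_hidden[i][j] links input i to hidden node j."""
    hidden = [
        sigmoid(sum(x * w_input_hidden[i][j] for i, x in enumerate(inputs)))
        for j in range(len(w_hidden_output))
    ]
    output = sigmoid(sum(h * w for h, w in zip(hidden, w_hidden_output)))
    return hidden, output

# Random initial weights (assumed range and seed).
rng = random.Random(42)
w_input_hidden = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]
w_hidden_output = [rng.uniform(-1.0, 1.0) for _ in range(3)]

# Made-up daily series for one grid cell: rainfall (mm), temperature (deg C), TWI.
rainfall = min_max_normalize([0.0, 12.5, 30.0, 4.2])
temperature = min_max_normalize([24.1, 22.8, 21.5, 23.9])
twi = min_max_normalize([6.0, 6.0, 9.5, 7.2])

for r, t, w in zip(rainfall, temperature, twi):
    _, y = forward([r, t, w], w_input_hidden, w_hidden_output)
    print(round(y, 4))
```

With untrained random weights the printed outputs carry no hydrological meaning; they only show how the normalized inputs move through the network.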

Training ANN hydrologic modeling

Back propagation

In feedforward propagation, the input parameters are pushed forward to obtain a rough solution at the output node, and no attempt is made to minimize the error between the network result and the target output (Dar 2017; Tayebiyan et al. 2016; Malmgren and Nordlund 1996). The initial weight values assigned in the feedforward process serve only to start the modeling, and the accuracy of the model is very low at this stage (Shamseldin and O’Connor 2003). The main purpose of backpropagation (Fig. 3) is to spread the error back into the network in order to minimize the error obtained in the feedforward process (Sattari et al. 2017; Timbadiya et al. 2011). The overall error obtained at the output layer is propagated back from the output node through the entire network (Mai and De Smedt 2017). Training, in this sense, means that the network learns from its mistakes through the learning algorithm built into ANNs (Abhishek et al. 2012; Hawkin 2014).

Fig. 3 The conceptual ANN architecture for feedforward and backpropagation processes
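To make the error-propagation idea concrete, the sketch below trains a small 3-3-1 network by plain gradient descent on a squared-error loss. The loss function, the learning rate of 0.5, the number of epochs, the absence of bias terms, and the three training samples are all assumptions chosen for illustration; the study does not report these details.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(x, target, w_ih, w_ho, lr=0.5):
    """One feedforward pass followed by one backpropagation update.

    w_ih[i][j]: weight from input i to hidden node j.
    w_ho[j]:    weight from hidden node j to the output node.
    Returns the squared-error loss measured before the update.
    """
    n_hidden = len(w_ho)
    hidden = [
        sigmoid(sum(x[i] * w_ih[i][j] for i in range(len(x))))
        for j in range(n_hidden)
    ]
    out = sigmoid(sum(hidden[j] * w_ho[j] for j in range(n_hidden)))
    error = out - target

    # Backward pass: apply the chain rule through the sigmoid at each layer.
    delta_out = error * out * (1.0 - out)
    delta_hidden = [
        delta_out * w_ho[j] * hidden[j] * (1.0 - hidden[j])
        for j in range(n_hidden)
    ]

    # Gradient-descent updates of both weight sets.
    for j in range(n_hidden):
        w_ho[j] -= lr * delta_out * hidden[j]
        for i in range(len(x)):
            w_ih[i][j] -= lr * delta_hidden[j] * x[i]
    return 0.5 * error ** 2

rng = random.Random(0)
w_ih = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]
w_ho = [rng.uniform(-1.0, 1.0) for _ in range(3)]

# Hypothetical normalized samples: ([rainfall, temperature, TWI], discharge).
samples = [
    ([0.1, 0.8, 0.3], 0.2),
    ([0.9, 0.4, 0.7], 0.8),
    ([0.5, 0.5, 0.5], 0.5),
]

losses = []
for epoch in range(2000):
    losses.append(sum(train_step(x, t, w_ih, w_ho) for x, t in samples))

print("first epoch loss:", losses[0])
print("last epoch loss:", losses[-1])
```

The loss in the last epoch is smaller than in the first, which is the behavior the text describes: the output error is fed back through the network and the weights move away from their random starting values.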

HEC-RAS model

HEC-RAS is a computer program developed for modeling river flow through open natural channels and for computing water surface profiles (Mapping and Field 2017; Lamichhane and Sharma 2018; Duvvuri and Narasimhan 2013). It is widely accepted and used for river simulation by hydraulic engineers and researchers (Marimin et al. 2018) because it can simulate unsteady flow, identify flood-prone areas where the ground surface level is lower than the computed water profile, and allow the researcher to visualize the flood extent along a river course (Maidment 2017; Timbadiya et al. 2011). River geometries such as centerlines, bank lines, flow paths, and cross-sectional lines are the major parameters processed in HEC-RAS to generate flood-prone areas. A Digital Elevation Model (DEM) of 12.5 m × 12.5 m pixel resolution downloaded from https://asf.alaska.edu/ (Li 2010) was used as input to extract these parameters. The flood inundation map generated in this study provides information on the spatially distributed flood depth and flood-prone areas (Parhi 2013) along the Baro River. Coupled 1D and 2D models were implemented in this study to generate the flood depth and flood-prone areas along the Baro River (Enea et al. 2018). The HEC-RAS model received the result (runoff) of the tested ANN hydrological model as input and provided information on the spatial extent and depth of flooding along the river.

Integrated ANN and HEC-RAS models

The trained and tested ANN predictive hydrological model developed in this paper generates runoff (Dawson and Wilby 2010; Biragani 2016) and is linked to the HEC-RAS model to generate the flood extent along the river (Rajurkar et al. 2010). Whenever the ANN model receives the input parameters (rainfall, temperature, and Topographical Wetness Index) and computes runoff, HEC-RAS accepts the result (runoff) as input to generate information on the spatial distribution of flooding and flood-prone areas along the river (Fig. 4).

Fig. 4 ANN and HEC-RAS integrated conceptual framework (Source: Author)

The final corrected and updated values of weights in the ANN model are used to generate the runoff values whenever the input parameters are sent to the input nodes.
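Because the network works on inputs scaled to the range 0 to 1 and a sigmoid output also lies in that range, the ANN output presumably has to be converted back to discharge units before it can serve as a flow input to HEC-RAS. The study does not describe this step, so the sketch below is an assumption: it inverts the min-max normalization using the minimum and maximum observed discharge of the training period. The function names and all numeric values are hypothetical.

```python
def normalize(q, q_min, q_max):
    """Min-max normalization of a discharge value to [0, 1]."""
    return (q - q_min) / (q_max - q_min)

def denormalize(scaled, q_min, q_max):
    """Inverse of the min-max normalization: map [0, 1] back to m3/s."""
    return scaled * (q_max - q_min) + q_min

# Hypothetical discharge range of the training record (m3/s).
q_min, q_max = 20.0, 1500.0

# Hypothetical ANN outputs for three consecutive days.
ann_outputs = [0.05, 0.40, 0.85]
runoff_m3s = [denormalize(y, q_min, q_max) for y in ann_outputs]
print(runoff_m3s)
```

The resulting series is in physical units and could then be entered as the flow boundary condition of the hydraulic model.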

Model calibration and validation

ANN model evaluation

The terms calibration/training and validation/testing are commonly used when assessing the accuracy of a model (Parhi 2013; Desta and Lemma 2017; Chuma et al. 2013). The ANN hydrologic model was trained with 7 years (1999–2005) of climate data (rainfall and temperature) and the Topographical Wetness Index (TWI), using observed daily discharge for the same period as the target data, and was then tested against 3 years (2006–2008) of observed daily discharge. The performance in both periods (training and testing) was evaluated by the Nash–Sutcliffe Efficiency (NSE) using the following equation:

$${\text{NSE}} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} (Q_{O} - Q_{S} )^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} (Q_{O} - Q_{{{\text{mean}}}} )^{2} }}$$
(4)

where \(Q_{O}\) is the observed discharge (m³/s), \(Q_{S}\) is the simulated discharge (m³/s), and \(Q_{{\text{mean}}}\) is the mean observed discharge (m³/s).
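A direct transcription of Eq. (4) is given below as a minimal sketch. The function name and the sample discharge values are illustrative assumptions and are not data from the study.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 - sum((Qo - Qs)^2) / sum((Qo - Qmean)^2)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0.0:
        raise ValueError("observed series has zero variance")
    return 1.0 - residual / spread

# Made-up daily discharges (m3/s).
observed = [120.0, 180.0, 260.0, 210.0, 150.0]
simulated = [110.0, 190.0, 240.0, 220.0, 160.0]
print(nash_sutcliffe(observed, simulated))
```

An NSE of 1 means a perfect match, and an NSE of 0 means the model is no better than always predicting the mean of the observations.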

However, the Nash–Sutcliffe Efficiency (NSE) alone gives no information on model bias. The fit between simulated and observed values evaluated by NSE should be supported by an additional statistical error index. Therefore, PBIAS was also used to check whether the model overpredicted or underpredicted; its equation is presented in Eq. (5) (Ouali and Cannon, 2018; Pérez-Sánchez et al., 2019), and the corresponding criteria of fit in hydrological modeling for both evaluation techniques are summarized in Table 3.

$${\text{PBIAS}}\left( \% \right) = \left[ \frac{\sum\nolimits_{i = 1}^{n} \left( Y_{i}^{\text{Obs}} - Y_{i}^{\text{Sim}} \right) \times 100}{\sum\nolimits_{i = 1}^{n} Y_{i}^{\text{Obs}}} \right]$$
(5)

where \(Y_{i}^{\text{Obs}}\) and \(Y_{i}^{\text{Sim}}\) are the observed and simulated values, respectively.
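Eq. (5) can be computed as in the following minimal sketch. With the observed-minus-simulated form written above, a positive PBIAS means the model underestimates on average and a negative value means it overestimates. The function name and sample values are illustrative assumptions.

```python
def pbias(observed, simulated):
    """Percent bias: 100 * sum(obs - sim) / sum(obs)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    total_obs = sum(observed)
    if total_obs == 0.0:
        raise ValueError("sum of observations is zero")
    return 100.0 * sum(o - s for o, s in zip(observed, simulated)) / total_obs

# Made-up discharges (m3/s): the simulation is 10% low on every day.
observed = [10.0, 20.0, 30.0]
simulated = [9.0, 18.0, 27.0]
print(pbias(observed, simulated))
```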

Further, the ANN predictive hydrological model result was evaluated with the coefficient of determination, R², which describes the proportion of the variance in the observed values that is explained by the simulated values. The general equation by which this coefficient is computed for the evaluation of a model is presented in Eq. (6) (Pérez-Sánchez et al. 2019), and the acceptable range for hydrological modeling is given in Table 2.
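A common way to compute R² for a scatter plot of simulated against observed discharge is the squared Pearson correlation coefficient. The sketch below assumes that form, which may differ from the exact equation used in the study; the function name and sample values are hypothetical.

```python
def r_squared(observed, simulated):
    """Squared Pearson correlation between two equally long series."""
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    if var_o == 0.0 or var_s == 0.0:
        raise ValueError("a series with zero variance has no correlation")
    return cov ** 2 / (var_o * var_s)

# Made-up daily discharges (m3/s).
observed = [120.0, 180.0, 260.0, 210.0, 150.0]
simulated = [110.0, 190.0, 240.0, 220.0, 160.0]
print(r_squared(observed, simulated))
```

Because this form measures only linear association, a model with a constant offset or scale error can still score close to 1, which is one reason the text pairs R² with NSE and PBIAS.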

Table 2 Model goodness of fit (Pérez-Sánchez et al. 2019; Wang et al. 2017a, b)

HEC-RAS model evaluation

The runoff values obtained from the ANN hydrological model were used as input to HEC-RAS to generate flood inundation areas. The inundation maps generated in HEC-RAS for the training (1999–2005) and testing (2006–2008) periods were checked against the water bodies delineated using the Normalized Difference Water Index (NDWI). The flood events of 2005 and 2008 were detected while delineating the water body with NDWI (Ali et al. 2016; Enea et al. 2018). NDWI (Eq. 6) uses the Green (Band-2) and near-infrared (Band-4) bands of remote sensing images to extract a water body, with near-infrared (NIR) and short-wave infrared (SWI) used as the main inputs. The inundation map generated in HEC-RAS and the water bodies delineated with NDWI from remotely sensed LANDSAT 8 imagery downloaded from https://earthexplorer.usgs.gov/ were compared based on their overlapping areas (Bagherzadeh and Daneshvar, 2011).

$${\text{NDWI}} = \frac{{{\text{NIR}} - {\text{SWI}}}}{{{\text{NIR}} + {\text{SWI}}}}$$
(6)
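The index in Eq. (6) is evaluated cell by cell on two co-registered bands. The sketch below applies the equation as written and then thresholds the result to obtain a water mask. The threshold of 0, the handling of cells where both bands are zero, the function names, and the reflectance values are assumptions made for illustration; the study does not state how the water body was separated from the index.

```python
def ndwi(nir, swi):
    """Normalized difference of two band values, as in Eq. (6)."""
    total = nir + swi
    if total == 0.0:
        return 0.0  # assumed convention for empty cells
    return (nir - swi) / total

def water_mask(nir_band, swi_band, threshold=0.0):
    """Return a grid of booleans that is True where NDWI exceeds the threshold."""
    return [
        [ndwi(n, s) > threshold for n, s in zip(nir_row, swi_row)]
        for nir_row, swi_row in zip(nir_band, swi_band)
    ]

# Hypothetical 2 x 3 reflectance grids.
nir_band = [[0.30, 0.10, 0.25], [0.05, 0.40, 0.20]]
swi_band = [[0.10, 0.30, 0.25], [0.15, 0.10, 0.05]]

for row in water_mask(nir_band, swi_band):
    print(row)
```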

To obtain the percentage of overlapping area between the water body delineated with NDWI and the inundation map from HEC-RAS, the Intersect tool in the Analysis toolbox of ArcGIS (ver. 10.4) was used. First, both rasters (NDWI and HEC-RAS) were converted to vector polygons using the conversion tool (Scanlon et al. 2005), and the corresponding shape areas were then calculated using the geometry calculation in ArcGIS. The same coordinate system (Adindan UTM Zone 37 N) was set for both polygons, and the percentage of overlapping area was calculated as shown in the following equation (Potential 2020; Wang et al. 2017a, b):

$${\text{Overlapping percentage}}\left( \% \right) = \frac{{\text{Layer}}_{1} \left( {\text{km}}^{2} \right)}{{\text{Layer}}_{2} \left( {\text{km}}^{2} \right)} \times 100$$
(7)

where Layer 1 is the area of the water body delineated in NDWI from flood events and Layer 2 is the area of the inundation map generated in HEC-RAS.
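The area ratio in Eq. (7) can be illustrated on raster masks instead of polygons. The sketch below is an analogue of the ArcGIS workflow rather than a reproduction of it: it counts wet cells to obtain an area in km² and then forms the ratio, including the factor of 100 that a percentage implies. The function names, the tiny example grids, and the 1 km cell size are assumptions.

```python
def mask_area_km2(mask, cell_size_m):
    """Area covered by the True cells of a boolean grid, in km2."""
    wet_cells = sum(1 for row in mask for cell in row if cell)
    return wet_cells * cell_size_m ** 2 / 1.0e6

def overlap_percentage(layer1_km2, layer2_km2):
    """Ratio of Layer 1 area to Layer 2 area, expressed in percent."""
    if layer2_km2 <= 0.0:
        raise ValueError("Layer 2 area must be positive")
    return 100.0 * layer1_km2 / layer2_km2

# Hypothetical masks: Layer 1 from NDWI, Layer 2 from the hydraulic model.
ndwi_mask = [[True, True, False], [True, False, False]]
hecras_mask = [[True, True, False], [True, True, False]]

layer1 = mask_area_km2(ndwi_mask, cell_size_m=1000.0)
layer2 = mask_area_km2(hecras_mask, cell_size_m=1000.0)
print(overlap_percentage(layer1, layer2))
```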

Results and discussions

ANN hydrological model result

The daily stream flow (runoff) values generated by the ANN predictive hydrological model developed in RStudio are presented in Figs. 5 and 6 for the training and testing periods. As indicated in Fig. 5, the daily runoff values computed by the ANN model and the corresponding observed daily discharges for the 7-year training period (1999–2005) gave NSE = 0.86 and PBIAS = 8.2%. As Fig. 6 shows, NSE = 0.88 and PBIAS = 8.5% were found for the 3-year testing period (2006–2008). Similar levels of agreement were reported by Kumar et al. (2020), Tsakiri et al. (2018), and Kan et al. (2020). As shown in Fig. 7a–g, the ANN model results were further evaluated for each year using the coefficient of determination (R²) with scatter plots; values of 0.96, 0.96, 0.93, 0.93, 0.89, 0.93, and 0.92, respectively, were obtained for the training period, and these results are rated very good (Kan et al. 2020).

Fig. 5 ANN and observed results during the training/calibration periods (1999–2005)

Fig. 6 ANN and observed results during the testing/validation periods (2006–2008)

Fig. 7 The scatter plot between ANN and observed results for each year

In Fig. 7h, the average simulated daily ANN results are plotted against the corresponding observed discharges for the entire period, and this scatter plot shows a slightly better fit than the individual yearly plots. The points in the yearly scatter plots are not concentrated along the regression line; however, the goodness of fit and the performance rating for R² are very good. Model performance in the training period is poorer than in the testing period, as can be seen from Table 3.

Table 3 ANN model performance evaluation results (Kumar et al. 2020)

The 7-year hydrological model results and the actual daily discharge gauged at the basin outlet were processed in RStudio, and their scatter plot is presented in Fig. 7g, giving a regression R² = 0.89. Similarly, Tayfur et al. (2018) and Pérez-Sánchez et al. (2019) report that R² values between 0.85 and 1 indicate a very good model. The application of Artificial Neural Networks to hydrological modeling presented in Villada et al. (2012), Lateef (2017), and Dibaba et al. (2020) was acceptable, with the values summarized in Table 3.

HEC-RAS model result

Figures 8 and 9 present the HEC-RAS model results for the calibration and validation periods. The inundation maps visualized in HEC-RAS were further checked against the historical flood events of 2005 and 2008 for the calibration (Fig. 8) and validation (Fig. 9) periods. The inundation map generated in HEC-RAS from the 7-year ANN hydrological results (1999–2005) and the water body delineated with NDWI from the 2005 flood event are presented in Fig. 8a and b, respectively. The blue color in both figures is the water body in the area of interest.

Fig. 8 Trained (calibrated) inundation map result in HEC-RAS (1999–2005)

Fig. 9 Tested (validated) inundation map result in HEC-RAS (2006–2008)

The HEC-RAS result was further evaluated with the ANN hydrological model results obtained for the validation period (2006–2008); the resulting inundation map and the water body delineated with NDWI from the 2008 flood event are presented in Fig. 9a and b, respectively. From the overlapping areas computed in ArcGIS (ver. 10.4), 94.6% and 96% of the areas intersected between the inundation map generated in HEC-RAS and the water body delineated with NDWI for the calibration and validation periods, respectively. According to Bagherzadeh and Daneshvar (2011) and Mai and De Smedt (2017), an inundation map can be evaluated based on overlapping areas, and an overlap of more than 85% is considered good agreement.

Conclusion

In this study, integrated machine-learning and HEC-RAS models for flood inundation mapping in the Baro River Basin (Ethiopia) are presented. ANN as the predictive hydrological model and HEC-RAS as the hydraulic model were integrated, and flood inundation areas were identified accurately. Stream flow was generated by the ANN model, and flood depths were generated by the HEC-RAS model. The performance of the ANN model for the training (1999–2005) and testing (2006–2008) periods was evaluated with the Nash–Sutcliffe Efficiency (NSE), PBIAS, and the coefficient of determination (R²). NSE values of 0.86 and 0.88 and PBIAS values of 8.2% and 8.5% were obtained for the training and testing periods, respectively. The ANN model result for each year (1999–2005) was further evaluated graphically, with R² values of 0.96, 0.96, 0.93, 0.93, 0.89, 0.93, and 0.92, respectively. The HEC-RAS and NDWI results overlapped by 94.6% and 96% for the calibration and validation periods, respectively. The integrated ANN and HEC-RAS models were successful for flood inundation mapping and are recommended as a possible alternative for flood risk strategies. Finally, it is concluded that integrated machine-learning and HEC-RAS models for flood inundation mapping are an appropriate tool for flood risk management and early warning systems.