Machine-learning and HEC-RAS integrated models for flood inundation mapping in Baro River Basin, Ethiopia

This study presents integrated machine-learning and HEC-RAS models for flood inundation mapping in the Baro River Basin, Ethiopia. An artificial neural network (ANN) and HEC-RAS were integrated as predictive hydrological and hydraulic models to generate runoff and flood extent, respectively. Seven years of daily rainfall and temperature data (1999–2005), daily discharge (1999–2005), and a 30 m × 30 m gridded Topographical Wetness Index (TWI) were used to train the predictive ANN hydrological model in RStudio. The predictive performance of the ANN hydrological model was evaluated in RStudio against the corresponding observed daily discharge, giving Nash–Sutcliffe Efficiency (NSE) values of 0.86 and 0.88 for the training period (1999–2005) and the testing period (2006–2008), respectively. The validated ANN model was linked with HEC-RAS to generate the flood extent along the river course. The HEC-RAS result was calibrated and validated against the water body delineated with the Normalized Difference Water Index (NDWI) from LANDSAT 8 imagery for the historical flood events of 2005 and 2008. About 96% agreement was found between the flood-prone areas generated in HEC-RAS and the water body delineated using NDWI. It is therefore reasonable to conclude that integrating a machine-learning approach with the HEC-RAS model reduced the spatiotemporal uncertainties of traditional flood forecasting methods. The integrated model is a powerful tool for flood inundation mapping and for warning residents of the basin.


Introduction
Floods are the most frequent type of natural disaster worldwide. Their severity is most visible in countries that cannot afford sufficient structural protection because of financial limitations (Chen et al. 2014; Abaya et al. 2009). River floods are currently a global issue that causes serious problems for residents living along riversides (Cirella and Iyalomhe 2018). In Africa, the number of households displaced and left without shelter by floods is increasing dramatically (Thiemig et al. 2011; Moges 2007; Dessalegn et al. 2017), and Ethiopia is no exception. According to the International Disaster Data (IDD) report for 2017 and 2018 (Ababa 2018), flooding incidents were frequent in many parts of Africa (Thiemig et al. 2011), particularly in East Africa. Ethiopia is one of the East African countries where the severity is relatively high (Haile et al. 2013; Tarekegn 2009; Desalegn et al. 2016). Topographical conditions, heavy rainfall, river bank overflow, sudden failure of river banks, inadequate urban drainage systems, steep slopes in channel design, and land use and land cover change have made the country more vulnerable to floods (Lamichhane and Sharma 2018; Mosavi et al. 2018). According to the national disaster report obtained from FDPPA (2007) (Mengistu et al. 2016), the historical flood events recorded in this river resulted in loss of life (Imanshoar et al. 2014; Ho and Lee 2015; Desalegn et al. 2016), left residents of the area without shelter, destroyed infrastructure and livelihoods, and spread transmissible diseases (Broxton et al. 2014). Flood risk is increasing in floodplain areas because of growth in population and property (Abaya et al. 2009), and the problem is aggravated by the impact of climate change.
To minimize the impacts of this natural disaster, researchers worldwide have implemented different hydrological models (such as physically based, conceptual, empirical, and probabilistic models) for flood forecasting (Shibuo et al. 2016; Devia et al. 2015; Siccardi et al. 2005; Shamseldin and O'Connor 2010). Based on the method used to describe the connection between input and output, flood forecasting models can be categorized as physically based models, conceptual models, and black-box models (Shamseldin et al. 1999). Black-box models are purely empirical: spatial and physical processes are excluded (Mengistu et al. 2016; Goswami and O'Connor 2005), and the model result is governed entirely by the numerical relationship between the input and output parameters. The other commonly used flood forecasting model is the physically based model (Shibuo et al. 2016), which considers the complex physical characteristics and dynamic nature of a watershed. This type of model is more appropriate when the inputs to the hydrologic processes are large and a high temporal resolution is required in the computation. The complex nature of the hydrological process and the non-linearity of the input parameters make it difficult to select an appropriate model for flood forecasting. The nature of the watershed, the purpose of modeling, the appropriateness of the model, the quality of input parameters such as rainfall, temperature, humidity, and land use and land cover, and the spatiotemporal variability of the inputs can all affect the reliability of a flood forecasting model (Toth et al. 2000; Ateeq-ur-Rauf et al. 2016; Lateef Ahmad Dar 2017).
In recent years, Artificial Neural Networks (ANNs) have been developed as an alternative method for the hydrological modeling of stream flows (Grimes et al. 2003; Chang et al. 2007; Ateeq-ur-Rauf et al. 2016; Shamseldin, 2010). A neural network is a machine-learning method built on an information-processing algorithm that handles the non-linear nature of the hydrologic process (Ateeq-ur-Rauf et al. 2016; Shamseldin and O'Connor 2010; Campolo et al. 2003; Barbetta et al. 2016) by linking input parameters through weights in the network. The ANN is a data-driven model developed in recent years, and its application in hydrological modeling has reduced uncertainty in space and time. The magnitude of an incoming flood peak and its probable time of occurrence can be estimated by several models (Ligaray et al. 2015; Asadi 2013; Dogan et al. 2007); the selection of a specific model and its accuracy are generally governed by factors such as the availability of input parameters, the skill of the forecaster, and knowledge of the watershed.
The integration of hydrologic and hydraulic models is receiving global attention and plays an important role in flood risk management strategies (Chang et al. 2007; Abaya 2008). Flood inundation mapping is a difficult task that needs high-quality observed data to verify the performance of the models (Lohani et al. 2012; Seenu 2019; Duvvuri and Narasimhan 2013). The application of machine learning (ANN) to hydrologic processes is a recently evolving approach that has been applied in rainfall-runoff modeling (Riad et al. 2004), daily water supply-demand (Akhtar et al. 2009), streamflow computation (Veintimilla-Reyes et al. 2016; Poonia 2018), extreme hydrologic event analysis, and generation of the unit hydrograph (Mengistu et al. 2016). A feedforward ANN structure is commonly used for the one-way computation of the hydrologic process, in which inputs are pushed forward until a first rough result is obtained. The main objective of this study is to generate flood-prone areas using an ANN as the hydrological model and HEC-RAS as the hydraulic model. The two models are integrated to reduce the spatiotemporal uncertainties of traditional flood forecasting models. The resulting improvement in accuracy in space and time is presented as the novelty of the integrated ANN and HEC-RAS models.

Study area
The Baro Akobo basin is located in the southwestern part of Ethiopia, between latitudes 5° 31″ and 10° 54″ north and longitudes 33° and 36° 17″ east. The river basin (Fig. 1) is the fourth largest basin in the country, covering an area of approximately 74,100 km². The western, northwestern, and southwestern sides of the basin border South Sudan; the northern and northeastern sides border the Abay river basin; and the eastern and southeastern sides border the Omo-Gibe river basin. The river originates in the highlands of southwestern Ethiopia and flows across the low-lying plains. The most recent flood event in the river basin (2015) forced around 2,000 people out of their homes (Alemayehu 2016; Thiemig et al. 2013; Woube 1999; Abaya 2008).

Data and software used
In this study, ArcGIS (ver. 10.4), RStudio, and HEC-RAS (ver. 5.0.1) were used to prepare the inundation map, to develop the ANN predictive hydrological model, and to model river flow in the natural channel, respectively. All packages were used under student licenses or open-source terms. For training the predictive ANN hydrological model, both spatial data (Topographical Wetness Index) and temporal data (7 years of daily rainfall and temperature) were used (Table 1). A spatial resolution of 30 m × 30 m was adopted, and all input parameters were prepared on this fixed grid.
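The text names the Topographical Wetness Index as the spatial input but does not state how it was computed. A minimal sketch, assuming the commonly used definition TWI = ln(a / tan β), where a is the specific catchment area and β the local slope, is shown below. The function name, the example values, and the minimum-slope safeguard are hypothetical and are not taken from the study; Python is used for illustration only, although the study itself worked in RStudio and ArcGIS.

```python
import math


def twi(specific_catchment_area_m, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index ln(a / tan(beta)) for one grid cell."""
    # Clamp very flat slopes so tan(beta) never reaches zero (assumed safeguard).
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_catchment_area_m / math.tan(beta))


# Hypothetical cells: same contributing area, different slopes.
gentle = twi(100.0, 2.0)
steep = twi(100.0, 20.0)
print(gentle > steep)  # gentler slopes give a higher wetness index
```

Cells with a large contributing area and a gentle slope receive high index values, which is why the index is a plausible proxy for where runoff tends to accumulate.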

Feedforward artificial neural network (ANN) model
To begin the modeling, the input parameters for the hydrological model were prepared based on their spatiotemporal variation. Daily rainfall (R) and temperature (T) data for the period 1999–2005 were first distributed on a grid with a spatial resolution of 30 m × 30 m. Inverse Distance Weighting (IDW) was used to convert the point climate data (rainfall and temperature) into spatial data, and the Topographical Wetness Index (TWI) was prepared at the same spatial resolution. The prepared spatiotemporal data were normalized to the range between 0 and 1. A feedforward network was selected in this study based on the studies conducted by de Vos and Rientjes (2005), Poonia (2018), Abhishek et al. (2012), Hung et al. (2009), and Arun and Baskaran (2013). For this class of ANN architecture, R, T, and TWI were assigned to the network as inputs (Dolling and Varas 2002; Tayebiyan et al. 2016). Random initial weights were generated and assigned to the links between the input and hidden nodes and between the hidden and output nodes. The input nodes, labeled 1, 2, and 3, receive the normalized (Eq. 2) input parameters and are connected to the hidden nodes, labeled 4, 5, and 6. The synaptic links between the input and hidden nodes were assigned the weights in the 1st, 2nd, and 3rd rows of the weight matrix (Eq. 1), and the links between the hidden and output nodes were assigned the weights in the 4th row of the matrix (Amengual et al. 2007). Once the input nodes receive the normalized input parameters (rainfall, temperature, and TWI), the weighted sum of the input parameters and the initial weights is computed (Eq. 1).
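The normalization and feedforward steps described above can be sketched as follows. This is an illustrative sketch rather than the authors' RStudio implementation: it assumes min-max scaling for the normalization, a logistic sigmoid activation, a 3-3-1 layout without bias terms, and weights drawn uniformly from [-1, 1]. All sample values are hypothetical.

```python
import math
import random


def min_max_normalize(values):
    """Scale a list of numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def sigmoid(x):
    """Logistic activation; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(seed=0):
    """Random initial weights for a 3-3-1 network (no bias terms: an assumption)."""
    rng = random.Random(seed)
    # w_hidden[i][j]: weight from input node i (nodes 1-3) to hidden node j (nodes 4-6).
    w_hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
    # w_out[j]: weight from hidden node j to the single output node.
    w_out = [rng.uniform(-1, 1) for _ in range(3)]
    return w_hidden, w_out


def forward(inputs, w_hidden, w_out):
    """One feedforward pass: weighted sums, then activation, layer by layer."""
    hidden = [sigmoid(sum(inputs[i] * w_hidden[i][j] for i in range(3)))
              for j in range(3)]
    output = sigmoid(sum(hidden[j] * w_out[j] for j in range(3)))
    return hidden, output


# Hypothetical daily values for one grid cell (not data from the study).
rain = min_max_normalize([0.0, 12.5, 30.0, 4.2])
temp = min_max_normalize([24.1, 22.8, 21.5, 25.0])
twi = min_max_normalize([6.3, 6.3, 6.3, 9.1])

w_hidden, w_out = init_weights(seed=42)
hidden, runoff_norm = forward([rain[2], temp[2], twi[2]], w_hidden, w_out)
print(runoff_norm)  # normalized runoff estimate from untrained weights
```

With random initial weights the output is only a rough first guess; it becomes meaningful after the weights have been adjusted during training.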

Back propagation
In feedforward propagation, the input parameters are pushed forward to obtain a rough solution at the output node, and no attempt is made to minimize the error between the network result and the target output (Dar 2017; Tayebiyan et al. 2016; Malmgren and Nordlund 1996). The initial weights assigned in the feedforward process serve only to start the modeling, and the accuracy of the model is very low at this stage (Shamseldin and O'Connor 2003). The purpose of backpropagation (Fig. 3) is to spread the error back into the network in order to minimize the error obtained in the feedforward process (Sattari et al. 2017; Timbadiya et al. 2011). The overall error obtained at the output layer is propagated back from the output node through the entire network (Mai and De Smedt 2017). Training in this sense means that the network learns from its mistakes through the learning algorithm built into the ANN (Abhishek et al. 2012; Hawkin 2014).
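To make the error-propagation step concrete, the sketch below trains a small 3-3-1 sigmoid network with plain gradient-descent backpropagation. The learning rate, the number of epochs, the squared-error loss, the absence of bias terms, and the training samples are all assumptions chosen for illustration; the text does not state which learning algorithm or settings were used in RStudio.

```python
import math
import random


def sigmoid(x):
    """Logistic activation; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def train(samples, targets, epochs=2000, lr=0.5, seed=1):
    """Train a 3-3-1 sigmoid network with gradient-descent backpropagation."""
    rng = random.Random(seed)
    w_h = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]  # input i -> hidden j
    w_o = [rng.uniform(-1, 1) for _ in range(3)]                      # hidden j -> output
    history = []  # sum of squared errors per epoch
    for _ in range(epochs):
        sse = 0.0
        for x, t in zip(samples, targets):
            # Forward pass: rough solution with the current weights.
            h = [sigmoid(sum(x[i] * w_h[i][j] for i in range(3))) for j in range(3)]
            y = sigmoid(sum(h[j] * w_o[j] for j in range(3)))
            err = y - t
            sse += err * err
            # Backward pass: error signal at the output node.
            delta_o = err * y * (1.0 - y)
            # Error signal propagated back to each hidden node.
            delta_h = [delta_o * w_o[j] * h[j] * (1.0 - h[j]) for j in range(3)]
            # Weight updates move each weight against its error gradient.
            for j in range(3):
                w_o[j] -= lr * delta_o * h[j]
                for i in range(3):
                    w_h[i][j] -= lr * delta_h[j] * x[i]
        history.append(sse)
    return w_h, w_o, history


# Hypothetical normalized inputs (rainfall, temperature, TWI) and runoff targets.
samples = [[0.0, 0.8, 0.2], [0.4, 0.6, 0.2], [1.0, 0.1, 0.9], [0.2, 0.9, 0.5]]
targets = [0.1, 0.3, 0.9, 0.2]

w_h, w_o, history = train(samples, targets)
print(history[0], history[-1])  # error in the first and the last epoch
```

The hidden-node error signals are computed before the output weights are updated, so each update uses the weights that actually produced the error.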

$$\text{Activation function} = \frac{1}{1 + e^{-x}} \tag{3}$$

Fig. 2 Assigned initial weights in the ANNs

HEC-RAS model
The HEC-RAS software is a computer program developed for modeling river flow through open natural channels and for computing water surface profiles (Mapping and Field 2017; Lamichhane and Sharma 2018; Duvvuri and Narasimhan 2013). HEC-RAS is widely accepted and used for river simulation by hydraulic engineers and researchers (Marimin et al. 2018) because it can simulate unsteady flow, identify flood-prone areas where the ground surface lies below the computed water surface profile, and let the researcher visualize the flood extent along a river course (Maidment 2017; Timbadiya et al. 2011). River geometries such as centerlines, bank lines, flow paths, and cross-section lines are the major parameters processed in HEC-RAS to generate flood-prone areas. A Digital Elevation Model (DEM) with a pixel resolution of 12.5 m × 12.5 m, downloaded from https://asf.alaska.edu/ (Li 2010), was used as input to extract these parameters. The flood inundation map generated in this study provides information on the spatially distributed flood depth and the flood-prone areas (Parhi 2013) along the Baro River. Coupled 1D and 2D models were implemented to generate the flood depth and flood-prone areas along the Baro River (Enea et al. 2018). The HEC-RAS model received the runoff from the tested ANN hydrological model as input and returned information on the spatial extent and depth of flooding along the river.
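The criterion that a cell is flood-prone when its ground level lies below the computed water surface can be illustrated with a few lines of code. This is only a conceptual sketch of that comparison, not of the hydraulic computation that HEC-RAS performs to obtain the water surface; the elevations and the constant water surface elevation are hypothetical, and the 12.5 m cell size follows the DEM resolution mentioned in the text.

```python
def flood_depth_grid(ground, water_surface):
    """Per-cell flood depth: positive where water surface exceeds ground, else 0."""
    return [[max(w - g, 0.0) for g, w in zip(g_row, w_row)]
            for g_row, w_row in zip(ground, water_surface)]


def flooded_area_m2(depth, cell_size_m=12.5):
    """Total area of cells with a positive flood depth."""
    flooded_cells = sum(1 for row in depth for d in row if d > 0.0)
    return flooded_cells * cell_size_m * cell_size_m


# Hypothetical ground elevations (m) with a channel running down the middle column.
ground = [
    [410.0, 409.2, 411.5],
    [409.8, 408.6, 410.9],
    [410.4, 409.0, 412.0],
]
# Hypothetical computed water surface elevation (m), constant over this small window.
water_surface = [[409.5] * 3 for _ in range(3)]

depth = flood_depth_grid(ground, water_surface)
print(flooded_area_m2(depth))  # area of the three inundated channel cells
```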

Integrated ANN and HEC-RAS models
The trained and tested ANN predictive hydrological model developed in this paper generates runoff (Dawson and Wilby 2010; Biragani 2016) and is linked to the HEC-RAS model to generate the flood extent along the river (Rajurkar et al. 2010). Whenever the ANN model receives the input parameters (rainfall, temperature, and Topographical Wetness Index) and computes runoff, HEC-RAS accepts that runoff as input and generates information on the spatial distribution of flooding and flood-prone areas along the river (Fig. 4).
The final corrected and updated values of weights in the ANN model are used to generate the runoff values whenever the input parameters are sent to the input nodes.
However, the Nash–Sutcliffe Efficiency (NSE) alone gives no information on model bias. The fit between simulated and observed values evaluated with NSE was therefore also assessed with the percent bias (PBIAS); the results are given in Table 3. In the evaluation equations, Y_i(Obs) and Y_i(Sim) are the observed and simulated values, respectively.
Further, the ANN predictive hydrological model result was evaluated with the coefficient of determination, R², which describes the proportion of variance shared by the observed and simulated values. The general equation for computing this coefficient is presented in Eq. (6) (Pérez-Sánchez et al. 2019), and the acceptable range for hydrological modeling is given in Table 2.
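The three evaluation measures can be sketched as below. The formulas are the widely used textbook definitions and are an assumption here, since the corresponding equations are not reproduced in the text: NSE as one minus the ratio of the error variance to the variance of the observations, PBIAS as the summed difference (observed minus simulated) relative to the summed observations, and R² as the squared Pearson correlation. The sign convention for PBIAS in particular may differ from the one the authors used, and the discharge values are hypothetical.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def pbias(obs, sim):
    """Percent bias; with this sign convention, positive means underestimation."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def r_squared(obs, sim):
    """Coefficient of determination as the squared Pearson correlation."""
    mean_o = sum(obs) / len(obs)
    mean_s = sum(sim) / len(sim)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return (cov / math.sqrt(var_o * var_s)) ** 2


# Hypothetical discharges (m3/s): the simulation is 10% too low everywhere.
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [9.0, 18.0, 27.0, 36.0]

print(nse(observed, simulated))
print(pbias(observed, simulated))
print(r_squared(observed, simulated))
```

In this example R² is essentially 1 although every simulated value is too low, which illustrates why a bias measure is reported alongside NSE and R².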

HEC-RAS model evaluation
The runoff values obtained from the ANN hydrological model were used as input to HEC-RAS to generate the flood inundation areas. The inundation maps generated in HEC-RAS for the training (1999–2005) and testing (2006–2008) periods were checked against the water body delineated using the Normalized Difference Water Index (NDWI). The flood events of 2005 and 2008 were detected while delineating the water body with NDWI (Ali et al. 2016; Enea et al. 2018). NDWI (Eq. 6) uses the Green (Band-2) and near-infrared (Band-4) bands of remote sensing images to extract a water body, with near-infrared (NIR) and short-wave infrared (SWIR) used as the main inputs. The inundation map generated in HEC-RAS and the water bodies delineated with NDWI from remotely sensed LANDSAT 8 imagery, downloaded from https://earthexplorer.usgs.gov/, were compared based on their overlapping areas (Bagherzadeh and Daneshvar, 2011).
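A per-pixel computation of the index can be sketched as follows, using the NIR and SWIR form given in the NDWI equation of this section. The reflectance values, the function names, and the threshold of 0 used to separate water from non-water are hypothetical; the text does not state which threshold was applied.

```python
def ndwi(nir, swir):
    """Normalized difference of the NIR and SWIR reflectance for one pixel."""
    total = nir + swir
    if total == 0:
        return 0.0  # avoid division by zero for empty pixels (assumed handling)
    return (nir - swir) / total


def water_mask(nir_band, swir_band, threshold=0.0):
    """Boolean grid: True where the index exceeds the (assumed) threshold."""
    return [[ndwi(n, s) > threshold for n, s in zip(n_row, s_row)]
            for n_row, s_row in zip(nir_band, swir_band)]


# Hypothetical 2 x 2 reflectance grids.
nir_band = [[0.30, 0.05], [0.25, 0.10]]
swir_band = [[0.10, 0.20], [0.05, 0.30]]

mask = water_mask(nir_band, swir_band)
print(mask)
```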
$$\mathrm{NDWI} = \frac{\mathrm{NIR} - \mathrm{SWIR}}{\mathrm{NIR} + \mathrm{SWIR}} \tag{6}$$

To get the percentage of overlapping area between the water body delineated with NDWI and the inundation map from HEC-RAS, the Intersect tool in the Analysis toolbox of ArcGIS (ver. 10.4) was used. First, both raster results (NDWI and HEC-RAS) were converted to vector polygons using the conversion tool (Scanlon et al. 2005), and the corresponding shape areas were then calculated using the geometry calculation algorithms in ArcGIS. The same coordinate system (Adindan UTM Zone 37 N) was set for both polygons, and the percentage of overlapping area was calculated with the following equation (Potential 2020; Wang et al. 2017a, b):

$$\text{Overlapping percentage (\%)} = \frac{\text{Layer 1 (km}^2)}{\text{Layer 2 (km}^2)} \times 100 \tag{7}$$

where Layer 1 is the area of the water body delineated with NDWI from the flood events and Layer 2 is the area of the inundation map generated in HEC-RAS.

ANN hydrological model result
The daily streamflow (runoff) values generated by the ANN predictive hydrological model developed in RStudio are presented in Figs. 5 and 6 for the training and testing periods. As indicated in Fig. 5, the daily runoff computed by the ANN model, compared with the corresponding daily discharge over the 7-year training period (1999–2005), gave NSE = 0.86 and PBIAS = 8.2%. As shown in Fig. 6, NSE = 0.88 and PBIAS = 8.5% were found for the 3-year testing period (2006–2008). Similar model evaluation results were reported by Kumar et al. (2020), Tsakiri et al. (2018), and Kan et al. (2020). As shown in Fig. 7a–g, the ANN model results were further evaluated using the coefficient of determination (R²) with a scatter plot for each year; values of 0.96, 0.96, 0.93, 0.93, 0.89, 0.93 and 0.92, respectively, were obtained for the training period, which are rated very good (Kan et al. 2020). Fig. 7h shows a scatter plot of the average simulated daily ANN results against the corresponding observed discharges for the entire period; this result is slightly better than the individual scatter plots. The points in the yearly scatter plots are not tightly concentrated along the regression line; however, the goodness of fit and the performance rating for R² are very good. The model evaluation for the training period is poorer than for the testing period, as can be seen from Table 3.
The 7-year simulated discharge from the hydrological model and the actual daily discharge gauged at the basin outlet were processed in RStudio; their scatter plot is presented in Fig. 7g and gives a regression R² = 0.89. Similarly, Tayfur et al. (2018) and Pérez-Sánchez et al. (2019) state that R² values between 0.85 and 1 indicate a very good model. The Artificial Neural Networks applied as hydrological models in Villada et al. (2012), Lateef (2017), and Dibaba et al. (2020) were acceptable with values comparable to those summarized in Table 3.
The HEC-RAS results are presented for the calibration (Fig. 7) and validation (Fig. 8) periods. The inundation map generated in HEC-RAS from the 7 years of ANN hydrological results (1999–2005) and the water body delineated with NDWI from the flood event of 2005 are presented in Fig. 7a and b, respectively. The blue color in both figures is the water body in the area of interest.
The HEC-RAS result was further evaluated with the ANN hydrological model result obtained for the validation period (2006–2008); the resulting inundation map and the water body delineated with NDWI from the flood event of 2008 are presented in Fig. 8a and b, respectively. From the overlapping areas computed in ArcGIS (ver. 10.4), 94.6% and 96% of the areas of the inundation map generated in HEC-RAS and the water body delineated with NDWI intersected during the calibration and validation periods, respectively. According to Bagherzadeh and Daneshvar (2011) and Mai and De Smedt (2017), who evaluated inundation maps based on overlapping areas, an overlap of more than 85% is considered good agreement.
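The overlap computation can be illustrated on two small binary grids. The sketch follows the ratio described in the text, with Layer 1 as the NDWI water-body area and Layer 2 as the HEC-RAS inundation area, but it counts raster cells instead of measuring polygon areas in ArcGIS. The masks, the 30 m cell size used here, and the function names are hypothetical.

```python
def area_km2(mask, cell_size_m=30.0):
    """Area of all True cells in a binary grid, in square kilometres."""
    cells = sum(1 for row in mask for value in row if value)
    return cells * cell_size_m ** 2 / 1e6


def overlap_percentage(layer1_km2, layer2_km2):
    """Layer 1 area as a percentage of Layer 2 area."""
    return 100.0 * layer1_km2 / layer2_km2


# Hypothetical masks: 1 marks water (NDWI) or inundation (HEC-RAS).
ndwi_mask = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
]
hecras_mask = [[1] * 5 for _ in range(4)]

layer1 = area_km2(ndwi_mask)    # water body delineated with NDWI
layer2 = area_km2(hecras_mask)  # inundation map from HEC-RAS
print(overlap_percentage(layer1, layer2))
```

A value above the 85% level cited in the text would count as good agreement under that criterion.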

Conclusion
In this study, integrated machine-learning and HEC-RAS models for flood inundation mapping in the Baro River Basin (Ethiopia) are presented. An ANN as the predictive hydrological model and HEC-RAS as the hydraulic model were integrated. The ANN model results for the training (1999–2005) and testing (2006–2008) periods were evaluated with the Nash–Sutcliffe Efficiency (NSE), PBIAS, and the coefficient of determination (R²). NSE values of 0.86 and 0.88 and PBIAS values of 8.2% and 8.5% were obtained for the training and testing periods, respectively. The ANN model result for each year (1999–2005) was further evaluated graphically, with R² values of 0.96, 0.96, 0.93, 0.93, 0.89, 0.93 and 0.92, respectively. The HEC-RAS and NDWI results overlapped by 94.6% and 96% during the calibration and validation periods, respectively. The application of the integrated ANN and HEC-RAS models to flood inundation mapping was successful, and the approach is recommended as a possible alternative for flood risk management strategies. Finally, it is concluded that integrated machine-learning and HEC-RAS models for flood inundation mapping are an appropriate tool for flood risk management and early warning systems.

Author contributions
HT proposed the research title, designed the methodology, and analyzed the data. MOD collaborated with the corresponding author in interpreting the results and preparing the manuscript, brought it up to standard, and read and approved the final manuscript.

Funding
The study did not receive any external funding.

Availability of data and material
All data generated during the manuscript analysis are included in the article. Further datasets are available from the corresponding author upon request.

Conflict of interest
The authors declare that they have no competing interest.

Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.