1 Introduction

In the coming decades, climate change will likely become a primary problem affecting hydrological regimes and flood hazard conditions. According to the IPCC report, significant changes in atmospheric temperature, precipitation, humidity, and circulation are expected, resulting in increased extreme events, including floods, droughts, heatwaves, heavy rainfall, and more intense cyclones (Knutti and Sedláček 2013; IPPC 2013). According to Serdeczny et al. (2016) and Rojas et al. (2013), flood frequency will be increased due to climate change and will have a massive socio-economic impacts. Serdeczny et al. (2016) stated that there have been climate-linked changes in the magnitude, frequency and timing of flood characteristics in different regions of Africa. The autumn and winter flood are significantly increasing due to a large positive change in rainfall intensity and frequent during the autumn and winter seasons. In contrast, the decline in peak floods are due to decreasing future precipitation and increasing evapotranspiration (Coulibaly et al. 2020; Thober et al. 2018; Balke and Nilsson 2019; Blöschl et al. 2019). The understanding of catchment-scale flood hazard projections requires a chain of linked concepts and comprehensive framework. Therefore, developing a comprehensive cascade modeling that considers these principal components of climate change and hydrologic modeling framework is crucial.

Assessing the impact of climate change on local hydrological extremes is difficult, mainly due to the uncertainty in propagating information from a global coarse resolution model to the local scale. Meresa and Romanowicz (2017) studied the critical uncertainty embedded (hydrological parameter uncertainty (HBV), climate models (RCP), and distribution parameter uncertainty (Generalize Extreme Value (GEV)) in the projection of hydrological extremes and addressed that local-scale impact study is highly prone to climate models chain error. Joseph et al. (2018) also considered hydrological parameters (VIC) and climate models (RCP) uncertainty assessment in seasonal flow projections. Soriano et al. (2019) evaluated the uncertainty of bias correction methods on flood frequency and found that the climate model's bias correction is significant and brought significant change in the magnitude of flood design values. Feng and Beighley (2020) also identified the uncertainties from both climate forcings and hydrologic model components as well as associated parameterization. They compared the contribution of climate models, hydrological model structure and hydrological parameters using the seasonal flow indices as hydrological indicators. For the changes in seasonal mean flow, hydrologic models and climate models are two main uncertainty contributors. In contrast, the role of hydrologic parameters is not significant in seasonal flow and high flow (Meresa et al. 2021). The uncertainty in future climate flood hazard assessment arises from the uncertainties in the different component elements of the modeling chain used to calculate streamflow from model-derived estimates of future rainfall. These include uncertainties in the model-estimated parameters of future rainfall and temperature, climate downscaling approaches for calibrating meteorological conditions at a particular basin, the structure and parameters of the hydrological model of streamflow, and the flood frequency model used to assess flood recurrence periods (Bastola et al. 2011; Charles et al. 2019; Meresa 2020; Lawrence 2020; Meresa et al. 2021). Therefore, it is crucial to estimate and understand the complex and uncertain impact and challenging to identify the primary sources of uncertainty in future hazard estimation.

In the other hand, different climate projects (SRES, CORDEX, CMIP) have employed different ensembles of regional climate model (RCMs) and global climate models (RCMs) with different atmosphere-land modules to derive a wide range of possible future climate conditions. Therefore, the locally projected hydrological extremes are highly dependent on the spread (or range) of forcing conditions from the ensemble of the RCM/GCM estimates employed in the local impact study area (Woldemeskel et al. 2014; Meresa and Romanowicz 2017; Hattermann et al. 2018; Charles et al. 2019; Meresa 2020; Lawrence 2020). The information from the spreads of multiple GCMs is considered as a band of impact uncertainty and quantified using the variance of the climate models (Meresa and Romanowicz 2017; Lawrence 2020). Bias correction methods of GCMs output are also essential and significantly impact the projected hydrological extremes magnitude and direction (Pierce et al. 2015; Hattermann et al. 2018; Charles et al. 2019). Similarly, the biases correction of GCMs output was applied on a grid commensurate with daily climate model output. Nowadays, many researchers have been used and addressed the importance of climate bias correction methods (Kay et al. 2009; Saini et al. 2015; Soriano et al. 2019). Quantile mapping, empirical mapping, simple statistical transformation, and joint probability are the most popular and widely applied for climate model bias correction and related to observed climate time series. These techniques will reduce the original climate model variability and outliers and prepare for forcing hydrological models. From the above study, the major source of uncertainty in climate projection is explained by selecting bias correction methods and the spread of selected GCMs. Furthermore, uncertainty is also introduced through the structure and parameterization of hydrological models. Hydrological models translate the meteorological conditions of rainfall and temperature into predicted stream flows using parameterizations of varying complexity, which must be calibrated with measurement databases. Different hydrological models may show varying skill levels in simulating different kinds of flow regime, and this is important for assessing future climate-linked flood extremes (for e.g., Pechlivanidis et al. 2017; Joseph et al. 2018; Her et al. 2019). Of which, many of the studies attempt to address the uncertainty of the hydrological parameters in climate change impact study using GLUE (Beven and Binley 1992), multi-objective function optimization approach (Dakhlaoui et al. 2017; Zhang et al. 2019), Bayesian Model Averaging (Beigi et al. 2019). This implies that on top of the spread of climate models and bias correction methods, the hydrological parameter uncertainty using GLUE is also important in climate change impact on hydrological extremes. In addition to these sources of uncertainty, the flood frequency associated uncertainty is also important. Collet et al. (2017), Kay et al. (2009), Meresa and Romanowicz (2017), Lawrence (2020) found that the flood frequency distribution models are highly dependent on the distribution of extreme floods, the number of events, and the size and shape of the catchment. This shows that a significant effort has been made to evaluate climate change impacts on extreme hydrology, mostly focusing on identifying trends in extreme low flow and high flow magnitude with their associated uncertainty estimation. However, there is little attention to uncertainty and impact estimation in projected peak flood magnitude and frequency. This study applies comprehensive uncertainty and impact estimation of climate change on flood hazard approach.

This investigation has therefore been structured to evaluate the CMIP6 climate data set to assess hydrological extreme conditions at selected catchments in the Awash basin of Ethiopia. The location is a particularly effective test site with a large historical meteorological and hydrological data archive and is of particular interest because of large extremes. This includes to (i) Compare different bias correction techniques for climate change impact assessment, (ii) Estimate the hydrological structure and parameter uncertainly, (iii) Estimate the impact and uncertainty of climate change and bias correction approach on flood projections, and (iv) Identify the source of uncertainty associated with projected flood hazards.

2 Study area and hydro-climate datasets

Five sub-catchments were selected for climate change impact assessment. These are located in different geographical and hydro-climatic regimes (Table 1 and Fig. 1) and inside the Awash basin study area. Awash_K (@U/S of Koka), Akaki (@ Aba S.), Kela (@ Welen), Awash_B (@ Bello), and Awash_H (@ Melka H) catchments are most representative for flood hazard estimation under varying climate conditions. The drainage area of the selected catchments ranges from 67.5 km2 for Welen to 7093.9 km2 for U/S of Koka. The elevation of these catchments varies from 1602 m (@ U/S of koka) to 2300 m (@ Melka H). The historical records of daily precipitation, temperature, and streamflow data were taken from the Ethiopian meteorological and hydrological institute for the period 1981–2010. The selected stations are located inside the Awash basin study area. The highest observed flow rates from the daily time series for the basins ranges from 82 m3/s (@ Welen) to 884 m3/s (@ U/S of Koka), which is proportional to the catchment area. Simultaneously, the annual maximum precipitation (or the 95% of annual precipitation) is not proportional to the catchment area and annual maximum flow (or 95% of annual flow). The high mean of surface runoff is directly related to the high coefficient of variance in river flow (e.g., U/S of Koka).

Table 1 Statistical description of the selected gauged stations
Fig. 1
figure 1

Location of the selected case study. The red dots are the main hydrological stations chosen in this study

3 Modeling and numerical experiments

The impacts and uncertainty of projected flood hazard for selected Awash catchments were examined using a new climate dataset (CMIP6), three bias correction techniques (statistical quantile factor (SQF), distribution quantile mapping (DQM), and empirical quantile mapping (EQM)), four hydrological models (HBV, SMART, NAM and HYMOD) with GLUE (generalize likelihood uncertainty estimation)—Monte Carlos simulation of 30,000 hydrological parameter sets and general extreme value (GEV)) (Fig. 2). In this study, four stages of numerical modeling experiments have been performed. The investigation started by selecting natural and/or semi-natural catchments and then extracting, correcting, and evaluating new climate datasets (CMIP6) using the three statistical techniques. A sampling of 30,000 hydrological parameters set using Monte Carlo simulation (MCS) for hydrological parameter uncertainty band estimation using GLUE (Beven and Binley 1992), simulation of flood hazard projections using HBV, SMART, NAM and HYMOD conceptual hydrological models for hydrological model structure and parameter impact analysis. Finally, generalized extreme value (GEV), log normal (LNOR), Pearson-III (PEAR-II) was applied to estimate the extreme flood hazard magnitude and frequency (Fig. 2).

Fig. 2
figure 2

Research flow chart to estimate projections of flood hazard and identify associated uncertainty

3.1 Coupled model intercomparison project phase 6 (CMIP6)

CMIP6 is a new climate dataset project (https://pcmdi.llnl.gov/CMIP6/) which was released by a collaborative effort of different climate research institutes to advance climate change knowledge and applicability. Precipitation and air temperature daily time series were extracted from the CMIP6 simulation datasets (https://esgf-node.llnl.gov/search/cmip6/). The climate models output covers 1971–2100 time period and provides global scale spatial resolution, ranging from 50 to 250 km. The three shared socioeconomic pathways (SSP) scenarios used in CMIP6 are SSP126 (lower level), SSP370 (middle level), and SSP585 (high level). In this study, 12 climate models and three CMIP6 scenarios (SSP126, SSP370, SSP585) were used to evaluate the projected flood hazard's impact and uncertainty in the selected Awash catchments (see Table S1 for climate model name and model details). The catchment centroid points were bracketed by four grid points of the coarse resolution GCMS, and the air temperature and precipitation from these points were extracted for further analysis. The climate models listed in Table S1 are all global models with 50-250 km resolutions. The CORDEX project has a fine resolution than CMIP6 and is assumed to simulate better than coarse resolution climate datasets. However, these GCMs can capture the variability and seasonality of precipitation and temperature better than those with high resolution (Almazroui 2019).

3.2 Climate projections—bias correction techniques

Global and regional climate models (GCM/RCMs) are essential for enlightening our understanding of annual, seasonal, and daily precipitation and air temperature characteristics. Climate model predictions differ from each other because of differences in the way fluxes are parameterized and as well as the underlying grid structure. This gives rise to biases in their outputs (GiorgiI and Gao 2018; Krinner and Flanner 2018), making it challenging to develop accurate and reliable climate information. It has long been recognized that climate model bias must be corrected or ‘calibrated’ before use in other applications like the hydrological extremes projections (e.g., Ehret et al. 2012; Teng et al. 2012; Meresa and Romanowicz 2017; Osuch et al. 2017) in hydrology and water resources. However, minimizing the bias in climate model output is still a big challenge. There is a problem finding/developing a robust climate bias correction technique that reduces the biases between the observed and simulated climate data in the historical period. Previous studies have had mixed success in the application of different bias correction methods for precipitation (including intensity and wet days) and temperature data. For instance, Teutschbein and Seibert (2013), Yang et al. (2010) showed the distribution mapping based on theoretical distributions outperforms other bias correction methods in their result. Similarly, Berg et al. (2012), Chen et al. (2013) showed that theoretical distribution mapping performs similar to, or only marginally better than, empirical quantile mapping or other statistical methods. Contrary, Gudmundsson et al. (2012), Gutjahr and Heinemann (2013), Lafon et al. (2013) show that empirical quantile mapping demonstrates higher skill than theoretical distribution mapping in systematically correcting RCM precipitation. In this study, we used three climate bias techniques to correct the raw climate output and examine the contribution of bias correction methods to the flood hazard magnitude and frequency.

3.2.1 Statistical quantile factor (SQF)

This study proposed a new bias correction technique to compare with the commonly used methods (DQM and EQM). The statistical quantile factor change technique (SQM) of bias correction is a direct and straightforward precipitation correction, considered a simple error transfer from the historical to a future period. This involves correcting the future daily precipitation (\(P_{{{\text{fut}}, {\text{corr}}}}\)) by multiplying the sum of average ratio and reciprocal ratio of observed precipitation (\(P_{{{\text{obs}}}}\)) and reference precipitation simulation (\(P_{{{\text{ref}},\,{\text{raw}}}}\)) to the prediction of raw climate model precipitation (\(P_{{{\text{fur}},\,{\text{raw}}}}\)). By contrast, future air temperature is corrected (\(T_{{{\text{fut}}, {\text{corr}}}}\)) with an additive/subtractive constant applied to the raw climate output (\(T_{{{\text{fur}},\,{\text{raw}}}}\)) and minimum difference of the raw (\(T_{{{\text{ref}},\,{\text{raw}}}}\)) and observed (\(T_{{{\text{obs}}}}\)) air temperature in the reference period.

$$ P_{{{\text{fut}}, {\text{corr}}}} = \frac{{P_{{{\text{fur}},\,{\text{raw}}}} }}{2}*\left[ {\frac{{P_{{{\text{obs}}}} }}{{P_{{{\text{ref}},\,{\text{raw}}}} }} + \frac{{P_{{{\text{ref}}, {\text{raw}}}} }}{{P_{{{\text{obs}}}} }}} \right] $$
(1)
$$ T_{{{\text{fut}}, {\text{corr}}}} = T_{{{\text{fur}},\,{\text{raw}}}} + {\text{min}}\left( {\left( {T_{{{\text{obs}}}} - T_{{{\text{ref}},\,{\text{raw}}}} } \right), \; \pm \;2.2} \right) $$
(2)

3.2.2 Distribution quantile mapping (DQM)

Distribution quantile mapping is a parameter bias correction technique that depends on the type of distribution fitted to observed and simulated climate data (Piani et al. 2010). This distribution-based approach can be single or double distribution quantile mapping (S/DQM) or other distributions. The excess number of dry days, drizzles and wet days were considered and properly corrected in this method. For every N year, the zero precipitation is removed, and the single Gamma distribution is fitted to the upper 75% of daily precipitation. In contrast, the double Gamma distribution fitted to both upper and lower parts of 75% of the daily precipitation.

$$ P_{{{\text{corr}}}} = F_{dg}^{ - 1} \left( {F_{dg} \left( {P_{{{\text{raw}}}} \left( t \right),\alpha_{{{\text{raw}}}} ,\beta_{{{\text{raw}}}} } \right),\alpha_{{{\text{Obs}}}} ,\beta_{{{\text{Obs}}}} } \right) $$
(3)
$$ T_{{{\text{corr}}}} = F_{dn}^{ - 1} \left( {F_{dn} \left( {T_{{{\text{raw}}}} \left( t \right),\alpha_{{{\text{raw}}}} ,\beta_{{{\text{raw}}}} } \right),\alpha_{{{\text{Obs}}}} ,\beta_{{{\text{Obs}}}} } \right) $$
(4)

where Pcorr and Tcorr stand the bias-corrected daily precipitation and temperature, respectively. Likewise, Praw and Traw represent raw climate output, daily precipitation, and air temperature, respectively. The raw climate output inverse cumulative density (CDF) is symbolized by Fdg and Fdn for precipitation and temperature, respectively. The dn and dg subscripts stand for the normal (for temperature) and Gamma (for precipitation) distributions. The Gamma (for precipitation) distributions have two parameters, shape and scale parameters are symbolized by α and β, and the normal (for temperature) distribution standard deviation and mean are symbolized by σ and µ, respectively.

3.2.3 Empirical quantile mapping

Unlike the distribution mapping approach, the empirical quantile mapping is based on a paired-wise comparison between empirical cumulative density function (ecdf) of observed and simulated precipitation at the reference period. This is purely empirical and directly matching the observed histogram to the future period. The future precipitation and temperature are corrected using the inverse of ecdf (\({\text{ecdf}}^{ - 1}\)) and fitted ecdf

$$ P_{{{\text{hst}},m,d}}^{{{\text{cor}}}} = \left( {{\text{ecdf}}_{{{\text{obs}},\,m}}^{ - 1} \left( {{\text{ecdf}}_{{{\text{hst}}, m}} \left( {P_{{{\text{hst}}, m,d}} } \right)} \right)} \right) $$
(5)
$$ T_{{{\text{hst}},m,d}}^{{{\text{cor}}}} = \left( {{\text{ecdf}}_{{{\text{obs}},m}}^{ - 1} \left( {{\text{ecdf}}_{{{\text{hst}}, m}} \left( {T_{{{\text{hst}}, m,d}} } \right)} \right)} \right) $$
(6)

3.3 Bias correction performance evaluation

The performance of the selected statistical bias correction technique was evaluated using four statistical measures: correlation coefficient (RR), mean absolute error (MAE), root mean square error (RMSE), percent of bias (PBIAS), and maximum precipitation weighting root means square error (PWRMSE). These statistical measures were applied to daily rainfall and temperature time series of climate model output. The time series-based evaluation was performed by comparing the capacity of each approach to generate precipitation and temperature.

$$ {\text{RR}} = \frac{{\sum {\left( {P_{{\text{S}}} - P_{{\text{S}}}^{ - } } \right)*\left( {P_{{\text{C}}} - P_{{\text{C}}}^{ - } } \right)} }}{{\sqrt {\sum {\left( {P_{{\text{S}}} - P_{{\text{S}}}^{ - } } \right)^{2} *\sum \left( {P_{{\text{C}}} - P_{{\text{C}}}^{ - } } \right)^{2} } } }} $$
(7)
$$ {\text{MAE}} = \frac{{\sum\nolimits_{i = 1}^{n} {\left| {P_{{\text{S}}} - P_{{\text{C}}} } \right|} }}{N} $$
(8)
$$ {\text{PBIAS}} = \frac{{\sum\nolimits_{i = 1}^{n} {(P_{{\text{S}}} - P_{{\text{C}}} )} }}{{\sum\nolimits_{i = 1}^{n} {P_{{\text{S}}} } }} $$
(9)
$$ {\text{PWRMSE}} = \left[ {\frac{{\sum\nolimits_{i = 1}^{n} {\left( {P_{{\text{C}}} - P_{{\text{C}}} } \right)^{2} *\left( {\frac{{P_{{\text{S}}} + \overline{{P_{{\text{S}}} }} }}{{2*\overline{{P_{{\text{S}}} }} }}} \right)} }}{N}} \right]^{0.5} $$
(10)

where Ps is observed precipitation at a given station, Pc is corrected precipitation, N number of observations.

3.4 Hydrological modeling

Many hydrological models have been developed to understand hydrological processes on local and global spatial scales. There are vast ranges of hydrological model types; most of the hydrological models have considered physical mechanism and empirical equations to describe the catchment processes. The hydrological parameter set is one type of uncertainty source in extreme hydrological modeling. Precipitation-runoff models are often made up of a series of equations that describe a simplified description of hydrological processes at the small to large catchment scale. Lumped models treat the catchment process as a single unit (Keith 2001), with all variables being averaged over it at a given time scale. This study looks at three different hydrological models with various structures. Each model gets forcings of catchment average precipitation and potential evapotranspiration input time series data at daily time-steps. At daily intervals, the models estimate actual evapotranspiration, change in stored water in various components, including the slow and quick runoff.

3.4.1 Hydrologiska Byråns Vattenbalansavdelning (HBV)

The HBV (Bergström 1976) is widely applied in different hydro-climate conditions of the world (Meresa and Gatachew 2019; Meresa et al. 2017; Her et al. 2019). The model is mainly designed to simulate streamflow using precipitation, temperature, and evapotranspiration (estimated using Hamon 1964) climate variables. It is a daily lumped conceptual hydrological model with fourteen parameters and applied in many parts of the world (Meresa et al. 2017; Meresa and Gatachew 2019). In this study, HBV was used with eleven hydrological parameters (Table S2). HBV model structure has three consecutive stores: one related to the surface, the second associated with the saturated soil layer, and the other to the unsaturated routing store. Daily precipitation and daily evapotranspiration (estimated using Hamon 1964) are the primary climate variable that are used as inputs to the model. The detailed information about the model is described in Bergström (1976). The upper and lower limits of the three hydrological models are listed in Table S2. The details of these hydrological models, including their physical meaning and description of each model parameter, are explained Bergström (1976).

3.4.2 Nedbr-Afstrmnings-model (NAM)

The NAM (Nielsen and Hansen 1973) is widely used all over the globe. It has been used to estimate the direct and runoff and investigate the contributions of groundwater and surface water to streamflow in catchments (O'Brien et al. 2013). The NAM model structure applied in this study has two main storage reservoirs for soil moisture accounting, including runoff routing and reservoirs, to characterize four hydrological process pathways. The model has eight parameters to govern the moisture content in both storages representing the surface runoff, soil and groundwater storages, and three parameters conveying to the runoff routing components (Table S2).

3.4.3 Soil moisture analysis rainfall tool (SMART) model

The SMART model (Mockler et al. 2016, 2014) was created as a hydrological model for water quality simulations. Individual contributions to hydrological flow paths are highlighted in the model. The model has ten parameters to control the precipitation-runoff process at a catchment scale (Table S2). Following the approach of SMARG and its predecessors, SMART simulates hydrological flows utilizing conceptual soil moisture accounting equations based on a number of soil moisture layers. The depths of these soil moisture layers vary based on the characteristics of the watershed, and they represent average conditions across the catchment. The four conceptual flow paths' numerical procedures involve a total of ten parameters, four of which are related to flow routing. The model, like the SMARG, comprises six equal-depth soil layers and six soil moisture accounting parameters (Table S2). Because drain flow can be an important conduit for nutrients in agricultural catchments (e.g., Madison et al. 2014), it is included as a separate flow path in the model. It is related to soil moisture surplus and the drain parameter (S), fluctuating between 0 and 1. The soil outflow coefficient is used to determine interflow, a mix of soil moisture excess and outflow from the soil layers (D). Individual outflow equations also related to the outflow coefficient (D) parameter, and used to compute shallow and deep groundwater at each subsurface layer.

3.4.4 HYdrologic MODel (HYMOD)

The HYMOD is a conceptual model that is lumped the hydrological cycle components together (Boyle 2000; Sun et al. 2010). The rainfall excess and two parallel sets of linear tanks are the two fundamental components of this scheme. It was decided to use a modified version of the model. The model now includes a snow module based on degree days. The degree-day factor DD, the precipitation/degree-day relation (Dew), and the threshold temperature are the three factors that control the melting storage. The maximum storage capacity (Cmax) and the degree of geographic variability of soil moisture capacity within the watershed determine soil moisture content. Excess water from the soil zone flows into quick-flow tanks and groundwater based on a partitioning factor that divides the flow between fast and slow reservoirs (v1 and v2). The water is dispersed into three linear reservoirs in series, one for the quick runoff component and one for groundwater flow in parallel with the other flows. The residence times Rs and Rq are used to classify reservoirs (Table S2).

3.5 Hydrological model parameter selection and evaluation

There are different ways of parameter sampling from the upper and lower boundary of hydrological parameters, which depend on the computing times and number of parameters in a specific hydrological model. Beven and Binley (2014) stated that there is no fixed threshold in parameter sampling that varies from ten thousand to a hundred thousand parameter sets. In this study, 30,000 parameter sets were generated from three hydrological model parameters range (Table S2). Characteristics for high flow and low flows have different and need two parallel calibration techniques (Meresa and Romanowicze 2017). Meresa and Romanowicze (2017) stated that the NSE likelihood function is relatively useful for high flow simulation due to the interest in peak flow and LogNSE for low flow. This study used the NSE, LogNSE, and KGE objective functions to simulate high flow regimes and evaluated against observed streamflow. Based on the model performance, 2000 sets of hydrological model simulations were selected as behavioral conditions.

$$ {\text{NSE}} = 1 - \frac{{\mathop \sum \nolimits_{t = 1}^{j} \left( {Q_{o,t} - Q_{m,t} } \right)^{2} }}{{\mathop \sum \nolimits_{t = 1}^{j} \left( {Q_{O,t} - \overline{Q}_{o} } \right)^{2} }} $$
(11a)
$$ {\text{LNSE}} = 1 - \frac{{\mathop \sum \nolimits_{t = 1}^{j} \left( {{\text{ln}}(Q_{o,t} } \right) - {\text{ln}}\left( {Q_{m,t} } \right)^{2} }}{{\mathop \sum \nolimits_{t = 1}^{j} \left( {{\text{ln}}(Q_{O,t} } \right) - {\text{ln}}\left( {\overline{Q}_{o} } \right)^{2} }} $$
(11b)

where Qo, t and Qm, t is the observed and simulated flow at time t, Qo is the mean observed flow, and j is the length of the jth time series.

$$ {\text{KGE}} = 1 - \sqrt {\left( {r - 1} \right)^{2} + \left( {\alpha - 1} \right)^{2} + \left( {\beta - 1} \right)} $$
(11c)

with \(\beta = \frac{{Q_{{{\text{obs}}}} }}{{\overline{Q}_{{{\text{obs}}}} }}\); \(\alpha^{2} = \frac{{Q^{2}_{{{\text{sim}}}} }}{{\sigma_{{{\text{obs}}}}^{2} }}\); \(r = \frac{1}{n}\mathop \sum \limits_{1}^{n} \frac{{\left( {Q_{{{\text{obs}}}} - \overline{Q}_{{{\text{obs}}}} } \right)(Q_{{{\text{sim}}}} - \overline{{Q_{{{\text{sim}}}} )}} }}{{\sigma_{{{\text{obs}}}} \sigma_{{{\text{sim}}}} }}\).

Where Qobs and Qsim are the observed and simulated flows, σobs and σsim are, respectively, the standard deviation of the observed and simulated flows. The best values of each objective function were selected for hydrological model structure and parameter uncertainty analyses.

The hydrological model runs using the entire space of hydrological parameters combination and evaluated using goodness-of-fit criterion (Beven 2007). Assume likelihood function H (X) to separate the non-behavioral and behavioral simulations produced by different variables X, such as input data, hydrological model parameters, hydrological model structures, and extreme frequency models. Every ith values of the variable X has its own one likelihood measure at time t. The ensemble of each variable Xi (i = 1…m) provides the multi-likelihood measure values H (Xi). The generalized likelihood uncertainty estimation (GLUE) function is shown in Eq. (12). The std/variance of residual σ2e value is the error in the estimated results affected by the model parameters/input data/hydrological model type/extreme frequency distribution models. If the estimated value of σ2e is near equal to the estimated maximum likelihood or equal to the std/variance of the observation data σ2o, the likelihood measure H (X) is equal to zero, which indicates extremely high uncertainty.

$$ H\left( X \right) = 1 - \frac{{\sigma_{O}^{2} }}{{\sigma_{e}^{2} }} $$
(12)

In this study, GLUE approach was used to estimate the uncertainty associated with hydrological parameters.

3.6 Flood frequency analysis

Extreme frequency analysis is crucial to understand the probability of reoccurrence of flood events at different return periods. This information plays an essential role in flood control and water resource planning. However, high flow frequency mainly depends on the frequency distribution model type and number of distribution parameters. Depending on the environmental and climatic background, many distributions are deployed in various countries by many researchers to estimate the frequencies of high flow. For example, the Log-Pearson III distribution model is popular in the USA and Australia for infrastructure design (Griffis and Stedinger 2007), GEV and PEAR-III in Europe (Refsgaard et al. 2013) and Africa (Meresa and Gatachew 2019), and Wakeby and Log-Normal distribution types have been frequently used in Asian countries (Chen et al. 2012). However, one or two statistical distribution models may not capture the entire temporal and spatial variability of hydrological extremes. Therefore, the most commonly used distribution type (GEV) was applied for flood hazard frequency and magnitude curve development. The distribution was fitted to peak flow to understand the hydrological flood in the selected Awash catchments. In Eq. (13), the probability density function (PDF) is presented and has three parameters.

$$ {\text{GEV}}\quad f\left( x \right) = \left\{ {\begin{array}{*{20}l} {\frac{1}{\sigma }\exp ( - \left( {1 + kz)^{{ \wedge - \frac{1}{k}}} } \right)\left( {1 + kz} \right)^{{ - 1 - \frac{1}{k}}} } \hfill & {k \ne 0} \hfill \\ {\frac{1}{\sigma }\exp \left( { - z - \exp \left( { - z} \right)} \right)} \hfill & {k = 0} \hfill \\ \end{array} } \right. $$
(13)

where α is scale parameter, β shape parameter and k location parameter.

3.7 Uncertainty decomposition and estimation

In this study, three main variables, climate models, bias correction techniques, and hydrological parameters, were used to identify the relative uncertainty contribution on the total flood hazard magnitude. Unlike additive or multiplicative uncertainty estimation methods, ANOVA can decompose the aggregated source of uncertainty into individuals' and their interaction using specific extreme flow indices (Meresa and Romanowicz 2017). Three sources of uncertainty in flood hazard projections drive from climate models, bias correction methods, and hydrological parameters were estimated using the variances decomposition approach (ANOVA). In this study, n-way of ANOVA was used to distinguish the main variable effects and their interaction effect on the aggregated extreme frequency indices.

$$ {\text{SST}} = {\text{SS}}_{{{\text{CM}}}} + {\text{SS}}_{{{\text{BC}}}} + {\text{SS}}_{{{\text{HP}}}} + {\text{SS}}_{{{\text{CMBC}}}} + {\text{SS}}_{{{\text{CMHP}}}} + {\text{SS}}_{{{\text{BCHP}}}} + {\text{Error}} $$
(14)

where SSCM is the sum of the standard error of the climate models, SSBC is the sum of the standard error of the bias correction methods, SSHP sum of the standard of the hydrological parameters, SSCMBC is sum standard error of combined effect of climate models and bias correction methods, SSBCHP sum standard error of combined impact of climate bias correction methods and hydrological parameters.

4 Results

4.1 Evaluation and validation of different climate bias correction techniques

The annual and seasonal maximum precipitation time series of 12 GCMs was evaluated and validated against observed extreme precipitation in the time interval 1981–2010, which is the reference period of this study. Four statistical matrices (RR, MAE, PBIAS, RMSE) were used to measure the accuracy of the annual and seasonal maximum precipitation time series of 12 GCMs in reproducing the daily seasonal and annual maximum time series of observed precipitation. The correlation coefficient (RR), mean average error (MAE), percentage of bias (PBIAS), and maximum precipitation weighting root mean square error (PWRMSE) performance measure criterion were used to evaluate the 12 GCMs output with respect to the observed annual maximum precipitation. The results obtained for five locations in Awash basin are given in Fig. 3. Overall, the selected stations showed lower medium, and MAE, PWRMSE, and PBIAS values, respectively, indicating that the reliability and accuracy of the GCMs precipitation in reproducing observed seasonal and annual precipitation is high. The statistical matrix values were found different for different climate models. The MAE values vary in the ranges of 0.1–20 for annual maximum precipitation, PBIAS values range − 40 to 30, PWRMSE values range 0–0.1, and RR − 0.25 to 0.55 in all the selected catchments. Relatively, the bias correction methods performed better in C1 and C3.

Fig. 3
figure 3

Comparison of three bias correction techniques with observed daily maximum precipitation using a statistical matrix. The column represents each catchment using three bias correction methods, and each row represents each climate model

The climate model biases must correct before using it to force the hydrological models, especially because of the nonlinear response of the hydrological model to the precipitation and temperature forcing mechanisms. Figure 4 shows the comparison of three bias correction methods on simulation of monthly maximum precipitation average over each catchment for the CMIP6 ensemble (color bands, single models shown as a straight line), and the observed monthly maximum precipitation is represented by blue straight-line color for the period 1981–2010. Overall, the 12 GCMs give a broader spread in the rainy season and a relatively narrow band of climate models in the dry season. The spread of the selected 12 climate model ensemble does not exceed these climate models' upper and lower limit in the reference period. However, the width of these 12 climate models mainly depends on the type of bias correction method. DQM and SQF methods are relatively given a narrower band/spread than the other, whereas the EQM provided a wide range of climate model spreads (Fig. 4). In general, the annual cycle of maximum precipitation shows that the uncorrected GCMs have a considerable bias with respect to observed time series. The result of the three bias correction techniques is not the same in reproducing the observed seasonal maximum precipitation of the selected catchments (Fig. 4). The performance of these bias correction techniques in the reference period (1981–2010) is uniform in five catchments. Using distribution quantile mapping and empirical interpolation, the corrected precipitation range/band of 12 climate models is relatively smaller.

Fig. 4
figure 4

Comparison of seasonal raw and bias corrected daily maximum precipitation simulations from 12 CMIP6 GCMs for each of our five study catchments. Each column presents results of one bias correction method of five catchments. Each row presents a result of three bias correction methods of one catchment. In each panel, the dashed black line is the range of bias-corrected 12 GCMs models, red line is the median of the corrected 12 GCMs, blue line is the raw median of 12 GCMs, and black line represents the observed monthly maximum precipitation

These climate bias correction methods were applied to each GCM in future period (2015–2100). The influence of each climate bias correction method on the magnitude of change was assessed (Fig. 5). Daily precipitation between 2040 and 2069 (future-clim2) and 2070–2099 (future-clim3) was compared to the reference period 1981–2010 (Fig. 5). An increase was simulated in the far and near-future period by all GCMs scenarios in terms of annual maximum precipitation. The individual climate models simulated both increase and decrease in annual maximum precipitation. Similarly, changes from each bias correction approach show a slight difference in magnitude and direction in changing annual precipitation. Generally, these climate change signals are higher for SSP585 than SSP370 and SSP126 and slightly higher in the C4 and C2 catchments. Generally, the changes are smaller in C1 and C4 using DG methods, whereas C2 and C3 changes are smaller using EQM and DQM. Interestingly, a linear relationship exists between the annual maximum precipitation changes in 2020s and 2050s with proportional magnitude. This indicates that the slope of these changes in 2020s, 2050s, and 2080s is a positive trend.

Fig. 5
figure 5

Maximum precipitation change using three bias correction techniques and three scenarios. SQF-change factor, DGM-quantile mapping using gamma distribution, EQ- empirical quantile using simple interpolation. The blue dot stands for SSP126, red for SSP370, and green SSP585

Three scenarios have been assembled from 12 CMIP6 climate models (Table S1), constructed for three SSPs. Figure S1 and Table 2 show the projected max air temperature under each SSP scenario for each of the 12 climate models, together with the mean of the ensemble air temperature projected for each SSP. Air temperature is more stable and consistent with time and space in Ethiopia. The projected air temperature from the new climate dataset and new scenarios provides reasonable temperature change, increasing by 1.7–3.6 °C and relatively uniform in the selected catchments (Fig. S1 and Table 2). However, the spreads of the selected climate models are more extensive in C1 and C5. Overall, the projected air temperature has a positive slope and continuously increasing in the clim1, clim2, and clim3 future period. Overall, the mean daily air temperature and maximum daily air temperature over the baseline condition for the three different climate scenarios and two future periods (2050’s and 2080’s) are presented in Table 2. It can be seen that the maximum and mean daily air temperature increase from the lower to the higher/worst climate scenarios in 2050’s and 2080’s periods.

Table 2 Overall mean (Mean temp [°C]) and maximum daily temperature (Mean temp [°C]) conditions for the reference and the SSP126, SSP370 and SSP585 scenarios in 2050 and 2080 time slices

4.2 Calibration of the Hydrological models and parameter uncertainty evaluation

An overview of hydrological model calibration performance was provided to assess the quality of the sets of robust reference parameters and to check and control that the models perform reasonably well in validation and calibration period. The calibration performance of the four hydrological model in terms of goodness-of-fit criteria (GOFC) is summarized as boxplots according to models and criteria (Figs. 6 and 7). The horizontal line (red lines) in the box is the median (0.5) of goodness-of-fit values, and the lower (0.25) and upper (0.75) envelopes show the 25th and the 75th percentile values and the lower and upper whiskers show the 5th and 95th percentile values, respectively.

Fig. 6
figure 6

Summary of GOFC values across the five catchments for the calibration of the four rainfall-runoff models for the modeling periods according to the goodness-of-fit criteria

Fig. 7
figure 7

The multi-model ensemble of monthly maximum flow simulation using best HBV (first column), SMART(second column) NAM (third column) and HYMOD(fourth column) parameter sets selected based on best NSE value range in the reference period at each catchment

An overall analysis of the performance of the models showed that the SMART and HBV models are more efficient than the NAM and HYMOD models. The plots' median and upper whisker are lower for HYMOD models than for the NAM, SMART and HBV models. The number of runs with at least 60 for GOFC is higher for SMART models than for HBV, NAM and HYMOD models. The concept of SMART is more applicable to the study catchments investigated, which is probably due to the structure of SMART models, which have more routing reservoir instead of none for HYMOD models. The SMART models are slightly more effective for performant than the HBV model in the given period. This is also confirmed by the number of runs, which is higher for HBV models than for the NAM model.

The results of model performance are more distinct for LNSE than for the two other GOFC’s because of the lower whisker and median, which is higher than for others (NSE and KGE). In the case of KGE, the upper whisker is slightly similar to the two other GOFCs. The different versions of the HBV and HYMOD models with two or three parameters are almost equivalent in calibration period, despite the different model parameters to calibrate the performance criteria. It seems that assimilating the soil reservoir parameter to a water holding capacity of the soil is appropriate. It must be due to the fact that the two models are not very sensitive to this parameter.

The HBV, SMART, NAM and HYMOD hydrological parameter sets with a sample size of 30,000 were generated through uniform distributions. The 30,000 results from the Monte Carlo simulation are flagged with the NSE threshold criteria and carried forward to estimate future streamflow. GLUE-based parameter uncertainty approach was adopted to simulate a possible ensemble of daily streamflow, and NSE objective function was applied to separate the behavior and non-behavior simulations. In this study, the runoff simulation ensemble with likelihood value larger than 0.5 NSE threshold value was selected. The likelihood value less than the NSE threshold value of the model parameter sets is considered as non-behavioral. Figure 7 shows the seasonal maximum flow simulation results from best parameter (behavioral) sets, including observations in 1981–2010 period and confidence interval of 5% and 95% of the streamflow simulations. It indicated that the maximum seasonal simulation falls mostly between the 5% and 95% confidence bands, except in the C4 catchment. Almost in all the catchments, the upper 95% confidence interval band is mostly exposed to extreme seasonal peak flows. Overall, 83% of the observed maximum flow time series falls under the shaded area confidence interval and has a wide range of NSE values (0.3–0.75), and it is promising to use for climate change impact study.

4.3 Projected hydrological flow: performance of hydrological models under different bias correction approach

The long-term internal variability and projected flow dynamics from the period 1981–2100 at the given five stations are shown in Fig. S2. The x-axis is time in the year extending out to 2100, and the y-axis is the seasonal value of projected flow. It is clearly seen that the main rainy/peak flow season is covered by a yellow color, which is mainly June, July, and August. The flow during flood seasons has increased noticeably for each station since 2010. Relatively, there is a strong increasing trend in seasonal and annual flow in C1 than in C2. Overall, considering the daily projected flow time series shows that the peak flow increased for the coming three periods (2010–2039, 2040–69, and 2070–2099). This increase appears to be more significant during the summer and early spring season, with a mean peak flow increasing from 60 to 70 m3/s and during summer, with a mean flow increasing from 85 to 100 m3/s at C3. Also, projected flows seem to be smoothed during late autumn and early summer periods, with a mean flow of about 35 m3/s. In contrast, the future conditions show an inevitable variability from about 25 m3/s in May to 55 m3/s in October (Fig. S2).

The changes in flow simulations forced using three bias correction procedures for climate variables have been compared and presented in Fig. 8. Overall, it observed positive changes in annual maximum flow using DQM, EQM, and SQF, while the magnitude of annual maximum flow changes is not the same. However, these are not uniform across the selected catchments, bias correction methods, and climate scenarios. Relatively, distribution-based bias correction models give smaller changes in the annual maximum flow time series. This indicates that wet day frequency correction is significant in understanding the future maximum flow projections. Mainly, annual maximum flow changes using EQM give a higher spread range and uncertainty. Boldly observed that annual maximum flow changes have smaller uncertainty (size of the box plot) in SSP370 and higher in SSP126 (Fig. 8). Therefore, correcting the wet days and intensity of precipitation may significantly change the change magnitude and direction and minimize uncertainty. The annual maximum flow changes are smaller in the C1 and C3 catchments and higher in C5 catchment.

Fig. 8
figure 8

River flow change in clim1 (2010–2039), clim2 (2040–2069), and clim3 (2071–2100) with respect to the reference period (1981–2010). Each box plot represents the spread of twelve climate models listed in Table S1 with median 0.5 and 0.25 and 0.75 quantiles. Each column stands for each scenario (the first column for SSP126, the second column for SSP370, and the third column for SSP585). Each column represents each catchment (first column C1, second column C2, third column C3, fourth column C4, and the last column for C5). The red box at the top of the box plot in the right side indicates extreme outlier points

4.4 Flood hazard projections under varying climate conditions

Best fitted to annual maximum flow was selected for each station and climate model, and the best distribution was selected based on Akaike information criteria (AIC) (Table 3). Across the selected catchments and climate models, GEV, Gamma, Pearson type III, and Weibull distribution are the most dominant distribution types (Table 3). The projected annual maximum flow series has different distribution characteristics, resulting from different flood hazard magnitude and risk level. Of which, the three most dominant and best-fitted distribution models have similar probability density functions (PDF).

Table 3 Best fitted frequency distribution models to simulated peak flow using 12 GCMs and four hydrological models (SMART, NAM, HBV and HYMOD). Each frequency model is represented by number: Pearson-III distribution by 1, GEV by 2, Gamma by 3, Weibull by 4, Nakegami by 5, lognormal by 6, Loglogistic by 7 and Inverse Gaussian by 8

Using the most dominant PDF model (GEV), the flood quantile changes due to climate change have been evaluated. Figure 9 shows flood quantile changes in climate1 clim2 clim3 with respect to the reference period, estimated using an ensemble of 12 GCMs and three bias correction methods. The ensemble average of 12 climate models within three scenarios assessed (SSP126, SSP370, and SSP585) and three bias correction methods in five catchments shows a significant increase in flood quantile magnitudes and frequency in future (Fig. 9). The changes of flood quantiles are mostly consistent with changes in maxima flow, but it does not mean that the magnitude of changes and directions are the same. The future flood quantile changes are not the same in space and bias correction methods, and the frequency of the larger return period flood increases to higher than once in 20 years. The smallest changes are observed in C1 using EQM, C4 using SQF, C5 using DQM, and C3 using EQM. Relatively, the methods are consistent in C2 and C3 in providing information about flood quantile changes in future. Simultaneously, the flood quantile changes are significant at different return periods with different hydroclimatic simulations. For example, C1 catchment has a change range from 0 to 100%, but using the DQM, the changes range from 0 to 170%. Similarly, in C5 catchment 0–100% using DQM.

Fig. 9
figure 9

Changes in peak flow quantiles using three bias correction techniques and GEV frequency distribution model (at 10-, 30- and 50-years return period) using four hydrological model (HBV, SMART, NAM and HYMOD) at each catchment

4.5 Uncertainty estimation and decomposition of associated sources in flood hazard estimation

The four sources of uncertainty were estimated based on the one chain principle. The one chain uncertainty band quantification started from one climate model that passed through three bias correction techniques and four hydrological models with best hydrological parameters (Fig. 10). The contribution of each source of uncertainty is not uniform across the selected catchments. Each color band represents each source of uncertainty as a relative range of cascade uncertainty. Spreads of the 12 climate models represent the climate models uncertainty band; the spread of four hydrological models for hydrological structure uncertainty; the hydrological parameter uncertainty is presented by 95% confidence interval of selected hydrological parameter sets. The uncertainty due to bias correction types is estimated from the spread of three bias correction simulation results are estimated. However, the ensemble uncertainty of GCMs is relatively more considerable than the other sources. By comparison, hydrological parameter sets do not play a significant role in future hazard estimation (Fig. 10). Future flood hazard in C1 catchment is highly sensitive to climate change and respective bias correction methods, while C2 is more sensitive to hydrological model structures and bias correction methods in flood hazard estimation. Overall, from the highest to lowest the sources are arranged as follows: the climate models, hydrological model structure, bias correction procedures and hydrological parameters.

Fig. 10
figure 10

Summary of the additive way of uncertainty contribution analysis result using the direct additive method without considering their interaction. The peak flood was estimated at 30 years return period and calculated its change with respect to far future period

Figure 11 shows only the main sources of uncertainty and their respective bands at specific return periods and catchments, with considering the main factors’ interdependence using ANOVA (Meresa and Gatachew 2019), the contribution of individual sources of uncertainty as the main source and their interaction contribution to the flood hazard magnitude and frequency changes in future. The decomposition of source uncertainty from the main variables and their interaction were separated based on the variance in flood hazard values and calculated out of 100, which means the sum of all sources is 100. Hydrological parameter sets on flood hazard projection are not significant and less uncertain in the selected catchments. This finding is more related to (Meresa et al. 2021) conclusion. In comparison, the climate change variability shows a significant impact and uncertainty on the projection of flood hazards. This may be due to the projected precipitation intensity, higher temperature, and time of concentration in the area. On the other side, the contribution of hydrological parameter sets is not significant in flood hazard estimation due to weak role of ground and interflow components in the peak flow (Fig. 11). In general, the climate models and hydrological models structure are the main factors in peak flood magnitude and frequency projection, Whilst the uncertainty from the hydrological parameter sets is not added significant value to the total uncertainty. The share of climate models, bias correction, and frequency distributions are significant at design flow value estimation at 70 year return period. In C1, future flood hazard risk is highly uncertain due to climate model variability and their respective bias correction methods. By contrast, C2 catchment is more sensitive to the frequency distribution and climate model variability.

Fig. 11
figure 11

Contribution of each source of uncertainty in flood frequency estimation under climate change for the five catchments identified as C1 to C5. The three Quantiles estimated at return periods (RT) of 20, 50 and 70 years (Q20, Q50, Q70) arranged in the x-axis from left to right, respectively. Each color represents each source of uncertainty: green is uncertainty from climate models (CC), red from bias correction methods (BC), orange color from hydrological model structure (HM), pink color from the hydrologic model parameter (HP), and the other colors represent the interaction between the main factors

5 Discussion

5.1 Projected hydroclimate using CMIP6 dataset

This study used CMIP6 climate change dataset source to understand the impact and uncertainty in future peak flood hazards. From CMIP6 GCMs, 12 GCMs climate models with three climate scenarios (SSP126, SSP370, SSP585) were applied for future hydrological extreme simulations. Due to the lack of robust atmospheric-land model interaction, uncorrected GCMs are not recommended for extreme hydrological modeling (Meresa et al. 2017; Mendez et al. 2020). However, the impact of climate change magnitude also depends on the bias correction methods' principles (Pierce et al. 2015). Three climate bias correction methods were applied to correct the GCMs output with daily observed air temperature and precipitation.

Overall, the distribution-based bias-corrected model fields successfully reproduced the observed maximum precipitation, both in terms of the number of wet days and precipitation intensity. This implies that correction of wet days and intensity of GCMs is significant and provides a significant role in minimizing uncertainty, whereas the change factor and empirical methods give similar results in reproducing the observed daily mean air temperature. This shows that the interval variability and magnitude of the climate variable significantly influences the result of each bias-correction procedures. Due to this reason the bias correction to the GCM temperature field was small compared with precipitation.

Moreover, we found that the maximum future temperatures will increase but precipitation may either increase or decrease depending on climate model, climate scenario, hydrological model, bias correction method, future climate period and catchment. At each stage and modeling chain, the peak flood magnitude and frequency gives different change magnitude. The highest change in annual maximum precipitation was observed in C4 catchment, whereas in C5 the changes in the clim2 and clima2 show a non-significant (near to zero) magnitude and direction. In C1 catchment, most climate models indicate a positive change, but these changes are very small. These changes in precipitation and air temperature are consistent with previous findings using RCP-AFRO-CORDEX (Meresa and Gatachew 2019; Taye et al. 2015), using MIROC-ESM-CHEM and CSIRO-Mk3-6-0 (Daba et al. 2020), and using Statistical Down-Scaling Model, SDSM-CMI5 (Gebrechorkos et al. 2020) in Ethiopia. The annual daily maximum precipitation and daily mean air temperature projected under SSP126, SSP370, SSP585 shared socioeconomic pathways scenarios suggests that wet conditions could be slightly intensified and increased in C1 and C5. However, it doesn’t mean that the mean peak flood is much higher the normal condition. In contrast, the magnitude of the positive changes in maximum precipitation will be intensified in C1 and C2. Similarly, the mean temperature changes of these selected catchments are not much different in magnitude and direction. However, SSP126 gives lower changes in precipitation and temperature magnitude than changes using SSP370 and SSP585.

5.2 Modeling and projections of hydrological extremes

We examined the impact of the choice of different hydrological model structures with respect to reproducing historical extreme flows and the effect of the model structure choice on hydrological climate change evaluations. We studied four hydrological models with different number of parameter and routing and storages. The hydrological model with three storages and five parameters and six storages with 12 parameters yield equally good results in predicting peak flow in the selected five catchments. Those different model structures reproduce historical flow well and the projections of hydrological response under changing climate are consistent with meteorological data input patterns.

We compared the feasibility and robustness of four conceptual models widely used in Africa and Europe: HBV, SMART, NAM and HYMOD models. The investigation was undertaken on five catchments covering the main Awash River basin. The model’s performance were evaluated under three GOFC: KGE, NSE and LogNSE. For the climate change impact study, only those obtained with parameters sets giving at least 55% (0.55) as the calibration and validation performance were considered. An analysis of validation and calibration performance showed that SMART model performs better than HBV model, regardless the GOFC. The performance of NAM models is somewhat similar, albeit slightly higher than that of HYMOD models. The behavior of and validation and calibration performance of these models differs according to the GOFC criterion considered and is different from model to model. A clear relationship was found between validation and calibration performance and the hydrological flow regime using four hydrological models and three objective functions.

Robustness analysis also showed SMART models to be more robust than NAM and HYMOD models. The hydrologic models are most robust in terms of LogNSE. They seem to perform and work better when the same weight is given to all parts of the hydrograph. HBV and SMART models are the most robust in terms of NSE for peak flood simulation. In terms of KGE, the HBV model is the most robust and HYMOD model shows lower KGE values. SMART is the most robust in terms of LogNSE than any other likelihood functions. The analysis of the robustness and uncertainty of the models structure according to input climatic variations between validation and calibration periods (precipitation and PET) showed that it is better to transfer the model parameters from a higher PET and a drier period than doing the reverse. A more refined analysis showed that a precipitation variation of − 9% to 9% or PET of − 4% to 4% between the calibration and validation period causes: a loss of robustness less than 10% for SMART models and 30% for HBV models in terms of NSE and KGE; a loss of robustness between 15 and 20% for SMART models in terms of LogNSE. Authors such as Coron et al. (2012) and Dakhlaoui et al. (2017), reached conclusions comparable to those in this study. Particularly, Coron et al. (2012) found that the difference among the hydrological models and between validation and calibration periods should not exceed 15% for better model parameter transfer results. The sensitivity of the model parameters depends on the hydrological processes that predominate in the river flow over the calibration period and hydrological model structure. Similarly, the more sensitive the model parameter over a given period, the more likely it is to be the optimal model parameter for other periods.

Five catchments have been selected from different hydroclimatic and geographical locations to investigate the impact of climate change on flood hazard and identify the associated uncertainty in the projected flood hazard magnitude and frequency. Three hydrological model was calibrated and validated against the observed streamflow data using Nash–Sutcliffe (NSE), LogNSE and KGE objective functions in the period of 1981–1990 and 1991–2005, respectively. A likelihood value higher than 0.5 indicates a satisfactory hydrological simulation zone (indicates how well the hydrological model reproduces the extreme high flow) (Bennis and Crobeddu 2007) and promising for further analysis. Therefore, 0.5 was selected as a threshold in runoff simulation and accepted for future runoff simulations. The five catchments have an NSE value ranging from 0.5 to 0.75, LogNSE ranging from 0.5 to 0.78, and KGE ranging from 0.5 to 0.71 from the 30,000 parameter sets sample. Similarly, other studies also show similar results using HEC-HMS (Roth et al. 2018), using SWAT model (Shawul et al. 2019), and using the HBV light model (Bekele et al. 2019) in Awash catchments.

Uncorrected and the future corrected precipitation and temperature data from CMIP6 data sets were used to force four hydrological model and to examine the potential impact of climate change on the future flood hazard/peak flow. Overall, the projected high flow changes are positive, which is an increase in high flow in the near future (2010–2039), far future (2040–2069), very far future (2070–2099) for an ensemble of SSP370. Like the precipitation changes, the effect of climate bias correction on high flow change is clearly visible and brought higher changes in magnitude with a large uncertainty band. However, the distribution-based quantile mapping techniques relatively produce reasonable changes in high flow at the selected sites. The changes are higher using SSP126 and SSP585 climate scenarios.

In contrast, the SSP370 gives a smaller band in high flow changes in a near and far future period, which is a direct reflection of maximum precipitation change. In the C1 and C4 catchments, the near future changes, far future, very far future for an ensemble of SSP370 shows significant changes with a large uncertainty band, whereas in C2 and C3 catchments, the changes are not uniform across the bias correction methods, and changes are smaller than C1. This is most likely due to increased summer and spring precipitation (Meresa and Gatachew 2019). Similarly, the distribution-based climate bias correction gives relatively smaller comparisons with the empirical and linear interpolation methods at most of the catchments.

5.3 Flood hazard frequency and magnitude estimation under climate change

The impact of climate change on flood hazards was assessed through the extreme frequency distribution models. Each catchment's baseline and future flood frequency simulations were estimated from 12 climate models (GCMs). Various distribution types are commonly used in the US, Europe, and Africa (Kay et al. 2009; Collet et al. 2017; Meresa and Romanowicz 2017; Meresa and Gatachew 2019; Lawrence 2020). In this study, we fitted a series of frequency distribution models to the annual maximum flow time series and assessed the best performing distribution model. The GEV distribution model is commonly used in hydrological extremes analysis and was found to be the best statistical description of the model prediction data in this study. It was applied for flood quantile estimation and associated impacts. The estimated magnitude of flood quantiles at different return periods using GEV frequency distributions is not the same in magnitude and uncertainty band. Each climate model at each catchment gives a very wide range of flood quantile values. Overall, smaller flood quantiles were observed in C3 catchment, whereas the highest was estimated from C1 and C4 catchments. The estimated quantiles' total range is relatively smaller in C3 and C2, whereas the highest uncertainty range is estimated on C4 and C1 catchments.

Moreover, the changes of flood quantile values are not the same (significance difference) across the catchment and type of bias correction methods. In the C2 and C3 catchment, the flood quantiles at different return periods are expected to be larger in the near, far, very far future. In comparison, the changes are relatively higher in C1 and C4 catchments and expected to be higher in the very far period. This is mainly due to the magnitude and distribution of flood events in the given period (Armstrong et al. 2012; Rao et al. 1972; Keast and Ellison 2013).

5.4 Uncertainty in flood hazard frequency and magnitude and decomposition

In this study, 12 climate models, three bias correction methods, three hydrological models with 30,000 parameter sets, GEV flood frequency model, and three climate scenarios from the new dataset (CMIP6) were considered to analyze the impact of future flood hazard in the selected five Awash catchments. Using additive uncertainty aggregation, each component of these analyses is integrated into one framework that contains information from each part. Climate models are highly uncertain in characterizing the future climate variables. In the last decades, various studies stated that RCP and SRES data are very weak in reproducing the historical climate extremes (Fowler et al. 2007; Kingston et al. 2011; Saini et al. 2015; Meresa and Romanowicz 2017). This uncertainty is may be due to the structure, parametrization, and spatial resolution of the GCMs. Using multiple models in climate change impact analysis would lead to an improved understanding of the uncertainty associated with climate models. It is essential in flood risk magnate and water resource management. Similarly, in this study, multiple climate models from the new data set CMIP6 (shared socio-economic pathway scenarios) were evaluated for flood hazard projection in Awash basin. The spread/discrepancy of climate models’ impact is significant. There are also important uncertainties component which is associated with the climate bias correction methods. The catchment-scale characteristics are not the same as the GCMs spatial and temporal characteristics; therefore, it is important to use highly relevant climate bias correction techniques to understand the uncertainty and minimize. Three bias correction methods were used to correct the GCMs output at catchment scale. These methods were effectively corrected in the historical period and then adopt the parameters for future climate corrections. There are few studies related to bias correction methods uncertainty and flood frequency (Kay et al. 2009; Saini et al. 2015; Soriano et al. 2019). They found that the climate bias correction could alter the magnitude of flood hazard. This study also confirmed this study using three climate bias correction methods and GCMs from CMIP6 in Ethiopia's selected catchment. We find that the uncertainty source due to wet days and intensities of precipitation has a significant impact on the flood hazard magnitude (e.g., C4 catchment).

The hydrological model structure and parameters represent hydrological process. The hydrological parameters control the flow from single droplet of rain to the deep catchment percolation. In this study, more sensitive parameter sets of HBV, SMART, NAM, HYMOD hydrological models were selected using NSE objective function values (0.5 as threshold). GLUE was used to investigate the role of hydrological parameters in flood hazard estimation in the selected catchments. The analysis was performed by using a uniform sampling approach from the given parameter ranges. The simulated flow in the historical period is pretty good for mean and high flow simulation. The width of the simulated band is not significantly increased by changing the threshold for high flow simulation. Generally, the uncertainty is band due to parameter change is not significant in flood hazard estimation. Our finds conformed that the role of hydrological parameter sets is less in all the selected catchments in Awash basin. Yan et al. (2015), Meresa and Romanowicz (2017),Joseph et al. (2018) they appear with similar conclusion that high flow is less sensitive to hydrological parameters.

The uncertainty related to flood frequency under climate change is not extensively investigated and it has been very challenging to integrate with other sources of uncertainties. Each extreme high flow simulation from each climate model simulation and bias corrected simulations were fitted to most dominant frequency models. Several researchers have been conducted flood frequency estimation, and they concluded that GEV and EV distribution models are the most commonly used frequency models in Awash catchments (Tegegne et al. 2020; Ahilan et al. 2012). However, our numerical experiment result confirms that GEV distribution is the most dominant model type in the selected Awash catchments. The uncertainty band is not significant in the selected catchments of flood quantile values. Therefore, the uncertainty of flood frequency models in these catchments are avoidably in the hydro-climate projections. However, the results should be assessed with care because the probability of recurrence of extreme floods of the selected catchments are become more intense and frequent and intense in the near, far, and very far future periods.

6 Conclusions

The results of climate variables from CMIP6 data set are consistent with previous studies and applied for the first time in estimation of future flood hazard under varying climate conditions and associated uncertainty estimation. There is a significant range of differences among the results of GCMs in climate projections. The results also confirm that the CMIP6 GCMs outputs are promising for climate change impact studies on hydrological extremes. Our results also indicate that the proper climate model bias correction techniques can improve future estimated precipitation and temperature characteristics. The climate models bias has been corrected using three techniques (DQM, EQM, and SQF). The precipitation results using the DQM approach are relatively close to observed daily maximum precipitation data. This indicates that climate models have higher biases in projection of extreme precipitation magnitude (greater than 75% of the maximum precipitation), and smaller bias results in lower precipitation magnitude (lower than 75% of the maximum precipitation). This is due to seasonal variability and intensity of precipitation. Therefore, giving more focus to the upper quantile of precipitation distribution can improve the model's capability to reproduce the observed time series and reduce the error of model predicted extreme values of precipitation and temperature.

The projected hydro-climate extreme indices are less consistent and indicate a strong model agreement with a wide range of variability in the selected catchments. However, the magnitude of changes in hydro-climate extreme indices mainly depend on the type of bias correction technique. DQM, EQM, and SQF give relatively smaller range of changes in annual maximum precipitation and annual maximum flow. Overall, increases in flood and extreme rainfall are expected and are not uniform in space/time. Similarly, the GEV frequency model was fitted to simulated annual maximum flow time series. The flood quantile changes were calculated based on the relative difference between the future (2015–2100) and historical (1981–2010) flood quantile simulations. Due to precipitation and temperature changes, in the twenty-first century, the magnitude of flood quantiles at different return periods between 5 and 100 years could increase between 5 and 50% in the selected catchments. However, these changes are not the same across all bias correction methods and frequency models in each catchment.

Comparing the results of climate change scenarios revealed that SMART, HBV, NAM and HYMOD are suitable for assessing climate change impact on peak flow because they provide similar and comparable results. The NAM and HYMOD models produce considerably higher peak discharge. The SMART model can simulate the slow and fast subsurface flow components and produce smaller change in peak flow in almost all the catchments, all the catchments. Whilst the HBV model shows suitability for peak flow simulation in C1 and C4 catchments due to its physical representativeness and exhibits the best tradeoffs between calibration parameterization efforts. In contrast, the NAM model reproduces the peak flow curve very well but does not significantly contribute to understanding the subsurface processes. This finding strengthens the opinion that a smaller number of parameters and storages would lead to a poor representation of hydrological processes, especially in data-sparse catchments.

The uncertainty analysis was performed by comparing each source of uncertainty regarding the total aggregated value flood quantile at different return periods. The result confirms that hydrological model parameters, spread of climate models and bias correction techniques are important sources of uncertainty in flood hazard estimation. Specially, the climate models, bias correction methods and frequency distribution type models are the most significant source of uncertainty in future flood hazard estimation. This additive of uncertainty analysis is also proved by ANOVA analysis. ANOVA is variance-based sensitivity analysis and provides information about the main factors' contribution and their interactions. The decomposition of sources of uncertainty in future hazard estimation was performed based on the variance in changes of flood hazard value at Q20 (probability of recurrence of flood event at 20 years return period), Q50 (probability of recurrence of flood event at 50 years return period) and Q100 (probability of recurrence of flood event at 100 years return period). The result confirms that climate change is the dominant factor in C1 and C2, whereas C3 bias correction type and C4 distribution type are very important factors. The ANOVA also confirms that the contribution of their interactions is significant. The contribution of climate models and bias correction methods and interaction is especially substantial and non-negligible in flood hazard estimation.

Moreover, better flood risk management and policy need to consider these main factors and their management policy interaction. However, the value range of each source is large and challenging to communicate with decision-makers and stakeholders. Therefore, we strongly recommended considering the use of each spread's median values for future flood risk management and take as standard information on extreme flood adaptation measures.