Introduction

Water is the most important element of vitality in human life, such that all civilizations were formed by rivers. In many countries around the world, rainfalls are not evenly distributed over the year or do not conform to the demands. While climates in many parts of the Middle East are entering a critical situation. It is essential to understand that the water shortage crisis also adversely affects the qualitative characteristics of the flow (Bakhsipoor et al. 2019). Meanwhile, asymmetric distribution of drought and flood on the surface of the earth cause widespread human and financial losses (Kim et al. 2005; Moazami et al. 2016). According to the Parliament of Iran, an estimated $2.5 billion per year in damages is caused by drought in the country. In 2019, the Intergovernmental Panel on Climate Change (IPCC) reported that over the past decades, changes in climate caused by global warming have led to alterations in the hydrological regime (Change 2019; Gholami et al. 2022). Over the years, a distorted pattern of inflow to reservoirs due to climate change and human activities such as land-use change has led to significant alterations in the amount, frequency, and timing of the inflow to the water reservoirs (Demirel et al. 2013; Halgamuge and Nirmalathas 2017; Yang et al. 2018; Mostaghimzadeh et al. 2021). Therefore, the need for the employment of strategies to reduce the damages caused by droughts and floods has become more eminent. Predicting the inflow to reservoirs is considered a fundamental solution to deduce more efficient operation policies and thus reduce damages. However, alteration of the inflow to the system affects its supply values, energy production, and quality properties in the long term (Ashrafi and Mahmoudi 2021). It is important to note that extending the prediction further into the future will provide the policy-makers and operators with more leeway in making immediate or long-term decisions (Mostaghimzadeh et al. 2022). However, longer-term predictions come with higher uncertainty and therefore make the decision-making processes more complicated. In this context, data-based models are widely used. In these models, hydrologic processes are not presented definitively, but the main focus is on finding the relationship between the variables influencing the output of the models (Shortridge et al. 2016; Peng et al. 2018). With this rationale, using these models to estimate the values outside the observation intervals must be carried out with caution (Tongal and Booij 2018).

In general, awareness of the emerging uncertain condition requires an estimation of future status (MostaghimZadeh et al. 2021; Huang et al. 2022; Beshavard et al. 2022; Luo et al. 2023). This means the prediction should present the desired accuracy and meet the evaluation indices at a significant level and, at the same time, its horizon must be as far ahead in the future as possible to paint a better picture of the situation and allow for making short-term decisions following the long-term course of events. A review of the previous research reveals that the development of the flow predictive models is summarized in several different structures. Much of previous research has developed predictive models with a one-month lead time using novel methods and smart algorithms. In such researches, a wide range of feature selection methods was employed and various learning models such as neural networks, support vector regression, etc., were used to relate inputs to outputs. Also, in studies where the forecasting lead time extended further than a month, the work was based solely on the amount of rainfall. By predicting the rainfall values and subsequently using a rainfall-runoff model, the flow was obtained in the future. Despite all capabilities of Artificial Neural Networks (ANNs) in prediction of real-word phenomenon (Geem 2011), the framework of learning processes affects the accuracy of the results considerably. Osman et al. (2020) employed Fast Orthogonal Search to investigate the nonlinear relation of the inflow to the Aswan High Dam basin. The results showed that this method outperformed the classic artificial intelligence methods, especially for larger numbers. Ni et al. (2020) proposed the GMM-XGboost hybrid model, created by coupling the Extreme Gradient Boosting and the Gaussian Mixture models, to develop a monthly flow model in the Gangtze River basin. The results showed that the proposed hybrid model was more successful than the SVM model in the detection of hydrologic patterns. Ren et al. (2020) compared the performance of eight feature selection methods in predicting the monthly flow (CAMELS). Ashrafi et al. (2020) used the Ensemble Learning technique to compare seven different approaches to estimate the monthly flow upstream of the Dez dam in the Southwest of Iran. The results showed the superiority of coupled Discrete Wavelet Transformation-Artificial Neural Network (DWT-ANN) model with four-item groups. Quedi and Fan (2020) developed a seasonal hydrologic predictive model and extended the results for tropical weather conditions as well as large-scale basins. The results showed the desirable performance of Ensemble Prediction in the assessment of flow patterns and consequently its acceptable estimation on higher lead times. Sigaroodi et al. (2014) attempted to develop a long-term precipitation forecasting model by investigating the correlation of the oceanic-atmospheric indices on the precipitation of the Maharloo basin in Iran. The results indicated a deep correlation between the fluctuations in rainfall patterns and some of these indices. In the end, the final model was able to estimate the amount of precipitation with more than one-month lead times based on a set of influential indices. Kilinc and Haznedar (2022) developed a model to estimate future streamflow conditions considering sustainability using deep learning models. The estimation results of the models were evaluated with RMSE, MAE, MAPE, STD and R2 statistical metrics. The comparison of daily streamflow predictions results revealed that the proposed model provided promising accuracy results and mainly presented higher performance than the benchmark model and the linear regression model. Liu et al. (2022) integrate meteorological forecasts, land surface hydrological model simulations and machine learning to forecast hourly streamflow over the Yantan catchment, where the streamflow is influenced by both the upstream reservoir water release and the rainfall–runoff processes within the catchment. Evaluation of the hourly streamflow hindcasts during the rainy seasons of 2013–2017 shows that the hydrometeorological ensemble forecast approach reduces probabilistic and deterministic forecast errors by 6% compared with the traditional ensemble streamflow prediction (ESP) approach during the first 7 days.

The literature shows that most of the flow prediction models use novel feature selection methods along with learning models to assess the flow in the form of monthly models. Also in some cases, models with lead times shorter than one month have been investigated. Indirect methods were often used for runoff predictions with longer than one-month lead times. In other words, first, a model for long-term rainfall is obtained. Then, by using a rainfall-runoff model and estimating the other contributors at the basin level, flow in a long-time scale (more than one month) is estimated based on the predicted precipitation (Adamowski and Prasher 2012; Chu et al. 2017; Mosavi et al. 2018; Peng et al. 2019). In that sense, there is a research gap in the development of models that can directly predict runoff within water resource systems with long forecasting lead times. Furthermore, comparisons of flow prediction models with short and long lead times have rarely been carried out.

Therefore, this study attempts to develop a model based on the ensemble learning approach derived from (Ashrafi et al. 2020) and the various oceanic-atmospheric indices that can estimate the inflow to a drainage basin with a six-month lead time. In an ensemble learning system, the impact of predictors on target parameters is investigated systematically in groups and the final solution can be a range of predictors. In the beginning, the flow with one-month lead time is calculated and used as a benchmark to compare the results of the short-term and long-term lead time. To do so, first, different approaches to the development of a flow prediction model with monthly time steps are examined (Ashrafi et al. 2020). Then, solutions to extend the prediction lead time are proposed and a model is deduced to provide a proper estimation of the flow for a long-time step (six-month lead time). In the end, the best long-and short-term models according to assessment indices are selected and their performances are compared to study the impact of extending the prediction horizon. Investigation of the direct impact of climate indices on inflow to a multi-reservoir system and the model development approach are innovations of this study.

Research methodology

In this study, two sub-models are presented. First, the short-term model, which is a prediction model for monthly inflow to a multi-reservoir system. Second, the long-term model, which is a flow prediction model with a six-month lead time. For the one-month lead time, the seven approaches proposed by Ashrafi et al. (2020) are extended to predict future inflows into a multi-reservoir system and the best scheme for each reservoir is selected. Then, to increase the prediction lead time, the flow time series is used in combination with various climatic-oceanic indices contributing to the flow of the multi-reservoir basin. In addition, a new structure based on ensemble learning is introduced to select proper inputs for model learning. In the end, the best short- and long-term models are selected and compared to study the impact of prediction lead time on the model's performance and the evaluation indices.

For the short-term forecast model, the wavelet components of the streamflow time series are adopted as potential predictors considering the lags of 1 to 10. In contrast, for longer forecast horizon the oceanic atmospheric teleconnection indices are used as predictors. As these indices have a seasonal behaviour, it is better not to use them for every month and sampling of them was different from monthly horizons. Moreover, the lack of enough data was a barrier to consider their lags within the feature selection process. In order to develop a solution, the present state of each index is just considered as a potential predictor. Finally, the potential lags are identified using ACF and PACF function according to the rate of correlation between streamflow and selected indices. The length of forecast horizon is considered as the closest effective lags to the present state.

The proposed forecasting model

The proposed structure is based on the developed model by Ashrafi et al. (2020) to obtain short- and long-term prediction models. This structure consists of learning sub-models (McCulloch 1990), input selection model (Phuong et al. 2005; Wang et al. 2009), and evaluation indices (Nash and Sutcliffe 1970; Willmott 1981; Shamseldin and Press 1984; Legates and McCabe 1999; Krause et al. 2005; Chai and Draxler 2014). In this structure, seven proposed approaches are studied for each reservoir and the final model is selected based on the evaluation indices. The short-term flow prediction is generally based either on the exclusive use of time series of the inflow or the forecast deduced from satellite and radar data on precipitation and its conversion into the runoff. With the extended prediction lead time, the flow time series is not sufficient as more data are required to estimate the flow with optimal accuracy. Long lead times surpass the one-month benchmark and which fills the gap between short-term predictions and long-term climate changes (Tourigny and Jones 2009). These forecasts are often influenced by significant uncertainties. Many researchers have been attempting to propose methods for more accurate predictions (Wu et al. 2011). Many studies have concluded there are nonlinear relations between meteorological phenomena, climate indices, and accuracy of answers (Kim and Barros 2001; Tourigny and Jones 2009; Guérémy et al. 2012). According to Teschl and Randeu (2006), certain variables representing the atmospheric flow fluctuation, which are directly related to the sea surface temperature fluctuations, have a significant long-term impact on rainfall, and as a consequence, on stream flows. At the same time, atmospheric flow dynamics are influenced by solar activities including radiation intensity and sunspots which are defined as temporary phenomena of the photosphere and are seen as dark patches on the surface of the sun. Solar radiation is considered the energy source for climate systems and is often measured as a constant factor. According to Sigaroodi et al. (2014), the number of solar patches is correlated with rainfall. Wang et al. also confirmed the correlation between sunspot and runoff in river basins (Wang and Sheeley 1997). They also showed that the length of the sunspot cycles is strongly correlated with the temperature of the northern hemisphere and drought periods (Vennerstrøm et al. 1991; Weickmann et al. 2000). Meanwhile, the high thermal capacity of the oceans has a significant impact on the atmosphere–ocean interaction, which ultimately changes the atmospheric flow patterns over the subsequent periods (Penland and Matrosova 1998). Based on the above-mentioned studies, climate indices are highly correlated with precipitation at a basin scale. In the same way, the amounts of rainfall and runoff are always correlated. Therefore, the correlation between climate indices and runoff can be directly measured concerning other contributors to runoff, such as soil conditions, vegetation, snow depth. Also, a model to predict the direct runoff can be developed based on the indices using machine learning algorithms.

To develop a prediction model, the relation between the oceanic-atmospheric currents and the time series of inflows to a multi-reservoir system is examined with the proposed approach. Then, according to the level of correlation between these indices and the inflow to each system, inputs are selected for the learning model. For input selection, various quality grades are defined based on the amount of correlation for each index, and the model’s input parameters are selected by categorizing the elements at each grade. Then, after the best group in each grade is determined, combinations with the best groups from all the levels are created and the final model is chosen based on the evaluation indices. In Fig. 1 a flow chart of the model is demonstrated.

Fig. 1
figure 1

Conceptual flow chart of long-term forecast model

Atmospheric-oceanic indices

Atmospheric-oceanic Indices represent the physical phenomena correlated with rainfall. In this study, 36 climate indices from the NOAA’s website (http://www.esrl.noaa.gov/gmd/dv/ftpdata) are used to evaluate existing correlations. In Table 1, a list of the indices as climate predictors along with their respective harvest periods is presented.

Table 1 Primary selection of ocean-atmospheric predictors

The study area

The Karun River in Khuzestan, Iran comprises five operational dams: the Karun4, the Karun3, the Karun1, Masjed Soleyman, and Gotvand-Olya, and the Dez River includes the Dez dam. The two rivers join at Band-e-Ghir and form the Great Karun River. The average annual precipitation in the basin range from about 150 mm per year in the lowland plains to around 1800 mm per year in the adjacent Koohrang Mountains. The Khersaan River is one of the most water-rich tributaries of the Karun River which drains into the Karun-3 reservoir. In addition, the outflow from the Karun-4 also drains into the Karun-3 reservoir. Past the Karun-3, all the dams on the Karun basin are positioned serially behind one another where at the end of each reservoir is another dam. At the downstream of the Karun3 is the Karun1, which has been in operation for years. The Masjed Soleyman dam is located in Masjed Soleyman downstream of the Karun1 and is tasked with energy supply. The last dam on the Karun, the Gotvand-Olya, is currently in operation. Meanwhile, on the Dez basin is located the Dez storage dam whose outflow joins that of the Gotvand-Olya dam in Band-e-Ghir and form the Great Karun River (Ashrafi 2021). In the figure below, the layout of the reservoirs on the Great Karun basin, which is investigated in this research, is presented. In this study, 10 flow nodes including tributaries and sub-basin inflows are accounted for according to the following Table 2 and Fig. 2.

Table 2 Inflow nodes of great Karun Basin
Fig. 2
figure 2

Schematic of great Karun Basin

Results and discussion

The achieved results are presented in two following sub-sections for short and long lead times separately, and the performance of the proposed approach is discussed.

Runoff forecasting with Short-lead time

According to the initial framework proposed by Ashrafi et al. (2020), the short-term forecast model has been developed using an approach consisting of an Artificial Neural Network (ANN) and a Discrete Wavelet Transformation (DWT) operator. This combination has been employed in order to deduce the frequency of variation dominant in the initial time series. The general equation of the DWT function is defined as follows (Polikar 1999);

$${\text{DWT}}_{x} \left( {s,b} \right) = \frac{1}{{\sqrt {s_{0}^{m} } }}\mathop \smallint \limits_{ - \infty }^{ + \infty } x\left( t \right).\psi \left( {\frac{{t - {\text{nb}}_{0} s_{0}^{m} }}{{s_{0}^{m} }}} \right){\text{ dt}}$$
(1)

Here, \(\psi \left( . \right)\) is the mother wavelet function,\(s>0\), is the scale parameter, b, stands for the time transfer parameter, and \({\text{DWT}}_{x} \left( {s, b} \right)\) indicates the discrete wavelet transform of \(x\left(t\right)\). The scale parameter determines the wavelet width. The monthly runoff time series of all watershed nodes are decomposed to the fourth level. At each level of wavelet decomposition, one of the coefficients indicates low-frequency, long-term changes and the other indicates high-frequency, short-term fluctuations. Also, at other levels, the low-frequency coefficient is split and two new coefficients are obtained. Thereby, the changes hidden in the initial time series are more accurately evaluated, enabling the learning model to estimate the relationship more precisely by accurately recognizing the underlying runoff patterns. As the levels of decomposition progress, the coefficients become smoother. The process proceeds until the initial mother wavelet replaces the coefficients. In this process, at each level of decomposition, multiple parts of the time series are sampled by the mother wavelet, and the similarity is determined in the form of the internal multiplication of the two vectors. This process continues until the sampling replaces the initial time series with the mother wavelet chart, as the smoothed surface. Accordingly, the number of decomposition levels should not be too high (to inhibit fluctuation patterns from being hidden) and too low (to restrict the model from reaching the initial mother wavelet due to low oscillations). Hence, as previously mentioned, each time series is split into four levels. An “Only ANN” and six “pre-processed ANN” approaches including self-correlation and correlation with the main data are analysed to develop the monthly forecast model for each of the time series of reservoirs’ inflows and sub-basin inflows. Several models are extracted for each time series (Table 3). Importantly, ECWT models include the best possible combination with the desired evaluation indicators.

Table 3 Beast predictor combination for inflow nodes (monthly forecast model)

Several neural networks have been trained to estimate the runoff into each key point of the system using 85% of the observed data. The trained networks are evaluated using 15% of the observed data and the best one is implemented for predicting the runoff of each key point. The best networks for different key nodes of the system are presented in Table 3. Table 4 summarizes the performance evaluation indicators related to 10 independent training processes in the form of mean and standard deviation (SD).

Table 4 Performance evaluation indicators for the best monthly forecast models

The highest correlation coefficient (R) is associated with the prediction process of the Sub-basin inflow of Karun1 to MasjedSoleyman. The Willmotts index works like a correlation coefficient. In this case, the Sub-basin inflow of Karun1 to MasjedSoleyman has the best performance. Also, by analysing the Nash Sutcliffe index, and according to (Shamseldin 1997), all predictions are satisfactory, and in some cases, such as the Bakhtiari and Karun 4 Head flows, the overall performance is highly acceptable.

RMSE and MAE indicators are two scalar indexes in terms of quantity and their numerical value does not provide information on their relative utility, i.e., we cannot certainly discuss the predictive performance of these indicators. Therefore, the relative RRMSE and MAPE indicators, which measure the residual value or error of the model relative to the target value, are calculated. From Table 4, the chosen models for all runoff time series have a low relative error and are in the acceptable range. Figure 3 shows simulated runoffs compared to the observed flows in all considered nodes.

Fig. 3
figure 3

Observed and predicted monthly streamflow in testing period (a Bakhtiari Head flow, b Cezar Head flow, c Dez to Band-e-Ghir Sub-basin Inflow, d Karun4 Head flow, e Karun4 to Karun3 Sub-basin Inflow, f Karun3 to Karun1 Sub-basin Inflow, g Karun1 to MasjedSoleyman Sub-basin Inflow, h MasjedSoleyman to Gotvand Sub-basin Inflow, i Gotvand to Band-e-Ghir Sub-basin Inflow, j Khersan Head flow)

From Fig. 3, the performances of the selected methods for detecting the runoff pattern in all key nodes of the system are highly acceptable. At this step, the wavelet transformation operator has helped to accurately identify the existing fluctuations and repetitive patterns. It has also helped the learning process to perform and the corresponding coefficients to be determined following these patterns. Importantly, peak values have not been desirable in most sub-basins. Peak values are always difficult to predict in methods based on the streamflow time series. The reason is that in a long time, the maximum number of samples is very small, and if learning is done only based on the number of occurrences, these values cannot be accurately predicted. However, according to Fig. 3, in all key nodes, the pattern of peak occurrence has been correctly identified and its period obtained. Also, the evaluation of the proposed method in the study of time series with a natural cycle is important. According to Table 4 and Fig. 3, the evaluation indicators are achieved in nearly the same range, meaning that time series derived from a natural cycle have similar patterns, and if this pattern is correctly identified, then it can be accurately predicted. However, this is expected not to change the general patterns governing the natural cycle. That’s why peak events that are somehow the result of disruption of natural cycles are not properly identified.

Accordingly, if the forecast lead time increases, it is not possible to merely follow the pattern of current changes, and other influential parameters need to be identified. The following figure shows the peak estimation performance of the selected methods at all key nodes.

Figure 4 shows that the peak values are often estimated to be less than the observed values. As mentioned earlier, due to the small number of training patterns and the data-driven nature of the neural network, more samples are required to determine the exact behaviour of a real-world phenomenon. The lower peak value of runoff diminishes the rate of predictive relative error and increases the accuracy of the estimated value, as the low peak values are closer to the normal data range, and there are more samples in that range. During the learning process, the neural network has more similar samples to determine its weight coefficients for estimating normal ranges (Geem et al. 2007). A box-plot diagram is used to further examine the distribution of the predicted data. Due to its nonparametric nature, the box plot is independent of the initial assumption regarding statistical distribution, leading to a better understanding of the degree of expansion and skewness of a time series. Also, whiskers show the diversity in the data in down and upper quartiles (25% and 75%).

Fig. 4
figure 4

Peak estimation within the test period for (a Bakhtiari Head flow, b Cezar Head flow, c Dez to Band-e-Ghir Sub-basin Inflow, d Karun4 Head flow, e Karun4 to Karun3 Sub-basin Inflow, f Karun3 to Karun1 Sub-basin Inflow, g Karun1 to MasjedSoleyman Sub-basin Inflow, h MasjedSoleyman to Gotvand Sub-basin Inflow, i Gotvand to Band-e-Ghir Sub-basin Inflow, j Khersan Head flow)

Figure 5 shows that the predictions of the chosen model are almost identical to observational data in all sub-basins. Besides, in estimating the minimum, maximum, and median values, the chosen models have acceptable performance with a justified error level. It is also observed that the estimated values of the chosen models are skewed to the upper values as the observational values, which can be due to the presence of a larger number of data greater than the median in the observational values. In other words, there are many points outside the box range, which tends to distribute upwards. Importantly, in a limited number of nodes, such as Khersan, Karun 4 Head flows, and also Sub-basin inflow between Karun 4 and Karun 3, the distribution of data varies from the observed value, especially in small quantities. This could be due to a lack of appropriate data in this range. Also, the high difference between scattered points outside of the box in all basins is due to the inability of the proposed models to estimate peak values, which has caused a large difference between the maximum points.

Fig. 5
figure 5

Monthly Streamflow Peak estimation within the test period for (a Bakhtiari Head flow, b Cezar Head flow, c Dez to Band-e-Ghir Sub-basin Inflow, d Karun4 Head flow, e Karun4 to Karun3 Sub-basin Inflow, f Karun3 to Karun1 Sub-basin Inflow, g Karun1 to MasjedSoleyman Sub-basin Inflow, h MasjedSoleyman to Gotvand Sub-basin Inflow, i Gotvand to Band-e-Ghir Sub-basin Inflow, j Khersan Head flow)

Runoff forecasting with long-lead-time

As discussed in the methodology section, as the forecast lead time increases, the uncertainty of the results progresses, and the prediction accuracy decreases. Accordingly, it is no longer feasible to rely only on the current time series and even its decomposed components as a proper input for the learning model. In this study, a wide range of atmospheric-climatic characteristics are introduced (Table 1). Initially, the Pearson test was performed for each of these indicators relative to all current time series in the Great Karun Basin. In other words, a linear correlation was obtained between each indicator and the amount of streamflow in each of the inlet nodes of the catchment area. Importantly, a common period was initially obtained in each pair and became the basis of the work due to the inconsistency of the observed flows and climatic characteristics. In the following, different quality levels were defined based on the correlation values. In other words, concerning the amount of correlation, the influential indicators were ranked from the highest value to the lowest level of correlation, respectively. The results of the grouping of indicators are summarized in Table 5.

Table 5 Grouping of the climatic index based on the correlation coefficient

According to Table 5, from 36 indicators examined in this study, CAR, Tropical Pacific SST EOF, Nino1 + 2, and Atlantic Tripole SST EOF have the highest correlation with the streamflow time series in all key nodes of the watershed. As the goal of developing a long-term model is to predict runoff values with a more than six-month lead time, in classifying the correlation of an index, values related to 7th lag onwards are considered as a basis to can estimate the runoff at the 7th month according to these values in each time step. Note that each indicator, due to its nature, has an internal relationship with the amount of flow and has a unique correlation pattern. Figure 6 shows the correlation values of the last twelve lags of 1st-group indicators with the Bakhtiari Head flow time series. From the figure, each index has its correlation pattern with different values in each lag. For example, the Tropical Pacific SST EOF (Red) index is more consistent from the sixth lag onwards, with the highest number happening in the ninth lag. Therefore, the assumption of choosing the seventh lag onwards as the inputs of the learning model to predict the runoff that leads to the six-month growth of the forecast lead time is correct and can be quantified by having the input variables per month. The runoff in the seventh month can be estimated from input variables in each month. The seasonal nature of these indicators is also important. The effects of all indicators are different in seasons and according to the dominant climatic phenomena, their effects also vary. As it is clear in Fig. 6, the behaviour of some of these indicators is similar. It should be mentioned that in order to develop an efficient forecasting model, a wide range of different indicators should be used so that somehow all the dominant phenomena are well identified and used in the model. But one of the most important problems of seasonality is the small number of monthly and seasonal samples. In other words, there is a small number of indicators in each stage, which causes problems for numerical models that are dependent on data.

Fig. 6
figure 6

Correlation patterns of the first group indicators

For longer forecast horizons effective teleconnection indices for each month were identified. Afterwards, ACF and PACF functions were drew to identify effective lags. As we know, we must estimate streamflow for future month within a specified time interval. So, the distance between the current states and the first lag with high correlation determines forecast horizon. It is obvious that the other lags more than horizon lag also could be used as predictor but the first one is the forecast horizon. Figure 6 shows the behaviour of each index against streamflow. The length of this interval could identify the forecast lead time because there is a distance between current state and past. Indeed, the graph identifies how these indices could affect streamflow. In other words, the seasonality is gained form the fluctuations reflected in each chart.

Models are grouped after determining each of the influential indicators and qualitative grouping. In this way, single, double, triple, and quadruple categories are created at each level and the best category is chosen in each combination. In this way, the learning process is based on the similarities of a category with the main data (Ensemble Learning) and somehow distributes the effective feature and ultimately improves the performance of the training. Importantly, due to the differences in climatic indicators and the subsequent diversity of existing correlations, in some months, the correlation rate of an indicator may be at its best but other months of the year at worst. As established in many studies (Jayawardena and Fernando 1998; Palmer et al. 2004; Guérémy et al. 2012; Sigaroodi et al. 2014), the initial correlation of the indicators with the current values is relatively low due to the long spatial-local distance of the general atmospheric indicators as well as the potential low data resolution. To solve these problems, it is recommended to use an intermediate mode with a moderate correlation in all conditions. Accordingly, the basis of grouping makes it possible to balance the remaining months of the year while taking advantage of the highest correlation rates. Also, combinations have been created between other levels to assess the possibility of the quality of other indicators and choose the best possible mode. This is because the grouping criterion is based on the maximum monthly correlation, and the maximum correlation rate may be low in one month but has a uniform distribution in other months of the year. It can increase the accuracy of the final answers. In this research, a limited number of indicators have been used in each group of the long-term forecasting model, which indicates the low correlation of other indicators in each group. It is obvious that the quality of the model should be reduced if less important indicators are included in the learning process. Accordingly, removing some of them can improve the final performance. Table 6 presents the best compounds of one, two, three, and four members and also the best interfacial compounds related to Bakhtiari Head flow as a certain case.

Table 6 The best combination of climatic indexes for Bakhtiari head flow

Eighty-five per cent of observed data are used to train neural networks and 15% to test developed networks. Table 7 presents the evaluation indicators for the 10 independent runs of the best neural network for each key point of the system in the form of mean and standard deviation.

Table 7 Evaluation Index for runoff prediction with six months lead time

Table 7 shows the appropriateness of the linear fit in terms of the correlation coefficient. In other words, in the chosen models, a proper linear relationship has been established between the input and the output. Note that the difference in the amount of correlation coefficient in the monthly and six-month model is because the monthly model uses the time series of runoff and its decomposed components, and the one-month lag has increased the correlation to 0.9. Also, the highest correlation coefficient is related to the Gotvand to Band-e-Ghir sub-basin inflow belonging to the L1_Triple model, which is a single-group model. It is observed that the intra-group and inter-group categorization of indicators has ultimately led to the distribution of correlation values. In other words, the overlap of correlations has led a final correlation coefficient of the group to be higher than the correlation coefficient of every single component of the group compared to the original data, and in some nodes to record 0.8. The Willmotts index relatively works similar to the correlation coefficient, although it is more sensitive to differences in observational and predictive values. In this case, the Gotvand and Band-e-Ghir currents with the selected L1_Triple model have the highest value of this index. Also, it is observed that in all flow inlet nodes, this index has a value of more than 0.7. This is due to the nature of the index, which is slightly more than 0.65, even in weaker models. By examining the Nash Sutcliffe index and, according to (Shamseldin 1997) as the monthly model, all predictions are satisfactory and a uniform state is observed in all flow key nodes. The RRMSE index shows that the highest relative difference between the hypothetical fitting line and the observed samples is 32%, which is related to the Karun3 to Karun1 sub-basin inflow with the chosen L1 & 2_Triple model. This is an acceptable value for the forecast model that estimates the value of the target parameter over the next six months. Also, in other forecasting key nodes, the index is less than this value, indicating the optimal performance of the models. The RRMSE index is more sensitive to peak values due to its structure. Part of the RRMSE error can be attributed to the fact that the performance of all chosen models in predicting peak values is not fully satisfactory. However, in the MAPE index, the remaining value is calculated by the power of one and, therefore, it is more affected by the normal data. Accordingly, it is more reliable in the current problem. Figure 7 shows comparative predicted and observed values for all key nodes. The simulated values are the results of the best-chosen models in each node (Table 7).

Fig. 7
figure 7

Observed and forecasted streamflow with six-month lead time within the testing period (a.Bakhtiari Head flow, b Cezar Head flow, c Dez to Band-e-Ghir Sub-basin Inflow, d Karun4 Head flow, e Karun4 to Karun3 Sub-basin Inflow, f Karun3 to Karun1 Sub-basin Inflow, g Karun1 to MasjedSoleyman Sub-basin Inflow, h MasjedSoleyman to Gotvand Sub-basin Inflow, i Gotvand to Band-e-Ghir Sub-basin Inflow, j Khersan Head flow)

From Fig. 7, the effect of climatic indicators on the hydrologic processes within a watershed eventually causes changes in the surface flow pattern, and surface flows can be effectively predicted by identifying such effective indicators. However, these indicators directly affect the amount of precipitation in the basin and consequently have indirect effects on the surface flow pattern. Surface flow is not entirely due to precipitation, and the amount of base current accounts for a significant portion of the surface flow. As a result, the high correlation between climatic indicators and precipitation in all seasons does not lead to a high correlation between precipitation and surface runoff. However, it can be said that the base current is generally due to the infiltration process, which is itself due to precipitation in previous months. Therefore, in the months when the base flow is minimal, the climatic characteristics are directly related to the amount of runoff, whilst in the months when the base flow has a significant value, the surface flow has two direct and indirect correlations and is estimated using the indicators affecting the rate of penetration and precipitation.

In general, the performance of each of the chosen methods in detecting the runoff pattern is very acceptable. For example, we can highlight the input node of Karun4 Head flow, in which the L1_Triple model has a very small relative error in appearance, besides obtaining a correlation coefficient above 0.8. Also, most of the chosen models are from the indicators available in level 1. And the L1 & 2_Triple model has been chosen as an intermediate level model only in estimating the Dez to Band-e-Ghir and Karun3 to Karun1 sub-basin inflow, where the second level indicators have been influential. The overlap of correlation values ​​in estimating a target variable is another point to consider. Each indicator has a unique correlation pattern based on its internal nature (Fig. 6). In all key nodes, Ensemble models have been chosen from different multi-group member combinations. Also, in some nodes, such as the Karun3 to Karun1 sub-basin inflow, the effects of correlation overlap are observed. In other words, despite a lower correlation between the two groups, the final chosen model, L1 & 2_Triple, is an intermediate group combination and gives a better result due to the synergy of the correlation values. Also, the six-month forecasting model is not optimal in estimating peak values, which can be due to the low number of peak observation samples. Figure 8 shows the peak estimation results using the top three models in each node compared to the observed values.

Fig. 8
figure 8

Six-month lead time peak estimation within the test period (a Bakhtiari Head flow, b Cezar Head flow, c Dez to Band-e-Ghir Sub-basin Inflow, d Karun4 Head flow, e Karun4 to Karun3 Sub-basin Inflow, f Karun3 to Karun1 Sub-basin Inflow, g Karun1 to MasjedSoleyman Sub-basin Inflow, h MasjedSoleyman to Gotvand Sub-basin Inflow, i Gotvand to Band-e-Ghir Sub-basin Inflow, j Khersan Head flow)

From Fig. 8, there is a kind of underestimation with peak values. This could be due to the low number of observed data of peak events throughout the considered time horizon. For example, Karun4 to Karun3 sub-basin inflow, the L1_Foursquare model, which originated from the quadruple grouping of level one, can make the least error in peak values. However, it underestimates all peak values. Another case is the node of Dez to Band-e-Ghir sub-basin inflow, in which the chosen model L1 & 2_Triple has experienced the most relative error in the fourth peak. It is also observed that the relative error in smaller peaks is less, which is consistent with the monthly forecast performance. For example, in the node of Karun1 to MasjedSoleyman sub-basin inflow, the chosen L1_Foursquare model has a relative error of 22% in the second peak and 1% in the seventh peak. This indicates that higher peaks have a higher relative error and a higher underestimation rate. Also, the proper performance of a forecasting model in terms of statistical indicators does not mean the exact prediction value of peak values. For example, in the node of Gotvand to Band-e-Ghir sub-basin inflow, the L1_Foursquare, and L1 & 2_Triple models, which had a high correlation with the main current, could not even identify the peak current pattern and its time leg, because statistical indicators measure the mean time series and exclude local effects such as peak values. Also, data-driven learning methods are data-orientated and can train neural networks and ultimately estimate similar values by repeating a predefined pattern (Geem, and Roper 2009). As a result, due to the lack of peak samples, it is not appropriate to use only classical learning models to accurately estimate peak values. To examine the distribution of forecast data, the box-plot diagram of the results is shown in Fig. 9.

Fig. 9
figure 9

Box-plots of six-month streamflow peak estimation within the test period (a Bakhtiari Head flow, b Cezar Head flow, c Dez to Band-e-Ghir Sub-basin Inflow, d Karun4 Head flow, e Karun4 to Karun3 Sub-basin Inflow, f Karun3 to Karun1 Sub-basin Inflow, g Karun1 to MasjedSoleyman Sub-basin Inflow, h MasjedSoleyman to Gotvand Sub-basin Inflow, i Gotvand to Band-e-Ghir Sub-basin Inflow, j Khersan Head flow)

In the key nodes of Bakhtiari Head flow, Karun4 Head flow, Karun1 to MasjedSoleyman sub-basin inflow, and MasjedSoleyman to Gotvand sub-basin inflow the distribution is such that the values outside the box are similar to the observations and the maximums are estimated more accurately. It can be attributed to the better estimation of peak values by the selected model (yellow box). The figure clearly shows that in the selected models the span, mean, and range are 25% and 75% similar to the observations, whereas in other models generally the whiskers are propagated in the opposite direction or the dispersion of the answers is not sufficient. For example, for the node of Karun1 to MasjedSoleyman sub-basin inflow, the L1&2 Foursquare model, derived from the combination of first and second four-item groups, shows a downward whisker which is contrary to the observed data. This is not just an underestimation in the peak values but indicates that the means and minimums were also underestimated. In another example, the Dez to Band-e-Ghir sub-basin inflow where the L1 & 2 Triple model, resulting from the combination of first- and second-grade three-item groups, has a very small span and in a sense formed a limited box which is usually because the pattern of the target variable is not accurately recognized in the correlation. Here, the six-item group comprised of three first- and second-grade indices has not been able to accurately identify the pattern of flow change in the node; hence, the estimated values are calculated in a limited range of original data changes. It is also seen that in some of the selected models, such as L1_Foursquare, the distribution of data is fully similar to the observed values for Dez to Band-e-Ghir sub-basin inflow while the lower whisker, which is completely skewed, is in contrast to the observation. However, the small difference between the lower whisker and the observations indicates that there is sufficient data in this interval and the difference can be due to low local correlation in some months. Because in practice calculations are based on the mean correlation and in some cases, large differences between correlation and observation in a certain month lead to such results. According to Table 5, grade-1 indices are more correlated with the overall flow in the drainage basin which indicates the flow was generally influenced by the fluctuations in Pacific SST. This is in agreement with the findings of (Guérémy et al. 2012; Sigaroodi et al. 2014) on precipitation correlation in the Mediterranean region. It is also essential to note that the results are slightly different because the flow is partially influenced by snow melting and base flows in the Great Karun basin. For instance, the Caribbean SST is a decisive index in the first level of correlation; indicating that the flow is influenced by fluctuations in the Caribbean region. The long-distance between the studied area and the reference point of the indices such as the Pacific SST in a way represents the time lag in the correlation between precipitation and flow. In other words, while a strong correlation between an index and precipitation in a region means also a high correlation with the flow, the impact of rainfall-runoff transformation time delay, determined by snow melting, soil moisture, penetration rate, etc., influences the selection of final correlated indices. Another point is the seasonal distribution of prediction performance and the better forecasting conditions of droughts compared to wet seasons. A demonstration of the predicted flow in each season with another reference to Fig. 7 shows that the largest difference between prediction and observation occurred in spring. This indicates the maximum flow volume occurred in spring and the prediction error increased due to insufficient data on peak values. At the same time, the lowest difference was observed in autumn. In this context, for the same reason of low flow volume in autumn and adequate data distribution in this interval, the model has been well able to identify the pattern of change.

A closer view of Tables 4 and 7 reveals increasing the prediction lead time eventually led to a decrease in the values of evaluation indices. The mean correlation coefficient on the monthly lead time was estimated as 0.92 while increasing the prediction lead time lowered the mean to 0.76. The Wilmotts index was calculated as 0.92 for the monthly and 0.77 for the six-month time step. Also, the Nash Sutcliffe index was recorded as 0.9 on the monthly and as 0.76 on six-month models. For RRMSE and MAPE relative indices, 25% and 32% on the monthly calculation, as well as 29% and 36% on the six-month prediction were recorded, respectively. While the proposed technique was used for the development of six-month prediction, the results are weaker than the monthly forecasting despite the desired results in terms of assessment indices. This is due to the complex nature of flow and various parameters affecting the rainfall-runoff process as well as errors in calculating the long-distance climate indices which make a prediction with long-lead-time difficult. Figures 4 and 8 highlight the difficulty of accurately predicting peak values in short and long lead times. A shorter prediction lead time did not increase the accuracy of peak values; even at certain nodes, larger underestimations were recorded. It also showed that the quality of peak results is relatively independent of the horizon and dependent on the frequency of occurrence. Finally, in Figs. 5 and 9, it is observed that the lead time has no bearing on the data distribution and does not follow a clear pattern.

Analysis of the results from the monthly and six-month prediction model suggests a proper ANN performance in the recognition of nonlinear patterns. The accurate identification of input patterns and their impact on output indicates the high flexibility of ANNs (Teschl and Randeu 2006; Wu and Chau 2010; Li et al. 2012). In addition, using ensemble learning in examining a group of inputs led to better recognition of patterns and proper distribution of desirable property among the members. This feature caused the overlap of the wavelet components’ desired properties in the monthly model and also distributed the positive impact of each index in a group in the six-month model; improving the answers in the end.

At the end, it should be pointed that both the short- and long-term forecast models should be implemented in their suitable applications. The first and the most important function of forecast models is the water resources operating systems. The more accurate the water resources model handle the future situation, the more accurately it can allocate the available water and reduce the critical shortages. But it can be said that the main difference between short-term and long-term models in managing water resources is how they are used. Long-term models are mostly proper to extract the management strategies while, short-term models are used to modify and verify the operational actions during critical periods (e.g. within droughts or wet seasons).

Conclusion

The present study adopted the Ensemble approach based on ANN to predict the long-term inflow to the Great Karun multi-reservoir system in Iran. For this purpose, and to find the contributing input parameters, the relationship between runoff and 36 long-distance climate indices was investigated and influential indices were selected. In the input selection process, ACF and PACF functions, which calculate linear correlation as a function of time delay, were used. With a hydrological-geographic division of the drainage basin and considering the location of reservoirs, inflows to the Great Karun multi-reservoir system were modelled at 10 separate key nodes and the surface runoffs in the nodes were predicted. An examination of the correlations showed that in most of the selected indices, there is a similar behaviour in short-term (1–3 months) and long-term (6–8 months) delays. This feature was used in categorizing climate-atmospheric indices, and the resulting categories were selected to form the inputs for the learning model.

Tropical Pacific SST EOF, CAR, Nino1 + 2, Atlantic Triple SST EOF indices had a higher correlation with inflows to the Great Karun system, indicating that runoff is primarily influenced by Pacific STT. Results showed that the fluctuation pattern of runoff can be predicted with acceptable accuracy even for six-month horizons using these indices; even if the unique correlation of each index with runoff is not high. The low correlation is due to the long-distance and time delay between the reference point of indices and the studied area. In other words, the difference between observations and forecasts suggests that other local parameters such as local wind, humidity contribute directly to precipitation and indirectly to surface runoff and therefore, must be considered in predictions. Other factors (i.e. soil moisture, slope, etc.) also contribute directly to flow rate, and excluding them increases computational error. Therefore, the use of ensemble learning resulted in the acceptable accuracy of the final model’s performance.

Results also showed the nonuniform distribution of prediction performance during wet and dry seasons. The largest difference between forecast and observation values occurred in spring. This is due to an inadequate number of peak samples; because the highest runoff amount in this area occurs during spring and the learning model was unable to cover these values due to insufficient occurrence frequency. Results also show the flexibility of ANN in investigating nonlinear relationships. Finally, it was found that prediction lead time has a restrictive effect on the model’s performance. The longer the prediction lead time, the more data are needed for accurate prediction, and the results’ accuracy decreases proportionally. Implementing various ensemble learning methods can significantly improve the achieved results.