Utilizing deep learning machine for inflow forecasting in two different environment regions: a case study of a tropical and semi-arid region

Reservoir inflow (Qflow) forecasting is one of the crucial processes in achieving the best water resources management in a particular catchment area. Although physical models have been used to solve this problem, they show noticeable limitations because they require considerable effort, extensive hydrological and climate data, and a time-consuming learning process. Hence, machine learning models have been developed as a recent alternative, and the deep learning neural network (DLNN) is a promising methodology recently explored in the field of water resources. The current research was conducted to forecast Qflow at two catchment areas characterized by different types of inflow stochasticity (semi-arid and tropical). The DLNN was validated against two classical neural network algorithms, the multilayer perceptron neural network (MLPNN) and the radial basis function neural network (RBFNN). The research further investigated the potential of a feature selection algorithm, the genetic algorithm (GA), for identifying the appropriate predictors. The findings confirmed the feasibility of the developed DLNN model for the two investigated case studies. In addition, the DLNN model forecasted daily-scale Qflow more accurately than monthly-scale Qflow. The GA applied as a feature selection algorithm reduced the dimension and complexity of the learning process of the predictive model. Further, the findings confirmed the adequacy of the data span used in the current investigation for developing computerized ML algorithms.


Introduction
One of the most useful and direct ways of guiding reservoir operation and management is reservoir inflow (Qflow) prediction; it is also useful for flood control, drought management, and irrigation water management (Rezaeianzadeh et al. 2016; Xu et al. 2021). Because the forecasted Qflow serves as input information, the refined management of water resources at a reservoir relies strongly on precise Qflow predictions (Herbert et al. 2021). In most parts of the world, accurate and real-time daily or monthly prediction of Qflow remains a difficult challenge due to the nonlinearity and non-stationarity of the associated real hydrological data (Kim et al. 2019; Lee et al. 2020). Hence, this research topic has received much attention from water engineers and decision makers.
Reservoir inflow prediction has become a major topic in hydrologic time series over the last few decades (Esmaeilzadeh et al. 2017;Bashir et al. 2019;Allawi et al. 2019a).
However, the paucity of information about physical concepts when studying the relationships between variables has necessitated the use of data-driven models in hydrological forecasting as an alternative to knowledge-driven methods. Two types of hydrological models are currently available for solving nonlinear complex problems: the standard statistical methods and the physical-based methods (Tran et al. 2021). The standard statistical methods cannot accurately capture nonlinear patterns, while the physical process-based methods cannot capture sufficient information to characterize basins based on hydrologic parameters (Yaseen et al. 2016b). While both methods depend on historical data to forecast the future, the underlying complexity of hydrologic input-output interactions necessitates a model that is strong enough to discern nonlinear patterns without sacrificing accuracy (Petty and Dhingra 2018). The newly adopted computer-based approaches, such as machine learning (ML) algorithms, have shown the capacity to capture complex nonlinear data patterns that would otherwise have required extrapolation (Nearing et al. 2021; Zounemat-Kermani et al. 2021). Hence, they are considered an alternative to the existing prediction methods in several fields of hydrology, such as river flow forecasting (Yaseen et al. 2016a; Osman et al. 2020; Afan et al. 2020), rainfall forecasting (Tao et al. 2018; Ali et al. 2020), reservoir operation (Hossain and El-shafie 2014; Ehteram et al. 2018), evaporation process simulation (Allawi et al. 2019b; Salih et al. 2019), surface water quality prediction (Yahya et al. 2019), geo-science related problems (Mukhlisin et al. 2012; Alizamir et al. 2020), drought detection (Alamgir et al. 2020; Singh et al. 2021), and several others (Raghavendra and Deka 2014a; Zounemat-Kermani et al. 2021).
ML algorithms have been used to predict reservoir inflow in both direct and multi-step scenarios, resulting in more reliable predictions of extreme inflows. This was first reported by Coulibaly et al. (2000), who used a feedforward neural network (FFNN) trained with an early stopping approach to predict real-time inflow with lead periods ranging from one to seven days. An enhanced version of the ML model for daily inflow prediction, which employs a robust weighted-average ensemble combining three different frameworks (a physical model, nearest neighbors, and an artificial neural network (ANN)), was reported by Coulibaly et al. (2005). The literature also reports the use of the least squares support vector machine (Bai et al. 2015), ensemble ML models (Ahmed et al. 2015), fuzzy logic models (El-Shafie et al. 2007), the classical artificial neural network-based radial basis function (El-Shafie et al. 2009), the support relevance vector model (Liu et al. 2016), and random forest (Liu et al. 2017) to predict inflow. Readers are encouraged to explore the associated literature on reservoir inflow prediction using ML models through the survey studies reported by Choong and El-Shafie (2015), Mosavi et al. (2018) and Wee et al. (2021).
A recurrent neural network (RNN) model was utilized to predict daily flow data by Apaydin et al. (2020). To check the predictive model, the performance of the RNN model was compared with that of the ANN model. Based on several indicators, the research concluded that the RNN model is more accurate than the ANN in predicting reservoir flow records.
The effectiveness of well-known computation methods, including MLP, ANN, and SVM, to predict inflow data was examined by Lee et al. (2020). The coefficient S, NSE, and other indices were used to assess these forecasting techniques. The study demonstrated that the models that were created could be a useful tool to predict reservoir inflow records.
The possible use of an ANN model in predicting reservoir inflow data was examined by Hadiyan et al. (2020). The research provided helpful data for simulating the inflow to the Sefidroud reservoir, Iran. Allawi et al. (2017) employed the Coactive Neuro-Fuzzy Inference System (CANFIS) method to forecast reservoir inflow. The proposed model succeeded in providing high-accuracy prediction results.
The performance of ML algorithms in predicting reservoir inflow is inconsistent, making it difficult to determine which strategy is preferable. In addition, artificial intelligence models such as ANN models have several drawbacks, including shortfalls in generalization performance and learning divergence, local minimum entrapment, and over-fitting issues (Ghimire et al. 2018). While the support vector machine model appears to overcome some of the shortcomings of ANN, it does so at the expense of a long simulation time due to the kernel function (penalty factor and kernel width) (Raghavendra and Deka 2014a, b). As a result, ML algorithms may not efficiently learn all the conditions if there is high data complexity. In the field of hydrology, the search for new and more reliable ML algorithms is still underway. New deep learning-based ML models have recently been developed for inflow simulation. These models, such as the Deep Learning Neural Network [DLNN], Long Short-Term Memory [LSTM], and Convolutional Neural Network [CNN], are frequently employed in the prediction of hydrological time series. These DL models have advantages such as the ability to handle highly stochastic data and the ability to extract the internal physical mechanism (Hrnjica and Mehr 2020).
Although several studies on inflow forecasting have been reported in the literature (Bai et al. 2016, 2018; Aljanabi et al. 2017; Herbert et al. 2021), limitations still exist and motivate hydrological scientists to study this essential problem further. For instance, classical ML models such as ANN, SVM, and ANFIS have demonstrated limited robustness in the learning process of the network; thus, exploring newer versions of ML such as deep learning can help overcome the limitations of the classical ML models. Identifying the appropriate features for constructing the learning process of the ML model has been observed to be a critical element in the development of computer-aided models; thus, the reliable nature-inspired optimization method called the genetic algorithm has been integrated as a feature selection step for forecasting reservoir inflow at the proper lead time. As a matter of fact, the stochastic variation differs from one dam to another; thus, the methodology is implemented on two different flow mechanisms so that its generalization capability can be examined.

Semi-arid region
The first case study used in the current research is Dukan reservoir. It is located around 67 km north of Sulaimani City in northern Iraq. The dam is adjacent to the city of Ranya and is located at 35°57′13.24′′ N and 44°57′11.61′′ E. The reservoir was created during the construction of Dukan Dam on the Little Zab River. This multipurpose dam has a total maximum discharge of around 4300 m³/s (150,000 ft³/s). This is partitioned between a spillway tunnel with 3 radial gates and an emergency bell-mouth glory hole spillway that can discharge 2440 m³/s (86,000 ft³/s) and 1860 m³/s (66,000 ft³/s), respectively. There are also 2 irrigation outlets that can together discharge 220 m³/s (7,800 ft³/s), but they have not been used in 10 years. There is a powerhouse of 5 Francis units, each with an output of 80 MW, discharging between 110 and 550 m³/s (3900 and 19,000 ft³/s) of water. The lake has a surface area of 270 km². The reservoir's capacity is 6.8 km³ in normal operation, with a maximum capacity of 8.3 km³. The surface elevation is 515 m above sea level. The surface elevation of the dam must be between 469 and 511 m to operate the power station. The Dukan Dam's drainage basin spans 11,700 km², with part of it in Iraq and the rest in Iran. The main source of water is the Zab River. The daily inflow to the reservoir over 11 years (January 2010-December 2020) is the only available data record. A Google map of this reservoir is shown in Fig. 1a.

Tropical region
Construction of the Timah Tasoh Dam (TTD) began in 1987 and was finished in 1992 in Perlis, Malaysia (6°36′ N; 100°14′ E). TTD is an essential hydraulic structure within Peninsular Malaysia, and the quantification of its Qflow patterns is highly important for the operation and water resources management of that region. In fact, the high variance and nonlinearity seen in the Qflow of the tropical zone frequently include a highly stochastic pattern that contributes to the complexity of the dam's reservoir systems. This case study therefore necessitates the development of a new method for evaluating the offered models. As a result, a thorough comparison of the existing and proposed operating procedures is required.
The reservoir system has a total surface area of over 13.3 km². The reservoir's overall capacity is around 40 million cubic meters (MCM). With an average inflow runoff of over 100 MCM, the reservoir water storage has two major zones: a dead zone of 6.7 MCM and a live zone of 33.3 MCM. The reservoir can be classified as a shallow reservoir, with a maximum depth of 10 m. The reservoir's position was chosen to receive water from two main rivers in Perlis State: the Tasoh and Pelarit Rivers. The TTD provides irrigation water for 3100 ha at a rate of roughly 55 MCM per year. Furthermore, it delivers around 55 × 10³ m³ of water each day for domestic consumption. The dam also serves to regulate and mitigate the floods that are expected during the rainy season. The location of the Timah Tasoh Dam is displayed in Fig. 1b.

Deep learning neural network
Deep learning (DL) has emerged as a new branch of ANN research that is altering different scientific disciplines in the modern day (Goodfellow et al. 2016). The term "deep" in this method refers to a connection of layers that allows the translation of data representation from one layer to another. A deep net (DN) is a type of ANN that has numerous hidden layers, an input layer, and an output layer (Lecun et al. 2015). In comparison with traditional machine learning methods, a DL-based model necessitates a huge amount of training data in order to comprehend the underlying data patterns. Increases in the network depth (i.e. number of layers) allow the extraction of the most appropriate hierarchical data representations using a proper data transformation (Schmidhuber 2015). In recent years, DL has found use in remote sensing, hydrological prediction, and image processing. Although different versions of DL have been adopted in the literature, the current study uses the long short-term memory (LSTM) network for reservoir inflow forecasting. The LSTM model has feedback connections within its learning layers, which allow it to process complete input sequences. The LSTM model is established to fit the pattern of the inflow based on lag times.
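As a rough illustration of how an LSTM layer processes a sequence of lagged inflows, the sketch below runs a single LSTM cell forward in NumPy. The weights are random and untrained, and the names (`lstm_forward`, `hidden_size`, `w_out`) and the toy inflow values are assumptions made for illustration; they do not represent the network configuration used in this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(sequence, W, U, b):
    """Run one LSTM cell over a 1-D sequence and return the final hidden state.

    W: (4*H, 1) input weights, U: (4*H, H) recurrent weights, b: (4*H,) biases,
    stacked in the order [input gate, forget gate, candidate, output gate].
    """
    H = U.shape[1]
    h = np.zeros(H)  # hidden state (feedback connection)
    c = np.zeros(H)  # cell state (long-term memory)
    for x_t in sequence:
        gates = W @ np.array([x_t]) + U @ h + b
        i = sigmoid(gates[0:H])          # input gate
        f = sigmoid(gates[H:2 * H])      # forget gate
        g = np.tanh(gates[2 * H:3 * H])  # candidate cell update
        o = sigmoid(gates[3 * H:4 * H])  # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

# Hypothetical, untrained parameters for illustration only
rng = np.random.default_rng(0)
hidden_size = 4
W = rng.normal(scale=0.5, size=(4 * hidden_size, 1))
U = rng.normal(scale=0.5, size=(4 * hidden_size, hidden_size))
b = np.zeros(4 * hidden_size)
w_out = rng.normal(scale=0.5, size=hidden_size)  # linear read-out layer

lags = np.array([0.42, 0.55, 0.61])  # hypothetical scaled inflows, oldest first
h_final = lstm_forward(lags, W, U, b)
forecast = float(w_out @ h_final)    # one-step-ahead output (meaningless until trained)
```

In practice the weights would be fitted by backpropagation through time using a deep learning library; the sketch only shows how the gates carry information from earlier lags to the forecast step.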

Multilayer perceptron neural network (MLPNN)
The multilayer perceptron NN (MLPNN) is a feed-forward network with numerous layers; in this network, the output of one layer of neurons serves as the input to the next layer. Figure 2b depicts the MLPNN model. The input layer nodes in the MLPNN only forward the input values to the nodes of the first hidden layer. The input-output relationship of each node in the hidden layers can be expressed as follows:

$$ z = \sum_{j} w_j x_j + b \tag{1} $$

where x_j is the output of node j of the previous layer, w_j is the weight that connects node j and the current node, b is the bias value at the current node, and z is the weighted input aggregate. The neuron's output is f(z), where f is a sigmoid-like transfer function with nonlinear attributes.
The unit description of an MLPNN is an architecture that allows the computation of a nonlinear function using the scalar product of the weight and input vectors. The network architecture determines the efficiency of MLPNN models. It contains the hidden layer count, the neurons specific to each layer, as well as the form of computation employed by each neuron.
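The node computation described above can be sketched as a forward pass through one hidden layer. This is a minimal illustration that assumes the standard logistic function as the sigmoid-like transfer function and a linear output node; the function name `mlp_forward` and the example weights are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Standard logistic transfer function (assumed form)
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W_hidden, b_hidden, w_out, b_out):
    """One hidden layer: z = W x + b per node, output f(z), then a linear output node."""
    z = W_hidden @ x + b_hidden   # weighted input aggregate of each hidden node
    hidden = sigmoid(z)           # neuron outputs f(z)
    return float(w_out @ hidden + b_out)

# Hypothetical example: two lagged inputs, two hidden nodes
x = np.array([0.5, -1.0])
W_hidden = np.eye(2)
b_hidden = np.zeros(2)
w_out = np.array([1.0, 1.0])
y_hat = mlp_forward(x, W_hidden, b_hidden, w_out, 0.0)
```

The hidden layer count and the number of neurons per layer correspond to the shapes of `W_hidden` and `w_out`, which is where the architecture choices discussed above enter.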

Radial basis function neural network (RBFNN)
RBFNN is a function approximation variation of the standard ANN model that has a faster learning capacity (Cotar and Brilly 2008). The model structure has one input layer and one output layer, as well as a single hidden layer; it uses Gaussian functions as the basis and the least-square criterion as the objective function (Talukdar et al. 2020). In the hidden layer, the Gaussian functions give a significant response to the input only when the network input falls within a restricted region of the input space. The RBF, φ, is also known as the hidden-layer function, and the hidden space is formed by the set of these basis functions. Typically, the number of basis functions is less than the number of input data observations. For the one-dimensional problem, the Gaussian function is expressed as

$$ \varphi(x, \mu) = \exp\left( -\frac{\lVert x - \mu \rVert^{2}}{2 d^{2}} \right) $$

where μ is the center value of the Gaussian function and d is the radius ("distance") from the input value x to the center, which indicates the measure of the spread of the Gaussian curve. Because of this mathematical mechanism, the RBFNN model is sometimes known as a

For reference, the sigmoid transfer function f(z) of the MLPNN described earlier is

$$ f(z) = \frac{1}{1 + \exp(-z)} \tag{2} $$

The framework of the GA is reported in Fig. 2c. The optimal lags are selected simultaneously and determined based on the minimal error metric (i.e., root-mean-square error), which serves as the fitness function of the GA approach (Chang et al. 2019). It is worth highlighting that the GA approach works through three optimization processes: selection, crossover and mutation.
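The GA-based lag selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: each chromosome is a binary mask over candidate lags, the fitness is the validation RMSE, and a linear least-squares model stands in for the neural network to keep the example short. All function names, the GA settings (`pop_size`, `generations`, `p_mut`) and the synthetic series are assumptions.

```python
import numpy as np

def lag_matrix(series, max_lag):
    """Columns are lags 1..max_lag of the series; the target is the current value."""
    n = len(series)
    X = np.column_stack([series[max_lag - k:n - k] for k in range(1, max_lag + 1)])
    return X, series[max_lag:]

def rmse_for_mask(mask, X_tr, y_tr, X_va, y_va):
    """Fitness: validation RMSE of a least-squares model on the selected lags."""
    if not mask.any():
        return np.inf  # an empty predictor set is not a valid solution
    A_tr = np.column_stack([X_tr[:, mask], np.ones(len(X_tr))])
    A_va = np.column_stack([X_va[:, mask], np.ones(len(X_va))])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return float(np.sqrt(np.mean((A_va @ coef - y_va) ** 2)))

def ga_select_lags(X_tr, y_tr, X_va, y_va, pop_size=12, generations=15, p_mut=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_lags = X_tr.shape[1]
    pop = rng.random((pop_size, n_lags)) < 0.5  # random binary chromosomes
    history = []

    def evaluate(population):
        return np.array([rmse_for_mask(m, X_tr, y_tr, X_va, y_va) for m in population])

    def tournament(fitness):
        a, b = rng.integers(pop_size, size=2)
        return pop[a] if fitness[a] <= fitness[b] else pop[b]

    for _ in range(generations):
        fitness = evaluate(pop)
        history.append(float(fitness.min()))
        children = [pop[np.argmin(fitness)].copy()]  # elitism: keep the best mask
        while len(children) < pop_size:
            p1, p2 = tournament(fitness), tournament(fitness)  # selection
            cut = rng.integers(1, n_lags)                      # one-point crossover
            child = np.concatenate([p1[:cut], p2[cut:]])
            flip = rng.random(n_lags) < p_mut                  # bit-flip mutation
            children.append(np.logical_xor(child, flip))
        pop = np.array(children)

    fitness = evaluate(pop)
    history.append(float(fitness.min()))
    best = int(np.argmin(fitness))
    return pop[best], float(fitness[best]), history

# Synthetic inflow-like series (hypothetical) that depends on its first two lags
rng = np.random.default_rng(1)
q = np.zeros(300)
for t in range(2, 300):
    q[t] = 0.6 * q[t - 1] + 0.3 * q[t - 2] + rng.normal(scale=0.1)

X, y = lag_matrix(q, max_lag=5)
split = 200
mask, best_rmse, history = ga_select_lags(X[:split], y[:split], X[split:], y[split:])
selected_lags = [int(k) + 1 for k in np.flatnonzero(mask)]
```

Replacing `rmse_for_mask` with a function that trains and scores the DLNN, MLPNN or RBFNN on the masked lags would give the integrated GA-model scheme, at a much higher computational cost per fitness evaluation.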

Implementation results and analysis
The research focused on developing a new machine learning model for forecasting Qflow in two regions, one semi-arid (Iraq) and one tropical (Malaysia). The proposed model is based on a recent version of deep learning and was validated against two classical ANN algorithms. The modelling structure was based on univariate modelling, where only lagged previous records were used for the initial development of the learning algorithms and the correlated lags were used as predictors in the prediction matrix. It is worth highlighting that forecasting Qflow using only lag times is a distinctive modeling scheme in which the merit of the machine learning models lies in mimicking the complex relationship between the predictors and the predictand. As this research was conducted in different climatic zones, this section comprises two subsections presenting the modeling results of the developed deep learning model and its validation against the classical neural network algorithms. A third subsection focuses on the feasibility of integrating a feature selection algorithm prior to the forecasting process. Several metrics were calculated for the prediction evaluation: goodness-of-fit indicators [i.e., Nash-Sutcliffe efficiency (NSE), Willmott index (d)], absolute error indicators [i.e., root-mean-square error (RMSE), mean absolute error (MAE)], and the scatter index (SI), BIAS, and MBE; readers are advised to refer to the following literature for the mathematical expressions (Yaseen 2021).
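For orientation, several of the evaluation metrics named above can be computed as in the sketch below. It uses commonly cited definitions (for example, SI as RMSE divided by the mean observation, and MBE as the mean of forecast minus observation), which are assumptions here and may differ in detail from the expressions given in Yaseen (2021). The function name `forecast_metrics` and the sample values are hypothetical.

```python
import numpy as np

def forecast_metrics(obs, sim):
    """Return RMSE, MAE, NSE, Willmott d, MBE and SI for observed vs. forecasted inflow."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = sim - obs
    obs_mean = obs.mean()
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    # Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean-observation forecast
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs_mean) ** 2)
    # Willmott index of agreement
    d = 1.0 - np.sum(err ** 2) / np.sum((np.abs(sim - obs_mean) + np.abs(obs - obs_mean)) ** 2)
    return {
        "RMSE": float(rmse),
        "MAE": float(mae),
        "NSE": float(nse),
        "d": float(d),
        "MBE": float(err.mean()),
        "SI": float(rmse / obs_mean),
    }

# Hypothetical values: every forecast is one unit too high
metrics = forecast_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

RMSE, MAE and MBE carry the units of the inflow series, whereas NSE, d and SI are dimensionless, which is why the latter are easier to compare between the two case studies.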

Semi-arid case study
It can be noted that, for all models in the semi-arid case, the second lag input combination provided the best forecasting results. Although the correlation was determined for five lags using the auto-correlation statistics, this agreement indicates that the applied ML models follow a homogeneous mechanism in extracting the essential information from the historical time series. Figure 3a, b and c shows the deviation from the identity line in the form of scatter plots for the applied ML models (i.e., DLNN, MLPNN and RBFNN) and for the five input combinations configured initially. The maximum determination coefficient was obtained with the second input combination for the DLNN model (R² = 0.90), whereas the comparable models attained R² = 0.85 (MLPNN) and R² = 0.87 (RBFNN). It can be observed from Fig. 3 that the models in general performed well. In particular, the DLNN attained close agreement over the whole range of the data, from the minimum to the maximum Qflow.
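The auto-correlation screening of lags and the construction of the input combinations mentioned above can be sketched as follows. The sketch assumes that input combination m uses lags 1 to m as predictors, which is an interpretation of the text rather than a stated definition; the function names and the toy series are hypothetical.

```python
import numpy as np

def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given positive lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    return float(np.sum(x[lag:] * x[:-lag]) / np.sum(x * x))

def input_combination(series, n_lags):
    """Predictor matrix with lags 1..n_lags and the one-step-ahead target."""
    x = np.asarray(series, dtype=float)
    X = np.column_stack([x[n_lags - k:len(x) - k] for k in range(1, n_lags + 1)])
    y = x[n_lags:]
    return X, y

# Hypothetical inflow values
inflow = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
acf = [autocorrelation(inflow, k) for k in range(1, 4)]
X2, y2 = input_combination(inflow, n_lags=2)  # second input combination
```

Each row of `X2` holds the inflow at t-1 and t-2, and the matching entry of `y2` is the inflow at t, so the same routine yields the five candidate prediction matrices by varying `n_lags` from 1 to 5.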

Tropical case study
The statistical results over the testing phase for the tropical region case study are reported in Table 2. The developed DLNN model attained the best prediction results (RMSE = 4.69, MAE = 2.89, d = 0.94, NSE = 0.90), whereas MLPNN attained (RMSE = 5.66, MAE = 3.67, d = 0.92, NSE = 0.85) and RBFNN attained (RMSE = 5.16, MAE = 3.08, d = 0.93, NSE = 0.88). For the DLNN and RBFNN models, the best results were achieved using the second lag combination, which incorporates two months of previous inflow to forecast the one-step-ahead inflow. On the other hand, the MLPNN performed best when three lags were included in the forecasting process. The superiority of the DLNN demonstrates the enhancement in prediction performance. It also illustrates the merit of the DLNN in better capturing the complicated relationship through deep learning processes executed over multiple layers, compared with the classical ML algorithms introduced in the literature. The graphical results confirmed the predictability performance displayed in Table 2. Based on the scatter plots in Fig. 4, the best correlation was observed for the DLNN using the second lag, with a maximum determination coefficient of R² = 0.89. In comparison, the benchmark models attained maximum determination coefficients of R² = 0.78 (MLPNN) and R² = 0.82 (RBFNN).

Integrative predictive model results
Modeling Qflow with a univariate scheme, where only historical inflow data are used for the learning process, is a complex hydrological problem. Hence, reducing the dimension of the prediction matrix through the integration of feature selection can contribute substantially to providing a reliable and robust predictive model. The results of the hypothesized integration of GA as a selection algorithm are presented in this subsection.
For the semi-arid case, the best selected lags varied between the applied ML models. The best results were obtained with GA feature selection using the second and third lags (i.e., GA-DLNN-2: RMSE = 23.49, MAE = 15.55, d = 0.98, MAE = 24.24, d = 0.96, NSE = 0.95). However, lower prediction accuracy was obtained using the first two lags. The tropical case study revealed similar results with respect to the optimal lags: GA-MLPNN and GA-RBFNN gave their best results with the first two lags, while GA-DLNN gave its best results with the second and third lags. In addition to the quantitative results, graphical results based on scatter plots for the two cases are shown in Figs. 5 and 6. The maximum determination coefficient (GA-DLNN: R² = 0.96) was attained for both case studies. The relative error percentage and the actual/forecasted time series were calculated and are presented in Figs. 7 and 8 for the semi-arid and tropical cases, respectively. It can be observed that the relative error percentage ranged within ±30% for the semi-arid case study, while the tropical case study achieved an even lower relative error percentage, within ±20%. The final graphical presentation of the model results is the Taylor diagram (Taylor 2001). Figures 9 and 10 present the two-dimensional Taylor diagrams for the integrative ML models for both cases. Clearly, the GA-DLNN model showed the coordinate nearest to the observed record of Qflow.
The results for both cases confirmed crucial findings: (i) the GA feature selection algorithm is feasible for reducing the dimension of the predictors and providing a more reliable prediction matrix for the learning process, (ii) the adopted integrative GA-DLNN model confirmed its capability in modeling Qflow at different time scales, namely the daily scale (semi-arid region) and the monthly scale (tropical region), and (iii) the GA-DLNN model could capture the actual mechanism that interconnects the predictors and the predictand in a more robust manner for the stochasticity of both regions (Tables 3, 4).
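The quantities behind the relative error plots and the Taylor diagram can be computed as in the sketch below. It assumes the relative error is defined as 100 × (forecast − observed) / observed, and uses the standard Taylor (2001) relation between the standard deviations σ_o and σ_f, the correlation R and the centered RMS difference E′, namely E′² = σ_f² + σ_o² − 2 σ_f σ_o R. The function names and sample values are hypothetical.

```python
import numpy as np

def relative_error_percent(obs, sim):
    """Relative error of each forecast in percent (sign convention assumed)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 100.0 * (sim - obs) / obs

def taylor_statistics(obs, sim):
    """Standard deviations, correlation and centered RMS difference for a Taylor diagram."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    std_obs = obs.std()   # population standard deviation (ddof=0)
    std_sim = sim.std()
    corr = np.corrcoef(obs, sim)[0, 1]
    centered_rms = np.sqrt(np.mean(((sim - sim.mean()) - (obs - obs.mean())) ** 2))
    return float(std_obs), float(std_sim), float(corr), float(centered_rms)

# Hypothetical observed and forecasted inflows
obs = np.array([10.0, 20.0, 30.0, 40.0])
sim = np.array([12.0, 18.0, 33.0, 39.0])
rel_err = relative_error_percent(obs, sim)
std_obs, std_sim, corr, centered_rms = taylor_statistics(obs, sim)
```

On the diagram, a model plots closer to the observation point when its standard deviation is near `std_obs` and its correlation is near 1, which together imply a small centered RMS difference.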

Discussion, limitation and future research
The current research is similar to several related studies on reservoir inflow forecasting. However, the main contribution worth highlighting here is the potential of introducing a new version of machine learning for better prediction accuracy, together with the capability to integrate an automated input selection algorithm for selecting the relevant predictors. The findings fully support the research hypothesis stated in the first section of this article. One limitation of the current research is that a larger data span could be incorporated in the model's construction. In addition, the whole process is an offline modeling scheme for reservoir inflow forecasting. Therefore, for better practicality, active learning could be adopted for this kind of simulation in future studies.

Conclusion
The main objective of the current study was to develop a new robust and reliable predictive model to forecast reservoir inflow in two different climatic regions (semi-arid and tropical). The deep learning predictive model was developed and validated against two algorithms of the ANN model (MLPNN and RBFNN). In addition, integrated versions of the established ML models were tested, in which the genetic algorithm was incorporated as an approach to selecting reliable input variables.
The prediction accuracy results indicated the potential of the DLNN model over the benchmark models for both investigated case studies. It was also observed that the DLNN model forecasted daily-scale reservoir inflow more accurately than monthly-scale inflow. Further, the findings confirmed the adequacy of the data span used in the current investigation for developing computerized ML algorithms. The GA approach reduced the dimension of the prediction matrix and provided the learning process of the ML models with more informative historical time series data.

Authors contribution
The first draft of the manuscript was written by the first author. The manuscript version has been edited and revised by fourth and fifth authors. Supervising and reviewing have been made by the second, third and sixth authors.

Funding
The research has no funding.

Conflict of interest
The authors declare no conflict of interest.

Ethical approval
We acknowledge that the current research has been conducted ethically and the final shape of the research has been agreed by all authors.

Consent to participate
The authors consent to participate in this research study.

Consent to publish
The authors consent to publish the current research in AWSC journal.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.