1 Introduction

Reservoir evaporation is natural process for the water cycle and hydrology system in which reflects the exchange between water and air; it is also interesting for the heat transfer and energy production around the world (Katsaros 2001). About 61% of the global precipitation around the world is lost by evaporation (Vishwakarma et al. 2022). The estimation of evaporation volume is essential for the future water resources management and planning, crops modelling, and irrigation scheduling in high water stress regions (Kişi, 2006; Kushwaha et al. 2022). The methods for evaporation estimation include the water budget, evaporation pans, bulk aerodynamic, and Penman’s equation. Recently, these methods also include the satellite sensor technology and remote sensing, and the regular hybrid and integrative data-driven models (Kişi 2006; Singh et al. 2021).

The climatic variables are used to estimate evaporation values by numbers of researchers (Duan and Bastiaanssen 2017). These variables are playing important role that affects the evaporation processes including the meteorological parameters of temperature, wind speed, relative humidity, vapor pressure and others (Ghorbani et al. 2021). The metaheuristic algorithms were incorporated for the modeling of hydrological system as a step for the developments in data-driven (Adnan et al. 2020; Kushwaha et al. 2021).

Kişi (2006) showed that the adaptive neuro-fuzzy inference systems (ANFIS) technique is good for the evaporation modeling using the climatic data compare with the the artificial neural networks (ANN) and Stephens–Stewart (SS). Kim and Kim (2008) developed a modeling for the pan evaporation and evapotranspiration using the ANN and genetic algorithm models; the study showed that the two methods are good tools to estimate the hydrological systems.

Tabari et al. (2009) showed that the ANN method is the best compare with the multivariate non-linear regression (MNLR) method for evaporation estimation in Iran. Kumar et al. (2012) applied the two techniques of ANN and ANFIS technique for prediction theevaporation in Pantnagar, India. The results showed that the ANFIS technique is good performance than ANN method. Deo et al. (2016) applied three models including the relevance vector machine (RVM), extreme learning machine (ELM), and multivariate adaptive regression splines (MARS) techniques in Australia. The study presented that the RVM method is more accurate than the other methods for forecasting of monthly evaporation. Pammar and Deka (2017) showed that the two methods of discrete wavelet transform (DWT) and support vector regression (SVR) are good for estimation of the pan evaporation in the humid areas (Bajpe region) than semi-arid areas (Bangalore region) in India. Al-Mukhtar (2021) presented that the SVM is a better performance for estimating the evaporation time series in New Delhi and Ludhiana, India than random tree (RT), reduced error pruning tree (REPTree) and random subspace (RSS).

The water evaporation from Aswan high dam reservoir (AHDR), Egypt is about 20% to 30% of the country's water supplies from the Nile River, which translates to 12 × 109 m3 year−1 to 16 × 109 m3 year−1 (Abdel Wahab et al. 2018). Hassan et al. (2007) reduced the evaporation losses in AHDR using a pontoon framework and circular foam sheets, saving, about 1 million cubic meters (MCM) per year by covering area of 0.50 km2. Ebaid and Ismail (2010) also reduced the evaporation in AHRD by disconnecting the secondary channels of El khors, saving water about 2.40 billion cubic meters (BCM) per year. The evaporation rates were found to be 2.73 mm day−1 and 9.58 mm day−1 at the middle and the edge the lake respectively.

Elba et al. (2017) increased the water depths in the AHDR by lowering the bed levels of the lake through removing sediments, saving water about 6.5% of the forecasting evaporation losses in year 2100.

The use of floating photovoltaic systems (FPVS) with solar cells for sustainable renewable energy that saves surface water and land resources (Trapani and Millar 2013). FPVS systems have advantages compared with the land solar panels by the lower sunlight obstacles, efficient power production, and improvement of the aquatic environment and water quality (Sahu, et al. 2016).

The Floating photovoltaic (FPV) cells are better than conventional ground-mounted photovoltaic (GPV) cells, especially in highly populated regions where the open area is limited and competed (Zhang et al. 2020). Current and future prediction indicate that FPV is expected to double. In 2020, the global breakdown of installed FPV in China was the majority at 75.04% while the Japan, Korea, United Kingdom and others countries had15.98%, 6.01%, 0.99% and 1.98% respectively. The FPV cells cost are higher than GPV with range 4% to 8% (Cazzaniga and Rosa-Clot 2021; Almeida et al. 2022).

Furthermore, the old-fashioned crystalline silicon solar cells are brittle, heavy and hard compare with the semitransparent polymer solar cells (ST-PSCs). FPV is gaining the world attention for green energy revolution and the use of land. ST-PSCs are smart with high absorption, thin layers; provide controlled shading with green electricity and tunable absorption spectra. This system is good for power generation, water evaporation, and algal growth compared to the old-fashioned crystalline silicon cells. The results of the study showed that the ST—PSCs achieved a maximum efficiency of 13% and an average visible transmittance of over 20% (Zhang et al. 2020) (see Fig. 1a). Abd-Elaty et al. (2021) studied the impact of covering irrigation canals for water management in the Nile Delta, Egypt. The study showed that the efficiency of the canals increased by 75.33% and the efficiency of aquifer recharge increased by 20% when using floating PV plus canal lining.

Fig. 1
figure 1

a A schematic diagram semitransparent polymer solar cells (After Yin et al. 2021) and b Location map of study area

This research aims to investigate and predict the evaporation in Nasser Lake, Egypt using Machine Learning Algorithms (MLA). The objective of this study is also to reduce water losses in the reservoir and increase water supplies by using FPVS to address climate change and increased temperatures. This system will increase the power production for use in agriculture activities to optimize the value of water industry and the unit value of fresh water. This study is important for decision makers in Egypt and around the world for planning and development, increasing water resources and power generation for use in agriculture production.

2 Materials and Methods

2.1 Study Area and Climate Data

Reservoirs of dams play an important role saving the human lives and the Communities development by managing the flooding and the fresh water resources (Asmal et al. 2000). The climate change will increase the temperature and the evaporation from water bodies and reservoirs and reduce Egypt’s water share from the Nile which estimated to be 20 BCM per year (LNFDC 2008). Figure 1b presents the location of Lake Nasser locates between 23˚ 58` and 20˚ 27` N and 30˚ 70` and 33˚ 150` E, the total lake length is 500 km in the Egypt border by 350 km and 150 km in the Sudanese border. The High Aswan Dam (HAD) constructed for developing the water resources management in Egypt along the year and protection the infrastructures and human lives from flood; this dam formed the reservoir of HAD (RHAD) in the upstream.

The climate data from 1980 to 2021 were developed for the location of the study area in HAD. Figure 2 is presented these values including the minimum, average and maximum temperature with ranged 1.57o C to 34.85o C, also the humidity is ranged from 34.50% to 49.62%, the minimum, average and maximum wind speed 0.06 m s−1 to 10.41 m s−1 while the saturated and air pressure was ranged from 0.66 Kpa to 2.49 Kpa.The average annual evaporation from the lake of ranges from 4.65 mm day−1 to 7.95 mm day−1.

Fig. 2
figure 2

Climate data for the study area

2.2 Empirical Methods for Evaporation and Water Saving

The evaporation losses were estimated in the current study using the bulk aerodynamic method. This method is used the Harbeck equation based on the climate data for the evaporation estimation from massive lakes and reservoirs. The monthly evaporation was estimated by Rosenberry et al. (2007), the saturated vapor pressure and monthly evaporation volume by Junzeng et al. (2012) while the water saving using the solar cells by Sahu et al. (2016) as the following Eqs. (1), (2), (3) and (4).

The AHDR evaporation was estimated based on the climate data from 1980 to 2021 using Lake Nasser coefficient, saturated vapor pressure at water surface temperature and the actual vapor pressure of the air; the following equation is used to calculate the evaporation (Rosenberry et al. 2007):

$$\mathrm{E} = \mathrm{N}\,{\mathrm{U}}_{2}(\mathrm{es}-\mathrm{ea})$$
(1)

The saturated vapor pressure at water surface temperature is estimated using the temperature while the actual vapor pressure of the air is estimated using the relative humidity (Fig. 2) as the following equation (Junzeng et al. 2012);

$$es=0.611exp\;(\frac{17.27T}{T+237.30})\;and\;ea=(\frac{RH}{100})es$$
(2)

The evaporation volume in the study area estimated using the evaporation rates, the surface area of lake by 5775 km2 and numbers of day in month as the following equation (Junzeng et al. 2012):

$$\mathrm{Vlosses}=(\mathrm{E}*\mathrm{Asurface}*\mathrm{n})*10\,\hat{}\,3$$
(3)

The volume of water saving for AHDR is estimated using the evaporation rates, the solar cell area and the number of day in month as the following equation (Sahu et al. 2016):

$$Vsaving=(E*Apannel*n*0.70)*10\,\hat{}3\,$$
(4)

where: E: evaporation losses (mm day−1); N: Lake Nasser coefficient by 0.0525 (Hassan et al. 2017); U2: wind speed (m sec−1); es: saturated vapor pressure (kpa) at water surface temperature; ea: actual vapor pressure of the air (kpa); RH: relative humidity; T: temperature (o C); Vlosses: water losses volume (m3 month−1); E: evaporation rate (mm day−1); Asurface: Lake Nasser surface area (km2); n: numbers of day in month (day), Vsaving: water savings; Apanel: solar cell area.

2.3 Solar Energy Estimation Using Covering the Lake

The saving of water from evaporation losses is carried out in the current study using solar panels. These cells will covering the lake at shallow depths where the evaporation from smaller depths is high compare with deeper depths. The available solar power is ranged between 200 and 250 MW km−2 for Aswan city., These values of solar power data were obtained from the Solar Atlas of Egypt (Choi 2014). The estimation of energy potential in the study area using this equation (Kosmopoulos et al. 2013):

$$\mathrm{SEP}= \mathrm{A} * \mathrm{Ed} *\mathrm{N} *\mathrm{H}$$
(5)

where SEP: is the Solar energy production (MW month −1); A: is the surface area subjected to sun (km2); Ed: is the energy density (MW Km−2); N: is the number of sun hourly in day (hr), and H is the number of days in month.

2.4 Selection of Best Input Combination for Model Development

The optimal selection and combination of the input parameters for the climate and meteorological is the key for the best performance of the hydrological models Also, the best combination of these parameters has been selected using Relief algorithm (Kira and Rendell 1992). Table 1 presents the ranks of the es, Taver, Uaver, Umax and RH, variables for predicting the evaporation.

Table 1 Selected attribute and their ranks

2.5 Machine Learning Algorithms

2.5.1 Additive Regression AR

This algorithm is a non-parametric regression technique and developed by Friedman and Stuetzle (1981). It used one smoother function and overcomes the problem of curse dimensionality. The technique using following form:

$$E\left[{y}_{i}|{x}_{i1},\dots , {x}_{ip}\right]={\beta }_{o}+{\sum}_{j=1}^{p}{f}_{i}({x}_{ij})$$
(6)

where \(\sum_{j=1}^{p}{f}_{i}({x}_{ij})\): the smooth functions fitted from the data, \({\beta }_{o}\) is regression coefficient.

2.5.2 Random Subspace (RSS)

RSS is ensemble methods of machine learning which it is used for decision trees. It constructs of the decision trees according to the parallel learning algorithm (MA) (Ho 1998). The advantage of this method is being sensibly with the high-dimensional problems in which the features numbers is much larger than the training points numbers (Arabameri et al. 2021).

The most voting for the generated trees is used in the subspace ensemble system for the sample of X, and is estimated by the following form (Skurichina and Duin 2002):

$$\beta \left(x\right)={argmax}_{y\epsilon (-\mathrm{1,1})}{\sum}_{b}{\delta }_{sgn\left({C}^{b}\left(x\right)\right),y}$$
(7)

where \(\delta\) is the Kronecker symbol, and y ∈ {-1, 1} is a decision (class label) of the classifier, \({C}^{b}\left(x\right)\) is the classifier of each subspace b (Skurichina and Duin 2002).

2.5.3 M5 Pruned (M5P)

The tree model is developed by Quinlan (1992) and it used for solving the regression problems (Kisi et al. 2017). This technique is developed based on the two steps, the first step, the standard deviation reduction (SDR) is estimated using the following form (Malik et al. 2020; Al-Mukhtar 2021).

$$\mathrm{SDR}=sd\left(T\right)-{\sum }_{i}\frac{\left|{T}_{i}\right|}{\left|T\right|}\times sd({T}_{i})$$
(8)

The second step is constructing a multivariate linear regression model for each node at the model subtree (Malik et al. 2020). The optimal model selection according to the minimize value of error for the linear or subtree model (Arabameri et al. 2021). The accuracy forecasting of M5 model is adjusted using this equation:

$$PV\left(S\right)=\frac{{n}_{i}\times PV({S}_{i})+k\times M(S)}{{n}_{i}+k}$$
(9)

where T is a set of cases in the data that reach a node, \(sd\) is the standard deviation, \({T}_{i}\) is the subset of cases that have the ith outcomes of the potential test, \(PV({S}_{i})\) is the predicted value at \({S}_{i}\), \({n}_{i}\) number of training cases, \({S}_{i}\) is the case follow a branch of subtree \(S\), \(k\) is the smoothing factor, \(M(S)\) is the value given by the model at \(S\).

2.5.4 Reduced Error Pruning Tree (REPTree)

The REPTree is adjusted as simple and computationally speed (Quinlan 1987) and based on the principle of information gain and minimizing the variance error (Chen et al. 2019). It was constructing to compare with other pruning methods (Ganatra and Bhensdadia 2012). Also, the data is split according to the information gain at each node and then the subtrees are pruned by the reduced error (Quinlan 1987). The subtree would be replaced by a leaf when the new induced tree has equal or less error than before. The process is repeated over until any further replacements increases the variance error of the test data.

2.5.5 Support Vector Machine (SVM)

Cortes and Vapnik (1995) established the supervised soft computing kernel-based SVM model that is capable of reducing complexities alongside errors in the estimation. The classifier models of SVMs are applied to problems of data classification under different classes. Another group of the SVM is the SVR, which is used in regression prediction problems. Here, the kernel function (input vector \(x\)) implicitly transforms the lower-dimensional inputs to a higher-dimensional feature [\(\beta (x)\)] such that \(w\) is the weighting vector and \(b\) is a bias. These two parameters are estimated using regularized risk function [\(R\left(P\right)\)], as shown in the following equations:

$$f\left(x\right)=w\beta \left(x\right)+b; \quad R\left(P\right)=P\frac{1}{n}{\sum }_{i=1}^{n}L({d}_{i},{y}_{i})+\frac{1}{2}{|\left|w\right||}^{2}$$
(10)
$${L}_{\varepsilon }\left(d,y\right)=\left\{\begin{array}{c}\left|d-y\right|-\varepsilon \left|d-y\right|\ge \varepsilon \\ 0, \quad otherwise\end{array}\right.$$
(11)

where; \(P\) is a penalty parameter, \(\frac{1}{2}{|\left|w\right||}^{2}\) is a regularization term, \({d}_{i}\) is the desired value, \(P\frac{1}{n}{\sum }_{i=1}^{n}L({d}_{i},{y}_{i})\) is the error term, and \(\upepsilon\) is the tube size of SVM in \({L}_{\varepsilon }\) (Table 2).

Table 2 Machine learning algorithm parameters used for modeling evaporation

2.6 Hybridization of Meta-Heuristics Algorithms Using Stacked Generalization

According to purpose of forecasting monthly evaporation, the current experiment used stacking hybrid algorithms. Wolpert (1992) developed the stacking hybrid algorithm proposal. This technique offers a framework for ensemble algorithms, which mix two or more algorithms over the course of training. Studies have shown that stacking hybrid algorithms can improve the predictability of the algorithms (Kushwaha et al. 2022). Moreover, the idea of the stacking hybrid generalization is applying the first-level learners for the train and forecast the training data sets. The new training dataset is creating for the meta learner based on the predicted outcomes from the first-level learners and then it were combined. (Sikora et al. 2015) and (Zhou 2009) presented the complete details on layered hybrid generalization.

2.7 Statistical Performance Assessment of Developed Hybrid Models

The current study was evaluated based on the performance machine learning model For numbers of statistical indices including the mean square error (MSE), root mean square error (RMSE), relative root square error (RRSE), mean absolute error (MAE), relative absolute error (RAE), Nash–Sutcliffe Efficiency (NSE) (Kushwaha et al. 2021), Willmott Index (WI), and determination coefficient of (r2) ( Kushwaha et al. 2022). MSE calculates the proximity of a fitted line to the data points. The root mean square deviation of time series anticipated values from observed values is represented by RMSE statistics. When the error is being reduced in the same dimensions as the quantity being predicted, the relative squared error (RRSE) is measured as its square root. The mean absolute deviation of time series anticipated values from the observed values is represented by MAE statistics, as shown. While the RAE statistic shows the proportion of the measurement's absolute error to the real measurement, it also aids in estimating the absolute error's size relative to the measurement's actual size. The most often employed metric for measuring model performance is Nash–Sutcliffe efficiency. It goes from 1 for the ideal fit value of 0 means that the accuracy was the same as the mean value. The index of agreement is another name for the Willmott index (WI). The WI varies from zero to one (0 WI 1); an optimum agreement/fit of about one is possible. In contrast, r2 is a measurement of the linear relationship between the dependent and independent variables.

For evaporation modelling, the models that have been shown to have higher values of r2 (closer to 1) and RRSE and lower values of MSE, RMSE, MAE, and RAE are deemed to be comparable better models. O and P stand for observed and predicted or simulated values for the dataset in the following equations; OAvg and PAvg stand for the average or mean magnitude of observed and predicted or simulated values; and N is for the number of observations.

$$MSE=\frac{1}{N}{\sum }_{i=1}^{N}{({O}_{i}-{P}_{i})}^{2}; \quad RMSE=\sqrt{\frac{1}{N}{\sum }_{i=1}^{N}{({O}_{i}-{P}_{i})}^{2} }$$
(12)
$$RRSE=\sqrt{\frac{{\sum }_{i=1}^{N}{({O}_{i}-{P}_{i})}^{2}}{{\sum }_{i=1}^{N}{({O}_{i}-{O}_{Avg})}^{2}} };\quad MAE=\frac{1}{N}{\sum }_{i=1}^{N}|{O}_{i}-{P}_{i}|$$
(13)
$$RAE= \frac{{\sum }_{i=1}^{N}|{O}_{i}-{P}_{i}|}{{\sum }_{i=1}^{N}|{O}_{i}-{O}_{Avg}|};\quad {r}^{2}=1-\frac{{\sum }_{i=1}^{N}{({O}_{i}-{P}_{i})}^{2}}{{\sum }_{i=1}^{N}{({O}_{i}-{O}_{Avg})}^{2}}$$
(14)
$$NSE=1-\left[\frac{{\sum }_{i=1}^{N}{{(O}_{i}-{P}_{i})}^{2}}{{\sum }_{i=1}^{N}{{(O}_{i}-{O}_{Avg})}^{2}}\right];\quad WI=1-\frac{{\sum }_{i=1}^{N}({{O}_{i}-{P}_{i})}^{2}}{{\sum }_{i=1}^{N}{\left(\left|{P}_{i}-PAvg \right|+\left|{O}_{i}-{O}_{Avg}\right|\right)}^{2}}$$
(15)

2.8 Study Scenarios

This study aims to estimate evaporation from Nasser Lake using machine learning algorithms based on climate and meteorological data from 1980 to 2021. The method predicts future monthly evaporation values using this data. Using these predictions, we estimated water losses and developed a plan for water saving and energy production by covering shallow water depths in the lake.

3 Application of Machine Learning Models

The goal of this study is to estimate evaporation losses and assess the predictability of hybrid machine learning algorithms/models. In this section, we present the results of modelling evaporation losses at our study site using data-driven machine learning algorithms. We evaluated a total of five models, including AR, AR-M5P, AR-SVM, AR-RSS, and AR-REPTree, for predicting evaporation. These models were trained on a monthly time scale from 1981 to 2009 and tested from 2010 to 2021. We compared the models using various statistical criteria, both numerical and graphical.

3.1 Training of Models (1981–2009)

The training of all ML models was performed with 348 data points (accounting 70% of dataset). The descriptive statistic of observed evaporation and estimated from five ML models (AR, AR-M5P, AR-SVM, AR-RSS, and AR-REPTree) is presented in the form of Violin box plot (Fig. 3a). The Figure depicts the minimum, maximum, median along with 1/4th and 3/4th percentile quartiles, and the distribution of dataset (observed and modeled). The statistical results for testing period related to correlation coefficient, mean absolute error, root mean squared error, relative absolute error (%), and root relative squared error (%), Nash–Sutcliffe efficiency (NSE), and Willmott Index (WI) are collected in the Table 3.

Fig. 3
figure 3

Violin box plot of predicted and observed monthly evaporation values

Table 3 Statistical results of different ML algorithms for monthly Evaporation during training period (1981–2009) and testing period (2010–2021)

As indicated by the descriptive statistics the estimated evaporation through, AR and AR-M5P model, has the least (0.970) and highest (0.982) correlation with that of observed values respectively. Whereas the RMSE of 0.629 mm (AR-M5P) and 0.789 mm (AR) was observed to be minimum and maximum respectively among all five implemented models. The range of relative absolute error (%) was found to be 16.183–21.518 percent during calibration of the models. Higher NSE and WI statistics for AR-M5P model among all algorithms confirms that it is being better performer during training period. Similarly, statistics depicts that AR algorithm is being least performer based on the lowest NSE (0.941) and WI (0.985). From Table 3 it is clear that the hybrid AR-M5P algorithm confirms optimal values against all descriptive statistics. Moreover, based on statistics, it was observed that during calibration all four hybrid ML algorithms (AR-M5P, AR-SVM, AR-RSS, and AR-REPTree) performs superior than AR model alone. Depicts the monthly time series during training period of observed vs predicted evaporation and respective scatter plots was done. The time series plots provide the capabilities pf models to estimate the evaporation. The scatter plots with liner regression results the highest R-square value of 0.965 for the AR-M5P followed by AR-SVM, AR-RSS, AR-REPTree and AR model with that of 0.962, 0.957, 0.951, and 0.941 respectively. These results indicates that the AR-M5P model provides the best result close to best fit line among all five ML models for E estimation during training period. The interesting fact that was observed from rhis step that the AR model is predicting the overestimated and underestimated values for low and high evaporation values respectively, leading to the model to show weaker prediction power then hybrid models during training period. Further comparative analysis of models was done using the Taylor diagram (Fig. 5a). In Taylor diagram on the basis of correlation and standard deviation hybrid AR-M5P model and AR model are found to be at closest and farthest respectively to the observed/referenced point. This indicated hybrid AR-M5P model has performed best among all five ML models to simulate evaporation during training of the models.

3.2 Testing of Models (2010–2021)

Validation or testing of the trained models is necessary to avoid overfitting of the models. The testing of all ML models was performed with 144 data points (accounting 30% of dataset). For testing period, descriptive statistic of observed evaporation and estimated from five ML models (AR, AR-M5P, AR-SVM, AR-RSS, and AR-REPTree) are presented in the form of Violin box plot (Fig. 3b). The figure depicts the minimum, maximum, median along with 1/4th and 3/4th percentile quartiles, and the distribution of dataset (observed and modeled).

Table 3 presents the results of all ML models tested, along with descriptive statistics for the testing period. The highest correlation between observed and predicted evaporation values was found for the AR-M5P model (0.978), followed by the AR-SVM model (0.977), the AR model (0.968), the AR-RSS model (0.967), and the AR-REPTree model (0.961). The results consistently showed that the AR-M5P ML model had the lowest errors (MAE, RMSE, RAE, and RRSE), along with the highest NSE (0.954) and WI (0.989). In contrast, the AR-REPTree model had the highest errors and the lowest NSE and WI (Table 3).

Scatter plots with regression lines were generated for all models to compare observed and predicted evaporation values. The AR-M5P model had the highest coefficient of determination (R2) and closely followed the best fit line (1:1). The hybrid AR-REPTree model had the lowest R2 of 0.923 for the testing period. Time series plots in Fig. 4 depicted the fluctuation of estimated and observed evaporation values for the testing period. The AR-M5P model estimated the variation of evaporation reasonably well, followed by the AR-SVM, AR, AR-RRS, and AR-REPTree models, indicating that the AR-M5P model is superior among the five ML models.

Fig. 4
figure 4

Scatter plots between observed and predicted monthly evaporation during testing period

The model testing was further analysed using a Taylor diagram (Fig. 5b) to carry out a comparative evaluation of the models. The hybrid AR-M5P model was found to be the closest to the observed/referenced location based on correlation and standard deviation, while the AR-REPTree model was furthest. This indicates that the hybrid AR-M5P model performed the best among all five ML models for simulating evaporation during testing.

Fig. 5
figure 5

Taylor diagram representation of model performance

3.3 Estimation of Future Monthly Evaporation

The calibrated model was used to forecast future lake evaporation for the years 2030, 2050, and 2070. The results showed that the maximum evaporation occurred in August and reached 12.33, 13.45, and 14.56 mm/month, with a volume of 2.21, 2.41, and 2.61 BCM/month, respectively. Conversely, the minimum evaporation values occurred in December at 2.58, 2.48, and 2.38 mm/month, with corresponding volumes of 0.46, 0.44, and 0.43 BCM/month (Fig. 6). The results also showed that months with higher temperatures led to an increase in future evaporation, while those with lower temperatures led to a decrease in predicted evaporation because of the impact of climate change. The total annual loss for future lake evaporation for the years 2030, 2050, and 2070 was estimated to be 16.52, 17.54, and 18.56 BCM/year, respectively.

Fig. 6
figure 6

Predicted Evaporation and water losses from Nasser Lake

3.4 Estimation of Future Water Saving and Energy Production

Figure 7 illustrates the predicted water savings and energy production from the lake by covering shallow water depths from contour level 0 to 10 m from the surface water level in an area of 1732 km2, which represents about 30% of the total lake area. The maximum water saving reached 0.46, 0.505, and 0.548 million cubic meters (MCM) per month for the years 2030, 2050, and 2070, respectively, in August. The minimum water savings occurred in December, with values of 0.097, 0.093, and 0.089 BCM per month for the years 2030, 2050, and 2070, respectively. The expected total annual water savings were estimated to be 3.47, 3.684, and 3.898 BCM per year for the years 2030, 2050, and 2070, respectively.

Fig. 7
figure 7

Predicted water saving and energy production from lake Nasser

Regarding power production, the maximum predicted power output was estimated to be 0.134 × 109, 0.151 × 109, and 0.167 × 109 megawatts (MW) per month for the years 2030, 2050, and 2070, respectively, in July. The minimum power output was estimated to be 0.077 × 10^9, 0.087 × 109, and 0.097 × 109 MW per month for the years 2030, 2050, and 2070, respectively, in February. The total annual power production was estimated to be 1.266 × 109, 1.425 × 109, and 1.583 × 109 MW per year for the years 2030, 2050, and 2070, respectively (refer to Table 4).

Table 4 Nasser Lake for predicted evaporation, water losses and saving and energy production

4 Qualification the Performance of Machine Learning Models for AHDR

In this study hybrid ML models were comparatively evaluated to estimate monthly evaporation. The climate Data from mass water body of Lake Nasser in Egypt and Sudan were employed for the sake of models evaluations using several statistical metrics (MAE, RMSE, NSE, WI and r). Then best combination has been selected using Relief algorithm. As such, the optimal input combinations were es, Taver, Uaver, Umax and RH, indicating that all these variables affect pan evaporation. The results showed that derived evaporation had good correlation with observed evaporation along with low values of associated errors which showed that hybrid ML models can predict monthly evaporation reasonably well. Based on RMSE and correlation coefficient M5P model found to be satisfactory performing for prediction of stream flow (Onyari and Ilunga 2013). Moreover, M5P outperformed other hybrid ML models for flow prediction and also for daily water level prediction (Khosravi et al. 2021). Similar to such studies the present study, in particular, highlights the capability of hybrid AR-M5P model to predict monthly evaporation losses. The AR model was found to have the least accuracy during training period; however, it has outperformed two hybrid models (AR-RSS and AR-REPTree) during testing period. The highest R2 and closeness of linear regression line (fitted between predicted evaporation from AR-M5P model and observed) with 1:1 line (best fit line) reinforces the capability of AR-M5P hybrid ML model to predict evaporation. Figure 5 show the clear comparative picture of the hybrid ML models. The nearer position of AR-M5P model then other ML models to the reference location shows its good performance and best prediction capability among all. Overall, from Fig. 4 it has been observed that the hybrid models are capable to estimate the monthly evaporation with reasonable accuracy (Kushwaha et al. 2021). Moreover, these results, in line with past studies Malik et al. (2020), and Kushwaha et al. (2022) and highlight the ability of ML models to estimate evaporation accurately. Considering the results obtained from this study it is concluded that hybrid AR-M5P provides superior results than all another LM models for both the period (training and testing) as it commits the highest correlation and lowest errors in prediction of evaporation (Table 3).—The study concludes that the hybrid AR-M5P ML model is not only superior to the AR model alone but also outperforms other hybrid models such as AR-RSS and AR-REPTree.

Evaporation is a critical parameter in the hydrological cycle. It can reduce water losses and the water budget for the total inflow. The current results show that the maximum evaporation occurs in August, while the minimum values occur in December. The total annual water losses from Lake Nasser will reach 16.52, 17.54, and 18.56 billion cubic meters (BCM) per year in 2030, 2050, and 2070, respectively. The expected total annual water savings will reach 3.47, 3.684, and 3.898 BCM per year in the same years. The total annual power production will reach 1.266 × 109, 1.425 × 109, and 1.583 × 109 megawatt-hours (MWh) per month in 2030, 2050, and 2070, respectively.

This management will be developed by covering the shallow water depths from contour level 0 to 10 m from the surface water level in the lake by a surface area of 1732 square kilometres, which represents about 30% of the total lake area (see Fig. 8). In other words, the management plan will cover about 30% of the surface of Lake Nasser with a floating photovoltaic system (FPVS). This will reduce evaporation from the lake and generate electricity at the same time. The expected water savings and power production are significant, and this management plan could be a valuable tool for conserving water resources in Egypt.

Fig. 8
figure 8

Future lake Nasser water losses, water saving and energy production

The results of our study agree with those of Abd-Elhamid et al. (2021), who estimated the annual evaporation from Lake Nasser to be 12 billion cubic meters (BCM). They also estimated that annual water savings would reach 2.1, 4.2, 6.3, 7.0, and 8.4 BCM, and annual energy production would reach 2.85 × 109, 5.67 × 109, 8.54 × 109, and 11.38 × 109 megawatt-hours (MWh) by covering 25, 50, 75, and 100% of the total surface area of the lake, respectively. The difference in energy production between the two studies is due to the fact that Abd-Elhamid et al. estimated energy production over 24 h, while our study took into account daily sunsine hours.

Allawi et al. (2019) predicted the reservoir using ML in the Layang Reservoir, Johor River, Malaysia, the study showed that the selection the suitable training period to learn the prediction model plays role to attain good results. Al Sudani and Salem (2022) forecasting the evaporation rates using advanced ML, the study in two study area for Diyala and Erbil state, Iraq. The study proposed that GMBM model can therefore assist local stakeholders in the management of water resources. Abed et al. (2022) estimated the monthly pan evaporation rates using random forest, convolutional neural network (CNN), and deep neural network (DNN), the study showed that the CNN model was powerful for predict evaporation. Essak and Ghosh (2022) showed that covering global reservoirs with FPV by just 1% would produce 404 GW of power. Antonopoulos and Gianniou (2022) investigated the surface water temperature using energy budget components and Artificial Neural Networks (ANNs models) at the interface of atmosphere and water. The results showed that the ANN models are the most accurate compare net radiation models using the adjusted Slob’s equations, and then, by the models with the heat storage changes functions. Agrawal et al. (2022) used 5 ensembled machine learning methods in estimating the daily reference evapotranspiration. The results showed that this approach substantiated by boosting algorithm for Extreme Gradient Boosting (XGBoost) significantly enhance the performance in the estimation method. Faramarzzadeh et al. (2023) developed Machine Learning and Remote Sensing model for investigating the gap-filling daily precipitation data in East Africa, the study showed that the random forest technique is performed the best results among all other methods for sloving the gap data. Beça et al. (2023) optimized the reservoir water management using rule curves and a dynamic assessment of water demands that are dependent on a water transfer system. The study indicated that this method can ensure 100% of the urban water supply, improve the reliability of the irrigation supply from 75% to 86–91%, and provide 92–98% of the river ecological flow. TR et al. (2023) estimated the precise values of the daily reference evapotranspiration using four ensemble techniques in the state of Karnataka, India, from 1979 to 2014. The results showed that these models based on all climatic variables were the most accurate in comparison with other input combinations.

The challenges of our research are the environmental impact of FPV on the ecosystem and water quality. More studies are needed to achieve future zero emissions, water savings, and green solar energy production. Overall, our findings indicate that hybrid models have a stronger predictive value in real-world situations and may be employed more effectively in watersheds with little data. In addition to predicting pan evaporation, these types of models may be used to forecast a wide range of hydrological and water resources phenomena, including ETo. The current study limitations are the cost of using floating semi-transparent polymer solar cells compare with the other solar cells, also the environmental impact of the covering this reservoir on the ecosystem and hydrology cycle in this region.

5 Conclusions

In this study, we applied additive regression and its hybridization with four other machine learning algorithms (AR-M5P, AS-SVM, AS-RSS, and AR-REPTree) to forecast monthly evaporation, water savings, and power production for the years 2030, 2050, and 2070. The algorithms were trained with 29 years of data (1981–2009) and tested against 12 years of data (2010–2021). The results were compared with classic AR to see the accuracy improvement of the new developed methods.

The AR-M5P model was found to be the best performing model among the other evaluated methods, as it showed the least error indices values. Overall, our findings indicate that hybrid models have a stronger predictive value in real-world situations and may be employed more effectively in watersheds with little data. In addition to predicting pan evaporation, these types of models may be used to forecast a wide range of hydrological and water resources phenomena, including ETo.

The descriptive statistics results showed that the highest correlation between observed and predicted evaporation values was found for the AR-M5P model (0.978), followed by the AR-SVM model (0.977), the AR model (0.968), the AR-RSS model (0.967), and the AR-REPTree model (0.961).

The study found that the total annual evaporation losses from Lake Nasser will reach 16.52, 17.54, and 18.56 billion cubic meters (BCM) per year, the also the expected total annual water saving reached 3.47, 3.684 and 3.898 BCM year1 while the total annual power production reached 1.266 × 109, 1.425 × 109 and 1.583 × 109 MW month−1 in 2030, 2050, and 2070, respectively. This is a significant amount of water, and it is therefore important to find ways to reduce evaporation losses. The study recommends extending water management studies for shallow water depths to minimize evaporation losses and save freshwater. This could be done by covering shallow water depths with floating solar panels, which would reduce evaporation and generate electricity at the same time. The study also recommends that policymakers take steps to address the expected rising in temperature, climate changes, rising in sea levels and drought. This could include investing in water conservation measures, such as water recycling and desalination, and developing drought-resistant crops. By taking these steps, policymakers can help to ensure that Egypt has a sustainable water supply in the future.