Introduction

Evapotranspiration (ET) is a pivotal element of the water cycle, with significant influence on crop growth, agricultural yield (Elsadek et al. 2023), and micro-meteorological dynamics (Dong et al. 2021; Yi et al. 2022). It bears directly on agricultural management through its effect on soil moisture levels, crop water requirements, and the formulation of effective irrigation strategies (Tang et al. 2018; Jiang et al. 2019). Lysimeters and eddy covariance systems are widely used for measuring ET. They give precise information, but they are time-consuming and expensive (Dong et al. 2021). The Food and Agriculture Organization (FAO) and the American Society of Civil Engineers endorse the standardized Penman–Monteith (PM) equation as the preferred method for estimating reference crop evapotranspiration (ETo). This equation, which integrates physiological and aerodynamic parameters, estimates ET from reference surfaces and has been validated across diverse climatic regions using lysimeter measurements (Allen et al. 1998). Nonetheless, its reliance on extensive meteorological datasets (such as solar radiation, wind speed, and relative humidity) poses challenges in regions where such data are scarce (Lee et al. 2024). As an alternative, the Hargreaves (HG) equation, a simplified temperature-based equation that uses maximum, minimum, and average temperatures (Elsadek 2023; Elsadek et al. 2024), has been proposed. However, potential evapotranspiration (PET) estimates derived from the HG equation have exhibited discrepancies when compared to the PM equation across various geographical locations worldwide. As a result, numerous researchers have devised practical models that demand fewer climatic data inputs for estimating ETo, yielding satisfactory outcomes across diverse climatic conditions (Tabari et al. 2013; Feng et al. 2017a; Zhang et al. 2018). However, these models may entail an uncertain estimation process due to the influence exerted by local climates and geographic variations (Zhao et al. 2023).

Recently, machine learning (ML) techniques such as artificial neural networks (ANNs), extreme learning machines, random forests, gene expression programming, and extreme gradient boosting have shown satisfactory results in ETo modeling and prediction (Kumar et al. 2002; Landeras et al. 2008; Traore et al. 2010; Fan et al. 2018; Reis et al. 2019; Ferreira et al. 2019; Petkovi et al. 2020; Zhu et al. 2020; Gong et al. 2021; Chia et al. 2021). These models provide credible estimations of ETo, and they require fewer climatic data inputs in comparison with conventional methods (Feng et al. 2017b; Fan et al. 2019; Chia et al. 2021). These models have proven their effectiveness across different climates and geographic regions, making them valuable tools for ETo modeling (Ferreira and da Cunha 2020; Sowmya et al. 2020; Bellido-Jiménez et al. 2021; Kaya et al. 2021). Moreover, hybrid ML models that combine various algorithms are considered an efficient alternative model that enhances the reliability of ETo estimates (Reis et al. 2019; Zhu et al. 2020; Mohammadi and Mehdizadeh 2020; Gong et al. 2021; Roy 2021; Chia et al. 2021; Elzain et al. 2024; Granata et al. 2024).

In this study, various data preprocessing techniques were combined with the ABC-ANN algorithm to predict daily ETo values from several meteorological variables: minimum temperature (Tmin), maximum temperature (Tmax), mean temperature (Tmean), precipitation (P), wind speed (WS), relative humidity (RH), solar radiation (Rs), and extraterrestrial solar radiation (Ra). By using these decomposition techniques, the study aims to capture the intrinsic modes and underlying patterns in the climatic data, leading to a more accurate and robust prediction of ETo. The study also investigates how the signal processing approach affects the ETo prediction performance of the ML model. Furthermore, the ETo predictions of the different models were compared to evaluate their performance. The ultimate goal is to provide a powerful and reliable tool for critical applications such as watershed management, agricultural planning, and ecological sustainability, thereby addressing the challenges associated with accurate ETo prediction in arid and semi-arid climates.

Material and methods

Study area and data collection

This research employed daily meteorological variables from 1979 to 2014 for four regions in Egypt, namely Al-Qalyubiyah, Cairo, Damietta, and Port Said. Figure 1 shows the locations of the case studies on a map of Egypt. We also employed ET datasets for these regions over the same period, which were calculated by the HG method because of the limited meteorological variables. The Al-Qalyubiyah region has a Mediterranean climate and practices traditional agriculture, including crops such as wheat and vegetables. Cairo is known for its desert climate, its urbanization, and the cultivation of dates and citrus fruits. Damietta has an arid climate and is well known for rice (Elsadek 2023; Elsadek et al. 2024) and citrus cultivation. Port Said is strategically positioned at the entrance of the Suez Canal and engages in trade-related activities as well as wheat and barley farming. Geographical information for each station, including latitude, longitude, and elevation, is listed in Table 1.
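The text does not state which form of the HG equation was used to compute the ET datasets. As a minimal sketch, the widely cited Hargreaves–Samani form is assumed here, with extraterrestrial radiation converted from MJ m-2 day-1 to equivalent evaporation using the factor 0.408; the function name and arguments are illustrative only.

```python
import math


def hargreaves_eto(tmax_c, tmin_c, ra_mj_m2_day, tmean_c=None):
    """Reference ET (mm/day) from the Hargreaves-Samani form (assumed variant).

    ETo = 0.0023 * (Tmean + 17.8) * sqrt(Tmax - Tmin) * Ra,
    with Ra expressed as equivalent evaporation in mm/day.
    """
    if tmean_c is None:
        # Assumption: mean temperature taken as the average of Tmax and Tmin.
        tmean_c = (tmax_c + tmin_c) / 2.0
    ra_mm_day = 0.408 * ra_mj_m2_day          # MJ m-2 day-1 -> mm/day
    temp_range = max(tmax_c - tmin_c, 0.0)    # guard against inverted inputs
    return 0.0023 * (tmean_c + 17.8) * math.sqrt(temp_range) * ra_mm_day


# Example with made-up values: Tmax = 30 C, Tmin = 20 C, Ra = 40 MJ m-2 day-1
eto_example = hargreaves_eto(30.0, 20.0, 40.0)
```

Because the equation needs only temperature extremes and Ra (which depends on latitude and day of year), it can be evaluated where radiation, wind, and humidity records are missing.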

Fig. 1 Map of the study area in Egypt

Table 1 Geographical information of the meteorological stations

Nine daily meteorological variables were used in this study: Tmin, Tmax, Tmean, P, WS, RH, Rs, Ra, and ETo. The first eight climatic variables were used as model inputs, while ETo was the target variable. The meteorological variables and their time series were selected as model inputs based on their correlation (Fig. 2) and on previous studies that demonstrated the effectiveness of these variables for ETo prediction (Abdallah et al. 2022; Mehdizadeh et al. 2021).

Fig. 2 Correlation plots between all used variables in the case studies

Artificial neural network (ANN)

ANNs are computational systems that simulate how information is received, learned, and processed in the human brain (Hussan et al. 2022). They have many applications in hydrology (Tung et al. 2020; Yin et al. 2023a), where they are used to approximate unknown functions or to forecast future values from potentially noisy time series data (Bisoyi et al. 2019). An ANN is composed of simple components working in parallel, and, as in natural information processing, its function relies mainly on the links between the layers (Yaseen et al. 2016). The hidden layer is the key element of the ANN: it is located between the input and output layers, and its neurons receive a set of weighted inputs and generate an output by applying a specific activation function. For details on ANN training, the reader is referred to Yaseen et al. (2016).
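The role of the hidden layer can be illustrated with a minimal forward pass. The sketch below assumes a single hidden layer with a hyperbolic tangent activation and a linear output neuron; the weights, layer sizes, and function name are hypothetical and are not the configuration used in this study.

```python
import math


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass through a single-hidden-layer feedforward network."""
    hidden = []
    for weights, bias in zip(hidden_weights, hidden_biases):
        # Each hidden neuron forms a weighted sum of the inputs plus a bias...
        weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
        # ...and passes it through the activation function (tanh assumed here).
        hidden.append(math.tanh(weighted_sum))
    # Linear output neuron combining the hidden activations.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Illustrative call: 3 inputs, 2 hidden neurons, arbitrary weights.
prediction = forward(
    inputs=[0.2, -0.1, 0.5],
    hidden_weights=[[0.4, 0.1, -0.3], [-0.2, 0.7, 0.05]],
    hidden_biases=[0.0, 0.1],
    output_weights=[1.2, -0.8],
    output_bias=0.3,
)
```

Training consists of adjusting the weights and biases so that the output matches the target, which is the part an optimizer such as ABC can take over.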

Artificial Bee Colony (ABC)

The ABC is a metaheuristic optimization algorithm inspired by the foraging behavior of honeybees (Karaboga 2005). In ABC, the search process is modeled based on the food-foraging behavior of honeybee colonies. The algorithm maintains a population of artificial bees, where each bee represents a potential solution to the optimization problem. The ABC algorithm consists of three main components:

  • Employed bees These bees explore the search space by exploiting the information obtained from other bees in the population. They iteratively improve their solutions by selecting promising solutions found by other bees and making small modifications to them.

  • Onlooker (keeper) bees These bees choose which employed bees to follow based on the quality of their solutions. They observe the solutions generated by the employed bees and decide whether to explore similar regions of the search space or exploit different regions.

  • Scout bees These bees are responsible for exploring new areas of the search space that other bees have not visited. If an employed bee exhausts its search for a solution without finding an improvement, it becomes a scout bee and explores a new solution randomly.

The bees communicate the location of food sources by performing a series of movements referred to as a waggle dance. Once a food source has been exhausted, the bee exploiting it becomes a scout and searches randomly for new food sources (Karaboga and Basturk 2007, 2008; Iqbal et al. 2024).
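The three phases described above can be sketched as follows. This is a generic, minimal ABC loop for minimizing a non-negative objective, not the MATLAB implementation used in this study; the parameter names (`n_sources`, `limit`, `max_iter`) and their default values are assumptions chosen for illustration.

```python
import random


def abc_minimize(objective, bounds, n_sources=10, limit=20, max_iter=200, seed=0):
    """Minimise a non-negative objective with a basic artificial bee colony."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_source():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    sources = [random_source() for _ in range(n_sources)]
    costs = [objective(s) for s in sources]
    trials = [0] * n_sources              # consecutive failures per food source
    best = {"x": None, "cost": float("inf")}

    def remember_best():
        i = min(range(n_sources), key=lambda idx: costs[idx])
        if costs[i] < best["cost"]:
            best["x"], best["cost"] = list(sources[i]), costs[i]

    def try_neighbour(i):
        # Perturb one coordinate of source i relative to a random other source.
        k = rng.choice([idx for idx in range(n_sources) if idx != i])
        j = rng.randrange(dim)
        candidate = list(sources[i])
        candidate[j] += rng.uniform(-1.0, 1.0) * (sources[i][j] - sources[k][j])
        lo, hi = bounds[j]
        candidate[j] = min(max(candidate[j], lo), hi)
        cost = objective(candidate)
        if cost < costs[i]:               # greedy selection
            sources[i], costs[i], trials[i] = candidate, cost, 0
        else:
            trials[i] += 1

    for _ in range(max_iter):
        for i in range(n_sources):        # employed bee phase
            try_neighbour(i)
        fitness = [1.0 / (1.0 + c) for c in costs]   # assumes costs >= 0
        for _ in range(n_sources):        # onlooker phase: better sources chosen more often
            i = rng.choices(range(n_sources), weights=fitness)[0]
            try_neighbour(i)
        remember_best()
        for i in range(n_sources):        # scout phase: abandon exhausted sources
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = objective(sources[i])
                trials[i] = 0
    remember_best()
    return best["x"], best["cost"]


# Illustrative use on a two-dimensional sphere function.
solution, cost = abc_minimize(lambda v: sum(c * c for c in v), [(-5.0, 5.0), (-5.0, 5.0)])
```

In an ABC-ANN hybrid, each food source would encode one set of network weights and biases, and the objective would be the prediction error of the network on the training data.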

In this study, the ABC-ANN, EMD, VMD, LMD, and EWT codes were written in MATLAB 2019, while the EEMD and CEEMDAN codes were written in R.

Empirical mode decomposition (EMD)

EMD is a widely applied adaptive method that uses instantaneous frequency-based intrinsic mode functions (IMFs) to flexibly analyze data with nonlinear and non-stationary time series. Each IMF represents a basic oscillation in the signal. The frequencies and amplitudes of the IMFs are not fixed and can vary with time. Given a time series x(t), EMD adaptively separates the non-stationary signal into a series of IMFs ordered from high to low frequency, and the decomposed signal can be written as follows:

$$x\left( t \right) = \mathop \sum \limits_{i = 1}^{N} C_{i} \left( t \right) + r_{N} \left( t \right)$$
(1)

in which Ci(t) is the ith IMF of x(t) and rN(t) is the residual of the signal (Kedadouche et al. 2016).
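The decomposition in Eq. (1) can be illustrated with a deliberately simplified sifting procedure. The sketch below assumes linear interpolation for the upper and lower envelopes (standard EMD uses cubic splines) and a fixed number of sifting passes instead of a convergence criterion; all function names are hypothetical.

```python
import math


def local_extrema(x):
    """Indices of interior local maxima and minima."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(idx, vals, n):
    """Piecewise-linear envelope through (idx, vals), held constant at the ends."""
    out, k = [], 0
    for t in range(n):
        if t <= idx[0]:
            out.append(vals[0])
        elif t >= idx[-1]:
            out.append(vals[-1])
        else:
            while idx[k + 1] < t:
                k += 1
            w = (t - idx[k]) / (idx[k + 1] - idx[k])
            out.append(vals[k] + w * (vals[k + 1] - vals[k]))
    return out


def sift_once(x):
    """Subtract the mean of the upper and lower envelopes; None if too few extrema."""
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    upper = envelope(maxima, [x[i] for i in maxima], len(x))
    lower = envelope(minima, [x[i] for i in minima], len(x))
    return [xi - (u + l) / 2.0 for xi, u, l in zip(x, upper, lower)]


def emd(x, max_imfs=5, n_sift=10):
    """Return (imfs, residual) such that sum(imfs) + residual reproduces x."""
    residual, imfs = list(x), []
    for _ in range(max_imfs):
        h, sifted = residual, False
        for _pass in range(n_sift):
            nxt = sift_once(h)
            if nxt is None:
                break
            h, sifted = nxt, True
        if not sifted:
            break                      # residual has too few extrema to continue
        imfs.append(h)
        residual = [r - c for r, c in zip(residual, h)]
    return imfs, residual


# Synthetic test signal: two oscillations plus a linear trend.
signal = [math.sin(2 * math.pi * t / 10) + 0.5 * math.sin(2 * math.pi * t / 50) + 0.01 * t
          for t in range(200)]
imfs, residual = emd(signal)
```

Whatever the envelope method, the extracted components and the final residual add back up to the original series, which is the property expressed by Eq. (1).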

Variational mode decomposition (VMD)

VMD non-recursively separates a real-valued multicomponent signal f into quasi-orthogonal band-limited sub-signals (modes), each of which is compact around a center frequency. The constrained variational problem can be formulated as follows:

$$\left\{ {\begin{array}{*{20}l} {\mathop {\min }\limits_{{\left\{ {u_{k} } \right\},\left\{ {w_{k} } \right\}}} \left\{ {\mathop \sum \limits_{k = 1}^{K} \left\| {\partial_{t} \left[ {\left( {\delta \left( t \right) + \frac{j}{\pi t}} \right) * u_{k} \left( t \right)} \right]e^{{ - jw_{k} t}} } \right\|_{2}^{2} } \right\}} \hfill \\ {{\text{s}}.{\text{t}}.\;\mathop \sum \limits_{k = 1}^{K} u_{k} = f} \hfill \\ \end{array} } \right.$$
(2)

in which \(\left\{ {w_{k} } \right\} = \left\{ {w_{1} ,w_{2} , \ldots , w_{K} } \right\}\) are the center frequencies of the IMFs, \(\left\{ {u_{k} } \right\} = \left\{ {u_{1} ,u_{2} , \ldots ,u_{K} } \right\}\) are the decomposed band-limited IMFs, δ(t) is the Dirac delta function, * denotes convolution, and j is the imaginary unit (Danandeh Mehr et al. 2023).

Ensemble empirical mode decomposition (EEMD)

While EMD has been extensively applied in non-stationary and/or nonlinear signal analysis, its decomposition results are susceptible to mode mixing. To overcome this problem, the ensemble EMD (EEMD) method has been presented. In EEMD, finite-amplitude white noise is added to the original signal, and the noisy signal is decomposed into IMFs with EMD; this is repeated over many trials with different noise realizations. The average of the IMFs obtained from the trials is taken as the IMFs of the EEMD method. In this way, the mode mixing problem is mitigated (Zhang et al. 2018).
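The ensemble idea can be sketched independently of the decomposition routine. In the example below, `decompose` is a placeholder for an EMD implementation, and `moving_average_split` is only a toy stand-in that splits a series into a detail and a trend component; the noise level is given as an absolute standard deviation, which is an assumption (implementations often scale it by the standard deviation of the signal).

```python
import math
import random


def eemd(signal, decompose, n_modes, n_trials=50, noise_std=0.2, seed=0):
    """Average the modes of `decompose` over noise-perturbed copies of the signal."""
    rng = random.Random(seed)
    n = len(signal)
    sums = [[0.0] * n for _ in range(n_modes)]
    for _ in range(n_trials):
        # Add a fresh white-noise realization to the original signal.
        noisy = [s + rng.gauss(0.0, noise_std) for s in signal]
        modes = decompose(noisy)
        for k in range(min(n_modes, len(modes))):
            for t in range(n):
                sums[k][t] += modes[k][t]
    # Ensemble mean of each mode; the added noise tends to cancel out.
    return [[v / n_trials for v in row] for row in sums]


def moving_average_split(x):
    """Toy stand-in for EMD: returns [detail, trend] using a 3-point moving average."""
    trend = []
    for t in range(len(x)):
        window = x[max(0, t - 1):t + 2]
        trend.append(sum(window) / len(window))
    detail = [xi - ti for xi, ti in zip(x, trend)]
    return [detail, trend]


series = [math.sin(0.3 * t) for t in range(50)]
ensemble_modes = eemd(series, moving_average_split, n_modes=2, n_trials=200)
```

With enough trials, the sum of the averaged modes approaches the original series because the zero-mean noise averages out.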

Local mean decomposition (LMD)

Smith (2005) pioneered the LMD method and demonstrated its advantages in processing non-stationary and nonlinear signals. The basic principle of the LMD method in signal processing is as follows:

  • The instantaneous amplitudes and frequencies of the signal are derived from the original signal using the moving average method (as in Eqs. (3) to (9)).

  • LMD produces a set of product functions (PFs), each of which is the product of an envelope signal and a frequency-modulated signal from which an instantaneous frequency can be derived (as in Eq. 10). The procedure for calculating the LMD is given below (Li et al. 2023; Yongbo et al. 2019):

  1. (1)

    Identify the successive local extrema ni of the signal and compute the mean value mi of each pair of successive local extrema ni and ni+1:

    $$m_{i} = \frac{{n_{i} + n_{i + 1} }}{2}$$
    (3)

The envelope estimate ai of two successive local extrema is given below:

$$a_{i} = \frac{{\left| {n_{i} - n_{i + 1} } \right|}}{2}.$$
(4)
  2. (2)

    To obtain a purely frequency-modulated signal from which a positive instantaneous frequency can be derived, proceed as follows. First, the local mean function m11(t) is obtained by smoothing the local means mi with the moving average method. Then, m11(t) is subtracted from the original signal x(t) to obtain the residual signal h11(t).

    $$h_{11} \left( t \right) = x\left( t \right) - m_{11} \left( t \right)$$
    (5)

Furthermore, the envelope function a11(t) is obtained by smoothing the envelope estimates ai with the moving average method.

  3. (3)

    Divide h11(t) by a11(t):

    $$s_{11} \left( t \right) = \frac{{h_{11} \left( t \right)}}{{a_{11} \left( t \right)}}.$$
    (6)

By repeating the above steps q times with s11(t) in place of the original signal, a purely frequency-modulated signal s1q(t) is obtained, whose envelope function satisfies a1(q+1)(t) ≈ 1.

$$\left\{ {\begin{array}{*{20}l} {h_{11} \left( t \right) = x\left( t \right) - m_{11} \left( t \right)} \hfill \\ {h_{12} \left( t \right) = s_{11} \left( t \right) - m_{12} \left( t \right)} \hfill \\ \ldots \hfill \\ {h_{1q} \left( t \right) = s_{{1\left( {q - 1} \right)}} \left( t \right) - m_{1q} \left( t \right)} \hfill \\ \end{array} } \right.$$
(7)

in which

$$\left\{ {\begin{array}{*{20}l} {s_{11} \left( t \right) = \frac{{h_{11} \left( t \right)}}{{a_{11} \left( t \right)}}} \hfill \\ {s_{12} \left( t \right) = \frac{{h_{12} \left( t \right)}}{{a_{12} \left( t \right)}}} \hfill \\ \ldots \hfill \\ {s_{1q} \left( t \right) = \frac{{h_{1q} \left( t \right)}}{{a_{1q} \left( t \right)}}} \hfill \\ \end{array} } \right..$$
(8)

The iterations finish once a signal is obtained whose envelope function satisfies this condition, i.e., in the limit a1q(t) = 1.

  4. (4)

    The envelope signal a1(t) is given by Eq. (9), and the first product function PF1(t) by Eq. (10):

    $$a_{1} \left( t \right) = a_{11} \left( t \right) \times a_{12} \left( t \right) \times a_{13} \left( t \right) \times \cdots \times a_{1q} \left( t \right)$$
    (9)
    $${\text{PF}}_{1} \left( t \right) = s_{1q} \left( t \right) \times a_{1} \left( t \right)$$
    (10)
  5. (5)

    μ1(t) is obtained by subtracting PF1(t) from x(t):

    $$\mu_{1} \left( t \right) = x\left( t \right) - {\text{PF}}_{1} \left( t \right)$$
    (11)

To complete the LMD, repeat the above steps with μ1(t) as the new data and iterate this process k times until μk(t) is constant or contains no more oscillations. The cycle concludes at this point, and μk(t) is the residual (R). The original signal x(t) equals the sum of all PFs and the residual:

$$x\left( t \right) = \mathop \sum \limits_{p = 1}^{k} {\text{PF}}_{p} \left( t \right) + \mu_{k} \left( t \right).$$
(12)

This method divides the signal into several components and summarizes the complexity of the original signal according to their different characteristics. Once the signal has been decomposed by LMD, a set of PFs ordered from high to low frequency is obtained, thereby reducing a signal with complicated properties to pure PFs with clear properties.
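The first step of the procedure, Eqs. (3) and (4), can be sketched as follows. The example only computes the raw local means and envelope estimates between successive extrema; the moving-average smoothing and the iteration in Eqs. (5) to (12) are omitted, and the function name is hypothetical.

```python
def local_means_and_envelopes(x):
    """Return extrema indices, local means m_i (Eq. 3), and envelope estimates a_i (Eq. 4)."""
    # An interior point is a local extremum where the slope changes sign.
    extrema = [i for i in range(1, len(x) - 1)
               if (x[i] - x[i - 1]) * (x[i + 1] - x[i]) < 0]
    values = [x[i] for i in extrema]
    pairs = list(zip(values, values[1:]))          # successive extrema n_i, n_(i+1)
    means = [(a + b) / 2.0 for a, b in pairs]      # Eq. (3)
    envelopes = [abs(a - b) / 2.0 for a, b in pairs]   # Eq. (4)
    return extrema, means, envelopes


# A triangular wave with extrema +1, -1, +1 at indices 1, 3, 5.
idx, m, a = local_means_and_envelopes([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0])
# idx == [1, 3, 5], m == [0.0, 0.0], a == [1.0, 1.0]
```

For a symmetric oscillation the local means are zero and the envelope estimates equal the amplitude, which is what the subsequent demodulation step relies on.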

The complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN)

A more recent decomposition algorithm, CEEMDAN, overcomes the mode overlap problem of EMD (the overlapping of signal modes at multiple frequencies). By adaptively adding white noise at each decomposition stage, the white noise-induced reconstruction error of EEMD is eliminated by iteration, and the IMFs are obtained by calculating the residual (Ma et al. 2020). CEEMDAN improves decomposition accuracy and greatly reduces mode overlap. The procedure is as follows:

  1. (1)

    The signal time series s(n) is decomposed by the CEEMDAN method over I noise-added realizations si(n), i = 1, 2, …, I. \({\text{IMF}}_{1} \left( n \right)\) is calculated as the average of the first modes of these realizations:

    $${\text{IMF}}_{1} \left( n \right) = \frac{1}{I}\mathop \sum \limits_{i = 1}^{I} {\text{IMF}}_{1}^{i} \left( n \right) = \overline{{{\text{IMF}}_{1} \left( n \right)}}$$
    (13)

    where I is the number of realizations.

  2. (2)

    Calculate the residual sequence and give the first residual sequence as follows:

    $$r_{1} \left( n \right) = s\left( n \right) - {\text{IMF}}_{1} \left( n \right).$$
    (14)
  3. (3)

    Add ε1E1[νi(n)] to the first residual sequence to obtain the realizations r1(n) + ε1E1[νi(n)], i = 1, 2, …, I, where Ek[·] denotes the operator that extracts the kth mode by EMD, νi(n) is the ith white noise realization, and εk is the noise coefficient. Decompose these realizations, and \({\text{IMF}}_{2} \left( n \right)\) can be computed as follows:

    $${\text{IMF}}_{2} \left( n \right) = \frac{1}{I}\mathop \sum \limits_{i = 1}^{I} E_{1} \left( {r_{1} \left( n \right) + \varepsilon_{1} E_{1} \left( {\nu^{i} \left( n \right)} \right)} \right).$$
    (15)
  4. (4)

    After the k-th residue sequence is computed, the next modal component can be calculated as follows:

    $$r_{k} \left( n \right) = r_{k - 1} \left( n \right) - {\text{IMF}}_{k} \left( n \right)$$
    (16)
    $${\text{IMF}}_{k + 1} \left( n \right) = \frac{1}{I}\mathop \sum \limits_{i = 1}^{I} E_{1} \left( {r_{k} \left( n \right) + \varepsilon_{k} E_{k} \left( {\nu^{i} \left( n \right)} \right)} \right).$$
    (17)
  5. (5)

    When the residual sequence is unsuitable for further decomposition, the decomposition ends. Once K modes have been extracted, the final residual sequence is:

    $$R\left( n \right) = s\left( n \right) - \mathop \sum \limits_{k = 1}^{K} {\text{IMF}}_{k} \left( n \right).$$
    (18)

The s(n) can be formulated as:

$$s\left( n \right) = R\left( n \right) + \mathop \sum \limits_{k = 1}^{K} {\text{IMF}}_{k} \left( n \right).$$
(19)

Empirical wavelet transform (EWT)

The wavelet transform has emerged as a promising decomposition technique, demonstrating its potential to enhance the accuracy of hydrological models (Mehr et al. 2021). Gilles introduced EWT in 2013. EWT’s main idea is to derive amplitude-modulated-frequency-modulated (AM-FM) signals of a particular signal by adaptively designing a suitable wavelet filter bank. The EWT generates IMFs based on the information available in the signal spectrum. The process of decomposing the signal x(t) by using EWT can be expressed in the following way:

  1. (1)

    Use the Fast Fourier transform algorithm (FFT) to calculate the Fourier spectrum F(ω) of the original time series x(t).

  2. (2)

    Segment the Fourier spectrum properly to determine the boundaries that separate distinct modes.

  3. (3)

    Once the limits are detected by proper segmentation, compute the approximation and detail coefficients by applying the appropriate scaling function and empirical wavelets to each determined segment. The limits ωi are defined as the midpoints between the frequencies fi and fi+1 of two consecutive local maxima of the spectrum:

    $$\omega_{i} = \frac{{f_{i} + f_{i + 1} }}{2}\;{\text{for}}\;1 \le i \le N - 1$$
    (20)
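The boundary rule in Eq. (20) can be sketched as follows, assuming the magnitude spectrum has already been computed (for example with an FFT) and that the N strongest local maxima are taken as the mode centers. The function name and the toy spectrum are hypothetical.

```python
def ewt_boundaries(freqs, magnitudes, n_modes):
    """Boundaries between the n_modes strongest spectral peaks (Eq. 20)."""
    # Interior local maxima of the magnitude spectrum.
    peaks = [i for i in range(1, len(magnitudes) - 1)
             if magnitudes[i - 1] < magnitudes[i] >= magnitudes[i + 1]]
    # Keep the n_modes largest peaks, then order them by frequency.
    strongest = sorted(peaks, key=lambda i: magnitudes[i], reverse=True)[:n_modes]
    peak_freqs = sorted(freqs[i] for i in strongest)
    # Each boundary is the midpoint between two consecutive peak frequencies.
    return [(a + b) / 2.0 for a, b in zip(peak_freqs, peak_freqs[1:])]


# Toy spectrum with peaks at frequencies 1, 3, 5, and 7 (arbitrary units).
toy_freqs = [0, 1, 2, 3, 4, 5, 6, 7, 8]
toy_mags = [0.0, 5.0, 0.0, 1.0, 0.0, 3.0, 0.0, 2.0, 0.0]
bounds = ewt_boundaries(toy_freqs, toy_mags, n_modes=3)
# The three strongest peaks are at 1, 5, and 7, so bounds == [3.0, 6.0]
```

N retained peaks give N − 1 boundaries, matching the index range 1 ≤ i ≤ N − 1 in Eq. (20).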

The code in the “Appendix” implements a complete decomposition of a signal using an Empirical Orthogonal Function (EOF)-based LMD algorithm with an Empirical Optimal method (EOE). It iteratively extracts PFs until a stop criterion is met or the maximum number of PFs is reached. PFs, amplitudes, and siftings are stored in arrays, and the Orthogonal Residual Trend (ORT) is computed. Figure 3 shows the flow chart of the applied modeling steps.

Fig. 3 Systematic framework of the study

Results

Accurate estimation of ETo values is vital for managing water resources, agricultural production, dams, reservoir planning, irrigation projects, water treatment plants, hydroelectric generation, and ecological aspects. In this study, a novel hybrid approach was proposed that combines nature-inspired ABC optimization and various signal processing techniques with neural networks to estimate ETo values. During the modeling phase, 70% of the data were used for training (1/1/1979–11/25/2003) and 30% for testing (11/26/2003–7/24/2014). Tmax, Tmin, Tmean, WS, RH, Rs, Ra, and P were used as inputs in all models to obtain ETo. Since ET values depend on many meteorological parameters, all parameters were used in the models, and the correlation matrix was used to detect hidden relationships and to form the best model input combinations. Figures 4, 5, 6 and 7 show the correlation matrices (95% confidence interval) used to choose the input variables for obtaining ETo with the hybrid ABC-ANN models based on the EMD, VMD, LMD, EEMD, CEEMDAN, and EWT techniques for the Al-Qalyubiyah, Cairo, Damietta, and Port Said stations. The relationships of Tmax, Tmin, Tmean, WS, RH, Rs, Ra, and P with ETo were evaluated. Solar radiation and the minimum, mean, and maximum temperatures were found to be very effective in predicting ETo, whereas precipitation was only weakly correlated with ETo. Combinations of parameters that were significant at the 95% confidence level and positively correlated with ETo were employed to establish the input combinations of the hybrid models.
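The two data-handling steps described above, a chronological 70/30 split and correlation-based screening of inputs, can be sketched as follows. The function names and the toy series are hypothetical, and the split is assumed to preserve time order without shuffling, as implied by the training and testing date ranges.

```python
import math


def chronological_split(records, train_fraction=0.7):
    """Split a time-ordered sequence into training and testing parts without shuffling."""
    split_index = round(len(records) * train_fraction)
    return records[:split_index], records[split_index:]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy daily series (made-up numbers, not the study data).
eto = [3.1, 3.4, 4.0, 4.6, 5.2, 5.0, 4.1, 3.6, 3.0, 2.8]
tmax = [24.0, 25.5, 28.0, 30.5, 33.0, 32.0, 28.5, 26.0, 23.5, 22.0]

train_eto, test_eto = chronological_split(eto)      # first 7 days, last 3 days
r_tmax = pearson_r(tmax, eto)                       # strength of the Tmax-ETo relation
```

A correlation matrix such as those in Figs. 4 to 7 is obtained by applying `pearson_r` to every pair of variables.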

Fig. 4 Correlation matrix between inputs and output for Al-Qalyubiyah station

Fig. 5 Correlation matrix between inputs and output for Cairo station

Fig. 6 Correlation matrix between inputs and output for Damietta station

Fig. 7 Correlation matrix between inputs and output for Port Said station

The input and target variables used during modeling are presented below. In all models, the inputs are Tmax, Tmin, Tmean, WS, RH, Rs, and Ra. The ABC-ANN hybrid model was configured with a feedforward neural network as the network structure, a tangent sigmoid transfer function, and a maximum of 200 iterations.

Results of Al-Qalyubiyah station

Table 2 summarizes the performance of the ETo prediction models at Al-Qalyubiyah station using three statistical criteria: MAE, MSE, and R2. The highest ETo prediction accuracy was obtained with ABC-ANN (Train R2: 0.990 and Test R2: 0.989), while the weakest results were obtained with LMD-ABC-ANN (Train R2: 0.872 and Test R2: 0.704). The ABC-ANN model outperformed all signal-processing-based models. Among the other decomposition-based models, the order of superiority was EWT, CEEMDAN, VMD, and EMD.
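The three criteria can be computed as in the sketch below. The text does not define R2 explicitly, so the coefficient of determination (one minus the ratio of residual to total sum of squares) is assumed here; the squared Pearson correlation is another common convention and would give slightly different values. The function name and sample numbers are illustrative.

```python
def regression_metrics(observed, predicted):
    """Return MAE, MSE, and R2 (coefficient of determination, assumed definition)."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)                     # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "R2": r2}


# Made-up values: errors of 0.1, 0.1, 0.2, 0.2 give MAE 0.15, MSE 0.025, R2 0.98.
scores = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

Lower MAE and MSE and an R2 closer to 1 indicate a better model, which is how the rankings in the tables are read.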

Table 2 Comparison of ETo prediction models at Al-Qalyubiyah station

Figure 8 compares the ETo prediction performance of the hybrid ABC-ANN models with the EMD, VMD, LMD, EEMD, CEEMDAN, and EWT techniques at Al-Qalyubiyah station using Taylor diagrams. Model performances were evaluated using R, RMSE, and standard deviation values. Accordingly, the ABC-ANN model gave the most accurate predictions in both the training and testing phases, while the LMD-ABC-ANN model gave the weakest.
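A Taylor diagram places each model according to three linked statistics. The sketch below assumes the usual convention in which the RMSE shown on the diagram is the centered (bias-removed) root-mean-square difference and standard deviations are population values; under that convention the three quantities satisfy a law-of-cosines relation. The function name and sample values are hypothetical.

```python
import math


def taylor_statistics(observed, predicted):
    """Standard deviations, correlation R, and centered RMSE for a Taylor diagram."""
    n = len(observed)
    mean_o, mean_p = sum(observed) / n, sum(predicted) / n
    dev_o = [o - mean_o for o in observed]
    dev_p = [p - mean_p for p in predicted]
    std_o = math.sqrt(sum(d * d for d in dev_o) / n)
    std_p = math.sqrt(sum(d * d for d in dev_p) / n)
    r = sum(a * b for a, b in zip(dev_o, dev_p)) / (n * std_o * std_p)
    # Centered RMSE: RMS difference after removing each series' mean.
    crmse = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_o, dev_p)) / n)
    return {"std_obs": std_o, "std_pred": std_p, "r": r, "crmse": crmse}


stats = taylor_statistics([1.0, 2.0, 3.0, 4.0], [1.5, 1.8, 3.4, 3.9])
# Geometric identity behind the diagram:
# crmse^2 = std_obs^2 + std_pred^2 - 2 * std_obs * std_pred * r
identity_gap = stats["crmse"] ** 2 - (
    stats["std_obs"] ** 2 + stats["std_pred"] ** 2
    - 2 * stats["std_obs"] * stats["std_pred"] * stats["r"]
)
```

A model plots closest to the observation point when its standard deviation matches the observed one, its R is near 1, and its centered RMSE is near 0.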

Fig. 8 Estimation of ETo with Taylor diagrams at Al-Qalyubiyah station

Figure 9 shows the scatter diagrams of the training and test results at the Al-Qalyubiyah station. Scatter plots visualize the direction and form of the correlation between two data sets as well as the data distribution. Regions with high dot density indicate a greater concentration of data, whereas regions with low dot density indicate less data. If the dots show a downward trend, there is a negative correlation between the variables. If the dots lie along a straight line, the variables are linearly correlated. If the dots are randomly distributed, there is no correlation. The scatter plots of the training data in Fig. 9 show that the predictions of all models are positively and linearly related to the actual values. The ABC-ANN predictions in particular show an approximately linear trend, which indicates superior results in ETo prediction. Furthermore, the EWT-ABC-ANN, VMD-ABC-ANN, and CEEMDAN-ABC-ANN hybrid approaches perform more accurately than EMD-ABC-ANN and LMD-ABC-ANN. In the scatter plots of the test results, the ABC-ANN algorithm again exhibits the most accurate and linear relationship. EWT-ABC-ANN and CEEMDAN-ABC-ANN follow with the next most successful predictions. The LMD-ABC-ANN algorithm shows the weakest ETo predictions, with a scattered, near-random distribution.

Fig. 9 Scatter plots of ETo prediction results at Al-Qalyubiyah station

Violin diagrams of the ETo prediction results at Al-Qalyubiyah station are shown in Fig. 10. To evaluate which model estimated ETo more accurately, the shape and distribution of the violins for the estimated and actual datasets were compared. The ABC-ANN algorithm has the closest structure to the actual values in the training and testing phases, whereas the LMD-ABC-ANN model has the distribution furthest from the actual values and the weakest results.

Fig. 10 Violin plots of ETo prediction result at Al-Qalyubiyah station

Results of Cairo station

Table 3 presents the ETo prediction models at Cairo station analyzed according to various statistical criteria. The highest ETo prediction accuracy was obtained with ABC-ANN (Train R2: 0.986 and Test R2: 0.986), while the weakest results were obtained with LMD-ABC-ANN (Train R2: 0.557 and Test R2: 0.794). In addition, the performance of the ABC-ANN model was slightly weakened when it was combined with data decomposition techniques. Moreover, the order of superiority of the other decomposition-based models was found to be EWT, CEEMDAN, VMD, and EMD.

Table 3 Comparison of ETo prediction models at Cairo station

In Fig. 11, the ETo prediction performances of the hybrid ABC-ANN models based on EMD, VMD, LMD, EEMD, CEEMDAN, and EWT at Cairo station are evaluated with Taylor diagrams. Model performances were evaluated according to R, RMSE, and standard deviation values. The ABC-ANN model was the most successful, since it yielded the highest R and lowest RMSE values in both the training and testing stages. In addition, the EWT-ABC-ANN model has the second most accurate result.

Fig. 11 Estimation of ETo with Taylor diagrams at Cairo station

Figure 12 displays scatter plots of ETo predictions for Cairo station during the training and testing phases. These graphs evaluate the prediction accuracy by examining the relationship and distribution between the predicted and actual time series. Based on these graphs, it can be inferred that the ABC-ANN, EWT-ABC-ANN, and CEEMDAN-ABC-ANN algorithms produce similar and satisfactory prediction results during the training phase. This is evident from the scatter plots, where the predicted values align closely with the actual values along a linear trend. However, the other models generally exhibit weaker ETo predictions, with a more random distribution. A similar pattern emerges in the scatter plots of the testing phase. The ABC-ANN, EWT-ABC-ANN, and CEEMDAN-ABC-ANN hybrid algorithms again generate close and highly accurate predictions, whereas the LMD-ABC-ANN, VMD-ABC-ANN, and EMD-ABC-ANN algorithms yield weaker predictions than the others.

Fig. 12 Scatter plots of ETo prediction results at Cairo station

In Fig. 13, the performances of the ETo prediction models established at Cairo station were evaluated with violin diagrams, which compare the structure and distribution of the estimated dataset with those of the actual dataset. According to the analyses, the ABC-ANN algorithm showed the best estimation performance in the training and testing phases, since it has the closest structure to the actual values. In addition, the LMD-ABC-ANN and EWT-ABC-ANN models have the structure and distribution furthest from the true values and the weakest results.

Fig. 13 Violin plots of ETo prediction results at Cairo station

Results of Damietta station

In Table 4, the accuracy of the ETo prediction models at Damietta station is evaluated. Accordingly, the highest ETo prediction success was obtained with ABC-ANN (Train R2: 0.991 and Test R2: 0.989), while the weakest predictions were produced by LMD-ABC-ANN (Train R2: 0.385 and Test R2: 0.211). Moreover, the performance ranking of the other data preprocessing algorithms is EWT, CEEMDAN, VMD, and EMD.

Table 4 Comparison of ETo prediction models at Damietta station

Figure 14 depicts the evaluation of ETo prediction success for the Damietta station using a hybrid-ABC-ANN-based data decomposition algorithm, which is assessed with a Taylor diagram. This analysis indicates that the ABC-ANN model achieved the highest success rates in both the training and testing stages, as evidenced by the lowest RMSE and highest R values. Moreover, the CEEMDAN-ABC-ANN model displays the second most accurate outcome.

Fig. 14 Estimation of ETo with Taylor diagrams at Damietta station

Scatter plots of ETo predictions for Damietta Station during the training and testing phases are depicted in Fig. 15. The prediction accuracy is evaluated through an analysis of the relationship and distribution between the predicted and actual time series, as depicted in these graphs. The graphs suggest that the ABC-ANN, EWT-ABC-ANN, and CEEMDAN-ABC-ANN algorithms yield comparable and satisfactory prediction results during the training phase. The scatter plots demonstrate a linear trend in which the predicted values closely align with the actual values. Upon examination of the scatter plots of the testing phase, a similar pattern is evident. The output predictions of the ABC-ANN, EWT-ABC-ANN, and CEEMDAN-ABC-ANN hybrid algorithms are also highly accurate and closely related. Nevertheless, the LMD-ABC-ANN, VMD-ABC-ANN, and EMD-ABC-ANN algorithms demonstrate inferior predictive capabilities compared to the remaining algorithms. Furthermore, it can be inferred that the LMD-ABC-ANN approach yields the least accurate estimates.

Fig. 15 Scatter plots of ETo prediction results at Damietta station

The ETo prediction models at Damietta station were evaluated in Fig. 16 using violin diagrams, which compare the structure and distribution of the estimated dataset with those of the actual dataset. The analyses show that the ABC-ANN algorithm achieved the best estimation performance in both the training and testing phases, because its structure is closest to the actual values. Furthermore, the LMD-ABC-ANN model, whose distribution is farthest from the true values, exhibits the weakest results.

Fig. 16 Violin plots of ETo prediction results at Damietta station

Results of Port Said station

In Table 5, the ETo prediction results at the Port Said station are evaluated. Accordingly, the ABC-ANN model (Train R2: 0.988 and Test R2: 0.987), which has the highest R2 and lowest error values, has the highest ETo prediction accuracy, while the weakest results were obtained with LMD-ABC-ANN (Train R2: 0.850 and Test R2: 0.735). Moreover, the performance ranking of the signal processing techniques is CEEMDAN, EWT, VMD, and EMD.

Table 5 Comparison of ETo prediction models at Port Said station

The ETo prediction success of the hybrid ABC-ANN-based data decomposition algorithms at Port Said station was assessed with the Taylor diagram, as shown in Fig. 17. Model performances were assessed according to the statistical values in the diagram. The ABC-ANN model was the most successful, since it yielded the highest R and lowest RMSE values in both the training and testing stages. In addition, the CEEMDAN-ABC-ANN model has the second most accurate result.

Fig. 17 Estimation of ETo with Taylor diagrams at Port Said station
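A Taylor diagram places each model according to three linked statistics: the standard deviation of its predictions, its correlation R with the reference series, and the centered RMS difference. The sketch below, in Python, computes these quantities and checks the law-of-cosines identity that makes the diagram possible, E'^2 = sigma_ref^2 + sigma_model^2 - 2 * sigma_ref * sigma_model * R. It assumes the conventional centered (bias-removed) RMS difference; whether the diagrams in this study use that form or the plain RMSE is not stated here. The function name and the sample values are hypothetical.

```python
import math

def taylor_stats(reference, model):
    """Return (sigma_ref, sigma_model, R, centered RMS difference).

    Population (divide-by-n) statistics are used throughout so that the
    law-of-cosines identity holds exactly.
    """
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_mod = sum(model) / n
    dev_ref = [x - mean_ref for x in reference]
    dev_mod = [x - mean_mod for x in model]
    sigma_ref = math.sqrt(sum(d * d for d in dev_ref) / n)
    sigma_mod = math.sqrt(sum(d * d for d in dev_mod) / n)
    corr = sum(a * b for a, b in zip(dev_ref, dev_mod)) / (n * sigma_ref * sigma_mod)
    crmse = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_ref, dev_mod)) / n)
    return sigma_ref, sigma_mod, corr, crmse

# Hypothetical ETo values for illustration only (not from the study).
reference = [2.1, 3.0, 4.2, 5.6, 6.8, 7.4]
model = [2.4, 2.9, 4.6, 5.1, 7.0, 7.1]

sigma_ref, sigma_mod, corr, crmse = taylor_stats(reference, model)

# The same centered RMS difference, recovered from the other three statistics.
from_identity = math.sqrt(sigma_ref**2 + sigma_mod**2 - 2 * sigma_ref * sigma_mod * corr)

print(f"sigma_ref={sigma_ref:.3f} sigma_model={sigma_mod:.3f} R={corr:.3f}")
print(f"centered RMSD={crmse:.3f} (from identity: {from_identity:.3f})")
```

Because the means are removed, a model with a constant bias still sits on the reference point of the diagram, which is why the plain error measures in a results table remain useful alongside it.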

The scatter plots of ETo predictions for Port Said station during the training and testing phases are shown in Fig. 18. Prediction accuracy is assessed by examining the correlation and dispersion between the predicted and actual time series in these charts. The graphs indicate that the ABC-ANN, EWT-ABC-ANN, and CEEMDAN-ABC-ANN algorithms offer comparable and satisfactory predictions throughout the training phase: the scatter plots show a linear pattern in which the predicted values closely approximate the actual values. The other models, by contrast, typically produce less accurate ETo forecasts with a more stochastic distribution of points. The scatter plots of the testing phase show a comparable pattern.

Fig. 18 Scatter plots of ETo prediction results at Port Said station

Furthermore, the predictions obtained from the hybrid algorithms ABC-ANN, EWT-ABC-ANN, and CEEMDAN-ABC-ANN are highly accurate and closely related to one another. In contrast, the LMD-ABC-ANN, VMD-ABC-ANN, and EMD-ABC-ANN algorithms show inferior predictive capability, with the LMD-ABC-ANN technique producing the least accurate predictions.

Figure 19 evaluates the ETo prediction models at Port Said station with violin diagrams, which compare the structure and distribution of the predicted and actual datasets. In both the training and testing phases, the ABC-ANN algorithm had the structure closest to the actual values and performed best. The LMD-ABC-ANN model, whose distribution was the furthest from the true values, had the least accurate results.

Fig. 19 Violin plots of ETo prediction results at Port Said station

Discussion

Accurate prediction of ETo is of paramount importance in water resource management. Traditional instrumental measuring methods, such as lysimeters, are expensive. Well-documented empirical methods such as Penman–Monteith (PM), on the other hand, require diverse meteorological parameters, which makes them difficult to apply, particularly in the sparsely gauged catchments of Egypt. Our literature review indicated that previous applications of ML models to ETo estimation commonly rely on ordinary ML algorithms (e.g., ANNs, ELM, RF, and GEP) that may estimate ETo using fewer climatic predictors (Feng et al. 2017b; Fan et al. 2019; Gong et al. 2021). Recent studies have proposed hybrid ML models that combine various ML techniques to reduce estimation error (Reis et al. 2019; Chia et al. 2021). However, despite the complicated structure and time-consuming calculations of ensemble models, in some cases they show no significant difference from standalone methods (Katipoğlu et al. 2023). Developing new ML algorithms with simpler structures and faster calculation times can help overcome these weaknesses of ensemble ML models. Although hybridization using signal preprocessing has commonly produced more accurate hydrological estimations (Mehr et al. 2021; Danandeh Mehr et al. 2023), the proposed ABC-ANN model consistently outperformed the hybrid models that used decomposed input variables.

Our findings demonstrated that the standalone ANN failed to forecast ETo effectively, especially in complex climatic conditions, as it could not capture the nonlinear relationships among the variables. The ABC algorithm, through structural optimization that automatically tunes the number of hidden neurons, can capture the underlying pattern in the data. However, our study showed that signal decomposition methods sometimes struggle to capture temporal patterns such as seasonality and periodicity in ETo time series. The results from the four case-study areas clearly demonstrated that integrating data preprocessing techniques with an optimally structured ANN model (here, ABC-ANN) not only increases the complexity of the evolved models but may also yield less accurate estimations. Accordingly, integrating ML models with optimization algorithms to identify the optimal parameters is crucial for accurate prediction. This conclusion agrees with the previous studies of Zhu et al. (2020) and Roy et al. (2021), which, respectively, demonstrated the importance of structural optimization of the ELM and ANFIS for accurate ETo estimation. The effectiveness of ABC in the structural optimization of ML models for hydrology is consistent with the findings of Huo et al. (2018), Ibrahim et al. (2022), and Katipoğlu et al. (2023). Likewise, the efficacy of other optimization algorithms in hydrological modeling aligns with the findings of Tikhamarine et al. (2020). Optimization algorithms can identify suitable parameters more quickly and accurately than manual trial-and-error methods.

The ABC-ANN hybrid model, designed for predicting ETo, offers several notable advantages: high accuracy, with high R-squared values in both the training and testing phases; robustness against data noise when combined with advanced signal processing techniques such as EEMD and CEEMDAN; and adaptability via ABC optimization, which fine-tunes the model to varying datasets and climatic conditions within the tested regions. Although the ABC-ANN hybrid model is innovative, it has some algorithmic limitations that could affect its wider use. Integrating ABC optimization with ANNs, along with multiple signal processing techniques, results in high computational complexity, which demands significant computational resources and can lead to longer execution times. Consequently, the model’s applicability may be limited in real-time or resource-limited environments where quick and efficient processing is crucial.

Furthermore, the scalability of the model presents an additional challenge. To apply the model to more extensive or more diverse datasets or to use it in different geographical and climatic conditions, some adjustments may be needed to ensure accuracy and reliability. Moreover, the performance of the ABC algorithm heavily relies on fine-tuning its parameters, such as population size and limit. This sensitivity requires expertise in optimization algorithms, potentially limiting the accessibility and usability of the model for those without specialized knowledge. As a result, its practical application may be affected.
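To make the role of the population size and the limit parameter concrete, the following is a minimal Python sketch of a generic artificial bee colony loop with its employed, onlooker, and scout phases. It is not the implementation used in this study: the function `abc_minimize`, its default values, and the toy `sphere` objective are all hypothetical. In an ABC-ANN setting the objective would instead be the validation error of a network built from the candidate solution (for example, with one coordinate rounded to a hidden-neuron count), which is an assumption about the encoding.

```python
import random

def abc_minimize(objective, bounds, n_sources=10, limit=20, n_cycles=100, seed=0):
    """Generic ABC minimiser over a box. n_sources must be at least 2."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_source():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def fitness(cost):
        # Common ABC fitness mapping: lower cost gives higher fitness.
        return 1.0 / (1.0 + cost) if cost >= 0 else 1.0 + abs(cost)

    sources = [random_source() for _ in range(n_sources)]
    costs = [objective(s) for s in sources]
    trials = [0] * n_sources  # consecutive failed improvements per source

    def try_neighbour(i):
        # Perturb one coordinate of source i relative to a random partner k != i.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        candidate = list(sources[i])
        candidate[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        lo, hi = bounds[d]
        candidate[d] = min(max(candidate[d], lo), hi)
        cost = objective(candidate)
        if cost < costs[i]:  # greedy selection
            sources[i], costs[i], trials[i] = candidate, cost, 0
        else:
            trials[i] += 1

    best_cost = min(costs)
    best = list(sources[costs.index(best_cost)])

    for _ in range(n_cycles):
        # Employed bees: one local search per food source.
        for i in range(n_sources):
            try_neighbour(i)
        # Onlooker bees: revisit sources with probability proportional to fitness.
        weights = [fitness(c) for c in costs]
        for i in rng.choices(range(n_sources), weights=weights, k=n_sources):
            try_neighbour(i)
        # Memorise the best source found so far.
        i_best = min(range(n_sources), key=costs.__getitem__)
        if costs[i_best] < best_cost:
            best, best_cost = list(sources[i_best]), costs[i_best]
        # Scout bee: abandon the most stagnant source once it exceeds `limit`.
        i_scout = max(range(n_sources), key=trials.__getitem__)
        if trials[i_scout] > limit:
            sources[i_scout] = random_source()
            costs[i_scout] = objective(sources[i_scout])
            trials[i_scout] = 0

    return best, best_cost

def sphere(x):
    """Toy objective standing in for a model's validation error."""
    return sum(v * v for v in x)

best, best_cost = abc_minimize(sphere, bounds=[(-5.0, 5.0)] * 2)
print(best, best_cost)
```

A small `limit` abandons stagnant solutions quickly and favours exploration, while a large one lets the colony refine existing solutions for longer; `n_sources` scales the number of objective evaluations per cycle, which matters when each evaluation means training a network.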

The data collection process for our hybrid ABC-ANN model faces several limitations that could impact its overall performance and applicability. Challenges arise from the availability and consistency of long-term climate datasets, including gaps, missing values, and variations in data collection methodologies over time. Such inconsistencies can degrade the model’s performance and restrict the generalizability of the study’s findings across different regions or contexts. Additionally, the representativeness of the data plays a crucial role in ensuring the accuracy of the model. If the datasets fail to capture all relevant climate variations across different seasons and years, the model’s robustness and its ability to accurately predict ETo under various climatic scenarios may be compromised. Furthermore, it is crucial to prioritize the accuracy and resolution of the meteorological data, which are obtained from weather stations or satellite observations. Any inaccuracies or biases in these data sources can result in erroneous estimations of ETo, which in turn can misguide water resource planning and management efforts.

Conclusion and future outlook

The main objective of this study was to develop a novel hybrid prediction model based on an ABC-integrated ANN model with various signal processing techniques for accurate ETo prediction in four Egyptian Governorates from 1979 to 2014. Model inputs included Tmin, Tmax, Tmean, P, WS, RH, Rs, and Ra to capture the complex relationships between weather conditions and ETo dynamics. Our findings indicated that the most effective factors in predicting monthly ETo time series were solar radiation and temperature (maximum, minimum, and mean). In contrast, the correlation between precipitation and ETo was weak, highlighting its limited influence on the prediction accuracy of the applied models. In addition, the hybrid ABC-ANN model proved capable of predicting ETo using various data decomposition techniques (EEMD, CEEMDAN, VMD, and EMD), demonstrating its effectiveness in capturing complex patterns within the datasets. The ABC-ANN model consistently outperformed the other methods, with EEMD and CEEMDAN showing the more promising results among the decomposition techniques. The thorough evaluation of the model’s performance at different stations confirms the robustness and general applicability of the proposed hybrid approach. The ability of the ABC-ANN model to capture complex relationships within the meteorological data is underlined by its clear superiority over the signal processing-based models. The superior performance of the hybrid model confirms its importance as a powerful tool in addressing the challenges associated with ETo prediction when compared to conventional signal processing techniques. The optimal model achieved the highest ETo prediction accuracy in the testing phase at Al-Qalyubiyah, Cairo, Damietta, and Port Said compared with the other developed models.

In the field of ETo estimation, hybrid models that combine the ABC algorithm with an ANN have been identified as an approach that produces results close to reality (R2 values are usually around 0.99), owing to their ability to optimize neural network parameters and increase model accuracy. Many studies in the literature support these results (Katipoğlu et al. 2024). By adjusting the neural network parameters, ABC optimization also provides better adaptation to the variability in evapotranspiration processes, especially in arid and semiarid climates where such estimates are vital.

Nourani et al. (2014) showed that decomposed data significantly increase ANN performance by isolating important features from noisy datasets. In the present study, although the ABC-ANN and the data decomposition-based ABC-ANN models gave very close results, the ABC-ANN model produced slightly more accurate outputs. In this respect, the two sets of results do not agree. The discrepancy can be explained by the length of the data series used, the number of data parameters, and the use of different input types.

Future research can investigate the suitability of this approach for other geographical regions and climates, leading to further advances in accurate ETo estimation and its diverse applications. As the findings of this study highlight, the combination of nature-inspired optimization and ML has the potential to capture intricate patterns within the data. ETo prediction could be further improved by including other meteorological variables, extending the approach to different regions, and evaluating the transferability of the model to different time scales. The accuracy of novel hybrid models established by combining multi-verse optimizers, artificial algae algorithms, biogeography-based optimization, dragonfly optimization, flower pollination algorithms, monarch butterfly optimization, and various deep learning algorithms can be evaluated for predicting ETo values. It is also recommended to compare the performance of models established in different climatic regions using hybrid ML-based non-negative matrix factorization, independent component analysis, and time–frequency signal decomposition techniques.