Estimation of reference evapotranspiration based on machine learning models and timeseries analysis: a case study in an arid climate

Water scarcity is a major challenge for irrigated agriculture, particularly in developing countries where access to meteorological data for calculating reference evapotranspiration (ETo) is limited. Thus, this study explores the potential of two machine learning models (random forest (RF) and long short-term memory (LSTM)) and the autoregressive integrated moving average (ARIMA) model to forecast ETo. The investigation was conducted for four weather stations in Egypt, from 1982 to 2020. The machine learning models were evaluated using four combinations of inputs drawn from maximum and minimum temperature, relative humidity, and wind speed. The best results for both RF and LSTM models were achieved with the first set of inputs, which included all four variables, at both regional and local scales. For the regional scale, RF and LSTM models achieved R2 values of 0.85 and 0.86, respectively, with RMSE values of 0.69 and 0.68 mm/day. At the local scale, RF and LSTM models exhibited R2 values ranging from 0.92 to 0.95 and 0.93 to 0.95, respectively, while RMSE ranged between 0.38 and 0.46 mm/day and 0.37–0.43 mm/day, respectively. Additionally, ARIMA models were employed for timeseries analysis of the same ETo data. ARIMA (2,1,4) and ARIMA (2,1,3) were found to be the most suitable models for the local-scale analysis, while ARIMA (2,1,4) was identified as the optimal model for the regional-scale analysis. For the local-scale analysis, R2 values ranged from 0.86 to 0.91 and RMSE values ranged from 0.26 to 0.38. The regional-scale analysis yielded an R2 value of 0.89 and an RMSE value of 0.58 mm/day. The developed models can be used in places where meteorological data for forecasting ETo are limited.


Introduction
The widespread climatic changes of the twenty-first century and their negative impacts on the available water resources have become one of the most important contemporary environmental issues (Smith et al. 2012; UNEP, 1990). The irrigated agriculture sector is the largest consumer of water in Egypt, accounting for 85% of the country's total water share of 55.5 billion cubic meters. Water is a scarce resource, and the problem is likely to persist into the future. Drought is defined as a water shortage caused by a disparity between supply and demand (Xu et al. 2020). This is due to the tremendous changes in the climate causing variations in air temperature, relative humidity, and solar radiation (Haskett et al. 2000). These climatic factors cause a change in evapotranspiration which disturbs the hydrological cycle on a global scale. The prediction of the reference evapotranspiration (ETo) helps to guide and evaluate the impact of climate change on agriculture and thus on food security (de Oliveira e Lucas et al. 2020). On the other hand, climate change is the most important issue in water resources studies (Misra 2014). In climatological studies, temperature, relative humidity, and precipitation are the most important factors for forecasting, making decisions, managing risks, and optimizing the use of water resources (Meshram et al. 2015).
An accurate estimate of ETo is essential for maintaining the hydrological cycle, crop yield simulation, water management, and irrigation system design, as well as irrigation scheduling. Due to the difficulty of direct measurement of reference evapotranspiration, it is estimated from meteorological data such as wind speed, solar radiation, humidity, and temperature (Pereira et al. 2015). Indirect methods have been used to estimate ETo, such as the FAO-56 Penman-Monteith (FAO56-PM) equation (Allen et al. 1998). These methods depend on meteorological variables that are sometimes not available at or near the site, especially those needed to solve the aerodynamic term: wind speed and the water vapor pressure deficit in the air. Therefore, ETo estimation methods that are a function of simpler climatic elements, such as air temperature and extraterrestrial radiation, are more feasible to apply (Hargreaves and Samani 1985; Samani 2000), and some have been tested and verified in many studies (Ahooghalandari et al. 2016; Almorox et al. 2016; Valiantzas 2018; Zanetti et al. 2019).
The main challenge in water scarcity research is to develop suitable methods or techniques to predict the factors that are affected by climate change and the consequences on water availability and ETo estimation. ETo is characterized by high nonlinearity and non-stationarity (Hernández et al. 2011), making its daily forecast difficult. However, timeseries analysis is a specific way of analyzing a sequence of datasets collected over a time interval that allows the development of a mathematical model to explain systematic patterns embedded in the data. The most apparent patterns appearing in timeseries data are trends and seasonality (Box et al. 2015). Moreover, the forecast of timeseries depends on the previous data, which are used to create relationships between the data that have continuous observations (Box et al. 2015). However, the abstraction of autocorrelation components from the timeseries data remains a challenge in timeseries analysis techniques (Box 2013). Recently, more effort has been dedicated to using stochastic models in hydrology and climatology (Mossad and Alazba 2016).
Timeseries analysis is a powerful statistical prediction tool that relies on the collection and analysis of past observations of a variable to create a model for future trends. Timeseries models do not presume knowledge of any structural relationships between the variables involved in the studied process, such as evapotranspiration rates and climatic variables. These models are stochastic because an observed timeseries is an actual realization of a stochastic process (Arca et al. 2004). The autoregressive integrated moving average (ARIMA) models (Box et al. 1995) are the most popular timeseries tools. In ARIMA models, the forecast of a variable is described as a linear (additive) combination of the previous values of the variable (pure autoregressive component) and the previous forecast errors (pure moving average component). ARIMA models are commonly used to forecast ETo (Alireza and Hossein 2015; Arca et al., 2004; Gautam and Sinha 2016).
Recently, computational models have shown high accuracy in estimating and forecasting ETo, e.g., support vector machines (SVMs) (Fan et al. 2019). In recent years, the use of machine learning for ETo estimation has spread because these methods learn the relationships between the inputs, which are mainly meteorological data, and the ETo output, which gives them high accuracy in ETo modeling (Ferreira and Cunha 2020a, b; Kumar et al. 2011). Machine learning methods have been utilized successfully in recent years to estimate ETo with fewer meteorological data. These models can capture complicated interactions between the input and the output data, making them effective ETo modeling tools. Several models have been evaluated, such as artificial neural network (ANN) (Afzaal et al. 2020a, b; Alves et al. 2017a, b; Farooque et al. 2021; Traore et al. 2016), support vector machine (SVM) (Farooque et al. 2021; Ferreira et al. 2019a; Mehdizadeh et al. 2017; Traore et al. 2016), multivariate adaptive regression splines (MARS) (Ferreira et al. 2019b; Mehdizadeh 2018; Wu and Fan 2019), and random forest (RF). Feng et al. (2017) used meteorological parameters to estimate daily ETo amounts through random forest (RF) and generalized regression neural network (GRNN) methods for southwestern China. Although both methods were found to be acceptable, the RF method was found to be superior. Wang et al. (2019) used meteorological data from the Karst region in southwest China and the RF and GEP models to estimate ETo. The results showed that RF-based models performed better, but GEP-based models were recommended because they provide understandable equations and are simpler to use. Moreover, long short-term memory (LSTM) was applied to assess the potential of machine learning models in forecasting irrigation water requirements (IWR) of snap beans by evolving multiple scenarios of input parameters to determine the impact of meteorological, crop, and soil parameters on IWR in Egypt (Mokhtar et al. 2023).
Traditional machine learning models, such as ANN and SVM, have been employed for ETo forecasting as indicated above. In recent years, deep learning models have received a lot of attention, and they have been applied in a variety of fields, outperforming traditional machine learning models (Alibabaei et al. 2021; Chen et al. 2020; Ferreira and da Cunha 2020a, 2020b; Nagappan et al. 2020; Saggi and Jain 2019; Sattari et al. 2020; Tikhamarine et al. 2019). LSTM can be employed for timeseries forecasting (Afzaal et al. 2020c; Alibabaei et al. 2021; Farooque et al. 2021; Son and Kim, 2020; Tian et al. 2018; Zhou et al. 2019). Marndi et al. (2021) applied long short-term memory (LSTM) for predicting rice yield using different input scenarios. The best LSTM model was achieved using rainfall as an input variable for rice yield forecasting. Sultana and Khanam (2020) compared the performance of autoregressive integrated moving average (ARIMA) and artificial neural network (ANN) models on univariate timeseries data of yearly rice production from 1972 to 2013. According to that study, the ARIMA model outperformed the ANN model since the estimated error of the ANN was significantly higher than that of ARIMA.
Some studies predict ETo using expected meteorological data, such as public weather forecasts (Cai et al. 2007; Perera et al. 2014; Traore et al. 2017; Yang et al. 2019). However, ETo is predicted in this study using past meteorological data. In this way, there is no need for external data, and the forecast relies solely on the data collected from a local weather station. Given the importance of ETo forecasts, the objective of this study is to assess ETo under arid climate conditions in Egypt at the regional and local scales using ARIMA, RF, and LSTM models. In addition, we identify the appropriate model and input dataset for ETo estimation given a limited number of meteorological data. This research is critical in determining the best approach (optimal model and input variables) that could be used as a simple, rapid, and inexpensive approach for timely and reliable ETo prediction at local and regional scales across Egypt. Furthermore, this paper aims to contribute to water saving by forecasting ETo across Egypt. To the best of our knowledge, the applied approaches are still poorly investigated for ETo, especially those based on different climate inputs at both regional and local levels. So, the novel contribution of this work is to develop and compare the results from ARIMA and machine learning models. Thus, this investigation presents a pioneering modeling strategy that could help address the deficiencies in ETo prediction, which in turn could improve irrigation scheduling and give better solutions for decision-makers.

Study area and data collection
This study was conducted in four regions in Egypt (Fig. 1). The latitude and longitude of the weather stations within the regions are as follows: Station 1 is Cairo (30.1162 N, 31.4094 E), Station 2 is Ismalia (30.5567 N, 32.2652 E), Station 3 is Benisouif (28.9082 N, 30.9505 E), and Station 4 is Sohag (26.634 N, 31.6526 E). These regions were chosen to reflect the different climatic conditions of Egypt. Data for the daily meteorological variables of minimum and maximum temperature (Tmin and Tmax, °C), relative humidity (RH, %), and wind speed (WS, m/s) were obtained from 1982 to 2020. Wind speed measured at 10 m height was converted to 2 m values as presented by Allen et al. (1998). The meteorological data were obtained from two sources: (i) the Egyptian Meteorological Authority and (ii) NASA (2015) gridded daily data over the study regions. The daily reference evapotranspiration (ETo) was determined using the FAO ETo Calculator software (version 3.2) based on the standard FAO-Penman method described in Allen et al. (1998).
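The 10 m to 2 m wind-speed conversion can be illustrated with the logarithmic wind profile given in FAO-56 (Allen et al. 1998, Eq. 47). The following is a minimal sketch only; the function name and its default argument are illustrative and are not taken from the study's own code.

```python
import math


def wind_speed_at_2m(speed_at_height: float, height_m: float = 10.0) -> float:
    """Convert wind speed measured at `height_m` (m above ground) to 2 m.

    Uses the FAO-56 logarithmic profile:
        u2 = uz * 4.87 / ln(67.8 * z - 5.42)
    """
    if height_m <= 0.08:
        # The logarithm argument must be positive and larger than 1.
        raise ValueError("measurement height is too low for this profile")
    return speed_at_height * 4.87 / math.log(67.8 * height_m - 5.42)


# Example: a reading of 10 m/s taken at 10 m height.
u2 = wind_speed_at_2m(10.0, 10.0)
```

For a 10 m anemometer the conversion factor is roughly 0.75, so wind speeds at 2 m are about a quarter lower than the measured values.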
With the LSTM and RF models, four combinations of input sets of (i) Tmin, Tmax, WS, and RH, (ii) Tmin and Tmax, (iii) Tmin, Tmax, and WS, and (iv) Tmin, Tmax, and RH were implemented as presented in Table 1. With the ARIMA model, the input was the relation between the date and the ETo calculated from the four collected weather parameters (Tmin, Tmax, WS, and RH) using the ETo Calculator. Moreover, the collected data were divided into three distinct timeframes for calibration/training (1982-2007), cross-validation (2008-2011), and validation (2012-2020) for forecasting ETo, which was compared to the FAO 56 PM observations at the weather station. A flowchart of the machine learning models and ARIMA model used in the study is shown in Fig. 2. The LSTM and RF models were implemented in Python 3.8, and the ARIMA model was implemented in MATLAB (2021a).
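The chronological split and the four input sets described above can be written down compactly. The sketch below is only an illustration: the year boundaries and variable names come from the text, while the function names and the dictionary-based record layout are assumptions.

```python
# Input combinations (i)-(iv) as listed in the text.
INPUT_SETS = {
    1: ("Tmin", "Tmax", "WS", "RH"),
    2: ("Tmin", "Tmax"),
    3: ("Tmin", "Tmax", "WS"),
    4: ("Tmin", "Tmax", "RH"),
}


def assign_period(year: int) -> str:
    """Map a calendar year to the modeling period it belongs to."""
    if 1982 <= year <= 2007:
        return "training"
    if 2008 <= year <= 2011:
        return "cross-validation"
    if 2012 <= year <= 2020:
        return "validation"
    raise ValueError(f"year {year} is outside the 1982-2020 study period")


def select_inputs(record: dict, set_id: int) -> list:
    """Pick the predictor values of one daily record for a given input set."""
    return [record[name] for name in INPUT_SETS[set_id]]


# Hypothetical daily record, used only to show the call pattern.
day = {"Tmin": 12.0, "Tmax": 25.0, "WS": 3.1, "RH": 48.0}
features = select_inputs(day, 3)  # Tmin, Tmax, WS
```

Splitting by year rather than at random keeps the validation period strictly after the training period, which matters for a forecasting task.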

Long short-term memory
Long short-term memory (LSTM) models were introduced by Hochreiter and Schmidhuber (1997), and they are a type of recurrent neural network that can learn data dependencies over time. This is possible because the recurring module of the models is made up of four layers that interact with one another. To address short-term memory problems, an input gate, an output gate (o_t), and a forget gate (f_t), together with a hidden state (h_t) and a cell state (C_t), were added to the LSTM blocks, which process the input (X_t). The forget gate determines which information should be removed from the cell state, resulting in an f_t value; in this way, irrelevant information is discarded. The input gate determines which information in the cell state should be updated. The output gate is in charge of producing o_t, which is used to compute the hidden state (h_t) from a filtered version of the cell state. Following the f_t function, the tanh and sigmoid functions are used to scale the values for further processing. The update to the cell state (C_t) is calculated as the element-wise product of the tanh and sigmoid outputs. Figure 3 depicts a more detailed overview of the information flow in the LSTM memory block.
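For reference, the gate operations described above are commonly written as follows. This is the widely used textbook formulation, given here as an aid to reading; the weight matrices W and bias vectors b are notation introduced for this sketch, x_t denotes the input at time t, σ is the sigmoid function, and ⊙ is element-wise multiplication. It is not necessarily the exact parameterization used in this study.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) && \text{(candidate cell state)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(cell state update)} \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh\left(C_t\right) && \text{(hidden state)}
\end{aligned}
```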

Random forest (RF)
RF is a supervised algorithm, and it is one of the most widely used algorithms for both regression and classification. It is an improvement on the decision tree algorithm, since single decision trees are strongly prone to overfitting. In this algorithm, decision trees are fitted on different subsets of the training data. The number of trees in the algorithm has a direct relationship with the results it can achieve and has to be optimized. Although each individual tree is a weak learner (Fig. 4), RF combines the predictions of all trees (ensemble), resulting in a powerful model (Huang et al. 2019). This model has the advantage of requiring less hyperparameter adjustment in general. More information on RF can be found in Tyralis et al. (2019).
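The ensemble idea can be made concrete with a deliberately simplified sketch: one-split regression trees ("stumps") are fitted on bootstrap samples of a single predictor and their predictions are averaged. This is plain bagging, a stand-in that omits the random variable selection at each split that a full RF performs; all names and the toy data are hypothetical and do not reproduce the study's implementation.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single predictor.

    Returns (threshold, left_value, right_value).
    """
    best = None
    # Every candidate threshold leaves at least one point on each side.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # All predictor values identical: predict the overall mean.
        overall = mean(ys)
        return (xs[0], overall, overall)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Fit `n_trees` stumps, each on a bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Average the predictions of all trees (the ensemble step)."""
    return mean(predict_stump(stump, x) for stump in forest)


# Toy data: maximum temperature (°C) against ETo (mm/day), invented values.
tmax = [10, 15, 20, 25, 30, 35]
eto = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
forest = fit_forest(tmax, eto, n_trees=25, seed=0)
estimate = predict_forest(forest, 22)
```

Each stump alone gives a coarse two-level prediction, while the average over many bootstrap samples is smoother, which is the sense in which weak learners combine into a stronger model.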
A tree is created by selecting a random collection of variables that will be utilized to decide the outcome of the forecast. Two crucial parameters in the RF training process are the number of trees (ntree) and the number of variables available for selection in each split (mtry) (Houborg and McCabe 2018). The RF approach is made up of groups of

The ARIMA model
The ARIMA model is based on the Box-Jenkins methodology, which is fitted to a given timeseries, and it is parsimonious in terms of the number of model parameters. It depends on a three-step iterative process of model identification, estimation, and diagnostic checking to define the best parsimonious model. This three-step process is repeated several times until a satisfactory model is finally selected. Finally, this model is used to forecast future values of the timeseries (Box et al. 2015).

Model identification
The AR (autoregressive) part of the ARIMA model shows that the variable of interest is regressed on its own prior values. The MA (moving average) part of the ARIMA model shows that the regression error is a linear combination of error values occurring at various time intervals in the past. The I (integrated) part shows the number of times differencing has been performed. The entire objective of combining adequate AR, I, and MA terms is to produce the best parsimonious model to fit the timeseries data (Box 2013). The model assumes the data to be a non-seasonal timeseries, and therefore, the data need to be de-seasonalized before modeling. A non-seasonal ARIMA model is generally denoted as ARIMA (p, d, q), where p is the order of AR, d is the order of differencing, and q is the order of MA. The ARIMA methodology has its own limitations because it relies on past values. However, it works best for long and stable timeseries (Box 2013; Marco et al. 2012). The non-seasonal part of the ARIMA model can be expressed as:

$$\phi(B)\,(1 - B)^{d}\, y_t = \theta(B)\,\varepsilon_t$$

where y_t is the value of the timeseries at time t, ε_t is the random error at time t, B is the backshift operator (B y_t = y_{t-1}), and φ(B) and θ(B) are polynomials of order p and q, respectively.

Model estimation
After identifying the appropriate model as the first step, the model parameters have to be estimated. The parameters of the AR and MA parts were calculated using the procedure suggested by Box (2013). The AR and MA parameters should be tested for statistical significance.

Diagnostic checking
After the estimation of the model parameters, diagnostic checking has to be performed to verify the adequacy of the model. Several diagnostic statistics and plots of residuals were investigated to ascertain whether the residuals are correlated or white noise. The residuals analysis involves checking the autocorrelation function (ACF), the partial autocorrelation function (PACF), the histograms of the residuals, and the residual distribution around the mean (Box 2013; Mossad and Alazba 2016).

Performance evaluation of the applied models
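The models were assessed for each meteorological station using the performance statistics of the mean absolute error (MAE), the Nash-Sutcliffe efficiency coefficient (NSE), the coefficient of determination (R2), and the root-mean-square error (RMSE), expressed as:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|O_i - P_i\right|$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(O_i - P_i\right)^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}$$

$$R^2 = \left[\frac{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)\left(P_i - \bar{P}\right)}{\sqrt{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(P_i - \bar{P}\right)^2}}\right]^2$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(O_i - P_i\right)^2}$$

where O_i and P_i are the observed and the predicted values, respectively, at time i, and n is the number of observations. Ō and P̄ represent the average values of the observed and the predicted values, respectively.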
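The four statistics can be computed with a few lines of code. The sketch below is an illustration with hypothetical names; it assumes R2 is the squared Pearson correlation between observed and predicted values, which is consistent with the use of both averages in the definitions.

```python
import math


def evaluate(observed, predicted):
    """Return MAE, RMSE, NSE, and R2 for paired observed/predicted values."""
    n = len(observed)
    o_mean = sum(observed) / n
    p_mean = sum(predicted) / n
    errors = [o - p for o, p in zip(observed, predicted)]
    sum_sq_err = sum(e * e for e in errors)
    ss_obs = sum((o - o_mean) ** 2 for o in observed)
    ss_pred = sum((p - p_mean) ** 2 for p in predicted)
    cov = sum((o - o_mean) * (p - p_mean) for o, p in zip(observed, predicted))
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum_sq_err / n),
        "NSE": 1.0 - sum_sq_err / ss_obs,
        "R2": cov ** 2 / (ss_obs * ss_pred),
    }


# Invented values: every prediction is exactly 1 mm/day too high.
stats = evaluate([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

In this invented case R2 equals 1 because the predictions are perfectly correlated with the observations, while NSE drops to 0.2 and MAE and RMSE both equal 1 mm/day. This is why a correlation-based R2 is usually reported alongside error-based statistics: it does not penalize a systematic bias.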

Basic statistics
Table 2 shows the mean and the standard deviation values of the climatic variables from 1982 to 2020 at the four weather stations. The mean maximum air temperature was in the range of 28.4-30.9 °C, with standard deviations between 7.1 and 7.7 °C across the stations. The mean minimum air temperature was in the range of 14.8-15.7 °C, with standard deviations of 5.3-6.7 °C. For the wind speed, the mean ranged from 4.9 to 8.7 m/s and the standard deviation from 1.5 to 2.6 m/s. The mean relative humidity was in the range of 30.1-54.1%, with standard deviations of 10.7-15.5%. The mean ETo computed from the FAO-56 method ranged from 4.33 to 4.82 mm/day, with standard deviations of 1.6-1.8 mm/day across the stations.

RF model
The results obtained with the RF model using the four input combination datasets of (1) Tmin, Tmax, RH, WS; (2) Tmin, Tmax; (3) Tmin, Tmax, WS; and (4) Tmin, Tmax, RH are presented in Fig. 5 for the regional scale. Using all input variables (1) yielded the best result, with R2 = 0.85, RMSE = 0.69 mm/day, MAE = 0.51 mm/day, and NSE = 0.85, while the second input combination was the worst performer, with R2 = 0.80, RMSE = 0.80 mm/day, MAE = 0.62 mm/day, and NSE = 0.80, although its results are still considered good.
At the local scale, we trained and tested the RF model for each weather station to investigate how the climate variables impact evapotranspiration in the different climatic regions. Table 3 shows the heatmap of the R2, RMSE, NSE, and MAE indices over the four weather stations. Across the four weather stations, the R2 of input dataset 1 (4 inputs) ranged between 0.92 and 0.95, and the range for the RMSE was 0.38-0.46 mm/day. For the MAE, the range was 0.28-0.33 mm/day, and for the NSE it was between 0.92 and 0.95. As was the case for the regional scale, the second input combination dataset recorded the lowest values of R2 (0.86-0.93) and NSE (0.86-0.93) and the highest values for RMSE (0.50-0.58 mm/day) and MAE (0.37-0.44 mm/day). Thus, for both the local and regional scales, the first input combination dataset (Tmin, Tmax, RH, WS) yielded the best results, stemming from the fact that it used the maximum number of input variables, which increases the accuracy of the ETo prediction.

LSTM model
Figure 5 also shows the results of the LSTM model using the different input data combinations for the regional scale. For the regional scale, the best result was

Overall evaluation
To compare the LSTM and RF models, boxplots of the residuals were made (Fig. 6). For the regional scale, the two best input combination datasets with the LSTM and the RF models were selected to determine the best input combination dataset and model pair (Fig. 6a). As shown in Fig. 6a, the inter-quartile range (IQR) values of RF1, RF4, LSTM1, and LSTM4 were 0.756, 0.801, 0.870, and 0.885, respectively. Furthermore, the RF model with input combination dataset 1 appears to be the best model, having the lowest error in comparison with the other pairs. For the local scale (Fig. 6b), the IQR values of RF1-St1, RF1-St2, RF1-St3, and RF1-St4 were 0.442, 0.4, 0.447, and 0.409, respectively, while the corresponding values of RF4-St1, RF4-St2, RF4-St3, and RF4-St4 were 0.445, 0.520, 0.413, and 0.433, respectively. The LSTM1 was the best for all stations. The Q1 value of LSTM1-St1 was −0.149, while that of LSTM4-St3 was −0.205 (Fig. 6c). Moreover, the smaller IQR of LSTM1-St1 (0.452) shows that the error distribution of LSTM1-St1 is better than the others because the errors are more concentrated around the mean.
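The quartiles and the IQR reported for the boxplots can be obtained directly from the residuals. The sketch below is illustrative only: the residual is taken as predicted minus observed and the "inclusive" quartile method is used, both of which are assumptions, since the text does not state the convention behind Fig. 6.

```python
from statistics import quantiles


def residual_quartiles(observed, predicted):
    """Return (Q1, Q3, IQR) of the residuals (predicted minus observed)."""
    residuals = [p - o for o, p in zip(observed, predicted)]
    q1, _median, q3 = quantiles(residuals, n=4, method="inclusive")
    return q1, q3, q3 - q1


# Invented values chosen so that the residuals are 1, 2, 3, 4, and 5.
q1, q3, iqr = residual_quartiles([0.0] * 5, [1.0, 2.0, 3.0, 4.0, 5.0])
```

A narrower IQR means that the middle half of the errors lies in a tighter band, which is the criterion used above to rank the model and input-set pairs.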
The RF model performed well at both the local and the regional scales. Jeong et al. (2016) reported that the RF model may overfit the data because its algorithm consists of an ensemble of a large number of decision trees that may not be fully described mechanistically. Also, RF may lose accuracy when extreme values are expected or when responses fall outside the limits of the training dataset (Jeong et al. 2016). Our results for the RF model for ETo forecasting were better than the results obtained by Son and Kim (2020b) based on RMSE. Although ETo was forecast based on ANN and SVR models in Iran (Maroufpoor et al. 2020), our results based on RF are better than theirs. The main reason is that the input datasets are more related to the ETo in our study area. Wu et al. (2019), using the extreme learning machine model to forecast ETo in eight provinces in China, achieved better results (R2 = 0.99 and RMSE = 0.15 mm/day) than those obtained in this study; their work was conducted at the local scale with a shorter timeseries, which decreases the variation within the data and ultimately results in high model performance. Ferreira et al. (2019a) reported performance improvements when relative humidity was added to temperature in machine learning models developed for Brazil. Our study gave better results when we considered wind speed in addition to relative humidity and temperature. Furthermore, Afzaal et al. (2020b) reported performance improvements when relative humidity was added to temperature in the LSTM model developed for Canada. Moreover, Barzegar et al. (2020), Ferreira and da Cunha (2020c), Kim and Cho (2019), and Landeras et al. (2009) also reported better performance of deep learning over traditional machine learning models. Marndi et al. (2021) have shown that LSTM was a good model for rice production in India. Mokhtar et al. (2023) used the LSTM and RF models to predict the irrigation water requirements of the green bean crop in Egypt through an actual field experiment, and the models gave satisfactory results.
In the present study, however, the deep learning model provided marginally better results. Nevertheless, deep learning models generally have more hyperparameters to be adjusted, requiring more time for training. By contrast, Landeras et al. (2009), forecasting weekly ETo with ARIMA and ANN, reduced RMSE with respect to weekly historical means by only 6-8%. ETo forecasting is a complex task since it is affected by several meteorological variables, which can vary widely from day to day.

Identification of AR(p) and MA(q) components
The generated timeseries for daily ETo is illustrated in Fig. 7 for the four selected weather stations from 1982 to 2011. Figure 7 shows a nonlinear and a seasonal component in the original timeseries data, exhibiting an annual cycle. Therefore, a non-Gaussian is often used to evaluate the effectiveness of nonlinear models. Figure 7 shows that there are no abnormal fluctuating trends in the timeseries. The same is observed by inspection of the autocorrelation function (ACF) and the partial autocorrelation function (PACF) for the original data displayed in Fig. 8. Because the data show non-stationarity, the differencing approach was applied to make the timeseries stationary (Hyndman and Athanasopoulos, 2018), and the ACF and PACF of the transformed data are plotted in Fig. 9. Correlations outside the blue lines are significantly different from zero. For the regional scale, the average daily ETo for the four stations was used for estimating the ACF and PACF in Figs. 8 and 9.
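Differencing and the sample ACF are simple to compute by hand. The sketch below uses hypothetical function names and assumes the usual large-sample 95% bound of ±1.96/√n for the confidence lines of a correlogram; it is an illustration and not the MATLAB routine used in the study.

```python
import math


def difference(series, d=1):
    """Apply first-order differencing d times (the I part of ARIMA)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    return [
        sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def confidence_bound(n):
    """Approximate 95% bound for the autocorrelation of white noise."""
    return 1.96 / math.sqrt(n)


# Invented short series with a trend; one difference leaves 2, 3, 4.
diffed = difference([1.0, 3.0, 6.0, 10.0])
lags = acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=1)
significant = [k for k, r in enumerate(lags) if k > 0 and abs(r) > confidence_bound(5)]
```

A lag whose autocorrelation falls outside the bound is treated as significantly different from zero, which is the reading applied to the correlograms.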

Estimation of the appropriate p, d, q values
Model selection can be made by the maximum likelihood used in the parameter estimation, as explained by Box (2013). The most appropriate model structure was selected through two information criteria: the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Generally, the model with the lowest AIC and BIC values is considered the best, as explained by Ampaw et al. (2013), Hyndman and Athanasopoulos (2018), and Marco et al. (2012).
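Both criteria trade goodness of fit against the number of parameters: AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where L is the maximized likelihood, k the number of estimated parameters, and n the number of observations. The sketch below ranks candidate orders under the assumption k = p + q + 1 (conventions differ between software packages); the log-likelihood values are invented purely to show the call pattern and are not results from this study.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n_obs):
    """Bayesian information criterion."""
    return k * math.log(n_obs) - 2 * log_likelihood


def rank_by_aic(log_likelihoods, n_obs):
    """Return (order, AIC, BIC) rows sorted from lowest to highest AIC."""
    rows = []
    for (p, d, q), ll in log_likelihoods.items():
        k = p + q + 1  # AR terms + MA terms + error variance (assumed convention)
        rows.append(((p, d, q), aic(ll, k), bic(ll, k, n_obs)))
    return sorted(rows, key=lambda row: row[1])


# Hypothetical maximized log-likelihoods for three candidate structures.
candidates = {(1, 1, 1): -1300.0, (2, 1, 3): -1210.0, (2, 1, 4): -1200.0}
ranking = rank_by_aic(candidates, n_obs=1000)
best_order = ranking[0][0]
```

Because BIC penalizes extra parameters more heavily than AIC for any realistic sample size, the two criteria can disagree; selecting a model that is lowest on both, as done in this study, avoids that ambiguity.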
According to the results shown in Table 4, the ARIMA (2,1,4) for station 1, ARIMA (2,1,3) for station 2, ARIMA (2,1,4) for station 3, and ARIMA (2,1,3) for station 4 models were identified as the best-fitted models among the 16 ARIMA models tested at the local scale. For the regional scale, only ARIMA (2,1,4) and ARIMA (2,1,3) were tested, as they were the best-fitted models for the local scale, yielding the minimum AIC and BIC values. The model with the smallest AIC value yields residuals that resemble white noise (Mossad and Alazba 2016), a confirmation of the appropriateness of the selected ARIMA model (Ord et al. 2017).

Estimation of the best ARIMA model
The selected ARIMA models needed to be validated to check their appropriateness, an essential and final step before using the selected model for forecasting. The diagnostic checking step assures the reliability of the chosen models (Dimri et al. 2020; Mossad and Alazba 2016). One of the convenient methods used to validate the model is the graphical technique. Hence, many validation plots were investigated to check whether the residuals were white noise. Figure 10 shows the estimated ACF and PACF of the residuals for the candidate model at various numbers of lags with 95% probability at local and regional scales. Most of the ACF and PACF values were not significantly different from zero, as they lie within the confidence limits. Therefore, there is no significant correlation between the residuals.

Model forecasting
Forecasting helps to predict future uncertainty based on the behavior of the past and the current observations. It was done using the best-fitted ARIMA models. Data from 2012 to 2020 (2987) were used in the forecast to ascertain the validity of the developed models. Four performance statistics (R2, RMSE, MAE, NSE) were used for evaluating the agreement between the predicted and the observed timeseries at the local and regional scales (Table 5). For the four weather stations (the local scale), the R2 values ranged from 0.86 to 0.92 and the RMSE from 0.61 to 0.90 mm/day. A range of 0.45-0.76 mm/day was recorded for MAE and 0.80-0.90 for the NSE. For the regional scale, the performance statistics were R2 = 0.90, RMSE = 0.58 mm/day, MAE = 0.42 mm/day, and NSE = 0.93. Sultana and Khanam (2020) forecasted the production of rice in Bangladesh using ARIMA and ANN. The results indicated that the ARIMA model outperformed the ANN model for predicting rice production based on the RMSE, MAE, and MAPE values. The ARIMA model can be a valuable tool for forecasting daily reference evapotranspiration. Accurate ETo forecasts are crucial for irrigation scheduling, drought monitoring, and water allocation. Several studies have successfully utilized ARIMA models for ETo and climate variables forecasting in many countries, such as Saudi Arabia (Mossad and Alazba 2016); India (Dimri et al. 2020); Colombia (Martínez-Acosta et al. 2020); China (Chen et al. 2018); and Poland (Murat et al. 2018). They make the important point that the performance of the ARIMA model may vary depending on the specific dataset, geographical location, and climatic conditions. Therefore, the appropriate ARIMA model differs between climate zones. However, these different studies state that ARIMA models could be promising tools for ETo and climate variables forecasting.

Conclusion
This study focused on modeling and forecasting the daily reference evapotranspiration (ETo) using two machine learning models [random forest (RF) and long short-term memory (LSTM)] and autoregressive integrated moving average (ARIMA) models. Four combinations of climatic input data were supplied to the machine learning models. The results of this study could assist policy and decision-makers in developing water resources strategies in Egypt, and in any arid region for that matter, which are very important nowadays due to water shortages resulting from climate change. The first part of the study focused on the possibility of forecasting ETo using the RF and LSTM models. The results attested that RF1 and LSTM1, i.e., using all the climatic input data, have the lowest error at both the regional and local scales. Further, the RF1 model, having the lowest error, shows that its distribution of error is much better than the others. The second part of the study focused on the possibility of forecasting ETo using the ARIMA modeling approach. Different ARIMA model structures were proposed based on correlation methods (ACF and PACF) for the four stations in Egypt. The best ARIMA model structure was selected according to the lowest AIC and BIC values, and the selected models were ARIMA (2,1,4) for station 1, ARIMA (2,1,3) for station 2, ARIMA (2,1,4) for station 3, and ARIMA (2,1,3) for station 4. In addition, a high correlation was noticed between the four weather stations, and the ARIMA (2,1,4) model appeared to give a reasonable prediction of the ETo for the combined data of the four stations, which constitute the regional data. These results are promising, and the proposed ARIMA model structure, RF, and LSTM could be considered for forecasting daily ETo within the study regions.

Fig. 1 The location of the weather stations

Fig. 2 Computational flowchart adopted for the ETo forecast: (a) machine learning models and (b) ARIMA model

Fig. 5 Performance statistics of LSTM and RF models based on the four input datasets

Fig. 6 Boxplots showing the distribution of the estimation errors at the testing stage for the four input datasets for (a) regional scale, (b) RF for each station (local scale), and (c) LSTM for each station. Q1: lower

Fig. 7 Timeseries of daily ETo of the four selected study weather stations

Table 2
The mean and the standard deviation values of the daily climate variables during the period from 1982 to 2020 for the four weather stations. NB: Tmax and Tmin are the daily observed maximum and minimum air temperatures, respectively; WS is the wind speed; RH is the relative humidity; ETo is the daily reference evapotranspiration

Table 4
The goodness of fit for the ARIMA (p,d,q) models

Fig. 10 Autocorrelation function (ACF) and partial autocorrelation function (PACF) of the residuals for the best selected ARIMA models with 5% significance confidence limits at the local and regional scales

RMSE, MAE, and MAPE values. The ARIMA model can be a valuable tool for forecasting daily reference evapotranspiration. Accurate ETo forecasts are crucial for irrigation scheduling, drought monitoring, and water allocation. Several studies have successfully utilized ARIMA models for ETo and climate variables forecasting in many countries, such as Saudi Arabia (Mossad and Alazba 2016); India (Dimri et al. 2020); Colombia (Martínez-Acosta et al. 2020); China

Table 5
Performance statistics values at the local and regional scales