Abstract
Water scarcity is a major challenge for irrigated agriculture, particularly in developing countries where access to meteorological data for calculating reference evapotranspiration (ETo) is limited. Thus, this study explores the potential of two machine learning models (random forest (RF) and long short-term memory (LSTM)) and autoregressive integrated moving average (ARIMA) to forecast ETo. The investigation was conducted for four weather stations in Egypt, from 1982 to 2020. The machine learning models were evaluated using four combinations of inputs: maximum and minimum temperature, relative humidity, and wind speed. The best results for both RF and LSTM models were achieved with the first set of inputs that included all four variables at both regional and local scales. For the regional scale, RF and LSTM models achieved R2 values of 0.85 and 0.86, respectively, with RMSE values of 0.69 and 0.68 mm/day. At the local scale, RF and LSTM models exhibited R2 values ranging from 0.92 to 0.95 and 0.93 to 0.95, respectively, while RMSE ranged from 0.38 to 0.46 mm/day and from 0.37 to 0.43 mm/day, respectively. Additionally, ARIMA models were employed for time series analysis of the same ETo data. ARIMA (2,1,4) and ARIMA (2,1,3) were found to be the most suitable models for the local-scale analysis while ARIMA (2,1,4) was identified as the optimal model for the regional-scale analysis. For the local-scale analysis, R2 values ranged from 0.86 to 0.91 and RMSE values ranged from 0.26 to 0.38. The regional scale analysis yielded an R2 value of 0.89 and an RMSE value of 0.58 mm/day. The developed models can be used in places where meteorological data for forecasting ETo are limited.
Introduction
The widespread climatic changes in the twenty-first century and the negative impacts that follow on the available water resources have become one of the most important issues that cast their shadows on the focus of contemporary environmental events and issues (Smith et al. 2012; UNEP, 1990). The irrigated agriculture sector represents the largest consumer of water in Egypt, representing 85% of the total water share available to Egypt, which represents 55.5 billion cubic meters. Water is a scarce resource, and the problem is likely to persist into the future. Drought is defined as a water shortage caused by a disparity between supply and demand (Xu et al. 2020). This is due to the tremendous changes in the climate causing variations in air temperature, relative humidity, and solar radiation (Haskett et al. 2000). These climatic factors cause a change in evapotranspiration which disturbs the hydrological cycle on a global scale. The prediction of the reference evapotranspiration (ETo) helps to guide and evaluate the impact of climate change on agriculture and thus on food security (de Oliveira e Lucas et al. 2020). On the other hand, climate change is the most important issue in water resources studies (Misra 2014). In climatological study, temperature, relative humidity, and precipitation are the most important factors for forecasting, making decisions, managing risks, and optimizing uses of water resources (Meshram et al. 2015).
An accurate estimate of ETo is essential for maintaining the hydrological cycle, crop yield simulation, water management, and irrigation system design, as well as irrigation scheduling. Due to the difficulty of direct measurement of reference evaporation, it is estimated from meteorological data such as wind speed, solar radiation, humidity, and temperature (Pereira et al. 2015). Indirect methods have been used to estimate ETo such as the FAO-56 Penman-Monteith (FAO56-PM) equation (Allen et al. 1998). These methods depend on meteorological variables that are sometimes not available at or near the site, especially those related to solving the aerodynamic term, wind speed, and water vapor pressure deficit in the air. Therefore, ETo estimation methods as a function of climatic elements, such as air temperature and extraterrestrial radiation, can be obtained simply and more feasibly (Hargreaves and Samani 1985; Samani 2000), and some have been tested and verified in many studies (Ahooghalandari et al. 2016; Almorox et al. 2016; Valiantzas 2018; Zanetti et al. 2019).
The main challenge in water scarcity research is to develop suitable methods or techniques to predict the factors that are affected by climate change and the consequences on water availability and ETo estimation. ETo is characterized by high nonlinearity and non-stationarity (Hernández et al. 2011), making its daily forecast difficult. However, timeseries analysis is a specific way of analyzing a sequence of datasets collected over a time interval that allows the development of a mathematical model to explain systematic patterns embedded in the data. The most apparent patterns appearing in timeseries data are trends and seasonality (Box et al. 2015). Moreover, the forecast of timeseries depends on the previous data, which are used to create relationships between the data that have continuous observations (Box et al. 2015). However, the abstraction of autocorrelation components from the timeseries data remains a challenge in timeseries analysis techniques (Box 2013). Recently, more effort has been dedicated to using stochastic models in hydrology and climatology (Mossad and Alazba 2016).
Timeseries analysis is a powerful statistical prediction tool that relies on the collection and analysis of past observations of a variable to create a model for future trends. The timeseries models do not presume knowledge of any structural relationships between variables involved in the studied process, such as evapotranspiration rates and climatic variables. These models are stochastic because an observed timeseries is an actual realization of a stochastic process (Arca et al. 2004). The autoregressive integrated moving average (ARIMA) models (Box et al. 1995) are the most popular timeseries tools. For the ARIMA models, the forecast of a variable is described as a linear (additive) combination of the previous values of the variable (pure autoregressive component) and the previous forecast errors (pure moving average component). ARIMA models are commonly used to forecast ETo (Alireza and Hossein 2015; Arca et al. 2004; Gautam and Sinha 2016).
Recently developed computational models have shown high accuracy in estimating and forecasting ETo, e.g., support vector machines (SVMs) (Fan et al. 2019). In recent years, machine learning has been increasingly used for ETo estimation because it learns the relationships between the inputs, which are mainly meteorological data, and the ETo output, which improves the accuracy of ETo modeling (Ferreira and Cunha 2020a, b; Kumar et al. 2011). Machine learning methods have also been utilized successfully to estimate ETo with fewer meteorological data. These models can capture complicated interactions between the input and the output data, making them effective ETo modeling tools. Several models have been evaluated, such as artificial neural network (ANN) (Afzaal et al. 2020a, b; Alves et al. 2017a, b; Farooque et al. 2021; Traore et al. 2016), support vector machine (SVM) (Farooque et al. 2021; Ferreira et al. 2019a; Mehdizadeh et al. 2017; Traore et al. 2016), multivariate adaptive regression splines (MARS) (Ferreira et al. 2019b; Mehdizadeh 2018; Wu and Fan 2019), and random forest (RF) (Feng et al. 2017). Feng et al. (2017) used meteorological parameters to estimate daily ETo amounts through random forest (RF) and generalized regression neural network (GRNN) methods for southwestern China. Although both methods were found to be acceptable, the RF method was found to be superior. Wang et al. (2019) used meteorological data from the Karst region in southwest China with the RF and gene expression programming (GEP) models to estimate ETo. The results showed that RF-based models performed better, but GEP-based models were recommended because they provide understandable equations and are simpler to use.
Moreover, long short-term memory (LSTM) was applied to assess the potential of machine learning models in forecasting irrigation water requirements (IWR) of snap beans by evolving multi-scenarios of inputs parameters to figure out the impact of meteorological, crop, and soil parameters on IWR in Egypt (Mokhtar et al. 2023).
Traditional machine learning models, such as ANN and SVM, have been employed for ETo forecasting as indicated above. In recent years, deep learning models have received a lot of attention, and they have been applied in a variety of fields, outperforming traditional machine learning models (Alibabaei et al. 2021; Chen et al. 2020; Ferreira and da Cunha 2020a, 2020b; Nagappan et al. 2020; Saggi and Jain 2019; Sattari et al. 2020; Tikhamarine et al. 2019). LSTM in particular has been employed for timeseries forecasting (Afzaal et al. 2020c; Alibabaei et al. 2021; Farooque et al. 2021; Son and Kim 2020; Tian et al. 2018; Zhou et al. 2019). Marndi et al. (2021) applied LSTM for predicting rice yield using different input scenarios. The best LSTM model was achieved using rainfall as an input variable for rice yield forecasting. Sultana and Khanam (2020) compared the performance of ARIMA and ANN on univariate time series data of yearly rice production from 1972 to 2013. According to that study, the ARIMA model outperformed the ANN model since the estimated error of the ANN was significantly higher than that of ARIMA.
Some studies predict ETo using expected meteorological data, such as public weather forecasts (Cai et al. 2007; Perera et al. 2014; Traore et al. 2017; Yang et al. 2019). However, ETo is predicted in this study using past meteorological data. In this way, no external data are needed, and the forecast relies solely on the data collected from a local weather station. Given the importance of ETo forecasts, the objective of this study is to assess ETo under arid climate conditions in Egypt at the regional and local scales using ARIMA, RF, and LSTM models. In addition, we identify the appropriate model and dataset input for ETo estimation given a limited number of meteorological data. This research is critical in determining the best approach (optimal model and input variables) that could be used as a simple, rapid, and inexpensive approach for timely and reliable ETo prediction at local and regional scales across Egypt. Furthermore, this paper aims to contribute to water-saving efforts by forecasting ETo across Egypt. To the best of our knowledge, the applied approaches are still poorly investigated for ETo, especially those based on different climate inputs at both regional and local levels. So, the novel contribution of this work is to develop and compare the results from ARIMA and machine learning models. Thus, this investigation presents a modeling strategy that addresses deficiencies in ETo prediction, which could improve irrigation scheduling and thus provide better solutions for decision-makers.
Material and methods
Study area and data collection
This study was conducted in four regions in Egypt (Fig. 1). The latitude and longitude of the weather stations within the regions are as follows: Station 1 is Cairo (30.1162 N, 31.4094 E), Station 2 is Ismalia (30.5567 N, 32.2652 E), Station 3 is Benisouif (28.9082 N, 30.9505 E), and Station 4 is Sohag (26.634 N, 31.6526 E). These regions were chosen to reflect the different climatic conditions of Egypt. Data for the daily meteorological variables of minimum and maximum temperature (Tmin and Tmax, °C), relative humidity (RH, %), and wind speed (WS, m/s) were obtained from 1982 to 2020. Wind speed measured at 10 m height was converted to 2 m values as presented by Allen et al. (1998). The meteorological data were obtained from two sources: (i) the Egyptian Meteorological Authority and (ii) NASA (2015) gridded daily data over the study regions. The daily reference evapotranspiration (ETo) was determined using the FAO ETo Calculator software (version 3.2) based on the standard FAO Penman-Monteith method described in Allen et al. (1998).
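The 10 m to 2 m wind speed conversion mentioned above can be sketched as follows. This is a minimal illustration assuming the logarithmic wind profile relationship given in FAO-56 (Allen et al. 1998); the function name `wind_speed_at_2m` is hypothetical and is not taken from the study's code.

```python
import math


def wind_speed_at_2m(u_z: float, z: float = 10.0) -> float:
    """Convert wind speed u_z (m/s) measured at height z (m) to the 2 m value.

    Uses the FAO-56 logarithmic wind profile:
        u2 = u_z * 4.87 / ln(67.8 * z - 5.42)
    """
    argument = 67.8 * z - 5.42
    if argument <= 1.0:
        # ln(argument) would be zero or negative, so the formula is not usable.
        raise ValueError("measurement height z is too low for this formula")
    return u_z * 4.87 / math.log(argument)


# For a 10 m anemometer the conversion factor is about 0.748.
factor_10m = wind_speed_at_2m(1.0, 10.0)
```

For a measurement already taken at 2 m, the factor is approximately 1, so the value is left essentially unchanged.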
With the LSTM and RF models, four combinations of input sets of (i) Tmin, Tmax, WS, and RH, (ii) Tmin and Tmax, (iii) Tmin, Tmax, and WS, and (iv) Tmin, Tmax, and RH were implemented as presented in Table 1. With the ARIMA model, the input was the daily timeseries (date versus ETo) of the ETo calculated from the four collected weather parameters (Tmin, Tmax, WS, and RH) using the ETo Calculator. Moreover, the collected data were divided into three distinct timeframes for calibration/training (1982–2007), cross-validation (2008–2011), and validation (2012–2020) for forecasting ETo, which was compared to the FAO 56 PM observations at the weather station. A flowchart of the machine learning models and ARIMA model used in the study is shown in Fig. 2. The LSTM and RF models were implemented in Python 3.8, and the ARIMA model was implemented in MATLAB (2021a).
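The four input sets and the chronological split described above can be expressed compactly. The sketch below is illustrative only: the names `INPUT_SETS`, `assign_period`, and `select_inputs`, and the dictionary-based record layout, are assumptions made for this example rather than the study's actual data pipeline.

```python
from datetime import date

# Input combinations (i)-(iv) as listed in the text.
INPUT_SETS = {
    1: ("Tmin", "Tmax", "WS", "RH"),
    2: ("Tmin", "Tmax"),
    3: ("Tmin", "Tmax", "WS"),
    4: ("Tmin", "Tmax", "RH"),
}


def assign_period(day: date) -> str:
    """Map a date to the period used in the study (split by calendar year)."""
    if 1982 <= day.year <= 2007:
        return "training"
    if 2008 <= day.year <= 2011:
        return "cross_validation"
    if 2012 <= day.year <= 2020:
        return "validation"
    raise ValueError(f"{day} is outside the 1982-2020 study period")


def select_inputs(record: dict, set_id: int) -> list:
    """Pick the predictor values of one daily record for a given input set."""
    return [record[name] for name in INPUT_SETS[set_id]]
```

Splitting by year rather than at random keeps the validation period strictly after the training period, which matters when the goal is forecasting.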
Machine learning models
Long short-term memory
Long short-term memory (LSTM) models were introduced by Hochreiter and Schmidhuber (1997), and they are a type of recurrent neural network that can learn data dependencies over time. This is possible because the recurring module of the models is made up of four layers that interact with one another. To address short-term memory problems, the LSTM block processes the input (Xt) through an input gate, an output gate (ot), and a forget gate (ft), and it maintains a hidden state (ht) and a cell state (Ct). The forget gate determines which information should be removed from the cell state, resulting in an (ft) value; in this way, it discards information that is no longer relevant. The input gate determines which information in the cell state should be updated. The output gate is in charge of producing (ot), which is used to compute the hidden state (ht) from a filtered version of the cell state. The tanh and sigmoid functions are used to scale the values for further processing, and the updated cell state (Ct) adds the element-wise product of the tanh and sigmoid outputs to the part of the previous cell state retained by the forget gate. Figure 3 depicts a more detailed overview of the LSTM information flow memory block.
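To make the gate interactions concrete, a single LSTM time step can be written in a few lines. This is a sketch of the standard LSTM cell equations in plain Python, not the network trained in this study; the parameter names (`Wf`, `Wi`, `Wc`, `Wo` and the matching biases) are hypothetical, and a real model would use a deep learning library.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def affine(W, v, b):
    """Return W @ v + b for a matrix W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: returns the new hidden state h_t and cell state c_t."""
    z = list(h_prev) + list(x_t)  # concatenation [h_(t-1), x_t]
    f_t = [sigmoid(a) for a in affine(params["Wf"], z, params["bf"])]    # forget gate
    i_t = [sigmoid(a) for a in affine(params["Wi"], z, params["bi"])]    # input gate
    g_t = [math.tanh(a) for a in affine(params["Wc"], z, params["bc"])]  # candidate values
    o_t = [sigmoid(a) for a in affine(params["Wo"], z, params["bo"])]    # output gate
    # Keep part of the old cell state and add the gated candidate (element-wise).
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    # The hidden state is a filtered (gated) view of the cell state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

Each weight matrix has one row per hidden unit and one column per element of the concatenated vector, so with four meteorological inputs and two hidden units each matrix is 2 by 6.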
Random forest (RF)
RF is a supervised algorithm, and it is one of the most widely used algorithms for both regression and classification. It is an improvement on the decision tree algorithm in that decision trees have a massive limitation of overfitting. In this algorithm, decision trees are fitted in different subsets of the training data. The number of trees in the algorithm has a direct relationship with the results it can achieve and has to be optimized. Although each individual tree is a weak learner (Fig. 4), RF combines the predictions of all trees (ensemble), resulting in a powerful model (Huang et al. 2019). This model has the advantage of requiring less hyperparameter adjustment in general. More information on RF can be found in Tyralis et al. (2019).
A tree is created by selecting a random collection of variables that will be utilized to decide the outcome of the forecast. Two crucial parameters in the RF training process are the number of trees (ntree) and the number of variables available for selection in each split (mtry) (Houborg and McCabe 2018). The RF approach is made up of groups of the classification tree or the regression tree, depending on the situation. Repeated runs of the RF algorithm achieve the best model setup (García-Peñalvo et al. 2018).
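The roles of ntree and mtry can be illustrated with a deliberately small random forest regressor. This is a teaching sketch in plain Python under simplifying assumptions (fixed maximum depth, minimum leaf size, exhaustive threshold search); it is not the implementation used in the study, and the function names are hypothetical.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def build_tree(rows, targets, mtry, rng, depth=0, max_depth=4, min_leaf=5):
    """Grow one regression tree; each split considers `mtry` random features."""
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return _mean(targets)  # leaf: predict the mean target
    best = None  # (sum of squared errors, feature index, threshold)
    for j in rng.sample(range(len(rows[0])), mtry):  # mtry <= number of features
        for thr in sorted({r[j] for r in rows})[1:]:
            left = [y for r, y in zip(rows, targets) if r[j] < thr]
            right = [y for r, y in zip(rows, targets) if r[j] >= thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            ml, mr = _mean(left), _mean(right)
            sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr)
    if best is None:
        return _mean(targets)
    _, j, thr = best
    li = [k for k, r in enumerate(rows) if r[j] < thr]
    ri = [k for k, r in enumerate(rows) if r[j] >= thr]
    return (
        j,
        thr,
        build_tree([rows[k] for k in li], [targets[k] for k in li],
                   mtry, rng, depth + 1, max_depth, min_leaf),
        build_tree([rows[k] for k in ri], [targets[k] for k in ri],
                   mtry, rng, depth + 1, max_depth, min_leaf),
    )


def predict_tree(node, row):
    while isinstance(node, tuple):  # internal nodes are tuples, leaves are numbers
        j, thr, left, right = node
        node = left if row[j] < thr else right
    return node


def fit_forest(rows, targets, ntree=25, mtry=2, seed=0):
    """Fit `ntree` trees, each on a bootstrap sample of the training data."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], mtry, rng))
    return forest


def predict_forest(forest, row):
    """Ensemble prediction: average of the individual tree predictions."""
    return _mean([predict_tree(tree, row) for tree in forest])
```

Averaging over trees grown on different bootstrap samples and feature subsets is what reduces the variance of the individual trees.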
The ARIMA model
The ARIMA model is based on the Box–Jenkins methodology, which is fitted to a given timeseries, and it is parsimonious in terms of the number of model parameters. It depends on a three-step iterative process of model identification, estimation, and diagnostic checking to define the best parsimonious model. This three-step process is repeated several times until a satisfactory model is finally selected. Finally, this model is used to forecast future values of the timeseries (Box et al. 2015).
Model identification
The AR (autoregressive) part of the ARIMA model shows that the variable of interest is regressed on its own prior values. The MA (moving average) part of the ARIMA model shows that the regression error is a linear combination of error values occurring at various time intervals in the past. The I (integrated) part shows the number of times differencing has been performed. The objective of combining adequate AR, I, and MA terms is to produce the best parsimonious model to fit the time series data (Box 2013). The model assumes the data to be a non-seasonal timeseries, and therefore, the data need to be de-seasonalized before modeling. A non-seasonal ARIMA model is generally denoted as ARIMA (p, d, q), where p is the order of AR, d is the order of differencing, and q is the order of MA. The ARIMA methodology has its own limitations because it relies on past values. However, it works best for long and stable timeseries (Box 2013; Marco et al. 2012). In its standard form, the non-seasonal part of the ARIMA model can be expressed as:
\[\phi (B){(1-B)}^{d}{y}_{t}=\theta (B){\varepsilon }_{t}\]
where \(\phi (B)\) and \(\theta (B)\) are polynomials of order p and q, respectively, in the backshift operator \(B\) (with \(B{y}_{t}={y}_{t-1}\)), \({y}_{t}\) is the value of the timeseries at time t, and \({\varepsilon }_{t}\) is the random error (white noise) at time t.
Model estimation
After identifying the appropriate model in the first step, the model parameters have to be estimated. The parameters of the AR and MA parts were calculated using the procedure suggested by Box (2013). The AR and MA parameters should be tested for statistical significance.
Diagnostic checking
After the estimation of the model parameters, diagnostic checking has to be performed to verify the adequacy of the model. Several diagnostic statistics and plots of residuals were investigated to ascertain whether the residuals are correlated or white noise. The residuals analysis involves checking the autocorrelation function (ACF), the partial autocorrelation function (PACF), and the histograms of the residuals and the residual distribution around the mean (Box 2013; Mossad and Alazba 2016).
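The residual autocorrelation check can be sketched as follows. The example computes the sample ACF and flags lags that fall outside the approximate 95% white-noise limits of ±1.96/√n; it covers only the ACF (not the PACF or histograms), and the function names are hypothetical.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (series must not be constant)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def white_noise_bound(n, z=1.96):
    """Approximate 95% limit for the ACF of a white-noise series of length n."""
    return z / math.sqrt(n)


def significant_lags(residuals, max_lag):
    """Lags (>= 1) whose autocorrelation lies outside the white-noise limits."""
    bound = white_noise_bound(len(residuals))
    acf = sample_acf(residuals, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > bound]
```

If `significant_lags` returns an empty list, or only a few isolated lags, the residuals are consistent with white noise and the fitted model is considered adequate in this respect.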
Performance evaluation of the applied models
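The models were assessed for each meteorological station using the performance statistics of the mean absolute error (MAE), the Nash–Sutcliffe efficiency coefficient (NSE), the coefficient of determination (R2), and the root-mean-square error (RMSE), which in their standard forms are expressed as:
\[\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|{O}_{i}-{P}_{i}\right|\]
\[\mathrm{NSE}=1-\frac{\sum_{i=1}^{n}{\left({O}_{i}-{P}_{i}\right)}^{2}}{\sum_{i=1}^{n}{\left({O}_{i}-\overline{O}\right)}^{2}}\]
\[{R}^{2}=\frac{{\left[\sum_{i=1}^{n}\left({O}_{i}-\overline{O}\right)\left({P}_{i}-\overline{P}\right)\right]}^{2}}{\sum_{i=1}^{n}{\left({O}_{i}-\overline{O}\right)}^{2}\sum_{i=1}^{n}{\left({P}_{i}-\overline{P}\right)}^{2}}\]
\[\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{\left({O}_{i}-{P}_{i}\right)}^{2}}\]
where \({O}_{i}\) and \({P}_{i}\) are the observed and the predicted values, respectively, at the time i, and n is the number of observations. \(\overline{O}\) and \(\overline{P}\) represent the average values of the observed and the predicted values, respectively.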
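The four performance statistics can be computed together from paired observed and predicted values. The sketch below uses the standard definitions and assumes that R2 is the squared Pearson correlation, which is consistent with the use of both \(\overline{O}\) and \(\overline{P}\) in the text; the function name `evaluate` is hypothetical.

```python
import math


def evaluate(observed, predicted):
    """Return MAE, NSE, R2, and RMSE for paired observed/predicted values.

    Assumes both sequences have the same length and are not constant.
    """
    n = len(observed)
    o_bar = sum(observed) / n
    p_bar = sum(predicted) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - o_bar) ** 2 for o in observed)
    spp = sum((p - p_bar) ** 2 for p in predicted)
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(observed, predicted))
    return {
        "MAE": sum(abs(o - p) for o, p in zip(observed, predicted)) / n,
        "NSE": 1.0 - sse / sst,
        "R2": cov ** 2 / (sst * spp),  # squared Pearson correlation (assumed form)
        "RMSE": math.sqrt(sse / n),
    }
```

Note that a constant bias leaves R2 at 1 while lowering NSE, which is one reason to report both statistics.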
Results and discussion
Basic statistics
Table 2 shows the mean and the standard deviation values of the climatic variables from 1982 to 2020 in four weather stations. The mean of maximum air temperature was in the range of 28.4–30.9 °C with standard deviation values being between 7.1 and 7.7 °C for all stations, and those of the minimum air temperature were in the range of 14.8–15.7 °C for the mean and 5.3–6.7 °C for the standard deviation. For the wind speed, the mean range was 4.9–8.7 m/s and the standard deviation was 1.5–2.6 m/s for all stations, and the mean of the relative humidity was in the range of 30.1–54.1% and the standard deviation of 10.7–15.5%, and the mean ETo computed from the FAO-56 method had a range of 4.33–4.82 mm/day and the standard deviation a range of 1.6–1.8 mm/day for all the stations.
The machine learning models
RF model
The results obtained with the RF model using the four input combinations datasets of (1) Tmin, Tmax, RH, WS; (2) Tmin, Tmax; (3) Tmin, Tmax, WS; and (4) Tmin, Tmax, RH, are presented in Fig. 5 for the regional scale. Using all input variables (1) yielded the best result having R2 = 0.85, RMSE = 0.69 mm/day, MAE = 0.51 mm/day, and NSE = 0.85, while the second set of input combinations was the worst performer with R2 = 0.80, RMSE = 0.80 mm/day, MAE = 0.62 mm/day, and NSE = 0.80 but its results are still considered as good.
At the local scale, we trained and tested the RF model for each weather station to investigate how the climate variables impact evapotranspiration in the different climatic regions. Table 3 shows the heatmap of R2, RMSE, NSE, and MAE indices over the four weather stations. Of the four weather stations, the R2 of the input dataset 1 (4 inputs) ranged between 0.92 and 0.95, and the range for the RMSE was 0.38–0.46 mm/day. For the MAE, the range was 0.28–0.33 mm/day and for the NSE it was between 0.92 and 0.95. As was the case for the regional scale, the second input combination dataset recorded the lowest values of R2 (0.86–0.93) and NSE (0.86–0.93) and the highest values for RMSE (0.50–0.58 mm/day) and MAE (0.37–0.44 mm/day). Thus, for both the local and regional scale, the first input combination dataset (Tmin, Tmax, RH, WS) yielded the best results, stemming from the fact that it used the maximum number of input variables which increases the accuracy of evaporation prediction.
LSTM model
Figure 5 also shows the results of the LSTM model using the different input data combinations for the regional scale. The best result was obtained with the first input combination dataset (four input variables), yielding R2 = 0.86, RMSE = 0.68 mm/day, MAE = 0.52 mm/day, and NSE = 0.86. With this model too, the second input dataset exhibited the worst performance, with R2 = 0.84, RMSE = 0.73 mm/day, MAE = 0.5 mm/day, and NSE = 0.84. For the four climate stations (local scale) run with input combination dataset 1, the R2 values ranged from 0.92 to 0.95, the RMSE ranged from 0.37 to 0.43 mm/day, and the MAE and NSE ranged from 0.28 to 0.33 mm/day and from 0.92 to 0.95, respectively. Here too, the second input combination dataset yielded, for all stations, the lowest values of R2 (0.87–0.93) and NSE (0.87–0.93) and the highest values of RMSE (0.47–0.56 mm/day) and MAE (0.38–0.47 mm/day).
Overall evaluation
To compare LSTM and RF models, a boxplot was made based on the residuals (Fig. 6). For the regional scale, the two best input combination datasets with the LSTM and the RF models were selected to determine the best input combination dataset and model pair (Fig. 6a). As shown in Fig. 6a, the inter-quartile range (IQR) values of RF1, RF4, LSTM1, LSTM4 were 0.756, 0.801, 0.870, and 0.885, respectively. Furthermore, the RF model with input combination dataset 1 appears to be the best model having the lowest error in comparison with the other pairs. For the local scale (Fig. 6b), the IQR values of RF1-St1, RF1-St2, RF1-St3, and RF1-St4 were 0.442, 0.4, 0.447, and 0.409, respectively, while the corresponding values of RF4-St1, RF4-St2, RF4-St3, and RF4-St4 were 0.445, 0.520, 0.413, and 0.433, respectively. The LSTM1 was the best for all stations. The Q1 value of LSTM1-St1 was −0.149, while LSTM4-St3 was −0.205 (Fig. 6c). Moreover, a smaller IQR by LSTM1-St1 of 0.452 clearly shows that the distribution error of LSTM1-St1 is much better than the others because of the higher concentration around the mean.
The RF model performed well at both the local and the regional scales. Jeong et al. (2016) reported that the RF model may overfit the data because its algorithm consists of an ensemble of a large number of decision trees that may not be fully described mechanistically. Also, RF may lose accuracy when extreme values are expected or when responses fall outside the limits of the training dataset (Jeong et al. 2016). Our results for the RF model for ETo forecasting were better than the results obtained by Son and Kim (2020b) based on RMSE. Our RF results are also better than those obtained with ANN and SVR models for ETo forecasting in Iran (Maroufpoor et al. 2020). The main reason is that the input datasets are more related to the ETo in our study area. Wu et al. (2019), who used the extreme learning machine model to forecast ETo in eight provinces in China, achieved better results (R2 = 0.99 and RMSE = 0.15 mm/day) than those obtained in this study; their analysis was conducted on the local scale with shorter timeseries data, which decreases the variation in the data and results in high model performance. Ferreira et al. (2019a) reported performance improvements when relative humidity was added to temperature in machine learning models developed for Brazil. Our study gave better results when we considered wind speed in addition to relative humidity and temperature. Furthermore, Afzaal et al. (2020b) reported performance improvements when relative humidity was added to temperature in the LSTM model developed for Canada. Moreover, Barzegar et al. (2020); Ferreira and da Cunha (2020c); Kim and Cho (2019); and Landeras et al. (2009) also reported better performance of deep learning over traditional machine learning models. Marndi et al. (2021) showed that LSTM was a good model for rice production in India. Mokhtar et al. (2023) used the LSTM and RF models to predict the irrigation water requirements of the green bean crop in Egypt through an actual field experiment, and the models gave satisfactory results.
In the present study, however, the deep learning model provided only marginally better results. Moreover, deep learning models generally have more hyperparameters to be adjusted, requiring more time for training. By contrast, Landeras et al. (2009), forecasting weekly ETo with ARIMA and ANN, reduced RMSE with respect to weekly historical means by only 6–8%. ETo forecasting is a complex task since it is affected by several meteorological variables, which can vary widely from day to day.
ARIMA model
Identification of AR(p) and MA(q) components
The generated timeseries for daily ETo is illustrated in Fig. 7 for the four selected weather stations from 1982 to 2011. Figure 7 shows a nonlinear and a seasonal component in the original timeseries data, exhibiting an annual cycle. Therefore, a non-Gaussian is often used to evaluate the effectiveness of nonlinear models. Figure 7 shows that there are no abnormal fluctuating trends in the timeseries. The same is observed by inspection of the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the original data displayed in Fig. 8. Because the data show non-stationarity, the differencing approach was applied to make the timeseries stationary (Hyndman and Athanasopoulos 2018), and the ACF and PACF of the transformed data are plotted in Fig. 9. Correlations that lie within the blue lines are not significantly different from zero. For the regional scale, the average daily ETo for the four stations was used for estimating the ACF and PACF in Figs. 8 and 9.
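The differencing step, which corresponds to d = 1 in the selected ARIMA (p, 1, q) models, can be sketched as follows. The function names are hypothetical; the second helper shows how forecasts of the differenced series are turned back into ETo levels.

```python
def difference(series, d=1):
    """Apply first differencing d times: y'_t = y_t - y_(t-1)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def undifference(last_value, increments):
    """Rebuild levels from forecast first differences (d = 1).

    `last_value` is the last observed level before the forecast period.
    """
    levels = []
    current = last_value
    for inc in increments:
        current += inc
        levels.append(current)
    return levels
```

Each pass shortens the series by one observation, and a series with a linear trend becomes constant after a single pass.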
Estimation of the appropriate p, d, q values
Model selection can be made by the maximum likelihood used in the parameter estimation as explained by Box (2013). The most appropriate model structure was selected through two information criteria: the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Generally, the model with the lowest AIC and BIC values is considered the best, as explained by Ampaw et al. (2013), Hyndman and Athanasopoulos (2018), and Marco et al. (2012).
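The selection rule can be sketched with the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where ln L is the maximized log-likelihood, k is the number of estimated parameters, and n is the number of observations. How k is counted (for example, whether the error variance is included) varies between software packages, so it is passed in explicitly here; the function names are hypothetical.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion for n observations."""
    return k * math.log(n) - 2 * log_likelihood


def rank_models(candidates, n):
    """Rank candidate orders by AIC, breaking ties with BIC.

    `candidates` maps an order tuple (p, d, q) to (log_likelihood, k).
    Returns a list of (order, AIC, BIC), best model first.
    """
    scored = [
        (order, aic(ll, k), bic(ll, k, n))
        for order, (ll, k) in candidates.items()
    ]
    return sorted(scored, key=lambda item: (item[1], item[2]))
```

Because BIC penalizes each extra parameter by ln n rather than 2, it tends to favor smaller models than AIC for long daily series.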
According to the results shown in Table 4, the ARIMA (2,1,4) for station 1, ARIMA (2,1,3) for station 2, ARIMA (2,1,4) for station 3, and ARIMA (2,1,3) for station 4 models were identified as the best-fitted models among the 16 ARIMA models tested at the local scale. For the regional scale, only ARIMA (2,1,4) and ARIMA (2,1,3) were tested as they were the best-fitted models for the local scale yielding the minimum AIC and BIC values. The model with the smallest AIC value yields residuals that resemble white noise (Mossad and Alazba 2016), a confirmation of the appropriateness of the selected ARIMA model (Ord et al. 2017).
Estimation of the best ARIMA model
The selected ARIMA models needed to be validated to check their appropriateness, an essential and last step before using the selected model for forecasting. The diagnostic checking step assures the reliability of the chosen models (Dimri et al. 2020; Mossad and Alazba 2016). One of the convenient methods used to validate the model is the graphical technique. Hence, several validation plots were investigated to check whether the residuals were white noise. Figure 10 shows the estimated ACF and PACF of the residuals for the candidate model at various numbers of lags with 95% confidence limits at the local and regional scales. Most of the ACF and PACF values were not significantly different from zero as they lie within the confidence limits. Therefore, there is no significant correlation between the residuals.
Model forecasting
Forecasting helps to predict future uncertainty based on the behavior of the past and the current observations. It was done using the best-fitted ARIMA models. Data from 2012 to 2020 (2987) were used in the forecast to ascertain the validity of the developed models. The performance statistics (R2, RMSE, MAE, NSE) were used for evaluating the agreement between the predicted and the observed timeseries at the local and regional scales (Table 5). For the four weather stations (the local scale), the R2 values ranged from 0.86 to 0.92 and the RMSE values from 0.61 to 0.90 mm/day. A range of 0.45–0.76 mm/day was recorded for MAE and 0.80–0.90 for the NSE. For the regional scale, the performance statistics were R2 = 0.90, RMSE = 0.58 mm/day, MAE = 0.42 mm/day, and NSE = 0.93. Sultana and Khanam (2020) forecasted the production of rice in Bangladesh using ARIMA and ANN. The results indicated that the ARIMA model outperformed the ANN model for predicting rice production based on the RMSE, MAE, and MAPE values. The ARIMA model can be a valuable tool for forecasting daily reference evapotranspiration. Accurate ETo forecasts are crucial for irrigation scheduling, drought monitoring, and water allocation. Several studies have successfully utilized ARIMA models for forecasting ETo and climate variables in many countries, such as Saudi Arabia (Mossad and Alazba 2016); India (Dimri et al. 2020); Colombia (Martínez-Acosta et al. 2020); China (Chen et al. 2018); and Poland (Murat et al. 2018). These studies note that the performance of the ARIMA model may vary depending on the specific dataset, geographical location, and climatic conditions. Therefore, the ARIMA model changes with each climate zone. Nevertheless, these studies state that ARIMA models could be promising tools for forecasting ETo and climate variables.
Conclusion
This study focused on modeling and forecasting the daily reference evapotranspiration (ETo) using two machine learning models [random forest (RF) and long short-term memory (LSTM)] and autoregressive integrated moving average (ARIMA) models. Four input climatic data combinations assisted the machine learning models. The results of this study could assist policy and decision-makers to develop water resources strategies in Egypt, and any arid region for that matter, which are very important nowadays due to water shortages resulting from climate change. The first part of the study focused on the possibility of forecasting ETo using the RF and LSTM models. The results attested that RF1 and LSTM1, i.e., using all the climatic input data, have the lowest error at both the regional and local scales. Further, the RF1 model having the lowest error shows that its distribution of error is much better than the others. The second part of the study focused on the possibility of forecasting ETo using the ARIMA modeling approach. Therefore, different ARIMA model structures have been proposed based on correlation methods (ACF and PACF) for the four stations in Egypt. The best ARIMA model structure was selected according to the lowest AIC and BIC values, and the selected models were ARIMA (2,1,4) for station 1, ARIMA (2,1,3) for station 2, ARIMA (2,1,4) for station 3, and ARIMA (2,1,3) for station 4. In addition, a high correlation was noticed between the four weather stations, and the ARIMA model (2,1,4) appeared to be a reasonable prediction of the ETo for the combined four stations' data to constitute the regional data. These results are promising, and the proposed ARIMA model structure, RF, and LSTM could be considered for forecasting daily ETo within the study regions.
References
Afzaal H, Farooque AA, Abbas F, Acharya B, Esau T (2020a) Computation of evapotranspiration with artificial intelligence for precision water resource management. Appl Sci 10:1621. https://doi.org/10.3390/APP10051621
Afzaal H, Farooque AA, Abbas F, Acharya B, Esau T (2020b) Computation of evapotranspiration with artificial intelligence for precision water resource management. Appl Sci 10:1621. https://doi.org/10.3390/app10051621
Ahooghalandari M, Khiadani M, Jahromi ME (2016) Developing equations for estimating reference evapotranspiration in Australia. Water Resour Manag 30:3815–3828. https://doi.org/10.1007/S11269-016-1386-7
Alibabaei K, Gaspar PD, Lima TM (2021) Modeling soil water content and reference evapotranspiration from climate data using deep learning method. Appl Sci 11:5029. https://doi.org/10.3390/APP11115029
Alireza T, Hossein B (2015) Capability evaluation of time series model and chaos theory in estimating reference crop evapotranspiration (Torbat-E-Heydarieh synoptic station, Khorasan Razavi)
Allen R, Pereira L, Raes D, Smith M (1998) Crop evapotranspiration guidelines for computing crop water requirements. FAO irrigation and drainage, Paper no. 56, Food and agriculture organization of the United Nations, Rome. 48
Almorox J, Senatore A, Quej VH, Mendicino G (2018) Worldwide assessment of the Penman-Monteith temperature approach for the estimation of monthly reference evapotranspiration. Theor Appl Climatol 131:693–703
Alves WB, Rolim GDS, Aparecido LEDO (2017a) Reference evapotranspiration forecasting by artificial neural networks. Engenharia Agrícola 37:1116–1125. https://doi.org/10.1590/1809-4430-ENG.AGRIC.V37N6P1116-1125/2017
Alves WB, Rolim GDS, Aparecido LEDO (2017b) Reference evapotranspiration forecasting by artificial neural networks. Eng Agric 37:1116–1125. https://doi.org/10.1590/1809-4430-eng.agric.v37n6p1116-1125/2017
Ampaw EM, Akuffo B, Larbi SO, Lartey S (2013) Time series modelling of rainfall in new juaben municipality of the Eastern region of Ghana. Int J Bus Soc Sci 4(8):116–129
Arca B, Duce P, Snyder RL, Spano D, Fiori M (2004) Use of numerical weather forecast and time series models for predicting reference evapotranspiration. Acta Horticult 664:39–46
Barzegar R, Aalami MT, Adamowski J (2020) Short-term water quality variable prediction using a hybrid CNN–LSTM deep learning model. Stoch Environ Res Risk Assess 34(2):415–433. https://doi.org/10.1007/S00477-020-01776-2
Box G, Jenkins G, Reinsel G, Ljung G (1995) Time series analysis: forecasting and control, 3rd edn
Box GEP, Jenkins GM, Reinsel GC, Ljung GM (2015) Time series analysis: forecasting and control. Wiley
Box G (2013) Box and Jenkins: time series analysis, forecasting and control. A very British affair, pp 161–215. https://doi.org/10.1057/9781137291264_6
Cai J, Liu Y, Lei T, Pereira LS (2007) Estimating reference evapotranspiration with the FAO Penman-Monteith equation using daily weather forecast messages. Agric for Meteorol 145:22–35. https://doi.org/10.1016/J.AGRFORMET.2007.04.012
Chen Z, Zhu Z, Jiang H, Sun S (2020) Estimating daily reference evapotranspiration based on limited meteorological data using deep learning and classical machine learning methods. J Hydrol 591:125286. https://doi.org/10.1016/J.JHYDROL.2020.125286
Chen P, Niu A, Liu D, Jiang W, Ma B (2018) Time series forecasting of temperatures using SARIMA: an example from Nanjing. IOP Conference Series: Materials Science and Engineering. IOP Publishing, p 052024
Dimri T, Ahmad S, Sharif M (2020) Time series analysis of climate variables using seasonal ARIMA approach. J Earth Syst Sci 129:1–16. https://doi.org/10.1007/S12040-020-01408-X
Farooque AA, Afzaal H, Abbas F, Bos M, Maqsood J, Wang X, Hussain N (2021) Forecasting daily evapotranspiration using artificial neural networks for sustainable irrigation scheduling. Irrig Sci. https://doi.org/10.1007/S00271-021-00751-1
Feng Y, Cui N, Gong D, Zhang Q, Zhao L (2017) Evaluation of random forests and generalized regression neural networks for daily reference evapotranspiration modelling. Agric Water Manag 193:163–173. https://doi.org/10.1016/J.AGWAT.2017.08.003
Ferreira LB, da Cunha FF (2020a) Multi-step ahead forecasting of daily reference evapotranspiration using deep learning. Comput Electron Agric 178:105728. https://doi.org/10.1016/J.COMPAG.2020.105728
Ferreira LB, da Cunha FF (2020b) New approach to estimate daily reference evapotranspiration based on hourly temperature and relative humidity using machine learning and deep learning. Agric Water Manag 234:106113. https://doi.org/10.1016/J.AGWAT.2020.106113
Ferreira LB, da Cunha FF, de Oliveira RA, FernandesFilho EI (2019a) Estimation of reference evapotranspiration in Brazil with limited meteorological data using ANN and SVM–a new approach. J Hydrol 572:556–570. https://doi.org/10.1016/J.JHYDROL.2019.03.028
Ferreira LB, Duarte AB, da Cunha FF, Filho EIF (2019b) Multivariate adaptive regression splines (MARS) applied to daily reference evapotranspiration modeling with limited weather data. Acta Scientiarum Agron. https://doi.org/10.4025/ACTASCIAGRON.V41I1.39880
García-Peñalvo F, Cruz-Benito J, Martín-González M, Vázquez-Ingelmo A, Sánchez-Prieto JC, Therón R (2018) Proposing a machine learning approach to analyze and predict employment and its factors. Int J Interact Multimed Artif Intell 5:39. https://doi.org/10.9781/IJIMAI.2018.02.002
Gautam R, Sinha AK (2016) Time series analysis of reference crop evapotranspiration for Bokaro district, Jharkhand, India. J Water Land Develop 30:51–56. https://doi.org/10.1515/JWLD-2016-0021
Hargreaves GH, Samani ZA (1985) Reference crop evapotranspiration from temperature. Appl Eng Agric 1:96–99. https://doi.org/10.13031/2013.26773
Haskett JD, Pachepsky YA, Acock B (2000) Effect of climate and atmospheric change on soybean water stress: a study of Iowa. Ecol Model 135:265–277. https://doi.org/10.1016/S0304-3800(00)00369-0
Hernández S, Morales L, Sallis P (2011) Estimation of reference evapotranspiration using limited climatic data and Bayesian model averaging. Proceedings—UKSim 5th European modelling symposium on computer modelling and simulation, EMS, pp 59–63. https://doi.org/10.1109/EMS.2011.81
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9:1735–1780. https://doi.org/10.1162/NECO.1997.9.8.1735
Houborg R, McCabe MF (2018) A hybrid training approach for leaf area index estimation via cubist and random forests machine-learning. ISPRS J Photogramm Remote Sens 135:173–188. https://doi.org/10.1016/J.ISPRSJPRS.2017.10.004
Huang G, Wu L, Ma X, Zhang W, Fan J, Yu X, Zeng W, Zhou H (2019) Evaluation of CatBoost method for prediction of reference evapotranspiration in humid regions. J Hydrol 574:1029–1041. https://doi.org/10.1016/J.JHYDROL.2019.04.085
Hyndman RJ, Athanasopoulos G (2018) Forecasting: principles and practice. OTexts
Jeong JH, Resop JP, Mueller ND, Fleisher DH, Yun K, Butler EE, Timlin DJ, Shim KM, Gerber JS, Reddy VR, Kim SH (2016) Random forests for global and regional crop yield predictions. PLoS ONE 11:e0156571. https://doi.org/10.1371/JOURNAL.PONE.0156571
Kim TY, Cho SB (2019) Predicting residential energy consumption using CNN-LSTM neural networks. Energy 182:72–81. https://doi.org/10.1016/J.ENERGY.2019.05.230
Landeras G, Ortiz-Barredo A, López JJ (2009) Forecasting weekly evapotranspiration with ARIMA and artificial neural network models. J Irrig Drain Eng 135:323–334. https://doi.org/10.1061/(ASCE)IR.1943-4774.0000008
Lucas PDO, Alves MA, Silva PCDL, Guimaraes FG (2020) Reference evapotranspiration time series forecasting with ensemble of convolutional neural networks. Comput Electron Agric 177:105700
Marco JB, Harboe R, Salas JD (2012) Stochastic hydrology and its use in water resources systems simulation and optimization. Springer Science and Business Media
Marndi A, Ramesh K, Patra G (2021) Crop production estimation using deep learning technique. Curr Sci 121:1073
Maroufpoor S, Bozorg-Haddad O, Maroufpoor E (2020) Reference evapotranspiration estimating based on optimal input combination and hybrid artificial intelligent model: hybridization of artificial neural network with grey wolf optimizer algorithm. J Hydrol 588:125060. https://doi.org/10.1016/J.JHYDROL.2020.125060
Martínez-Acosta L, Medrano-Barboza JP, López-Ramos Á, RemolinaLópez JF, López-Lambraño ÁA (2020) SARIMA approach to generating synthetic monthly rainfall in the Sinú river watershed in Colombia. Atmosphere 11:602
Mehdizadeh S (2018) Estimation of daily reference evapotranspiration (ETo) using artificial intelligence methods: Offering a new approach for lagged ETo data-based modeling. J Hydrol 559:794–812. https://doi.org/10.1016/J.JHYDROL.2018.02.060
Mehdizadeh S, Behmanesh J, Khalili K (2017) Using MARS, SVM, GEP and empirical equations for estimation of monthly mean reference evapotranspiration. Comput Electron Agric 139:103–114. https://doi.org/10.1016/J.COMPAG.2017.05.002
Meshram DT, Jadhav VT, Gorantiwar SD, Chandra R (2015) Modeling of weather parameters using stochastic methods. Clim Change Modell, Plan Policy Agric. https://doi.org/10.1007/978-81-322-2157-9_8
Misra AK (2014) Climate change and challenges of water and food security. Int J Sustain Built Environ 3:153–165. https://doi.org/10.1016/J.IJSBE.2014.04.006
Mokhtar A, Al-Ansari N, El-Ssawy W, Graf R, Aghelpour P, He H, Hafez SM, Abuarab M (2023) Prediction of irrigation water requirements for green beans-based machine learning algorithm models in arid region. Water Resour Manag 37:1557–1580
Mossad A, Alazba AA (2016) Simulation of temporal variation for reference evapotranspiration under arid climate. Arab J Geosci 9:1–9. https://doi.org/10.1007/S12517-016-2482-Y
Murat M, Malinowska I, Gos M, Krzyszczak J (2018) Forecasting daily meteorological time series using ARIMA and regression models. International agrophysics, vol 32
Nagappan M, Gopalakrishnan V, Alagappan M (2020) Prediction of reference evapotranspiration for irrigation scheduling using machine learning. Hydrol Sci J 65(16):2669–2677. https://doi.org/10.1080/02626667
Ord K, Fildes R, Kourentzes N (2017) Principles of business forecasting
Pereira LS, Allen RG, Smith M, Raes D (2015) Crop evapotranspiration estimation with FAO56: past and future. Agric Water Manag 147:4–20. https://doi.org/10.1016/J.AGWAT.2014.07.031
Perera KC, Western AW, Nawarathna B, George B (2014) Forecasting daily reference evapotranspiration for Australia using numerical weather prediction outputs. Agric for Meteorol 194:50–63. https://doi.org/10.1016/J.AGRFORMET.2014.03.014
Saggi MK, Jain S (2019) Reference evapotranspiration estimation and modeling of the Punjab Northern India using deep learning. Comput Electron Agric 156:387–398. https://doi.org/10.1016/J.COMPAG.2018.11.031
Samani Z (2000) Estimating solar radiation and evapotranspiration using minimum climatological data. J Irrig Drain Eng 126:265–267. https://doi.org/10.1061/(ASCE)0733-9437(2000)126:4(265)
Sattari MT, Apaydin H, Shamshirband S (2020) Performance evaluation of deep learning-based gated recurrent units (GRUs) and tree-based models for estimating ETo by using limited meteorological variables. Mathematics 8:972. https://doi.org/10.3390/MATH8060972
Son H, Kim C (2020b) A deep learning approach to forecasting monthly demand for residential-sector electricity. Sustainability 12:3103. https://doi.org/10.3390/SU12083103
Sultana A, Khanam M (2020) Forecasting rice production of Bangladesh using ARIMA and artificial neural network models. Dhaka Univ J Sci 68:143–147
Tian C, Ma J, Zhang C, Zhan P (2018) A deep neural network model for short-term load forecast based on long short-term memory network and convolutional neural network. Energies 11:3493. https://doi.org/10.3390/EN11123493
Tikhamarine Y, Malik A, Kumar A, Souag-Gamane D, Kisi O (2019) Estimation of monthly reference evapotranspiration using novel hybrid machine learning approaches. Hydrol Sci J 64:1824–1842. https://doi.org/10.1080/02626667.2019.1678750
Traore S, Luo Y, Fipps G (2016) Deployment of artificial neural network for short-term forecasting of evapotranspiration using public weather forecast restricted messages. Agric Water Manag 163:363–379. https://doi.org/10.1016/J.AGWAT.2015.10.009
Traore S, Luo Y, Fipps G (2017) Gene-expression programming for short-term forecasting of daily reference evapotranspiration using public weather forecast information. Water Resour Manag 31:4891–4908. https://doi.org/10.1007/S11269-017-1784-5
Tyralis H, Papacharalampous G, Langousis A (2019) A brief review of random forests for water scientists and practitioners and their recent history in water resources. Water 11:910. https://doi.org/10.3390/W11050910
Valiantzas JD (2018) Temperature-and humidity-based simplified Penman’s ET0 formulae. Comparisons with temperature-based Hargreaves-Samani and other methodologies. Agric Water Manag 208:326–334. https://doi.org/10.1016/J.AGWAT.2018.06.028
Wang S, Lian J, Peng Y, Hu B, Chen H (2019) Generalized reference evapotranspiration models with limited climatic data based on random forest and gene expression programming in Guangxi, China. Agric Water Manag 221:220–230. https://doi.org/10.1016/J.AGWAT.2019.03.027
Wu L, Fan J (2019) Comparison of neuron-based, kernel-based, tree-based and curve-based machine learning models for predicting daily reference evapotranspiration. PLoS ONE 14:e0217520. https://doi.org/10.1371/JOURNAL.PONE.0217520
Wu L, Zhou H, Ma X, Fan J, Zhang F (2019) Daily reference evapotranspiration prediction based on hybridized extreme learning machine model with bio-inspired optimization algorithms: application in contrasting climates of China. J Hydrol 577:123960. https://doi.org/10.1016/J.JHYDROL.2019.123960
Xu D, Zhang Q, Ding Y, Huang H (2020) Application of a hybrid ARIMA–SVR model based on the SPI for the forecast of drought—a case study in Henan Province, China. J Appl Meteorol Climatol 59:1239–1259. https://doi.org/10.1175/JAMC-D-19-0270.1
Yang Y, Luo Y, Wu C, Zheng H, Zhang L, Cui Y, Sun N, Wang L (2019) Evaluation of six equations for daily reference evapotranspiration estimating using public weather forecast message for different climate regions across China. Agric Water Manag 222:386–399. https://doi.org/10.1016/J.AGWAT.2019.06.014
Zanetti SS, Dohler RE, Cecílio RA, Pezzopane JEM, Xavier AC (2019) Proposal for the use of daily thermal amplitude for the calibration of the Hargreaves-Samani equation. J Hydrol 571:193–201. https://doi.org/10.1016/J.JHYDROL.2019.01.049
Zhou Y, Chang FJ, Chang LC, Kao IF, Wang YS (2019) Explore a deep learning multi-output neural network for regional multi-step-ahead air quality forecasts. J Clean Prod 209:134–145. https://doi.org/10.1016/J.JCLEPRO.2018.10.243
Funding
Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB). The authors received no specific funding for this work.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
Authors confirm that appropriate ethics were adopted during the period of experimentation and documentation.
Consent to publish
All authors agree to publish this manuscript. There is no conflict of interest.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Hendy, Z.M., Abdelhamid, M.A., Gyasi-Agyei, Y. et al. Estimation of reference evapotranspiration based on machine learning models and timeseries analysis: a case study in an arid climate. Appl Water Sci 13, 216 (2023). https://doi.org/10.1007/s13201-023-02016-y
DOI: https://doi.org/10.1007/s13201-023-02016-y