Data intelligence and hybrid metaheuristic algorithms-based estimation of reference evapotranspiration

Elbeltagi, Ahmed; Raza, Ali; Hu, Yongguang; Al-Ansari, Nadhir; Kushwaha, N. L.; Srivastava, Aman; Kumar Vishwakarma, Dinesh; Zubair, Muhammad

doi:10.1007/s13201-022-01667-7


Original Article
Open access
Published: 06 May 2022

Volume 12, article number 152 (2022)

Applied Water Science


Abstract

For developing countries, scarcity of climatic data is the biggest challenge, so developing models that need only limited meteorological input is of critical importance. In this study, five data intelligent and hybrid metaheuristic machine learning algorithms, namely additive regression (AR), AR-bagging, AR-random subspace (AR-RSS), AR-M5P, and AR-REPTree, were applied to predict monthly mean daily reference evapotranspiration (ET₀). For this purpose, climatic data for 1987 to 2016 from two meteorological stations in the semi-arid region of Pakistan were used. The climatic dataset includes maximum and minimum temperature (T_max, T_min), average relative humidity (RH_avg), average wind speed (U_x), and sunshine hours (n). Sensitivity analysis through regression methods was applied to determine the effective input climatic parameters for ET₀ modeling. The regression analysis performed on all input parameters identified T_min, RH_avg, U_x, and n as the most influential input parameters at the studied stations. The results revealed that all the selected models predicted ET₀ at both stations with high precision. Based on the performance indices, the AR-REPTree model was located furthest from, and the AR-M5P model nearest to, the observed point at both selected meteorological stations. The study concluded that, under this methodological framework, the AR-M5P model can yield higher accuracy in predicting ET₀ than the other selected algorithms.


Introduction

Difficulty in accessing water has become one of the fundamental global issues of the twenty-first century (Gleeson et al. 2012; Elbeltagi et al. 2021). Most of the readily available fresh water on the earth's surface is used by the agricultural sector, which accounts for an estimated ~81% of water withdrawals in developing countries and ~71% globally. Furthermore, irrigation consumes not less than 55% of the world's freshwater reserves (Amarasinghe and Smakhtin 2014). Providing food for the world's population amid freshwater scarcity has become a conundrum (Fischer et al. 2007; Shukla et al. 2021). Heightened demand for limited water resources, driven by rising climate change impacts and by certain agricultural commodities, has repeatedly pointed to the need for more efficient use of the water resources already at hand. This will also require smart allocation of limited water, at the appropriate time and through the right channels, toward premium food production (Dhillon et al. 2019; Kushwaha et al. 2016). Understanding the biophysical processes involved in root water uptake from the soil and in transpiration through the plant canopy has therefore become essential, because the sustainable development of irrigation depends on these mechanisms. Hence, the ability to determine crop water requirements precisely is necessary for effective irrigation scheduling (Rossini et al. 2013; Kushwaha et al. 2021). Reference evapotranspiration (ET₀) is regarded as a principal component of the global hydrological cycle, with a direct effect on crop yield, water requirements, irrigation facilities, future planning, and water resources management.
ET₀ is an aggregate term that combines the removal of water from a well-watered vegetated surface (transpiration) with evaporation from the surrounding surfaces. Factors such as management practices, crop characteristics, weather conditions, land type, and field operations are the major constituents that affect ET₀ (Sherma 2016). A uniform, extensive field of grass (or of alfalfa, for the tall reference crop) is used worldwide as the reference surface. The reference crop is characterized by uniform height, specified soil properties, a specified amount of applied water, and full growth at a given period under standard meteorological and agronomic conditions (Łabędzki et al. 2011).

Estimating ET₀ is paramount for determining crop irrigation requirements at regional and global scales, preparing water budgets, and assessing the influence of climatic change (Nouri et al. 2013; Vishwakarma et al. 2022). Assessment of ET₀ is also essential in agro-meteorology and hydro-meteorology. According to Lu et al. (2010) and Zhao et al. (2013), assessing ET₀ with the desired accuracy poses serious challenges because of imperceptible processes, particularly at the ecosystem or watershed scale. Because ET₀ is a challenging biophysical process at the land surface, various techniques and hydrological models have been developed to estimate it. Accurate ET₀ measurement matters not only for climate change studies and water resources assessment, but also for monitoring and forecasting drought and for the proper use and development of water resources (Zhao et al. 2013). As shown by Nouri et al. (2013), ET₀ computation is highly desirable for studying climate change impacts, simulating and predicting crop water scheduling, hydrological modeling (HM), and assessing land use effects in data-poor situations.

Accurate ET₀ values are essential for estimating the soil water balance, which describes the amount of water held in the soil and can be compared to balancing a checkbook. In most cases, irrigation aims to manage the soil water content in order to promote proper crop development. Determining the soil water balance improves irrigation planning and management and is therefore of paramount importance. Knowing the soil water balance reduces the risk of applying excess water, which increases deep percolation and runoff. Irrigation scheduling, defined as applying the proper amount of water at the appropriate time, is impossible without knowledge of the water balance. ET₀ can be estimated with direct methods, indirect methods, and machine learning models (Karimaldini et al. 2011).
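The checkbook analogy can be made concrete with a daily root-zone depletion update: crop evapotranspiration is a withdrawal, rainfall and irrigation are deposits, and irrigation is scheduled when depletion passes a threshold. The sketch below is a simplified illustration under stated assumptions, not the method of this study: water above field capacity is simply discarded (standing in for deep percolation and runoff), capillary rise is ignored, and the function and parameter names (`daily_balance`, `taw_mm`, `raw_mm`) are hypothetical.

```python
def daily_balance(etc_mm, rain_mm, taw_mm, raw_mm, start_depletion_mm=0.0):
    """Checkbook-style root-zone depletion with irrigation back to field capacity.

    etc_mm  : daily crop evapotranspiration (mm), e.g. a crop coefficient times ET0
    rain_mm : daily effective rainfall (mm)
    taw_mm  : total available water in the root zone (mm)
    raw_mm  : readily available water (mm); irrigate once depletion exceeds it
    Returns a list of (depletion_mm, irrigation_mm) tuples, one per day.
    """
    depletion = start_depletion_mm
    log = []
    for etc, rain in zip(etc_mm, rain_mm):
        # Withdrawals (ET) increase depletion; deposits (rain) decrease it.
        depletion += etc - rain
        # Water above field capacity is lost, so depletion cannot drop below zero.
        depletion = max(depletion, 0.0)
        # Depletion cannot exceed the total available water (wilting point).
        depletion = min(depletion, taw_mm)
        irrigation = 0.0
        if depletion > raw_mm:
            # Refill the root zone to field capacity.
            irrigation = depletion
            depletion = 0.0
        log.append((depletion, irrigation))
    return log
```

With 5 mm/day of crop ET, no rain, and a 12 mm threshold, depletion reaches 5 and 10 mm on the first two days, and a 15 mm irrigation is triggered on the third.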

The best choice for direct measurement of ET₀ is the lysimeter. However, lysimeter experiments are cumbersome, time-consuming, and expensive, and the method is often considered unviable owing to a lack of precision in its planning. Indirect and soft computing methods that estimate ET₀ from climatic and meteorological data have therefore gained importance over the years. The indirect methods for estimating ET₀ include (1) empirical and semi-empirical equations, (2) pan evaporation methods, (3) energy budget methods, (4) mass transfer equations, (5) combination equations, and (6) radiation-based methods. Further information on indirect methods can be found in McMahon et al. (2013). In addition, empirical and semi-empirical equations based on the Priestley–Taylor and Penman–Monteith methods (Vinukollu et al. 2011) have been widely adopted for ET₀ estimation.

One of the most widely accepted indirect methods for ET₀ estimation was introduced by the Food and Agriculture Organization (FAO) of the United Nations (Allen et al. 1989, 1994). It is based on the Penman–Monteith equation, which Allen et al. (1998) modified and adopted as the reference equation (FAO-PM56). The FAO-PM56 equation accounts for climatic, aerodynamic, and surface resistance factors, including maximum and minimum air temperature, relative air humidity, wind speed, solar radiation, saturation vapor pressure deficit, the slope of the saturation vapor pressure curve, and the psychrometric constant. Most weather stations record air temperature, but the reliability and completeness of the other variables cannot be trusted in many locations (Rahimikhoob 2010). This is less of an issue in developed countries but a major challenge in developing countries, where the quantity and quality of data cannot be assured. According to Trajkovic and Kolakovic (2009), weather records of radiation, relative humidity, and wind speed in developing countries are of limited reliability. Geographic data (latitude, longitude, altitude) are also needed for adequate local adjustment of weather-dependent terms in the FAO-PM56 equation, such as atmospheric pressure, extraterrestrial radiation (R_a), and daylight hours (N). Furthermore, field measurements and experimental approaches are time-consuming and labor-intensive, including the post-processing of their outputs. As a result, it is difficult to formulate an ET₀ equation that overcomes these limitations while producing reliable and verified results (Gavilan et al. 2007).
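For reference, the daily FAO-56 Penman–Monteith equation can be evaluated as in the sketch below. It is a minimal illustration with stated assumptions: net radiation R_n is supplied by the caller rather than derived from sunshine hours, the actual vapor pressure is obtained from mean relative humidity (the FAO-56 option when only an average humidity such as RH_avg is available), and wind speed must be in m/s at 2 m height, so the km/day values reported for the stations would first be divided by 86.4. The function names are illustrative.

```python
import math


def sat_vapour_pressure(t_c):
    """Saturation vapour pressure e°(T) in kPa at air temperature t_c (°C)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def et0_fao56_pm(t_max, t_min, rh_mean, u2, rn, altitude_m, g=0.0):
    """Daily grass-reference ET0 (mm/day) from the FAO-56 Penman-Monteith equation.

    t_max, t_min : daily maximum / minimum air temperature (°C)
    rh_mean      : mean relative humidity (%)
    u2           : wind speed at 2 m height (m/s)
    rn           : net radiation at the crop surface (MJ m-2 day-1), supplied by the caller
    altitude_m   : station elevation (m), used for atmospheric pressure
    g            : soil heat flux density (MJ m-2 day-1), about 0 for daily steps
    """
    t_mean = (t_max + t_min) / 2.0
    # Atmospheric pressure (kPa) and psychrometric constant gamma (kPa/°C)
    pressure = 101.3 * ((293.0 - 0.0065 * altitude_m) / 293.0) ** 5.26
    gamma = 0.665e-3 * pressure
    # Slope of the saturation vapour pressure curve at t_mean (kPa/°C)
    delta = 4098.0 * sat_vapour_pressure(t_mean) / (t_mean + 237.3) ** 2
    # Mean saturation vapour pressure and actual vapour pressure from RH_mean (kPa)
    es = (sat_vapour_pressure(t_max) + sat_vapour_pressure(t_min)) / 2.0
    ea = rh_mean / 100.0 * es
    # Radiation term plus aerodynamic term, divided by the combined resistance term
    numerator = (0.408 * delta * (rn - g)
                 + gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea))
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return numerator / denominator
```

For example, `et0_fao56_pm(21.5, 12.3, 73.5, 2.078, 13.28, 100.0)` returns roughly 3.8 mm/day. Every input except R_n is either measured directly or follows from geography, which is why the equation is data-hungry where radiation and humidity records are unreliable.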

Modern machine learning algorithms are making it easier to tackle nonlinear processes such as ET₀ in various disciplines (Kumar et al. 2011). The major difficulty in estimating ET₀ is its nonlinear, dynamic, and highly complex nature. In this context, machine learning algorithms provide an important alternative for ET₀ estimation. These algorithms build on computational intelligence principles that aim to cope with imprecision and uncertainty in producing results. Their proficiency in dealing with complex problems has made them well accepted, and they have the added advantage of solving such problems using only the available data (Ibrahim 2016).

Currently, machine learning models based on robust algorithms are applied to map nonlinear processes from input and output (target) variables. Raza et al. (2021a, b) reviewed ET₀ estimation studies from 2012 to 2020 in terms of accuracy, model structure, and usefulness. The primary objective of the reviewed studies was to develop soft computing alternatives to FAO-PM56, which requires a large quantity of climatic input data that are seldom available in the public domain in developing countries. The review found that designing soft computing models that use all available inputs (as FAO-PM56 does) is not effective. Besides, few studies have aimed to develop a generalized ET₀ model that accurately estimates ET₀ at all stations within an area (Raza et al. 2021a, b). This matters in developing countries, where climatic data from most stations are missing or unavailable owing to technical issues and a lack of technology. Thus, developing ET₀ models with fewer climatic inputs (e.g., temperature data) is essential. For this purpose, different types of machine learning algorithms have been applied to ET₀ modeling, for example, support vector machine (SVM) (Kisi and Imen 2009; Kushwaha et al. 2021; Mehdizadeh et al. 2017; Ferreira and Cunha 2020), least square support vector machine (Kisi 2013; Guo et al. 2011), genetic programming (Traore et al. 2016; Valipour et al. 2019; Mattar 2018), extreme learning machine (ELM) (Shamshirband and Kamsin 2016; Abdullah et al. 2015), tree-based models (Raza et al. 2021a, b) such as M5 model tree (Fan et al. 2018a, b; Granata 2019), random forest (Feng et al. 2017a, b; Fang et al. 2018; Saggi and Jain 2019), and extreme gradient boosting (XGBoost) (Ferreira and Cunha 2020; Fan et al. 2018a, b; Han et al. 2019), artificial neural networks (ANNs) (Torres et al. 2011; Tang et al. 2018; Walls et al. 2020), and adaptive neuro-fuzzy inference system (ANFIS) (Nourani et al. 2019; Tabari et al. 2013).

In recent studies, data intelligence and hybrid metaheuristic algorithms have been applied to ET₀ estimation at the New Delhi and Ludhiana stations in north India because of their capability to detect patterns and changes in time-series data (Kushwaha et al. 2021). Moreover, these algorithms can handle series data without discretization, and their effective handling of time series has made them successful in various engineering problems, especially ET₀ estimation. Wu et al. (2021) estimated ET₀ in south China by applying ELM combined with a clustering approach and recommended the proposed machine learning models for ET₀ estimation. Roy et al. (2021) estimated ET₀ using the ANFIS model combined with various algorithms and concluded that hybrid models outperformed single machine learning models. Similarly, Ahmadi et al. (2021) compared novel data intelligent models for ET₀ estimation with the commonly used gene expression programming (GEP) and single SVM models, and concluded that the proposed novel data intelligent model combined with SVM performed best, including in comparison with the empirical model. Sattari et al. (2021) applied five data intelligent machine learning models with several meteorological input combinations and found that combining machine learning models (hybrid models) increases predictive ability. Malik et al. (2019) estimated ET₀ using five machine learning models with different meteorological input combinations and concluded that hybrid machine learning models can improve accuracy and efficiency. It can be inferred from these studies that data intelligent and hybrid metaheuristic machine learning algorithms are recommended for ET₀ modeling and can be considered a strong alternative to the conventional FAO-PM56 equation.
Thus, this study applied five data intelligent and hybrid metaheuristic machine learning algorithms, namely additive regression (AR), AR-bagging, AR-random subspace (AR-RSS), AR-M5P, and AR-REPTree, to estimate monthly mean daily ET₀. The contributions of this study to the scientific literature are (1) developing and evaluating the potential of AR, AR-bagging, AR-RSS, AR-M5P, and AR-REPTree for ET₀ estimation; (2) determining an effective meteorological input combination through sensitivity analysis; and (3) identifying the best-performing of these hybrid algorithms on the basis of standard statistical indices.

Materials and methods

Study area, data preparation, and preprocessing

The study area comprises two stations, Faisalabad and Islamabad, Pakistan, both located in semi-arid climatic regions according to the aridity and continentality indices reported by Raza et al. (2021a, b). Climatic parameters, namely maximum and minimum temperature (T_max, T_min, °C), the average of maximum and minimum relative humidity (RH_avg, %), 24-h average wind speed at 2 m height (U_x, km/day), and sunshine hours (n, h), were obtained from the Pakistan Meteorological Department (PMD), Lahore, Pakistan. The data for each station cover 1987 to 2016. The geographical location of the study area is shown in Fig. 1. The geographic parameters, latitude (Lat), longitude (Lon), and altitude (Alt), and the average of each meteorological variable for both stations are presented in Table 1. Table 1 shows that mild temperatures were recorded at both Faisalabad and Islamabad (semi-arid climate). However, the wind speed at Faisalabad (386.28 km/day) is the higher of the two stations owing to its geographical position. Severe thunderstorms are also observed every year owing to cold westerly winds, and the climate has humid summers and dry winters. The statistics of each meteorological variable used in training and testing the machine learning algorithms are also presented in Table 1.

Table 1 Parameter of the selected algorithms


Methodology

Regression and sensitivity analysis for best input combination

Sensitivity analysis coupled with regression-based analysis was conducted to identify how different combinations of input variables affect the model outputs. In machine learning, regression is an algorithmic process that estimates the value of a numerical dependent variable (the output). Standard statistics, such as the regression coefficient and root mean square error, were used to evaluate sensitivity, while the regression analysis relied on mean absolute error, relative absolute error, and root-relative mean square error, in addition to root mean square error and the correlation coefficient. These statistics are discussed in depth in Sect. 2.3.

The present study employed three software tools to test the models and select the best one: (1) MATLAB 9 (MathWorks Inc., Natick, MA, USA), used for validating the regression algorithms; (2) R 3.4 (R Foundation, Vienna, Austria), used for testing the algorithms based on regression trees; and (3) Weka 3.8.1 (The University of Waikato, Hamilton, New Zealand), which provides different types of decision trees as well as linear regression, k-nearest neighbors, Bayes networks, logistic regression, the K* algorithm, locally weighted learning, rule-based methods, etc.

Additive regression (AR)

In recent years, Bayesian additive regression trees (BART), a flexible machine learning approach to prediction, have gained popularity in the research community owing to their wide range of applications (Sparapani et al. 2021; Tan and Roy 2019). This study used BART for additive regression in MATLAB. Let the continuous outcome ET₀ be y and the p covariates be x = (x₁, …, x_p). BART captures complex relations between x and y by estimating f(x) in models of the form y = f(x) + ε, where ε ∼ N(0, σ²). The function f(x) is estimated as a sum of m regression trees, f(x) = ∑ g(x; T_j, M_j) for j = 1, …, m, where T_j is the structure of the jth tree and M_j is the set of its terminal-node parameters. The BART expression is given in Eq. 1.

$$ y = f\left( x \right) + \varepsilon = \mathop \sum \limits_{j = 1}^{m} g\left( {x;T_{j} ,M_{j} } \right) + \varepsilon. $$

(1)
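Eq. 1 writes the prediction as a sum of m trees. BART fits those trees with a Bayesian backfitting MCMC sampler, which is too long to show here. As a swapped-in and much simpler illustration of the same additive form, the sketch below fits each term by forward stagewise least squares: a depth-1 regression tree (stump) is fitted to the current residuals and added with a shrinkage factor. It is not BART; the same residual-fitting idea underlies Weka's AdditiveRegression meta-learner. Function names and the values of `m` and `shrinkage` are illustrative assumptions.

```python
from statistics import mean


def fit_stump(X, r):
    """Depth-1 regression tree: the single split (feature j, threshold t) with least SSE."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for t in values[:-1]:  # splitting at the largest value would leave one side empty
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    _, j, t, ml, mr = best  # assumes some feature has at least two distinct values
    return lambda row: ml if row[j] <= t else mr


def fit_additive(X, y, m=50, shrinkage=0.5):
    """Stagewise additive model f(x) = f0 + sum over j of shrinkage * g_j(x)."""
    f0 = mean(y)
    terms = []
    residual = [yi - f0 for yi in y]
    for _ in range(m):
        # Each new term models what the sum of the previous terms has missed.
        g = fit_stump(X, residual)
        terms.append(g)
        residual = [ri - shrinkage * g(row) for row, ri in zip(X, residual)]
    return lambda row: f0 + sum(shrinkage * g(row) for g in terms)
```

On a step-shaped target, every stump picks the same split and the residual halves at each iteration, so the sum of terms converges to the step.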

Bagging

Bootstrap aggregation, or bagging, is generally employed to reduce variance in a noisy dataset. It is an ensemble learning method: when a single algorithm overfits (high variance, low bias) or underfits (low variance, high bias) its training set, combining many models can improve generalization to new datasets (Breiman 1996). Accordingly, in the present study, random samples were drawn from the training set with replacement, so that individual data points could be selected more than once. The models generated from these samples are weak, because individually their performance may be poor owing to high variance or high bias. After several data samples were generated, the weak models were trained on them independently (for regression or classification, depending on the task). Aggregating these weak models (averaging their outputs, for regression) mainly reduces variance, thereby improving model performance.
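The bagging procedure described above can be sketched in a few lines. As an assumption for illustration, the weak base learner is a least-squares line on a single input (`fit_simple_linear` and `bagging_fit` are hypothetical names); in this study the base learner is the AR model, which is not reproduced here.

```python
import random
from statistics import mean


def fit_simple_linear(x, y):
    """Ordinary least-squares line y = a + b*x, used here as the weak base learner."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    # Guard against a bootstrap sample in which every x is identical.
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda xi: a + b * xi


def bagging_fit(x, y, base_fit, n_estimators=25, seed=0):
    """Bootstrap aggregation: fit base learners on resamples and average their outputs."""
    rng = random.Random(seed)
    n = len(x)
    models = []
    for _ in range(n_estimators):
        # Draw n indices with replacement: some points repeat, others are left out.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(base_fit([x[i] for i in idx], [y[i] for i in idx]))
    # For regression, the ensemble prediction is the mean of the member predictions.
    return lambda xi: mean(m(xi) for m in models)
```

Because each member sees a different resample, their individual errors partly cancel in the average, which is the variance-reduction effect bagging relies on.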

Random Subspace (RSS)

The random subspace (RSS) ensemble is a machine learning algorithm introduced by Ho (1998). It combines the outputs (predictions) of multiple classifiers, such as decision trees, by voting. It overcomes one of the critical shortcomings of traditional decision trees (Lasota et al. 2013), namely their tendency to overfit (high variance, low bias), while preserving the accuracy of the training results. Skurichina and Duin (2002) classified the inputs of the RSS algorithm into four categories: (1) training dataset (T_x), (2) base classifier (C_w), (3) size of the subspaces (S_L), and (4) number of subspaces (S_ds). In general, the procedure selects a random subset of input features (columns) for each model in the ensemble and fits that model on the entire training dataset restricted to those features. It can also be combined with bootstrap or random sampling of rows from the training dataset.
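For regression, a random subspace ensemble might look like the sketch below: each member sees every training row but only a random subset of the input columns, and the member predictions are averaged (classifiers would vote instead). The k-nearest-neighbour base learner and all names are illustrative assumptions; any regressor with the same fit/predict shape could be substituted.

```python
import random
from statistics import mean


def fit_knn(X, y, k=3):
    """k-nearest-neighbour regressor (an illustrative base learner)."""
    def predict(row):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(row, xr)), yr) for xr, yr in zip(X, y)
        )
        return mean(yr for _, yr in dists[:k])
    return predict


def random_subspace_fit(X, y, base_fit, n_estimators=10, subspace_size=2, seed=0):
    """Random subspace ensemble: all rows, but a random subset of columns per member."""
    rng = random.Random(seed)
    n_features = len(X[0])
    members = []
    for _ in range(n_estimators):
        # Pick the feature subspace for this member (assumes subspace_size <= n_features).
        cols = sorted(rng.sample(range(n_features), subspace_size))
        X_sub = [[row[c] for c in cols] for row in X]
        members.append((cols, base_fit(X_sub, y)))

    def predict(row):
        # Average member predictions, each computed on that member's own columns.
        return mean(model([row[c] for c in cols]) for cols, model in members)
    return predict
```

Restricting each member to a few columns decorrelates the members, so averaging them tends to reduce the overfitting of any single model.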

M5P

Quinlan (1992) introduced the M5 algorithm, which was later reconstructed to develop the M5P model tree. M5P combines a traditional decision tree with linear regression functions. Wang and Witten (1996) described four steps in the M5P algorithm: (1) splitting the input space, (2) developing linear regression models, (3) pruning, and (4) smoothing. The M5P algorithm has also been recognized as robust because it deals efficiently with missing data. Since M5P can efficiently process large datasets with reduced errors in the output, this study considered it for analyzing and predicting ET₀ in the study area.

In the M5P model tree, the splitting criterion is based on the error calculated at each node, and linear regression functions are assigned to the terminal nodes. The error at each node is measured by the standard deviation of the target values that reach that node. Each attribute is tested at the node, and the attribute selected for splitting is the one that maximizes the expected error reduction, measured by the standard deviation reduction (SDR) in Eq. 2.

$$ {\text{SDR}} = {\text{SD}}\left( A \right) - \mathop \sum \limits_{i} \frac{{\left| {A_{i} } \right|}}{\left| A \right|}{\text{SD}}\left( {A_{i} } \right) $$

(2)

where A is the set of instances that reach the node, A_i is the subset of instances with the ith outcome of the candidate split, and SD is the standard deviation.
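Eq. 2 and the split search it drives can be illustrated for one numeric attribute. This is a minimal sketch: it uses the population standard deviation (so that single-instance subsets are allowed) and only binary threshold splits, both of which are assumptions rather than details taken from M5P; the function names are hypothetical.

```python
from statistics import pstdev


def sdr(parent, subsets):
    """Standard deviation reduction (Eq. 2) for one candidate split of `parent`."""
    n = len(parent)
    return pstdev(parent) - sum(len(s) / n * pstdev(s) for s in subsets)


def best_split(x, y):
    """Threshold on a single attribute x that maximises SDR for the targets y."""
    best_t, best_gain = None, float("-inf")
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        gain = sdr(y, [left, right])
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain
```

For x = [1, 2, 3, 4, 5, 6] and y = [1, 1, 1, 5, 5, 5], the best threshold is 3, where SDR equals the parent standard deviation of 2.0 because both children are constant.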

REPTree

The reduced error pruning tree (REPTree) algorithm builds on the reduced-error pruning technique introduced by Quinlan (1987) for decision tree learning. REPTree is a fast decision tree learner that builds a regression tree using information gain (variance, for regression) and then prunes it. More precisely, the training data are split into two sets, a growing set and a pruning set, with ~75% of the data used to grow the decision tree and the remainder used for pruning (the pruning data are used to measure accuracy). Because pruning reduces tree size by removing subtrees that do not reduce the error on the pruning set (Rahman et al. 2021), this study considered REPTree as a standard model for predicting evapotranspiration.
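The grow-then-prune idea can be sketched as follows. This is not Weka's REPTree; it is a minimal illustration under stated assumptions: a small regression tree is grown on the growing rows by squared-error (variance) reduction, and then, bottom-up, every internal node is collapsed into a leaf whenever that does not increase the squared error on the held-out pruning rows. All names and the `max_depth`/`min_leaf` settings are hypothetical.

```python
from statistics import mean


def _sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def grow(X, y, depth=0, max_depth=3, min_leaf=2):
    """Grow a regression tree by squared-error reduction; each node stores its mean."""
    node = {"value": mean(y)}
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return node
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            sse = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or sse < best[0]:
                best = (sse, j, t, left, right)
    if best is None:
        return node
    _, j, t, left, right = best
    node["feature"], node["threshold"] = j, t
    node["left"] = grow([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, min_leaf)
    node["right"] = grow([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, min_leaf)
    return node


def predict(node, row):
    while "feature" in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["value"]


def prune(node, X_prune, y_prune):
    """Bottom-up reduced-error pruning (modifies `node` in place).
    Subtrees that no pruning row reaches are left unchanged."""
    if "feature" not in node or not y_prune:
        return node
    j, t = node["feature"], node["threshold"]
    goes_left = [row[j] <= t for row in X_prune]
    node["left"] = prune(node["left"],
                         [r for r, g in zip(X_prune, goes_left) if g],
                         [v for v, g in zip(y_prune, goes_left) if g])
    node["right"] = prune(node["right"],
                          [r for r, g in zip(X_prune, goes_left) if not g],
                          [v for v, g in zip(y_prune, goes_left) if not g])
    subtree_err = sum((predict(node, r) - v) ** 2 for r, v in zip(X_prune, y_prune))
    leaf_err = sum((node["value"] - v) ** 2 for v in y_prune)
    # Collapse to a leaf when doing so does not increase error on the pruning rows.
    return {"value": node["value"]} if leaf_err <= subtree_err else node
```

On a step-shaped target, the grown tree contains redundant splits inside each constant region; pruning removes them and keeps only the split that actually reduces error on the pruning rows.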

Model performance indicators

This study used several statistical indices to evaluate model performance: mean absolute error (MAE), relative absolute error (RAE), root mean square error (RMSE), root-relative mean square error (RRMSE), and the correlation coefficient (r). MAE is the mean absolute deviation of the predicted values from the observed values of the time series (Eq. 3). RAE expresses the total absolute error relative to that of a simple predictor that always returns the mean of the observed values, indicating the magnitude of the error in terms of the size of the measurements. RMSE is the root mean square deviation of the predicted values from the observed values (Eq. 4). Relative errors indicate the size of the error with reference to the actual measurements: RRMSE normalizes RMSE by the mean observed value (Eq. 5), while the root relative squared error (RRSE) reported in the Results normalizes the total squared error by that of the same simple mean predictor, so that the error is expressed relative to the quantity being predicted. The correlation coefficient measures the linear association between the predicted and observed values (Eq. 6).

$$ {\text{MAE}} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left| {{\text{ET}}_{{{\text{pi}}}} - {\text{ET}}_{{{\text{oi}}}} } \right| $$

(3)

$$ {\text{RMSE}} = \sqrt {\frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left( {{\text{ET}}_{{{\text{pi}}}} - {\text{ET}}_{{{\text{oi}}}} } \right)^{2} } $$

(4)

$$ {\text{RRMSE}} = \frac{{{\text{RMSE}}}}{{{\text{ET}}_{{\text{o}}} }} = \frac{{\sqrt {\frac{1}{N}\mathop \sum \nolimits_{i = 1}^{N} \left( {{\text{ET}}_{{{\text{pi}}}} - {\text{ET}}_{{{\text{oi}}}} } \right)^{2} } }}{{{\text{ET}}_{{\text{o}}} }} $$

(5)

$$ r = \frac{{\mathop \sum \nolimits_{i = 1}^{N} \left( {{\text{ET}}_{{{\text{pi}}}} - {\text{ET}}_{{\text{p}}} } \right)\left( {{\text{ET}}_{{{\text{oi}}}} - {\text{ET}}_{{\text{o}}} } \right)}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{N} \left( {{\text{ET}}_{{{\text{pi}}}} - {\text{ET}}_{{\text{p}}} } \right)^{2} } \sqrt {\mathop \sum \nolimits_{i = 1}^{N} \left( {{\text{ET}}_{{{\text{oi}}}} - {\text{ET}}_{{\text{o}}} } \right)^{2} } }} $$

(6)

where ET_pi and ET_oi are the ith ET₀ values predicted by the model and observed (calculated by FAO-PM56), respectively; ET_p and ET_o are the mean predicted and observed ET₀ values, respectively; and N is the number of data points. For all indices except r, values closer to zero indicate better performance; the best value of r is 1.
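The indices can be computed together as in the sketch below. RRMSE follows Eq. 5 (RMSE divided by the mean observed value). RAE and RRSE follow the Weka convention, comparing the model's errors with those of a naive predictor that always outputs the mean observed value and expressing the ratio in percent; this is an assumption, chosen because both are reported as percentages in the Results. The function name `evaluate` is hypothetical.

```python
from math import sqrt
from statistics import mean


def evaluate(pred, obs):
    """MAE, RMSE, RRMSE, RAE (%), RRSE (%) and Pearson r for paired series."""
    n = len(obs)
    mo, mp = mean(obs), mean(pred)
    err = [p - o for p, o in zip(pred, obs)]
    mae = sum(abs(e) for e in err) / n                       # Eq. 3
    rmse = sqrt(sum(e * e for e in err) / n)                 # Eq. 4
    rrmse = rmse / mo                                        # Eq. 5
    # Relative to a naive predictor that always outputs the mean observed value.
    rae = 100 * sum(abs(e) for e in err) / sum(abs(o - mo) for o in obs)
    rrse = 100 * sqrt(sum(e * e for e in err) / sum((o - mo) ** 2 for o in obs))
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    r = cov / (sqrt(sum((p - mp) ** 2 for p in pred)) * sqrt(sum((o - mo) ** 2 for o in obs)))  # Eq. 6
    return {"MAE": mae, "RMSE": rmse, "RRMSE": rrmse, "RAE": rae, "RRSE": rrse, "r": r}
```

A prediction that is offset by a constant shows why r alone is not enough: r stays at 1 while MAE, RMSE, and the relative errors reveal the bias.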

Results

Best subset regression and sensitivity analysis

Selection of best input combination

The best input combination at each station was selected using six statistical criteria (MSE, coefficient of determination (R²), adjusted R², Mallows' C_p, Akaike's AIC, and Amemiya's PC), and the results are given in Tables 2 and 3. Table 2 shows that the combination of four input variables is the best at Faisalabad station, as it has the lowest Mallows' C_p (4.002) and Amemiya's PC (0.058), the highest R² (0.943), and a high adjusted R² (0.943) among all input combinations. Similarly, at Islamabad station, the combination of four input variables is the best, with the lowest Mallows' C_p (6.696) and Amemiya's PC (0.045), the highest R² (0.956), and a high adjusted R² (0.955) among all input combinations, as given in Table 3. The whole dataset for each station was divided into two segments, a training dataset and a testing dataset: 75% of the data were allocated to training the models and the remaining 25% to validating them.
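An exhaustive best-subset search over the candidate inputs can be sketched as follows. The criteria use common textbook forms, which may differ in detail from the software the authors used: p counts the intercept, Mallows' C_p uses the error variance of the full model, AIC is n ln(SSE/n) + 2p, and Amemiya's PC is (n + p)/(n − p)(1 − R²). A small linear solver is written out to keep the sketch dependency-free; all names are hypothetical, and the sample must satisfy n > k + 1.

```python
from itertools import combinations
from math import log


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols_sse(X, y, cols):
    """Residual sum of squares of y regressed on an intercept plus the chosen columns."""
    Z = [[1.0] + [row[c] for c in cols] for row in X]
    p = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    b = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    beta = solve(A, b)
    return sum((yi - sum(bj * zj for bj, zj in zip(beta, z))) ** 2 for z, yi in zip(Z, y))


def best_subsets(X, y):
    """Score every non-empty input subset with R², adjusted R², Cp, AIC and PC."""
    n, k = len(y), len(X[0])
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    sigma2_full = ols_sse(X, y, range(k)) / (n - k - 1)  # error variance of the full model
    rows = []
    for size in range(1, k + 1):
        for cols in combinations(range(k), size):
            sse = ols_sse(X, y, cols)
            p = size + 1  # number of parameters including the intercept
            r2 = 1 - sse / sst
            rows.append({
                "inputs": cols,
                "R2": r2,
                "adj_R2": 1 - (1 - r2) * (n - 1) / (n - p),
                "Cp": sse / sigma2_full - n + 2 * p,
                "AIC": n * log(sse / n) + 2 * p,
                "PC": (n + p) / (n - p) * (1 - r2),
            })
    return rows
```

By construction, the full model always has C_p equal to its own p, so a subset whose C_p is close to its p (and whose PC is lowest) is a good candidate with fewer inputs.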

Table 2 The best subset regression analysis for determining the best input combinations at Faisalabad station


Sensitivity analysis

The sensitivity of the input variables was assessed through regression analysis to identify the most effective input parameters for predicting ET₀ with the machine learning models. The regression results are given in Tables 4 and 5 and Figs. 2 and 3. The regression analysis on all input parameters identified T_min, RH_avg, U_x, and n as the most influential input parameters for predicting ET₀ at Faisalabad station, with standardized coefficients of 0.634, −0.225, 0.384, and 0.070, respectively. At Islamabad station, the regression analysis likewise identified T_min, RH_avg, U_x, and n as the most influential input parameters, with standardized coefficients of 0.661, −0.212, 0.312, and 0.168, respectively.
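Standardized coefficients of this kind can be obtained by regressing the z-scored output on the z-scored inputs, which reduces to solving R_xx β* = r_xy with the input correlation matrix. The sketch below is an illustration under that assumption, not necessarily the authors' software; the function name is hypothetical. The absolute value ranks each input's influence, and the sign gives its direction (as with the negative coefficient of RH_avg).

```python
from statistics import mean, pstdev


def standardized_coefficients(X, y):
    """OLS slopes on z-scored inputs and output (standardized regression coefficients)."""
    k = len(X[0])
    n = len(y)

    def z(v):
        m, s = mean(v), pstdev(v)
        return [(a - m) / s for a in v]

    Zx = [z([row[j] for row in X]) for j in range(k)]
    zy = z(y)
    # Normal equations on z-scores: input correlation matrix times beta = input-output correlations.
    A = [[sum(a * b for a, b in zip(Zx[i], Zx[j])) / n for j in range(k)] for i in range(k)]
    b = [sum(a * c for a, c in zip(Zx[i], zy)) / n for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    M = [A[i] + [b[i]] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][k] for i in range(k)]
```

Each coefficient equals the raw slope multiplied by sd(x_j)/sd(y), so inputs measured in different units (°C, %, km/day, h) become directly comparable.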

Table 3 The best subset regression analysis for determining the best input combinations at Islamabad station


Table 4 The regression analysis for identifying the most effective parameters for ET_o estimation at Faisalabad station


Implementation of machine learning algorithm at two different gauging stations

ET₀ at the two meteorological stations was estimated by applying novel hybrid machine learning algorithms, whose performance was evaluated and compared using the performance indicators MAE, RMSE, RAE, RRSE, and r. The model with the highest r and the lowest MAE, RMSE, RAE, and RRSE (closest to zero) is considered the most accurate in predicting ET₀. The values of MAE, RMSE, RAE, RRSE, and r are presented in Tables 6 and 7. Combining M5P with AR improved the accuracy of ET₀ prediction more than the other hybrid algorithms at both meteorological stations.

Table 5 The regression analysis for identifying the most effective parameters for ET_o estimation at Islamabad station


Table 6 RMSE, NSE, WI, and r for meta-heuristics algorithms-based models during the training and testing span at Faisalabad station


Prediction of ET₀ at Faisalabad station

The performance of the applied algorithms, namely additive regression (AR), AR-bagging, AR-RSS, AR-M5P, and AR-REPTree, was assessed at Faisalabad station using the performance indicators MAE, RMSE, RAE, RRSE, and r, and is presented in Table 6. Table 6 shows that the hybrid AR-RSS algorithm performed best during the training period, whereas the AR-M5P model outperformed the other algorithms during the testing period. AR, AR-bagging, AR-RSS, AR-M5P, and AR-REPTree provided MAE = 0.468, 0.489, 0.430, 0.459, and 0.477; RMSE = 0.669, 0.658, 0.620, 0.638, and 0.684; RAE = 23.03, 24.06, 21.16, 22.57, and 23.47%; RRSE = 29.04, 28.55, 26.88, 27.665, and 29.66%; and r = 0.957, 0.959, 0.963, 0.961, and 0.955 during the training period, and MAE = 0.726, 0.692, 0.656, 0.570, and 0.679; RMSE = 0.999, 0.994, 0.915, 0.789, and 0.984; RAE = 32.38, 30.87, 29.30, 24.45, and 30.34%; RRSE = 39.08, 38.88, 35.78, 30.90, and 38.49%; and r = 0.927, 0.928, 0.934, 0.957, and 0.934 during the testing period, respectively. The AR-M5P model therefore had the highest correlation coefficient and the lowest error statistics of all models during the testing span and was considered the best model for estimating ET₀ at Faisalabad meteorological station.

Figure 4a–e shows the time-series plots (left) and scatter plots (right) of observed and predicted ET₀ at Faisalabad station. In the scatter plots, the regression line (RL) gave a coefficient of determination (R²) of 0.860 for AR, 0.826 for AR-bagging, 0.872 for AR-RSS, 0.915 for AR-M5P, and 0.872 for AR-REPTree. For all applied models, the RL lay below the 1:1 (best-fit) line, which means the models underestimated ET₀ relative to the observed values. The RL of the AR-M5P model was closest to the 1:1 line, indicating the best performance among the models.
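The scatter-plot reading can be checked numerically. The following Python sketch uses hypothetical observed and predicted values. It fits the least-squares regression line (predicted on observed) and reports R², which equals r² for a simple linear fit. It then tests whether the line lies below the 1:1 line across the observed range. The function name `regression_line_check` is an illustrative assumption, not the authors' code.

```python
import numpy as np


def regression_line_check(observed, predicted):
    """Fit predicted = slope * observed + intercept and compare it with the 1:1 line."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    slope, intercept = np.polyfit(o, p, 1)      # least-squares regression line (RL)
    r2 = np.corrcoef(o, p)[0, 1] ** 2           # R² of a simple linear fit equals r²
    lo, hi = o.min(), o.max()
    # Both lines are straight, so being below at both ends of the range means below throughout.
    below_1to1 = bool(slope * lo + intercept < lo and slope * hi + intercept < hi)
    return slope, intercept, r2, below_1to1


# Hypothetical case: every prediction is 90% of the observed ET0.
obs = [2.0, 4.0, 6.0, 8.0]
pred = [0.9 * x for x in obs]
print(regression_line_check(obs, pred))
```

Here the slope is about 0.9, the intercept about zero, R² about 1, and the RL lies below the 1:1 line. This shows a high R² together with systematic underestimation, which is why R² and the RL position are read together.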

The performance of the applied models was also assessed with a radar chart of the RMSE values (Fig. 5), which gives a clearer diagnostic view of model efficiency. The AR-M5P model has the lowest RMSE, showing that it performed better than the other models. In a further comparison based on the standard deviation, correlation, and RMSE (Taylor diagram), the AR model was located furthest from the observed point and the AR-M5P model nearest to it. This identifies AR as the worst and AR-M5P as the best of the selected models at Faisalabad station.
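In a Taylor diagram, the distance from a model point to the observed point is the centered RMS difference E′. It follows from the two standard deviations and r through the law of cosines, E′² = σ_p² + σ_o² − 2σ_pσ_o r. The minimal Python sketch below computes these statistics with population standard deviations for hypothetical values. It also confirms that the geometric distance equals E′. The function names are assumptions for illustration. Because E′ removes the mean bias, it can differ from the RMSE reported in Table 6.

```python
import numpy as np


def taylor_statistics(observed, predicted):
    """Return (sd_obs, sd_pred, r, centered RMS difference) using population SDs."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    sd_obs, sd_pred = o.std(), p.std()          # ddof=0, consistent with the mean below
    r = np.corrcoef(o, p)[0, 1]
    crmsd = np.sqrt(np.mean(((p - p.mean()) - (o - o.mean())) ** 2))
    return sd_obs, sd_pred, r, crmsd


def distance_to_observed_point(sd_obs, sd_pred, r):
    """Distance from the model point (radius sd_pred, angle arccos r) to the observed point (sd_obs, 0)."""
    x = sd_pred * r
    y = sd_pred * np.sqrt(max(0.0, 1.0 - r ** 2))  # guard against r marginally above 1
    return np.hypot(x - sd_obs, y)


# Hypothetical observed and predicted ET0 values.
sd_o, sd_p, r, e = taylor_statistics([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.5, 7.5])
print(e, distance_to_observed_point(sd_o, sd_p, r))  # both are the centered RMS difference
```

A model plotted nearest to the observed point therefore combines a high correlation with a standard deviation close to that of the observations.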

Prediction of ET₀ at Islamabad station

The results of ET₀ estimation at Islamabad station are shown in Table 7. The AR-bagging model performed best during the training period, with r = 0.964, MAE = 0.355, RMSE = 0.470, RAE = 23.85%, and RRSE = 26.73%. During the testing period, the AR-M5P model outperformed the other models, with r = 0.961, MAE = 0.437, RMSE = 0.570, RAE = 27.86%, and RRSE = 31.37%. Overall, the AR-M5P model performed better than the other applied models.

Table 7 RMSE, NSE, WI, and r for metaheuristic algorithm-based models during the training and testing periods at Islamabad station

Figure 6a–e shows the time-series graphs (left) and scatter plots (right) of observed versus predicted ET₀ for the AR, AR-bagging, AR-RSS, AR-M5P, and AR-REPTree models over the testing period. The RL gave R² values of 0.850 for AR, 0.886 for AR-bagging, 0.908 for AR-RSS, 0.923 for AR-M5P, and 0.834 for AR-REPTree. The RL of the AR-M5P model lies just above the 1:1 line, indicating that the AR-M5P model estimates ET₀ accurately at Islamabad station.

Figure 7 shows the radar chart of the RMSE values. The AR-M5P model has the lowest RMSE, indicating higher accuracy in estimating ET₀. The applied models were further compared using a Taylor diagram (Fig. 8). In it, the AR-M5P model showed the highest correlation coefficient and a low RMSE and was located near the observed point. The AR-REPTree model was located farthest from the observed point, with a lower correlation coefficient and a higher RMSE. The AR-M5P model can therefore be considered the best model for estimating ET₀ at Islamabad station.

Pearson correlation matrix and heat maps

Figure 10a, b presents the Pearson correlation matrices and heat maps for Faisalabad and Islamabad stations, computed from the input dataset to describe the relationships between the explanatory and response variables. Minimum temperature and sunshine hours both showed strong positive correlations with the actual and AR-M5P ET₀. These correlations were 0.91 and 0.73, respectively, at Faisalabad station, and 0.84 and 0.73 at Islamabad station. Average RH was negatively correlated with actual (FAO-PM56) ET₀ at Faisalabad station (reported magnitudes 0.39 and 0.48). A similar relationship was found between average RH and AR-M5P ET₀ at Islamabad station (0.36 and 0.49). Interestingly, wind speed was strongly and positively correlated with AR-M5P ET₀ at Faisalabad (0.85) but more weakly correlated at Islamabad (0.57). This indicates that wind speed plays a vital role and should be considered an effective climatic parameter for ET₀ estimation at Faisalabad station. Faisalabad station also recorded the highest wind speed (386.28 km/day), which is attributed to its geographical position, and severe thunderstorms occur there every year because of cold westerly winds. In addition, the station has humid summers and dry winters, which supports these results.
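A Pearson correlation matrix of the kind shown in Fig. 10 can be obtained with NumPy's `np.corrcoef`, treating each variable as a column. The sketch below uses 12 hypothetical monthly values of T_min, RH_avg, and ET₀ (not the station data) purely to illustrate the computation.

```python
import numpy as np

# Hypothetical monthly values (not the station data): T_min (°C), RH_avg (%), ET0 (mm/day).
t_min = [5, 8, 12, 18, 24, 27, 26, 25, 21, 15, 9, 6]
rh_avg = [75, 70, 60, 45, 35, 50, 70, 72, 65, 60, 68, 74]
et0 = [1.5, 2.2, 3.4, 5.0, 6.3, 6.0, 4.8, 4.5, 4.2, 3.2, 2.0, 1.4]

names = ["T_min", "RH_avg", "ET0"]
data = np.column_stack([t_min, rh_avg, et0]).astype(float)  # rows = months, columns = variables
corr = np.corrcoef(data, rowvar=False)                      # 3 x 3 Pearson correlation matrix

# Print the matrix row by row, labelled with the variable names.
for name, row in zip(names, corr):
    print(f"{name:>7}", " ".join(f"{v:6.2f}" for v in row))
```

In this toy data T_min is positively and RH_avg negatively correlated with ET₀, mirroring the signs described above. The matrix can then be drawn as a heat map, for example with Matplotlib's `imshow`.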

Discussion

The performance of hybrid machine learning algorithms in estimating reference ET₀ was assessed at two meteorological stations. The results show that these machine learning algorithms can predict ET₀ well, and the AR-M5P model gave the best results. Scatter plots of observed and estimated ET₀ at the two stations are presented in Figs. 4 and 6. At both stations, the RL of the AR-M5P model had the highest R², confirming its superiority. Standalone AR and the hybrid algorithms were further compared using radar charts of the RMSE values (Figs. 5, 7). These showed that the AR-M5P model predicted ET₀ more precisely at both stations, as it had the lowest RMSE. The Taylor diagrams (Figs. 8, 9) gave a more comprehensive comparison of model performance. Based on the standard deviation, correlation, and RMSE, the AR-REPTree model was located furthest from and the AR-M5P model nearest to the observed point at both meteorological stations. This shows that the AR-M5P model predicts ET₀ more accurately than the other selected algorithms.

The results of this study were also compared with recent work conducted in different parts of the world (Kisi et al. 2015; Kisi 2015; Feng et al. 2017a, b; Shiri 2018; Fan et al. 2018a, b; Wang et al. 2017a, b; Malik et al. 2018). Kisi et al. (2015) compared four data-driven methods for predicting monthly ET₀ at 50 meteorological stations in Iran: multi-layer perceptron artificial neural networks (MLP-ANN), ANFIS with grid partition (ANFIS-GP), ANFIS with subtractive clustering (ANFIS-SC), and gene expression programming (GEP). They concluded that ANFIS-GP performed better than the other models. Kisi (2015) applied LSSVM, MARS, and M5Tree to simulate monthly pan evaporation at Antalya and Mersin, Turkey, and found that MARS outperformed LSSVM and M5Tree. Feng et al. (2017a, b) estimated daily ET₀ in southwestern China between 2009 and 2014 from meteorological data using RF and GRNN models; both methods were judged suitable, but RF was considered superior. Shiri (2018) estimated ET₀ from weather data using a coupled wavelet-random forest (WRF) approach and concluded that the hybrid WRF model gave better results. Fan et al. (2018a, b) used a K-fold cross-validation approach to investigate gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), RF, and M5 model tree (M5Tree) models for daily ET₀ modeling with limited weather data. They also applied SVM and ELM models for comparison. They validated the models with meteorological data from 1961 to 2010 covering different climatic regions of China and recommended GBDT and XGBoost for estimating ET₀ across China's varying climates. Wang et al. (2017b) predicted pan evaporation in China using models including the multi-layer perceptron (MLP), fuzzy genetic (FG), least squares support vector machine (LSSVM), multiple linear regression (MLR), and SS models. They reported that the soft computing approaches outperformed the MLR and SS models. Wang et al. (2017a) examined the ability of the FG, ANFIS-GP, and M5Tree models to predict monthly pan evaporation in the Yangtze River Basin, China, and found that the FG model gave the most accurate estimates. Granata (2019) used meteorological data from central Florida, USA, with four machine learning models: bagging, RF, M5P regression tree, and support vector regression. These models were tested under humid subtropical conditions against observed ET₀ values, and the M5P model gave the best results when meteorological data were combined with soil moisture content. Similarly, the present study found that AR performed best when hybridized with M5P, giving higher accuracy than the other hybrid models in predicting ET₀ at both stations.

Conclusions

This study applied five data intelligent and hybrid metaheuristic algorithms, namely additive regression (AR), AR-bagging, AR-random subspace (AR-RSS), AR-M5P, and AR-REPTree, to investigate their potential for predicting reference evapotranspiration (ET₀). A 30-year input dataset (1987–2016) from two meteorological stations in semi-arid climatic conditions was used. ET₀ was computed with the globally standard FAO-56 Penman–Monteith (FAO-PM56) method and used as the benchmark for the selected algorithms. The number of climatic input parameters was reduced using regression-based sensitivity analysis. This analysis identified minimum temperature, average relative humidity, wind speed, and sunshine hours as the prime climatic parameters for ET₀ prediction at the studied stations. Based on the performance indices, AR-M5P ranked first among the machine learning algorithms for ET₀ modeling with limited meteorological input. The results showed that AR performed best when coupled with M5P, achieving higher accuracy than the other hybrid models in predicting ET₀ at both stations.
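As a rough illustration of the additive regression (AR) idea behind the hybrids, the Python sketch below implements forward stagewise additive regression with shrinkage. It starts from the mean of the targets. Each iteration fits a base learner to the residuals left by the previous ones, and the scaled predictions are summed. This follows how Weka's `AdditiveRegression` meta-learner is documented to work, but it is not Weka's code. The class names, the regression-stump base learner (a hypothetical stand-in, not M5P or REPTree), and the parameter values are assumptions. We also assume, as the model names suggest, that the hybrids differ mainly in the base learner plugged into AR.

```python
import numpy as np


class RegressionStump:
    """Hypothetical stand-in base learner: one least-squares split on one input (NOT M5P/REPTree)."""

    def fit(self, X, y):
        best_sse, self.feature, self.threshold = np.inf, 0, 0.0
        self.left_value = self.right_value = float(y.mean())  # fallback: constant prediction
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j])[:-1]:          # candidate split points
                left = X[:, j] <= t
                yl, yr = y[left], y[~left]
                sse = ((yl - yl.mean()) ** 2).sum() + ((yr - yr.mean()) ** 2).sum()
                if sse < best_sse:
                    best_sse, self.feature, self.threshold = sse, j, t
                    self.left_value, self.right_value = float(yl.mean()), float(yr.mean())
        return self

    def predict(self, X):
        go_left = X[:, self.feature] <= self.threshold
        return np.where(go_left, self.left_value, self.right_value)


class AdditiveRegression:
    """Forward stagewise additive regression: each base model is fitted to the current residuals."""

    def __init__(self, make_base_learner, n_iterations=10, shrinkage=1.0):
        self.make_base_learner = make_base_learner
        self.n_iterations = n_iterations
        self.shrinkage = shrinkage                      # learning rate; smaller values smooth the fit

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        self.offset = float(y.mean())                   # start from the mean prediction
        residual = y - self.offset
        self.models = []
        for _ in range(self.n_iterations):
            model = self.make_base_learner().fit(X, residual)
            residual = residual - self.shrinkage * model.predict(X)
            self.models.append(model)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        prediction = np.full(X.shape[0], self.offset)
        for model in self.models:
            prediction += self.shrinkage * model.predict(X)
        return prediction


# Hypothetical single-input example (not ET0 data).
X = np.arange(20, dtype=float).reshape(-1, 1)
y = np.sqrt(X[:, 0])
model = AdditiveRegression(RegressionStump, n_iterations=10, shrinkage=0.5).fit(X, y)
print(float(np.mean((model.predict(X) - y) ** 2)))
```

With squared-error residual fitting and a shrinkage between 0 and 2, the training error cannot increase from one iteration to the next. Replacing `RegressionStump` with a stronger regressor is where a model tree such as M5P would enter.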

Data availability and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Software and Code availability

MATLAB 9 (MathWorks Inc., Natick, MA, USA), R 3.4 (R Foundation, Vienna, Austria), and Weka 3.8.1 (The University of Waikato, Hamilton, New Zealand) were used in the current study.

References

Abdullah SS, Malek MA, Abdullah NS et al (2015) Extreme Learning Machines: a new approach for prediction of reference evapotranspiration. J Hydrol 527:184–195. https://doi.org/10.1016/j.jhydrol.2015.04.073
Ahmadi F, Mehdizadeh S, Mohammadi B, Pham QB, Doan TNC, Vo ND (2021) Application of an artificial intelligence technique enhanced with intelligent water drops for monthly reference evapotranspiration estimation. Agric Water Manag 244:106622. https://doi.org/10.1016/j.agwat.2020.106622
Allen RG, Jensen ME, Wright JL, Burman RD (1989) Operational estimates of reference evapotranspiration. Agron J 81:650–662. https://doi.org/10.2134/agronj1989.00021962008100040019x
Allen RG, Pereira LS, Raes D, Smith M (1998) Crop evapotranspiration: guidelines for computing crop water requirements. FAO Irrigation and Drainage Paper 56. FAO, Rome
Allen R, Smith M, Perrier A, Pereira LS (1994) An update for the definition of reference evapotranspiration. ICID Bull. 43:1–34. https://doi.org/10.12691/ajwr-5-4-3
Amarasinghe UA, Smakhtin V (2014) Global water demand projections: past, present and future Colombo, Sri Lanka: International Water Management Institute (Vol. 156). IWMI. https://doi.org/10.5337/2014.212
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140. https://doi.org/10.1007/BF00058655
Dhillon R, Rojo F, Upadhyaya SK et al (2019) Prediction of plant water status in almond and walnut trees using a continuous leaf monitoring system. Precis Agric 20:723–745. https://doi.org/10.1007/s11119-018-9607-0
Elbeltagi A, Aslam MR, Mokhtar A, Deb P, Abubakar GA, Kushwaha NL, Venancio LP, Malik A, Kumar N, Deng J (2021) Spatial and temporal variability analysis of green and blue evapotranspiration of wheat in the Egyptian Nile Delta from 1997 to 2017. J Hydrol 594:125662. https://doi.org/10.1016/j.jhydrol.2020.125662
Fan J, Yue W, Wu L et al (2018a) Evaluation of SVM, ELM and four tree-based ensemble models for predicting daily reference evapotranspiration using limited meteorological data in different climates of China. Agric For Meteorol 263:225–241. https://doi.org/10.1016/j.agrformet.2018.08.019
Fan J, Yue W, Wu L, Zhang F, Cai H, Wang X, Lu X, Xiang Y (2018b) Evaluation of SVM, ELM and four tree-based ensemble models for predicting daily reference evapotranspiration using limited meteorological data in different climates of China. Agric For Meteorol 263:225–241. https://doi.org/10.1016/j.agrformet.2018.08.019
Fang W, Huang S, Huang Q et al (2018) Reference evapotranspiration forecasting based on local meteorological and global climate information screened by partial mutual information. J Hydrol 561:764–779. https://doi.org/10.1016/j.jhydrol.2018.04.038
Feng Y, Cui N, Gong D et al (2017a) Evaluation of random forests and generalized regression neural networks for daily reference evapotranspiration modelling. Agric Water Manag 193:163–173. https://doi.org/10.1016/j.agwat.2017.08.003
Feng Y, Cui N, Gong D, Zhang Q, Zhao L (2017b) Evaluation of random forests and generalized regression neural networks for daily reference evapotranspiration modelling. Agric Water Manag 193:163–173. https://doi.org/10.1016/j.agwat.2017.08.003
Ferreira LB, da Cunha FF (2020) New approach to estimate daily reference evapotranspiration based on hourly temperature and relative humidity using machine learning and deep learning. Agric Water Manag 234:106113. https://doi.org/10.1016/j.agwat.2020.106113
Fischer G, Tubiello FN, van Velthuizen H, Wiberg DA (2007) Climate change impacts on irrigation water requirements: effects of mitigation, 1990–2080. Technol Forecast Soc Change 74:1083–1107. https://doi.org/10.1016/j.techfore.2006.05.021
Gavilan P, Berengena J, Allen RG (2007) Measuring versus estimating net radiation and soil heat flux: impact on Penman-Monteith reference ET estimates in semiarid regions. Agric Water Manag 89(3):275
Gleeson T, Wada Y, Bierkens MFP, van Beek LPH (2012) Water balance of global aquifers revealed by groundwater footprint. Nature 488:197–200. https://doi.org/10.1038/nature11295
Granata F (2019) Evapotranspiration evaluation models based on machine learning algorithms—a comparative study. Agric Water Manag 217:303–315. https://doi.org/10.1016/j.agwat.2019.03.015
Guo X, Sun X, Ma J (2011) Prediction of daily crop reference evapotranspiration (ETo) values through a least-squares support vector machine model. Hydrol Res 42:268–274. https://doi.org/10.2166/nh.2011.07222
Han Y, Wu J, Zhai B et al (2019) Coupling a bat algorithm with XGBoost to estimate reference evapotranspiration in the arid and semiarid regions of China. Adv Meteorol, pp 1–16. https://doi.org/10.1155/2019/9575782
Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 20(8):832–844. https://doi.org/10.1109/34.709601
Ibrahim D (2016) An overview of soft computing. Proc Comp Sci 102:34
Karimaldini F, Teang Shui L, Ahmed Mohamed T, Abdollahi M, Khalili N (2011) Daily evapotranspiration modeling from limited weather data by using neuro-fuzzy computing technique. J Irrig Drain Eng 138(1):21
Kisi O, Cimen M (2009) Evapotranspiration modelling using support vector machines. Hydrol Sci J 54:918–928. https://doi.org/10.1623/hysj.54.5.918
Kisi O (2013) Least squares support vector machine for modeling daily reference evapotranspiration. Irrig Sci 31:611–619. https://doi.org/10.1007/s00271-012-0336-2
Kisi O (2015) Pan evaporation modeling using least square support vector machine, multivariate adaptive regression splines and M5 model tree. J Hydrol 528:312–320. https://doi.org/10.1016/j.jhydrol.2015.06.052
Kisi O, Sanikhani H, Zounemat-Kermani M, Niazi F (2015) Long-term monthly evapotranspiration modeling by several data-driven methods without climatic data. Comput Electron Agric 115:66–77. https://doi.org/10.1016/j.compag.2015.04.015
Kumar R, Shankar V, Kumar M (2011) Modelling of crop reference evapotranspiration: a review. Universal J Environ Res Technol 1(3):239
Kushwaha NL, Rajput J, Elbeltagi A, Elnaggar AY, Sena DR, Vishwakarma DK, Mani I, Hussein EE (2021) Data intelligence model and meta-heuristic algorithms-based pan evaporation modelling in two different agro-climatic zones: a case study from Northern India. Atmosphere 12:1654. https://doi.org/10.3390/atmos12121654
Kushwaha NL, Bhardwaj A, Verma VK (2016) Hydrologic response of Takarla-Ballowal Watershed in Shivalik foot-hills based on morphometric analysis using remote sensing and GIS. J Indian Water Resour Soc 36:17–25. http://iwrs.org.in/36-1/
Łabędzki L, Kanecka-Geszke E, Bak B, Slowinska S (2011) Estimation of reference evapotranspiration using the FAO Penman-Monteith method for climatic conditions of Poland. In: Łabędzki L (ed) Evapotranspiration. InTech
Lasota T, Łuczak T, Niemczyk M, Olszewski M, Trawiński B (2013) Investigation of property valuation models based on decision tree ensembles built over noised data. In: Bǎdicǎ C., Nguyen N.T., Brezovan M. (eds) Computational collective intelligence. technologies and applications. ICCCI 2013. Lecture Notes in Computer Science, vol 8083. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40495-5_42
Lu G, Wu Z, He H (2010) Hydrological cycle and quantity forecast. Beijing, Science Press (in Chinese).
Malik A, Kumar A, Kisi O (2018) Daily pan evaporation estimation using heuristic methods with gamma test. J Irrig Drain Eng 144. https://doi.org/10.1061/(ASCE)IR.1943-4774.0001336
Malik A, Kumar A, Ghorbani MA, Kashani MH, Kisi O, Kim S (2019) The viability of co-active fuzzy inference system model for monthly reference evapotranspiration estimation: case study of Uttarakhand State. Hydrol Res 50(6):1623–1644. https://doi.org/10.2166/nh.2019.059
Mattar MA (2018) Using gene expression programming in monthly reference evapotranspiration modeling: a case study in Egypt. Agric Water Manag 198:28–38. https://doi.org/10.1016/j.agwat.2017.12.017
McMahon T, Peel M, Lowe L, Srikanthan R, McVicar T (2013) Estimating actual, potential, reference crop and pan evaporation using standard meteorological data: a pragmatic synthesis. Hydrol Earth Syst Sci 17(4):1331
Mehdizadeh S, Behmanesh J, Khalili K (2017) Using MARS, SVM, GEP and empirical equations for estimation of monthly mean reference evapotranspiration. Comput Electron Agric 139:103–114. https://doi.org/10.1016/j.compag.2017.05.002
Nourani V, Elkiran G, Abdullahi J (2019) Multi-station artificial intelligence based ensemble modeling of reference evapotranspiration using pan evaporation measurements. J Hydrol 577:123958. https://doi.org/10.1016/j.jhydrol.2019.123958
Nouri H, Beecham S, Kazemi F, Hassanli A, Anderson S (2013) Remote sensing techniques for predicting evapotranspiration from mixed vegetated surfaces. Hydrology and Earth System Sciences Discussions 10(3):3897
Quinlan JR (1987) Simplifying decision trees. Int J Man Mach Stud 27(3):221–234. https://doi.org/10.1016/S0020-7373(87)80053-6
Quinlan JR (1992) Learning with continuous classes. In: Adams, Sterling (eds) 5th Australian joint conference on artificial intelligence, vol 92. World Scientific, Singapore, pp 343–348. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.34.885&rep=rep1&type=pdf
Rahimikhoob A (2010) Estimation of evapotranspiration based on only air temperature data using artificial neural networks for a subtropical climate in Iran. Theoret Appl Climatol 101(1):83
Rahman M, Chen N, Elbeltagi A, Islam MM, Alam M, Pourghasemi HR et al (2021) Application of stacking hybrid machine learning algorithms in delineating multi-type flooding in Bangladesh. J Environ Manage 295:113086. https://doi.org/10.1016/j.jenvman.2021.113086
Raza A, Hu Y, Shoaib M, Abd Elnabi MK, Zubair M, Nauman M, Syed NR (2021a) A systematic review on estimation of reference evapotranspiration under prisma guidelines. Polish J Environ Stud. https://doi.org/10.15244/pjoes/136348
Raza A, Shoaib M, Baig MAI, Ahmad S, Khan MM, Ullah MK, Hashim S (2021b) Comparative study of powerful predictive modeling techniques for modeling monthly reference evapotranspiration in various climatic regions. Fresenius Environ Bull 30(6b):7490–7513.
Rossini M, Fava F, Cogliati S et al (2013) Assessing canopy PRI from airborne imagery to map water stress in maize. ISPRS J Photogramm Remote Sens 86:168–177. https://doi.org/10.1016/j.isprsjprs.2013.10.002
Roy DK, Lal A, Sarker KK, Saha KK, Datta B (2021) Optimization algorithms as training approaches for prediction of reference evapotranspiration using adaptive neuro fuzzy inference system. Agric Water Manag 255:107003. https://doi.org/10.1016/j.agwat.2021.107003
Saggi MK, Jain S (2019) Reference evapotranspiration estimation and modeling of the Punjab Northern India using deep learning. Comput Electron Agric 156:387–398. https://doi.org/10.1016/j.compag.2018.11.031
Sattari MT, Apaydin H, Band SS, Mosavi A, Prasad R (2021) Comparative analysis of kernel-based versus ANN and deep learning methods in monthly reference evapotranspiration estimation. Hydrol Earth Syst Sci 25(2):603–618. https://doi.org/10.5194/hess-25-603-2021
Shamshirband S, Kamsin A (2016) Comparative analysis of reference evapotranspiration equations modelling by extreme learning machine. Comput Electron Agric 127:56–63. https://doi.org/10.1016/j.compag.2016.05.017
Shiri J (2018) Improving the performance of the mass transfer-based reference evapotranspiration estimation approaches through a coupled wavelet-random forest methodology. J Hydrol 561:737–750. https://doi.org/10.1016/j.jhydrol.2018.04.042
Shukla R, Kumar P, Vishwakarma DK, Ali R, Kumar R, Kuriqi A (2021) Modeling of stage-discharge using back propagation ANN-, ANFIS-, and WANN-based computing techniques. Theor Appl Climatol. https://doi.org/10.1007/s00704-021-03863-y
Skurichina M, Duin RP (2002) Bagging, boosting and the random subspace method for linear classifiers. Pattern Anal Appl 5(2):121–135. https://doi.org/10.1007/s100440200011
Sparapani R, Spanbauer C, McCulloch R (2021) Nonparametric machine learning and efficient computation with Bayesian additive regression trees: the BART R package. J Stat Softw 97(1):1–66. https://doi.org/10.18637/jss.v097.i01
Summit Sherma RDG (2016) Prediction of evapotranspiration by artificial neural network and conventional methods. Int J Eng Res 5(1):184.
Tabari H, Martinez C, Ezani A, Hosseinzadeh Talaee P (2013) Applicability of support vector machines and adaptive neurofuzzy inference system for modeling potato crop evapotranspiration. Irrig Sci 31:575–588. https://doi.org/10.1007/s00271-012-0332-6
Tan YV, Roy J (2019) Bayesian additive regression trees and the General BART model. Stat Med 38(25):5048–5069. https://doi.org/10.1002/sim.8347
Tang D, Feng Y, Gong D et al (2018) Evaluation of artificial intelligence models for actual crop evapotranspiration modeling in mulched and non-mulched maize croplands. Comput Electron Agric 152:375–384. https://doi.org/10.1016/j.compag.2018.07.029
Torres AF, Walker WR, Mckee M (2011) Forecasting daily potential evapotranspiration using machine learning and limited climatic data. Agric Water Manag 98:553–562. https://doi.org/10.1016/j.agwat.2010.10.012
Trajkovic S, Kolakovic S (2009) Estimating reference evapotranspiration using limited weather data. J Irrig Drain Eng 135(4):443
Traore S, Luo Y, Fipps G (2016) Deployment of artificial neural network for short-term forecasting of evapotranspiration using public weather forecast restricted messages. Agric Water Manag 163:363–379. https://doi.org/10.1016/j.agwat.2015.10.009
Valipour M, Sefidkouhi MAG, Raeini-Sarjaz M, Guzman SM (2019) A hybrid data-driven machine learning technique for evapotranspiration modeling in various climates. Atmosphere (Basel) 10:311. https://doi.org/10.3390/atmos10060311
Vinukollu RK, Wood EF, Ferguson CR, Fisher JB (2011) Global estimates of evapotranspiration for climate studies using multi-sensor remote sensing data: Evaluation of three process-based approaches. Remote Sens Environ 115(3):801
Vishwakarma DK, Pandey K, Kaur A, Kushwaha NL, Kumar R, Ali R, Elbeltagi A, Kuriqi A (2022) Methods to estimate evapotranspiration in humid and subtropical climate conditions. Agric Water Manag 261:107378. https://doi.org/10.1016/j.agwat.2021.107378
Walls S, Binns AD, Levison J (2020) Prediction of actual evapotranspiration by artificial neural network models using data from a Bowen ratio energy balance station. Neural Comput Appl 32:14001–14018. https://doi.org/10.1007/s00521-020-04800-2
Wang L, Kisi O, Hu B, Bilal M, Zounemat-Kermani M, Li H (2017a) Evaporation modelling using different machine learning techniques. Int J Climatol 37:1076–1092. https://doi.org/10.1002/joc.5064
Wang L, Kisi O, Zounemat-Kermani M, Li H (2017b) Pan evaporation modeling using six different heuristic computing methods in different climates of China. J Hydrol 544:407–427. https://doi.org/10.1016/j.jhydrol.2016.11.059
Wang Y, Witten IH (1996) Induction of model trees for predicting continuous classes. (Working paper 96/23). Hamilton, New Zealand: University of Waikato, Department of Computer Science. https://hdl.handle.net/10289/1183
Wu L, Peng Y, Fan J, Wang Y, Huang G (2021) A novel kernel extreme learning machine model coupled with K-means clustering and firefly algorithm for estimating monthly reference evapotranspiration in parallel computation. Agric Water Manag 245:106624. https://doi.org/10.1016/j.agwat.2020.106624
Zhao L, Xia J, Xu C-Y, Wang Z, Sobkowiak L, Long C (2013) Evapotranspiration estimation methods in hydrological models. J Geogr Sci 23(2):359

Acknowledgements

We thank the Pakistan Meteorological Department for providing access to the climatic data.

Funding

The authors received no specific funding for this work.

Author information

Authors and Affiliations

Faculty of Agriculture, Agricultural Engineering Department, Mansoura University, Mansoura, 35516, Egypt
Ahmed Elbeltagi
School of Agricultural Engineering, Jiangsu University, Zhenjiang, 212013, People’s Republic of China
Ali Raza & Yongguang Hu
Environmental and Natural Resources Engineering, Luleå University of Technology, 97187, Luleå, Sweden
Nadhir Al-Ansari
Division of Agricultural Engineering, ICAR–Indian Agricultural Research Institute, New Delhi, 110012, India
N. L. Kushwaha
Department of Civil Engineering, Indian Institute of Technology (IIT) Kharagpur, Kharagpur, 721302, West Bengal, India
Aman Srivastava
Department of Irrigation and Drainage Engineering, G.B. Pant University of Agriculture and Technology, Pantnagar, Uttarakhand, 263145, India
Dinesh Kumar Vishwakarma
School of Transportation, Southeast University, Nanjing, 21009, China
Muhammad Zubair

Contributions

AR was involved in conceptualization and data curation; DKV, AS, and AE were involved in performance analysis; AE and AR were involved in writing the initial draft; NA was involved in funding; YH was involved in supervision; and NLK, AS, and MZ were involved in review and editing.

Corresponding authors

Correspondence to Ahmed Elbeltagi, Ali Raza or Nadhir Al-Ansari.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

Not applicable.

Consent to participate

Not applicable.

Consent to Publish

All authors give their permission.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Elbeltagi, A., Raza, A., Hu, Y. et al. Data intelligence and hybrid metaheuristic algorithms-based estimation of reference evapotranspiration. Appl Water Sci 12, 152 (2022). https://doi.org/10.1007/s13201-022-01667-7

Received: 13 December 2021
Accepted: 29 March 2022
Published: 06 May 2022
DOI: https://doi.org/10.1007/s13201-022-01667-7
