1 Introduction

Evapotranspiration (ET) is the combined process by which water is lost from the soil surface by evaporation and from the crop by transpiration [1]. It comprises evaporation from open water, bare soil, and intercepted rainfall as well as transpiration from plants [2]. ET is a key process in the Earth's energy and water budgets, with implications for water-management sectors such as agriculture and hydropower generation. Estimates of ET are a prerequisite for irrigation planning [3]. Accurate quantification of ET, especially in arid and semi-arid areas, is important for agroecosystem management and the efficient use of water resources, for example in irrigation management, water allocation, environmental assessment, protection of groundwater and surface-water quality, and evaluation of water yield under changing land use [4].

ET also plays an important role in the land-surface energy and hydrologic cycles [2]. Averaged across all continents, annual ET amounts to about 70% of annual precipitation, ranging from up to 90% in dry regions (e.g., Australia) to approximately 60% in Europe [5]. Changes in actual evapotranspiration (AET) alter the partitioning of surface energy into sensible and latent heat and hence atmospheric dynamics, with direct effects on weather and climate [6]. A detailed knowledge of changes in regional and global ET is therefore needed to understand the global water cycle, the energy budget, and their role in the climate system [7].

Actual evapotranspiration (AET) can be measured with lysimeters, the Bowen ratio method, or the eddy covariance technique [8]. Unlike routine meteorological observations, however, in-situ measurements of AET are unavailable in many regions of the world [9]. Evapotranspiration is a complex, nonlinear phenomenon because it depends on several interacting factors, such as temperature, humidity, wind speed, radiation, and the type and growth stage of the crop [10, 11]. Because direct lysimeter measurement of AET is difficult, potential evapotranspiration (PET) and Class A pan evaporation measurements have been used to estimate AET [12, 13]. PET cannot be observed directly; it is usually estimated with empirical equations or PET models [14]. PET represents the upper limit of AET and is used to estimate it [9]. PET is the total amount of moisture that would evaporate from the land surface given a sufficient water supply under given meteorological conditions, without advection or heating effects, whereas AET is the total amount of ET that actually occurs [13, 15, 16].

Today, PET and reference evapotranspiration (ETo) are commonly considered to be the same [17]. The Penman–Monteith method of the Food and Agriculture Organization (FAO56) estimates the reference crop evapotranspiration, or simply reference evapotranspiration (ETo), defined as the ET rate from a reference surface not short of water [1]. The method was developed for a hypothetical reference crop with an assumed height of 0.12 m, a surface resistance of 70 s m−1, and an albedo of 0.23, closely resembling the evaporation from an extensive surface of green grass of uniform height, actively growing and adequately watered [1]. The equation of [1] has been recommended as the standard ETo equation, but it has a high climatic data requirement. There is therefore a practical need for the best alternative methods to estimate ETo in areas where complete climatic data are lacking [18]. Among the available conventional methods, the preferred ones are those that give reliable results with minimum data requirements [19]. Moreover, the methods yield different estimates because they have different data requirements and were developed for different climate regions; this reflects the fact that ET varies with weather and climate conditions. Hence, for a particular climate region, the most reliable method(s) must be identified among the numerous available methods, or a new method suited to that climate must be developed. Many studies have compared different PET models [3, 16, 20, 25]; their conclusions differ, and the performance of PET models varies across climate regimes. As a result, numerous PET formulae exist.
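To make the data requirement of the reference method concrete, the following Python sketch evaluates the standard daily FAO-56 Penman–Monteith equation for a single day. It is a minimal illustration, not the implementation used in this study: it assumes that net radiation (`rn`) and soil heat flux (`g`) are already known in MJ m−2 day−1 (in practice they are derived from sunshine duration, temperature, and humidity), and it obtains actual vapour pressure from mean relative humidity only. The function and argument names are hypothetical.

```python
import math

def fao56_eto(t_mean, t_max, t_min, rh_mean, u2, rn, g=0.0, z=0.0):
    """Daily FAO-56 Penman-Monteith reference ET (mm/day), minimal sketch.

    t_mean, t_max, t_min : air temperature (deg C)
    rh_mean              : mean relative humidity (%)
    u2                   : wind speed at 2 m (m/s)
    rn, g                : net radiation and soil heat flux (MJ m-2 day-1), assumed known
    z                    : station elevation (m a.s.l.)
    """
    def sat_vp(t):
        # saturation vapour pressure (kPa) at temperature t
        return 0.6108 * math.exp(17.27 * t / (t + 237.3))

    es = (sat_vp(t_max) + sat_vp(t_min)) / 2.0               # mean saturation vapour pressure
    ea = es * rh_mean / 100.0                                 # actual vapour pressure from mean RH
    delta = 4098.0 * sat_vp(t_mean) / (t_mean + 237.3) ** 2  # slope of vapour pressure curve
    p = 101.3 * ((293.0 - 0.0065 * z) / 293.0) ** 5.26        # atmospheric pressure (kPa)
    gamma = 0.665e-3 * p                                      # psychrometric constant (kPa/deg C)
    num = 0.408 * delta * (rn - g) + gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea)
    den = delta + gamma * (1.0 + 0.34 * u2)
    return num / den
```

For example, `fao56_eto(20, 26, 14, 60, 2.0, 12.0, z=1800)` returns roughly 4.4 mm/day. Every one of these inputs must be observed or derived, which is why simpler PET models are attractive where stations are sparse.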

Based on the types of input variables, [25] and [26] classified PET models into three groups: radiation-based, temperature-based, and combination models. PET models can also be classified into five main groups: (1) water budget, (2) mass transfer, (3) radiation-based, (4) temperature-based, and (5) combination [27]. For example, the widely used Penman–Monteith (FAO56) method for estimating reference crop evapotranspiration is a combination model.

A few studies in Ethiopia have evaluated the performance of PET models using either short-period observations [28, 29] or a single station [30]. Their results may therefore represent neither the climatology (the long-period average) nor a large country such as Ethiopia, where the climate varies regionally. Evaluating the performance of different PET models over longer periods at sites with different climate regimes across Ethiopia was therefore necessary [20]. In this study, the performance of seventeen PET models at five climatologically different sites across Ethiopia is evaluated for the period 1982–2020 against ETo estimated with the FAO56 (Penman–Monteith) method.

The merit of this study is therefore to identify the most reliable method for estimating ETo or PET in places that lack the full meteorological data needed to apply the FAO56 method [1]. The findings are particularly relevant for countries, such as Ethiopia, that have few synoptic or principal meteorological stations. Ethiopia was selected as a case study because the stations recording the meteorological parameters needed to calculate ETo are few and sparsely distributed. In addition, most stations have short records with missing values.

2 Materials and methods

2.1 Study area

The study was conducted at five sites in Ethiopia (Fig. 1). Ethiopia is located in the Horn of Africa, and its climate varies from dry to sub-humid [31]. Ethiopia’s climate is diverse over short horizontal distances, ranging from tropical to sub-humid and from subtropical to arctic [32]. About 43% of Ethiopia is highland (altitude above 1500 m a.s.l.), spanning all 72 zones of the country [33]. Almost half of all the highlands of Africa are found in Ethiopia [30]. The main rainy season in the Ethiopian highlands is June–September [34]. This summer period, locally known as ‘Kiremt’, accounts for 50–70% of the mean annual rainfall. The winter period, October–January, locally known as ‘Bega’, is the dry season in most parts of Ethiopia except some southern parts. February–May, locally known as ‘Belg’, is the second rainy period in most parts of Ethiopia, accounting for 20–30% of the mean annual rainfall.

Fig. 1

Study area: Ethiopia, located in East Africa, and the five meteorological stations

Climate data for the five study sites, namely Bahir Dar, Bale Robe, Hawassa, Metehara, and Nazareth (Adama), were obtained from the Ethiopia Meteorology Institute, formerly the National Meteorology Agency of Ethiopia (NMA), and its regional offices at Bahir Dar, Bale Robe, Hawassa, and Nazareth.

Bahir Dar, often described as one of the most beautiful cities in Africa, is the capital of the Amhara regional state of Ethiopia, where 85% of the water of the Nile, the longest river in the world, originates. The Nile flows through Bahir Dar, and Lake Tana (area: 2156 km²), the largest lake in Ethiopia, is also part of the city. The Amhara region is currently characterized by erratic rainfall, severe land degradation, high population density, and high rates of poverty and malnutrition [35]. It has a monsoonal climate, with annual rainfall of 800–3000 mm and annual evapotranspiration of 1400–1681 mm [30].

Bale Robe is the capital town of the Bale zone; ten years ago, Bale Goba, which has more tourist attractions, was the capital of the zone. The Bale zone has a bimodal rainfall regime, with about 590 mm in Kiremt, 560 mm in Belg, and almost no rainfall in Bega. Bale is highly productive and well suited to agriculture. The highest elevations in the Bale Highlands are Mount Tuludimtu (4377 m) and Batu (4307 m).

Hawassa, like Bahir Dar, is often described as one of the most beautiful cities in Africa. It is the capital city of the southern part of Ethiopia, and Lake Langano (area: 230 km²) is in its vicinity. Hawassa, Metehara, and Nazareth lie in the Great Rift Valley of Africa. Metehara station is characterized by an arid and semi-arid environment. In the districts surrounding Metehara and Nazareth, livestock production is the main source of income, followed by mixed crop–livestock production [36].

2.2 Data

For 1982–2020, 14,245 daily observations of each meteorological parameter, namely precipitation (P), maximum air temperature (Tmax), minimum air temperature (Tmin), air temperature (T), relative air humidity (RH), wind speed at 2 m (u2), and sunshine duration (SD), were obtained from the Ethiopia Meteorology Institute (EMI). More than 0.8% of the data were missing for RH, u2, and SD, whereas P, Tmax, and Tmin had almost no missing values. The missing values of these three daily parameters (RH, u2, and SD) were filled by fitting simple linear regression equations between the station observations and global re-analysis and gridded data sets, such as those of the National Aeronautics and Space Administration (NASA) and the Climatic Research Unit (CRU) (Table 1).
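The gap-filling step can be sketched as follows: a straight line is fitted by ordinary least squares between the station series and a co-located gridded series on days where both are available, and the fitted line then predicts the missing station values. This sketch assumes the two series are already aligned day by day; the function name and toy arrays are hypothetical, and it does not reproduce how the NASA or CRU data were accessed in the study.

```python
import numpy as np

def fill_gaps_by_regression(station, gridded):
    """Fill NaNs in a station series using a linear fit against a gridded series.

    Returns the filled series and the fitted (slope, intercept).
    """
    station = np.asarray(station, dtype=float)
    gridded = np.asarray(gridded, dtype=float)
    both = ~np.isnan(station) & ~np.isnan(gridded)          # days usable for fitting
    a, b = np.polyfit(gridded[both], station[both], deg=1)  # station ~ a * gridded + b
    filled = station.copy()
    gaps = np.isnan(station) & ~np.isnan(gridded)           # fillable missing days
    filled[gaps] = a * gridded[gaps] + b
    return filled, (a, b)
```

Usage: `filled, (a, b) = fill_gaps_by_regression(rh_station, rh_gridded)`, where both arrays (hypothetical names) hold one value per day for the same dates.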

Table 1 Brief overview of the meteorological stations used in this study (1982–2020)

2.3 Potential evapotranspiration (PET) models

The Food and Agriculture Organization (FAO) Penman–Monteith method (FAO56) for estimating reference evapotranspiration (ETo) [1] is taken as the sole standard method for estimating reference evapotranspiration and for comparing potential evapotranspiration estimates across the world [37]. In this study, seventeen methods for estimating potential evapotranspiration (PET) are compared with the FAO56 method (see Eq. 2.7 in Table 2). The seventeen PET methods were selected on the basis of a literature review. Enku (Enk) and Priestley–Taylor (PT) were selected because of their particular suitability for the climate of Ethiopia (see Table 2 for references). Penman (1963) (Pen) and Thornthwaite (1948) (Tho) were chosen because of their high global acceptance. The Hargreaves (1985) method (Har), which requires air temperature and extraterrestrial solar radiation (computed from the latitude of the site), has been used in many countries. Wendling (1991) (Wen) was selected mainly because it is intended for data-sparse areas. Antensay Mekoya’s method (Ant) is recent and was applied to the climate of Germany; this study checks its suitability for the climate of Ethiopia. The Albrecht (Alb), Schendel (Sch), Turc (Tur), Makkink (Mak), and Blaney–Criddle (BC) methods were selected because they have been applied in many tropical regions of the world. In a recent study [20], the temperature-based Hamon version 2 method (Ha2) was found to be the second-best method after FAO56 for estimating pan evaporation in China. Compared with solar radiation, humidity and wind speed have a relatively minor effect on ET estimates [38, 39, 40]; the so-called solar radiation method (Rad) is therefore also included. Compared with FAO56 (FAO), all seventeen methods are easier to apply, and they have been used in many countries where a full set of climate data is not available.
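As an illustration of how little input a temperature-based model needs, the sketch below computes Hargreaves ETo from Tmax, Tmin, and extraterrestrial radiation Ra derived from latitude and day of year. It follows the widely published FAO-56 form of the Hargreaves (1985) equation, ETo = 0.0023 (Tmean + 17.8) (Tmax − Tmin)^0.5 × 0.408 Ra, with Ra from the FAO-56 astronomical formulas; the exact variant listed in Table 2 may differ, and the function names are hypothetical.

```python
import math

GSC = 0.0820  # solar constant (MJ m-2 min-1)

def extraterrestrial_radiation(lat_deg, doy):
    """Daily extraterrestrial radiation Ra (MJ m-2 day-1) from latitude (deg) and day of year."""
    phi = math.radians(lat_deg)
    dr = 1 + 0.033 * math.cos(2 * math.pi * doy / 365)        # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(2 * math.pi * doy / 365 - 1.39)  # solar declination (rad)
    ws = math.acos(-math.tan(phi) * math.tan(delta))          # sunset hour angle (rad)
    return (24 * 60 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )

def hargreaves_1985(t_max, t_min, lat_deg, doy):
    """Hargreaves (1985) reference ET in mm/day (FAO-56 form)."""
    t_mean = (t_max + t_min) / 2
    ra_mm = 0.408 * extraterrestrial_radiation(lat_deg, doy)  # MJ m-2 day-1 -> mm/day equivalent
    return 0.0023 * ra_mm * (t_mean + 17.8) * math.sqrt(t_max - t_min)
```

For a check, `extraterrestrial_radiation(-20, 246)` (20°S, 3 September) gives about 32.2 MJ m−2 day−1, the value of the corresponding worked example in FAO-56.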

Table 2 The PET models (methods) used in the study; for names of input variables see Table 3
Table 3 List of input variables used for the calculation of PET models

2.4 Model validation metrics

The seventeen PET models are compared with each other and with the reference method (FAO56) using model validation metrics: Nash–Sutcliffe efficiency (NSE), root mean square error (RMSE), mean percentage error (MPE) and mean absolute percentage error (MAPE), standard deviation (s) or, equivalently, the difference in standard deviation from the reference method (sd), and the correlation coefficient (r) [20, 51–56]. In computing these statistics, the reference method (FAO56) was treated as the measured (observed) values (x) and each of the seventeen PET models as the estimated (simulated) values (y). In the linear regression equation y = ax + b, the intercept (b) and slope (a) indicate how well y matches x: a nonzero intercept indicates a lead or lag, or that the data sets are not perfectly aligned, while the slope indicates the magnitude of the relationship between model predictions and measured data [57].
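The slope and intercept described above can be obtained with an ordinary least-squares fit of model PET (y) on FAO56 ETo (x). A minimal NumPy sketch follows; the function name is hypothetical. A slope near 1 and an intercept near 0 indicate close agreement with the reference.

```python
import numpy as np

def regression_slope_intercept(x_ref, y_model):
    """Least-squares fit y = a*x + b of model PET (y) on FAO56 ETo (x); returns (a, b)."""
    x = np.asarray(x_ref, dtype=float)
    y = np.asarray(y_model, dtype=float)
    a, b = np.polyfit(x, y, deg=1)  # polyfit returns the highest-degree coefficient first
    return float(a), float(b)
```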

2.5 Standard deviation (s)

$$s = \sigma = \sqrt{\frac{\sum_{i=1}^{N}\left(x_i - \mu\right)^2}{N}}$$
(2.19)

where s (= \(\sigma\)) is the standard deviation, computed with divisor N over the whole population; \(\mu\) is the population mean; \(x_i\) is each value in the population; and N = 14,245 is the population size (the number of days from 1/1/1982 to 31/12/2020). In this study, the standard deviation difference (sd) was calculated by subtracting the standard deviation of the reference method from that of each PET model.
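Because Eq. 2.19 divides by N, it corresponds to NumPy's default `np.std` (ddof=0). The sketch below computes s and the standard deviation difference sd as described above; the function names are hypothetical.

```python
import numpy as np

def std_population(values):
    """Standard deviation with divisor N, as in Eq. 2.19."""
    # np.std uses ddof=0 by default, i.e. it divides by N (population form)
    return float(np.std(np.asarray(values, dtype=float)))

def std_difference(pet_model, eto_fao):
    """sd = s(PET model) - s(FAO56 ETo); > 0 means the model varies more than FAO56."""
    return std_population(pet_model) - std_population(eto_fao)
```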

2.6 Nash–Sutcliffe model efficiency coefficient (NSE)

$${\text{NSE }} = 1 - \frac{{{\Sigma }_{{{\text{t}} = 1}}^{{\text{n}}} \left( {{\text{X}}_{{\text{t}}} - {\text{Y}}_{{\text{t}}} } \right)^{2} }}{{{\Sigma }_{{{\text{t}} = 1}}^{{\text{n}}} \left( {{\text{X}}_{{\text{t}}} - {\text{X}}_{{\text{m}}} } \right)^{2} }}$$
(2.20)

where \(Y_t\) is the PET estimated by each of the seventeen PET models at time t, \(X_t\) is the PET by the FAO56 method (FAO) at time t, and \(X_m\) is the mean PET by the FAO56 method; t runs from 1 to n = 14,245.
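A direct NumPy transcription of Eq. 2.20 is sketched below (function name hypothetical). NSE equals 1 for a perfect match and 0 when a model is no better than the FAO56 mean; it becomes negative, without a lower bound, when the model's squared errors exceed the variance of the reference.

```python
import numpy as np

def nse(x_ref, y_model):
    """Nash-Sutcliffe efficiency (Eq. 2.20); x_ref = FAO56 ETo, y_model = PET model."""
    x = np.asarray(x_ref, dtype=float)
    y = np.asarray(y_model, dtype=float)
    return float(1.0 - np.sum((x - y) ** 2) / np.sum((x - x.mean()) ** 2))
```

For instance, a model that returns the FAO56 series in reverse order, `nse([1, 2, 3, 4], [4, 3, 2, 1])`, gives −3.0.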

2.7 Root mean square error (RMSE)

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Ep_i - Et_i\right)^2}$$
(2.21)

where \(Ep_i\) is the PET estimated by each of the 17 PET models on day i, \(Et_i\) is the corresponding reference value (ETo by the FAO56 method), and n is the total number of simulated values. RMSE was used as the standard statistical metric; because errors are squared before averaging, it gives relatively high weight to large errors.
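A minimal NumPy sketch of RMSE, with the square root taken over the whole mean of squared differences (function name hypothetical):

```python
import numpy as np

def rmse(ep, et):
    """Root mean square error: square root of the mean squared difference (mm/day)."""
    ep = np.asarray(ep, dtype=float)  # PET model estimates
    et = np.asarray(et, dtype=float)  # FAO56 reference ETo
    return float(np.sqrt(np.mean((ep - et) ** 2)))
```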

2.8 Centered root mean square error (CRMSE)

$${\text{CRMSE}} = \sqrt {\left( {\frac{{\mathbf{1}}}{{\mathbf{n}}}} \right)\sum\nolimits_{{{\mathbf{i = 1}}}}^{{\mathbf{n}}} {\left[ {\left( {{\mathbf{Ep}}_{{\mathbf{i}}} {\mathbf{ - Ep}}_{{{\mathbf{mean}}}} } \right){\mathbf{ - }}\left( {{\mathbf{Et}}_{{\mathbf{i}}} {\mathbf{ - Et}}_{{{\mathbf{mean}}}} } \right)} \right]}^{{\mathbf{2}}} }$$
(2.22)

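CRMSE is called centered because the mean of each series (observations and predictions) is subtracted before the differences are computed; it therefore measures the error in the pattern of variation after the overall bias has been removed.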
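The sketch below computes CRMSE and checks two standard identities on synthetic data: the squared RMSE equals the squared mean bias plus the squared CRMSE, and CRMSE² = s_p² + s_r² − 2 s_p s_r r (population standard deviations and Pearson r), the relation that underlies Taylor diagrams such as those used in this study. The function name is hypothetical and the random series are stand-ins, not study data.

```python
import numpy as np

def crmse(ep, et):
    """Centered RMSE (Eq. 2.22): RMSE after removing each series' own mean."""
    ep = np.asarray(ep, dtype=float)
    et = np.asarray(et, dtype=float)
    anomalies = (ep - ep.mean()) - (et - et.mean())
    return float(np.sqrt(np.mean(anomalies ** 2)))

# Check the decompositions on synthetic data (not study data).
rng = np.random.default_rng(0)
et = 3.5 + 0.5 * rng.standard_normal(1000)             # stand-in for FAO56 ETo
ep = 0.8 * et + 1.0 + 0.3 * rng.standard_normal(1000)  # stand-in for a PET model
rmse = np.sqrt(np.mean((ep - et) ** 2))
bias = ep.mean() - et.mean()
s_p, s_r = ep.std(), et.std()                           # population std (ddof=0)
r = np.corrcoef(ep, et)[0, 1]
print(np.isclose(rmse ** 2, bias ** 2 + crmse(ep, et) ** 2))                # True
print(np.isclose(crmse(ep, et) ** 2, s_p**2 + s_r**2 - 2 * s_p * s_r * r))  # True
```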

2.9 Mean percentage of error (MPE)

$${\text{MPE }} = { }\frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {\frac{{\left( { Y_{i} - X_{i} } \right)}}{{X_{i} }}{ } \times 100{\text{ \% }}} \right)$$
(2.23)

where \(X_i\) and \(Y_i\) are the FAO56 and model PET values on day i. MPE = 0 is optimal; positive and negative values indicate over- and underestimation of PET, respectively, by the seventeen PET models relative to FAO56 [58].
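A minimal NumPy version of Eq. 2.23 (function name hypothetical); the sign convention matches the text, so a positive result means the model overestimates FAO56 on average.

```python
import numpy as np

def mpe(y_model, x_ref):
    """Mean percentage error (Eq. 2.23) in %; > 0 means the model overestimates FAO56."""
    y = np.asarray(y_model, dtype=float)
    x = np.asarray(x_ref, dtype=float)
    return float(np.mean((y - x) / x) * 100.0)
```

Because signed errors can cancel, `mpe([1.1, 1.8], [1, 2])` is essentially 0 even though each day is off by 10%.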

2.10 Mean absolute percentage of error (MAPE)

$${\text{MAPE }} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {\frac{{|Y_{i} - X_{i} |}}{{X_{i} }}{ } \times 100{\text{ \% }}} \right)$$
(2.24)

The mean absolute percentage error (MAPE), also called the mean absolute percentage deviation (MAPD), is a widely used measure of model (forecast) error. Because each error is divided by the corresponding reference value, it works best when the data contain no zeros or extreme values.
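Unlike MPE, MAPE does not let over- and underestimates cancel. The sketch below (function name hypothetical) rejects zero reference values, since Eq. 2.24 divides by \(X_i\).

```python
import numpy as np

def mape(y_model, x_ref):
    """Mean absolute percentage error (Eq. 2.24) in %."""
    y = np.asarray(y_model, dtype=float)
    x = np.asarray(x_ref, dtype=float)
    if np.any(x == 0):
        # Eq. 2.24 divides by the reference value, so zeros make it undefined
        raise ValueError("reference values must be non-zero for MAPE")
    return float(np.mean(np.abs(y - x) / x) * 100.0)
```

With the same data as the cancellation example for MPE, `mape([1.1, 1.8], [1, 2])` returns 10.0.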

2.11 Correlation coefficient (r)

The correlation coefficient is an index of the degree of linear relationship between observed and simulated data; Pearson’s product-moment correlation coefficient (r) is calculated as:

$${\text{r }} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {X_{i} - \overline{X}} \right)\left( {Y_{i} - \overline{Y}} \right)}}{{\sqrt {\left[ {\mathop \sum \nolimits_{i = 1}^{n} \left( {X_{i} - \overline{X}} \right)^{2} } \right]} \sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {Y_{i} - \overline{Y}} \right)^{2} } }}$$
(2.25)

where \(X_i\) is the ith observation of the variable being evaluated, \(Y_i\) is the ith simulated value, n is the total number of observations, and the overbar denotes the mean over the entire evaluation period; r ranges from −1 to 1. If r = 0, no linear relationship exists; if r = 1 or −1, a perfect positive or negative linear relationship exists, respectively.
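Eq. 2.25 can be written out directly in NumPy, as sketched below (function name hypothetical); it gives the same value as `np.corrcoef`.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation, written out as in Eq. 2.25."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / (np.sqrt(np.sum(dx ** 2)) * np.sqrt(np.sum(dy ** 2))))
```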

2.12 Methodology (research design)

This study compared and evaluated the performance of seventeen selected PET models in estimating reference evapotranspiration (ETo) at five sites in Ethiopia (see Appendix 1). First, daily PET values were calculated for each of the five sites with the seventeen PET models. The Penman–Monteith (FAO56) method for estimating reference crop evapotranspiration (ETo), denoted here simply as ‘FAO’, was the reference method. Second, the areal averages of ETo and of PET (for each of the seventeen models) over the five sites were used as proxy values for Ethiopia. To compare the seventeen PET models precisely, each model was ranked separately on each of five validation metrics (statistical measures): standard deviation difference (sd), RMSE, MPE, NSE, and correlation (r). The models were then ranked by the average of these five ranks to obtain a final rank. Third, of the seventeen PET models, the ten most reliable models, ranked 1st to 10th with r ≥ 0.58 and MAPE ≤ 27.3%, were retained for further analysis. The absolute values of sd and MPE (not shown in Tables 4, 5, 6, 7, 8, 9) were used to rank the seventeen PET models over Ethiopia and the ten PET models at the five sites. Each of the five validation metrics was given equal weight in the ranking. Box plots and Taylor diagrams were also used for visual comparison. Fourth, at each of the five sites, the ten screened PET models were compared with each other using FAO56 as the reference, applying the validation metrics in the same way as for the seventeen PET models.
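The ranking procedure described above can be sketched as follows: each model is ranked separately on |sd|, RMSE, |MPE|, NSE, and r (rank 1 = best), the five ranks are averaged with equal weight, and the averages are ranked again. The dictionary layout, function names, and tie-breaking rule (ties keep the input order) are assumptions for illustration, since the text does not state how ties were handled. The screening helper applies the r ≥ 0.58 and MAPE ≤ 27.3% thresholds given in the text.

```python
# For each metric, a key function where a smaller value means a better model.
METRIC_SCORES = {
    "sd": lambda s: abs(s["sd"]),    # smaller |sd| is better
    "rmse": lambda s: s["rmse"],     # smaller RMSE is better
    "mpe": lambda s: abs(s["mpe"]),  # smaller |MPE| is better
    "nse": lambda s: -s["nse"],      # larger NSE is better
    "r": lambda s: -s["r"],          # larger r is better
}

def rank_models(stats):
    """stats: hypothetical dict, model name -> dict of metrics vs FAO56.

    Returns (final_rank, average_rank); rank 1 is the best model.
    """
    models = list(stats)
    rank_sum = {m: 0 for m in models}
    for score in METRIC_SCORES.values():
        # sorted() is stable, so tied models keep their input order (a simplification)
        ordered = sorted(models, key=lambda m: score(stats[m]))
        for position, m in enumerate(ordered, start=1):
            rank_sum[m] += position
    average_rank = {m: rank_sum[m] / len(METRIC_SCORES) for m in models}
    final_order = sorted(models, key=lambda m: average_rank[m])
    final_rank = {m: i for i, m in enumerate(final_order, start=1)}
    return final_rank, average_rank

def screen(stats, r_min=0.58, mape_max=27.3):
    """Keep models meeting the correlation and MAPE (%) thresholds."""
    return [m for m in stats if stats[m]["r"] >= r_min and stats[m]["mape"] <= mape_max]
```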

Table 4 Statistical summary and rank of ten PET models as compared to FAO56 (FAO) over Ethiopia
Table 5 Statistical summary and rank of ten PET models as compared to FAO56 (FAO) at Bahir Dar
Table 6 Statistical summary and rank of ten PET models as compared to FAO56 (FAO) at Bale Robe
Table 7 Statistical summary and rank of ten PET models as compared to FAO56 (FAO) at Hawassa
Table 8 Statistical summary and rank of ten PET models as compared to FAO56 (FAO) at Metehara
Table 9 Statistical summary and rank of ten PET models as compared to FAO56 (FAO) at Nazareth

3 Results and discussions

3.1 Results

3.1.1 Areal average ranks of PET models

At each of the five sites, the correlation between each of the seventeen PET models and FAO56 was statistically significant at the 95% confidence level (df = n − 2 = 14,243, p < 0.05), indicating that all seventeen models are significantly (linearly) related to reference evapotranspiration. Six PET models, namely Alb, BC, Ha2, Ha3, Har, and Sch, strongly under- or overestimated PET (MAPE > 47%) and also had lower correlations (see Table 4 and Fig. 2). One model, Tur, had a negative correlation although it did not strongly over- or underestimate PET (MAPE = 12.24%). The average ETo of the five sites can be used as a proxy value for Ethiopia, particularly in data-scarce areas. On the basis of the areal average of the five sites, Wen, Ant, Pen, Mak, PT, Ha5, Ha4, Tho, Rad, and Enk ranked 1st to 10th (see Table 4 and Fig. 2).
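For context, the significance test behind this statement can be reproduced with the usual t statistic, t = r √(df / (1 − r²)). With df = 14,243, the two-sided 5% critical |r| is only about 0.016, so even weak correlations are significant; this is one reason additional error metrics are needed to judge suitability. The sketch below uses the normal approximation to the t quantile (z = 1.96), an assumption that is accurate for such large df; the function names are hypothetical.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with df = n - 2."""
    df = n - 2
    return r * math.sqrt(df / (1.0 - r * r))

def critical_r(n, z=1.96):
    """Approximate two-sided 5% critical |r| (normal quantile, valid for large df).

    Solves |t| = z for r: r = z / sqrt(df + z^2).
    """
    df = n - 2
    return z / math.sqrt(df + z * z)
```

Usage: `critical_r(14245)` gives about 0.0164, while `correlation_t(0.58, 14245)` is about 85.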

Fig. 2

1982–2020 areal average FAO56 (FAO) and seventeen PET models (Alb to Wen) over Ethiopia

3.1.2 Rank of PET models over selected five sites in Ethiopia

For each of the five sites, the ten PET models ranked 1st to 10th over Ethiopia (Wen, Ant, Pen, PT, Mak, Ha5, Ha4, Tho, Rad, and Enk) were selected for the next steps (see Table 4 and Fig. 2). At each site, the models were compared by ranking the average of their ranks on the five statistical measures. For example, at Bahir Dar, PT, Pen, Wen, Ant, and Ha5 ranked 1st to 5th (Table 5; see also Figs. 3, 4). PT ranked first at Bale Robe and Tho at Hawassa, while Wen ranked first at the two remaining sites, Metehara and Nazareth (see Tables 6, 7, 8, 9 and Figs. 5, 6, 7, 8, 9, 10, 11, 12).

Fig. 3

Boxplot of the daily reference and potential evapotranspiration at Bahir Dar (1982–2020)

Fig. 4

Taylor Diagram between FAO56 (FAO) and ten PET models at Bahir Dar (1982–2020)

Fig. 5

Boxplot of the daily reference and potential evapotranspiration at Bale Robe (1982–2020)

Fig. 6

Taylor Diagram between FAO56 (FAO) and ten PET models at Bale Robe (1982–2020)

Fig. 7

Boxplot of the daily reference and potential evapotranspiration at Hawassa (1982–2020)

Fig. 8

Taylor Diagram between FAO56 (FAO) and ten PET models at Hawassa (1982–2020)

Fig. 9

Boxplot of the daily reference and potential evapotranspiration at Metehara (1982–2020)

Fig. 10

Taylor Diagram between FAO56 (FAO) and ten PET models at Metehara (1982–2020)

Fig. 11

Boxplot of the daily reference and potential evapotranspiration at Nazareth (1982–2020)

Fig. 12

Taylor Diagram between FAO56 (FAO) and ten PET models at Nazareth (1982–2020)

3.2 Discussions

3.2.1 Areal average ranks of PET models

The NSE values ranged from −255.40 to 0.36. A major drawback of NSE is that the differences between measured and simulated values are squared, so larger values are overemphasized while lower values are neglected [59, 60]. [28] suggested the calibrated Priestley–Taylor method as the most suitable method for ET estimation in Ethiopia. In that study, Seleshi used monthly data from 167 stations in different climatic zones of Ethiopia for 2011–2015 to compare PT with the BC and Enk methods, using FAO as the reference method. The result of the present study agrees with Seleshi's, with the advantage of using long-term daily data and considering many more methods. Another study, which compared four methods using daily climate data for the last 20 years from 27 sites in the Nile Basin of Ethiopia, estimated PET to range from 3.5 to 5.5 mm/day [61]; this does not contradict the present result (1.73–5.74 mm/day). Using four statistical measures and daily climate data from 1985 to 2014 for Malaysia, a tropical country like Ethiopia, [62] compared 31 methods and found PT to be the most reliable. In a similar evaluation of eight ET estimation methods using 30 years of daily data in Iran, PT was also the most reliable method [63]. These studies agree with the result of the present study.

3.2.2 Rank of PET models over selected five sites in Ethiopia

For the Bahir Dar site, [61] reported that ETo ranged from 2.06 to 4.54 mm/day with a standard deviation of 0.77, whereas in this study the range was 1.64–7.71 mm/day and the standard deviation 0.54. The calibrated Priestley–Taylor model was found to be the most reliable [13, 28]. In a similar study that used average ranks of statistical measures (validation metrics) to compare PET models with FAO, Mak ranked 7th [51]. However, in another study conducted ten years ago, Mak was the most suitable method, followed by Tur and PT [47]. [30] compared eight PET models and Piche evaporimeter measurements with 11 years of Class A pan measurements and found Enk to be the most reliable method after FAO; in this study, however, Enk ranked only 10th. For Bale Robe, ETo ranged from 1.48 to 4.47 mm/day with a mean of 2.81 mm/day. Another study, based on data from five meteorological stations for 1987–2017, overestimated PET for Bale Robe and its surroundings relative to this study (range 3.1–3.9 mm/day, mean 3.4 mm/day) [64].

PET estimates for the Hawassa site are expected to be similar to those of the surrounding Rift Valley areas of Ethiopia, such as Ziway. [38] used remotely sensed data to estimate the evaporation of Lake Ziway because observed meteorological data were limited. Using NOAA spectral data from 1994 to 1995, PET was estimated to range from 4.6 to 6.1 mm/day, with a mean of 5.6 mm/day and a standard deviation of 0.5 [65]. These values are higher than the ETo estimated in this study (range 1.57–4.84 mm/day, mean 2.82 mm/day, s = 0.4). In a similar study in Malaysia that compared PET models using climate data from 1972 to 2001, Tho estimated ETo with the least error [47]. For the Metehara site, a study conducted in a Rift Valley area with a climate similar to that of Metehara, using climate data from 1977 to 2018, found Tho to be the best method [66]. Nazareth (Adama) is also located in the Rift Valley of Ethiopia. For Nazareth and its surroundings, Wen and Mak were found to be the most reliable PET models. This result (see Tables 5, 6, 7, 8, 9) agrees with the findings of a study that used monthly average daily data from 1950 to 2014 [9]. In a study conducted in Tharandt, Germany, using ten-minute and daily climate data from 2004 to 2014, Wen was also found to be the most reliable method [67]. The Mak method has been recommended for regions with complex geography, such as southern China, on the basis of daily meteorological data for 1962–2013 [17].

3.2.3 Justification of the results

The best-performing models for estimating ETo at the five sites in Ethiopia were Wen (PETWen, Eq. 2.2), Ant (PETAnt, Eq. 2.5), and Pen (PETPen, Eq. 2.1). Wen and Ant are radiation-based methods, whereas Pen is a combination method. According to [26], combination and radiation-based models give more reliable PET estimates than mass-transfer and temperature-based methods. Of the three best-performing models, Wen and Ant are preferable to Pen because they require fewer input data. Both radiation-based models (Wen and Ant) are functions of relative air humidity (RH), which suggests that RH may have a significant effect on PET or ETo estimation in Ethiopia and may even be the dominant meteorological parameter driving ETo changes in the country. Confirming this explanation requires further study.

4 Summary, conclusions and recommendations

4.1 Summary

For five sites in Ethiopia, namely Bahir Dar, Bale Robe, Hawassa, Metehara, and Nazareth, daily potential evapotranspiration (PET) was calculated with seventeen selected PET models using 39 years (1982–2020) of daily meteorological data, including maximum and minimum air temperature, maximum and minimum relative air humidity, wind speed at 2 m above the ground, and sunshine duration. The Penman–Monteith (FAO56) method was used as the reference method to calculate daily reference evapotranspiration (ETo). At each site, the daily values of the seventeen PET models were then compared with ETo. Of the seventeen PET models, ten with a correlation ≥ 0.58 and a mean absolute percentage error (MAPE) ≤ 27.3% were selected as reliable methods for estimating ETo or PET in Ethiopia, particularly at the five sites. Wen, Ant, Pen, PT, Mak, Ha5, Ha4, Tho, Rad, and Enk, ranked first to tenth, were taken as reliable PET models for estimating ETo or PET over Ethiopia. To compare and rank the seventeen and the ten PET models over Ethiopia and at the five sites, the models were ranked by the average of their ranks on five model validation metrics: standard deviation (s) or, equivalently, the standard deviation difference from the reference method, root mean square error (RMSE), mean absolute percentage error (MAPE), Nash–Sutcliffe efficiency (NSE), and correlation (r).

The results showed that the Priestley–Taylor method (PT) was the best method at the Bahir Dar and Bale Robe sites and nearby areas, while the Wendling method (Wen) was the best at Metehara and Nazareth and their vicinity. The Thornthwaite method (Tho) was the most reliable for Hawassa and nearby areas. Averaging the ranks of the ten reliable PET models over the five sites, Wen, Pen, Ant, PT, Mak, and Ha5 were the most reliable methods, ranked 1st to 6th, respectively.

International relevance (appeal statement) All the methods used in this study are standard methods for estimating PET, and the comparison uses standard model validation metrics. The models used in this study can therefore be used to compare and identify the most reliable methods for estimating PET at any particular place in the world. Practitioners can apply the models following Table 2; for example, ETo can be estimated with the Wendling (1991) and Mekoya (2020b) methods, which require fewer inputs, using Eq. 2.2 and Eq. 2.5, respectively (see Tables 2 and 3). The models can also be used for spatial mapping.

4.2 Conclusions

Evapotranspiration (ET) plays a crucial role in the land-surface energy and water balances and in agricultural activities such as irrigation management in water-stressed areas. Because direct measurements are difficult, they are often replaced by indirect methods. The lack of sufficient meteorological data also constrains the estimation of ETo with widely used models such as the Penman–Monteith method of the Food and Agriculture Organization (FAO56). Agricultural planning in arid and semi-arid areas requires an understanding of the meteorological parameters, particularly precipitation and evapotranspiration. While precipitation is routinely measured, evapotranspiration is usually estimated from other climatic parameters. The approach is simple but appears to provide reliable estimates under the data-scarce conditions of Ethiopia. The methodology applied in this study can be extended to any location in the world to understand regional hydrological phenomena, for instance the water-resource capacity and irrigation needs of East Africa.

The novelty of the study is that an old method by [42] (Wen) and a new method by [41] (Ant), applied here for the first time in Ethiopia, performed best in estimating potential evapotranspiration (PET) over the study area without any calibration. Although these two methods have performed well in the few studies in which they have been applied, e.g., [26, 67], and [68], they have not been widely used. For instance, the PET method of [41] was not even considered in a recent study that compared 127 PET models at two sites in Greece [26], which in turn makes this study unique.

4.3 Recommendations

In Ethiopia, particularly in the Amhara and Oromia regional states where the five sites are located, ensembles of three or more of the six best-performing PET models (Wen, Pen, Ant, PT, Mak, and Ha5) are recommended when climate data are insufficient to compute ETo with the reference method (FAO56), because these models estimate ETo with reasonably good accuracy. However, for more reliable results, a detailed investigation using long-term (more than 30 years) climate data from as many meteorological stations as possible (> 100 stations or > 300 grid points), representative of the whole of Ethiopia, is recommended.
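A minimal sketch of such an ensemble is given below, assuming a simple equal-weight daily mean of whichever recommended models are available. The recommendation does not specify how members should be combined, so the weighting, the function name, and the input layout are assumptions for illustration.

```python
import numpy as np

# The six best-performing models named in the recommendation.
RECOMMENDED = ("Wen", "Pen", "Ant", "PT", "Mak", "Ha5")

def ensemble_eto(estimates, min_members=3):
    """Equal-weight daily mean of the recommended models that are available.

    estimates: dict mapping model abbreviation -> array of daily PET (mm/day).
    Models not in RECOMMENDED are ignored.
    """
    members = [np.asarray(estimates[m], dtype=float) for m in RECOMMENDED if m in estimates]
    if len(members) < min_members:
        raise ValueError(f"need at least {min_members} recommended models, got {len(members)}")
    return np.mean(np.vstack(members), axis=0)
```

Usage: `ensemble_eto({"Wen": pet_wen, "Ant": pet_ant, "PT": pet_pt})`, where the arrays (hypothetical names) hold daily PET for the same dates.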