1 Introduction

The Ethiopian economy relies heavily on agriculture, which supports more than 85% of the population and accounts for almost half of the country’s gross domestic product (GDP) (CSA 2015). Cereals constitute approximately 70% of the crops grown in the nation, and the majority of agricultural GDP (about 95%) and employment (85%) is generated by around 11.7 million smallholder households. Half of these households operate on plots of one hectare or smaller, even though only 20% of the total arable land is being cultivated (CSA 2015). These farmers, and hence Ethiopia, face considerable challenges due to climate-induced droughts and water-related stresses, which have adverse effects on both crop and livestock productivity. These challenges are particularly pronounced in pastoral regions of the lowlands and densely populated, food-insecure areas in the highlands (FAO et al. 2017; Funk et al. 2018; WFP 2015). Drought-induced famines are aggravated by limited coping mechanisms and insufficient contingency planning to mitigate drought and the impacts of climate change. Weather and climate in Ethiopia have a significant influence on crop growth, overall yield, pest occurrences, and water and fertilizer requirements. Extreme weather events, such as dry spells, droughts, heatwaves, and excessive rainfall, have substantial adverse impacts on crop yields and subsequently contribute to food insecurity in the whole East African region (Ayugi et al. 2021; Funk et al. 2014; Megersa et al. 2022; Rettie et al. 2023; Van Ittersum et al. 2016). To assess how crop yields respond to these events and to future climate change, statistical or process-based crop simulation models are employed. However, the accuracy of the estimated relationship between weather and crop yield depends on the quality of the input weather and climate data (Han and Ines 2017; Hansen 2005; Ines and Hansen 2006; Leng and Hall 2020; Roberts et al. 2017; Schlenker and Lobell 2010).

Major obstacles to developing credible climate information persist in sub-Saharan African countries. While conditions vary in each country, common challenges include the poor quality, limited availability, and limited accessibility of actionable climate and weather information (Dinku et al. 2010, 2018, 2008a,b; Verdin et al. 2020). Inadequate or fragmented spatial climate data coverage makes it difficult to provide location-specific crop advisories to farmers who are not in the vicinity of weather stations with long-term data records (Han et al. 2019; Hansen 2005, 2002; Hansen et al. 2006; Ines and Hansen 2006). Traditionally, the spatial distribution of precipitation is analyzed using ground observations from a rain gauge network. In emerging countries, the network of rain gauge stations is sparse, and the interpolation of point-based rainfall information is therefore subject to large uncertainty (Woldemeskel et al. 2013). As a result, these countries lack accessible and reliable meteorological datasets. Gridded weather datasets, which combine gauge stations, remote sensing, and climate models, offer a solution to bridge this gap and have been widely employed to evaluate climate impacts on agriculture in regions with limited data (Tarek et al. 2020).

In recent decades, new methodologies have been developed to estimate rainfall amounts and spatial distribution using a combination of gauge, satellite, and reanalysis data. To this end, several gauge-based, satellite-based, and reanalysis precipitation products have been developed and have recently become operational for rainfall monitoring systems (Sun et al. 2018). Evaluating the existing gridded products is necessary to promote their use, yet the evaluation of these products over Africa has been very limited (Dinku et al. 2014, 2008a; Gebremichael et al. 2010). Furthermore, the comparison of CHIRPS against reanalysis data, to determine the capability of each product to accurately identify precipitation intensity and frequency, needs further attention.

Satellite rainfall estimates are more important in areas where rain gauge networks are sparse or non-existent. On the other hand, satellite-derived datasets may be misused if no validation is carried out to detect systematic and/or random errors (Gebremichael et al. 2010). Validation helps quantify the errors of different products and choose the one that best represents precipitation over a given region. Dinku et al. (2014, 2010) evaluated seven different satellite and merged satellite products over Ethiopia, finding relatively good performance for the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS v2) and CMORPHv0. These studies showed that the products were good at detecting rainfall occurrence but poor at estimating the daily precipitation rate.

Historical observations of the Earth's system play a crucial role in understanding climate change and extreme weather events. However, even in the era of satellite technology, observations alone do not offer a comprehensive view of the Earth's system at any given moment. Reanalysis methods help bridge these gaps in the observational record consistently, minimizing spurious signal fluctuations. The most recent reanalysis dataset from the European Centre for Medium-Range Weather Forecasts (ECMWF) is known as ERA5 and covers the period from 1979 to the present. ERA5 provides a wide range of variables, including air temperature, wind, rainfall, sea-surface temperature, and ocean wave height, spanning from the Earth's surface to the upper atmosphere (Hennermann 2020; Hersbach et al. 2020). ERA5 data are accessible in near real-time, typically within five days, and also include uncertainty information, with areas or time periods that have fewer observations being associated with greater uncertainty. ERA5 data have been shown to perform better than the previous generation of ECMWF reanalysis (ERA-Interim) over Africa and to reproduce the seasonal and annual cycles of temperature and rainfall reasonably well (Gleixner et al. 2020). Owing to this good performance, ERA5 data find applications in various fields, including hydrological modeling, agriculture, renewable energy, and climate change monitoring (Tarek et al. 2020; Zandler et al. 2019).

The availability of observation station data for calibration and initialization significantly influences the performance of satellite-based precipitation estimates and reanalysis products. Ethiopia presents an ideal context to investigate this hypothesis for several reasons. Firstly, the volume of gauge data has been steadily increasing over time at the National Meteorological Agency (NMA), although data sharing between NMA and global centers for model calibration and validation still needs to be improved. Secondly, because the country's terrain is complex, with more than eight different rainfall regimes (Diro et al. 2011; Korecha and Barnston 2007), it is challenging for climate model products to accurately capture microclimatic conditions. In this study, we aim to address potential concerns regarding gridded precipitation products in remote, topographically complex mountainous regions, considering spatiotemporal variations. In pursuit of this objective, we computed a set of performance metrics to reveal the temporal inaccuracies within each dataset and its capacity to represent precipitation amounts in the complex mountainous peripheries of Ethiopia. Furthermore, we quantified the sources of bias, offering insights that can guide future enhancements of gridded products.

The aims of this study are (i) to examine the spatiotemporal performance of ERA5 and CHIRPS across various time intervals within Ethiopia, (ii) to evaluate whether upscaling the comparison from point-to-pixel to homogeneous rainfall clusters affects the evaluation of performance, and (iii) to evaluate the capability of each product to accurately identify different levels of precipitation intensity. The evaluation and validation of reanalysis products within the study area opens the door to their multifaceted applications, including the evaluation of extreme climate trends, support for agricultural decision-making models, and the efficient allocation of water resources.

2 Study area, datasets and methods

2.1 Study area

Ethiopia exhibits significant geographical diversity and is characterized by a wide range of altitudes, from 116 m below sea level in the Danakil depression to over 4600 m above sea level in the mountainous area of Ras Dashen (Fig. 1a). The northern part of the Great Rift Valley is its most distinctive feature, running through the country in a northeast–southwest direction, with elevations ranging from 1500 to 3000 m. It creates a discontinuity in the central plateau and is followed by the High Plateau region to the west. To the east of the Great Rift Valley lies the Somali Plateau, which is characterized by arid and rocky semidesert terrain, extending into the Ogaden region and covering the southeastern part of the country. In the northern region, the Danakil Desert stretches toward the Red Sea and the coastal foothills of Eritrea. The western boundary of Ethiopia is defined by the western escarpment of the High Plateau.

Fig. 1

Topography and precipitation patterns in Ethiopia. a altitude in meters above sea level, represented according to the color scale reported in the legend. Cross symbols represent climate stations considered in this study; b annual average precipitation, computed using CHIRPS from 1981 to 2018

The complex topography of Ethiopia has a significant influence on its climate, resulting in a diverse range of microclimates that span from hot deserts in the lowlands to cooler conditions in the highlands. In particular, the steep and high mountain profiles found in the northwestern and central-western regions of Ethiopia have a pronounced impact on the distribution, amount, and variability of precipitation throughout the year (Gamachu 1988). While some areas, like the southeast, receive minimal rainfall, others, such as the northwestern highlands, experience more substantial rainfall from June to September (Fig. 1b). These variations are primarily driven by factors such as elevation, atmospheric pressure patterns, and local geographical characteristics (Steeneveld and Enyew 2014). Topography plays a critical role in triggering rainfall, initiating convective processes due to the heating of plateaus. Ethiopia experiences its primary rainy season during the northern hemisphere summer, when air masses carrying moisture from the Indian Ocean and the Gulf of Aden converge above the Ethiopian plateau (Viste and Sorteberg 2013). The Guinea monsoon winds from the Gulf of Guinea in the Atlantic Ocean contribute a significant portion of Ethiopia's rainfall during the rainy seasons. Consequently, the northeastern, eastern, and southeastern parts of Ethiopia receive less precipitation than the western regions, resulting in predominantly dry river basins in these areas, such as Ogaden, Aysha, and Danakil (Stojanovic et al. 2022). The hot zone in Ethiopia encompasses areas with elevations lower than 1500 m, characterized by torrid daytime conditions with greater daily temperature fluctuations than the other regions. Ethiopia’s dry season occurs from December to February, when dry winds blow over the country from South Asia, the Middle East, and the Arabian Peninsula, leading to virtually no precipitation.
Orographic lifting, a phenomenon where mountains influence local meteorological conditions by inducing orographic rainfall on the windward side and creating a rain shadow effect with reduced rainfall on the leeward side of mountain ranges, is well-documented in Ethiopia (Georgiev et al. 2016; Van den Hende et al. 2021; White et al. 2021).

Ethiopia experiences three distinct seasons: the “short” rains (Belg; February–May), followed by the “long” rains (Kiremt; June–September), and the “dry season” (Bega; October–January) over most of the country (Korecha and Barnston 2007; Shanko and Camberlin 1998). Kiremt rains contribute substantially to the annual rainfall totals (Fig. 2b) in many parts of the country and are primarily influenced by the seasonal northward movement of the tropical rain band over Ethiopia (Gissila et al. 2004; Segele et al. 2009; Segele and Lamb 2005). Rainfall during the Kiremt is of utmost importance for agriculture, as it coincides with the growth stages of major rainfed crops like barley, maize, teff, and wheat, which cover most of the cultivated land and require substantial amounts of water for their development.

Fig. 2

a Station clustering based on monthly climatology, altitude, and geographical location. b The annual cycle of monthly mean rainfall for each cluster based on CHIRPS data

2.2 Datasets

2.2.1 Local station data

Station data used in this study were obtained from Ethiopia’s National Meteorological Agency (NMA), which operates over 1200 rain gauges and automatic weather stations. Daily rainfall data for 1981–2018 were obtained for 167 high-quality NMA stations, which were used for the point-to-pixel evaluation (Fig. 1a). The gauge data have been subjected to regular quality assessments conducted by the NMA, and some of these data have been employed in previous studies (Dinku et al. 2014, 2010; Segele and Lamb 2005). For the purposes of this study, further homogeneity tests and data quality checks were applied. Stations that contributed to the development of the CHIRPS and ERA5 datasets were excluded from further analyses to ensure an independent validation.

To capture the climatic and topographic diversity, we used a k-means clustering approach to group homogeneous stations distributed across the country (Fig. 2a). k-means is an algorithm for grouping data into a given number of clusters. A summary of the k-means clustering approach used in this study is available in supplementary material S3. After 20 valid k-means runs, six clusters were found to be optimal (Fig. S3), with a dissimilarity of 92.345% between clusters. The clustering results show that the rainfall pattern is complex and cannot easily be reduced to altitude variation (Fig. 2b); some areas have rainfall for more than eight consecutive months (Clusters C and F). Others receive southwesterly and northeasterly rainfall for just a few months or days, while the southern part of Ethiopia is characterized by two distinct rainfall seasons. A significant portion of the northern highlands and central areas (Clusters B and D) follow boreal summer monsoon patterns, characterized by peak rainfall from June to September (JJAS). This period accounts for 50% to 80% of the total annual rainfall in Ethiopian agricultural regions (Korecha and Barnston 2007). In the southern part of Ethiopia, two distinct rainy seasons occur, with bimodal patterns observed in Cluster E during the long rains (JJAS) and the short rains (FMAM).
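The clustering step described above can be illustrated with a minimal sketch of Lloyd's k-means in plain Python. It is not the procedure documented in supplementary material S3: the feature set (altitude plus two seasonal totals), the station values, and the deterministic initialization from the first k stations are all assumptions made for illustration, and the 20 repeated runs are not reproduced.

```python
import math

def standardize(rows):
    """z-score each feature column so altitude (m) does not swamp rainfall (mm)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means; centroids start at the first k points (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each station goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical stations: [altitude in m, JJAS total in mm, MAM total in mm]
stations = [[2400, 900, 200], [300, 60, 150], [2500, 950, 180], [350, 80, 140]]
labels, _ = kmeans(standardize(stations), k=2)
print(labels)  # the two highland stations share one label, the two lowland ones the other
```

Standardizing the features first matters because k-means uses Euclidean distance, so an unscaled altitude column would otherwise dominate the grouping.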

2.2.2 Reanalysis and blended data

2.2.2.1 CHIRPS

The Climate Hazards Group InfraRed Precipitation with Stations version 2 (CHIRPS V2.0) was developed through a partnership with scientists at the USGS Earth Resources Observation and Science (EROS) Center, aiming to provide comprehensive, accurate, and current datasets for various early warning goals, including trend analysis and the monitoring of seasonal droughts and global environmental changes (Funk et al. 2015a, 2014). CHIRPS data, which combine satellite products and ground stations to form a gridded rainfall time series, are available at spatial resolutions of 0.05° and 0.25° and cover the globe from 50°S to 50°N. The second version of CHIRPS is updated regularly and provides improved daily rainfall time series from 1981 to the present. The development and application of CHIRPS in drought monitoring in Africa (e.g., Ethiopia) are explained in detail by Funk et al. (2015a, 2014). For our study, we used the 0.25° resolution of CHIRPS.

2.2.2.2 ERA5 reanalysis data

In 2019, the fifth generation of the ECMWF atmospheric reanalysis (ERA5) was released under the Copernicus Climate Change Service (Hersbach et al. 2020). ERA5 provides hourly estimates of a large number of atmospheric, land, and oceanic climate variables. The data cover the entire globe from 1979 onward on a 0.25° grid at hourly intervals. The dataset is accessible through the Copernicus Climate Change Service Climate Data Store (CDS). A recent study shows that the climatological biases in temperature and precipitation across the East African region are reduced in ERA5 compared to ERA-Interim (Gleixner et al. 2020). Comparison of ERA5 with gauge data makes sense, as ERA5 precipitation was computed from initial reanalysis conditions (Hennermann 2020).

2.3 Methods

2.3.1 Categorization of rainfall events

In this study, we applied a point-to-pixel analysis to validate gridded rainfall estimates against ground-based measurements from representative rain gauge stations (Fig. 1a). This methodology has been widely used to validate satellite rainfall estimates and to assess how well they capture small-scale variability in heavy rainfall events and rainy-day frequency (Ayehu et al. 2018; Baez-Villanueva et al. 2018; Dembélé and Zwart 2016; Liu et al. 2020; Rahmawati and Lubczynski 2018; Zambrano-Bigiarini et al. 2017). Our approach involved comparing time-series rainfall data from the gauges with the corresponding grid cells, similarly to Zambrano-Bigiarini et al. (2017) and Dinku et al. (2018). To maintain spatial resolution consistency, we used a common grid resolution of 0.25° × 0.25° for both CHIRPS and ERA5 products.
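As a minimal sketch of the point-to-pixel pairing, the snippet below locates the 0.25° cell that contains a station and pulls that cell's daily series from a gridded product. The grid layout (rows counted northward from a southern edge `lat0`, columns eastward from a western edge `lon0`), the function names, and the example coordinates are assumptions for illustration; real CHIRPS or ERA5 files define their own coordinate conventions.

```python
import math

def cell_index(lat, lon, lat0, lon0, res=0.25):
    """Row/column of the grid cell containing (lat, lon).

    lat0 and lon0 are the southern and western edges of the grid (assumed layout).
    """
    row = math.floor((lat - lat0) / res)
    col = math.floor((lon - lon0) / res)
    return row, col

def extract_series(grid_series, lat, lon, lat0, lon0, res=0.25):
    """Daily series of the cell containing the station.

    grid_series is a list of 2-D daily fields, indexed as field[row][col].
    """
    row, col = cell_index(lat, lon, lat0, lon0, res)
    return [field[row][col] for field in grid_series]

# Hypothetical station near 9.03 N, 38.74 E on a grid whose edges are 3 N and 33 E
print(cell_index(9.03, 38.74, lat0=3.0, lon0=33.0))  # (24, 22)
```

The gauge series and the extracted cell series can then be compared day by day with the indices of Sect. 2.3.2.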

We categorized and examined daily precipitation occurrences using the criteria established by Zambrano-Bigiarini et al. (2017), as shown in Table 1. We aggregated daily observations from the 167 rain gauges and their corresponding gridded estimates into monthly, seasonal (JJAS, ONDJ, MAM), and annual values. This allowed us to evaluate the accuracy of each rainfall product across various timescales. Considering the temporal coverage of CHIRPS and ERA5 data, the assessment period for this study spans 38 years, from January 1981 to December 2018.
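The temporal aggregation can be sketched as follows, assuming each station or grid-cell series is a list of dates with matching daily totals in mm. The function names are hypothetical, and seasons that cross the calendar year (such as ONDJ) would need an extra rule for assigning January to the preceding season, which is omitted here.

```python
from collections import defaultdict
from datetime import date, timedelta

# Months of the two main rainy seasons used in the evaluation
SEASONS = {"MAM": (3, 4, 5), "JJAS": (6, 7, 8, 9)}

def monthly_totals(dates, values):
    """Sum daily rainfall into {(year, month): total_mm}."""
    totals = defaultdict(float)
    for d, v in zip(dates, values):
        totals[(d.year, d.month)] += v
    return dict(totals)

def seasonal_totals(dates, values, months):
    """Sum daily rainfall over the given months into {year: total_mm}."""
    totals = defaultdict(float)
    for d, v in zip(dates, values):
        if d.month in months:
            totals[d.year] += v
    return dict(totals)

# Synthetic check: 1 mm on every day of 1981
days = [date(1981, 1, 1) + timedelta(days=i) for i in range(365)]
rain = [1.0] * 365
print(monthly_totals(days, rain)[(1981, 2)])           # 28.0
print(seasonal_totals(days, rain, SEASONS["JJAS"]))    # {1981: 122.0}
```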

Table 1 Classification of rainfall events based on daily intensity (i) adopted from Zambrano-Bigiarini et al. (2017)
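A small sketch of the intensity classification is given below. The class boundaries (1, 5, 20 and 40 mm/day) and the class names are taken from the intervals and labels used elsewhere in this text; treating each interval as closed on the left and open on the right, so that exactly 40 mm/day counts as violent, is an assumption.

```python
import bisect

# Lower edges of the light, moderate, heavy and violent classes, in mm/day
CLASS_EDGES = [1.0, 5.0, 20.0, 40.0]
CLASS_NAMES = ["no rain", "light", "moderate", "heavy", "violent"]

def classify(i_mm_day):
    """Return the intensity class of a daily rainfall amount (mm/day)."""
    # bisect_right counts how many edges are <= the value, which is the class index
    return CLASS_NAMES[bisect.bisect_right(CLASS_EDGES, i_mm_day)]

print([classify(v) for v in (0.4, 1.0, 12.0, 25.0, 63.0)])
# ['no rain', 'light', 'moderate', 'heavy', 'violent']
```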

2.3.2 Evaluation metrics

Six different performance indices (three continuous and three categorical) were applied to the study area. The continuous indices, described in S1, are the modified Kling–Gupta Efficiency (KGE), the Root Mean Square Error (RMSE), and the percent bias (PBIAS). The KGE (Eq. S1.1) is an index used to compare observed data with estimates. It decomposes the total performance into three equally weighted components: the linear correlation (r), the bias ratio (β), and the variability ratio (γ) (Gupta et al. 2009; Kling et al. 2012; Zambrano-Bigiarini et al. 2017). The optimum value of KGE is unity. The linear correlation (r) has its optimum at unity (perfect positive correlation) and its minimum at −1.0 (perfect negative correlation), while 0 indicates the absence of correlation. The bias ratio (β) measures the average tendency of the estimated values to overestimate (β > 1) or underestimate (β < 1) the observed values. The variability ratio (γ) compares the dispersion of the estimates with that of the observed data and also has its optimum at unity. In this study, we employed categorical indices to analyze different rainfall intensities (Table 1), using a contingency table (Supplement S2, Table 2) across the homogeneous rainfall clusters (Fig. 2a). This approach facilitated the comparison of gridded rainfall estimates from ERA5 and CHIRPS with observed rainfall. Our analysis centered on three primary indices, the Probability of Detection (POD, Eq. (S2.2)), the Frequency Bias (fBias, Eq. (S2.4)), and the False Alarm Ratio (FAR, Eq. (S2.3)), complemented by the Percent of Correct (PC). These metrics were crucial in assessing the accuracy of the gridded rainfall estimates from CHIRPS and ERA5. Details are available in Supplementary material S2.
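The continuous indices can be sketched in a few lines of Python. Since Eq. S1.1 is not reproduced here, the snippet assumes the commonly used form of the modified KGE (Kling et al. 2012), with β as the ratio of means and γ as the ratio of coefficients of variation, and the usual definitions of RMSE and PBIAS; the function names are illustrative.

```python
import math

def _mean(x):
    return sum(x) / len(x)

def _std(x):
    m = _mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))

def kge_components(est, obs):
    """Return (r, beta, gamma) for estimated vs. observed series."""
    me, mo = _mean(est), _mean(obs)
    se, so = _std(est), _std(obs)
    cov = sum((e - me) * (o - mo) for e, o in zip(est, obs)) / len(obs)
    r = cov / (se * so)            # linear correlation
    beta = me / mo                 # bias ratio
    gamma = (se / me) / (so / mo)  # ratio of coefficients of variation
    return r, beta, gamma

def kge_modified(est, obs):
    """Modified KGE: 1 is optimal; lower values indicate poorer agreement."""
    r, beta, gamma = kge_components(est, obs)
    return 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)

def rmse(est, obs):
    return math.sqrt(_mean([(e - o) ** 2 for e, o in zip(est, obs)]))

def pbias(est, obs):
    """Percent bias: positive values mean overestimation."""
    return 100.0 * sum(e - o for e, o in zip(est, obs)) / sum(obs)

# Synthetic example: an estimate that doubles every observed value
obs = [0.0, 2.0, 4.0, 6.0]
est = [2 * o for o in obs]
print(kge_components(est, obs))  # r and gamma close to 1, beta close to 2
print(pbias(est, obs))           # 100.0
```

In this synthetic case the timing and relative variability are perfect, so the whole KGE penalty comes from the bias term, which is the kind of diagnosis the decomposition is meant to support.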
The POD, as highlighted by Gneiting and Raftery (2007) and Stephenson (2000), is instrumental in assessing the ability of gridded rainfall products to correctly identify rainfall events, with values ranging from 0 (no correct identifications) to 1 (perfect detection). FAR is a measure emphasized by Wilks (2011) for its importance in meteorological accuracy assessment; it quantifies the frequency of false detections, also ranging from 0 (no false alarms) to 1 (all detections are false). These metrics are crucial in delineating the reliability of gridded data in capturing actual rainfall events. fBias compares the number of rainfall events detected by the gridded products with the number of events recorded at the corresponding rain gauges. fBias values greater than 1 indicate that CHIRPS or ERA5 overestimates the number of events, while values less than 1 indicate underestimation.
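A minimal sketch of the categorical evaluation for one intensity class follows. The contingency table of Supplement S2 is not reproduced here, so the snippet assumes the standard definitions: a hit when both gauge and product fall in the class, a miss when only the gauge does, a false alarm when only the product does, and a correct negative otherwise. The function names and the sample series are hypothetical.

```python
def contingency(obs, est, lo, hi):
    """Count hits, misses, false alarms and correct negatives for [lo, hi) mm/day."""
    h = m = f = cn = 0
    for o, e in zip(obs, est):
        o_in = lo <= o < hi
        e_in = lo <= e < hi
        if o_in and e_in:
            h += 1
        elif o_in:
            m += 1
        elif e_in:
            f += 1
        else:
            cn += 1
    return h, m, f, cn

def categorical_scores(h, m, f, cn):
    """PC, POD, FAR and fBias; NaN where a denominator is zero."""
    nan = float("nan")
    return {
        "PC": (h + cn) / (h + m + f + cn),
        "POD": h / (h + m) if h + m else nan,
        "FAR": f / (h + f) if h + f else nan,
        "fBias": (h + f) / (h + m) if h + m else nan,
    }

# Hypothetical eight-day series, evaluated for the heavy class [20, 40) mm/day
obs = [0, 3, 25, 0, 30, 2, 0, 22]
est = [0, 22, 28, 0, 10, 21, 0, 0]
print(categorical_scores(*contingency(obs, est, 20, 40)))
# PC = 0.5, POD = 1/3, FAR = 2/3, fBias = 1.0
```

The example shows why fBias should be read together with POD and FAR: the product reports the right number of heavy days (fBias = 1) while placing most of them on the wrong days. For the open-ended violent class, `hi` can be set to `float("inf")`.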

Table 2 Comparative analysis of ERA5 performance relative to CHIRPS, aiming to assess the differences in bias and accuracy of rainfall estimates for various clusters across different time scales in Ethiopia

3 Results

To evaluate the performance of each product, we computed spatial maps at daily and seasonal (MAM, ‘Belg’, and JJAS, ‘Kiremt’) time scales. The six indices (KGE, RMSE, PBIAS, POD, FAR, and fBias) were applied at each timescale to evaluate the performance of each product in the study area.

3.1 Daily and seasonal gridded values

At the daily scale, CHIRPS KGE values predominantly fell in the range of 0.3 to 0.6 across most high-altitude areas (> 2000 m a.s.l.), indicating reasonable agreement with observed data (Fig. 3). In contrast, ERA5 performed poorly at higher elevations (between 2000 and 3500 m a.s.l.), especially in the northern and central regions, reflecting challenges in capturing daily rainfall variability. Both products showed lower KGE values over the eastern and southwestern parts of the country.

Fig. 3

Comparison of the Modified Kling–Gupta efficiency (KGE) between CHIRPS (upper panels), ERA5 (lower panels) and observations at the corresponding stations on a daily (left panel) and seasonal (MAM, middle and JJAS, right panel) time scales. The colors range from intense red to dark green, indicating very poor performance to optimum performance

The assessment of KGE at a monthly timescale revealed that CHIRPS consistently achieved the highest KGE’ values across all locations in comparison to ERA5. ERA5 exhibited lower performance, especially in the adjacent areas at higher latitudes. However, both products demonstrated good agreement in the lowland areas of the country. This aligns with previous research findings (Dinku et al. 2014), which showed that monthly time scales generally yield better performance compared to the daily scale.

The seasonal KGE’ evaluation showed good performance for both CHIRPS and ERA5 during JJAS and MAM in the high-elevation areas, with CHIRPS performing better in both major rainfall seasons. During JJAS, ERA5 showed its lowest performance over mid- to high-altitude regions, which receive high rainfall in this season. During MAM, except for some pocket areas, ERA5 showed comparable performance in the areas that benefit from MAM rainfall (south and southwestern Ethiopia). In general, ERA5 demonstrated subpar performance for both the MAM and JJAS seasons (corresponding to the Belg and Kiremt seasons, respectively) across Ethiopia.

3.2 Bias and temporal variability performance

The overall performance of KGE can be dissected into three components: linear correlation (r), bias (β), and variability (γ). This decomposition is beneficial because it allows us to evaluate different aspects of precipitation accuracy. Specifically, r assesses the temporal dynamics of precipitation, while β and γ evaluate the volume and variability of rainfall, respectively.

Except for the JJAS season, both CHIRPS and ERA5 displayed moderate correlations (0.4–0.6) with observed data across all time scales. CHIRPS exhibited slightly better performance, especially in complex terrain, indicating a closer alignment with ground observations (Fig. 4). The corresponding bias values generally fell within acceptable ranges, with higher values observed at daily time scales (0.4–0.6) compared to seasonal and annual (0.25–0.95) scales. Overall, both ERA5 and CHIRPS showed moderate correlations during the MAM season and the lowest values during the JJAS season (Fig. 4).

Fig. 4

Comparison of the Pearson product-moment correlation coefficient (r) between CHIRPSv2 (upper panel), ERA5 (lower panel) and observations at the corresponding stations on a daily (left panel) and seasonal (MAM, middle, and JJAS, right panel) time scale. The colors for r range from intense red to intense green, representing very poor linear correlation and a perfect positive linear correlation, respectively

Fig. 5

Comparison of the bias ratio (β) between CHIRPS (upper panel), ERA5 (lower panel) and observations at the corresponding stations on a daily (left panel) and seasonal (MAM, middle, and JJAS, right panel) time scale. Colors for β range from intense red to dark blue, representing a large under- and overestimation of the observed precipitation amount, respectively

In most stations, both products tended to overestimate rainfall. However, ERA5 often underestimated rainfall, particularly in lowland areas. In terms of variability, both products tended to underestimate seasonal variability, with CHIRPS marginally better at capturing the inherent unpredictability of Ethiopian rainfall patterns.

CHIRPS exhibited slightly better performance than ERA5 at most stations, with a bias ratio ranging from 0.75 to 1.2 (Fig. 5). This suggests that CHIRPS may provide a more balanced rainfall estimate in certain Ethiopian regions at the daily time scale. Overall, for all time scales and seasons except the Belg (MAM) season, when both products exhibited a general overestimation across all stations, CHIRPS and ERA5 tended to underestimate precipitation in the lowland areas of the northeastern and eastern parts of the country (Fig. 5). This underestimation by CHIRPS is consistent with previous studies (e.g., Abera et al. 2016; 2014; Dinku et al. 2010), especially for the Kiremt (JJAS) season. ERA5 showed the highest overestimation of precipitation at almost all time scales, likely due to its limited use of synoptic station data from Ethiopia for model calibration and validation, in contrast to CHIRPS, which uses more than 78 stations for these purposes (Funk et al. 2015b).

Both ERA5 and CHIRPS generally underestimated precipitation variability (γ < 0.5) at all time scales. Notably, CHIRPS captured observed variability well at daily time scales in the central rift valley pocket area (between 6° and 8° N) and in northern Ethiopia. In contrast, ERA5 captured seasonal variability during the MAM season over Ethiopia. It is important to note that accurately representing observed precipitation variability is crucial for impact-based studies. Both ERA5 and CHIRPS exhibited high linear correlation and were nearly unbiased at monthly time scales (Fig. 6).

Fig. 6

Comparison of the variability ratio (γ) between CHIRPS (upper panels), ERA5 (lower panels), and observations at the corresponding stations on a daily (left panel) and seasonal (MAM, middle and JJAS, right panel) time scale. Colors range from intense red to dark blue representing a large under and overestimation of the observed variability of precipitation, respectively

3.3 Cluster performance

In our analysis, CHIRPS and ERA5 displayed varying performance across the Ethiopian clusters. In the Eastern Plains (Cluster A), both products showed similar daily performance, but CHIRPS performed relatively better in the JJAS season, as indicated by a 34.63 mm RMSE difference of ERA5 relative to CHIRPS (Table 2). In the North-Central Highlands (Cluster B), notable differences were observed during the JJAS season, with a substantial 96.79% PBIAS difference and a 307.61 mm RMSE difference, highlighting potential challenges for ERA5 in high-altitude regions. In the Northwestern Highlands and the East-Central Highlands (Clusters C and D), CHIRPS also performed more effectively across all seasons, especially in the JJAS season with significant RMSE differences, suggesting its better suitability for mid-altitude and east-central highlands. In the South-Central Valley (Cluster E), CHIRPS again proved more accurate, particularly in the JJAS season with a notable RMSE difference of 93.22 mm, making it suitable for valley-like terrain. However, in the Southwestern Highlands (Cluster F), a mixed pattern emerged, with ERA5 showing a slightly lower daily bias relative to CHIRPS.

We also evaluated the categorical performance indices, focusing on skill metrics such as Percent of Correct (PC), Probability of Detection (POD), False Alarm Ratio (FAR), and Frequency Bias (fBIAS). These were computed for each homogeneous cluster between CHIRPS, ERA5, and observations at the corresponding grid cell for the five classes of daily rainfall intensity defined in Table 1, and are summarized in the box plots shown in Figs. 7, 8, S3 and S4.

Fig. 7

Percent of Correct (PC) between CHIRPS (red), ERA5 (blue) datasets and the observations at the corresponding grid cell, for five classes of daily rainfall events ([0,1), [1,5), [5,20), [20,40), ≥ 40 [mm/day]). Each panel represents an individual cluster

Fig. 8

Frequency bias (fBIAS) between CHIRPS (blue), ERA5 (yellow) datasets and the observations at the corresponding grid cell, for five classes of daily rainfall events ([0,1), [1,5), [5,20), [20,40), ≥ 40 [mm/day]). Each panel represents an individual cluster

Figure 7 shows that both ERA5 and CHIRPS consistently exhibit high PC across all clusters, particularly for rainfall intensities exceeding 20 mm d−1, where the PC often exceeds 0.90. In Cluster A, for instance, the PC for heavy rainfall events ([20,40) mm d−1) is above 0.95 for both datasets, highlighting their effectiveness in predicting significant rainfall. However, this high PC, especially in clusters with frequent no-rain events, may be influenced by a high rate of correct negative predictions, which calls for a more detailed analysis of other performance metrics. In addition, the POD analysis (Fig. S1) reveals that both datasets generally perform well in detecting no-rain events across most clusters, with CHIRPS showing slightly superior performance (POD ≥ 0.75) compared to ERA5 (POD ≥ 0.62), except in Cluster F. However, ERA5 tends to have a higher POD for moderate rainfall events, especially in Clusters B, C, and F, where its ability to detect heavy rainfall (POD > 0.5) is notably better than that of CHIRPS.
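The caveat about correct negatives can be made concrete with a short worked example. The counts below are invented for illustration: a rare intensity class is observed on 10 of 1000 days, and a hypothetical product never reports it.

```python
def pc_and_pod(hits, misses, false_alarms, correct_negatives):
    """Percent of Correct and Probability of Detection from contingency counts."""
    n = hits + misses + false_alarms + correct_negatives
    pc = (hits + correct_negatives) / n
    pod = hits / (hits + misses)
    return pc, pod

# 10 observed events, none detected, no false alarms, 990 correct negatives
pc, pod = pc_and_pod(hits=0, misses=10, false_alarms=0, correct_negatives=990)
print(round(pc, 2), pod)  # 0.99 0.0
```

A PC of 0.99 here reflects only the 990 days on which neither source reported the class, which is why PC for rare classes needs to be interpreted alongside POD, FAR and fBIAS.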

Furthermore, the FAR results indicate that both ERA5 and CHIRPS effectively control false alarms for no-rain events across all clusters, typically maintaining FAR values below 0.2 (Fig. S2). This aligns with the high PC and POD observed for no-rain events, suggesting that both datasets identify these situations accurately with few false alarms. However, the FAR increases for heavier rainfall intensities, implying a higher likelihood of false alarms under such conditions. This trend is evident in all clusters, where the FAR for violent rain (> 40 mm d−1) events exceeds 0.7 for both datasets, posing a significant challenge in maintaining accuracy during high-intensity rainfall events.

Finally, the fBIAS analysis shown in Fig. 8, combined with PC, POD, and FAR, paints a complex yet coherent picture of the datasets’ ability to reproduce rainfall events. In all clusters, both datasets show a balanced estimation of no-rain events (fBIAS ≈ 1), consistent with high PC and POD values and low FAR, indicating that the absence of rainfall is represented accurately and with minimal bias. In contrast, light rainfall events ([1, 5) mm d−1) are overestimated, with fBIAS exceeding 1. This suggests that, while sensitive to light rainfall, the datasets report these events more frequently than observed, as evidenced by moderate POD values and higher FAR in these clusters. On the other hand, ERA5 tends to underestimate the frequency of heavy and violent rainfall events, except in Clusters B and C. This underestimation is reflected in lower POD and increased FAR for intense rainfall events, especially in Clusters D, E, and F, indicating difficulty in reproducing these more extreme conditions. This analysis underscores the importance of region-specific application of these datasets, as their performance varies significantly across clusters and rainfall intensities.
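To make the interplay between PC, POD, FAR, and fBIAS concrete, the following minimal Python sketch computes the four scores for one intensity class from a 2×2 contingency table. It assumes the standard contingency-table definitions, which may differ in detail from those used in this study. The function name, the toy series, and the treatment of the open-ended class as [40, ∞) are illustrative assumptions and are not taken from the analysis.

```python
from math import inf

# Daily rainfall classes in mm/day; upper edges are exclusive.
# Assumption: the "> 40" class is treated here as [40, inf).
CLASSES = [(0, 1), (1, 5), (5, 20), (20, 40), (40, inf)]


def categorical_scores(obs, est, lo, hi):
    """Return PC, POD, FAR and fBIAS for events falling in [lo, hi).

    obs and est are equally long sequences of daily rainfall (mm/day)
    at a gauge and at the corresponding grid cell.
    """
    hits = misses = false_alarms = correct_neg = 0
    for o, e in zip(obs, est):
        o_in = lo <= o < hi
        e_in = lo <= e < hi
        if o_in and e_in:
            hits += 1
        elif o_in:
            misses += 1
        elif e_in:
            false_alarms += 1
        else:
            correct_neg += 1

    n = hits + misses + false_alarms + correct_neg
    observed = hits + misses        # days on which the event was observed
    estimated = hits + false_alarms  # days on which the event was estimated
    nan = float("nan")
    return {
        "PC": (hits + correct_neg) / n if n else nan,
        "POD": hits / observed if observed else nan,
        "FAR": false_alarms / estimated if estimated else nan,
        "fBIAS": estimated / observed if observed else nan,
    }


# Hypothetical eight-day series, for illustration only.
obs_demo = [0.0, 0.2, 3.0, 7.5, 25.0, 0.0, 45.0, 2.0]
est_demo = [0.0, 1.5, 2.0, 4.0, 22.0, 0.5, 30.0, 0.0]

for lo, hi in CLASSES:
    print((lo, hi), categorical_scores(obs_demo, est_demo, lo, hi))
```

In this toy series the single observed event in the highest class is missed, so POD is 0 for that class while PC is still 0.875, because seven of the eight days are correct negatives. This is the effect noted above: for rare event classes, a high PC on its own says little about detection skill.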

4 Summary

Reanalysis and satellite-based rainfall estimates offer a unique opportunity to enhance our monitoring and comprehension of regional climates, with the potential to support various meteorological, agricultural, and hydrological applications. In this research, we assess the accuracy of ERA5 reanalysis data and CHIRPS satellite-based rainfall estimates by comparing them with data from 167 independent rain gauges in Ethiopia, a country with a challenging topography.

Each product performed differently across regions and timescales. At a daily timescale, CHIRPS performed best over the highlands of Ethiopia. However, both products showed their lowest performance in the lowland areas of the country (eastern and southeastern Ethiopia). The main reason might be the low density of the rain gauge networks used for data assimilation and calibration in this area, which reinforces the observation that station coverage is one of the most important factors determining the performance of gridded datasets (Cornes et al. 2018). A similar pattern was also observed at a few stations located in the highland part of the country (from 2000 to 3200 m a.s.l.), although in this case CHIRPS performed better than ERA5. On a seasonal scale (MAM and JJAS), both products performed well in high-altitude areas, while ERA5 showed its lowest performance over mid- to high-altitude regions. The RMSE, which measures the magnitude of the estimation errors, mostly ranged between 5 and 10 mm for both products. CHIRPS errors were stable across central Ethiopia, whereas ERA5 showed slightly larger errors in the northeastern areas, especially around Afar. In addition, CHIRPS mostly overestimated observed rainfall, by 20–60% at most locations and by up to 80% in some areas in the north and south. ERA5 behaved differently: while it also overestimated rainfall in the central and eastern parts of Ethiopia, it underestimated rainfall in the west near the Sudan border.

The linear correlation coefficient (r) between each product and the corresponding station observations in high-elevation areas varies between 0.25 and 0.95 at the seasonal timescale and between 0.25 and 0.64 at the daily timescale. Both products exhibit the highest correlation coefficient during the JJAS season and the lowest values during the MAM season. Furthermore, both products tend to overestimate rainfall over the central and western Ethiopian highlands across all time scales. In the northwestern highlands, CHIRPS shows nearly unbiased performance, while ERA5 tends to overestimate rainfall. This overestimation by ERA5 is likely due to its limited use of observed precipitation data for data assimilation and bias correction, in contrast to CHIRPS, which benefits from data from 78 observational stations (Funk et al. 2015c; Verdin et al. 2020). Both products significantly underestimate observed rainfall variability across almost all time scales (γ < 0.25), with the exception of the eastern and southeastern parts of the country, where variability is significantly overestimated (γ > 1.75). This finding implies that future improvements to the ERA5 and CHIRPS datasets should place a premium on representing rainfall variability realistically.
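For readers who wish to reproduce this type of comparison at a single station, the sketch below computes the linear correlation (r), percent bias, variability ratio (γ), and RMSE for a pair of series. It is an illustration under stated assumptions rather than the exact procedure of this study: γ is taken here as the ratio of the coefficients of variation of the estimated and observed series (one common definition; the ratio of standard deviations is another), percent bias as the relative difference in totals, and the function name and toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def continuous_scores(obs, est):
    """Return r, percent bias, variability ratio (gamma) and RMSE.

    Assumes equally long, non-constant series with non-zero means.
    """
    mo, me = mean(obs), mean(est)
    so, se = pstdev(obs), pstdev(est)

    # Pearson correlation from the population covariance.
    cov = mean([(o - mo) * (e - me) for o, e in zip(obs, est)])
    r = cov / (so * se)

    # Percent bias: positive values mean the product overestimates totals.
    pbias = 100.0 * (sum(est) - sum(obs)) / sum(obs)

    # Assumed definition: ratio of coefficients of variation (est / obs).
    gamma = (se / me) / (so / mo)

    rmse = sqrt(mean([(e - o) ** 2 for o, e in zip(obs, est)]))
    return {"r": r, "PBIAS": pbias, "gamma": gamma, "RMSE": rmse}


# Hypothetical series: the estimate is the observation plus 1 mm.
obs_demo = [2.0, 4.0, 6.0, 8.0]
est_demo = [3.0, 5.0, 7.0, 9.0]
print(continuous_scores(obs_demo, est_demo))
```

In this toy case the estimate is perfectly correlated with the observations and overestimates the total by 20%, yet γ is below 1 (5/6) because the same absolute spread sits on a higher mean. The scores therefore capture different aspects of performance and are best read together.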

In terms of the Probability of Detection (POD), both products achieve a high POD (~ 1) for no-rain events in all clusters, while the POD is lower for high-intensity rain events. Both products capture moderate rain events relatively well in most clusters characterized by high altitudes and significant rainfall (Clusters B, C, and F, but not D). The Frequency Bias (fBIAS) exhibits larger variations for violent (> 40 mm/day) and light ([1, 5) mm/day) rain events, which aligns with the POD results. The False Alarm Ratio (FAR) is consistent with the POD results, with low values for no-rain events.

5 Discussion

This study undertook a comprehensive evaluation of two widely used rainfall estimation products, ERA5 reanalysis data and CHIRPS satellite-based rainfall estimates, in the diverse climatic and topographical context of Ethiopia. The primary objective was to understand the relative performance of these datasets in accurately representing rainfall across various Ethiopian landscapes, ranging from lowland regions to highlands extending above 3000 m. This investigation is particularly relevant given Ethiopia's susceptibility to climate variability, which significantly affects local agriculture, hydrology, and overall weather patterns.

By comparing these products against observations from 167 rain gauges distributed across the country, the study aimed to provide insights into the spatial and temporal accuracy of ERA5 and CHIRPS, highlighting their strengths and limitations. The analysis focused on performance across different altitudes and timescales, on temporal variability and bias, and on performance within homogeneous rainfall clusters. This detailed examination is crucial for understanding how these datasets can best be used in regional climate monitoring and forecasting, and for guiding improvements in their methodologies so that they better serve diverse and complex environments.

The comparative analysis between CHIRPS and ERA5 across varying altitudes in Ethiopia is particularly revealing. CHIRPS's superior performance at higher altitudes (> 2000 m a.s.l.) on a daily scale is a critical finding. The superior detection of high-intensity rainfall events by CHIRPS aligns with the observations made by Kolluru and Kolluru (2021), emphasizing its effectiveness in challenging, complex topographical settings (Kolluru et al. 2020). In contrast, ERA5’s underperformance in these regions could be attributed to its calibration methodologies, which may not adequately account for the complex dynamics of high-altitude precipitation. This is corroborated by Jiang et al. (2021), who noted significant biases in ERA5 under varied conditions, particularly in complex terrain. Such discrepancies highlight the need for tailored calibration approaches in reanalysis data, especially in regions with diverse topography like Ethiopia.

Temporal dynamics and bias are also crucial in evaluating rainfall products. In this context, CHIRPS’s more balanced daily rainfall estimates in Ethiopian regions are noteworthy. This result is in line with previous studies that highlighted CHIRPS’s relatively good quality in estimating rainfall volumes (Ayehu et al. 2018; Degefu et al. 2022). It indicates that the CHIRPS algorithms are possibly better attuned to the nuances of daily precipitation cycles, particularly in areas with erratic rainfall patterns. ERA5, on the other hand, shows certain limitations in this respect, likely stemming from its broader focus on global climatic patterns, which may not capture local variations as effectively.

The cluster-specific performance analysis is particularly revealing, demonstrating significant variations in the effectiveness of CHIRPS and ERA5 across different regions and seasons. This echoes the findings of Gebremicael et al. (2019), who observed CHIRPS's general superiority across varied spatial and temporal scales compared to other products, including ERA5. This variation in performance can be attributed to CHIRPS's more extensive local calibration and validation network, which seems to give it an edge in capturing the unique rainfall patterns of each cluster, especially in high-altitude areas. In contrast, ERA5's moderate performance could be due to its reliance on fewer observational inputs for calibration in the Ethiopian context, limiting its accuracy in region-specific rainfall estimates.

In conclusion, the study’s comparative approach reveals nuanced differences in the performance of CHIRPS and ERA5 across Ethiopia’s diverse topographical and climatic landscape. CHIRPS's robust performance in high-altitude areas, particularly for daily rainfall estimation, highlights its potential as a more reliable tool for rainfall estimation in similar settings. Meanwhile, ERA5’s limitations in these areas underscore the need for enhanced calibration and validation methodologies tailored to complex regions like Ethiopia. The findings of this study contribute to the understanding of satellite-based and reanalysis rainfall datasets and offer insights for their application in meteorological, agricultural, and hydrological studies, particularly in areas with diverse climatic and topographical conditions.