1 Introduction

Precipitation is one of the main forcing variables for the terrestrial hydrological cycle. This variable is crucial in the study and investigation of a multitude of situations, including hydroclimatic modeling, climate change, agriculture optimization, water resources modeling and management (Sun et al. 2018; Try et al. 2020). Considering its importance, the effective deployment of hydrological models relies heavily on precision, consistency, sustained availability, and the establishment of a well-distributed network of rain gauges (Tong et al. 2014; Essou et al. 2017; Cassalho et al. 2020; Schreiner-McGraw and Ajami 2020; Arshad et al. 2021; Dhungana et al., 2022). However, as pointed out by Dhungana et al. (2022), access to high-quality and fine spatial-temporal resolution precipitation data is still a challenge.

Usually, precipitation is measured and obtained through the rain gauge approach, providing a point-based surface rainfall (Tong et al. 2014; Li et al. 2017). Although the method is considered highly accurate in those specific locations where the rain gauges are installed (Villarini et al. 2008), the representation of the spatial distribution of precipitation data and patterns depends on the density and configuration of the gauge network (Baez-Villanueva et al., 2020).

As referred to by Woldemeskel et al. (2013), the gauge network is characterized by sparse distribution in many developing countries. This fact increases the uncertainty in the estimation of spatial precipitation patterns when only ground-based methods are used. According to Rozante et al. (2010), those sparse networks, most of the time also marked by missing data, are more common in remote or inaccessible areas due to the challenge of installing and maintaining the gauges. For instance, in South America, the observational network is sparse and irregular, with most of the stations distributed toward populated centers and the coastline (Rozante et al. 2010). This may limit the usage of data in hydrologic modeling (Monteiro et al. 2016).

This is the case of the Amazon and Cerrado biomes, where the rain gauge distribution in the central areas is not uniform and may not allow the correct characterization of rainfall patterns. Despite this challenge, the Amazon biome houses the largest river basin globally, the Amazon River Basin, spanning approximately 7,000,000 km2. Together, the Amazônia and Cerrado biomes encompass the Tocantins-Araguaia watershed, which covers 920,000 km2 and is the largest watershed entirely within Brazilian territory (ANA, 2015).

To overcome part of the issues identified before, related to point rain gauges precipitation, different types of precipitation datasets are being constructed and made available in an operational way. Usually, those datasets are provided in a gridded format known as gridded precipitation products (GPP). Different from rain gauges, these products provide optimized global estimates of climate data without spatial or temporal gaps (Gleixner et al. 2020; Vega-Durán et al. 2021).

As noted by Sun et al. (2018), the use of GPPs has increased significantly across a diverse range of applications, such as the quantification of climate-change trends, the study of climatology in different regions, and the monitoring of droughts and floods (Solman and Orlanski 2016; Do et al. 2020; McClean et al. 2023). They are also being increasingly used in hydrological applications, as a substitute for surface precipitation, temperature, wind, and other meteorological parameters (Cassalho et al. 2020; Tarek et al. 2020). Thus, GPPs can represent a good alternative source of precipitation data for regions with sparsely distributed or nonexistent stations (Essou et al. 2017; Cassalho et al. 2020).

According to Degefu et al. (2022), GPPs can be grouped into four classes, namely, gauge interpolated, satellite-based precipitation, satellite-gauge merged, and reanalysis products. Gauge interpolated products are developed through the interpolation or gridding of the gauge observations, while satellite-based precipitation products are those derived from satellites orbiting Earth. The satellite-gauge merged precipitation results from the combination of satellite precipitation data with gauge or radar observation, with the latter being used to correct the former (Rozante et al. 2010). Finally, the reanalysis products are atmospheric numerical models that consider ground or satellite observations via data assimilation.

Among the different types of GPPs, reanalysis has the added value of providing a great number of variables besides precipitation. It also presents continuous time series, avoiding issues related to missing data, and facilitates convenient access to a wide array of information. However, reanalysis products also have limitations, mainly related to accuracy.

The accuracy of GPPs is closely related to the topography and to the spatial resolution adopted, which is relatively coarse and causes errors in the estimation of precipitation; commonly, their data are biased in comparison to observational data (Dinku et al. 2011; Ali et al. 2017; Gleixner et al. 2020). Also, in reanalysis products, precipitation is not directly assimilated from observational data, unlike variables such as surface pressure, temperature, humidity, and wind components (Hersbach et al. 2020). It is rather a result of the physics within the model, and thus it is the most challenging variable to simulate accurately (Tapiador et al. 2012; Li et al. 2021).

With hydrological models completely dependent on the accuracy of rainfall data, it is required that the spatial distribution of precipitation input data over large regions is well represented at high temporal resolutions (minutes to daily) (Schreiner-McGraw and Ajami 2020). Thus, the evaluation of GPPs for different time scales, periods, and study areas is crucial to determine the reliability and limitations of these products in different regions (Rozante et al. 2010; Ali et al., 2017; Arshad et al. 2021).

Many well-known precipitation products have performed well in different regions of the world. Examples include the Climate Hazards group InfraRed Precipitation (CHIRP) in Ethiopia (Degefu et al. 2022); the Climate Prediction Center morphing method (CMORPH) in East Africa (Sahlu et al. 2017); the Global Precipitation Climatology Centre (GPCC), Global Precipitation Climatology Project (GPCP), and Climatic Research Unit (CRU) products for West and Central Africa (Kouakou et al. 2023); and the Multi-Source Weighted-Ensemble Precipitation (MSWEP V2.0), evaluated for the globe by Beck et al. (2017).

Although all showed good performance, these products are mainly focused on precipitation, whereas hydrological models also require other parameters. In this respect, meteorological models, such as reanalysis, have an advantage, because they provide all the information required as a boundary condition in a hydrological-hydrodynamic model. This includes parameters beyond precipitation, such as temperature, wind, solar radiation, and humidity.

Considering reanalysis models, two have been improved since 2002 (Li et al. 2022), namely the European Centre for Medium-Range Weather Forecasts ReAnalysis version 5 (ERA5; Hersbach et al. 2018) and the Modern-Era Retrospective Analysis for Research and Applications Version 2 (MERRA-2; Gelaro et al. 2017). Both are well-established reanalysis products (Hassler and Lauer 2021) that assimilate a vast array of different information (Gelaro et al. 2017; Hersbach et al. 2020), and MERRA-2 includes a bias-corrected surface precipitation variable. ERA5 also has a downscaled land dataset, named ERA5-Land (0.1°, hourly) (Xu et al. 2022a). However, the literature suggests no added improvement from ERA5 to ERA5-Land regarding precipitation (Xu et al. 2022a; Gomis-Cebolla et al. 2023).

In terms of temporal resolution, both ERA5 and MERRA-2 have an hourly time step, which can improve the hydrological simulation. Also, ERA5 has been previously used in hydrological models implemented with the MOHID Land model (Oliveira et al. 2020, 2024), which was used to implement the Tocantins-Araguaia watershed and Pará river estuary models.

Hence, the main goal of this study is to assess the accuracy of two specific GPPs datasets by comparing them with two observational datasets and evaluate their potential to be used as atmospheric boundary conditions in a hydrological-hydrodynamic model applied to the Tocantins-Araguaia watershed and Pará river estuary. Therefore, two reanalysis products were evaluated here, namely, the ERA5 and the MERRA-2.

2 Methodology

2.1 Study area

The study area corresponds to the Tocantins-Araguaia watershed (TAW), located in Brazil (Fig. 1). The watershed extends through five states: Goiás, Mato Grosso, Tocantins, Maranhão, and Pará. It has an area of approximately 920,000 km2, occupying nearly 11% of the Brazilian territory (Serrão et al. 2021). The main rivers are the Tocantins and the Araguaia (Lima and Loureiro 2021). The watershed is divided into four main sectors: the Araguaia river and the Tocantins river (Fig. 1a), the Pará river (Fig. 1b), and the Lower Tocantins river (Fig. 1c). The Tocantins and Araguaia rivers meet at the border of three states, namely Pará, Tocantins, and Maranhão, and downstream of that point the river becomes the Lower Tocantins.

As it approaches the coastal zone, the lower Tocantins river flows into the Pará river. This sector receives water from the Lower Tocantins and is connected to the Amazon river via the Breves Strait. Camargo (1973) suggests that the connection between these two watersheds occurred during the “big thaw” in the Quaternary period. Regardless of each river’s discharge, Barthem and Schwassmann (1994) found an inflow from the Amazon River into the Pará River. Subsequently, the Pará river discharges into the Atlantic Ocean.

According to the Ministério do Meio Ambiente (MMA, 2006), the annual average flow in the watershed between 1931 and 2001 was approximately 13,600 m3.s−1. Prestes et al. (2020) implemented a simple box model to estimate the watershed outflow, which was the sum of different inputs of freshwater and assumed a stationary flow. It was based on historical discharge average from continental stream flows, including the Amazon River, Tocantins River, and minor tributaries, and estimated an annual discharge of approximately 21,000 m3.s−1.

Both Barthem and Schwassmann (1994) and Prestes et al. (2020) observed a shift in the influence of the main river discharges in Pará river. During the rainy season, the Tocantins river contributes significantly to the fluvial waters entering the Pará river. However, in the dry season, the streamflow of the Tocantins River decreases primarily due to the reduced rainfall in the region, which allows an increase of Amazon river flow through the Breves Strait into the Pará river.

TAW has a Köppen tropical climate (Aw), with a mean annual temperature of 26 °C. It is characterized by two well-defined seasons: a dry winter with fewer rainy days, referred to as the dry season (May to September), and a wet summer with more rainy days and higher rain intensity, referred to as the rainy season (October to April) (MMA, 2006; Conejo et al. 2007). According to MMA (2006), the main atmospheric systems acting over the watershed are the Intertropical Convergence Zone (ITCZ) and squall lines near the equator, and the South Atlantic Convergence Zone (SACZ) in the south of the watershed (Reboita et al. 2010; Luiz-Silva et al. 2021). The overall wind regime in the watershed is characterized by calm trade winds, with annual average values of 1.3 m.s−1 (MMA, 2006).

According to Conejo et al. (2007), the annual average precipitation is approximately 1840 mm, calculated for the period from 1961 to 1990. The average annual precipitation was determined from the isohyet map through kriging interpolation of the Total Precipitation Normals from the National Institute of Meteorology (INMET), using 204 rain gauges for Brazil (Conejo et al. 2007). ANA (2009) estimated an annual average precipitation of approximately 1740 mm, which increases from south to north, with values from 1500 mm in Brasília to 3000 mm in Belém, and decreases from west to east, with values around 1800 mm to 1200 mm, respectively.

The Amazon biome predominates in the northwestern part of the basin, mainly in the Lower Tocantins and Pará river areas, while the Cerrado biome occupies the remaining basin area. These two biomes are the largest ecological formations in Brazil and hold the greatest biological diversity.

Fig. 1

Location of the Tocantins-Araguaia watershed (left) and detail of each watershed section (right). a The Tocantins-Araguaia watershed, showing the main rivers, with the Araguaia river on the left and the Tocantins river on the right; b the Pará river estuary, in the north of the watershed; c the lower Tocantins river; and d a map of Brazil with red dots representing each state capital

2.2 Rain gauges datasets

In the TAW area, two governmental organizations operate rain gauge networks: the Brazilian National Institute of Meteorology (Instituto Nacional de Meteorologia, INMET) and the National Water Agency (Agência Nacional de Águas, ANA).

Considering the INMET database, 340 automatic gauges can be found in TAW domain and the surroundings (INMET, 2022). When downloading the data, the filter applied to avoid the use of non-automatic stations decreased the probability of errors and allowed greater efficiency and data quality (Moura et al. 2005). However, it was verified that the INMET’s automatic rain gauge distribution was not uniform and did not cover the whole watershed. In fact, the rainfall patterns of the Araguaia River sub-basin could not be represented since the majority of the stations are located southeast of the Tocantins river. A similar issue was found by Clarke et al. (2011). Nonetheless, the INMET rain gauge network’s main advantage relies on the fact that it provides observed precipitation values with an hourly time step.

According to ANA’s database, approximately 6300 rain gauges are located in TAW and the surrounding areas (ANA, 2022). Those stations provide daily accumulated precipitation values and include both automatic and non-automatic gauges.

The data from the selected INMET and ANA rain gauges were downloaded for the 12-year period from 2010 to 2021. Then, a second filter was applied to the stations of both datasets based on their data. The filtering followed three main criteria: (i) the precipitation record must start in 2010; (ii) the dataset must cover at least ten years of data; and (iii) the station must be located inside or near the watershed.

Data quality control was then applied following the World Meteorological Organization (2017). Accordingly, two rules were followed to minimize the uncertainties introduced by missing data. A month is ignored if (i) observations in the analyzed month are missing for 11 or more days; or (ii) observations are missing for a period of 5 or more consecutive days during the analyzed month.
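The two missing-data rules can be illustrated with a short sketch. It assumes that each month is represented by a list of daily flags (True for a day without a valid observation); the function name and the example months are hypothetical and are not part of the original processing chain.

```python
def month_is_usable(missing_by_day):
    """Return True if a month passes both missing-data rules.

    missing_by_day: list of booleans, one per calendar day of the month,
    where True marks a day without a valid observation (assumed layout).
    """
    total_missing = sum(missing_by_day)

    # Longest run of consecutive missing days in the month.
    longest_run = 0
    current_run = 0
    for is_missing in missing_by_day:
        current_run = current_run + 1 if is_missing else 0
        longest_run = max(longest_run, current_run)

    # Rule (i): 11 or more missing days; rule (ii): 5 or more in a row.
    return total_missing < 11 and longest_run < 5


# Hypothetical 30-day months used only for illustration.
scattered = [day % 3 == 0 for day in range(30)]  # 10 isolated missing days
block = [10 <= day < 15 for day in range(30)]    # 5 consecutive missing days

print(month_is_usable(scattered), month_is_usable(block))
```

In this sketch a month is discarded as soon as either rule is triggered, so the month with ten isolated gaps is kept while the month with a five-day gap is rejected.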

Even after the filtering process, the rain gauge datasets still have gaps at different periods in each gauge, caused by instrument malfunction, maintenance periods, and other reasons. Those gaps were identified in the observational datasets, and the missing days were ignored in the analysis by removing the corresponding values in the reanalysis datasets. Hours with gaps in the INMET time series were marked as NaN, and subsequently the specific day was removed from the analysis. Thus, no gap-filling methodologies were applied, which avoids wrong estimations of precipitation in the absence of data (Xu et al. 2022b).
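A minimal sketch of this gap handling is given below. It assumes that each day of the INMET record is available as a list of 24 hourly values with NaN marking a gap, and that the gauge and reanalysis daily series are already aligned by date; the function names and sample values are illustrative only.

```python
import math


def daily_total(hourly_values):
    """Sum the hourly values of one day; the day is NaN if any hour is missing."""
    if any(math.isnan(v) for v in hourly_values):
        return float("nan")
    return sum(hourly_values)


def drop_missing_days(observed, modeled):
    """Keep only the days on which the gauge has a valid daily value.

    The same days are removed from the reanalysis series so that both
    series remain paired day by day.
    """
    pairs = [(o, m) for o, m in zip(observed, modeled) if not math.isnan(o)]
    return [o for o, _ in pairs], [m for _, m in pairs]


nan = float("nan")
# Hypothetical day with one missing hour: the whole day becomes NaN.
day_with_gap = [0.0] * 10 + [nan] + [1.2] * 13
gauge_daily = [0.0, 12.4, daily_total(day_with_gap), 3.1]
reanalysis_daily = [0.2, 9.8, 5.0, 2.7]

obs_kept, mod_kept = drop_missing_days(gauge_daily, reanalysis_daily)
print(obs_kept, mod_kept)
```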

The application of those criteria resulted in a dataset composed of 32 stations from INMET and 729 stations from the ANA database. As shown in Fig. 2, INMET’s stations are concentrated in the east part of the Tocantins-Araguaia watershed, with almost no coverage in the Araguaia River sub-basin. ANA’s stations showed a more uniform and comprehensive spatial coverage. The observational datasets were accumulated into daily, monthly, annual, and climatology time steps.

Fig. 2

Spatial distribution of the observational rain gauges from the INMET and ANA databases. Red dots represent INMET gauges and blue dots represent ANA rain gauges

2.3 Meteorological reanalysis products

Reanalysis models are reconstructions of the atmosphere state, using data assimilation methods. They rely on a meteorological forecast model that combines observations in a physical solver, producing gridded data for a different range of variables. According to Bosilovich et al. (2016), reanalysis integrates satellite-based and observational data into a modeling framework, that provides datasets continuous in space and time.

Two global meteorological reanalysis products, ERA5 and MERRA-2, were validated here for precipitation over the period from 2010 to 2021. Developed by the Copernicus Climate Change Service (C3S) and by the European Centre for Medium-Range Weather Forecasts (ECMWF), ERA5 is the fifth generation of atmospheric reanalysis produced for global climate and weather analysis (Hersbach et al. 2020). It provides hourly estimates of several atmospheric, land, and oceanic climate variables with a spatial resolution of 0.25° × 0.25°, from 1950 to the present. For this research, the ERA5 reanalysis precipitation dataset was downloaded from the C3S climate data store (Hersbach et al. 2018). Since the TAW area is characterized by the absence of snow events, the ERA5 variable used to represent precipitation was the total precipitation.

The MERRA-2 model is an atmospheric reanalysis that uses the Goddard Earth Observing System (GEOS 5.12.4) atmospheric data assimilation system and was developed by the National Aeronautics and Space Administration (NASA). MERRA-2 is an upgraded version of the MERRA model and has a spatial resolution of 0.5° × 0.625° and an hourly time step, from 1980 to the present. The MERRA-2 reanalysis precipitation dataset was downloaded from the GES DISC website (GMAO, 2015). The variable PRECTOTCORR, which corresponds to a bias-corrected total precipitation, was considered here for validation, and its values between 2010 and 2021 were downloaded. As for the observational datasets, the models’ hourly precipitation was accumulated into daily, monthly, annual, and climatology time steps.
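The daily accumulation of the hourly reanalysis fields can be sketched as follows. The sketch assumes the units in which the two products are commonly distributed, namely metres of water per hour for the ERA5 total precipitation and a mean flux in kg m−2 s−1 for the MERRA-2 PRECTOTCORR variable; these units, the function names, and the sample values are assumptions for illustration and should be checked against the metadata of the downloaded files.

```python
HOURS_PER_DAY = 24


def era5_daily_mm(tp_hourly_m):
    """Accumulate 24 hourly ERA5 totals (assumed in metres) into mm per day."""
    if len(tp_hourly_m) != HOURS_PER_DAY:
        raise ValueError("expected 24 hourly values")
    return sum(value * 1000.0 for value in tp_hourly_m)  # m -> mm


def merra2_daily_mm(flux_hourly):
    """Accumulate 24 hourly mean fluxes (assumed kg m-2 s-1) into mm per day.

    1 kg m-2 of water equals 1 mm, so a flux times 3600 s gives mm per hour.
    """
    if len(flux_hourly) != HOURS_PER_DAY:
        raise ValueError("expected 24 hourly values")
    return sum(value * 3600.0 for value in flux_hourly)


# Hypothetical day with a constant rain rate of 1 mm per hour in both products.
era5_day = [0.001] * 24
merra2_day = [1.0 / 3600.0] * 24
print(era5_daily_mm(era5_day), merra2_daily_mm(merra2_day))
```

Monthly and annual totals would follow by summing the daily values over the corresponding calendar periods.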

2.4 Point-to-pixel approach

To compare precipitation between the observational datasets and the reanalysis products, the point-to-pixel approach was applied. This method consists of identifying the model grid cell closest to the analyzed station and then extracting the values of that cell for comparison (Zhu et al. 2021; Kim and Lee 2022; Wu et al. 2022). This analysis followed the guidelines from the World Meteorological Organization (2017).
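The point-to-pixel extraction can be sketched for a regular latitude-longitude grid as shown below. The sketch assumes that the grid is described by one-dimensional lists of cell-centre coordinates and that station and grid longitudes use the same convention; the function names, coordinates, and grid values are hypothetical.

```python
def nearest_index(axis_values, target):
    """Index of the grid coordinate closest to the target coordinate."""
    return min(range(len(axis_values)), key=lambda i: abs(axis_values[i] - target))


def point_to_pixel(grid, lats, lons, station_lat, station_lon):
    """Return the value of the grid cell whose centre is closest to the station.

    grid[i][j] holds the value at latitude lats[i] and longitude lons[j].
    On a regular grid the nearest cell can be found independently per axis.
    """
    i = nearest_index(lats, station_lat)
    j = nearest_index(lons, station_lon)
    return grid[i][j]


# Hypothetical 3 x 3 excerpt of a 0.25-degree grid (values in mm).
lats = [-10.0, -10.25, -10.5]
lons = [-49.0, -48.75, -48.5]
grid = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0]]

print(point_to_pixel(grid, lats, lons, station_lat=-10.3, station_lon=-48.7))
```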

2.5 Climatology: watershed approach

To determine the watershed precipitation, the Thiessen polygon methodology was applied to the observational stations resulting from the filtering process. The methodology creates polygons based on the rain gauge distribution and assigns each polygon a weight, calculated as the ratio of the polygon’s area lying within the watershed to the watershed’s area (Cheng et al. 2012). The precipitation over the watershed is then estimated by multiplying the precipitation value of each station by the respective polygon weight and summing the resulting values. The watershed’s precipitation was calculated on a daily scale. ANA’s observational precipitation data were used directly in this calculation, while INMET’s hourly precipitation data were first accumulated into daily values.
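The weighting step can be illustrated with a small sketch. It assumes that the area of each Thiessen polygon lying inside the watershed has already been obtained from a GIS tool and that these clipped areas add up to the watershed area; the station labels, areas, and precipitation values are hypothetical.

```python
def thiessen_weights(areas_inside_km2):
    """Weight of each station: clipped polygon area over total watershed area."""
    total_area = sum(areas_inside_km2.values())
    return {station: area / total_area for station, area in areas_inside_km2.items()}


def watershed_precipitation(station_precip_mm, weights):
    """Area-weighted precipitation for one day over the watershed."""
    return sum(weights[station] * station_precip_mm[station] for station in weights)


# Hypothetical clipped polygon areas (km2) and one day of gauge totals (mm).
areas = {"A": 500.0, "B": 300.0, "C": 200.0}
precip = {"A": 10.0, "B": 0.0, "C": 5.0}

weights = thiessen_weights(areas)
print(watershed_precipitation(precip, weights))
```

Because the weights add up to one, the result is an area-weighted mean of the station values rather than a sum of them.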

The precipitation from the meteorological reanalysis data was extracted using a shapefile delineating the watershed boundaries, resulting in an average precipitation time series specific to the watershed. Initially available at hourly intervals, the time series were accumulated into daily values. This approach yielded the final time series for the watershed, comprising: (i) the ANA time series; (ii) the INMET time series; (iii) the ERA5 time series; and (iv) the MERRA-2 time series.

2.6 Evaluation methods and metrics

The precipitation analysis was performed over the 12-year period from 2010 to 2021, which is common to all downloaded datasets (INMET, ANA, ERA5, and MERRA-2). The comparisons between the observational and reanalysis datasets were performed visually and estimating five goodness-of-fit indicators, namely, the Pearson correlation (R), the coefficient of determination (R2), the Nash-Sutcliffe Efficiency (NSE), the root mean square error (RMSE), and the percent bias (PBIAS):

$$R=\frac{\sum _{i=1}^{n}\left({O}_{i}-\bar{O}\right)\left({M}_{i}-\bar{M}\right)}{\sqrt{\sum _{i=1}^{n}{\left({O}_{i}-\bar{O}\right)}^{2}\sum _{i=1}^{n}{\left({M}_{i}-\bar{M}\right)}^{2}}}$$
(1)
$${R}^{2}={\left(R\right)}^{2}$$
(2)
$$NSE=1-\frac{\sum _{i=1}^{n}{\left({O}_{i}-{M}_{i}\right)}^{2}}{\sum _{i=1}^{n}{\left({O}_{i}-\bar{O}\right)}^{2}}$$
(3)
$$RMSE=\sqrt{\frac{1}{n}\sum _{i=1}^{n}{\left({O}_{i}-{M}_{i}\right)}^{2}}$$
(4)
$$PBIAS=\left[\frac{\sum _{i=1}^{n}\left({O}_{i}-{M}_{i}\right)}{\sum _{i=1}^{n}{O}_{i}}\right]\times 100$$
(5)

where \({O}_{i}\) and \({M}_{i}\) represent the observed and the modeled (reanalysis) precipitation values, respectively, n represents the total number of days, months, or years in the analyzed period, and \(\bar{O}\) and \(\bar{M}\) are the means of the observed and reanalysis values in that same period, respectively.

The Pearson correlation varies from −1 to 1, where 1 is a perfect positive linear correlation, −1 is a perfect negative linear correlation, and 0 means that no linear correlation exists between the variables (Benesty et al. 2009; Wehbe et al. 2017; Profillidis and Botzoris 2019). The coefficient of determination equals the square of the correlation coefficient, and it can take values between 0 and 1 (Dodge 2008), with values closer to 1 indicating better fits between the compared data.

NSE can range from –∞ to 1, where the closer the parameter is to 1, the more accurate is the estimation (Moriasi et al. 2007; Wehbe et al. 2017). The RMSE reflects the mean error, indicating the degree of difference between the model results and the observed data (Carrasco et al. 2018). PBIAS explains the tendency of the reanalysis underestimation (positive value), or overestimation (negative value) compared to the observed data (Gupta et al. 1999).
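The five indicators can be written compactly as in the sketch below, which takes O as the observed series and M as the reanalysis series and uses the usual square-root form of the RMSE. It is only an illustration with hypothetical sample values, not the code used in the study.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson_r(obs, mod):
    """Pearson correlation between observed and modeled series."""
    o_bar, m_bar = mean(obs), mean(mod)
    covariance = sum((o - o_bar) * (m - m_bar) for o, m in zip(obs, mod))
    spread_o = sum((o - o_bar) ** 2 for o in obs)
    spread_m = sum((m - m_bar) ** 2 for m in mod)
    return covariance / math.sqrt(spread_o * spread_m)


def nse(obs, mod):
    """Nash-Sutcliffe Efficiency: 1 is a perfect match, below 0 is worse than the mean."""
    o_bar = mean(obs)
    squared_error = sum((o - m) ** 2 for o, m in zip(obs, mod))
    return 1.0 - squared_error / sum((o - o_bar) ** 2 for o in obs)


def rmse(obs, mod):
    """Root mean square error, in the units of the input series."""
    return math.sqrt(sum((o - m) ** 2 for o, m in zip(obs, mod)) / len(obs))


def pbias(obs, mod):
    """Percent bias: positive means underestimation by the model."""
    return 100.0 * sum(o - m for o, m in zip(obs, mod)) / sum(obs)


# Hypothetical series: the model is always 1 mm above the observation.
obs = [1.0, 2.0, 3.0, 4.0]
mod = [2.0, 3.0, 4.0, 5.0]
r = pearson_r(obs, mod)
print(r, r ** 2, nse(obs, mod), rmse(obs, mod), pbias(obs, mod))
```

The example shows why R alone is not sufficient: a constant offset leaves the correlation at 1, while NSE drops to 0.2 and PBIAS becomes −40%, indicating overestimation.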

The ranges of the goodness-of-fit values adopted here are based on hydrological studies (Table 1). Values between −10% and 10% for PBIAS, and above 0.7 for NSE and R, are considered very good results (Ratner 2009; Cardoso de Salis et al. 2019), while satisfactory results are achieved if NSE is above 0.50, R2 is above 0.60, and PBIAS is between −15% and 15% (Moriasi et al. 2007). Goodness-of-fit indicators were estimated for each station of the INMET and ANA datasets, paired individually with the ERA5 and MERRA-2 products. Then, the average, minimum, and maximum values of those indicators over all stations were estimated.
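As an illustration of how the PBIAS ranges can be applied station by station, the sketch below assigns each value to a class and counts the stations per class. It assumes that the range limits are inclusive and that "satisfactory" refers to values outside ±10% but within ±15%; the function names and the sample values are hypothetical.

```python
def classify_pbias(pbias_percent):
    """Classify a PBIAS value (%) using the ranges adopted in the text."""
    magnitude = abs(pbias_percent)
    if magnitude <= 10.0:  # assumed inclusive limit
        return "very good"
    if magnitude <= 15.0:  # assumed inclusive limit
        return "satisfactory"
    return "unsatisfactory"


def count_by_class(pbias_values):
    """Number of stations falling in each PBIAS class."""
    counts = {"very good": 0, "satisfactory": 0, "unsatisfactory": 0}
    for value in pbias_values:
        counts[classify_pbias(value)] += 1
    return counts


# Hypothetical station values (%), not taken from the study tables.
station_pbias = [0.9, 9.3, -12.5, 22.0, -0.3]
print(count_by_class(station_pbias))
```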

Table 1 Goodness-of-fit indicators and their acceptable range based on Ratner (2009), Cardoso De Salis et al. (2019), and Moriasi et al. (2007)

3 Results

3.1 Daily validation

3.1.1 Point-to-pixel approach

The reanalysis products poorly represented daily precipitation when compared with the INMET and ANA datasets (Tables 2 and 3). Figures 3, 4, 5 and 6 display the goodness-of-fit results for NSE, R, R², and PBIAS, respectively, for each station within the study area. These results are presented across four time scales (daily, monthly, annual, and climatology) for ERA5 and MERRA-2.

Table 2 Evaluation statistics of daily to climatology precipitation over the Tocantins-Araguaia watershed using the INMET dataset and the reanalysis products, ERA5 and MERRA-2
Table 3 Evaluation statistics of daily to climatology precipitation over the Tocantins-Araguaia watershed using the ANA dataset and the reanalysis products, ERA5 and MERRA-2
Fig. 3

Regional maps showing NSE results for all stations from INMET and ANA datasets. a ERA5 at daily scale; b MERRA-2 at daily scale; c ERA5 at monthly scale; d MERRA-2 at monthly scale; e ERA5 at annual scale; f MERRA-2 at annual scale; g ERA5 at climatology scale; and h MERRA-2 at climatology scale. Stars represent INMET gauges, and circles indicate ANA gauges

Fig. 4

Regional maps showing R results for all stations from INMET and ANA datasets. a ERA5 at daily scale; b MERRA-2 at daily scale; c ERA5 at monthly scale; d MERRA-2 at monthly scale; e ERA5 at annual scale; f MERRA-2 at annual scale; g ERA5 at climatology scale; and h MERRA-2 at climatology scale. Stars represent INMET gauges, and circles indicate ANA gauges

Fig. 5

Regional maps showing R2 results for all stations from INMET and ANA datasets. a ERA5 at daily scale; b MERRA-2 at daily scale; c ERA5 at monthly scale; d MERRA-2 at monthly scale; e ERA5 at annual scale; f MERRA-2 at annual scale; g ERA5 at climatology scale; and h MERRA-2 at climatology scale. Stars represent INMET gauges, and circles indicate ANA gauges

Fig. 6

Regional maps showing PBIAS results for all stations from INMET and ANA datasets. a ERA5 at daily scale; b MERRA-2 at daily scale; c ERA5 at monthly scale; d MERRA-2 at monthly scale; e ERA5 at annual scale; f MERRA-2 at annual scale; g ERA5 at climatology scale; and h MERRA-2 at climatology scale. Stars represent INMET gauges, and circles indicate ANA gauges. Red represents overestimation, while blue represents underestimation

For the INMET dataset, ERA5 had an average RMSE of 9.30 mm, NSE of 0.13, R of 0.46, R2 of 0.22, and PBIAS of 0.91%. MERRA-2 showed an average RMSE of 8.12 mm, NSE of 0.34, R of 0.59, R2 of 0.35, and PBIAS of 9.30%. Although the PBIAS for MERRA-2 shows a higher value than ERA5, the remaining statistical parameters indicate that MERRA-2 has a slightly better performance in representing the daily precipitation (Table 2).

Considering ANA’s dataset, only the average PBIAS showed acceptable results. ERA5 showed an average RMSE, NSE, R, R2, and PBIAS equal to 11.04 mm, 0.02, 0.37, 0.14, and −0.25%, respectively, and MERRA-2 showed an RMSE of 10.72 mm, an NSE of 0.08, an R of 0.38, an R2 of 0.15, and a PBIAS of 9.89% (Table 3).

The average NSE for ERA5 and MERRA-2 is lower than that proposed by Moriasi et al. (2007) to accept the models as satisfactory. Similarly, the R2 fails to meet the recommended value, higher than 0.6. The best results were found comparing INMET and MERRA-2 data, in which the highest R² achieved in this analysis was 0.47, and the average was 0.35. The average PBIAS values are in the acceptable range. ERA5 shows a small underestimation of daily precipitation compared with INMET, and a slight overestimation compared with the ANA dataset. MERRA-2 underestimates the precipitation considering both datasets.

Of the 32 INMET stations, ERA5 showed two stations with satisfactory PBIAS (±15%) and 17 stations with very good PBIAS (±10%). Of the 729 stations from the ANA dataset, 159 stations showed a satisfactory PBIAS, while 325 showed a very good PBIAS. Compared with INMET, MERRA-2 had 19 stations with a PBIAS within ±10% and 6 stations within ±15%. Only one station showed negative PBIAS, which illustrates that MERRA-2 mainly underestimates precipitation (Fig. 6b). For ANA, 144 stations showed a PBIAS within ±15%, and 205 within ±10%. Thus, ERA5 showed the best mean PBIAS and had more stations that could properly reproduce the precipitation (Fig. 6a).

For INMET, the best RMSE average, maximum, and minimum values were found for MERRA-2, in comparison with ERA5 (Table 2). Similar behavior was observed for ANA (Table 3). The NSE, R, and R2 values for both reanalysis datasets at the daily scale are deemed unacceptable. These are visible in Figs. 3, 4, 5 and 6, items a and b. For more detailed results, refer to Tables 2 and 3.

3.1.2 Daily watershed precipitation

The INMET precipitation shows higher peaks than the ANA dataset: in the rainy seasons of 2012, 2018, and 2019 there are days on which the precipitation exceeds 30 mm.day−1, while for ANA the precipitation exceeds 30 mm.day−1 only in 2018. Figure 7 shows the daily watershed precipitation for the INMET and ANA datasets in black.

The primary difference between the two observational datasets lies in the precipitation record. INMET registers high peak values but shows numerous days with little to no precipitation (Fig. 7a), whereas ANA consistently records some level of precipitation. Over the 12 years, INMET indicated 250 days without any precipitation during the dry season, whereas ANA recorded only 3 days without any measured precipitation. In the rainy season, INMET documented 202 days with precipitation values lower than 1 mm.day⁻¹, whereas ANA exhibited 19 days with precipitation levels below 1 mm.day⁻¹. Consequently, ANA shows a higher average precipitation than INMET.

In Fig. 7a-b, it is visible that ERA5 and MERRA-2 have smaller precipitation peaks throughout the year compared to the observational data. The average precipitation recorded by INMET and ANA ranges from 0 mm to 35 mm, whereas the reanalysis models show narrower ranges: 0 mm to 15 mm for ERA5, and 0 mm to 10 mm for MERRA-2.

Fig. 7

Daily precipitation (mm) over the Tocantins-Araguaia watershed for the INMET and ANA datasets and the reanalysis products. a INMET (Thiessen polygon approach), ERA5, and MERRA-2 daily precipitation for the watershed. b ANA (Thiessen polygon approach), ERA5, and MERRA-2 daily precipitation for the watershed. Black line with dot represents the observational data; red line corresponds to the ERA5 precipitation data; and the blue line is the MERRA-2 precipitation data

3.2 Monthly validation

3.2.1 Point-to-pixel approach

At the monthly timescale, both reanalysis products show an improvement in simulating the accumulated precipitation (Figs. 3, 4, 5 and 6, items c and d). In general, all the average goodness-of-fit indicators for ERA5 and MERRA-2 compared with INMET showed satisfactory to very good results (Table 2). The ERA5 average RMSE, NSE, R, R2 and PBIAS were 59.24 mm, 0.70, 0.88, 0.78 and 0.91%, respectively. Unsatisfactory NSE was found for only three stations, and unsatisfactory R2 for only two. Seventeen stations showed very good PBIAS, and two stations showed satisfactory results.

The MERRA-2 obtained 46.65 mm for the average RMSE, 0.82, 0.93 and 0.86 for the average NSE, R, and R2, respectively, and an average PBIAS of 9.30%. For NSE, two stations showed satisfactory results, and 30 stations showed very good results. All stations showed values above 0.7 for R2. Six stations showed satisfactory results and 19 stations showed very good results for PBIAS.

The comparison of ERA5 with ANA’s dataset resulted in an average RMSE, NSE, R, R2, and PBIAS of 70.39 mm, 0.53, 0.85, 0.73, and −0.25%, respectively (Table 3). Among the stations, 575 showed NSE above 0.6, 689 stations had an R2 above 0.5, and 484 stations had PBIAS values within ±15%. For MERRA-2 and ANA’s stations, the average RMSE was 70.71 mm, PBIAS was 9.89%, and NSE, R, and R2 were 0.59, 0.84, and 0.72, respectively. Also, 560 stations showed NSE above 0.6, 674 stations had R2 above 0.5, and 349 stations had PBIAS values within ±15%.

Both reanalyses demonstrated an improvement in the goodness-of-fit parameters between their predictions and observational data. Overall, MERRA-2 exhibited smaller errors when compared with INMET, and ERA5 displayed smaller errors when compared with ANA. Additionally, the findings indicate that the reanalysis models perform better in estimating monthly precipitation than daily precipitation.

3.2.2 Monthly watershed precipitation

The accumulation of daily to monthly precipitation generates higher values for the ANA data than for the INMET data, as illustrated in Fig. 8. ERA5 followed the general precipitation pattern, displaying satisfactory performance in capturing high precipitation values during the rainy season. In the dry season, ERA5 tends to overestimate the precipitation, failing to reach the minimum values observed at the watershed scale. MERRA-2 shows an inadequate representation of precipitation across both rainy and dry seasons. Figure 8a compares INMET with the reanalysis products, while Fig. 8b compares ANA with the reanalysis products.
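The accumulation from daily to monthly totals can be sketched as a grouped sum. This is an illustrative outline under the assumption that missing days have already been filtered out; the function name `monthly_totals` and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

def monthly_totals(daily):
    """Sum (date, mm) pairs into a {(year, month): total_mm} dictionary."""
    totals = defaultdict(float)
    for day, mm in daily:
        totals[(day.year, day.month)] += mm
    return dict(totals)

# Four illustrative days spanning a month boundary: 30 and 31 January,
# 1 and 2 February 2010, each with 5 mm.
start = date(2010, 1, 30)
daily = [(start + timedelta(days=i), 5.0) for i in range(4)]
monthly = monthly_totals(daily)
```

The same grouping keyed only by year would give the annual accumulated precipitation.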

Fig. 8

Monthly precipitation (mm) over the Tocantins-Araguaia watershed in the period of 2010 to 2021. a Precipitation comparison between INMET, ERA5 and MERRA-2; b Precipitation comparison between ANA, ERA5 and MERRA-2. Black line with dot represents the observational data; red line corresponds to the ERA5 precipitation data; and the blue line is the MERRA-2 precipitation data

3.3 Annual validation

3.3.1 Point-to-pixel approach

In the annual goodness-of-fit results, ERA5 showed a high average RMSE (293.96 mm), with a maximum of 1304.80 mm and a minimum of 101.77 mm. It also showed a poor average NSE (0.16), but good results for R (0.79), R2 (0.66), and PBIAS (0.91%). Only 16 stations showed NSE values above 0.6, 25 stations had R values above 0.7, and 24 stations had satisfactory R2 values (> 0.6).

MERRA-2 had lower errors (average RMSE of 242.17 mm) than ERA5. The average NSE (0.44) falls within the unacceptable range. The average R, R2, and PBIAS showed satisfactory to good results: 0.86, 0.76, and 9.30%, respectively. Of the 32 stations, 23 had NSE values above 0.5, 29 had R values above 0.7, and 27 showed R2 values above 0.6.

As verified at the other time scales, ERA5 and MERRA-2 performed worse when compared to the ANA dataset, with poorer NSE and RMSE values. For both reanalysis products, the average RMSE was above 300 mm, the average NSE was negative, R was lower than 0.7, and R2 was lower than 0.6. PBIAS is the only metric within the recommended range (ERA5 = −0.25% and MERRA-2 = 9.89%).

For ERA5, of the 729 stations in ANA’s dataset, 503 had unacceptable NSE values, 282 showed R below 0.7, and 389 were unacceptable for R2. For MERRA-2 the inconsistencies are even higher, with 575 stations showing NSE values below 0.5, 339 showing R values below 0.7, and 437 having R2 values lower than 0.6. For a detailed visualization of the error distribution, refer to Figs. 3, 4, 5 and 6, items e and f.
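Station counts such as those above follow from applying a threshold to each station's metric. The sketch below shows one way to do this; the helper names, the station identifiers, and the metric values are all hypothetical, and the thresholds are passed as parameters because the acceptable ranges are defined in the Methods rather than here.

```python
def count_above(station_metrics, metric, threshold):
    """Count stations whose metric is strictly above the threshold."""
    return sum(1 for m in station_metrics.values() if m[metric] > threshold)

def count_within(station_metrics, metric, limit):
    """Count stations whose metric lies within +/- limit (e.g. PBIAS in %)."""
    return sum(1 for m in station_metrics.values() if abs(m[metric]) <= limit)

# Illustrative per-station annual metrics (not values from this research).
stations = {
    "A": {"nse": 0.62, "r": 0.81, "pbias": 4.0},
    "B": {"nse": -0.35, "r": 0.66, "pbias": -22.0},
    "C": {"nse": 0.48, "r": 0.74, "pbias": 15.0},
}
n_nse_ok = count_above(stations, "nse", 0.5)
n_r_ok = count_above(stations, "r", 0.7)
n_pbias_ok = count_within(stations, "pbias", 15.0)
```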

3.3.2 Annual watershed precipitation

The annual observational precipitation ranges from 1600 mm.year−1 to 900 mm.year−1, while ERA5 ranges from 1800 mm.year−1 to 1400 mm.year−1. MERRA-2 indicates relatively consistent precipitation levels throughout the years, fluctuating from 1400 to 1100 mm.year−1 (Fig. 9). Figure 9a highlights that neither ERA5 nor MERRA-2 aligns with the precipitation found in INMET data. Figure 9b, however, shows that ERA5 follows a pattern similar to that of ANA, although it tends to overestimate precipitation levels.

Fig. 9

Annual accumulated precipitation (mm) over the Tocantins-Araguaia watershed in the period of 2010 to 2021. a Precipitation comparison between INMET, ERA5 and MERRA-2; b Precipitation comparison between ANA, ERA5 and MERRA-2. Black line with dot represents the observational data; red line corresponds to the ERA5 precipitation data; and the blue line is the MERRA-2 precipitation data

3.3.3 Climatology

The climatological precipitation analysis showed very good results based on the average values of the goodness-of-fit parameters. For INMET, MERRA-2 showed the strongest fit, with an average RMSE of 20.10 mm, NSE of 0.91, R of 0.99, R2 of 0.98, and PBIAS of 9.30%. ERA5 showed the best PBIAS (0.86%), and its other metrics were also good, with an RMSE of 25.15 mm, NSE of 0.82, R of 0.98, and R2 of 0.96 (Table 2).

Three stations had unsatisfactory NSE results for ERA5 and only one for MERRA-2. PBIAS showed a wide range of values, from −54% to 39% for ERA5 and from −32% to 26% for MERRA-2. The RMSE followed a similar trend: the stations with the largest positive or negative PBIAS values had the highest RMSE values. R and R2 were deemed good at all INMET stations for both reanalysis products.

In the evaluation against the ANA dataset, ERA5 and MERRA-2 showed similar metric values. ERA5 displayed an average RMSE of 31.54 mm, NSE of 0.67, R of 0.96, R2 of 0.94, and PBIAS of 0.28%. MERRA-2 showed an average RMSE of 34.29 mm, NSE of 0.75, R of 0.96, R2 of 0.94, and PBIAS of 10.31% (Table 3). ERA5 had 58 stations with unacceptable NSE values. Additionally, 14 stations exhibited poor R2, and 486 stations had a PBIAS in the satisfactory to very good range. Unlike for INMET, the RMSE values did not align with the PBIAS trend. MERRA-2 had a total of 39 stations with poor NSE results, and 14 stations displayed R2 values below 0.6. Also, only 352 stations were in the satisfactory to very good range for PBIAS. For a detailed visualization of the error distribution, refer to Figs. 3, 4, 5 and 6, items g and h.

3.4 Climatology at watershed scale

Overall, the observational climatology indicates that ANA’s precipitation values tended to be higher than those from INMET, as noted at the previous timescales (Table 4). In particular, ANA recorded higher precipitation throughout the year. During the wet months (November to April), ANA averaged 40 mm.month−1 more than INMET, and in the dry months it recorded 6 mm.month−1 more than INMET. Consequently, INMET shows a lower annual climatological precipitation (1308 mm) than ANA (1530 mm).

Table 4 Monthly climatology precipitation for the Tocantins-Araguaia watershed for the period of 2010 to 2021

The reanalysis products also exhibited variations in seasonal performance (Table 5). ERA5 showed a higher average value for the rainy season (191 mm) than MERRA-2 (119 mm). However, the difference between them narrows to 9 mm during the dry season. This is evident when comparing INMET with the reanalysis data. In the rainy season, ERA5 aligns more closely with the observational data than MERRA-2: the RMSE is 19 mm for ERA5 and 53 mm for MERRA-2. In the dry season, however, the reanalysis products tend to overestimate the precipitation, resulting in high RMSE values of 49 mm for ERA5 and 41 mm for MERRA-2.
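Seasonal averages and seasonal RMSE values of this kind can be derived from a 12-month climatology by selecting the months of each season. The sketch below assumes the wet season is November to April (as stated for the observational climatology) and the dry season is May to October; the function name `seasonal_stats` and the monthly values are hypothetical, and the exact procedure behind Table 5 may differ.

```python
import math

WET_MONTHS = (11, 12, 1, 2, 3, 4)   # November to April
DRY_MONTHS = (5, 6, 7, 8, 9, 10)    # assumed complement: May to October

def seasonal_stats(obs_clim, sim_clim, months):
    """Return (mean of the product, RMSE against observations) for a season.

    obs_clim and sim_clim map month number (1-12) to climatological mm/month.
    """
    obs = [obs_clim[m] for m in months]
    sim = [sim_clim[m] for m in months]
    mean_sim = sum(sim) / len(sim)
    rmse = math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(months))
    return mean_sim, rmse

# Illustrative climatologies (not values from this research): the product
# is 10 mm too wet in every wet month and 30 mm too wet in every dry month.
obs_clim = {m: (200.0 if m in WET_MONTHS else 20.0) for m in range(1, 13)}
sim_clim = {m: (210.0 if m in WET_MONTHS else 50.0) for m in range(1, 13)}
wet_mean, wet_rmse = seasonal_stats(obs_clim, sim_clim, WET_MONTHS)
dry_mean, dry_rmse = seasonal_stats(obs_clim, sim_clim, DRY_MONTHS)
```

In this made-up case the dry-season RMSE is three times the wet-season RMSE even though the dry-season totals are much smaller, which mirrors the kind of dry-season overestimation described above.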

Table 5 Average precipitation and root mean square error per season of the climatology in the Tocantins-Araguaia watershed

When assessed against ANA, the inaccuracy in MERRA-2’s performance becomes more evident, mainly because ANA’s precipitation values are higher. During the rainy season, ERA5 displayed an RMSE of 21 mm, whereas MERRA-2 showed a substantially higher RMSE of 94 mm. In the dry season, ERA5 showed a higher RMSE (53 mm) than MERRA-2 (44 mm).

ERA5 had higher annual precipitation (1721 mm) than MERRA-2 (1233 mm) and surpassed the observational datasets as well. This discrepancy led to high RMSE values for ERA5 when compared to INMET (412 mm), and lower RMSE values for MERRA-2 compared to INMET (75 mm). However, considering the ANA dataset, ERA5 (191 mm) displayed lower RMSE than MERRA-2 (296 mm). Figure 10 illustrates the average monthly climatology precipitation from 2010 to 2021 for each product.
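For a single pair of annual totals, RMSE reduces to the absolute difference between the two values. Assuming the annual RMSE values above are of this kind (an interpretation, not something stated in the text), they can be checked against the climatological totals reported in this section. The dictionary and function names below are hypothetical.

```python
# Annual climatological totals (mm) as reported in this section.
annual = {"INMET": 1308, "ANA": 1530, "ERA5": 1721, "MERRA-2": 1233}

def abs_difference(product, reference):
    """Absolute difference between two annual totals, in mm."""
    return abs(annual[product] - annual[reference])

differences = {
    (p, ref): abs_difference(p, ref)
    for p in ("ERA5", "MERRA-2")
    for ref in ("INMET", "ANA")
}
```

The resulting differences are 413 mm and 75 mm against INMET and 191 mm and 297 mm against ANA for ERA5 and MERRA-2, respectively. The two values that differ by 1 mm from those reported (412 mm and 296 mm) are consistent with rounding of the underlying totals, although that explanation is also an assumption.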

Fig. 10

Monthly climatology precipitation (mm) over the Tocantins-Araguaia watershed in the period of 2010 to 2021. a Precipitation comparison between INMET, ERA5 and MERRA-2; b Precipitation comparison between ANA, ERA5 and MERRA-2. Black line with dot represents the observational data; red line corresponds to the ERA5 precipitation data; and the blue line is the MERRA-2 precipitation data

4 Discussion

4.1 Point-to-pixel approach

This research compared two observational datasets, INMET and ANA, with two gridded precipitation products, ERA5 and MERRA-2, using a point-to-pixel approach and at the watershed scale. The results showed a better performance of MERRA-2 when compared with INMET data, and poor results compared to the ANA dataset. ERA5 showed similar performances when compared with the INMET and ANA datasets.

Considering the different time frequencies analyzed, MERRA-2 demonstrated a better fit than ERA5 with observational data from INMET and ANA for daily precipitation. However, the average goodness-of-fit values were still unsatisfactory.

For monthly precipitation and the INMET dataset, MERRA-2 obtained a better performance. However, when the ANA dataset is considered, ERA5 and MERRA-2 had similar performance. For the annual analysis, MERRA-2 also showed better results for INMET. ERA5 performance is slightly better than MERRA-2 for ANA. Finally, the comparison of the climatology showed a better fit between the MERRA-2 and INMET datasets, while MERRA-2 and ERA5 presented similar results when compared with the ANA dataset.

Overall, the GPPs showed the best performance on the monthly scale and in the climatology analysis. ERA5 and MERRA-2 showed poor performance at the daily and annual scales. Similar conclusions were reached by Valencia et al. (2023), who studied eight GPPs in Colombia using a point-to-pixel approach. According to the authors, the GPPs perform better on a monthly scale than on a daily or annual scale. Climatology was not evaluated by the authors. Izadi et al. (2021) and Malayeri et al. (2021) also found better results at a monthly scale than at a daily scale while studying Iran. Arshad et al. (2021) also found a decrease in GPP performance in the annual analysis for Pakistan.

In the literature, different conclusions for the climatology analysis can be found. Balmaceda-Huarte et al. (2021) concluded that climatology or long-term average values show the best performance. However, Hassler and Lauer (2021) found the contrary, stating that annual climatology reanalysis does not perform well. Here, good agreement was observed between the reanalysis climatological precipitation and the observational data.

Across multiple studies, no single GPP consistently outperforms the others at every study site or time frequency analyzed. For instance, Arshad et al. (2021) and Vega-Durán et al. (2021) showed that ERA5 outperforms other GPPs. While studying reanalysis precipitation products in Pakistan, Arshad et al. (2021) identified higher correlation coefficients for ERA5 at the daily (R = 0.85) and monthly scales (R = 0.96) than for MERRA-2 (R = 0.59 and R = 0.81, respectively). However, they noted that on an annual scale, MERRA-2 (R = 0.84) surpasses ERA5 (R = 0.77).

Vega-Durán et al. (2021) studied ERA5 and MERRA-2 in the Sinú River basin, Colombia (South America), and found very contrasting results between ERA5 and MERRA-2 considering 49 rain gauges. Although the metrics indicated poor performance for this watershed, ERA5 (BIAS = 0.52 and NSE = −0.22) behaved better than MERRA-2 (BIAS = 2.12 and NSE = −16.95) on the climatology scale between the years 1985 and 2019.

The literature also includes studies in which MERRA-2 outperforms ERA5, or in which ERA5 had the lowest performance in South America (Birkel et al. 2022; Valencia et al. 2023). Valencia et al. (2023), in Colombia, found that ERA5 is one of the GPPs with the worst performance for all spatial and temporal scales, with a decrease in performance at altitudes higher than 1000 m due to precipitation overestimation. The authors found a monthly RMSE average of 125 mm and an annual RMSE average of 699 mm. The Pearson correlation values were 0.69 on the monthly scale and 0.53 at the annual scale.

Birkel et al. (2022) studied ERA5 and MERRA-2 precipitation and temperature in the Andes Cordillera region, in Peru and Bolivia, from 2014 to 2019 using 6 rain gauges. The original hourly precipitation was converted to total monthly precipitation. For ERA5, the average RMSE reached a maximum of 103.5 mm and a minimum of 13.92 mm. For MERRA-2, the maximum RMSE was 51.89 mm and the minimum 7.55 mm. According to Birkel et al. (2022), MERRA-2 produced lower precipitation error scores in comparison with ERA5. However, according to Ramos et al. (2021), MERRA-2 data should be used carefully, since its results differ significantly from the observational data in South Brazil.

Pedreira-Junior et al. (2021) studied MERRA-2 precipitation from 2000 to 2018 in Mato Grosso state, Central Brazil, using 11 rain gauges from the INMET database. Similar to our findings, the authors stated that MERRA-2 consistently underestimated observations at the three time scales studied (daily, monthly, and annual).

For the annual scale, Pedreira-Junior et al. (2021) found RMSE values ranging from 500 mm to 200 mm, and a Pearson correlation of 0.1 to 0.9. In this research, including both databases, the RMSE values ranged from 3347 mm to 20 mm, and the correlation from -1 to 1. On a monthly scale, the authors found RMSE values from 100 mm to 50 mm, and a Pearson correlation of 0.6 to 0.9, while in this research the RMSE ranged from 447 mm to 13 mm and the correlation from −0.9 to 0.9. On a daily scale, Pedreira-Junior et al. (2021) found an RMSE of 15 mm to 10 mm, and a Pearson correlation of 0.1 to 0.4. In this research, the RMSE ranged from 35 mm to 5 mm, and the correlation from 0.04 to 0.67.

In comparison to the findings by Pedreira-Junior et al. (2021), this research observed wider error ranges. Also, negative values were found for NSE and Pearson correlation, which aligns with similar findings reported by Vega-Durán et al. (2021) in Colombia. Vega-Durán et al. (2021) suggested that negative NSE values indicate that observational records are more reliable than reanalysis products. Additionally, Balmaceda-Huarte et al. (2021) reported that, among the five Global Precipitation Products (GPPs) evaluated in Southern South America, no specific reanalysis product consistently outperformed the others.

Vega-Durán et al. (2021), Pedreira-Junior et al. (2021), and Birkel et al. (2022) used considerably fewer rain gauges than this research (49, 11, and 6, respectively). Valencia et al. (2023) used 1334 rain gauges, and overall, the authors found that ERA5 overestimates the precipitation, with poor results over Colombia. Summing the INMET and ANA rain gauges, 761 stations were used in this research, occupying a larger area than the watershed itself (920,000 km2). Independent of the number of stations used, the results found in this research corroborate the literature.

Overall, ERA5 overestimates the precipitation, and MERRA-2 underestimates it. Also, neither ERA5 nor MERRA-2 shows a strong agreement with the observational data in the Tocantins-Araguaia watershed at the daily and annual scales.

4.2 Climatology at watershed scale

At the watershed scale and considering the climatology, the reanalysis products follow the observed precipitation regime, effectively defining the dry and rainy seasons. ANA (2009) reported an annual precipitation of 1740 mm for the watershed, while Conejo et al. (2007) indicated a slightly higher annual precipitation of 1840 mm.

Neither the INMET (1308 mm) nor the ANA dataset (1530 mm) reached those values, though ANA came closer. The number of rain gauges probably plays a major role here, since ANA's rain gauge coverage is denser and better distributed over the watershed. ERA5 simulated a climatological value (1721 mm) close to those in the literature. However, MERRA-2 underestimated the watershed precipitation (1233 mm).

The seasonal behavior also differed for each dataset. ERA5 generally tended to overestimate the precipitation in comparison to INMET. Compared to ANA, however, its behavior varied with the season: ERA5 underestimated precipitation during the rainy season but overestimated it in the dry season. MERRA-2 underestimates the precipitation in the rainy season and overestimates it in the dry season for both datasets.

The overestimation in ERA5 corroborates findings from Vega-Durán et al. (2021) and Valencia et al. (2023), both in Colombia. Hassler and Lauer (2021) also found a wet bias in the tropics while studying eleven global and regional precipitation products, including ERA5 and MERRA-2.

4.3 Gridded Precipitation products and the global position

Similar to the results found in this research, the literature presented here also showed modest to poor performance of the reanalysis products when South America or the tropical zone is considered (Hassler and Lauer 2021; Pedreira-Junior et al. 2021; Vega-Durán et al. 2021; Birkel et al. 2022; Valencia et al. 2023). Likewise, Reddy and Pramada (2022) showed inaccurate results of MERRA-2 in the tropical zone at the Kalada river, in India. According to Lavers et al. (2022), ERA5 performs best at latitudes between 20º and 60º, North and South, while the region encompassing the tropical zone shows the largest errors for precipitation.

Thus, are the poor performance results related only to latitude and to the reanalysis simulation? Are the tropics a challenging zone for the reanalysis products to represent precipitation properly and accurately? Or are there other limitations affecting reanalysis performance?

According to Hersbach et al. (2020), the main areas with high errors for ERA5 are over the ITCZ in the eastern Pacific, over the extratropical Pacific, the South Atlantic Ocean, the Indian Ocean, continental South America, and Africa. Ali et al. (2021) pointed out that ERA5 and MERRA-2 precipitation products fail to reproduce precipitation extremes over the tropics, likely due to their convective nature and the coarse resolution of the climate reanalyses.

Degefu et al. (2022) studied 20 precipitation data products for Ethiopia (between 3° and 15° N) and stated that high-resolution products (< 0.3º) had better performance compared to lower resolution precipitation products. However, considering the reanalysis evaluated here, this only applies to MERRA-2 (0.5º x 0.625º). Hence, there may be a need for a higher-resolution product to simulate the tropical zone and the convective regime rather than the one developed for ERA5 (0.25º x 0.25º).

Ozturk et al. (2021) studied the capacity of the GPPs to predict landslides over Japan. Although their main goal was not to validate reanalysis products, they found interesting results. According to the authors, the resolution of GPPs, such as IMERG and ERA5, may not be the only issue; the errors could also be amplified by spatial inaccuracy. The authors found a precipitation shift of a few kilometers for these products, which, in their case, can have significant impacts on landslide prediction, whereas using ground radar rainfall substantially improved their model performance over other precipitation products. Therefore, precipitation shifts can be an issue when validating point-to-pixel precipitation, since the location can also influence the accuracy measured by goodness-of-fit indicators.
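To illustrate why location matters in a point-to-pixel comparison, the sketch below pairs a gauge with the nearest cell centre of a regular latitude-longitude grid and reports how far the gauge sits from that centre. The function names, the gauge coordinates, and the grid origin (first cell centre at 90º S, 180º W) are illustrative assumptions; real ERA5 and MERRA-2 files carry their own coordinate arrays, which should be used instead.

```python
def nearest_cell(lat, lon, lat0, lon0, dlat, dlon):
    """Return (row, col) of the cell centre nearest to a gauge.

    The grid is assumed regular, with its first cell centre at (lat0, lon0)
    and spacings dlat and dlon in degrees.
    """
    row = round((lat - lat0) / dlat)
    col = round((lon - lon0) / dlon)
    return row, col

def centre_offset(lat, lon, lat0, lon0, dlat, dlon):
    """Return (dlat_deg, dlon_deg) between the gauge and its cell centre."""
    row, col = nearest_cell(lat, lon, lat0, lon0, dlat, dlon)
    return lat - (lat0 + row * dlat), lon - (lon0 + col * dlon)

# Hypothetical gauge location inside the study region.
gauge_lat, gauge_lon = -10.18, -48.33

# Spacings taken from the text: 0.25 x 0.25 degrees for ERA5 and
# 0.5 (latitude) x 0.625 (longitude) degrees for MERRA-2.
era5_cell = nearest_cell(gauge_lat, gauge_lon, -90.0, -180.0, 0.25, 0.25)
merra2_cell = nearest_cell(gauge_lat, gauge_lon, -90.0, -180.0, 0.5, 0.625)
era5_offset = centre_offset(gauge_lat, gauge_lon, -90.0, -180.0, 0.25, 0.25)
```

On a coarser grid the gauge can sit further from the cell centre it is paired with, so a rainfall field displaced by a few kilometers may be attributed to a neighboring pixel and penalize the goodness-of-fit scores.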

Another issue could be the reproduction of the irregular altitudes in the reanalysis products. Jiao et al. (2021) studied the performance of ERA5 precipitation against the observational data in China, considering annual and seasonal trends, and spatial patterns. According to the authors, altitudes have an impact on ERA5 accuracy, and it showed a better agreement with the observational data at altitudes below 1000 m. Valencia et al. (2023) also found a decrease in ERA5 performance in altitudes above 1000 m due to precipitation overestimation.

According to Lavers et al. (2022), the poor reproduction of urban areas, orography, or islands may also lead to the poor simulation of the local atmospheric circulation, which can cause temperature and precipitation errors. The study domain here shows variations in topography between 0 and 1000 m. Although we are not analyzing the spatial variation, this variation in elevation could influence the precipitation regime if not well reproduced by the reanalysis.

The answers to the questions above are complex. The arguments presented point to specific issues that influence GPP performance and that can be improved. The results showed that ERA5 and MERRA-2 still need to improve their simulation over north and central Brazil. High errors were found, corroborating South American research (Vega-Durán et al. 2021; Valencia et al. 2023).

4.4 Dataset’s reliability

Among the observational datasets employed in this research, and considering the filter applied to each product, it is evident that the ANA dataset is the more reliable source. The ANA dataset has a sufficient number of rain gauges to provide a more accurate representation of the watershed.

INMET data mainly covers the southeast of the Tocantins river, resulting in an inadequate representation of the watershed. The absence of rain gauges in key areas such as the Araguaia sub-basin (west) and the Pará estuary (north) contributes to an under-representation of the watershed. Both the northern and western regions of the basin are characterized by higher levels of precipitation but suffer from a scarcity of monitoring stations, potentially resulting in the underestimation of precipitation by the INMET dataset.

The climatological precipitation recorded by INMET (1308 mm) closely aligns with values stated by literature specific to the southeast area (1500 mm to 1200 mm; ANA, 2009), supporting this assertion. It’s important to clarify that this conclusion does not suggest INMET’s unreliability but rather highlights that the filtering method applied to this dataset in this research might not be suitable for accurately representing the characteristics of the Tocantins-Araguaia watershed.

Considering the ANA dataset as the most reliable implies that neither ERA5 nor MERRA-2 can accurately replicate the precipitation regime over the watershed at the daily and annual scale. Although a considerable number of stations displayed satisfactory to very good results, it’s crucial not to overlook the fact that the majority of stations exhibited poor performance.

Among the reanalysis products, MERRA-2 showed better average values at the daily time frequency but slightly inferior results to ERA5 in the other analyses. ERA5 showed the best average values across monthly, annual, and climatological timescales, consistently showing a higher number of stations with superior performance. Therefore, in terms of reanalysis products, ERA5 demonstrated superior performance in representing the precipitation regime.

Overall, the reanalysis products show shortcomings in simulating precipitation in the tropical zone. It is anticipated that future advancements will lead to improved simulations over the tropical zone, as has happened with the evolution from ERA-Interim to ERA5 (Hersbach et al. 2020), and from MERRA to MERRA-2 (Gelaro et al. 2017).

Initially, the selection of Global Precipitation Products (GPPs) for application over the Tocantins-Araguaia watershed was driven by their consistent time series and higher temporal and spatial resolutions (Hersbach et al. 2020). Additionally, GPPs such as ERA5 and MERRA-2 provide a broader range of parameters, such as temperature, winds, evaporation, surface pressure, and snow, among others.

However, the evaluation conducted in this research shows that the most suitable precipitation product for application in a hydrological-hydrodynamic model over the Tocantins-Araguaia watershed is the ANA observational precipitation data. This dataset stands out because it is well distributed, includes a substantial number of rain gauges, and most accurately represents the reality in the region.

5 Conclusion

This paper compares two climate reanalysis products, ERA5 and MERRA-2, alongside two observational datasets, INMET and ANA. The main goal was to identify the most suitable precipitation product for application in a hydrological-hydrodynamic model within the Tocantins-Araguaia watershed. The assessment evaluated ERA5 and MERRA-2 across multiple temporal scales (from daily to climatology) utilizing data from 32 in situ rain gauges from the INMET database and 729 rain gauges from the ANA database in the Tocantins-Araguaia watershed, Brazil.

Results indicated that both GPPs, ERA5 and MERRA-2, described the overall precipitation pattern for the analyzed period. However, the GPPs do not show good agreement with observational data, particularly at daily and annual scales. Better agreement was observed at monthly and climatological scales. ERA5 exhibited a higher number of acceptable performing stations than MERRA-2 and showed the lowest PBIAS. However, both reanalysis products struggled to accurately simulate a significant number of observational rain gauges sourced from the ANA database.

Both reanalysis products showed good agreement in the climatological analysis, but a closer look at seasonality shows that precipitation is not well simulated in the dry season, when the reanalysis products show more precipitation than the observational data. In the rainy season, ERA5 overestimates precipitation relative to INMET but underestimates it relative to ANA, while MERRA-2 consistently underestimates precipitation relative to both observational datasets. Overall, the results found here corroborate the literature: ERA5 shows a wet bias (overestimation) for precipitation, and MERRA-2 underestimates the precipitation regime.

This reflects the modest to poor performance of reanalysis products when applied in South America or tropical zones globally. Notably, more accurate performance is observed between latitudes of 20º and 60º, while significant errors persist in the tropical zone for precipitation estimates.

Despite the advantages of GPPs such as consistent time series and higher resolutions, the ANA observational precipitation data is deemed the most suitable input for hydrological-hydrodynamic modeling in the Tocantins-Araguaia watershed. Its widespread coverage, numerous rain gauges, and accurate representation of regional conditions make it an ideal choice for hydrological modeling in the region.

In conclusion, this study highlights the importance of accurate precipitation data in various applications, primarily in hydrological models, and demonstrates the strengths and limitations of different datasets. The choice of input data for each watershed should be carefully considered to ensure reliable hydrological modeling and effective water resource management.