1 Introduction

Recent reports bolster the connections between climate change and high-impact extreme events such as heavy rainfall, droughts, heat waves, cold waves, and tropical cyclones, which have triggered numerous floods, landslides, wildfires, and avalanches across Latin America and the Caribbean (IPCC 2022; WMO 2021a, b). These regions are considered highly vulnerable to current and future climate extremes due to many factors, including low socio-economic development (Collins et al. 2016; Reyer et al. 2017; Seneviratne et al. 2021) and economic dependence on agricultural commodities (Marengo et al. 2014). According to Nagy et al. (2018), between 2000 and 2015, approximately 74 million people in South America were affected by floods, storms, landslides, and extreme temperatures. For example, the extreme drought in the Brazilian Pantanal between 2019 and 2020 (Libonati et al. 2022; Marengo et al. 2021b) and the drought in the Parana Plata basin (Naumann et al. 2021) not only affected human activities over southern South America but may have exacerbated fire activity that affected the natural biodiversity in Bolivia during 2019 and Pantanal in 2020 (Baxter et al. 2020; Marengo et al. 2021a). Likewise, the flood of 2021 in Manaus, Brazil, has been reported as one of the largest Amazon River flood events of the twenty-first century (Espinoza et al. 2022).

The story is similar in Central America and the Caribbean. Between 2015 and 2019, a prolonged rainfall deficit over most of Central America resulted in severe droughts and crop losses (Depsky and Pons 2021; Pascale et al. 2021; WMO 2021a). The number of tropical cyclones globally was above average in 2020, with 96 events across the 2020 Northern Hemisphere and 2019–20 Southern Hemisphere seasons. Two major hurricanes made landfall in quick succession in Central America (Category 4 Eta and Category 5 Iota), causing severe flooding in the region and the first Category 5 system to strike the Nicaraguan coast. The Honduran Government estimated that 53,000 hectares of crops were devastated by Hurricane Eta, and more than 2.8 million people were affected. Combined with the COVID-19 pandemic and pre-existing humanitarian crises, these extreme compound events left incredible loss and suffering (Shultz et al. 2021), bringing our response to these events into focus.

According to the WMO in its recent report on the State of the Climate in Latin America and the Caribbean (WMO 2022), in the period 1981–2010, the trends indicate an increase in the intensity and frequency of hot extremes and a decrease in the intensity and frequency of cold extremes, as well as a significant intensification of total and heavy precipitation in south-eastern South America. As for droughts and dry spells, the report identified mixed trends in different subregions of the Caribbean and Central America, while in Mexico, central Chile, and the Paraná–La Plata Basin, there is some evidence of increased frequency and severity of meteorological droughts. On the other hand, Seneviratne et al. (2021) project, throughout the twenty-first century, increases in the frequency, duration, and magnitude of warm daily temperature extremes and decreases in cold extremes, as well as an increase in heavy precipitation or the proportion of total rainfall from heavy rainfalls, mainly in tropical regions.

Understanding the dynamics and trends in extreme climate events provides vital information to help policymakers establish actions necessary to combat climate change and its impacts. Providing reliable information regarding historical and future projections of such events represents an enormous challenge for climate researchers (McPhillips et al. 2018; Medeiros and Oliveira 2022; Mistry 2019; Mysiak et al. 2018; Santos et al. 2017). Previous studies have investigated the historical evolution of climate extremes in different parts of the world, including Latin America and the Caribbean, using the set of indices established by the Expert Team on Climate Change Detection and Indices (ETCCDI) (Aguilar et al. 2005; Avila-Diaz et al. 2020b; Cornes and Jones 2013; Donat et al. 2013, 2016; Dunn et al. 2020; Gouveia et al. 2022; Kitoh and Endo 2016; Nakaegawa et al. 2014; Skansi et al. 2013; Zilli et al. 2017). Those studies have indicated two components linked to uncertainties of gridded or observational datasets (e.g., reanalyses and satellite products) and climate simulations (e.g., Earth system models—ESMs) concerning estimates of climate extremes events at local and regional scales: (i) these events have large temporal and spatial variability (Akinsanola et al. 2020; Avila-Diaz et al. 2020b; Campozano et al. 2016; Na et al. 2020); (ii) the assessments of climate extremes in the Northern Hemisphere are more abundant and reliable when compared to the Southern Hemisphere (Donat et al. 2016; Lehmann et al. 2015; Sillmann et al. 2013), due to the fact that in the latter there is a greater lack of climatic data for an analysis of long periods and low spatial distribution of meteorological stations (Condom et al. 2020; Liebmann and Allured 2006; Pabón-Caicedo et al. 2020; Solman 2013). 
However, despite the deficiencies mentioned above, datasets derived from reanalyses, satellites, and combinations of both are useful and reliable for evaluating and validating the ESMs (Beck et al. 2019; Contractor et al. 2020; Sun et al. 2018; Yin et al. 2013).

Natural (e.g., El Niño-Southern Oscillation) and anthropogenic (e.g., land-use/land-cover change, fossil fuel burning) forcings can influence the frequency and intensity of climate extreme events, leading to an intensification of hazards such as floods, droughts, fires, cold/heat waves, and landslides (AghaKouchak et al. 2020; Changnon et al. 2000; Chen and Sun 2021). To improve the representation of the patterns and variability of climate extremes, climate scientists have applied new physical parameterizations to the ESMs (e.g., to improve biosphere–atmosphere interaction processes), increased the horizontal spatial resolution, and used large Multi-Model Ensembles (MME) with a large number of simulations (Bador et al. 2020; Lehner et al. 2020). In this sense, recent studies have shown the effects of the improvements in the parameterizations in the last two generations of ESMs from the CMIP (CMIP5 and CMIP6) (Brown et al. 2020; Fan et al. 2020; Lun et al. 2021; Ortega et al. 2021; Thorarinsdottir et al. 2019; Wehner 2020). However, all the studies agree that the improvements in the simulation of temperature and precipitation climate extremes are small and not statistically significant.

Moreover, studies analyzing the sources of ESMs’ uncertainties from CMIP experiments are focused on extensive areas, like continental regions (Almazroui et al. 2021c; Kim et al. 2020; Na et al. 2020; Sillmann et al. 2013). In a recent study, Akinsanola et al. (2020) found that MME from CMIP6 performs better than most individual models in capturing precipitation extremes at a seasonal scale over the United States. On the other hand, most regional studies indicated that to assess the ability of CMIP models to capture the variability of climate extremes, it is necessary to evaluate the performance of individual models to identify the shortcomings (Akinsanola et al. 2020; Avila-Diaz et al. 2020b; Medeiros and Oliveira 2022; Rivera and Arnould 2020).

Finally, to increase the spatial resolution, Regional Climate Models (RCMs) have traditionally been used through dynamical downscaling of ESM outputs to obtain finer climate information for a particular region (Ban et al. 2021; Vichot-Llano et al. 2021). However, Denis et al. (2002) show that although RCMs provide a more detailed representation of the complex topography and the continent–ocean contrast, they introduce new sources of uncertainty (Giorgi 2005; Giorgi and Francisco 2001) such as closure problems in lateral boundary conditions (Ambrizzi et al. 2019; de Medeiros et al. 2020). To counteract this drawback, high-resolution ESMs are being developed, which have the potential to provide relevant regional and global climate information and include more climate processes than RCMs (Demory et al. 2020). An example is the new High Resolution Model Intercomparison Project (HighResMIP; Haarsma et al. 2016), which provides an evaluation framework for ESM simulations in horizontal grid spacings ranging from 0.18° to 2.5°. In this way, it is possible to understand the role of increasing horizontal resolution in climate simulations (mean and extreme values, variability, etc.). This increase in spatial resolution has shown considerable improvements in the simulations of the magnitude and frequency of meteorological systems of different scales, such as tropical cyclones (Roberts et al. 2020; Vannière et al. 2020) and atmospheric blocking events (Schiemann et al. 2020).

The main objective of this study is to assess the performance of a sub-set of HighResMIP models, which are members of the CMIP6, in simulating daily temperature and precipitation climate extremes events (as represented by the indices recommended by the ETCCDI) over Latin America and the Caribbean regions during 1981–2014. Additionally, we evaluate the impact of the increase in the horizontal spatial resolution in the HighResMIP models in estimating extreme climate variability on a local/regional scale. Finally, we analyze climate projections for the 2021–2050 period under the new Shared Socioeconomic Pathways (SSP) scenario SSP5-8.5.

2 Data and Methodology

2.1 Climate Extremes Indices

Table 1 shows the eight temperature and eight precipitation extremes indices chosen as the most relevant for the studied region from the 27 indices proposed by the ETCCDI (http://etccdi.pacificclimate.org/). The selected ETCCDI indices have been widely used for monitoring changes in daily extremes of temperature and precipitation in Latin America and the Caribbean regions (Aguilar et al. 2005; Almazroui et al. 2021c; Avila-Diaz et al. 2020b; Collins et al. 2016; Heidinger et al. 2018; Skansi et al. 2013; Valverde and Marengo 2014). These indices are calculated from daily maximum (TX) and minimum (TN) temperature and precipitation (PR) data. The capital “X” and “N” stand for the daily maximum and minimum temperature, respectively. Indices can be classified into four groups: (1) absolute indices, such as hottest day (TXx) and coldest night (TNn) or daily and 5-day maximum PR (RX1day and RX5day, respectively); (2) threshold indices, which count the number of days exceeding a fixed threshold, such as the number of days with PR greater than 20 mm (R20mm); (3) percentile-based threshold indices, which indicate the number of days with values below the 10th percentile (cold nights, TN10p, and cold days, TX10p) or above the 90th percentile (warm nights, TN90p, and warm days, TX90p); and (4) duration indices, which describe the warm spell duration (WSDI, based on percentile thresholds) and the dry spell (consecutive dry days, CDD) and wet spell (consecutive wet days, CWD), both based on an absolute threshold. In the case of the absolute indices, the lower case “x” and “n” denote the annual maximum and minimum value, respectively. We used the same reference period (1981–2014) for all ESMs to calculate the percentile-based threshold indices. We refer the interested reader to Zhang et al. (2011) for further details about the ETCCDI indices.
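As an illustration of the absolute, threshold, and duration groups described above, the following minimal Python sketch computes a few of the indices for a single year of daily values at one grid point. It is not the processing chain used in this study: the function names and the helper `longest_run` are hypothetical, the 1 mm wet-day threshold and the "≥ 20 mm" convention are taken from the ETCCDI definitions, and spells are not allowed to cross year boundaries, which is a simplification.

```python
from typing import Iterable, Sequence

WET_DAY_MM = 1.0  # wet-day threshold (PR >= 1 mm), assumed from the ETCCDI definitions


def txx(tx: Sequence[float]) -> float:
    """Hottest day: annual maximum of daily maximum temperature."""
    return max(tx)


def tnn(tn: Sequence[float]) -> float:
    """Coldest night: annual minimum of daily minimum temperature."""
    return min(tn)


def rx1day(pr: Sequence[float]) -> float:
    """Maximum 1-day precipitation."""
    return max(pr)


def rx5day(pr: Sequence[float]) -> float:
    """Maximum precipitation total over 5 consecutive days (needs >= 5 days)."""
    return max(sum(pr[i:i + 5]) for i in range(len(pr) - 4))


def r20mm(pr: Sequence[float]) -> int:
    """Number of days with PR >= 20 mm."""
    return sum(1 for p in pr if p >= 20.0)


def longest_run(flags: Iterable[bool]) -> int:
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def cdd(pr: Sequence[float]) -> int:
    """Consecutive dry days: longest spell with PR < 1 mm."""
    return longest_run(p < WET_DAY_MM for p in pr)


def cwd(pr: Sequence[float]) -> int:
    """Consecutive wet days: longest spell with PR >= 1 mm."""
    return longest_run(p >= WET_DAY_MM for p in pr)
```

The percentile-based indices (TN10p, TX90p, and so on) and WSDI additionally require calendar-day percentile thresholds from the reference period and are therefore left out of this sketch.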

Table 1 List of the temperature and precipitation indices applied in this study

Some studies have already employed the ETCCDI indices to monitor changes in intensity, frequency, and duration of temperature and precipitation climate extremes over the study area using meteorological stations (Aguilar et al. 2005; Ávila et al. 2019; Ceron et al. 2020; Croitoru et al. 2016; Domínguez-Castro et al. 2020; Marengo et al. 2021a). Other works assess the skills of the different gridded datasets (e.g., ESMs, reanalyses, satellites products) to reproduce the spatial–temporal variability (Avila-Diaz et al. 2021; de Lima and Alcântara 2019; Kim et al. 2020; Ongoma et al. 2018). In this sense, we evaluated the performance of CMIP6 models in simulating the ETCCDI indices using annual summary values, similar to other studies (Aerenson et al. 2018; Gouveia et al. 2022; Thibeault and Seth 2014) that indicate that most of the impactful climate extremes can be described by annual indices. Additional details about the performance of CMIP6 models over North, Central, and South America on the annual cycle of total precipitation and mean temperature are discussed by Almazroui et al. (2021a, b).

2.2 Climate Reference Datasets

The reference climate datasets used in this study are the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis v5—ERA5 (Hersbach et al. 2020), the Global Meteorological Forcing Dataset for Land Surface Modeling—GMFD (Sheffield et al. 2006), and the Climate Hazards Group InfraRed Precipitation with Stations—CHIRPS v.2.0 (Funk et al. 2015). We focus on these gridded datasets because they are available over our study region with a high horizontal resolution covering the reference period defined in this study (1981–2010) and are widely used in the literature. It must be stressed that the level of model performance in simulating climate extremes varies according to the reference datasets (Kim et al. 2020; Sillmann et al. 2013; Srivastava et al. 2020; Ortega et al. 2021). Moreover, some studies showed that using more than one reference dataset reduces the uncertainty of the observational climate gridded information (Avila-Diaz et al. 2021; Bador et al. 2020).

ERA5 data are available from 1950 to the present with a temporal resolution of 1 h and horizontal spatial resolution of 0.25° × 0.25° lat/lon (Hersbach et al. 2020). According to CDS (2020; https://confluence.ecmwf.int/display/CKB/ERA5), in ERA5, the minimum and maximum temperatures are forecast parameters, that is, they are available from the forecasts only, and they have a cold bias in the lower regions of the troposphere over most parts of the globe. Precipitation is a variable available from a combination of data analysis and forecasting, and due to the increased spatial resolution, it presents improved results in ERA5 (Jiao et al. 2021; Tarek et al. 2020). However, important uncertainties remain in tropical regions due to the lack of data available to analyze (Hersbach et al. 2020). In some studies, ERA5 has demonstrated reliable estimates of climate features in some regions of Central and South America (Avila-Diaz et al. 2020b; Balmaceda‐Huarte et al. 2021; Cerón et al. 2021; Zuluaga et al. 2021).

The CHIRPS (Funk et al. 2015) dataset provides daily precipitation outputs at a high horizontal resolution (0.05° × 0.05° lat/lon) with near-global coverage (50° S to 50° N, and 180° W to 180° E) from 1981 to near the present. CHIRPS precipitation combines interpolated station data with the Tropical Rainfall Measuring Mission Multi-Satellite Precipitation Analysis version 7 (TMPA 3B42 v7) to calibrate global Cold Cloud Duration (CCD) rainfall estimates (Funk et al. 2015). Previous studies show that CHIRPS can simulate spatiotemporal precipitation variability for particular regions of South America (Cerón et al. 2020, 2021; Espinoza et al. 2019; Nogueira et al. 2018; Rivera et al. 2018).

The GMFD data have a horizontal resolution of 0.25° × 0.25° lat/lon and provide daily precipitation and maximum and minimum temperature outputs for the 1948–2016 period (Sheffield et al. 2006). The dataset is constructed by combining the NCEP-NCAR reanalysis with global observation-based datasets: daily precipitation from the Global Precipitation Climatology Project (GPCP), monthly climate variables from the Climatic Research Unit (CRU), 3-hourly precipitation from TRMM, and the NASA Langley monthly surface radiation budget. GMFD has demonstrated reliable patterns for daily temperature extremes; however, it is inadequate to estimate daily precipitation extremes over South America (Avila-Diaz et al. 2020a, b). Therefore, we only used the daily outputs of maximum and minimum temperature.

2.3 Earth System Models Simulations and Projections

According to Haarsma et al. (2016), HighResMIP uses the following two experiments to assess the historical period (1950–2014): (1) hist-1950, based on historical coupled ocean–atmosphere simulations of the near past at high and standard resolution, and (2) highresSST-present, based on historical atmosphere-only simulations of the near past, driven by sea surface temperature and sea ice concentration data obtained from HadISST2.2 at 1/4 degree daily resolution.

To capture solely the impact of atmospheric resolution on temperature and precipitation extremes, this analysis focuses on the highresSST-present experiment, which comprises a set of HighResMIP simulations forced by observed daily sea surface temperature (SST) data from the HadISST.2.2.0.0 dataset, available for the 1950–2014 period at https://esgf-node.llnl.gov/search/cmip6/. The performance of the HighResMIP simulations is evaluated over the 1981–2014 period. We used the daily climate outputs (e.g., PR, TX, and TN) of 21 ESMs from 10 different meteorological modeling institutions, executed in (at least) two different spatial resolutions (Table 2). This study considered the first ensemble member of all ESMs (i.e., r1i1p1f1), except for CNRM-CM6-1, which is only available as r1i1p1f2. For more details about the HighResMIP experimental design, see Haarsma et al. (2016).

Table 2 List of selected CMIP6 Earth System Models in this study

For consistency with the highresSST-present experiment, the projections of future scenarios were evaluated with the data from the highresSST-future experiment. The latter comprises a single future scenario based on a blend of variability from the 0.25° HadISST2-based dataset and the climate change signal from CMIP5 RCP8.5 simulations (Haarsma et al. 2016; O’Neill et al. 2016). The CMIP5 RCP8.5 scenario represents the high end of the range of plausible future forcing pathways. It is consistent with high energy intensity, high dependence on fossil fuels, continuous growth in the population, heavy greenhouse gas emissions associated with slow technological development, and no implementation of climate policies (Bozkurt et al. 2018; Silveira et al. 2019).

2.4 Model Performance Metrics

The Kling-Gupta efficiency (KGE) methodology (Gupta et al. 2009; Kling et al. 2012) has been widely adopted to compare diverse gridded climate datasets against observed datasets (Avila-Diaz et al. 2021; Beck et al. 2019; Chaney et al. 2014; Nashwan and Shahid 2019; Stewart et al. 2022; Wilson et al. 2022; Zuluaga et al. 2022). In this research, the KGE is used to compare the model estimates with the reference precipitation/temperature datasets; the optimum value of KGE is one (1.0). The total performance of the simulations is decomposed into three equally weighted metrics (Eq. 1) as follows: (i) the linear correlation (CORR), which measures the temporal coherence of the precipitation and temperature indices (Eq. 2), where 1.0 is a perfect score and 0.0 indicates the absence of correlation (Pearson 1895); (ii) the bias ratio (BR), which measures the overestimation (BR > 1.0) or underestimation (BR < 1.0) compared to the observations (Eq. 3) (Ayehu et al. 2018); and (iii) the relative variability (RV), which is a relative measure of the dispersion (Eq. 4) with an optimal value of unity (1.0).

$$\mathrm{KGE}=1- \sqrt{{\left(1-\mathrm{CORR}\right)}^{2}+{\left(1-\mathrm{BR}\right)}^{2}+{\left(1-\mathrm{RV}\right)}^{2}}$$
(1)
$$\mathrm{Correlation }\,\,(\mathrm{CORR}) =\frac{\sum_{\mathrm{i}=1}^{\mathrm{n}}\left({\mathrm{O}}_{\mathrm{i}}-\overline{\mathrm{O} }\right)\left({\mathrm{S}}_{\mathrm{i}}-\overline{\mathrm{S} }\right)}{\sqrt{\sum_{\mathrm{i}=1}^{\mathrm{n}}{\left({\mathrm{O}}_{\mathrm{i}}-\overline{\mathrm{O} }\right)}^{2}}\sqrt{\sum_{\mathrm{i}=1}^{\mathrm{n}}{\left({\mathrm{S}}_{\mathrm{i}}-\overline{\mathrm{S} }\right)}^{2}}}$$
(2)
$$\mathrm{Bias \,\,Ratio }\,\,\left(\mathrm{BR}\right)=\frac{\overline{\mathrm{S}} }{\overline{\mathrm{O}} }$$
(3)
$$\mathrm{Relative\,\, Variability }\,\,(\mathrm{RV})= \frac{{\mathrm{CV}}_{\mathrm{S}}}{{\mathrm{CV}}_{\mathrm{O}}}$$
(4)

The parameters \({\mathrm{O}}_{\mathrm{i}}\) and \({\mathrm{S}}_{\mathrm{i}}\) in Eqs. (2) and (3) are the observation and simulation values, and \(\overline{\mathrm{O} }\) and \(\overline{\mathrm{S} }\) are their means, respectively. In Eq. (4), \({\mathrm{CV}}_{\mathrm{O}}\) and \({\mathrm{CV}}_{\mathrm{S}}\) are the coefficient of variation of observed and simulated values, respectively.
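Equations 1–4 can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in this study; the function names `kge_components` and `kge` are hypothetical, and the inputs are assumed to be two equally long series of annual index values (reference and simulation) with a nonzero reference mean.

```python
import math
from statistics import fmean, pstdev
from typing import Sequence, Tuple


def kge_components(obs: Sequence[float], sim: Sequence[float]) -> Tuple[float, float, float]:
    """Return (CORR, BR, RV) as defined in Eqs. 2-4."""
    o_mean, s_mean = fmean(obs), fmean(sim)
    cov = sum((o - o_mean) * (s - s_mean) for o, s in zip(obs, sim))
    ss_o = sum((o - o_mean) ** 2 for o in obs)
    ss_s = sum((s - s_mean) ** 2 for s in sim)
    corr = cov / math.sqrt(ss_o * ss_s)          # Eq. 2
    br = s_mean / o_mean                         # Eq. 3 (o_mean must be nonzero)
    # Eq. 4: ratio of coefficients of variation; population or sample
    # standard deviation gives the same ratio because n is identical.
    rv = (pstdev(sim) / s_mean) / (pstdev(obs) / o_mean)
    return corr, br, rv


def kge(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Kling-Gupta efficiency (Eq. 1); 1.0 is the optimum."""
    corr, br, rv = kge_components(obs, sim)
    return 1.0 - math.sqrt((1 - corr) ** 2 + (1 - br) ** 2 + (1 - rv) ** 2)
```

For example, a simulation that doubles every reference value has CORR = 1 and RV = 1 but BR = 2, so KGE = 1 − √(0 + 1 + 0) = 0. Because BR and RV divide by the reference mean, a reference mean close to zero can produce KGE values of very large magnitude.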

We used a comprehensive ranking procedure to assess performance across models and resolutions (Wilson et al. 2022; Yang et al. 2020; You et al. 2018). Regional mean KGE values based on comparisons of extreme temperature and precipitation with the reference datasets (e.g., ERA5, GMFD, and CHIRPS) were used to rank the models and MMEs regardless of resolution group from 1 (best) to 21 (worst) (24 for precipitation). These rankings were then summed to produce a cumulative rank score for each model and MME. As KGE assesses both mean and variability, the score and, therefore, the ranking does not distinguish which component leads to a stronger KGE value. For a comparison of each component of the KGE values, see Supplemental Figs. S1 and S2.
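The ranking procedure can be illustrated with the short Python sketch below. It assumes one table of regional mean KGE values per region (mapping a model or MME name to its KGE) and sums the per-region ranks; the function names and the tie-breaking rule (input order) are assumptions made for the example, not details reported in this study.

```python
from typing import Dict, Iterable


def rank_models(kge_by_model: Dict[str, float]) -> Dict[str, int]:
    """Rank models from 1 (highest KGE, best) to N (lowest KGE, worst)."""
    ordered = sorted(kge_by_model, key=kge_by_model.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def cumulative_rank(kge_tables: Iterable[Dict[str, float]]) -> Dict[str, int]:
    """Sum the ranks over all tables; a lower total means better overall skill."""
    totals: Dict[str, int] = {}
    for table in kge_tables:
        for name, rank in rank_models(table).items():
            totals[name] = totals.get(name, 0) + rank
    return totals
```

With this convention, the model or MME with the smallest cumulative score is the one that most consistently ranks near the top across regions.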

2.5 Data Processing

To study the performance and the impact of the increased horizontal resolution of CMIP6 models in capturing climate extremes, we defined the following three groups based on the size of the grid (sg): (1) 0.8° ≤ sg ≤ 1.87° (G1L), (2) 0.5° ≤ sg ≤ 0.7° (G2I), and (3) 0.23° ≤ sg ≤ 0.35° (G3H). The capital “L”, “I”, and “H” mean that the group is classified as low, intermediate, or high resolution, respectively. We established the different resolution groups to conserve the statistical features of the extreme climate indices in the original resolution of the ESMs, following Diaconescu et al. (2015). For intercomparison purposes, after calculating the ETCCDI indices, the ESMs and reference datasets (e.g., ERA5, GMFD, and CHIRPS) in the G1L, G2I, and G3H groups were regridded to a common resolution of 1° × 1°, 0.50° × 0.50°, and 0.25° × 0.25°, respectively, using a first-order conservative remapping technique (Jones 1999). All reference climate datasets and CMIP6 models were studied during 1981–2014. Finally, a Multi-Model Ensemble (MME) mean was also calculated for each model group (G1L-MME, G2I-MME, and G3H-MME), and those were compared with the observational datasets.
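The grouping and the ensemble mean described above can be sketched as follows. The group bounds are those given in the text; the function names are hypothetical, and the MME mean assumes that all member fields have already been remapped to the common grid of their group and flattened to equally long lists.

```python
from statistics import fmean
from typing import List, Optional, Sequence

# (label, lower bound, upper bound) of the grid size sg, in degrees
GROUPS = (("G1L", 0.8, 1.87), ("G2I", 0.5, 0.7), ("G3H", 0.23, 0.35))


def resolution_group(sg_deg: float) -> Optional[str]:
    """Return the resolution group of a model, or None if sg fits no group."""
    for label, low, high in GROUPS:
        if low <= sg_deg <= high:
            return label
    return None


def mme_mean(fields: Sequence[Sequence[float]]) -> List[float]:
    """Cell-by-cell mean across models that share the same grid."""
    return [fmean(cell_values) for cell_values in zip(*fields)]
```

Computing the indices on each model's native grid first and remapping afterwards, as stated above, keeps the extremes from being smoothed before they are counted.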

The performance and trend analyses were conducted over ten reference regions of Latin America and the Caribbean, defined by Iturbide et al. (2020) for the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6). The shapefiles and codes for these regions are available at https://github.com/SantanderMetGroup/ATLAS. The regional acronyms in Fig. 1 refer to (1) Northern Central America (NCA), (2) Southern Central America (SCA), (3) Caribbean (CAR), (4) North Western South America (NWS), (5) Northern South America (NSA), (6) North Eastern South America (NES), (7) South American Monsoon (SAM), (8) South Western South America (SWS), (9) South Eastern South America (SES), and (10) Southern South America (SSA).

Fig. 1 Sub-regional domains over Latin America and the Caribbean, according to Iturbide et al. (2020)

To assess the future changes in HighResMIP climate projections, we established the relative change in each ETCCDI index using Eq. 5, adapted from Bador et al. (2018) as follows:

$$\mathrm{Relative\,\, change\,\, in\,\, a \,\,given\,\, ETCCDI\,\, index }\,\,\left(\mathrm{i}\right)= \frac{{\overline{\mathrm{ETCCDI}} }_{\mathrm{i}-\mathrm{future}}-{\overline{\mathrm{ETCCDI}} }_{\mathrm{i}-\mathrm{his}}}{{\overline{\mathrm{ETCCDI}} }_{\mathrm{i}-\mathrm{his}}},$$
(5)

where \({\overline{ETCCDI} }_{i-future}\) and \({\overline{ETCCDI} }_{i-his}\) are the 30-year averages in each ETCCDI index over the future interval (2021–2050) and historical period (1981–2010), respectively.

The agreement of climate change signal in the ETCCDI indices is considered robust if at least 66% of ESMs have the same directional change and more than 50% delivered a significant change using the Student’s t-test (p value < 0.05) between the historical (reference) and projections (Almazroui et al. 2021b; Avila-Diaz et al. 2020b; Dosio et al. 2019).
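Equation 5 and the robustness criterion can be sketched together in Python. This is only an illustration under stated assumptions: the function names are hypothetical, each model contributes one relative change and one p value, and the p values are taken as already computed (for instance with a Student's t-test between the historical and future series), since the test itself is outside the scope of the sketch.

```python
from statistics import fmean
from typing import Sequence


def relative_change(hist: Sequence[float], future: Sequence[float]) -> float:
    """Eq. 5: (future mean - historical mean) / historical mean."""
    hist_mean = fmean(hist)
    return (fmean(future) - hist_mean) / hist_mean


def is_robust(changes: Sequence[float], p_values: Sequence[float],
              agree_frac: float = 0.66, signif_frac: float = 0.50,
              alpha: float = 0.05) -> bool:
    """True if >= 66% of models agree in sign and > 50% change significantly."""
    n = len(changes)
    n_pos = sum(1 for c in changes if c > 0)
    n_neg = sum(1 for c in changes if c < 0)
    same_direction = max(n_pos, n_neg) / n >= agree_frac
    significant = sum(1 for p in p_values if p < alpha) / n > signif_frac
    return same_direction and significant
```

For instance, if three of four models project an increase and three of four changes have p < 0.05, both conditions hold (75% in each case) and the signal is flagged as robust.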

3 Results

3.1 Earth System Model Evaluation with Observations

3.1.1 Performance Evaluation in Temperature Indices

The heat maps summarize the individual performance of the ESMs in simulating the climate extremes indices on the annual scale over Latin America and the Caribbean region from 1981 to 2014 compared to the ERA5 and GMFD datasets (Fig. 2). Most ESMs display better performance for the TXx index, and their performance is better when compared with ERA5 than with GMFD. Furthermore, the families of models, including HadGEM3-GC31, ECMWF-IFS, FGOALS-f3, and EC-Earth3P, show plausible performance over almost all regions, except for the SSA region, where most models show KGE values close to 0.0 (Fig. 2, first column). Interestingly, increasing the horizontal resolution does not show a continuous improvement for the TXx index.

Fig. 2 KGE performance evaluation of the ESMs in estimating annual temperature indices compared to the ERA5 and GMFD datasets during 1980–2014. The region numbers are the IDs according to Fig. 1. NA (white boxes) means no data value for the index in the CMCC-CM2-HR4, IPSL-CM6-1-ATM-HR, and CMCC-CM2-VHR4

Figure 2 shows that the best models for capturing TNn patterns for those regions are MPI-ESM1-2-XR, EC-Earth3P, and MRI-AGCM3-2-S, with KGE between 0.20 and 0.60. The ESMs presented difficulties in reproducing the coldest night (TNn) index for almost all regions, except for the CAR, NWS, NSA, and NES regions, in which BR and RV are close to 1.0 and a significant correlation above 0.4 is found (Fig. S1). The weakest performance is found in the NCA, SWS, and SES regions, with extremely negative KGE values (Fig. 2), especially when the ESMs are compared with GMFD, for which KGE can fall below −10.0.

For the diurnal temperature range (DTR), the ESMs display KGE values above 0.4 when compared with ERA5, especially in the NCA, NES, SAM, and SWS regions (Fig. 2; third column). In contrast, in most regions, ESMs showed poor performance (KGE < 0.0) using GMFD as a reference dataset. Similarly, Avila-Diaz et al. (2020b) find this poorer representation, indicating that GMFD displays difficulties in representing observational magnitudes of the DTR index during 1980–2016 over Brazil. On the one hand, these results can be related to the DTR values from each reference dataset. GMFD uses CRU-derived DTR data (Sheffield et al. 2006), which shows low statistical performance (correlation and error) over northern South America and the Caribbean region (Harris et al. 2020). Meanwhile, the DTR values from ERA5 are influenced by the cold bias of TX and TN (Hersbach et al. 2020). On the other hand, Wang and Clow (2020) show that the latest generation of CMIP6 models continues to underestimate DTR climatology, mainly due to difficulties in estimating downwelling longwave radiation, which influences TN daily values.

For percentile indices such as cold nights (TN10p), warm nights (TN90p), cold days (TX10p), and warm days (TX90p), most ESMs perform reasonably well compared to the reference datasets (Fig. 2, fourth to seventh columns). The KGE values are above 0.4, BR and RV are close to 1.0, and the significant CORR is above 0.6 in almost all regions (Fig. S1 shows the values of the KGE components). However, less agreement is found in the southern regions of the study area for most ESMs (the SES region when ERA5 is used and the SSA region for GMFD). Simulations of those indices are overestimated in most regions relative to GMFD, except for TX10p in the NWS region. Furthermore, the bias of the ESMs compared to ERA5 displays mixed results, with both over- and underestimation of the percentile indices (Fig. S1).

When comparing the reference datasets, the KGE values between ESMs and ERA5 are better than between ESMs and GMFD in most regions for the warm spell duration index (WSDI; Fig. 2, eighth column). Furthermore, relative to GMFD, the weakest performance of the ESMs is found for the NSA, SAM, and SES regions, with negative KGE values (Fig. 2). In this sense, Avila-Diaz et al. (2020a) evaluated the performance of 21 statistically downscaled ESMs at high horizontal resolution (0.25° of latitude × longitude) over Brazil. They indicate that the worst performance across the ETCCDI indices is for WSDI. This shortcoming may be related to the fact that those 21 ESMs use the GMFD data to downscale the CMIP5 models.

To show the spatial performance and avoid many very similar results for temperature indices, we selected the performance of the MME for the TXx index (Fig. 3). The MMEs at different resolutions have greater skill than individual models in most regions, especially compared to ERA5 (Figs. 2 and 3). Similar results are found when comparing MMEs and GMFD over most indices. For instance, KGE values for HadGEM3-GC31 in TXx over the NWS region in their low (G1L; 0.8° ≤ sg ≤ 1.87°), intermediate (G2I; 0.5° ≤ sg ≤ 0.7°), and high resolution (G3H; 0.23° ≤ sg ≤ 0.35°) are 0.69, 0.66, and 0.73, respectively, and for the corresponding MMEs are 0.67, 0.65, and 0.70.

Fig. 3 The KGE (a–f) and climatology bias (g–l) of the multi-model ensemble (MME) for TXx during 1981–2014 compared with ERA5 (left side) and GMFD (right side). G1L-MME, G2I-MME, and G3H-MME are the MME groups based on the size of the grid (sg): low (0.8° ≤ sg ≤ 1.87°), intermediate (0.5° ≤ sg ≤ 0.7°), and high resolution (0.23° ≤ sg ≤ 0.35°), respectively

Thus, there is no clear evidence that increasing horizontal resolution always improves performance. In the case of TXx, the HadGEM3-GC31 family of models displays KGE values of 0.26, 0.43, and 0.45 for low, intermediate, and high resolution, respectively. On the other hand, contrasting values are found over the SAM region, which displays KGE values of 0.55, 0.43, and 0.39 for the same resolutions and family. Furthermore, as observed for TXx, the higher-resolution models do not improve the TNn simulation. For example, the KGE values in the NWS region for the MPI-ESM1-2 family in their low and intermediate resolution are 0.55 and 0.44, respectively. However, the MRI-AGCM3-2 family displays considerable improvement for the same region between its intermediate (KGE = 0.07) and high resolution (KGE = 0.61).

Some performance problems in simulating absolute temperature extremes may be due to climate sensitivity, e.g., the increase in temperature in response to cloud feedback and cloud-aerosol interactions (Collazo et al. 2022). Other authors point out that CMIP6 shows a higher subregional climate sensitivity than CMIP5 (Seneviratne and Hauser 2020; Zelinka et al. 2020). However, this is not reflected in the performance of the ESMs in simulating percentile-based indices, since these are generally calculated for each day relative to a long-term reference period (Almazroui et al. 2021c; Collazo et al. 2022). Therefore, an increase in warm days from annual analyses does not necessarily imply warming for the very warmest days of the year (Almazroui et al. 2021c).

Figure 4 shows an overall ranking used to select the model or MME that best represents the temperature indices in each region (Fig. 4a, b), where we considered the mean of the KGE values in each extreme temperature index between CMIP6 models and ERA5 (a) and GMFD (b). In Fig. 4c, d, we observe that the consistently poorest performance for most indices corresponds to the NICAM16 and HadGEM3-GC31 families in their different horizontal resolutions. The best performance is obtained with the G2I-MME, G1L-MME, and G3H-MME for the three resolutions, and, among individual models, with ECMWF-IFS-HR and ECMWF-IFS-LR. In this context, the MMEs improve the representation of temperature indices over most regions; however, this approach does not always generate the best KGE values, depending on the analyzed temperature index, region, and horizontal resolution. As noted for the individual model performance, our findings suggest that the resolution increase does not lead to systematic improvement of the ESMs.

Fig. 4

Comprehensive model ranking based on the regional mean KGE across all extreme temperature indices between CMIP6 models and ERA5 (a, c) and GMFD (b, d) over the ten domains. The height of the colored columns in (c) and (d) represents the sum of the rankings; thus, shorter columns indicate better model or MME performance. White, yellow, and gray areas denote G1L, G2I, and G3H, the MME groups based on the grid size (sg): low (0.8° ≤ sg ≤ 1.87°), intermediate (0.5° ≤ sg ≤ 0.7°), and high resolution (0.23° ≤ sg ≤ 0.35°), respectively. The symbols in (a) and (b) indicate the regions shown in Fig. 1. The stars mark the five best-performing ESMs or MMEs in (c) and (d). For comparison purposes, values below −1.2 are not plotted in (b); such values were found for the NES region
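The rank summation described in the caption can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure used for Fig. 4: models are ranked within each region by mean KGE (rank 1 is best), ranks are summed over regions, and ties are not given special treatment. The function name, model labels, and KGE values are hypothetical.

```python
def rank_models(kge_by_region):
    """Sum per-region ranks (1 = highest mean KGE); smaller totals are better.

    kge_by_region: dict mapping region -> dict mapping model -> mean KGE.
    Returns a dict model -> rank sum, ordered from best to worst.
    """
    totals = {}
    for scores in kge_by_region.values():
        ordered = sorted(scores, key=scores.get, reverse=True)  # best KGE first
        for rank, model in enumerate(ordered, start=1):
            totals[model] = totals.get(model, 0) + rank
    return dict(sorted(totals.items(), key=lambda item: item[1]))


# Hypothetical values, for illustration only.
example = {
    "NSA": {"model_A": 0.6, "model_B": 0.2, "model_C": -0.1},
    "SES": {"model_A": 0.3, "model_B": 0.5, "model_C": 0.1},
}
ranking = rank_models(example)
```

A model with a shorter column in panels (c) and (d) corresponds to a smaller rank sum in this sketch.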

Within each institute's family of models, the performance of the ESMs in simulating temperature extremes improves, albeit slightly, with the increase in horizontal resolution. According to Roberts et al. (2019), Gutjahr et al. (2019), and Kodama et al. (2021), the improvements are attributed to reductions in the biases of the radiation components in the upper atmosphere, adjustments to cloud forcing and, mainly, the influence of the high resolution of the oceanic model coupled to the atmospheric model. The latter is particularly visible in the ECMWF models, where the coupled configuration showed a strong sensitivity to the increase in resolution of the NEMO (Nucleus for European Modeling of the Ocean) model, producing significant biases only in Australia and northern Europe (Roberts et al. 2018).

3.1.2 Trends in Temperature Indices

Figure 5 shows the annual trends for temperature indices calculated for the observed datasets and ESMs during the 1981–2014 period. The warm days (TX90p), warm nights (TN90p), and warm spell duration (WSDI) indices show a positive trend over most Latin American regions for the three spatial resolution groups and for GMFD and ERA5 (Fig. 5e, g, h), corroborating the results found by Collazo et al. (2022). In the SSA region, however, both observed datasets show negative trends that are not statistically significant for TN90p and WSDI (the latter only in GMFD). The same is observed in the SWS and NWS regions for TX90p (only in GMFD).

Fig. 5

Decadal trends in temperature indices at the annual scale for individual ESMs, multi-model ensembles (MMEs), and reference datasets during the 1981–2014 period. NA (white boxes) indicates that no temperature data were available from the CMCC-CM2-HR4, IPSL-CM6-1-ATM-HR, and CMCC-CM2-VHR4 models. Boxes with trends significant at the 95% level are marked with stars
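As an illustration of how a decadal trend and its 95% significance flag could be obtained for one index series, the sketch below assumes a Theil–Sen slope combined with a Mann–Kendall test. This pairing is a common choice for climate-extreme indices, but it is an assumption here, since this section does not restate the estimator used. The function names are hypothetical, and the test uses the normal approximation without a tie correction.

```python
import math
from statistics import median


def sen_slope_per_decade(years, values):
    """Median of all pairwise slopes (Theil-Sen), scaled from per year to per decade."""
    n = len(values)
    slopes = [
        (values[j] - values[i]) / (years[j] - years[i])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return 10.0 * median(slopes)


def mann_kendall_p(values):
    """Two-sided Mann-Kendall p-value (normal approximation, no tie correction)."""
    n = len(values)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))


def has_star(values, alpha=0.05):
    """True when the trend would be marked as significant at the 95% level."""
    return mann_kendall_p(values) < alpha
```

For a series covering 1981–2014, `sen_slope_per_decade` returns the change per decade in the units of the index, and `has_star` mirrors the star marking used in the figure.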

The cold days/nights indices present decreasing trends over almost all regions and for all spatial resolution groups and observed datasets (Fig. 5d, f). The exception is in SWS and NWS regions, where positive trends are observed for these indices using the GMFD dataset.

The observational datasets show high agreement on the decadal increase of the hottest day index, except in Central America (NCA and SCA), where the trend signals diverge. Using a historical dataset for 1961–2003 over SCA, Aguilar et al. (2005) find a significant increasing trend of 0.3 °C per decade. The TNn shows low statistical consensus between the observations over the CAR and NES regions and the regions of the Amazon biome (SAM, NSA, NWS). This is reflected in significant differences between GMFD and ERA5 for the diurnal temperature range trends over the mentioned regions. Almeida et al. (2017) find general warming of the entire Brazilian Amazon region from meteorological station records; however, da Silva et al. (2019) concluded that there is still considerable uncertainty in the magnitudes and signals of the TNn and DTR trends due to the influence of continuous deforestation.

Regarding the temperature indices calculated from ESMs, the results show general agreement for the TXx, TX10p, TX90p, TN10p, TN90p, and WSDI indices over most regions. However, SSA presents the greatest diversity of trend signals for the TN90p and WSDI indices. Additionally, in the SSA region the NICAM16 and EC-Earth3P models exhibit a decreasing trend for TX90p and an increasing trend for TN10p, contrary to the general pattern shown by the rest of the models and the observations. Similar behavior is found in Patagonia (SSA) by Rusticucci and Zazulie (2021) for TN10p during the austral summer (December–March) from CMIP5, while the trends agree with the other regions in the austral winter (June–August).

The DTR index displays high variability in trend signals between regions. On the one hand, the tropical regions (SCA, CAR, NWS, NSA, and NES) tend to show a decreasing trend, coinciding with the GMFD values. On the other hand, the extratropical regions (NCA, SAM, SWS, SSA) show an increasing trend, like ERA5. The models' limited ability to capture the DTR index may arise because DTR is strongly affected by land surface conditions, which are very heterogeneous within a model's grid cells and transitory in time; this coincides with the findings of Avila-Diaz et al. (2020a).

Overall, these results indicate warmer conditions during the day and night, an increase in the duration of warm episodes over most of South America, and a consistent decrease in cold days and nights, in line with other studies (Collazo et al. 2022; Gouveia et al. 2022). In general, the MMEs adequately represent the observed trends for each index in the regions of South America on an annual time scale, except for DTR, where the most remarkable inconsistencies are present. Moreover, our results suggest that there is no relationship between spatial resolution and trends, since we find similar trend values between the high-, intermediate-, and low-resolution simulations for each ESM.

3.1.3 Performance Evaluation in Precipitation Indices

The skill of the ESMs in simulating the extreme precipitation indices is shown in Fig. 6. For the sake of brevity, we selected the annual total wet-day precipitation (PRCPTOT; Fig. 6, first column) index to show the performance of the MMEs in the three resolution groups (i.e., low, intermediate, and high; see also Fig. 7).

Fig. 6

KGE performance of the ESMs in estimating annual precipitation indices compared with the ERA5 and CHIRPS datasets during 1980–2014. The region numbers are the IDs given in Fig. 1:

Fig. 7

The KGE (a–f) and climatology bias (g–l) of the multi-model ensemble (MME) for PRCPTOT during 1981–2014 compared with ERA5 (left side) and CHIRPS (right side). G1L-MME, G2I-MME, and G3H-MME are the MME groups based on the grid size (sg): low (0.8° ≤ sg ≤ 1.87°), intermediate (0.5° ≤ sg ≤ 0.7°), and high resolution (0.23° ≤ sg ≤ 0.35°), respectively

The KGE values of PRCPTOT tend to be more consistent with the ERA5 dataset than with CHIRPS (Figs. 6 and 7). Furthermore, the models compare poorly in most regions, with KGE values close to 0.0, except for the SCA, NSA, NES, and SES regions, where KGE values are above 0.4. For instance, in these latter regions, the MRI-AGCM3-2-H, EC-Earth3P, and CNRM-CM6-1 models show BR and RV close to 1 with a CORR coefficient above 0.45 (Fig. S2 shows the values of these KGE components). Notably, the FGOALS models, at low and intermediate horizontal resolution, display extremely negative KGE values for most regions. Finally, it should be noted that FGOALS-f3-L and FGOALS-H are not independent and agree on the sign and magnitude of the biases, showing dry biases for almost all regions (Fig. S2).
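To show how the three components combine into a single score, the following minimal sketch computes the KGE from CORR, BR, and RV. It assumes the original Gupta et al. (2009) formulation, in which BR is the ratio of simulated to reference means and RV is the ratio of standard deviations; if RV is instead defined as a ratio of coefficients of variation (the 2012 modification), only the `rv` line changes. The function name is hypothetical.

```python
import math
from statistics import mean, pstdev


def kge_components(sim, obs):
    """Return (KGE, CORR, BR, RV) for paired simulated and reference values.

    Assumed definitions: CORR is the Pearson correlation, BR = mean(sim) / mean(obs),
    RV = std(sim) / std(obs). A perfect simulation gives CORR = BR = RV = 1, KGE = 1.
    """
    mu_s, mu_o = mean(sim), mean(obs)
    sd_s, sd_o = pstdev(sim), pstdev(obs)
    cov = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / len(obs)
    corr = cov / (sd_s * sd_o)
    br = mu_s / mu_o
    rv = sd_s / sd_o
    kge = 1.0 - math.sqrt((corr - 1.0) ** 2 + (br - 1.0) ** 2 + (rv - 1.0) ** 2)
    return kge, corr, br, rv
```

Under these assumptions, a simulation that is perfectly correlated with the reference but doubles both its mean and its variability already yields a negative KGE of 1 − √2 ≈ −0.41, which helps explain how strongly biased models can reach very negative values despite a reasonable correlation.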

The ESMs display low performance over most regions in simulating intensity indices such as RX1day, RX5day, R95p, and SDII (see the second to fifth columns of Fig. 6). In particular, the FGOALS and NICAM16 families of models show insignificant correlation, wet biases, and KGE values above 0.0 for most regions in those intensity indices. However, on average, for RX1day, HadGEM3-GC31-LM and ECMWF-IFS-HR show the best relative performance, while for RX5day and R95p, the EC-Earth3P model shows the best results. The ESMs from HighResMIP continue to present difficulties in simulating the SDII, especially for NCA, SAM, SWS, SES, and SSA, compared with CMIP5/3 simulations (e.g., Sillmann et al. 2013). Moreover, we find that the performance of the ESMs depends on the reference dataset used; a similar result is highlighted by recent studies using CMIP6 models (Akinsanola et al. 2021; Ngoma et al. 2021). For instance, independent of horizontal resolution, the ESMs show wet and dry biases when ERA5 and CHIRPS, respectively, are used for comparison.

The very heavy precipitation days (R20mm; Fig. 6, sixth column) are not well captured by most models. However, EC-Earth3P has the best performance across most regions, with KGE values between 0.21 and 0.67, except for the NCA, SAM, SWS, and SSA areas. The IPSL-CM6A and NICAM families show relatively poor performance in the CAR and SAM regions, with KGE values varying from −2.88 to −0.15.

Regarding the precipitation duration indices, the ESMs poorly represent consecutive wet days (CWD; Fig. 6, seventh column) against both reference datasets in almost all regions, except for the HadGEM3-GC31 model family, which shows the best performance in the NSA, NES, and SAM regions (0.19 ≤ KGE ≤ 0.76). It is important to note that the IPSL-CM6A (−3.55 ≤ KGE ≤ 0.30) and FGOALS-f3-L (−0.55 ≤ KGE ≤ 0.21) families of models display the worst performance over the study region compared to ERA5 and CHIRPS. Moreover, the ESMs still exhibit a significant overestimation of CWD in all subregions except for SSA. This known drizzle bias has persisted since the CMIP3 generation of ESMs (Faye and Akinsanola 2022; Medeiros et al. 2022).

The ESMs capture the climate patterns of consecutive dry days (CDD; Fig. 6, eighth column) only over the NCA, with KGE values above 0.3, except for MPI-ESM1-2-HR, MPI-ESM1-2-XR, and NICAM16-7S. Notably good performance is obtained for the HadGEM3-GC31 family of models at the three different resolutions (0.11 ≤ KGE ≤ 0.82). Nevertheless, the FGOALS-f3 family of models presents the lowest KGE values, which vary between −3.10 and −0.43 over the northern part of South America (e.g., the NSA, NWS, and SAM regions).

Significant uncertainty exists in estimating rainfall and evaluating extreme precipitation indices, related not only to model sensitivity in estimating daily rainfall but also to the thresholds defined in the indices and the regions considered. Therefore, the better performance for CDD and the worse performance for CWD may be related to these features. For example, studies show that a daily rainfall threshold of < 1 mm can be considered adequate for arid and desert climates. Conversely, with a threshold of > 1 mm for CWD over regions of high convective activity and high daily precipitation variability, such as the Amazon Basin (NSA and SAM) and the Choco region (in NWS) in Colombia, almost all days with rainfall will exceed the threshold (Espinoza et al. 2019; Marengo et al. 2011; Villar et al. 2009).
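The role of the threshold can be made concrete with a short sketch of the two duration indices. It assumes the standard ETCCDI convention, in which a dry day has precipitation below 1 mm and a wet day has precipitation of at least 1 mm, and it takes the longest run within a single series, ignoring runs that span year boundaries. The function names and the sample values are hypothetical.

```python
def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def cdd_cwd(daily_precip_mm, threshold_mm=1.0):
    """Return (CDD, CWD): longest dry spell and longest wet spell, in days."""
    cdd = longest_run(p < threshold_mm for p in daily_precip_mm)
    cwd = longest_run(p >= threshold_mm for p in daily_precip_mm)
    return cdd, cwd


# Hypothetical daily series in mm, for illustration only.
observed_like = [0.0, 0.4, 0.9, 5.0, 12.0, 0.2, 1.0, 3.0, 7.0, 0.0]
drizzle_like = [1.2] * 30  # light rain every day, as in a model with drizzle bias
```

In this sketch, a series with light rain slightly above 1 mm on every day produces a CWD equal to the full length of the series, which illustrates how a drizzle bias inflates CWD in humid regions while leaving CDD largely unaffected in arid ones.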

As for the temperature indices (Sect. 3.1.1), we analyze the overall performance of individual models and MMEs. The best performance is found for EC-Earth3P, CMCC-CM2-HR4, HadGEM3-GC31-MM, G1L-MME, and ECMWF-IFS-HR when compared with ERA5 (Fig. 8a, c). However, when CHIRPS is used as a reference dataset, the best models are ECMWF-IFS-HR, G2I-MME, ECMWF-IFS-LR, G1L-MME, and CNRM-CM6-1 (Fig. 8b, d). For this reason, G1L-MME and G2I-MME, followed by ECMWF-IFS-HR, are good alternatives to capture precipitation extremes. Our results are consistent with Liang-Liang et al. (2022), who find the best relative performance for the HighResMIP multi-model ensemble and ECMWF-IFS-LR in capturing the climatological (1961–1990) distributions of precipitation frequency over central Asia.

Fig. 8

Comprehensive model ranking based on the regional mean of KGE for all precipitation indices between CMIP6 models and ERA5 (a, c) and CHIRPS (b, d) over the ten domains. The height of the colored columns in (c) and (d) represents the sum of the rankings; thus, shorter columns indicate better model or MME performance. White, yellow, and gray areas denote G1L, G2I, and G3H, the MME groups based on the grid size (sg): low (0.8° ≤ sg ≤ 1.87°), intermediate (0.5° ≤ sg ≤ 0.7°), and high resolution (0.23° ≤ sg ≤ 0.35°), respectively. The symbols in (a) and (b) indicate the regions shown in Fig. 1. The stars mark the five best-performing ESMs or MMEs in (c) and (d)

The individual CMIP6 models and the MMEs present relatively higher KGE values when compared with ERA5 than with CHIRPS for most indices in the study areas. However, overall our results indicate no clear relationship between the increase in horizontal resolution and improved performance for the precipitation indices, which is consistent with the results of Bador et al. (2020), who present a global performance analysis of precipitation extremes with CMIP6 models. For example, when ERA5 is used as a reference dataset for the CDD index, the KGE values over the NCA region for the MMEs at low, intermediate, and high resolution are 0.42, 0.46, and 0.54, respectively, and 0.49, 0.82, and 0.69 for the HadGEM3-GC31 models. Finally, the selection of the best models depends on the region, index, and reference dataset used.

When the resolutions of the ESMs from each institute are compared individually, we find widely differing results. In the case of the models of the CMCC, CNRM-CERFACS, EC-Earth-Consortium, IAP-CAS, NERC-MOHC, IPSL, MPI, MRI, and MIROC institutes, the increase in resolution results in lower or almost equal performance, similar to the findings of Scoccimarro et al. (2022) and Liang-Liang et al. (2022). Only the ECMWF models show a significant increase in performance coincident with an increase in spatial resolution (Bador et al. 2020). According to Roberts et al. (2018), a higher resolution in the ocean–atmosphere coupling improves the teleconnections associated with the precipitation events simulated by the ECMWF model.

3.1.4 Trends in Precipitation Indices

The annual precipitation trends computed for the reference datasets and the ESMs between 1981 and 2014 are shown in Fig. 9. Compared with the temperature indices, the precipitation indices generally show more varied trend magnitudes and signals among regions and models, indicating greater complexity. The PRCPTOT in the ERA5 and GMFD datasets exhibits significantly mixed trends (i.e., positive and negative) in the CAR, NWS, and NES regions. Meanwhile, the SDII presents differences for these same regions, in addition to NCA, SAM, SWS, SES, and SSA. The PRCPTOT index shows significant positive trends only for CHIRPS in the NWS and NSA regions, which is consistent with the results of the MRI-AGCM3-2-S and EC-Earth3P-HR models on the 0.25° resolution grid.

Fig. 9

Decadal trends in precipitation indices at the annual scale for individual ESMs, multi-model ensembles (MMEs), and reference datasets during the 1981–2014 period. Boxes with trends significant at the 95% level are marked with stars

In tropical regions (SCA, CAR, NWS, NSA, NES, and to a lesser extent SAM), models show strong agreement with observations regarding increases in RX1day, RX5day, R95p, and R20mm. In contrast, the extratropical regions show the greatest discrepancies with observed trends. This pattern persists in the results of the ESMs, showing greater variability in the trend signals in the extratropical regions than in the tropical ones. Bador et al. (2020) find, on the other hand, that in the extratropical regions of the northern hemisphere, the models agree better with the observations than in the tropical regions. This may be related to greater data records for calibrating ESMs across the northern hemisphere.

The increase of PRCPTOT in the NWS region is consistent with the increasing trends of RX1day, RX5day, R95p, SDII, and R20mm (Fig. 9b–f), which are also significant and consistent in several models at the three resolutions analyzed. In addition, significant positive trends for RX1day, RX5day, and R95p are observed in the SCA region for CHIRPS and ERA5 (Fig. 9b–d) at all three resolutions; however, only the 0.5° and 0.25° multi-model ensembles show significant positive trends. In general, a larger number of 0.25° resolution models agree with the observed trends for RX1day, and a larger number of 0.5° resolution models for RX5day and R95p.

The CDD index shows significant positive trends in the NCA and SWS regions for CHIRPS and ERA5, which are consistent across most models at all three resolutions, including the MME in each group. The most considerable inconsistencies are present in CWD (Fig. 9h). Although ERA5 shows significant negative trends in most regions, except SWS, SES, and SSA, the models fail to represent such a trend; only the 1.0° resolution CMCC-CM2-HR4 and CNRM-CM6-1 models are consistent with it in the NWS and NSA regions. The 1.0° resolution MME also shows negative trends like those of ERA5; however, they are not significant.

As for the temperature indices, and in line with Bador et al. (2020), the increase in spatial resolution does not influence the values and signals of the precipitation trends, except for specific cases without statistical significance. Finally, the indices display a general increase in rainfall in regions of northern South America and southern Central America, while in NES, NCA, and the southern regions of South America, rainfall rates decrease. However, it is also evident that some drought events in the NWS, NSA, and SAM regions (Amazon region) have increased; yet the changes in the temperature indices were more significant than those in the precipitation indices in these regions. On the other hand, the models and the observations show high consistency in increasing drought events in the NES region. All of the above aligns with previous studies (Almeida et al. 2017; da Silva et al. 2019; Olmo and Bettolli 2021; Solman et al. 2021; Medeiros and Oliveira 2022).

3.2 Future Projections for the 2021–2050 Period Under Scenario SSP5-8.5

3.2.1 Temperature Projections

Figures 10 and 11 illustrate the regional and spatial changes in temperature indices for 2021–2050 compared to the baseline period (1981–2010) for each MME group. Following the trends from 1981 to 2014, future MME projections describe warmer conditions over most regions and resolutions. Between resolutions, however, the spatial pattern of change in MME-G2I differs from that of the other two MME groups. G1L-MME and G3H-MME show a consistent reduction of cold nights and days over all the regions, reaching the greatest magnitude near the Equator over the NSA and NWS regions (< −6.6% for TN10p and < −5.7% for TX10p) and over the CAR region (< −6.7% for TN10p and < −5.9% for TX10p). However, TX10p shows an increase, mainly over the Andes and Brazil’s subtropical coast. This increase is also noted for TN10p and extends spatially into the Sierra Madre in Mexico and Central America. Averaged over the regions at the intermediate resolution, TN10p increases by 4% of days in the SCA region and by more than 10% of days in the NCA region. Notably, this was not observed in the trends from 1981 to 2014 (Fig. 5). The change detected in these regions deserves a more comprehensive explanation. MME-G2I includes the CNRM-CM6-1-HR, EC-Earth3P, HadGEM3-GC31-MM, MRI-AGCM3-2-H, and NICAM16-7S models, some of which are not included in the other two groups (see Table 2). Ortega et al. (2021) evaluate 33 CMIP6 models, their MME, and the best six CMIP6 models over Central and South America, listing the models that best represent the annual cycle and perform best against ERA5 data. Of the models in MME-G2I, only MIROC is included in this list of best models. However, even for the Andes, the same study reports that EC-Earth3-Veg and MRI-ESM2-0 perform well for temperature.

Fig. 10

Future changes of the multi-model ensemble in the temperature extremes indices TXx (a–c), TNn (d–f), TN10p (g–i), and TX90p (j–l) under the SSP5-8.5 scenario for 2021–2050 relative to the reference period (1981–2010)

Fig. 11

Future changes of the multi-model ensemble in the temperature extremes indices DTR (a–c), TN10p (d–f), TX10p (g–i), and WSDI (j–l) under the SSP5-8.5 scenario for 2021–2050 relative to the reference period (1981–2010)

Absolute (TXx, TNn) and percentile-based threshold indices (TN90p and TX90p) increase over most regions and in the three MME groups. TXx increases by nearly 2.0 °C over the SAM region, followed by NCA with an increase of 1.6 °C. TNn shows values greater than 1.5 °C over the north of NCA and the Pacific slope of the Andes in the three resolution groups. Nevertheless, in South America, the largest increase (> 1.4 °C) is found across the southeast region (SSA). Since the MMEs show poorer performance there from 1981 to 2014 (Fig. 5), caution must be used when analyzing the results for this region. In the case of TN90p and TX90p, the three MME groups show spatially inconsistent results over South America, particularly over the Amazon Basin. TN90p shows an increase of more than 20% of warm nights over the northeast Amazon Basin, with a spatial inconsistency in MME-G2I. TX90p shows a similar percentage increase over the central Amazon Basin, extending to other regions in MME-G2I: NWS, NES, SES, CAR, and SCA. Interestingly, MME-G3H shows the greatest percentage change over the Amazon River channel and floodplain, which could be related to the thermal characteristics of the water compared to the surrounding landmass. WSDI, like the previous percentile-based threshold indices, displays spatial inconsistencies over most of South America between the three resolution MME groups. This is more evident in the Amazon Basin, the Caribbean, and Central America. Unrealistic land–atmosphere interactions and misrepresentation of Amazon evapotranspiration, found in at least half of all CMIP5 models (Baker et al. 2021) and possibly also present in CMIP6 models, could be one of the reasons for the spatial inconsistencies. In the Caribbean and Central America, the inconsistencies would instead be related to the misrepresentation of land–ocean interaction.

The diurnal temperature range shows contrasting changes among regions. An increase in the range of 0.2–0.3 °C is observed in the three MMEs for the NCA and SWS regions, and an increase is also shown over the Amazon Basin (SAM, NES, and NSA regions). This increase is more pronounced for MME-G1L. Notably, only for the DTR are contrasting changes observed between the Caribbean and Pacific regions of Central America (Durán-Quesada et al. 2020), and these are more evident at the coarsest spatial resolution. The warmer conditions over most regions coincide with an increase in WSDI. However, its spatial representation in MME-G2I differs from that of the other two MMEs. While the NES region increases by nearly 39 and 50 days in the 2021–2050 period compared to 1981–2010 for the highest and lowest resolution, respectively, the intermediate resolution shows an increase of nearly 94 days. The higher increase in MME-G2I is also evident for the rest of the regions, but it is largest for SCA and for northwest and northeast South America. Note that these two indices had the lowest performance among the MMEs in most regions over the 1981–2014 period.

Overall, our results with the CMIP6-MME for all the regions and resolutions indicate a warmer future, which is consistent with the results described by the IPCC AR6 (IPCC 2021) and previous works on Latin America and the Caribbean (Almazroui et al. 2021a, b; Lovino et al. 2021; Ortega et al. 2021; Seneviratne et al. 2021). The main projected changes in temperature climate extremes are summarized in Fig. 12 and Table 3. However, as observed for G2I-MME, the spatial representations indicate a reduction of the models' performance in representing atmospheric conditions related to the diurnal temperature range and warm spell duration. The different magnitudes across the three MME resolutions can be explained by the fact that the ensemble means are not calculated from the same models (Table 2). For instance, among the G2I-MME models and for the NCA, SCA, and CAR regions, Almazroui et al. (2021b) find that the model with the largest negative surface air temperature bias relative to the Climatic Research Unit (CRU) dataset (1995–2014) is CNRM-CM6-1-HR, which is included in the lowest and intermediate horizontal resolution MME groups. On the other hand, the CRU dataset has shown a satisfactory performance compared to other temperature datasets, at least for the NCA and SCA regions (Cavazos et al. 2020).

Fig. 12

Summary of the projected changes in temperature climate extremes for each Latin America and Caribbean reference region for 2021–2050 under the SSP5-8.5 scenario

Table 3 Projected changes over the 2021–2050 period in temperature indices relative to the reference period (1981–2010) for the Multi-Model Ensemble for each resolution group (G1L, G2I, G3H) over analysis regions under the SSP5-8.5 scenario

3.2.2 Precipitation Projections

Projected changes in precipitation indices relative to the 1981–2010 period are presented in Figs. 13 and 14. Following Sillmann et al. (2013) and Avila-Diaz et al. (2020a, b), relative changes are expressed as percentages. Table 4 shows the changes in each region for 2021–2050 relative to the reference period 1981–2010.
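The percentage change of an ensemble-mean index can be sketched as below. This is a minimal illustration that assumes the change is computed from the ensemble means of the future and baseline periods, using only the models present in both; whether the study averages before or after taking the relative change is not stated here, so this ordering is an assumption. The function name, model labels, and values are hypothetical.

```python
from statistics import mean


def mme_relative_change(baseline_by_model, future_by_model):
    """Percent change of the ensemble-mean index, future relative to baseline.

    Both arguments map model name -> period-mean index value for one region.
    Only models available in both periods enter the ensemble mean.
    """
    models = sorted(set(baseline_by_model) & set(future_by_model))
    base = mean(baseline_by_model[m] for m in models)
    future = mean(future_by_model[m] for m in models)
    return 100.0 * (future - base) / base


# Hypothetical PRCPTOT values in mm, for illustration only.
baseline = {"model_A": 1000.0, "model_B": 1200.0}
future = {"model_A": 950.0, "model_B": 1140.0}
change = mme_relative_change(baseline, future)
```

Because each resolution group averages a different set of models, the same calculation applied to G1L, G2I, and G3H can return different magnitudes even where the underlying models agree on the sign of the change.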

Fig. 13

Future changes of the multi-model ensemble in the precipitation extremes indices PRCPTOT (a–c), CDD (d–f), RX1day (g–i), and RX5day (j–l) under the SSP5-8.5 scenario for 2021–2050 relative to the reference period (1981–2010)

Fig. 14

Future changes of the multi-model ensemble in the precipitation extremes indices R95p (a–c), SDII (d–f), R20mm (g–i), and CWD (j–l) under the SSP5-8.5 scenario for 2021–2050 relative to the reference period (1981–2010)

Table 4 Projected changes over the 2021–2050 period in precipitation indices relative to the reference period (1981–2010) for the Multi-Model Ensemble for each resolution group (G1L, G2I, G3H) over analysis regions under the SSP5-8.5 scenario

The PRCPTOT index shows its largest decrease, across resolutions and regions, over the Chilean coast (SWS), with magnitudes between −7.7% and −4.0% depending on the resolution group. These changes follow the significant negative trends (p < 0.05) found for the 1981–2014 period in the three MMEs, though this trend is not statistically significant in the observed datasets. A decrease is also found for NCA and NES: −7.1% and −6.2% for the lowest, −4.7% and −7.7% for the intermediate, and −3.4% and −3.0% for the highest resolution, respectively.

Absolute indices (RX1day and RX5day) show an increase over nearly all regions and resolutions, with SCA and CAR showing the largest increase (more than 8%) for the lowest and highest resolutions. Conversely, SWS reaches −2.0% for both indices at the intermediate and highest resolution. As for these two indices, the SCA region also presents a large relative change in the R95p index of 13.7%, 34.1%, and 18.1% from the lowest to the highest resolution, surpassed only by NWS with 13.5%, 66.1%, and 15.2%, respectively. However, the highest value for both regions could be related to modeling issues, such as the spatial inconsistency in G2I-MME. At the same time, the SDII displays a general increase in all regions for the different MMEs, especially in the northern (NCA, SCA, and CAR) and central regions (NWS and NSA), with changes that reach up to 6.92%.

Finally, the spatial variability of CDD is consistent over the three resolutions, with an increase over nearly all regions. In South America, the largest increase is located in the SAM and NES regions, which agrees with the results of Medeiros et al. (2022) with CMIP6 models. Over SAM, the change reaches nearly 16.0% for the intermediate resolution, while an increase of 19.2% is projected over NES for the same resolution. A similar increase, larger than 10%, is projected for NCA, SWS, and SES. The three MMEs describe a decrease over Panamá, Colombia, and Venezuela, which is more evident at the lowest resolution. On the other hand, the CWD index is projected to decrease over the Amazon Basin, NCA, and SWS, while it increases over the Andes and La Plata Basin for G1L-MME. The reduction in CWD projected by the intermediate resolution is larger than that of the other two resolutions. The change in the projected CWD over the Amazon from the G1 and G2 resolutions to G3 is remarkable, going from a more pronounced decrease in G1 and G2 to a pattern close to zero in G3 (Fig. 14j, k, l and Table 4). We could expect that an increase in spatial resolution would improve the simulation of the interaction between surface and atmosphere and, consequently, the representation of mesoscale meteorological systems, leading to an improvement in the simulation of precipitation. However, this hypothesis cannot be confirmed in our analyses, as the performance of the simulations did not improve with increasing resolution of the climate models (Fig. 8), which agrees with previous findings by Akinsanola et al. (2020) and Bador et al. (2020). Moreover, Almazroui et al. (2021a), analyzing monthly mean temperature and precipitation, found no clear systematic linkage between model performance and the magnitude of projected climate change. Additional studies to investigate this subject in more detail, including an individual ESM assessment to understand how each model projects extreme precipitation climate indices, are encouraged.

In general, the results show an increase in all precipitation indices in the future, except for a reduction in PRCPTOT, which is consistent with an increase in CDD. The main projected changes in precipitation climate extremes are summarized in Fig. 15. Studies that have found similar results (Ge et al. 2021; Medeiros and Oliveira 2022; Santos et al. 2019) conclude that this could imply a potential risk of intensified extreme rainfall, which would accentuate the vulnerability of various socioeconomic sectors, such as agriculture, water management, forests, and disaster preparedness.

Fig. 15

Summary of the projected changes in precipitation climate extremes for each Latin America and Caribbean reference region for the 2021–2050 period under the SSP5-8.5 scenario

Over the Amazon Basin, a deficient model representation of land–atmosphere interactions (Baker et al. 2021; Levine et al. 2016; Ruiz-Vásquez et al. 2020; Yin et al. 2013) could affect the estimation of extreme climate indices in ESMs (Avila-Diaz et al. 2020a). Further analysis is needed for the HighResMIP CMIP6 models, particularly for Central America and the Caribbean Islands, in which the land–ocean interaction between the narrow landmass and two oceans (Durán-Quesada et al. 2020; Herrera et al. 2020) is still a challenge for the ESMs.

In summary, the three MMEs describe an intensification of extreme rainfall events that is spatially consistent between the resolutions (Fig. 15). The greatest changes are mostly located in the SCA region for the lowest and highest resolution. However, no strong changes are projected across the different indices in the SSA region. RX1day and RX5day precipitation amounts increase over almost all regions except SWS and the northern part of NCS. This increase is accompanied by a projected rise in the R95p index, which agrees spatially, at least for G3H-MME.

4 Discussion and Concluding Remarks

We assessed the performance of a subset of HighResMIP models, which are part of CMIP6, in simulating daily temperature and precipitation climate extreme events over Latin America and the Caribbean during 1981–2014. This was achieved by comparing the models against three gridded datasets (ERA5, CHIRPS, and GMFD). Additionally, we evaluated the impact of increasing the horizontal spatial resolution of the HighResMIP models on the estimation of extreme climate variability at local and regional scales. Finally, the projected extreme temperature and precipitation changes for 2021–2050 under the Shared Socioeconomic Pathways SSP5-8.5 scenario were investigated.

The evaluation against historical gridded datasets (reanalysis and satellite-based precipitation products) for 1981–2014 shows that, in general, ERA5 displayed better results than GMFD for the temperature indices and than CHIRPS for the precipitation indices. However, it is worth mentioning that these results depend on the region and models analyzed. For example, in the regional ranking based on the average of the KGE values for the temperature indices, ERA5 and GMFD each presented five models and/or ensembles with the best performance. Four of these five were the same for both datasets (ECMWF-IFS-LR, G1L-MME, G2I-MME, and G3H-MME). As for the precipitation indices, of the five models and/or ensembles with the best performance for ERA5 and CHIRPS, only two were the same for both datasets (ECMWF-IFS-HR and G1L-MME).
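The ranking described above rests on averaging KGE values across indices. As a minimal sketch, the code below computes the Kling-Gupta efficiency in its original form, KGE = 1 − sqrt((r − 1)² + (α − 1)² + (β − 1)²), where r is the linear correlation, α the ratio of standard deviations, and β the ratio of means between simulation and reference. Which KGE variant the study used is not stated in this section, so this form is an assumption, and the model names and scores in the ranking example are hypothetical.

```python
import math


def kge(sim, obs):
    """Kling-Gupta efficiency of a simulated series against a reference series.

    Assumes equal-length series, a non-zero reference mean, and non-zero
    variance in both series.
    """
    n = len(obs)
    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n
    r = cov / (sd_s * sd_o)   # linear correlation
    alpha = sd_s / sd_o       # variability ratio
    beta = mean_s / mean_o    # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def rank_by_mean_kge(kge_by_model):
    """Order model names from highest to lowest mean KGE across indices."""
    mean_kge = {name: sum(vals) / len(vals) for name, vals in kge_by_model.items()}
    return sorted(mean_kge, key=mean_kge.get, reverse=True)


# Hypothetical KGE values per index for three placeholder models.
scores = {
    "model_a": [0.8, 0.6],
    "model_b": [0.9, 0.7],
    "model_c": [0.2, 0.4],
}
print(rank_by_mean_kge(scores))
```

A KGE of 1 indicates perfect agreement with the reference. Because the score is computed against a specific reference, the same model can rank differently against ERA5, GMFD, or CHIRPS, which is the dependence noted above.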

Regarding the three groups of horizontal resolutions used, we can conclude that there is no strong relationship between an increase in resolution and improved performance of the HighResMIP models, which is consistent with the findings of Bador et al. (2020) and Scoccimarro et al. (2022). Notably, among the five best models for temperature indices, both for ERA5 and GMFD, two belong to the 1.00° × 1.00° resolution group (ECMWF-IFS-LR and G1L-MME). As for the precipitation indices, three of the five best models compared to the ERA5 dataset belong to the 0.50° × 0.50° resolution group (HadGEM3-GC31-MM, ECMWF-IFS-HR, and EC-Earth3P) and two to the 1.00° × 1.00° resolution group (CMCC-CM2-HR4 and G1L-MME). For the CHIRPS dataset, these results are inverted: two at 0.50° × 0.50° resolution (ECMWF-IFS-HR and G2I-MME) and three at 1.00° × 1.00° resolution (ECMWF-IFS-LR, CNRM-CM6-1, and G1L-MME).

Multi-model ensemble mean projections for the near future (2021–2050) indicate an intensification of the warming pattern accompanied by an increase in extreme precipitation events under the SSP5-8.5 scenario in the three resolutions across most of the ten regions (Figs. 12 and 15). These patterns are in line with the results described by others (AghaKouchak et al. 2020; Gulizia et al. 2022; Medeiros et al. 2022; Olmo et al. 2022). The intensification of warm temperature extremes may increase heat stress vulnerability (Lapola et al. 2019). Likewise, the projected increase in the extreme precipitation climate indices elevates the risk of heavy rainfall and landslides and indicates that dry spells may last longer in the near future due to climate change (Debortoli et al. 2017; Marengo et al. 2017; Medeiros and Oliveira 2022). This is consistent with much of what we have experienced over the twentieth and early twenty-first centuries due to a strongly globalized, fossil-fueled society (Chen and Sun 2021; Riahi et al. 2017). However, some spatial inconsistencies exist when comparing the G2I-MME with the lowest and highest resolution groups. Therefore, careful interpretation is needed when analyzing the effect of increasing resolution in future projections. Not all the HighResMIP models have climate projection information for all the spatial scales. For example, when evaluating the temperature indices, five models were available for the G2I-MME, four for the G3H-MME, and only three for the G1L-MME. For the precipitation indices, we used four, five, and six models for each MME from the lowest to the highest resolution, respectively. Despite this challenge, the spatial representation of percentage changes, at least for the precipitation indices, shows a consistent relationship among the three MME groups.
We hypothesize that a misrepresentation of the temperature could be related to the greatest bias of the CNRM-CM6-1-HR described by Almazroui et al. (2021b). We also acknowledge that regions like Central America and the Caribbean require the finest spatial resolution, since the narrow width of their landmass and the diversity of phenomena modulate their natural extreme climate variability.
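To make the percentage changes discussed above concrete, the sketch below computes the relative change of a multi-model ensemble mean between a historical and a future period, using only the models that provide both periods. This is an illustrative assumption about the order of operations (averaging across models first, then taking the change), which this section does not specify. The model names and index values are hypothetical.

```python
def ensemble_mean(values_by_model, members):
    """Arithmetic mean of an index over the selected ensemble members."""
    return sum(values_by_model[m] for m in members) / len(members)


def mme_percent_change(historical, future):
    """Percent change of the ensemble-mean index, future relative to historical.

    Only models present in both dictionaries are used, since not every model
    provides projections at every resolution.
    """
    members = sorted(set(historical) & set(future))
    hist_mean = ensemble_mean(historical, members)
    fut_mean = ensemble_mean(future, members)
    return 100.0 * (fut_mean - hist_mean) / hist_mean, members


# Hypothetical regional-mean RX5day values (mm) per placeholder model.
hist_rx5day = {"m1": 100.0, "m2": 80.0, "m3": 120.0, "m4": 95.0}
fut_rx5day = {"m1": 110.0, "m2": 84.0, "m3": 138.0}  # "m4" has no projection

change, used = mme_percent_change(hist_rx5day, fut_rx5day)
print(f"{change:.1f}% change from {len(used)} models: {used}")
```

Because the number of contributing models differs between resolution groups, reporting the member list alongside the change helps when comparing the three MMEs.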

Earth System Models are essential for understanding the climate variability and atmospheric teleconnections that shape possible future scenarios, and they provide scientifically based decision tools for developing adaptation and mitigation plans. As the latest IPCC report emphasizes (IPCC 2022), these decision-support tools are critical to ensuring a livable future. This is nowhere more evident than in front-line communities across Latin America and the Caribbean, where vulnerability to extreme climate events is high (WMO 2021b). However, ESMs contain inherent uncertainties, and the complexity of phenomena related to extreme meteorological events across the region makes interpreting their outcomes a significant challenge for the climate research community. Furthermore, since local decision makers desire local projection information, continued improvements are needed in the HighResMIP simulations. Future studies should apply bias correction techniques to ESM output, using observational reference datasets, to provide robust climate projections for climate impact studies that require high horizontal resolution information (e.g., finer than 0.25° or 25 km).
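As an illustration of what such a bias correction step might look like, the sketch below implements a simple empirical quantile mapping, one commonly used technique and not one prescribed by this study. A model value is located within the model's historical distribution and replaced by the value at the same quantile of an observational reference. The function names and sample values are hypothetical, and each historical sample is assumed to contain at least two values.

```python
from bisect import bisect_right


def position_in_sorted(sorted_values, value):
    """Fractional index of value within sorted data, clamped to the data range."""
    if value <= sorted_values[0]:
        return 0.0
    if value >= sorted_values[-1]:
        return float(len(sorted_values) - 1)
    hi = bisect_right(sorted_values, value)  # first element strictly above value
    lo = hi - 1
    span = sorted_values[hi] - sorted_values[lo]
    return lo + (value - sorted_values[lo]) / span


def value_at_quantile(sorted_values, q):
    """Linearly interpolated value at quantile q (0 to 1) of sorted data."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def quantile_map(value, model_hist, obs_hist):
    """Map a model value onto the observed distribution at the same quantile."""
    model_sorted = sorted(model_hist)
    obs_sorted = sorted(obs_hist)
    q = position_in_sorted(model_sorted, value) / (len(model_sorted) - 1)
    return value_at_quantile(obs_sorted, q)


# Hypothetical historical samples: the model is twice as wet as the reference.
model_hist = [2.0, 4.0, 6.0, 8.0, 10.0]
obs_hist = [1.0, 2.0, 3.0, 4.0, 5.0]
corrected = [quantile_map(v, model_hist, obs_hist) for v in (6.0, 7.0, 12.0)]
print(corrected)
```

Note that this simple version clamps values outside the historical model range to the observed extremes, which is a known limitation when correcting projections that contain unprecedented extremes. Practical applications typically add an extrapolation or trend-preserving step.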