Do CMIP models capture long-term observed annual precipitation trends?

This study provides a long-term (1891–2014) global assessment of precipitation trends using data from two station-based gridded datasets and climate model outputs from the fifth and sixth phases of the Coupled Model Intercomparison Project (CMIP5 and CMIP6, respectively). Our analysis employs a variety of model groups that include low- and high-top members, with the aim of assessing the possible effects of including a well-resolved stratosphere on the models' ability to reproduce long-term observed annual precipitation trends. Results demonstrate that only a few regions show statistically significant differences in precipitation trends between observations and models. Nevertheless, this pattern is mostly caused by the strong interannual variability of precipitation in most of the world regions. Accordingly, when trends are compared at the zonal mean scale, statistically significant model-observation differences (1891–2014) are found. The different model groups clearly fail to reproduce the spatial patterns of annual precipitation trends and the regions where stronger increases or decreases are recorded. This study also stresses that there are no significant differences between low- and high-top models in capturing observed precipitation trends, indicating that having a well-resolved stratosphere has a low impact on the accuracy of precipitation projections.


Introduction
Precipitation is one of the key climatic variables contributing to the water and energy balance across spatial and temporal scales, ranging from local catchments to global-scale hydrology. Because they drive extreme events such as droughts and floods, precipitation anomalies can have significant hydrological, environmental and socioeconomic implications. As such, it is highly relevant to identify possible long-term trends in precipitation, as this knowledge can contribute to a better understanding of many ecosystem processes and functions (e.g. land aridity, vegetation greening and browning, crop yield, water resources availability, biodiversity, etc.). In the same context, knowledge about possible future changes in precipitation is essential to inform societal and ecological adaptation. For this purpose, outputs from climate models, with different configurations and scenarios of greenhouse gas (GHG) emissions, have been widely used to assess possible future precipitation changes (e.g. Orlowsky and Seneviratne 2013; Huang et al. 2014; Swain and Hayhoe 2015; Raut et al. 2017; Martin 2018). These models simulate the possible complex feedbacks in the global climate system, including interactions between ocean, land, and atmosphere processes (Räisänen 2007; Dirmeyer et al. 2013). Simulations from these models span a wide variety of climate variables, including precipitation.
Different climate model simulation experiments underpin the reports of the Intergovernmental Panel on Climate Change (IPCC). The third phase of the Coupled Model Intercomparison Project (CMIP3) was developed for the Fourth Assessment Report (AR4) of the IPCC (Meehl et al. 2007). A few years later, simulations of the fifth phase (CMIP5) were delivered in 2011 (Taylor et al. 2012) to coincide with the IPCC's Fifth Assessment Report, followed by those of the sixth phase (CMIP6) (Eyring et al. 2016). Based on these model experiments, several studies have assessed future scenarios of precipitation at global (e.g. Orlowsky and Seneviratne 2013) and regional (e.g. Mariotti et al. 2008; Su et al. 2013) scales. For example, using CMIP5 models under high emission scenarios, different studies have suggested an increase in precipitation at high latitudes of the Northern Hemisphere over the coming century, while general dryness dominates over subtropical regions (e.g. the Mediterranean, southern North America, Central America, and South Africa) (e.g. Dufresne et al. 2013; Knutti and Sedláček 2013; Orlowsky and Seneviratne 2013; Cook et al. 2014; Asadi Zarch et al. 2015). The uncertainty associated with these future simulations is commonly assessed using the spread among different emission scenarios, projections, and models (Schaller et al. 2011; Alexander and Arblaster 2017), with a high spread suggesting an increase in the uncertainty of projections (Orlowsky and Seneviratne 2013; Zhao et al. 2015b).
A necessary step to ensure the reliability and performance of climate models is to validate their skill and accuracy in reproducing near-present climate. This procedure is typically implemented by comparing model outputs for a historical period with available ground-based climate observations. In this context, several studies have confirmed that climate models reproduce the characteristics (e.g. climatologies, anomalies, and magnitude and spatial patterns of trends) of air temperature with greater skill than those of precipitation (e.g. van Oldenborgh et al. 2013; Kumar et al. 2013; Zhao et al. 2015b; Lee et al. 2019). In fact, climate models show limited skill in capturing long-term observed precipitation variability (e.g. van Oldenborgh et al. 2013; Chen and Frauenfeld 2014).
Several comprehensive studies have attempted to compare long-term precipitation trends from observations with model simulations, mainly CMIP5 models (e.g. Kumar et al. 2013; Nasrollahi et al. 2015; Zhao et al. 2015a; Knutson and Zeng 2018). To date, a global assessment of long-term precipitation trends using CMIP6 simulations is still lacking, although some preliminary regional evaluations are available (e.g. Rivera and Arnould 2020; Xin et al. 2020; Peña-Angulo et al. 2020). In addition, very few studies have assessed the possible role of model vertical structure in determining their capacity to reproduce precipitation trends. This includes, for example, the possible effects of the vertical resolution and top height of the models. While some models (so-called low-top models) are characterized by their coarse vertical resolution in the stratosphere and associated limitations in explicitly modeling climate processes in this layer (possibly affecting stratosphere-troposphere coupling), other models (so-called high-top models) include a well-resolved stratosphere (Charlton-Perez et al. 2013; Hurwitz et al. 2014). Numerous studies have evaluated the effectiveness of high-top models to capture the observed signal in precipitation and the different atmospheric mechanisms driving multidecadal variability (e.g. Cagnazzo and Manzini 2009; Lee and Black 2014; Wei et al. 2018a; Haase et al. 2018). Others indicate that models without a well-resolved stratosphere show limitations in identifying atmospheric circulation mechanisms (Deser et al. 2012; Osprey et al. 2013). According to Scaife et al. (2012), future changes in stratospheric circulation are likely to double the increase in winter precipitation over Western and Central Europe, compared to projections from low-top models. Similarly, Marsh et al. (2013) compared simulations from low- and high-top versions of the NCAR Community Earth System Model (CESM), suggesting systematic differences between these two types of models in reproducing some variables like wind and precipitation, especially in the extratropics. These earlier findings emphasize that stratosphere-troposphere coupling could be a major source of uncertainty in climate model projections of precipitation.
Against this background, the main aims of this study are to: (i) explore the extent of agreement in both the magnitude and spatial distribution of precipitation trends from observations and climate simulations, (ii) compare precipitation signals in CMIP5 and CMIP6 using multimodel groups, and (iii) assess improvements in reproducing observed precipitation trends in climate models with and without a well-resolved stratosphere (i.e. low-top vs. high-top models).

Data
This study employs observed annual precipitation data from the Global Precipitation Climatology Centre (GPCC) dataset (https://www.dwd.de/EN/ourservices/gpcc/gpcc.html) at a 2.5° grid resolution. The GPCC dataset is maintained by the Deutscher Wetterdienst (German Weather Service) under the auspices of the World Meteorological Organization (WMO). Due to the uneven distribution of gauged data from many national meteorological networks in the early decades of the twentieth century, our analysis for these earlier decades (i.e. from 1891) is restricted to regions with a higher density of observations (e.g. North America, Europe, Middle East, North and South Africa, Australia, India, the La Plata basin in South America, and some sparse regions in Central Asia, Central America and East Asia; see colored regions in the first column of Fig. 1) and, accordingly, more reliable outputs from interpolation algorithms. Our analysis is limited to the year 2014, which corresponds to the end of the historical CMIP6 model simulations.
Fig. 1 a Spatial distribution of the magnitude of change (in z-units/decade) of annual precipitation using GPCC observations and b spatial distribution of their statistical significance (using a p-level of 0.05 for statistical significance, the dark blue and dark red colours show areas with statistically significant positive and negative trends, respectively, while the light blue and yellow colours denote regions with statistically non-significant increases and decreases, respectively). Results are presented for the periods 1891-2014 and 1951-2014. Due to the scarcity of GPCC data, white areas were not used for comparison
We use the available historical monthly simulations with observed radiative forcing from a set of CMIP5 and CMIP6 models spanning the period 1891-2014. Because each model has a different number of ensemble members (ranging from one to ten, with the most common situation being one single member per model), a single member per model is used to maintain model homogeneity. A list of CMIP models employed is available in Table S1. Given that historical simulations of the CMIP5 are only available up to 2005, simulations from the RCP8.5 scenario are used for the period 2006-2014. This is justified as the evolution of CO2 observations from 2006 to 2014 resembles those used in the RCP8.5 experiment more than other scenarios (Schwalm et al. 2020). The spatial resolutions of the CMIP models vary considerably, with the CMIP6 models having improved spatial resolution over the CMIP5 models (Table S1). High-resolution climate models were shown to be capable of reproducing precipitation with the same skill as regional climate models (Demory et al. 2020). Nonetheless, all models were resampled to a common grid interval of 2.5° using a bilinear interpolation approach to match with the selected resolution of the GPCC data.
The aim was to have a comparable spatial resolution to CMIP5 models and a common spatial resolution for all models, allowing for direct comparisons between all models.
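The bilinear resampling to a common 2.5° grid described above can be illustrated with a minimal sketch. It assumes NumPy and SciPy are available; the function name `regrid_bilinear`, the 1° source grid and the synthetic field are hypothetical and chosen only for illustration, and the sketch ignores longitude wrap-around and land masks. It is not the regridding tool used in the study.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator


def regrid_bilinear(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinearly interpolate a (lat, lon) field onto a target grid.

    Assumes every target point lies inside the source grid, so no
    periodic handling of longitude is attempted here.
    """
    interp = RegularGridInterpolator((src_lat, src_lon), field, method="linear")
    lat2d, lon2d = np.meshgrid(dst_lat, dst_lon, indexing="ij")
    points = np.column_stack([lat2d.ravel(), lon2d.ravel()])
    return interp(points).reshape(lat2d.shape)


# Hypothetical 1-degree source grid and 2.5-degree target grid (cell centres).
src_lat = np.arange(-89.5, 90.0, 1.0)
src_lon = np.arange(0.5, 360.0, 1.0)
dst_lat = np.arange(-88.75, 90.0, 2.5)
dst_lon = np.arange(1.25, 360.0, 2.5)

# Synthetic field varying linearly in latitude and longitude; bilinear
# interpolation reproduces such a field exactly.
field = 2.0 * src_lat[:, None] + 0.5 * src_lon[None, :]
regridded = regrid_bilinear(field, src_lat, src_lon, dst_lat, dst_lon)
```

Because the synthetic field is linear in both coordinates, the regridded values can be checked directly against the analytic expression on the target grid.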
As with the gridded precipitation data, climate model data are aggregated into annual series. To assess the ability of models with a well-resolved stratosphere to represent observed long-term precipitation trends, models are divided into two broad categories: high-top and low-top models. To define a model as a high-top or low-top model, the 1 hPa lid height was used following Charlton-Perez et al. (2013). Specifically, we assess whether there are significant differences in reproducing long-term observed precipitation trends between the two types of models (i.e. low-top vs. high-top models). We also derive trend results for the complete group of models (i.e. including both types of models).

Methods
In the literature, it is well established that the prediction and simulation of precipitation is a challenging task, with large uncertainties from different sources. On the one hand, total precipitation can vary largely among models (Su et al. 2013). On the other hand, overestimation or underestimation biases affect large areas of the world (Brands et al. 2013; Mehran et al. 2014; Aloysius et al. 2016). In addition, precipitation is characterized by its high spatial and temporal variability. As we are more interested in trends than in the total volumes, and to account for these issues, we decided to standardize the precipitation series to have an average equal to zero and a standard deviation equal to one. This procedure makes precipitation series comparable over space and time, allowing consideration of the strong spatial and temporal variations of precipitation worldwide. Importantly, based on these standardized precipitation series, a comparison between observations and model outputs was feasible, both spatially and temporally. Following Knutson and Zeng (2018), precipitation series were standardized by means of the widely used procedure of calculating the Standardized Precipitation Index (SPI) (McKee et al. 1993) through fitting the data to a gamma distribution. This procedure was applied to the series for both observations and simulations of the individual members of each group of models.
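The gamma-based standardization described above can be sketched as follows. This is a minimal illustration assuming SciPy is available and that all annual totals are strictly positive (operational SPI implementations treat zero precipitation separately); the function name `standardize_gamma` and the synthetic input series are hypothetical.

```python
import numpy as np
from scipy import stats


def standardize_gamma(precip):
    """SPI-like standardization of one precipitation series.

    Fits a two-parameter gamma distribution (location fixed at zero),
    evaluates the fitted CDF at each value, and maps the cumulative
    probabilities to standard-normal quantiles (z-units).
    """
    x = np.asarray(precip, dtype=float)
    shape, loc, scale = stats.gamma.fit(x, floc=0)
    cdf = stats.gamma.cdf(x, shape, loc=loc, scale=scale)
    # Keep probabilities away from 0 and 1 so the z-values stay finite.
    cdf = np.clip(cdf, 1e-6, 1.0 - 1e-6)
    return stats.norm.ppf(cdf)


# Synthetic annual totals for 124 years (1891-2014); values are illustrative.
rng = np.random.default_rng(42)
annual_precip = rng.gamma(shape=8.0, scale=100.0, size=124)
z = standardize_gamma(annual_precip)
```

The resulting series has a mean close to zero and a standard deviation close to one, and it preserves the rank order of the original values, so trends in z-units remain comparable between wet and dry regions.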
To analyze changes in annual precipitation series from both observations and model outputs, we employed a modified version of the non-parametric Mann-Kendall statistic (Hamed and Rao 1998; Yue and Wang 2004). This statistic accounts for the possible effect of autocorrelation on trend detection by returning the corrected p values after accounting for temporal pseudoreplication. To facilitate direct comparison between different regions worldwide, we presented the trend results using four categories of trends: positive and significant (p < 0.05), positive and non-significant (p > 0.05), negative and non-significant (p > 0.05) and negative and significant (p < 0.05).
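A simplified sketch of a Mann-Kendall test with a variance correction for serial correlation is given below. It is not necessarily the exact variant applied in this study: it assumes no tied values, removes a least-squares trend before estimating the lag-1 autocorrelation, and approximates higher lags with an AR(1) model. The function name `mann_kendall_ar1` and the synthetic series are hypothetical.

```python
import math
import numpy as np


def mann_kendall_ar1(series, alpha=0.05):
    """Mann-Kendall trend test with an AR(1)-based variance inflation.

    Returns the trend category (one of the four used in the text)
    and the two-sided p value.
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    # Mann-Kendall S statistic: sum of signs of all pairwise differences.
    s = sum(np.sign(x[i + 1:] - x[i]).sum() for i in range(n - 1))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0  # variance without ties
    # Lag-1 autocorrelation of the linearly detrended series.
    t = np.arange(n)
    slope, intercept = np.polyfit(t, x, 1)
    resid = x - (slope * t + intercept)
    r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    if not np.isfinite(r1):
        r1 = 0.0
    # Effective-sample-size correction, assuming rho_k = r1 ** k.
    k = np.arange(1, n)
    inflation = 1.0 + 2.0 * np.sum((1.0 - k / n) * r1 ** k)
    var_s *= max(inflation, 1e-6)
    # Continuity-corrected standard normal score.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))
    sign = "positive" if s >= 0 else "negative"
    label = "significant" if p < alpha else "non-significant"
    return f"{sign}, {label}", p


# Synthetic series with a strong linear trend plus white noise.
rng = np.random.default_rng(0)
t = np.arange(60)
rising = 0.1 * t + rng.normal(size=60)
category_up, p_up = mann_kendall_ar1(rising)
category_down, p_down = mann_kendall_ar1(-rising)
```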
It is quite difficult to establish a robust comparison between trends in series of observations and a group of model simulations. Some studies have made comparisons between observations and model simulations by means of multimodel averages (e.g. Orlowsky and Seneviratne 2013; Sillmann et al. 2013; Kumar et al. 2013; Dai and Zhao 2017). However, in comparison to observed series, this approach dramatically reduces internal variability and the standard deviation of the resulting average series from a group of models, making it difficult to establish a comparison with observed series characterized by strong natural variability. For this reason, we calculated the trend for each independent model and computed an average of these magnitudes for all models. Although this approach reduces the range of precipitation changes derived from the models, as compared to those of observations, it allows for a comparison of the spatial patterns of the magnitudes of change between observations and climate models. The percentage of models showing the same trend category, as compared to a multi-model average, was also computed. Herein, four trend categories were considered, including statistically significant positive (p < 0.05), statistically significant negative (p < 0.05), statistically non-significant positive (p > 0.05), and statistically non-significant negative (p > 0.05) trends. To assess the magnitude of change in precipitation (z-unit/decade), we used the least squares regression model, in which time was considered as an independent variable, while precipitation represented the dependent variable. The slope of the regression indicated the amount of change, with a higher slope suggesting greater changes and vice versa. As the identification of hydroclimatic trends is strongly impacted by the selection of the base period for analysis (Hannaford et al. 2013; Vicente-Serrano et al. 2020; Peña-Angulo et al. 2020), trends were assessed for two different periods: 1891-2014 and 1951-2014.
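The per-model trend magnitude and its multi-model average can be sketched as below. This is an illustration only, assuming NumPy; the function name `decadal_slopes`, the three synthetic model series and their slopes are hypothetical.

```python
import numpy as np


def decadal_slopes(z_series, years):
    """Least-squares slope, in z-units per decade, for each model.

    z_series has shape (n_models, n_years); time is the independent
    variable and standardized precipitation the dependent variable.
    """
    t = np.asarray(years, dtype=float)
    t = t - t.mean()  # centring improves numerical conditioning
    slopes_per_year = np.polyfit(t, np.asarray(z_series, dtype=float).T, 1)[0]
    return 10.0 * slopes_per_year


years = np.arange(1891, 2015)
# Hypothetical trends (z-units/decade) for three synthetic model series.
true_slopes = np.array([0.05, -0.02, 0.10])
models = true_slopes[:, None] * (years - years.mean())[None, :] / 10.0

slopes = decadal_slopes(models, years)
ensemble_mean_slope = slopes.mean()  # average of the individual model trends
```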
In addition, we considered the series of observations as a member of each of the groups of models. This method is used to determine the position of the trends in the observations in relation to the trends in the members (the independent model runs). This enables a statistical analysis of the possible differences between trends in observations and model outputs. In this context, it is acceptable to infer that changes in observed precipitation are significantly (p < 0.05, two-tailed) different from those of the group of models when the magnitude of change in the observations is above the 95th or below the 5th percentiles of changes suggested by the different members of a large-enough group of models. A similar procedure has been adopted in earlier studies (e.g. Nasrollahi et al. 2015; Knutson and Zeng 2018).
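The percentile-based comparison described above can be sketched in a few lines. The function name `observed_position` and the synthetic ensemble of slopes are hypothetical; the 5th and 95th percentile thresholds follow the text.

```python
import numpy as np


def observed_position(obs_slope, model_slopes, lower=5.0, upper=95.0):
    """Locate the observed trend within the spread of model trends.

    Returns the percentage of models with a smaller trend than the
    observations, and whether the observed trend falls outside the
    [lower, upper] percentile range of the model trends.
    """
    model_slopes = np.asarray(model_slopes, dtype=float)
    percentile = 100.0 * np.mean(model_slopes < obs_slope)
    low, high = np.percentile(model_slopes, [lower, upper])
    outside = bool(obs_slope < low or obs_slope > high)
    return percentile, outside


# Synthetic ensemble of 21 model trends (z-units/decade).
model_slopes = np.linspace(-1.0, 1.0, 21)
pct_extreme, outside_extreme = observed_position(2.0, model_slopes)
pct_central, outside_central = observed_position(0.05, model_slopes)
```

A high percentile means that most models show a smaller trend than the observations (underestimation of an increase), while a low percentile indicates the opposite.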
Also, we compared the magnitude of change in the observations with the different independent model simulations by looking at the position (percentile) of the magnitude of trend in the observations, as compared to the entire range of the models. This approach allows determining if the models overestimate or underestimate precipitation patterns. The significance of the differences between the regression slopes of the observations and the independent models and also between the observations and the average group of models was assessed by means of a statistical test for the equality of regression coefficients at a confidence level of 95% (p < 0.05) (Paternoster et al. 1998). Following this approach, we computed the percentage of independent models that showed significant differences in the magnitude of precipitation changes, compared to observed precipitation. Finally, to remove the noise at the spatial level, we compared trends for different latitude bands. To do so, we compared the observed trend with the trend of each group of the CMIP5 and CMIP6 models. We utilized a two-sample t-test, at the 95% confidence level, to compare the precipitation trends between observations and models in the various latitudinal bands and explore whether there are significant differences between them.
Figure 1 illustrates the spatial distribution of annual precipitation trends calculated from GPCC precipitation and their significance. Annual precipitation (GPCC) showed negative and significant trends in small areas of the Near East, West Africa, North Africa, and some regions of Chile from 1891 to 2014. On the other hand, Northern Europe, parts of northeastern North America and the La Plata basin in South America exhibited the most significant increase. Large areas of Central, Northern and Eastern Europe exhibited a statistically significant increase in annual precipitation between 1891 and 2014.
We also noted the dominance of statistically significant increasing trends in northwestern North America, North Australia, and Greenland. The dominant increase in North Eurasia was identified clearly from 1951 to 2014, while the main negative and significant trends were recorded for this period in Central and West Africa.
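The test for the equality of regression slopes used in the methodology to compare observed and modelled trends can be sketched as follows. The sketch assumes the commonly used form z = (b1 − b2) / sqrt(SE1² + SE2²), with a two-sided p value from the standard normal distribution; whether this matches the exact implementation of the study is an assumption, and the function names and synthetic series are hypothetical.

```python
import math
import numpy as np


def slope_and_se(y, t):
    """Ordinary least-squares slope of y on t and its standard error."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    tc = t - t.mean()
    yc = y - y.mean()
    sxx = np.sum(tc ** 2)
    slope = np.sum(tc * yc) / sxx
    resid = yc - slope * tc
    se = math.sqrt(np.sum(resid ** 2) / (y.size - 2) / sxx)
    return slope, se


def slope_difference_test(y1, y2, t):
    """z statistic and two-sided p value for the difference of two slopes."""
    b1, se1 = slope_and_se(y1, t)
    b2, se2 = slope_and_se(y2, t)
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Synthetic series: identical trends versus clearly opposite trends.
rng = np.random.default_rng(1)
t = np.arange(50.0)
noise = rng.normal(scale=0.1, size=50)
rising = 0.5 * t + noise
falling = -0.5 * t + noise
z_same, p_same = slope_difference_test(rising, rising, t)
z_diff, p_diff = slope_difference_test(rising, falling, t)
```

With noisy series such as annual precipitation, the standard errors are large relative to the slopes, which is why this kind of test rarely flags significant differences even when the slopes differ visibly.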

Results
For the long term (1891-2014), the average change from the individual CMIP6 (Fig. 2) and CMIP5 (Figure S1) models shows limited spatial consistency with observations. Simulated precipitation does show increases in North Eurasia, northeastern North America, and the La Plata basin, which are qualitatively consistent with observations. However, the majority of the world regions show more heterogeneous patterns with less agreement between models and observations. Accordingly, a non-significant statistical relationship was noted between patterns of change in observations and those of the various individual models (Fig. 3 and Figure S2). For 1891-2014, the Mediterranean region, east China, and the Philippines showed the largest decrease in the average change in the different CMIP6 models. This decrease is more pronounced in low-top models than in high-top models. This decrease was also evident for various CMIP5 groups of models, with precipitation decreasing in Central America, southern Africa, Bangladesh, and Indonesia. Given the internal climate variability and the inconsistent behavior of observations and models, the magnitude of trends between observations and models is expected to differ. A spatial comparison clearly demonstrates that the magnitude of change in annual precipitation from models tends to reinforce precipitation decrease in some regions of the world, particularly in subtropical areas in CMIP5 models and in low-top models (Figs. 2 and S1). Between 1951 and 2014, there is noticeable spatial divergence in precipitation trends between model averages and observations, especially in South America, western North America, Africa, the Mediterranean, and Australia, amongst other regions. Such divergence is expected to be larger in the trend analysis for shorter periods since the relative contribution of internal variability to the overall trend is typically larger than it is for longer periods (e.g., 1891-2014).
The spatial relationship between model averages and observations is statistically non-significant for both CMIP5 and CMIP6 experiments, as well as low- and high-top models.
Thus, the average differences between the individual models and the observations seem to be substantial in several world regions, with a clear underestimation of long-term (1891-2014) observed precipitation trends, particularly in North and East Europe, Central and West Africa, most of Russia, and South and North America. When analyzing trends for different latitudinal bands, the spatial differences between annual precipitation trends in observations and model simulations are clearly identified (Figs. 4 and S3-S5). For the period 1891 to 2014, the different groups of CMIP6 models underestimate precipitation increases at high latitudes in the Northern Hemisphere (above 55° N). We found statistically significant differences between the average trends in observations and the individual model simulations for both low- and high-top models. A similar picture can be seen for the northern latitudes between 40° and 55°. The CMIP6 models perform better at mid-latitudes and in tropical and subtropical areas, with no statistically significant differences between trends in observations and models, regardless of using low- or high-top models (not shown). The CMIP5 models show a similar pattern, though they show statistically significant differences between observations and models in more latitudinal bands than the CMIP6 models (Figure S3). This pattern varies between latitudinal bands for the period 1951-2014 but there are no relevant differences with the behavior of the CMIP6 models. Regardless of the latitudinal band, statistically significant differences in average trends between observations and models predominate.
All these results contrast with the few significant differences in annual precipitation trends between models and observations (Figs. 5 and S6). This is simply because there is a dominance of non-significant differences between the trends in the individual models of each group of models and the series of observations (Figure S7 and Figure S8). Thus, for the period 1891-2014, with the exception of a few areas in Eastern Europe, Egypt and the La Plata basin, the rest of the regions show a low percentage of models with statistically significant differences from the trend of observations. With the exception of Central and West Africa, there are few areas where a large percentage of models (> 80%) show statistically significant differences in trends between 1951 and 2014 when compared to observations. The differences in the spatial patterns of trends and the substantial differences in the average magnitude of the changes between the individual models and the observations suggest that the models' ability to reproduce observed trends is low. Nevertheless, the small number of statistically significant differences in annual precipitation trends between models and observations seems to suggest the opposite. The problem is that precipitation is characterized by strong interannual variability, making it difficult to obtain statistically significant differences between two series characterized by large interannual variability. Nevertheless, although differences are mostly non-significant, in most of the world's regions, the annual precipitation trend, as recorded in the majority of models, differs from the observed trends. Figure 6 and Figure S9 show the spatial distribution of the percentile of trends in annual precipitation observations in comparison to the trends in the independent models of each CMIP6 and CMIP5 group of models, respectively.
The first remarkable feature is the small differences between the spatial patterns in the groups of low- and high-top models and between CMIP5 and CMIP6 models, which suggest similar skill among the different groups of models. For the period 1891-2014, in the majority of the world regions with available data, the models tend to underestimate or overestimate the magnitude of change of annual precipitation (Tables 1 and S1). This affects regions with both increasing and decreasing annual precipitation in observations, and it means that, despite the fact that there are few statistically significant differences between trends in individual models and trends in observations, the models are primarily overestimating decreasing trends in annual precipitation in southwestern Europe, southern Africa, and western Australia, and underestimating increasing annual precipitation trends in western North America, the La Plata basin, western Europe, Scandinavia, and western Russia. For the period 1951-2014, most of the models overestimate precipitation increase in the Sahel, central and western Africa, the Middle East and eastern Asia, and they underestimate precipitation increase in large areas of north Eurasia, France, West Australia and large areas of North and South America.
The misrepresentation of the annual precipitation trends in different groups of models is also identified by means of the most common trend significance category (Fig. 7 and Figure S10). In comparison to the spatial patterns illustrated in Fig. 1b, where large areas of northern and western Europe, northern Eurasia, the La Plata basin, western North America, central Asia, Australia, etc., showed positive and statistically significant trends, the different CMIP5 and CMIP6 groups of models show a dominant pattern characterized by non-significant trends in these areas. Thus, the dominant pattern of the models clearly underestimates the extent of regions with significant annual precipitation trends. In comparison with the CMIP5 models, the CMIP6 models tend to reduce the areas with dominance of significant trends. Nonetheless, in some areas where observations do not show statistically significant changes, CMIP5 models show dominant negative significant trends (e.g. Bangladesh, Indonesia, the Iberian Peninsula). The same dominant behavior is recorded for the most recent period (1951-2014). Thus, it is paradoxical that the regions where the models tend to record dominance of statistically significant positive trends agree with areas in which observations show a dominant negative trend, such as in the Sahel. Moreover, the large areas with positive and significant trends in North Eurasia, West China and Northwest Australia are characterized by the dominance of non-significant trends in both CMIP5 and CMIP6 model experiments. It is also a paradox that only the low-top CMIP5 models record the dominance of positive and significant trends in northern Eurasia, while the rest of the areas where low-top CMIP5 models show dominance of significant trends (negative in northern South America and positive in West Africa and Central Asia) do not correspond to significant trends in observations.
In general, the sign and significance of annual precipitation trends in observations are reproduced by few models in each group (Figs. 8 and S11). Few areas in the group of CMIP6 models have more than 60% of the models with the same sign and significance as observations. CMIP5 models tend to show the same pattern, although the group that includes both low- and high-top models has higher percentages in eastern North America and Western Europe for the period 1951-2014. Thus, regions where only a few models agree with the sign and significance of the observed trend dominate, whereas only a few areas show agreement between most models and observations (Figs. 9 and S12).

Discussion
This study provides a global evaluation of long-term annual precipitation trends from observations, as compared to model-based simulations from the CMIP5 and CMIP6 experiments. Specifically, this assessment considered data spanning the periods 1891-2014 and 1951-2014. Our analysis accounts for two different types of models depending on their representation of the stratosphere (i.e. low-top and high-top), with the aim of determining whether a good representation of this atmospheric layer may improve the ability of CMIP models to reproduce observed precipitation changes. An assessment of this possible influence is crucial for evaluating the capacity of climate models to determine long-term precipitation trends under future precipitation projections.
It should be noted that making a direct comparison between observed and modeled precipitation trends has some limitations. In this study, we included observations as an additional member in the different groups of models, an approach that was adopted in earlier studies (e.g. Knutson and Zeng 2018; Peña-Angulo et al. 2020). Nevertheless, due to the large range of variability introduced in each group of models, this method can introduce some uncertainties. While there are noticeable differences in the spatial patterns of trends in observations and the dominant patterns of the different groups of models (as represented by the average of the trends), no statistically significant differences exist between trends in observations and model members. Exceptionally, in a few cases, the "observation member" might be placed outside the range of trends suggested by the majority of models (Nasrollahi et al. 2015).
Nevertheless, this result does not imply that models provide a good assessment of trends in observations. Although there are very few cases in which there are statistically significant differences in the magnitude of trends between models and observations, our findings suggest that, in large regions of the world, the majority of models tended to either overestimate or underestimate the trend magnitude found in observations. This is evident, given that the annual precipitation trend in observations corresponds mostly to very low or very high percentiles in comparison to the trend magnitudes suggested by the different members of the group. In this context, recalling that simulated changes in precipitation rarely differ significantly from observations, due to the low signal-to-noise ratio present in local precipitation changes, the simple assessment of statistically significant differences between observations and models is a poor approach (Bhend and Whetton 2013). This effect was demonstrated in regions with high interannual precipitation variability, such as the Western Mediterranean, where earlier studies showed that both CMIP3 and CMIP5 models simulate more drying than observations, albeit with the absence of statistically significant differences in the magnitude of these trends between observations and individual ensemble members (Peña-Angulo et al. 2020). For this reason, although there is a prevalence of statistically non-significant differences in the trend magnitude of precipitation between observations and model simulations, it cannot be inferred that models are capable of reproducing observed trends. Indeed, our study found that model simulations generally show poor skill in replicating trends in observations and that annual observed precipitation trends in many parts of the world are generally overestimated or underestimated by the climate models. In addition, the patterns of observed and modeled precipitation trends were less consistent over space.
Overall, our findings support previous studies that demonstrated low agreement between the magnitude of observed and modeled precipitation changes (e.g. Asadieh and Krakauer 2015; Bishop et al. 2018), as well as inconsistent spatial patterns of these changes (e.g. van Oldenborgh et al. 2013; Kumar et al. 2013; Sheffield et al. 2013; Zebaze et al. 2019).
Moreover, we find that the new CMIP6 models do not substantially improve the simulated sign and statistical significance of trends, or their magnitude and spatial patterns, compared to CMIP5 models. In some regions, such as the West Sahel, the models poorly simulate the observed long-term precipitation trends. The dominant decrease in observed precipitation from 1951 to 2014 contrasts with the strong increasing trend simulated overall by the models (both high- and low-top models in CMIP5 and CMIP6). We also detected a long-term precipitation trend in high-latitude areas of the Northern Hemisphere that is consistent with anthropogenic influences, as indicated by some previous studies (e.g. Bhend and von Storch 2008; Wan et al. 2015). Nevertheless, we must emphasize that, even in these areas, there are inconsistencies with model simulations, mostly over the long term, with a general underestimation of the annual precipitation trend by the different groups of models. There are also substantial differences between the long-term evolution of observations and model simulations in the mid-latitude subtropical regions (i.e. southern North America, Central America, northern South America, East Asia and the Mediterranean). Models in these regions show stronger drying, which contradicts long-term observations. Some exceptions (i.e. drying suggested by models coincides with drying in observations) were found in a few regions, mainly close to the Near East (Israel and Syria), where changes have been attributed to anthropogenic forcing (Knutson and Zeng 2018). The expansion of the Hadley cells in the Southern and Northern Hemispheres could explain the significant drying of these regions in models (Staten et al. 2018). 
Nevertheless, the CMIP5 models show limitations in robustly reproducing subtropical expansion (Davis and Birner 2016), and they appear to amplify the possible effects of long-term anthropogenic forcing relative to the natural variability that characterizes Hadley cell movement (Bronnimann et al. 2015; Grise et al. 2018).
In general, the existing differences between trends in observed precipitation and model simulations for different long- and short-term periods can probably be linked to a misrepresentation of the atmospheric processes driving decadal variability in model simulations (Sheffield et al. 2013; Chen and Frauenfeld 2014). For example, Gu et al. (2015) and Wei et al. (2018b) have shown that most models are inaccurate in simulating the impacts of the Pacific Decadal Oscillation (PDO) on the western Pacific subtropical high and the East Asian summer monsoon. Models also show limitations in replicating changes in the Indo-Pacific sea surface temperature (SST) and the Atlantic SST meridional gradients, which are key systems impacting precipitation variability in the Sahel (Biasutti et al. 2008). This could explain the discrepancy between observed and modelled precipitation trends from 1951 to 2014. The model representation of the El Niño Southern Oscillation (ENSO) also shows clear differences among simulations. As such, precipitation biases in the CMIP5 model simulations over the western Pacific, the Indian Ocean, and the equatorial Pacific could be linked to the ENSO-related SST biases in the models (Dai and Arkin 2017). 
Fig. 3 Scatterplots showing the spatial relationship between the magnitude of annual precipitation change in GPCC observations and the average magnitude of change derived from the individual CMIP6 models. Colours represent the density of points, with red colours showing the highest densities (legends show the frequency of grid points). The significance of the correlation was obtained using a bootstrap method. A total of 1000 random samples of 30 data points each were extracted, from which correlations and p values were obtained. The final significance was assessed by means of the average of the p values
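The bootstrap procedure summarized in the Fig. 3 caption (1000 random samples of 30 grid points each, with the final significance taken from the average p value) can be sketched as follows. This is a minimal illustration rather than the code used for the figure. It makes several assumptions that the text does not specify: points are drawn without replacement within each sample, the correlation is Pearson's r, and the two-sided p value uses the Fisher z normal approximation. All function names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant sample carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p value for r from the Fisher z approximation (assumption)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def bootstrap_correlation(obs, mod, n_samples=1000, sample_size=30, seed=0):
    """Return the mean correlation and mean p value over random subsamples."""
    if len(obs) != len(mod):
        raise ValueError("obs and mod must have the same length")
    if len(obs) < sample_size:
        raise ValueError("not enough grid points for the requested sample size")
    rng = random.Random(seed)
    r_values, p_values = [], []
    for _ in range(n_samples):
        # Draw grid-point indices without replacement (assumption).
        idx = rng.sample(range(len(obs)), sample_size)
        r = pearson_r([obs[i] for i in idx], [mod[i] for i in idx])
        r_values.append(r)
        p_values.append(approx_p_value(r, sample_size))
    return sum(r_values) / n_samples, sum(p_values) / n_samples


if __name__ == "__main__":
    # Synthetic grid-point changes, for illustration only.
    rng = random.Random(42)
    obs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    mod = [0.3 * o + rng.gauss(0.0, 1.0) for o in obs]
    print(bootstrap_correlation(obs, mod))
```

Subsampling to 30 points keeps the test from declaring very weak correlations significant merely because the grid contains thousands of spatially dependent cells.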
Moreover, the positions of the Northern Annular Mode (NAM) and the North Atlantic Oscillation (NAO) are not well represented in the models, as reported by previous studies (Bladé et al. 2012; Kelley et al. 2012). This would reinforce, for example, the disparity between observed and modeled precipitation trends in Southern Europe. In the same context, the CMIP5 models do not adequately represent the Southern Annular Mode (SAM), limiting their ability to reproduce precipitation trends in semiarid mid-latitude regions of the Southern Hemisphere (Purich et al. 2013). Sheffield et al. (2013) showed a wide range of model performance in reproducing the displacement of tropical synoptic-scale disturbances in North America, making tropical cyclones unpredictable. In addition, they showed a large spread in the models' ability to replicate ENSO impacts on the North American climate. Over the Tibetan Plateau, Duan et al. (2013) and Salunke et al. (2019) showed certain weaknesses of CMIP5 models in reproducing the summer monsoon.
According to Deser et al. (2012), the internal atmospheric variability associated with the annular modes is a major source of uncertainty in model simulations in the middle and high latitudes. It would therefore be reasonable to expect high-top models to better represent precipitation trends at these latitudes, because this mechanism is a primary driver of the interannual variability of precipitation and its long-term trends over large regions (Vicente-Serrano and López-Moreno 2008; Bladé et al. 2012). Nevertheless, in this study, there were few differences in skill between high- and low-top models, in both CMIP5 and CMIP6, in reproducing long-term observed precipitation trends. Several studies have indicated that accurate representation of the stratosphere is required to properly replicate the various atmospheric processes in the troposphere (Cagnazzo and Manzini 2009; Wei et al. 2018a; Haase et al. 2018). Low-top models do not explicitly reproduce stratospheric dynamical variability, which may have consequences for the identification of tropospheric impacts of the SAM and NAM, which have a major influence on precipitation variability at high latitudes and in subtropical regions. Using the Hadley Centre Global Environment Model version 2 (HadGEM2), Osprey et al. (2013) suggested that high-top models could better replicate the Quasi-Biennial Oscillation (QBO), indicating that a better representation of stratospheric variability in tropical and extratropical stratospheric modes is required. Nevertheless, based on high-top model simulations using the Met Office Unified Model from CMIP5, Hardiman et al. (2012) showed that high-top model experiments enhanced the impact of atmospheric teleconnections on surface climate, particularly the response to ENSO. However, surprisingly, the addition of a well-resolved stratosphere had no effect on the annular mode index variability. 
This finding concurs with the study of Lee and Black (2014), which employed CMIP5 models. According to Scaife et al. (2012), changes in the stratospheric circulation induce a further shift in the tropospheric circulation. This shift in circulation could alter the Atlantic storm track and modify precipitation projections over Western and Central Europe. Our findings indicate minor differences between high- and low-top model groups, demonstrating that there is little improvement in the ability of the models to reproduce precipitation trends when considering a well-resolved stratosphere.

Conclusions
This study provides a comprehensive comparison of long-term precipitation trends using observations and a set of models from two CMIP experiments (CMIP5 and CMIP6). Making such a comparison is challenging because precipitation is characterized by high interannual and multidecadal variability. This study demonstrates that, in large regions worldwide, there are no statistically significant differences between observed and modeled precipitation trends. However, this does not imply that the different model experiments can reproduce the observed trends. Rather, this finding is associated with the large interannual variability of precipitation, which means that the share of total variance explained by the dominant trend is low. This is evident for both the long-term (1891-2014) and the short-term analyses.
Fig. 5 Spatial distribution of the average differences between the magnitude of change in annual precipitation of the individual models of each CMIP6 group of models and the magnitude of change in the GPCC observations. Areas with statistically significant differences between observations and model groups are delineated by black lines (90% of the models)
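The point about the low variance explained by the trend can be made concrete with a short sketch. It assumes an ordinary least-squares linear trend fitted to an annual series and reports the coefficient of determination (R²) as the fraction of variance explained. The trend estimator used in this study may differ, and the function name and synthetic series are hypothetical.

```python
import random


def linear_trend_r2(values):
    """Return (slope per time step, R^2) of a least-squares linear trend."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    syy = sum((y - mean_y) ** 2 for y in values)
    slope = sxy / sxx
    # For simple linear regression, R^2 equals the squared correlation.
    r2 = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, r2


if __name__ == "__main__":
    # Synthetic 124-year series: a weak trend buried in interannual noise.
    rng = random.Random(0)
    series = [800.0 + 0.2 * year + rng.gauss(0.0, 100.0) for year in range(124)]
    slope, r2 = linear_trend_r2(series)
    print(f"slope = {slope:.3f} mm/yr, variance explained = {100 * r2:.1f}%")
```

When interannual noise dominates, R² stays small even if a real trend is present, so trends from observations and models can differ considerably without the difference reaching statistical significance.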
Moreover, there is a statistically significant under-simulation of zonal mean observed increasing precipitation trends (1891-2014) by CMIP6 models over land regions with adequate data coverage (at least poleward of about 40° N in the Northern Hemisphere, and from 15° S to 40° S in the Southern Hemisphere). This new key finding generally supports previous findings of under-estimated increasing trends in similar regions using CMIP5 models.
In general, we found that the new experiments in CMIP6 do not significantly improve the representation of the magnitude and spatial patterns of global precipitation trends compared to CMIP5 models. Rather, we found that there is a spatial discrepancy between observed and modeled precipitation, which is stronger in CMIP5 models. Similarly, no significant differences were found between the high- and low-top models. When comparing individual model trends to observations, there is a large spread in model trends. In particular, models overestimate drying in mid-latitude subtropical regions while underestimating observed precipitation increases in the Northern Hemisphere, particularly in North and Central Europe.
Based on comparisons of observations and model simulations, several studies have suggested human influence on precipitation trends (Zhang et al. 2007; Wan et al. 2015; Sarojini et al. 2016; Knutson and Zeng 2018). We argue that large uncertainties remain, mainly due to the strong interannual variability of precipitation, the lack of significant observed long-term trends (Nasrollahi et al. 2015; Knutson and Zeng 2018; Spinoni et al. 2019), the general overestimation or underestimation of precipitation changes at the regional scale, as well as the general spatial mismatch between observed and modelled precipitation trends.
Fig. 7 Spatial distribution of the most frequent category of trend and statistical significance from the individual models in each one of the CMIP6 model groups
This study emphasizes that future projections of precipitation trends from CMIP5 and CMIP6 should be viewed with caution because the models do not adequately reproduce trends in observations. For example, several papers report that the general drying in large subtropical regions shown by the different groups of models is projected to intensify in the future (Martin 2018; Spinoni et al. 2020; Ukkola et al. 2020). Nevertheless, even in these regions, these projections cannot be seen as fully robust, given the poor ability of models to reproduce long-term observed trends and their corresponding overestimation of the drying suggested by observations in these regions.
Funding Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was supported by the research projects CGL2017-82216-R, PID2019-108589RA-I00 and PCI2019-103631 financed by the Spanish Commission of Science and Technology and FEDER, and by the CROSSDRO project financed by the AXIS (Assessment of Cross(X)-sectoral climate Impacts and pathways for Sustainable transformation) JPI-Climate co-funded call of the European Commission. This study is also supported by "Unidad Asociada CSIC-Universidad de Vigo: Grupo de Fisica de la Atmosfera y del Océano". CM acknowledges funding from the Irish Environmental Protection Agency (2019-CCRP-MS.60).
LG and RN received partial support from the Xunta de Galicia under the Project ED431C 2017/64-GRC Programa de Consolidación e Estructuración de Unidades de Investigación Competitivas (Grupos de Referencia Competitiva) and Consellería de Educación e Ordenación Universitaria. Previous funders received co-funding from the ERDF, in the agenda of the Operational Program Galicia 2014-2020.
Fig. 9 Histograms depicting the absolute frequency of the world areas showing a percentage of agreement between categories of trend and significance of the individual models of the different CMIP6 groups in comparison to observations

Conflict of interest None.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.