Changes in temperature–precipitation correlations over Europe: are climate models reliable?

Inter-variable correlations (e.g., between daily temperature and precipitation) are key statistical properties to characterise probabilities of simultaneous climate events and compound events. Their correct simulations from climate models, both in values and in changes over time, is then a prerequisite to investigate their future changes and associated impacts. Therefore, this study first evaluates the capabilities of one 11-single run multi-model ensemble (CMIP6) and one 40-member single model initial-condition large ensemble (CESM) over Europe to reproduce the characteristics of a reanalysis dataset (ERA5) in terms of temperature–precipitation correlations and their historical changes. Next, the ensembles’ correlations for the end of the 21st century are compared. Over the historical period, both CMIP6 and CESM ensembles have season-dependent and spatially structured biases. Moreover, the inter-variable correlations from both ensembles mostly appear stationary. Thus, although reanalysis displays significant correlation changes, none of the ensembles can reproduce them, with internal variability representing only 30% on the inter-model variability. However, future correlations show significant changes over large spatial patterns. Yet, those patterns are rather different for CMIP6 and CESM, reflecting a large uncertainty in changes. In addition, for historical and future projections, an analysis conditional on atmospheric circulation regimes is performed. The conditional correlations given the regimes are found to be the main contributor to the biases in correlation over the historical period, and to the past and future changes of correlation. These results highlight the importance of the large-scale circulation regimes and the need to understand their physical relationships with local-scale phenomena associated to specific inter-variable correlations.


Introduction
Over the last few years, the interest in statistical correlations between climate variables has become strong in various domains (e.g., Sukharev et al. 2009;Bhowmik et al. 2017;Mengis et al. 2019;Seo et al. 2019;Tukimat et al. 2019, among many others). This interest comes from the fact that, most of the time, climate phenomena need to be characterised by multiple variables (precipitation, temperature, wind, etc.) and not only a single one, if we want to understand their processes and impacts. One typical example is given by the study of "compound events" (CEs), a growing field of research in the impact and climate science communities (e.g., Zscheischler and Seneviratne 2017;Sadegh et al. 2018;Zscheischler et al. 2020;de Brito 2021;Ridder et al. 2021;Singh et al. 2021;Zscheischler et al. 2021, among many others). These climate events can be defined as resulting from a combination of events-not necessarily extreme by themselves-whose simultaneous or successive occurrences might generate major impacts. Different types of compound events have been categorised into a specific typology by Zscheischler et al. (2020): "preconditioned" events (a weather-or climate-driven preconditioning intensifies the impacts); "multivariate" events (several simultaneous univariate hazards create the impact); "temporally compounding" events (successive hazards generate an impact); and "spatially compounding" events (univariate hazards in several places cause an impact). The key statistical aspect of such events is the dependence characterisation of the different univariate events that, together, form the CEs and cause the impacts.
In all these compound events, the dependence structure between the univariate variables or events (e.g., the correlation matrix) has to be known or estimated in a robust way. In studies investigating potential changes in CE properties and frequencies, it is thus necessary to have both correct dependence properties in the historical simulations and robust climate change signal regarding these multivariate statistical properties. More generally, this information is essential in any study relying on simulated climate data with dependence structures, such as environmental studies (in hydrology, agronomy, ecology, etc.) where the associated impact models and their output can strongly depend on the realism of the climate input, in terms of univariate properties as well as in terms of their dependence characteristics (e.g., Ines and Hansen 2006;Teutschbein and Seibert 2012;Laux et al. 2021). However, it is known that climate models (both Global or Regional ones, GCMs or RCMs) can have biases with respect to observations or reanalyses, not only in terms of marginal distributions (i.e., statistical properties of the variables considered separately) but also in the multivariate properties (e.g., dependence, such as correlations) of the simulations they provide (e.g., Cannon 2017;Vrac 2018;François et al. 2020).
That is why, "bias correction" (BC) methods-also called "bias adjustment" methods-have been developed over the last few decades. Any BC method relies on a transformation of the "raw" climate simulations so that the corrected simulations possess statistical properties (e.g., mean, variance, or more generally their statistical distribution) similar to those of the reference dataset (such as observations or reanalyses). The correction (i.e., transformation) is estimated over a historical period where both reference data and simulations are available. The correction is supposed to be valid in a climate change context and, then, applied to climate simulations over the projection period of interest. BC methods can be univariate (i.e., working on one variable at a time for one location at a time) or multivariate (i.e., working on several variables and/or locations at the same time). In the univariate case, the "quantile-mapping" approach is the most widely spread and applied technique, via its multiple implementations and variants (e.g., Haddad and Rosenfeld 1997;Déqué 2007;Kallache et al 2011;Vrac et al. 2012Vrac et al. , 2016Volosciuk et al. 2017, among many others). Such a method has various advantages: it is easy to implement, fast to run, and the generated corrections globally preserve the main trends of the simulations (e.g., Cannon et al. 2015;Hempel et al. 2013). Moreover, it generally respects the ranks of the simulations to be corrected and, thus, maintains the physical dependence structure of the climate model (see e.g., Vrac 2018). However, this latter point means that if the dependence structure in the model simulations is biased, the corrections preserve this biased dependence as well. This is obviously a major issue for compound event estimates. Indeed, Zscheischler et al. (2019) showed that univariate BC methods (such as quantile mapping methods) are generally not sufficient to reduce biases in multivariate hazard estimates and that multivariate BC methods should be favoured to account for dependence structures within compound events. Hence, multivariate bias correction (MBC) methods aim to correct the dependencies between the different variables of interest, in addition to their marginal distributions. François et al. (2020) have categorised MBC methods into three types of approaches, depending on the way the dependence structure is corrected: based on conditional dependencies (the "successive conditional" approach, e.g., in Piani and Haerter 2012;Dekens et al. 2017); separately from the marginals ("marginal/dependence", e.g., in Cannon 2017;Vrac 2018;François et al. 2021), or marginals and dependence together ("all-in-one", e.g., Robin et al. 2019;Robin and Vrac 2021).
In any BC method (univariate of multivariate), one implicit or explicit desirable feature is that the climate changes that are present in the raw simulations from the calibration period to the projection one (e.g., change in mean temperature, or in its moments, or change in rainfall occurrence probabilities) are respected also by the corrected simulations. This is meant to preserve the main physical information provided by climate models based on a common assumption: Even if climate simulations have some statistical biases, the changes in the main properties are physicallydriven by processes and constraints that are relevant and, thus, provide reliable information on climate evolutions. Note that it is the same assumption made by the IPCC in its various reports when looking at anomalies (i.e., removing the seasonal cycle of each climate model, which is a very simple univariate BC method) to focus only on the changes (in temperature, precipitation, etc.) of the different model simulations up to the end of the 21st century. Regarding evolutions of usual univariate variables (such as temperature or precipitation separately), although uncertainties are still inevitably present, the climate change signal is more and more studied and robust (e.g., Kendon et al. 2008;Matte et al. 2019). However, signals of changes in multivariate properties or dependencies in the climate simulations have not received much interest so far. Yet, these changes can have major repercussions on multivariate BC designs, on compound events evolutions, or more generally on conclusions brought by impact studies. Evolution of multivariate dependence properties is then an essential signal from the climate models that must be investigated to assess its reliability.
Moreover, local univariate and multivariate properties of climate variables are influenced by large-scale synoptic atmospheric circulations (e.g., Yiou et al. 2018;Jézéquel et al. 2020;Faranda et al. 2020;Rust et al. 2013). Hence, biases in modeled circulations can propagate to statistical properties of local climate. For example, Maraun et al. (2021) showed that synoptic circulation regimes and their biases have significant influences on univariate temperature and precipitation biases and on the capability of univariate BC methods (such as quantile-mapping) to correct these biases. However, the influences of atmospheric circulation regimes on local-scale correlations or dependencies between temperature and precipitation have not been investigated so far. Assessing the influence of such regimes on changes of inter-variable dependence properties and correlations is thus an objective of the present study, as it might have important consequences for applicability of MBC methods.
Therefore, the goal of the present paper is to assess how climate models reproduce the key inter-variable dependence between temperature (TAS) and precipitation (PR), as well as their changes over time. Since this correlation is important globally to get realistic situations, we do not focus on a specific event to drive our analyses. However, it is clear that the good or bad representation of this correlation in climate simulations can have major consequences on various specific events. For examples, joint heatwaves and droughts are obviously driven by some parts of the correlation between TAS and PR. This is also the case of convective phenomena, associating temperature and heavy rainfall, which, without a proper correlation, cannot be correctly modeled or characterized. Indeed, the Clausius-Clapeyron relation implies a correspondence between temperature and precipitation (e.g., Lenderink and Van Meijgaard 2008;Luu et al. 2022). More generally, a realistic correlation between temperature and precipitation is needed, even for non-extreme events, in order to correctly simulate basic day-to-day variability of the two marginal variables (e.g., wet/dry spells probabilities, temperature auto-correlation) while, at the same time, accounting for the physical realism of the joint (TAS, PR) daily situations. We first investigate how two climate model ensembles (CMIP6 multi-model ensemble and CESM multirun ensemble) compare to reference reanalysis data in terms of historical change (i.e., evolution) of inter-variable correlations. In addition to basic comparisons, we assess various contributions to the historical changes in temperature-precipitation correlations, as well as to the biases of historical changes in correlation. This is done first by defining synoptic circulation regimes and then, conditionally on each regime, by separating the marginal distributions, the rank correlations (linking the marginal distributions) and the circulation regimes frequencies, which all influence the inter-variable correlations. In a second step, the conditional contributions (given the circulation regimes) to future correlation changes, up to the end of the 21st century, are also explored.
The rest of this article is structured as follows: Sect. 2 describes the reanalysis references and climate simulations used in this study. Section 3 assesses whether temperature-precipitation correlations from climate model simulations are consistent with those from a reanalysis dataset over a historical time period. This is done first based on direct comparisons. Next, atmospheric circulation regimes are defined and basic assessments of the capability of the climate models to reproduce the regimes defined on reanalysis data are provided. Evaluations of the changes in intervariable correlations are made via a decomposition of correlations conditional on the large-scale circulations regimes. Section 4 characterises future changes up to the end of the 21st century in the simulated correlations. Finally, conclusions and discussions are provided in Sect. 5.

Data
Over the historical period, the reference data used in this study come from the ERA5 daily reanalysis (Hersbach et al. 2020)  Two ensembles of climate model simulations are considered. The first one is a multi-model ensemble made of 11 Global Climate Models (GCMs) contributing to the 6th exercise of the "Coupled Models Intercomparison Project" (CMIP6, Eyring et al. 2016). This selection was dictated by the availability of Z500, temperature and precipitation fields on daily time scales at the time of analyses: we have only selected models whose data were fully available for the whole period 1979-2100. The list of the GCMs is provided in Table 1. The second ensemble contains 40 members (i.e., runs) from a single GCM, the "Community Earth System Model" (CESM, Kay et al. 2015) developed at NCAR/ UCAR (USA). The use of these two ensembles (multi-model or multi-run) will allow to distinguish inter-model variability from internal variability in our investigations about change in correlations. From each of these two ensembles, the same variables (i.e., TAS, PR, z500) have been extracted for the same geographical domain as for ERA5 reanalyses, over the 1979-2014 period for the historical runs and over the 2015-2100 period under the shared socioeconomic pathways 585 (SSP585) scenario (Riahi et al. 2017). Hence, for each run of each ensemble, we consider continuous simulations from 1979 to 2100, which we separate into 1980-2019 to characterise the historical period-and that will be cut in 1980-1999 and 2000-2019 for historical evaluationsas well as into four 20-year future periods: 2021-2040, 2041-2060, 2061-2080, 2081-2100. Moreover, to ease the comparisons between the different datasets, all temperature, precipitation and Z500 fields have been regridded to a common spatial resolution of 1 • × 1 • .

Evaluations of historical biases and changes in inter-variable correlations
As a first assessment of the capability of the various climate models to reproduce the reference inter-variable correlations, Fig. 1 displays the maps of the ERA5 TAS-PR correlations ( ERA5 ) over the 1980-1999 period, for both winter and summer (Fig. 1a, d respectively). This figure also shows the maps of the correlation mean biases with respect to ERA5 (i.e., mean of model − ERA5 for all models or runs in a given ensemble) from CMIP6 (second column, Fig. 1b, e) and CESM runs (third column, Fig. 1c, f). One can wonder if these correlation biases are driven by biases in one of the two univariate variables more predominantly. Even with perfect marginals, i.e., without any bias for the TAS and PR variables separately, the correlation value can still be strongly biased due to a wrong dependence structure (copula function) linking TAS and PR. However, biases in the marginal distributions can also reinforce the bias in correlation.
To investigate if this is the case, for each GCM (or run), each season, each grid-cell and each variable, we have performed a Cramer-von Mises test (Darling 1957) to compare the univariate (TAS or PR) distributions of the simulations, to the distribution of the ERA5 reanalyses. A p value lower than 0.05 indicates that, with a confidence of 95%, we reject the null hypothesis that the two distributions (from simulations or from reanalyses) are the same. A p value higher than 0.05 indicates that we cannot reject (i.e., we accept) the equality of the distributions, with a confidence of 95%. The results are given in the supplementary material figure SM.1 showing the boxplots of the mean p values (among the 11 CMIP6 GCMs or the 40 CESM runs) of CMIP6 and CESM temperature and precipitation, over the 4 seasons. Based on this figure, it is clear that, for both TAS and PR, the univariate variables are biased, as most of the obtained p values are below 0.05. This means that the modelled distributions are often significantly different from the ERA5 ones. There is not one variable that appears to be more responsible (i.e., more biased) than the other. Yet, surprisingly, PR distributions seem to be slightly more often considered as similar to ERA5 PR distributions, or at least with higher p values than for temperature, for both CMIP6 and CESM ensembles. Yet, the central question of this study is the capability of the simulations to provide changes in TAS-PR correlations over time. Hence, for each grid-cell of the domain, the change of correlation, Δ , is calculated as the difference between the 2000 and 2019 correlation, 2000−2019 , and the 1980-1999 correlation, 1980−1999 , i.e., for each model run ( Δ run ) or for ERA5 ( Δ ERA5 ). For each ensemble (CMIP6 or CESM), the mean change of the different runs is computed to get Δ CMIP6 and Δ CESM . Then, the bias in change of correlation is defined as where "ENS" is either CMIP6 or CESM. The last two rows of Fig. 1 show, for winter and summer, the maps of these "biases in changes" of inter-variable correlations from 1980-1999 to 2000-2019, with respect to the change observed in ERA5. The equivalent maps for spring and fall are provided as supplementary materials in Figure SM.2. Regarding ERA5 correlations (Fig. 1a, d), a seasonal effect is clearly visible on the spatial patterns of the correlation. This seasonal effect is also visible in the CMIP6 (Fig. 1b, e) or CESM (Fig. 1c, f) biases of correlation, with positive and negative patterns distributed differently according to the season. In general, correlation biases seem more pronounced with the CESM ensemble than with the CMIP6 one. This reflects the fact that, although the CESM ensemble is composed of 40 runs, as only a single climate model is used here, the runs are consistent within the ensemble and, thus, the correlations (and their biases) are similar from one run to another. For CMIP6 model simulations, the variability of the correlations is larger and then reduces the mean correlation biases. This difference in internal vs. inter-model variability of correlation is illustrated in Fig This ratio allows us to quantify the proportion of the inter-model variability in correlations explained by the internal CESM variability of correlations. This figure illustrates that, for all seasons, the CESM ensemble shows a quite reduced variability in correlations, with respect to the CMIP6 variability. Then, the inter-model differences found in the present study cannot be explained by the internal variability alone, because the latter represents only a small part of the inter-model one, about 30% of variance on average. When looking at ERA5 correlation changes ( Fig. 1g, j), the changes also appear season-dependent. This is thus also true for the associated CMIP6 (Fig. 1h, k) and CESM (Fig. 1i, l) biases. However, here, the intensity of the biases are not stronger for one ensemble. Interestingly, the spatial structures of these biases in correlation changes are rather similar for the two ensembles. These biases are neither uniform nor close to zero, indicating that CESM and CMIP6 simulations do not reproduce the ERA5 correlation changes. Moreover, they both look like the "negative pictures" of the structures of change seen from ERA5. This means that neither CMIP6 nor CESM simulations provide correlation changes between 1980-1999 and 2000-2019. Indeed, these "negative picture" patterns imply that the (TAS, PR) correlations issued from CMIP6 and CESM are mostly stationary (i.e., they do not evolve much in time) while they should show some non-stationarity as indicated by ERA5. Hence, neither CMIP6 nor CESM capture the historical changes in correlations.
Note that significant changes in inter-variable correlations in ERA5 can be caused either by internal low-frequency variability or by climate change. In ERA5, the contributions of the two factors cannot be dissociated (at least easily). In the CESM ensemble, the climate change signal can be estimated since averaging the results obtained from different runs reduces the effect of the internal variability. However, results on CESM cannot be directly transposed to ERA5 as CESM can also be biased with respect to ERA5. To consider model biases, it is important to analyse the CMIP6 ensemble as it provides an idea of the inter-model variability. Hence, in order to investigate more these biases-and more precisely the biases in changes of correlations-it is important to take advantage of the ensembles and consider the distribution of changes in inter-variable correlations, rather than only the mean changes across the various runs.

Distributions of changes in correlations over historical period
We investigate the distributions of changes in correlations from the CMIP6 and CESM ensembles and evaluate if they are compatible with the historical changes seen with the ERA5 reanalyses. Therefore we define the probability ERA5 that corresponds to the probability that the change in correlations-from one reference period ( p 1 = 1980-1999) to a period of interest ( p 2 = 2000-2019) as defined in Eq.
(1)-is lower than or equal to the correlation change provided by ERA5: with where N is the number of members of the ensemble "ENS" of interest ( N = 40 for CESM and N = 14 for CMIP6). In the following, the correlation change from an ensemble is said to be "compatible" at a 90% confidence level with the ERA5 correlation change if 0.05 < ERA5 < 0.95 , i.e., if Δ ERA5 lies in the 90% central part of the Δ ENS distribution. If ERA5 < 0.05 , the distribution of changes seen from the ensemble is mostly (or completely) above the ERA5 change.
Conversely, if ERA5 > 0.95 , the ensemble distribution from the ensemble is below the ERA5 change.
To visualize the results, Fig. 2 shows the winter and summer maps of ERA5 probabilities for CMIP6 and CEMS, where only ERA5 values higher than 0.95 (with upper triangles) or lower than 0.05 (lower triangles) are plotted. Where no triangles are plotted, the correlation change from the ensemble is compatible with the ERA5 change.
To interpret these results, it is important to know where the ERA5 change in correlation is significant. Thus, a Fisher test (with a 95% confidence level) is performed based on Fisher's z-transformation (Fisher 1915;Hotelling 1953) to assess if the ERA5 correlations changed from one period to another. Hence, significant changes are also plotted in colours in the maps of Fig. 2, while non-significant ERA5 changes are left white. The major result conveyed by Fig. 2 is that, in general, for both CMIP6 and CESM, the ensemble distribution of changes is not compatible with an ERA5 correlation change when the latter is significant. Indeed, for many of the coloured grid cells (i.e., with significant correlation change), a triangle is also present. For negative ERA5 changes (blue), a lower triangle is visible, while for positive ERA5 changes (yellow-red), it is an upper triangle. Conversely, most of the domains where the distributions of changes are compatible with ERA5 corresponds to nonsignificant changes. Therefore, CMIP6 and CESM do not seem to be able to reproduce the main ERA5 changes in inter-variable correlations.

The role of circulation regimes in historical changes
One can wonder how much these disagreements (between ERA5 and models) in terms of change of correlations are influenced by the disagreements in frequencies between ERA5 and simulated circulations. Do the partial disagreements in changes of correlations come from the biased simulated regime frequencies? Or from biases in the marginal properties (of temperature and precipitation) conditionally on the regimes? From biased conditional temperature-precipitation correlations, given a regime? If they come from a combination of such features, what are their relative contributions? To answer such questions, it is necessary to define these regimes.

Circulation regimes: definition and basic GCM assessment
For each of the four seasons separately (winter: DJF; spring: MAM; summer: JJA; fall: SON), the ERA5 daily z500 fields are pre-processed in two steps: (1) they are first deseasonalized and detrended. For that, the seasonality is estimated and removed by fitting a smoothing spline over the spatially averaged z500 over all years as a function of the day in the year. The temporal trend is then computed and removed as a smoothing spline of the deseasonalized Z500 as a function of time; (2) a Principal Component Analysis (PCA) is performed on the detrended and deseasonalized fields. The principal components (PC) explaining 90% of the total variance are kept (12 PCs in DJF, 14 in MAM and SON, 17 in JJA). Then, for each of the four seasons separately, based on the PCs retained, the k-means clustering algorithm is applied to define K = 4 regimes. Here, this number K = 4 is arbitrarily selected to be consistent with circulation regimes found in the literature (e.g., Michelangeli et al. 1995;Corti et al. 1999;Yiou and Nogaj 2004). The k-means algorithm is performed with the Hartigan-Wong algorithm (Hartigan and Wong 1979) with a maximal number of iterations equal to 100. Since the algorithm is sensitive to the cluster initialization, the algorithm is performed for 10 random cluster initialisations. The clustering for which the internal sum of squares is minimum is kept. The four resulting ERA5 composite maps for Winter are shown in Fig. 3. We obtain the four traditional circulation regimes in winter: map Fig. 3a corresponds to the "Blocking" regime, Fig. 3b to the "Atlantic Ridge" one, Fig. 3c to the positive phase of the North Atlantic Oscillation (NAO+) and Fig. 3d to its negative phase.
Each CMIP6 and CESM daily Z500 field is next attributed to one of the ERA5 regimes. The following steps are performed: 1. The daily Z500 fields from the models are pre-processed by removing the average temporal trend and seasonality over North Atlantic in the same way as for ERA5. Note that the trend and seasonality is estimated individually for each dataset (i.e., run). 2. Each detrended and deseasonalized daily Z500 field is projected onto the selected principal components defined from the ERA5 dataset, in order to get the daily time series of PCs. The projection is made thanks to the PCA rotation matrix obtained with the ERA5 dataset. 3. Finally, a day is attributed to a given circulation regime if the Euclidean distance between the PCs of this day and the PCs of the centroid representing the regime is minimum.
Hence, the circulation regimes are forced to be the same for the model simulations and for the reanalyses. By construction, the CESM and CMIP6 composite maps obtained for each regime are very close to those from ERA5 (not shown). Note that the CESM and CMIP6 historical simulations do not represent the chronology of observations but it is assumed that each model simulates its own meteorology that its consistent with the chronology of natural and anthropogenic forcings. Therefore, although the chronological sequences are different, it is expected that the statistics are comparable. This must be reflected in the regimes and their frequencies. So, it is meaningful to compare ERA5's regimes to CMIP6/CESM's, even if the chronologies are not the same. However, a basic evaluation of the frequencies of each regime indicates some differences between the frequencies from the ERA5 regimes and those from CESM or CMIP6, as shown in Fig. 4 for the four seasons.  Fig. 4 For each season, boxplots of frequencies of z500 regimes occurrences for CMIP6 (in red) and CESM simulations (in green). The blue segments correspond to the ERA5 frequencies of the regimes At first sight, the frequencies from models appear to be biased with respect to the reference ERA5 frequencies. However, the ranges of frequencies in Fig. 4 are quite tight, visually emphasising the biases. Generally, the inter-model variability of the frequencies from the CMIP6 models is slightly higher than the single-model internal variability brought by the CESM runs, in particular in summer and fall. Nevertheless, most of the time, the frequency biases are somehow equivalent for CMIP6 models and CESM runs: they generally both either overestimate or underestimate the ERA5 frequencies, with the exception of cluster 4 in winter. These biases in frequencies can be quite pronounced. In many cases, the reference ERA5 frequency is out of the inter-quartile interval (e.g., C3 in winter among others), and even out of the whole boxplot (i.e., distribution), as in regime C2 for CESM, or in regime C3 in fall for both CMIP6 and CESM. Corti et al. (1999) speculated that the increase in temperature over the Northern Hemisphere may be due to a change in the frequency of the regimes. In the present article, the influences of the regimes are investigated in terms of changes of (temperature-precipitation) inter-variable correlation, instead of change of temperature as done by Corti et al. (1999).
Even if the conditional temperature-precipitation correlation given this regime is correct (which is not assessed so far), the frequency biases can have major consequences on the overall inter-variable correlation.
Hence, it is interesting to look at the conditional biases of the inter-variable (temperature vs. precipitation) correlations given the weather regimes, as well as the conditional biases of changes in inter-variable correlations, given the weather regimes. For winter, biases of correlations are given in 14 of the supplementary materials section. Roughly speaking, the spatial structure of the ERA5 correlation maps (first column of Fig. 5) is rather similar for unconditional or conditional calculations, with, in winter, a latitudinal gradient, going from strong positive correlation values towards the North, e.g., along the Norwegian coast, to strong negative correlations towards the South, e.g., over the Mediterranean region. However, the magnitude of these correlations varies from one circulation regime to another, reflecting their influences. For example, winter regime 2 (Fig. 5g) induces the strongest TAS-PR correlations over Finland and mild ones over France, while regime 3 (Fig. 5j) displays mild correlations over Finland and the weakest correlations over France. The spatial structures of the CMIP6 maps of winter mean biases are also quite similar for unconditional and conditional calculations whatever the regime: a central band going from Spain and France to the East (to Poland and Belarus) shows correlation biases close to zero or slightly negative, while out of this band (i.e., to the North or South), most of the biases are positive. Nevertheless, as for ERA5 correlation maps, the biases show some variability. For example, winter regime 2 (Fig. 5h) is the only circulation type presenting almost no bias (or slightly negative) over Finland, and quite strongly positive biases over Greece. The spatial structure of the CESM TAS-PR correlation mean biases (third column) is quite different and more pronounced than that of CMIP6 average. As for CMIP6, although some variability is visible, the winter CESM pattern of biases is relatively the same from one regime to another. However, the magnitude of the mean biases is much stronger, inversely following the ERA5 correlations: highly negative biases (i.e., underestimating the correlations) to the North and highly positive biases (i.e., overestimating the correlations)to the South.
Regarding ERA5 conditional changes of correlations (Figs. SM.11-SM-14, (a, d, g, j, m)), here, the weather regimes conditioning brings signals different from the unconditional changes of correlations, with a high variability of change from one regime to another. More pronounced changesboth increasing or decreasing-are visible with spatial structures appearing when looking at changes conditional on the circulation regimes, for example, for winter regimes 2 and 4 ( Fig. SM.11(g,m)), or for fall regime 4 ( Fig. SM.14(m)). Regarding CMIP6 or CESM conditional biases in changes of correlation, they are somehow equivalent to each other. Interestingly, as already observed for the unconditional case, they mostly correspond to the "negative pictures" of the ERA5 maps of changes, indicating that CMIP6 and CESM ensembles do not see much of the historical changes and tend to have stationary TAS-PR correlations from 1980-1999 to 2000-2019, both in the unconditional and conditional cases.

Conditional decomposition of correlation
In order to investigate the role of the defined circulation regimes in the (historical or future) changes of inter-variable correlations, we rely on a decomposition of the correlation that is applicable when the statistical population (our daily time series) is composed of clusters (here, circulation regimes). This decomposition of correlation was introduced by Charter and Alexander (1993). Based on a bivariate time series (x i , y i ) i=1,…,N (here, temperature and precipitation at a given gridcell) that is clustered in K groups (here, K = 4 circulation regimes) of size (n k ) k=1,…,K , the correlation between X and Y can be decomposed into: where k is the kth subgroup correlation between X and Y, S X k and S Y k are the kth subgroup standard deviations, and X k and

(7)
and corresponds to the reduction of bias in terms of change of correlation that is brought by having 'a 'correct" information. The influences Ib( ) and Ib( ) of the components other than can be calculated the same way by permuting "ERA5" and "run" labels at the appropriate locations in Eq. (6) to compute Δ WRCI| and Δ WRCI| , allowing to get Ib( ) and Ib( ) from Eq. (7). For each of the three components, in order to get a single Ib value for each ensemble and gridcell, the N Ib values given for a given gridcell by the different runs or models in an ensemble (CMIP6 or CESM) are averaged. Figure 6 displays the boxplots of the influences of the three components to the CMIP6 (red) and CESM (green) biases Generally speaking, all seasons give qualitatively similar results, which are also very similar for CMIP6 and CESM ensembles: the major influence on the biases in change in correlations come from the biases in the component, explaining on average about 75% of the biases in correlation change. The biases in , conditional marginal distributions, mostly explain the remaining 25%, while the biases in , the size (i.e., frequency) of the circulation regimes, does not influence the biases in correlation changes. Some differences appear however when looking more closely at the results. For example, while the influence of the conditional correlations is very strong for spring (Fig. 6b) and summer (Fig. 6c), pushing down the relative influence of the conditional marginal distributions, the winter and overall fall seasons (Fig. 6a, d) see more pronounced influences of the conditional distributions, thus reducing the influence of the conditional correlations.
The main conclusion is that, although the conditional marginal properties have some moderate influences, the biases in conditional correlations-given the circulation regimes-are the main drivers of the biases in correlation changes over the historical period. The regime frequency biases being relatively small (Fig. 4, their influence appears almost negligible.

Contributions of the "Weather Regimes Conditional
Information" changes to the historical changes of correlations In addition, whether they are biased or not, the Conditional Information can change over time and it is important to know how much these conditional changes contribute to the change of the unconditional TAS-PR correlation. To quantify the contribution of the change of a given component-say for the illustration-to the change in inter-variable correlation, the correlations over the two period p 1 and p 2 are calculated based on Eq. (5) by considering that the component of interest (e.g., ) is stationary over time, i.e., is the same for the two periods. Hence, for the example, and where "data" is either "ERA5" or the "run" of a model. The important point to note is that, here, data,p2| is calculated with a stationary component estimated from period 1: hence, the components are the same for the two periods. The hypothetical change between these two correlations is data,p 1 = , = ( (data,p 1 ) , (data,p 1 ) , (data,p 1 ) ) (8) data,p2| = ( (data,p 1 ) , (data,p 2 ) , (data,p 2 ) ) Δ data| = data,p2| − data,p 1 .
The contribution of the change in the component to the change of the unconditional correlation is then quantified as: The contributions C Δ ( ) and C Δ ( ) of the conditional information other than can be calculated the same way by permuting " p 1 " and " p 2 " labels at the appropriate locations in Eq. (8) to compute data,p2| and data,p2| , allowing to get C Δ ( ) and C Δ ( ) from Eq. (9).
As previously, for each of the three components, in order to get a single C Δ value for each ensemble and gridcell, the N C Δ values given for a given gridcell by the different runs or models in an ensemble (CMIP6 or CESM) are averaged. Figure 7 shows the boxplots of contribution values of the changes in the three components to the changes in temperature-precipitation correlations from 1980-1999 to 2000-2019 for the four seasons, for CMIP6 and CESM ensembles as well as for ERA5. As for Fig. 6, a relatively similar behaviour can be observed for the four seasons as well as for the different datasets: the major part of the unconditional correlation changes is due to the changes in the conditional correlations given the circulation regimes. The changes in conditional marginal properties only contribute at a quite moderate level, while the contribution values of the changes in frequencies of the regimes are centered around 0. The ensembles (CMIP6 and CESM) contributions are consistent with those from ERA5. However, for winter and fall (Fig. 7a, d), the contributions of the components are somehow different between ERA5 and the two ensembles: the CMIP6 and CESM contributions are stronger for the components, while their contributions are underestimated. Interestingly, this coincides with a similar pattern observed in Fig. 6 for influences on biases of changes. This means that, for winter and fall, the changes in unconditional correlation are biased not only by the values of the conditional correlations but also by the changes in the conditional correlation values, i.e., the time evolutions of the conditional correlations. For spring and summer, the agreement in terms of relative contributions between the three datasets suggests that the time evolution of the conditional correlations is not the main contributor of the biases in changes of correlations, and that-as shown in Fig. 6-the biases of the conditional correlation values themselves correspond to the major reasons.

Projections of future changes in inter-variable correlations
A natural question is then whether, in a future climate, the changes in intervariable TAS-PR correlations will continue to be mostly driven by changes in conditional correlations or if changes in frequencies of the circulation regimes or in conditional marginal properties will take over. To do so, the CMIP6 and CESM simulations up to 2100 are used, with a focus on the 2081-2100 period. In this context, the goal is not to perform an evaluation of of the simulated changes-as no reference is available for comparison in the future-but to characterise if CMIP6 and CESM ensembles provide significant changes in inter-variable correlations in the future simulations, and if so, how these changes are driven by the conditional changes, given the circulation structures.

Distributions of changes in correlations in future projections
First, for each given ensemble, season and grid cell, the mean correlation over 1981-2000 is compared to the mean correlation over 2081-2100 based on a Student t test at a 95% significant level. Values of change in mean correlations (i.e., mean 2081-2100 correlation minus mean 1981-2000 correlation) found significant are plotted in Fig. 8   change, 0 is now computed as the probability that the changes from an ensemble is lower than 0.
where Δ ENS is defined as in Eq. (4). Thus, the probability 0 indicates where the "no change" case is located in the ensemble distribution of changes in correlations from 1981-2000 to 2081-2100. This information is superimposed into Fig. 8, only for 0 < 0.05 (as lower triangles) indicating that a zero change is in the lower tail of the distribution, and for 0 > 0.95 (as upper triangles) indicating that a zero change is in the upper tail. These two cases can thus be interpreted as opposite but significant changes of correlations. Hence, where no triangle is plotted, a stationary correlation between the two time periods cannot be rejected. Contrary to the results over the historical period (Fig. 2) that showed that, in general, CMIP6 and CESM are not able to reproduce the main ERA5 significant changes in inter-variable correlations, here, the correlation changes up to 2081-2100 indicate that the "no change" case is regularly excluded (upper and lower triangles in Fig. 8). Moreover, this rejection is made for a very large portion of the patterns identified with a significant change in mean correlation, implying that the changes in correlation distributions are sufficiently strong to significantly reject the "no change" case. Nevertheless, it is also clear here that CMIP6 and CESM ensembles do not show a strong agreement on this change of correlation criterion: for example, patterns of significant change in mean correlations-as well 0 triangles-are very distinct in summer for CMIP6 (Fig. 8b) and CESM (Fig. 8d). Hence, although significant changes of correlations are simulated by the models, their variability between models (as in CMIP6) or between ensembles (CMIP6, CESM) is quite strong, questioning the robustness of these changes.

Contributions of the circulation regimes to the future changes of correlations
In order to know how these changes in inter-variable correlations and their disagreements between CMIP6 and CESM are driven by the conditional information brought by the circulation regimes, Fig. 9   historical period, it is reinforced within the future projections for both CMIP6 and CESM ensembles.

Conclusion
This study investigated first the capability of two climate model ensembles-one multi-model (CMIP6) and one multi-run from a single model (CESM)-to reproduce the historical inter-variable temperature vs. precipitation correlations from ERA5 reanalyses over Europe, as well as their changes over the historical period 1980-2019. As ERA5 inter-variable correlations are season dependent, so are the associated model biases, with distinct patterns for CMIP6 and CESM (Fig. 1a-f). Some changes in ERA5 TAS vs. PR correlations from 1980-1999 to 2000-2019 were found significant, also with spatial structures depending on seasons. However, both CMIP6 and CESM biases of changes are almost the exact "negative picture" of the ERA5 changes ( Fig. 1g-l), indicating that these ensembles do not show changes of the inter-variable correlations over the 1980-2019 period. This has been further analysed by a comparison between the ERA5 correlation changes and the ensembles distributions of correlation changes (Fig. 2), showing that, most of the ERA5 significant changes belong to the lower (i.e., < 5th percentile) or upper tail (i.e., > 95th percentile) of the distribution, therefore out of the 90% confidence interval of the simulated changes. These results confirmed the inability of the tested ensembles to reproduce the ERA5 historical changes in correlations. Second, to investigate if and how these mismatches between ERA5 and ensembles are driven by specific large-scale atmospheric circulation structures, conditional analyses have been performed. First, circulation regimes (or clusters) have been defined for each season separately (Fig. 3), via a k-means algorithm applied to daily fields of geopotential heights at 500 hPa (Z500). Then, simulated Z500 fields from 1980 to 2100 have been classified into the regimes. Although the ensembles regimes frequencies were shown to have more or less errors depending on the seasons (Fig. 4), this may not be the only reason of the mismatches. Other regimes-related statistical properties can also contribute, such as the conditional TAS vs. PR correlations or the conditional marginal properties of TAS and PR, both given the regimes. Hence, based on a mathematical decomposition of the correlation (Eq. (5)), the influences of the biases of the size of the regimes ( ), the conditional correlations ( ) and the conditional marginal properties ( ), both given fixed clusters, onto the mismatches have been investigated. The results (Fig. 6) showed that the bias of the size of the regimes have a rather negligible effect on the unconditional correlation biases, while the misrepresentation of the marginal TAS and PR properties (means and variances) has a stronger influence ( ∼ 25%) on the final correlation bias. However, the major influence is due to the conditional correlation, the biases of which explains about 75% of the unconditional correlation. Moreover, the contribution of the changes (from 1980-1999 to 2000-2019) in the three conditional properties ( , , ) to the changes in the unconditional correlations is distributed the same way (Fig. 7): quite small for , ∼20% for and ∼80% for . In addition, a comparison to ERA5 over the historical period (Fig. 7) shows that, although the ensembles conditional contributions have equivalent structures as ERA5's, the contributions of the conditional correlation are generally overestimated by the models (essentially for winter, summer and fall), at the expense of the contributions from the conditional marginals properties that are, thus, rather underestimated.
Hence, the general answer to the question "Are climate models reliable in terms of changes in temperature-precipitation correlations?", asked in the title, is "no". Clear biases with respect to ERA5 are present in terms of TAS vs. PR correlations, as well as in terms of changes (over 1980-2019) of these correlations, with inappropriate contributions of changes from the WRCI components.
It was found that there is not one specific univariate variable that is more responsible (i.e., more biased) than the other for the models correlation biases. Indeed, both TAS and PR appear biased. Nevertheless, these results indicate that efforts should be made to improve both the marginals and the dependence structure in the GCMs to improve the inter-variable correlations and thus reduce their biases.
Future changes in correlations have also been investigated for the two ensembles, based on 2081-2100 TAS vs. PR correlations with respect to those from 1981-2000. Significant changes were found ( Fig. 8) with season-dependent patterns but quite different for CMIP6 and CESM. This reflects a not-so-robust signal in terms of future evolution of the correlations. The analysis of the different "weather regimes conditional information" components ( , , ) showed that the future changes in conditional correlations provide the largest contributions to the future changes in unconditional correlations ( ), for both ensembles (Fig. 9). This was already true over the historical period and will continue-and will even be slightly reinforced-in future SSP585 climate scenario. Hence, although changes provided by climate models in terms of marginal properties (e.g., mean, variance, univariate distributions, temporal properties) are not challenged by this study, it is clear that the confidence on model projections regarding multivariate properties-and here, more specifically (TAS, PR) correlations-are questioned.

Discussion and perspectives
These results highlight the importance of the large-scale circulation structures/regimes and the need to understand their physical relationships with local-scale phenomena associated to specific inter-variable correlations. If these relationships are misrepresented within climate models, the local-scale correlations related to circulations (i.e., conditional correlations), as well as their changes in time, can be biased. This can lead to major biases in the unconditional inter-variable correlations and therefore on the simulated compound events. Hence, various perspectives and future works can be envisioned from this study.
First, we can note that some areas or even some regimes can be better represented than others in terms of correlations between temperature and precipitation. This type of investigation-which is out of the scope of the present study-could be one starting point to identify the potential processes correctly or badly represented in climate models, and responsible for the biases in inter-variable correlations and in correlation changes.
TAS and PR are obviously not the only climate variables. Equivalent studies could be performed for variables other than TAS and PR, for example, analysing correlations between wind and PR, or humidity and TAS, etc. Also, intervariable correlations are not the only dependence property of interest in the climate system. Equivalent studies for spatial dependencies and/or temporal dependencies (e.g., auto-correlations and/or cross-auto-correlations) could be carried out.
More generally, our results strongly motivate not only to improve climate models in terms of univariate climate variables (e.g., in order to get precipitation right, cloud parameterizations and microphysics might need to be improved), or in terms of relationships between spatial scales, but also to continue developing and improving multivariate bias correction (MBC) methods, in order to make (e.g., intervariable) dependencies more realistic. However, as this study showed that large-scale structures have influences on the local/regional-scale dependencies and, thus, on their biases, MBC must include large-scale information into the correction process. One "easy" possibility for this is to condition MBC applications on circulation regimes but other ways could be defined. This would allow MBC methods to be physically driven.
Nevertheless, signals in terms of future evolution of TAS vs. PR correlations were found to be present but not very robust, i.e., with a large variability within the CMIP6 ensemble, and overall a large variability between the CMIP6 and CESM ensembles. This asks the question of whether and how simulated future changes in inter-variable correlations should be accounted for. If the simulated changes of correlation are meaningless or considered not robust enough, the many compound events (CE) analyses in a future climate context should rather rely on a stationarity assumption for the dependence structures, only allowing to change the marginal distributions and properties of the variables of interest. This could be made by estimating the dependence properties from a reference dataset over a historical period and, then, injecting them into future climate simulations instead of their dependence. Obviously, this could have major consequences on the CE results and must be investigated with caution.
This question of stationarity assumption of the dependence structure also matters for multivariate bias correction design. Indeed, if changes in multivariate dependence (e.g., correlation) in the climate simulations are reliable, MBCs have to reproduce them and generate corrections with similar changes. However, if these simulated changes are not robust, MBC could rely on a stationarity assumption of the dependence from the reference dataset. Hence, the multivariate properties (e.g., correlations) would not evolve and stay similar to the reference. Understanding the robustness of the changes in simulated dependencies is thus key to choose and apply the appropriate MBC methods in climate change context.