1 Introduction

For more than 2 decades, coordinated efforts of applying regional climate models (RCMs) to downscale global climate model (GCM) simulations for Europe have been pursued by an ever-increasing group of scientists (Rummukainen et al. 2015; Rummukainen 2016). This endeavour showed its first results during framework projects supported by the European Union and sister projects in North America all initiated in the 1990s (Christensen et al. 1997; Machenhauer et al. 1998; Takle et al. 1999; Hagemann et al. 2001). Here, the foundation for today’s advanced World Climate Research Programme (WCRP) initiative COordinated Regional Downscaling EXperiment (CORDEX) (Giorgi and Gutowski 2015; Gutowski et al. 2016) was laid out, as the first ensembles of coordinated RCM simulations aiming to assess future regional climate change emerged. It was already realized, however, at this early stage that systematic model biases in GCMs as well as RCMs made this task very challenging (Christensen et al. 1997, 2001; Pan et al. 2001; Schiermeier 2004). As an immediate outcome, the idea was therefore conceived to undertake even more concerted efforts by constructing even more well-defined and structured sets of common simulations; this led to the PRUDENCE project (Christensen et al. 2002; Christensen and Christensen 2007) (2001–2004). Additional coordinated efforts involving an increased number of GCMs and RCMs then followed in the ENSEMBLES project (Van der Linden and Mitchell 2009; Christensen et al. 2010) (2004–2009) and continue in the ongoing Euro- and Med-CORDEX initiatives (Jacob et al. 2014; Ruti et al. 2015) (2011-present). 
Meanwhile, along with the overall coordination, model resolutions have increased from a grid point distance of about 50 km (PRUDENCE) to 12 km (Euro-CORDEX11), simulation lengths have grown from time slices covering 30 years (PRUDENCE) to transient experiments representing the time span 1951–2100 (ENSEMBLES and CORDEX), and the forcing has broadened from two driving GCMs (one of them dominating) with the SRES (Nakicenovic et al. 2000) A2 and B2 emission scenarios (PRUDENCE) to several GCMs (Euro-CORDEX) and multiple RCP (Meinshausen et al. 2011) scenarios. So far, this wealth of simulations has mainly been used to provide a measure of baseline change according to a particular emission scenario, or relating to the passage of global mean temperature thresholds, e.g., 2 °C (Vautard et al. 2014). This has typically been defined by a multi-model mean change with associated uncertainties estimated from model spread within the specific ensemble (Christensen and Christensen 2007; Deque et al. 2007; Van der Linden and Mitchell 2009; Jacob et al. 2014) and has resulted in largely incomparable projections that only leave room for relatively simple statements about the future climate conditions in Europe. Only in a few cases has a comparison between some parts of the modelling suites been attempted (Vautard et al. 2014; Rajczak and Schär 2017; Fernández et al. 2019), but never across the entire suite.

Climate projections such as those derived from the abovementioned efforts are widely used to explore projected impacts of future climate change. Specifically, for use in risk-based impact analyses there is an increasing demand for high-resolution probabilistic climate change information at the regional scale based on such multi-model approaches (e.g., Jacob et al. 2014). The interest in achieving robust projections is shared by different scientific communities as well as by practitioners and other stakeholders striving to identify means and measures to construct robust estimates of future changes, their impacts and consequences at the relevant scales, where measures to adapt to these changes need to be taken.

The ability to simulate regional climate realistically is a formidable challenge and while improvements to do so have been steady they have also been very slow (Collins et al. 2013; Christensen et al. 2013). As a result, the full plausible range of climate change for any given scenario cannot in practice be assessed using the presently available model information (Hawkins and Sutton 2009), such as that defined by CORDEX, nor even when including also the extensive coarser-resolution and larger-scale Coupled Model Intercomparison Project Phase 5 (CMIP5) (Taylor et al. 2011) database. Consequently, pattern scaling (i.e., simple scaling of model-mean changes of temperature and precipitation patterns with global mean temperature change) has long been used to generalize climate change information beyond the information available from individual climate models (Santer et al. 1990; Tebaldi and Arblaster 2014; Matte et al. 2019). This includes the Fifth Assessment Report from the Intergovernmental Panel on Climate Change (IPCC AR5) (Collins et al. 2013; IPCC 2013), where this approach was used to directly compare CMIP3 (Meehl et al. 2007) and CMIP5 annual multi-model mean temperature and precipitation change signals. Here it was demonstrated that the many fundamental differences between the model set-ups in CMIP3 and CMIP5 made the ensembles different in a statistical sense, even though visually the annual multi-model mean changes are largely indistinguishable in the two model compilations (Collins et al. 2013).

In this paper we investigate the projected regional climate change scenarios for Europe across PRUDENCE, ENSEMBLES and Euro-CORDEX, the latter at two different model resolutions (P–E–C, hereafter). Using a methodology based on the abovementioned pattern scaling to circumvent the many differences that exist between the three generations of multi-model ensemble approaches within the P–E–C experimental sequence (see Fig. 1), we systematically reassess their results across different emission scenarios, GCM/RCM combinations, model resolution, project and model versions and improvements. The added value of regional climate models in terms of increased resolution over coarser GCMs, e.g., in terms of an improved representation of relevant climate processes at the regional to local scales, is well documented. Here we assess to which degree model improvements and higher resolution from regional downscaling over time have been able to resolve apparent inconsistencies in climate change projections for Europe, e.g., the opposite trends in precipitation change inferred from individual RCMs for certain areas and seasons. Specifically, we consider the projected temperature and (relative) precipitation changes on a seasonal scale, i.e., model mean regional climate change signals for summer (June–July–August; JJA) and winter (December–January–February; DJF) for all the available P–E–C simulations using the original model resolution, followed by a scaling in relation to the global average warming of the driving global model. The scaled projections are also compared with observed trends from the middle of the last century to the present, a period over which the global mean temperature rose close to 1 °C. The overarching objective is to identify components of the model projections that may possibly be considered robust and in line with already experienced change (e.g., Pal et al. 2004). 
This comparison is not an attempt to assess to what extent the observed changes are anthropogenic in origin, but to demonstrate whether or not the projected changes found using the P–E–C suite of models resemble current long-term observed trends when these are robust and statistically significant.

Fig. 1 GCM/RCM combinations in the PRUDENCE, ENSEMBLES, CORDEX11 and CORDEX44 experimental sequence. RCMs are sorted alphabetically within each model family. Models omitted in the EOF analysis (see text) are marked with an asterisk (*). The total number of individual simulations studied is 15 (PRUDENCE), 18 (ENSEMBLES), 34 (CORDEX44), and 32 (CORDEX11)

2 Methods

The three research projects in the P–E–C sequence span a period of nearly 20 years of European regional climate modelling developments. As a result, each of them is constrained in different ways (e.g., by computing resources), which is reflected in the way the various experiments were conducted. PRUDENCE was initially set up to use only one specific set of boundary conditions for all RCMs, from the Hadley Centre HadAM3H (Buonomo et al. 2007) model. This atmosphere-only GCM was run at a resolution that was comparatively high at the time, 150 km grid distance, for two time slices: a present-day time slice 1961–1990 with observed sea surface temperature (SST) and sea ice, and a future time slice 2071–2100 with forcing from SRES A2, for which the SST and sea ice climate change signal was taken from a HadCM3 (Johns et al. 2003) global coupled simulation and added to the observed present-day fields. Corresponding simulations following the (weaker) SRES B2 scenario were also performed. Regional simulations were set up with a common integration area at 50 km resolution. During the PRUDENCE project the HadAM3H-driven simulations were supplemented by a few simulations forced by the ECHAM4/OPYC (Roeckner et al. 1996) coupled global model, also following the SRES A2 and B2 scenarios (see also Christensen et al. 2001). Several PRUDENCE simulations used somewhat different configurations, resulting in integration areas that differ from the standard one, due to specific considerations at the institutions that carried out the simulations.

The ENSEMBLES experiment was set up to include several sets of GCM boundary data and ended up using 9 different coupled GCM simulations for simulations reaching 2100, all following the SRES A1B scenario. Regional simulations were transient, covering the full integration period of 1951–2100, and used a common integration area with a 25 km grid distance. Further, in this project, the exact area configuration was fixed except for a few necessary exceptions due to specificities of the map projection used by certain regional climate models.

CORDEX is still ongoing, and new simulations are added continually. Two different resolutions are used for Europe, 0.44° and 0.11°, corresponding to around 50 km and 12 km grid separation, respectively, created by aggregating or disaggregating the ENSEMBLES domain. Scenarios are now based on the Representative Concentration Pathways (RCPs; van Vuuren et al. 2011), and many simulations exist with RCP2.6, RCP4.5 and RCP8.5. Only simulations from the latter two are included in this study, in part due to the relatively limited number of available simulations for RCP2.6 compared to the other two. Moreover, the RCP2.6 simulations do not consistently project a global warming signal of more than 1 °C within the twenty-first century, and we want to avoid extrapolation when applying the pattern scaling approach (detailed in the following section).

2.1 Climate scenarios and pattern scaling

The pattern scaling approach we use to bring together different members of the PRUDENCE, ENSEMBLES and CORDEX ensembles is similar to the one adopted by IPCC AR5. It allows us to scale the geographical patterns of projected changes in temperature and precipitation to 1 °C of global warming for all simulations and thus to facilitate their detailed comparison. Specifically, we use the projected global mean temperature change in 2100 by the driving GCM for each of the individual RCMs and climate scenario (PRUDENCE: A2 and B2; ENSEMBLES: A1B; CORDEX: RCP4.5 and RCP8.5) as a general scaling parameter. The intensity of the simulated regional change is scaled with respect to the modelled annual mean global averaged temperature change (deduced from the driving GCM) at each grid point, which allows for identifying similarities and differences in the simulated spatial patterns across emission scenarios, GCM/RCM combinations, model resolution and model generation.
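The scaling step described above can be illustrated with a minimal sketch. It is not the code used in this study; the function name `scale_pattern` and its arguments are hypothetical, and NumPy is assumed. The regional change field of one RCM is divided by the global mean warming of its driving GCM, so that the result is expressed per 1 °C of global warming; for precipitation the change is first converted to a percentage of the reference climate.

```python
import numpy as np

def scale_pattern(future_mean, reference_mean, global_warming, relative=False):
    """Regional change per 1 degC of global warming (illustrative sketch).

    future_mean, reference_mean: seasonal mean fields of one RCM simulation
        (same grid), e.g. for 2071-2100 and 1961-1990.
    global_warming: annual global mean temperature change [degC] of the
        driving GCM between the same two periods.
    relative: if True, express the change in percent of the reference
        field first (as done here for precipitation).
    """
    future_mean = np.asarray(future_mean, dtype=float)
    reference_mean = np.asarray(reference_mean, dtype=float)
    if global_warming <= 0:
        raise ValueError("global_warming must be positive to scale a pattern")
    change = future_mean - reference_mean
    if relative:
        change = 100.0 * change / reference_mean
    return change / global_warming

# Hypothetical 2 x 2 temperature fields [degC] and a driving-GCM warming of 2 degC.
scaled_t = scale_pattern([[13.0, 16.0], [10.0, 12.0]],
                         [[10.0, 12.0], [8.0, 9.0]],
                         global_warming=2.0)
```

Because every simulation is divided by the warming of its own driving GCM, fields from different scenarios and model generations end up in the same units (°C or % per °C of global warming) and can be averaged or compared directly.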

2.2 Uncertainty metrics and robustness definitions

To estimate temperature and precipitation change, and to assess the corresponding robustness, for projected climate scenarios across the P–E–C suite, every available model simulation within these experiments is used. Historical and projected runs in the suite cover the periods 1961–1990 and 2071–2100 for PRUDENCE, whereas the corresponding periods are 1981–2000 and 2081–2100 for ENSEMBLES and 1986–2005 and 2081–2100 for Euro-CORDEX. The climate change scenarios used include SRES A2/B2 (pre-CMIP3) for PRUDENCE, SRES A1B (CMIP3) for ENSEMBLES and RCP4.5/RCP8.5 (CMIP5) for Euro-CORDEX. Model resolutions are 50 km and 25 km for PRUDENCE and ENSEMBLES, respectively, whereas Euro-CORDEX simulations come in both a 50 km (Euro-CORDEX44) and a 12 km (Euro-CORDEX11) resolution, resulting in four different lines of model suites to be compared.

Within each model suite, the individual model mean difference between the periods mentioned above is calculated per season for DJF and JJA and subsequently scaled with the corresponding globally averaged annual mean temperature anomaly for the driving GCM and scenario in question. For ENSEMBLES and CORDEX, these are available from CMIP3 and CMIP5, respectively, while the necessary data for PRUDENCE were collected in Christensen et al. (2001). After scaling, we infer the multi-model mean change(s). The colouring in Figs. 2 and 3 illustrates the result of this exercise, adopting the same colour map and scale as used in IPCC AR5. Results are presented in the original model resolution corresponding to the particular part of the P–E–C model suite, e.g., 50 km for the PRUDENCE simulations. As a measure of robustness or model agreement, the inter-model standard deviation, which we hereafter refer to as ‘noise’ (N), is calculated within each model suite, and the corresponding S/N ratio is overlaid as contours on the plotted change ‘signal’ (S).

Fig. 2 Seasonal mean scaled temperature change [°C] for PRUDENCE, ENSEMBLES, Euro-CORDEX44 and Euro-CORDEX11 for the JJA and DJF seasons. The contour lines show the ratio between the multi-model mean and the multi-model standard deviation (S/N; see Methods). Hatching indicates areas with S/N < 1. As indicated in the figure, S/N is everywhere larger than one (for comparison see Fig. 3)

Fig. 3 Seasonal mean scaled relative precipitation change [%] for PRUDENCE, ENSEMBLES, Euro-CORDEX44 and Euro-CORDEX11 for the JJA and DJF seasons. As in Fig. 2, the contour lines show the S/N ratio, while hatching indicates areas where S/N < 1

For an ensemble of independent RCMs with a long-term, mean seasonal average temperature and precipitation of similar magnitude, the model-mean state, as here represented by S, has been demonstrated to be a representative measure of the projected future climate (Collins et al. 2013). Along the same lines, the model spread, here represented by N, is often used to represent the ensemble robustness in terms of specifying a range for the projected future climate. In Figs. 2, 3 and 6, areas with S/N < 1 are depicted with hatching, emphasizing areas of model disagreement. Conversely, regions without hatching thereby exhibit a certain level of robustness. This follows the approach by Laux et al. (2017) and Matte et al. (2019) and to some extent IPCC AR5 (IPCC 2013).
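As an illustration of how S, N and the hatching mask relate, the following sketch computes them from a stack of scaled change fields. It is a simplified example rather than the plotting code behind the figures: the function name `signal_to_noise` is hypothetical, the sample standard deviation (`ddof=1`) is an assumption since the estimator is not specified in the text, and the absolute value of S is used so that drying signals (negative S) are treated in the same way as moistening signals.

```python
import numpy as np

def signal_to_noise(scaled_changes):
    """Multi-model mean (S), inter-model spread (N), |S|/N and hatching mask.

    scaled_changes: array-like of shape (n_models, ny, nx) holding the
        scaled change fields of all simulations in one model suite.
    """
    stack = np.asarray(scaled_changes, dtype=float)
    signal = stack.mean(axis=0)
    # Assumption: sample standard deviation across models (ddof=1).
    noise = stack.std(axis=0, ddof=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.abs(signal) / noise
    hatched = ratio < 1.0  # areas of model disagreement
    return signal, noise, ratio, hatched

# Three hypothetical models on a 1 x 2 grid: agreement at the first
# grid point, disagreement (including in sign) at the second.
example = [[[1.0, 3.0]], [[1.2, -2.0]], [[0.8, 0.5]]]
S, N, ratio, hatched = signal_to_noise(example)
```

In this toy case the first grid point has S = 1.0 and N = 0.2 (S/N = 5, no hatching), while the second has S = 0.5 and N = 2.5 (S/N = 0.2, hatched), which mirrors the situation for small mean precipitation changes discussed in the Results.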

2.3 EOF analysis

To confirm potential changes in spatial patterns of change between the model suites, we use an Empirical Orthogonal Function (EOF) analysis (Lorenz 1956), based on the same models as are used to construct the mean climate change signal and for the S/N analysis. The analysis was carried out using the implementation in MATLAB (MATLAB and Statistics Toolbox Release 2017a, The MathWorks, Inc., Natick, Massachusetts, United States). In order to perform this analysis, the results of all simulations in P–E–C were interpolated to the 50 km resolution adopted in PRUDENCE and Euro-CORDEX44. To ensure a suitable spatial extent of the EOF analysis, six PRUDENCE simulations (three models—scenario A2/B2) and one ENSEMBLES simulation however had to be omitted, since these unfortunately exclude larger parts of the European domain. In the following distributed plots of model EOFs, Principal Components (PCs) and variabilities will be used to support inter-comparison of the four lines of models discussed above.
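For readers who wish to reproduce this type of analysis outside MATLAB, a minimal sketch based on a singular value decomposition is given here. It is not the implementation used in this study; the function name `eof_analysis` is hypothetical and NumPy is assumed. The sample dimension is the set of simulations (not time), the pointwise ensemble-mean pattern is subtracted without further normalization, each EOF has unit sum of squares, and the sign is chosen so that the sum over grid cells is positive, following the conventions stated in the text and in the caption of Fig. 5.

```python
import numpy as np

def eof_analysis(patterns, n_modes=3):
    """EOFs and PCs of an ensemble of change patterns (illustrative sketch).

    patterns: array-like of shape (n_simulations, n_gridpoints); each row
        is one scaled change field on the common 50 km grid, flattened.
    Returns (eofs, pcs, explained) with shapes (n_modes, n_gridpoints),
    (n_simulations, n_modes) and (n_modes,).
    """
    x = np.asarray(patterns, dtype=float)
    anomalies = x - x.mean(axis=0)      # subtract pointwise ensemble mean
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    eofs = vt[:n_modes]                 # rows have unit sum of squares
    pcs = u[:, :n_modes] * s[:n_modes]  # loading of each simulation
    # Sign convention: positive sum across grid cells for every EOF.
    signs = np.where(eofs.sum(axis=1) < 0, -1.0, 1.0)
    eofs = eofs * signs[:, None]
    pcs = pcs * signs[None, :]
    explained = (s ** 2 / np.sum(s ** 2))[:n_modes]
    return eofs, pcs, explained

# Hypothetical ensemble: 5 simulations on 8 grid points.
rng = np.random.default_rng(0)
fields = rng.normal(size=(5, 8))
eofs, pcs, explained = eof_analysis(fields, n_modes=3)
```

With this convention, `pcs[i, k] * eofs[k]` summed over all modes reproduces the anomaly of simulation `i` from the ensemble mean, so a simulation with an unusually large value in one column of `pcs` is a candidate outlier for the corresponding pattern.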

2.4 Comparison to observed trends

The similarities between the P–E–C simulations have been compared to current observed trends. For demonstration we use the CRU TS 4.01 dataset (Harris et al. 2014), which covers all land areas except Antarctica at a resolution of 0.5° for the period 1901–2016. However, as pointed out in IPCC AR5 (Bindoff et al. 2013), only the latter half of the twentieth century is clearly influenced by anthropogenic climate change at the regional scale. We thus restrict ourselves to the period 1950–2016. For these six-and-a-half decades, we calculate, for each grid cell, the linear trend scaled with the corresponding global mean temperature trend, its statistical significance, and the smallest and largest linear trends that consistently fit the observations within the error margins. This is done as follows. For the period 1950–2016, a least-squares fit is applied at each land grid point (the CRU TS dataset is only defined over land). For temperature, the best fit and the smallest and largest trends within the error margins are then calculated accordingly. The scaling with the global temperature trend is expressed as the quotient of the local and the globally averaged annual mean temperature trend for the same period. Thus, a value of 2 implies that the local trend is twice as large as the global one. For precipitation a similar approach is applied, but the trend is here expressed as percent per 1 °C of global warming: from the regression line, we calculate the percentage change with respect to the first year; the scaling with global temperature change is then done as for temperature. To estimate the confidence interval for the slope of the regression line, the standard error (SE) of the distribution of the slope needs to be known. Assuming a normal distribution, SE can be expressed by the following equation

$$SE = \frac{\sqrt{\dfrac{\sum_{i} \left( y_{i} - \hat{y}_{i} \right)^{2}}{n - 2}}}{\sqrt{\sum_{i} \left( x_{i} - \bar{x} \right)^{2}}}$$

where \(y_{i}\) is the value of the dependent variable for observation i, \(\hat{y}_{i}\) is the estimated value of the dependent variable for observation i, \(x_{i}\) is the observed value of the independent variable for observation i, \(\bar{x}\) is the mean of the independent variable, n is the number of observations, and n − 2 is the number of degrees of freedom. Significance is assessed at the p = 0.05 level; we applied a one-sided Student t test for temperature and a two-sided test for precipitation.
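The trend calculation at a single grid point can be sketched as follows. This is a plain-Python illustration, not the code used for Fig. 6. The function names are hypothetical, the critical t value is supplied by the caller (e.g., from a t table for n − 2 degrees of freedom), and taking the smallest and largest trends as the slope ± t × SE is an assumption about how the error margins are defined.

```python
import math

def linear_trend(x, y):
    """Least-squares slope, intercept and standard error of the slope."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two series of equal length with n >= 3")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Sum of squared residuals around the regression line.
    ssr = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(ssr / (n - 2)) / math.sqrt(sxx)
    return slope, intercept, se

def trend_range(slope, se, t_crit):
    """Assumed definition of the smallest and largest consistent trends."""
    return slope - t_crit * se, slope + t_crit * se

def scaled_trend(local_slope, global_slope):
    """Local trend expressed as a multiple of the global mean trend."""
    return local_slope / global_slope

# Hypothetical four-point series (not observational data).
slope, intercept, se = linear_trend([0, 1, 2, 3], [0.0, 1.0, 1.0, 2.0])
t_stat = slope / se  # compared against the critical t value for n - 2 dof
```

For the toy series the slope is 0.6 with SE ≈ 0.141; with a global trend of 0.3 in the same units, `scaled_trend` would return 2, i.e., a local trend twice as large as the global one.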

In order to go beyond merely visual comparison of similarity, we calculate the ranking of the observational trend pattern among the simulation ensemble of climate change fields in terms of pattern correlation.

3 Results

3.1 Seasonal temperature and precipitation change

Figures 2 and 3 compare the projected seasonal multi-model mean JJA and DJF changes in temperature [°C] and precipitation [%] per degree of global annual mean temperature change across the P–E–C sequence. Broadly speaking, there is an overall agreement in the patterns of change in either variable and for both seasons. While the projected warming signal in the PRUDENCE simulations stands out as compared to the others and is considerably stronger during JJA, there is a better general agreement in the projected warming pattern in DJF. The excessive JJA warming signal in PRUDENCE has previously been attributed partly to systematic errors in the surface schemes in the older models (Christensen et al. 1997; Hirschi et al. 2011; Boberg and Christensen 2012), but is also related to the strong dependence on the single dominating driving GCM. The projected change patterns in mean precipitation are quite similar in DJF, while there is a much stronger drying signal in PRUDENCE during JJA. This can be largely explained by the same systematic errors responsible for the excessive Mediterranean JJA warming (Seneviratne et al. 2006).

Figures 2 and 3 show that the amplitude of the multi-model mean climate change signal for both temperature and precipitation is mostly reduced along the P–E–C sequence. The temperature response over Europe in JJA as well as in DJF is everywhere large enough to ensure a signal-to-noise (S/N) ratio well above 1 (and generally higher in Euro-CORDEX11 than in PRUDENCE), as is documented by Fig. 2. Relative changes in precipitation are much more complicated to interpret, primarily due to the fact that wherever modest changes are projected in the multi-model mean, even a numerically small inter-model spread will result in a low S/N ratio. Roughly speaking, across the P–E–C suite of models, any projected change within ± 5% has a S/N ratio below 1, while numerically larger changes appear more robust. The S/N ratio for precipitation in DJF is well above 1 except for a relatively narrow transition zone in southern Europe separating a Mediterranean drying from a central and northern European moistening. This signal is basically unaltered along the P–E–C sequence (Fig. 3). For JJA the reduced amplitude in the climate change signal along the P–E–C sequence results in an expansion of the transition zone between drying and moistening, suggesting that the only really robust climate change signal related to precipitation is the drying over the Iberian Peninsula and central parts of Turkey and a moistening in the northernmost parts of Scandinavia. This is also emphasized by the gradual reduction in amplitude of the climate change signal along the P–E–C sequence. We note that the relatively large transition zone in JJA with no statistically significant projected precipitation change may be seen as consistent with “no change”. We further note that the areas here identified with an S/N < 1 seem to closely resemble the regions identified in IPCC AR5 (IPCC 2013) to exhibit a natural variability exceeding the climate change signal.

3.2 EOF analyses

In order to test the robustness of the projected changes in temperature and precipitation, we perform an EOF analysis on the full set of scenarios from the P–E–C sequence. Figure 4 displays the loading on the first three principal components (PCs) for each of the RCM/GCM combinations shown in the same P–E–C sequence (in arbitrary units). Importantly, a few models can be identified as having a substantially anomalous loading on the PCs. Assuming that any identifiable outlier model could be responsible for a poorer S/N ratio and thus significantly affect the perceived robustness of the regional climate change signals, we redid the EOF analysis leaving out those model runs that are identified as ‘outliers’ in Fig. 4. We found, however, the effect to be small with regard to both the multi-model mean and the S/N ratio for the full P–E–C sequence as well as for each of the four sub-samples (not shown). The small and in practical terms negligible effect of removing single outliers clearly reflects that the ensemble size is large enough not to be strongly influenced by outliers. This suggests that the multi-model ensemble that is represented by Euro-CORDEX11 yields a fairly robust estimate of the future European climate, i.e., scaled with global mean temperature change corresponding to RCP4.5 and possibly RCP8.5 but not beyond (i.e., any forcing stronger than RCP8.5 or beyond the twenty-first century). This limitation is to avoid extrapolation rather than interpolation of the multi-model ensemble mean change.

Fig. 4 Values (arbitrary units) of the first three principal components (PC1 to PC3) relating to individual models from the P–E–C multi-model ensembles for JJA temperature, DJF temperature, JJA precipitation and DJF precipitation. The grey shading (see the legend) distinguishes the model sub-samples, whereas inserts show the variance within each sample normalized by the variance for the entire ensemble

We calculate the total variance for the investigated fields and seasons for each project individually (cf. Table 1). For each field a tendency towards higher inter-model agreement within each project along the P–E–C sequence is shown with the exception of JJA precipitation; conversely, DJF precipitation shows very little change. Further, projection on individual EOFs (not shown) reveals that the observed increase in the variability of JJA precipitation rests solely on the first and dominating EOF, which is a generally homogeneous pattern across most of Europe. Thus, we find a large variability in the sensitivity of JJA precipitation in the climate projection for both CORDEX and ENSEMBLES as compared to PRUDENCE. This we ascribe to the already mentioned fact that PRUDENCE is dominated by one specific GCM simulation.

Table 1 Variance of PCs within each project (arbitrary units)

For each of the three dominating EOFs we have further investigated the corresponding PCs for each of the sub-ensembles (PRUDENCE, ENSEMBLES, Euro-CORDEX44 and Euro-CORDEX11) separately. The ensemble variance can be split into the combined variance within each of the model ensembles and the variance among the individual model means, weighted by the number of simulations in each ensemble. The ratio of these two indicates the degree to which the ensembles are different with respect to the EOF in question. The values after proper normalization (explained in the caption of Table 2) can be seen in Table 2. Note that this estimation is highly approximate, as we are here treating the PCs of model results as independent, normally and identically distributed quantities (iid). For JJA temperature, PC1 is evidently the pattern with the largest difference between ensembles of all fields investigated.
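The variance split described above corresponds to a one-way analysis of variance applied to the PC values of the four sub-ensembles. The following sketch shows one way to compute such a ratio; the function name `f_ratio` is hypothetical, and the normalization by the degrees of freedom (K − 1 between groups, N − K within groups) is the standard ANOVA choice, assumed here because the exact normalization is given only in the caption of Table 2.

```python
def f_ratio(groups):
    """Between-ensemble over within-ensemble variance (one-way ANOVA F).

    groups: list of lists; each inner list holds the values of one PC for
        all simulations of one sub-ensemble (e.g., PRUDENCE, ENSEMBLES,
        Euro-CORDEX44, Euro-CORDEX11).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variance among ensemble means, weighted by ensemble size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Combined variance within the ensembles.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical PC values for three small ensembles.
ratio = f_ratio([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```

A large ratio indicates that the ensemble means differ by more than the spread within the ensembles would suggest. As noted in the text, interpreting the value as a formal F statistic relies on the iid assumption, which holds only approximately for PCs of model results.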

Table 2 Statistical F test quantity, basically the ratio between variance between ensemble means weighted by the number of simulations in each ensemble and the weighted mean intra-model variance

Figure 5 shows EOFs for the fields referred to in Table 2. Signs of PCs and EOFs have been chosen to make the majority of grid points in EOF maps positive. Interpretations of PCs and EOFs from Figs. 4 and 5 for the individual fields/seasons follow below.

Fig. 5 Spatial patterns corresponding to the leading three PCs (EOF1–EOF3) for seasonal mean scaled temperature change and seasonal mean scaled relative precipitation change for PRUDENCE, ENSEMBLES, Euro-CORDEX44 and Euro-CORDEX11. All patterns are normalized with the sum of squares being unity. First and third row indicate JJA, second and fourth row indicate DJF. The sign of the EOF plots is selected to reflect a positive sum across cells

The first EOF pattern for JJA temperature shows a much increased warming in southern Central Europe, particularly southern France. In Fig. 4 it is seen that the PC loadings for the PRUDENCE simulations are much more positive than those of the other ensembles (cf. Table 1). The non-homogeneous distribution of PC values among PRUDENCE simulations seen in Fig. 4 indicates that rather few simulations are responsible for this deviation.

For DJF temperature, only the third PC is seen to be very inhomogeneous across the different modelling projects in Table 2. This pattern shows a north–south gradient (Fig. 5 row 2). A sub-set of CORDEX at both resolutions shows a larger than average warming at the northern edge of the domain, whereas PRUDENCE shows lower values than the ensemble-mean (Figs. 2, 4). It should be noted, however, that the corresponding amplitude of this PC is not very high (Fig. 4); the various project signals shown in Fig. 2 do not look very different.

For JJA precipitation (Fig. 5 row 3) two inhomogeneous PCs are found: PC number 1, where most models of PRUDENCE as well as of ENSEMBLES (Fig. 4 row 3) generally show much larger drying over Europe than the CORDEX simulations, particularly over the Mediterranean region (see also Fig. 3). Also, PC number 3 indicates an anomalously high precipitation increase in the Baltic Sea and decrease in the western Mediterranean, which is higher for PRUDENCE than for the other ensembles. This is a feature most likely related to the dominating single GCM in this set of experiments (Christensen and Christensen 2007).

Finally, the last row of Fig. 5 indicates that the signal of DJF precipitation is quite independent of project across all the different ensembles, except for PC number 3, which indicates that PRUDENCE shows a significantly higher precipitation increase in Central and Northern Europe compared to the other projects (Fig. 3).

Overall, all of the abovementioned findings (Fig. 5) are consistent with the patterns shown in Figs. 2 and 3, and serve to document the robustness of these signals across the P–E–C sequence.

3.3 Observed trends

To test the results discussed above, we have compared the scaled regional climate model projections with observed trends from 1950 to 2016. Figure 6 depicts the observed linear trends in seasonal mean temperature and relative precipitation change, scaled by the observed global temperature change to allow for direct comparison with the scaled results from the three regional climate model ensembles (Figs. 2, 3). For observed temperatures, even the smallest trend within the estimated error margins can be found to be significantly different from zero (p = 0.05) almost everywhere, in particular in JJA (no hatching). Conversely, in the eastern Mediterranean, the observed trends are generally weak; and they are not significant in DJF. This corroborates findings, e.g., by Barkhordarian et al. (2016).

Fig. 6 Scaled observed linear trends in seasonal mean temperature [°C] and relative precipitation change [%]. The hatching indicates where the pointwise trends are not considered significant with a p value of 0.05

As expected, the observed seasonal temperature increases across European land masses (Fig. 6) are in general larger than the annual global mean, with the strongest increase towards the northeast in DJF and towards the southwest in JJA. Considering precipitation, on the other hand, the observed trends are far from significant everywhere: in DJF, we find significantly positive trends only over Russia, northern Scandinavia and Scotland. Isolated positive trends are also found further south, whereas significantly negative trends are found over the Iberian Peninsula. In JJA there is a significant increase in precipitation in the east (north-eastern Scandinavia and parts of Turkey). For precipitation, neither the smallest nor the largest trends within the error margins are significantly different from zero, with the notable exceptions just mentioned (not shown).

In order to compare and quantify the robustness of the projected changes in temperature and precipitation with these patterns in observed trends, we examine the normalized patterns from observed trends and from all of the 99 P–E–C simulations. We calculate the root mean squared distance between each pair of patterns: each pattern is normalized by subtracting its spatial average and dividing by its spatial standard deviation over the grid points in question, and the distance is then the square root of the mean of the squared differences between the two normalized patterns over the area. This is different from the EOF analysis, where the pointwise ensemble-mean pattern is subtracted, and no normalization is performed. We sum all distances from each pattern to all other patterns. The rank of each pattern among the 100-member set is calculated according to this sum. For the observational pattern, the lower this rank, the higher the agreement between the observational pattern and the ensemble of model patterns.
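The ranking procedure can be sketched as follows. This is an illustrative NumPy example, not the code behind Table 3; the function names and the optional `mask` argument (a boolean field selecting, e.g., grid points with a significant observed trend) are hypothetical. Rank 1 is assigned to the pattern with the smallest summed distance to all other patterns.

```python
import numpy as np

def normalize_pattern(pattern, mask=None):
    """Subtract the spatial mean and divide by the spatial standard deviation."""
    values = np.asarray(pattern, dtype=float)
    if mask is not None:
        values = values[np.asarray(mask, dtype=bool)]
    values = values.ravel()
    return (values - values.mean()) / values.std()

def rank_patterns(patterns, mask=None):
    """Rank patterns by their summed RMS distance to all other patterns."""
    z = np.array([normalize_pattern(p, mask) for p in patterns])
    # For each pattern: RMS distance to every pattern, summed over the set
    # (the distance of a pattern to itself is zero and does not contribute).
    totals = np.array([np.sqrt(((z - zi) ** 2).mean(axis=1)).sum() for zi in z])
    order = np.argsort(totals)
    ranks = np.empty(len(totals), dtype=int)
    ranks[order] = np.arange(1, len(totals) + 1)
    return ranks, totals

# Hypothetical set: three similar patterns and one with reversed gradient.
ranks, totals = rank_patterns([[1, 2, 3, 4], [2, 4, 6, 8], [1, 2, 3, 5], [4, 3, 2, 1]])
```

For patterns normalized in this way, the squared RMS distance between two patterns equals 2(1 − r), where r is their spatial correlation coefficient, so the ranking is equivalent to one based on pattern correlation. In the toy example the reversed pattern receives the highest rank number (4), i.e., the lowest agreement with the rest of the set.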

Examining the first line of Table 3, it seems that the pattern of observational trends deviates from the set of model results, particularly for winter temperature, since the ranking of the observations is close to the bottom position of number 100. Considering only geographical points that are not hatched in Fig. 6 (line 2 in Table 3; i.e., areas of significant observational change), the situation changes somewhat towards higher agreement between observation and model ensemble, particularly for winter temperature and summer precipitation, but not for winter precipitation. The number of relevant points, i.e., land points in the observational lattice with a significant (p = 0.05) observational trend inside the common model domain, is listed in line 3 of Table 3; for winter temperature all possible points have a significant observational trend, whereas only around one eighth of the points have a significant summer precipitation trend (cf. Fig. 6).

Table 3 Ranking of the pattern of observations among the 100 patterns examined, according to the sum of distances to all other elements

4 Discussion and conclusion

In view of the known model- and GCM/RCM configuration deficits discussed above, particularly in the older part of the P–E–C sequence, the pattern-scaled projections of temperature and precipitation change across 20 years of model evolution appear quite similar. Likewise, the patterns of change of annual mean temperature and precipitation compare well with the results from global models (not shown; e.g., Collins et al. 2013). This apparent robustness of the P–E–C sequence represents an important contribution to the credibility of the projections provided by the Euro-CORDEX data set, which represents the current state of the art.

A second argument towards the aim of credibility is the degree of similarity of the P–E–C climate change patterns with the scaled observed linear trends for both JJA and DJF temperature and precipitation where these trends are statistically significantly different from zero, as indicated by comparing Figs. 2 and 3 with Fig. 6 and specifically in Table 3. This only holds marginally for winter conditions, in particular for precipitation. In addition, it is not well established how to attribute the observed trends between decadal scale natural variations and a forced signal associated with global warming. The smaller disagreement between models and observations found when only points with a significant observational trend are considered suggests that the uncertainty in the observed trend values caused by decadal climate fluctuations is, to a large degree, responsible for the disagreement between observations and models. There are, however, caveats to this conclusion: with fewer points, there will be larger noise obscuring differences between model ensemble and observations; also, many model simulations are very similar (e.g., members of both CORDEX11 and CORDEX44), and therefore the poor ranking of the observations is probably too extreme compared to a case with more independent simulations.

A third objective would be the ability to demonstrate that internal decadal variability is small in comparison to the climate change signal. While this is beyond the scope of the present paper, we note that Matte et al. (2019) demonstrate this to hold for the Euro-CORDEX ensembles, whenever S/N is approximately larger than 1. Despite the higher degree of freedom following from a higher grid resolution of approximately 12 km, the large-scale pattern and even local details in Euro-CORDEX11 are clearly in line with coarser resolution findings, suggesting that credible projections can be deduced from this data set. The usefulness of the multi-model mean values as a measure of the projected signal of change and the associated robustness is supported by recent work on the behaviour of multi-model ensemble data by Christiansen (2017). That said, maps using multi-model statistics to represent the projected future climate change in Europe must in general be considered with caution, as no individual model will replicate the ensemble-mean patterns of change, i.e., the pattern representing the multi-model mean may not be plausible in a strictly physics-based interpretation (Madsen et al. 2017).

In this study we consider the robustness of temperature and precipitation changes at the seasonal scale. However, there is supporting evidence that these findings could be extended to other time scales as well, such as the occurrence of heatwaves or extreme daily precipitation (Boberg et al. 2009, 2010; Fischer and Schär 2010; Christensen et al. 2015; Rajczak and Schär 2017). Likewise, we have here only considered the average climate conditions, a necessary prerequisite to moving on to climate parameters/conditions representing, e.g., extremes. Again, the apparent robustness of the multi-model mean climate change signal across the P–E–C model suite indicates that for the overall warming patterns, and for the drying and moistening tendencies, more complex climate variables may also be robust. This proposition clearly needs to be tested, and analyses such as comparing the results of regional climate simulations with observed trends at the grid point level could be applied where the available observational data sets allow for such a comparison. To this end, new data sets combining high resolution in both time and space are critically needed. Yet, it still remains to be seen whether the implied matching trends are formally detectable or attributable to anthropogenic climate change (Allen and Ingram 2002; Hall and Qu 2006).