1 Introduction: uncertainties in climate projections

So far, decades of research and development have gone into a better understanding of atmospheric processes and their interactions with other components of the climate system like land surface, oceans and ice. Yet, there are still large uncertainties about the future development of the climatic conditions, particularly at the regional scale. For well-planned adaptation measures, which can include the use of impact models driven by (regional) climate models, decision makers demand more precise projections of what the future might look like. Climate scientists have a hard time explaining why their models still cannot show a clearer picture of likely future changes by narrowing down the model spread of ensemble projections (Hawkins and Sutton 2009, 2011; Deser et al. 2012a).

In general, the overall uncertainty of such climate projections can be separated into three parts: (1) scenario uncertainty, corresponding to the different emission scenarios that can be used as external forcing to the climate models, (2) model response uncertainty, corresponding to the response of the different models, developed by different institutions around the globe and (3) natural variability, inherent to the chaotic nature of the climate system (Hawkins and Sutton 2009). Scenario uncertainty is difficult to reduce, since this includes future knowledge about greenhouse gas and aerosol emissions and advances in technology (or a best guess of it), but might become smaller in the future (Moss et al. 2010). Model uncertainty might be reduced by advances in the knowledge of natural processes, and the capabilities of models in representing those. For instance, the increase in spatial resolution of regional climate models (RCMs) from 0.44° to 0.11° was shown to have a significant effect on the model performance, often referred to as “added value” (Torma et al. 2015; Fantini et al. 2016; Prein et al. 2016). On the one hand, increasing computational capabilities available to modeling groups and further advances in describing processes more accurately have the potential to narrow down uncertainties, but adding more complexity to models may on the other hand also lead to higher uncertainties in the projections (Knutti and Sedláček 2013). Constraining models by observations can also reduce the uncertainty of future projections in some cases (Borodina et al. 2017; Lorenz et al. 2018). The third part of uncertainty in climate projections is natural variability. Natural variability can be associated with interactions between the internal components of the climate system, which can lead to inter-annual, decadal or even multi-decadal variability like ENSO, NAO, etc. (Solomon et al. 2010; Zheng et al. 2018).
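As a rough sketch of this separation, and assuming the three sources are independent (the simplification used by Hawkins and Sutton 2009), the total variance of a projection can be written as a sum. The symbols below are introduced here for illustration only and are not used elsewhere in this study:

$$ \sigma^{2}_{\text{total}} = \sigma^{2}_{\text{scenario}} + \sigma^{2}_{\text{model}} + \sigma^{2}_{\text{internal}} $$

Under this assumption, a single-model initial-condition ensemble run with one emission scenario samples only the last term, whereas a multi-model ensemble run with one scenario mixes the last two.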

Observational data of the earth’s climate usually cover only several decades, and are thus not long enough to capture this mid- to long-term variability (Hawkins et al. 2016). Brisson et al. (2015), for example, used a weather generator to quantify uncertainties in estimating climatic conditions from a 30-year period as recommended by the World Meteorological Organization (WMO), and showed that these uncertainties can well exceed 15% for extreme precipitation. This lack of data makes it very difficult, if not nearly impossible, to use observations for the quantification of natural variability on longer time scales.

A possible solution to deal with the lack of sufficiently long observational time series to assess long-term climate variability can be the use of climate models. The underlying assumption has to be of course that the climate models are capable of simulating the climate system, including its natural variability (represented by the internal variability of the models). Aalbers et al. (2018) showed for example that the interannual variability of an EC-EARTH-RACMO initial conditions large ensemble of 16 members is similar to the one of E-OBS. So, if we accept the concept of a climate model as a surrogate of the natural climate system, we can use the model to generate a huge amount of data to create plausible parallel modifications of current and future climate. Therefore, the ClimEx project (http://www.climex-project.org) provides 50 members of the Canadian global earth system model CanESM2 (Fyfe et al. 2017), dynamically downscaled by the Canadian regional climate model version 5 (CRCM5) over large parts of Europe and northeastern North America, running from 1955 to 2099. This leads to 50 parallel realizations of climate, just altered by very small differences in the initial conditions in the CanESM2 members, which makes all 50 members equally likely realizations of the same long-term climatic conditions. This unprecedented dataset, hereafter referred to as the ClimEx CRCM5 large ensemble (CRCM5-LE) is described in Leduc et al. (2019).

This ensemble can be used to better understand and quantify the uncertainties of ensemble approaches in regional climate change studies. In those studies usually an ensemble of different combinations of global climate models (GCMs) and regional climate models (RCMs) is used to cover the range of possible outcomes (Jacob et al. 2014; Vautard et al. 2014; Roudier et al. 2016; Smiatek et al. 2016; Rajczak and Schär 2017). Often, the derived uncertainties from these multi-model ensembles are relatively large, since usually all three components of uncertainty are incorporated: different scenarios driving different GCM/RCM combinations and inherent natural variability. For regional studies using RCMs, another dimension is added, as the model uncertainty is now composed of the GCM and RCM response. A further constraint for assessing scenario uncertainty and model response uncertainty is that the “matrix” of all possible combinations of scenario, GCM and RCM is sparse and not balanced according to these three dimensions. Analysis is always restricted by the available simulations (often on an opportunity basis), thus making systematic comparisons on the influence of these factors difficult. Additionally, not all models are fully independent from each other, which is why the often claimed and applied model democracy (one model one vote) is increasingly challenged by some authors (Pennell and Reichler 2010; Leduc et al. 2016a; Knutti et al. 2017). In addition to the fact that multiple members are rarely available to assess internal variability, all these factors together make it very difficult to distinguish the three sources of uncertainty in climate projections in existing multi-model ensembles like EURO-CORDEX.

Therefore, several studies have produced and analyzed single model large ensembles with small alterations in initial conditions to address the uncertainty resulting from internal variability. On the GCM scale: The 40-member CCSM-LE (Deser et al. 2012a, b, 2014) and the 30-member CESM-LE (Kay et al. 2015). Another set of experiments with 90–100 members was conducted by a group of Japanese authors with a 60 km resolution global atmospheric model and a 20 km resolution regional model for a Japanese domain (Mizuta et al. 2017). The Netherlands Meteorological Institute (KNMI) performed an RCM experiment with 16 EC-EARTH members, downscaled by the Dutch regional model RACMO2 to a resolution of 0.11° for Western Europe. Aalbers et al. (2018) give an overview of this ensemble and investigate several aspects of natural variability for mean and extreme precipitation. A set of 21 members of the CESM was downscaled by the regional model COSMO-CLM at a resolution of 0.44° for Europe (Fischer et al. 2013). All these studies showed that a single model large ensemble setup can be very valuable to better quantify natural variability in climate projections. However, only very few comprehensive comparisons with multi-model ensembles have been conducted, and so far only with GCMs (Deser et al. 2012b; Kay et al. 2015; Leduc et al. 2016b).

In this study, we use the ClimEx CRCM5-LE to compare the inter-member spread due to natural variability in a single model large ensemble with the multi-model variability (which implicitly includes the effect of natural variability) of a corresponding EURO-CORDEX ensemble. This results in the following research questions:

  1. Which part of the multi-model spread (model uncertainty and internal variability combined) of a 0.11° EURO-CORDEX ensemble can be attributed to internal variability?

  2. What are the implications of this attribution for analyses of climate change signals from multi-model ensembles?

This study will first investigate the composition of the CORDEX ensemble in terms of GCM (member)/RCM combinations. This analysis is supposed to evaluate the ability of this ensemble of opportunity to represent the different sources of uncertainty of a multi-model ensemble in a satisfying manner for the following comparison: The signals of the CRCM5-LE are compared to the signals of the CORDEX ensemble to quantify the spread of each ensemble on a grid point and regional scale. Thus, the uncertainty of signals originating from model-internal variability (natural variability uncertainty) is compared against that originating from a multi-model ensemble (model response uncertainty and internal variability combined). After a discussion of the results, the implications for the analysis of multi-model ensembles are briefly discussed.

2 Data

Two different ensembles of RCP 8.5 driven models in daily and 0.11° resolution are used in this study:

  (A) 22 models from EURO-CORDEX.

  (B) 50 members of the ClimEx CRCM5-LE.

All available datasets that share the common EURO-CORDEX 0.11° grid in the EURO-CORDEX and ReKliES-De datasets at the Earth System Grid Federation (ESGF) are pooled together and referred to simply as ‘CORDEX’ in the following. This means that the two MPI-REMO2009 runs, the ALARO run and the ALADIN53 run are not considered because of their different grid specifications. The UHOH data cannot be used for the signal calculation since no historical data are stored for this run [see Supplementary Material (SM) Table S1 for all excluded runs]. During the analysis of the data, the model IPSL-WRF showed very high increases (quadrupling) in summer precipitation, especially for coastal parts of France and the Mediterranean, and is therefore excluded from the CORDEX ensemble. Excluding a model from a variability study because of its extreme results is of course a critical decision. However, other studies also excluded the IPSL-WRF model from their analysis (Kotlarski et al. 2014; Smiatek et al. 2016; Rajczak and Schär 2017), supporting this decision. Differences introduced by including the IPSL-WRF model into the analysis are shown in Fig. S1, Supplementary Material. The 22 resulting CORDEX models are listed in Table 1 as a combination of RCMs and driving GCMs. These models are used to analyze the composition of the ensemble. Two members of EC-EARTH have been downscaled with RACMO22E (r1 and r12), but we only use r12 for the inter-comparison with the CRCM5-LE for balancing reasons. Keeping both would double count this GCM-RCM pair (as the two EC-EARTH members are expected to be much more similar to each other than to other driving models), which would likely shrink the total variance of CORDEX (although probably by a small amount).

Table 1 Matrix of GCM (-member) and RCM used in the CORDEX ensemble

The second dataset is a 50-member ensemble of the Canadian earth system model CanESM2 (Fyfe et al. 2017), downscaled by the Canadian regional model CRCM5 (Version 3.3.3.1, Martynov et al. 2013; Šeparović et al. 2013) for two domains: Northeastern North America (not part of this study) and Europe. The 50 members originate from five families of simulations, each starting at different 50-year intervals of a preindustrial run with a stationary climate and run from 1850 to 1950. They are then separated into ten members each by small atmospheric perturbations and run from 1950 to 2099. This 50-member CanESM2-LE was then dynamically downscaled within the ClimEx project with CRCM5 over a domain covering Europe (EU11d2) with a horizontal grid-size mesh of 0.11° on a rotated latitude–longitude grid, corresponding to a 12-km resolution, using 5-min time steps, which fits the common EURO-CORDEX grid specifications. Further information on the settings of the whole experiment, as well as a detailed description and analysis of the dataset (also for the American domain) can be found in Leduc et al. (2019). The stored variables and the terms of use can be found in the respective documentation on the project homepage (http://www.climex-project.org). The ClimEx 12-km grid equals the one used in EURO-CORDEX simulations, although the ClimEx domain is slightly smaller, still covering most of Europe (Fig. 1).

Fig. 1

The EURO-CORDEX and ClimEx domains with the subregions for analysis

3 Methods

The subregions used for analysis are taken from the PRUDENCE project (Christensen and Christensen 2007) and were already used in several other studies in the context of European climate model analysis (Lenderink 2010; Lorenz and Jacob 2010; Kotlarski et al. 2014). The subregions cover almost all of the ClimEx domain except for the Mediterranean parts of Morocco, Algeria and Tunisia, the Aegean Sea and small parts of Eastern Europe. Note that the Scandinavian subregion here is smaller than in other studies, since the ClimEx domain does not cover the northern parts of Scandinavia. An additional analysis domain is a combination of all subregions (‘TOT’). For the whole study, only land grid points are considered.

The data of the 22 CORDEX models is first cut to the ClimEx domain. Since the CORDEX and CRCM5-LE data share the same 0.11° grid, no interpolation is needed to compare the two datasets. Annual and seasonal means of temperature and precipitation sums are calculated at every grid point on an annual basis from 1980 to 2099. The analysis of the data comprises grid point analysis as well as spatially averaged results (arithmetic mean) for the subregions (TOT as an area weighted mean of: BI, SC, FR, ME, EA, IP, AL, MD). For the spatially averaged results, the temperature/precipitation values are aggregated before signals are computed. The change signals are calculated using a 30-year reference period (REF: 1980–2009) as a baseline for three future periods: The near future (FUT1: 2020–2049), the mid future (FUT2: 2040–2069) and the far future (FUT3: 2070–2099).
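The signal calculation described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the processing chain used in this study: the function names, the `{year: value}` layout, the region areas and all numbers are hypothetical, and the series are synthetic linear trends. Temperature signals are taken as absolute differences and precipitation signals as percent change relative to REF, as stated in the Results section.

```python
from statistics import fmean

# Periods as given in the text (inclusive year ranges).
PERIODS = {
    "REF": (1980, 2009),
    "FUT1": (2020, 2049),
    "FUT2": (2040, 2069),
    "FUT3": (2070, 2099),
}


def period_mean(series, period):
    """Mean of a {year: value} series over an inclusive (start, end) period."""
    start, end = period
    return fmean(series[year] for year in range(start, end + 1))


def temperature_signal(series, future):
    """Absolute change relative to REF, in the unit of the input (e.g. K)."""
    return period_mean(series, PERIODS[future]) - period_mean(series, PERIODS["REF"])


def precipitation_signal(series, future):
    """Relative change in percent of the REF mean."""
    ref = period_mean(series, PERIODS["REF"])
    return 100.0 * (period_mean(series, PERIODS[future]) - ref) / ref


def area_weighted_mean(values, areas):
    """Area-weighted mean, as used to combine subregions into TOT."""
    return sum(v * a for v, a in zip(values, areas)) / sum(areas)


# Synthetic illustration only (not data from the study).
years = range(1980, 2100)
tas_a = {y: 10.0 + 0.04 * (y - 1980) for y in years}  # hypothetical region A
tas_b = {y: 5.0 + 0.02 * (y - 1980) for y in years}   # hypothetical region B
pr_a = {y: 800.0 - 1.0 * (y - 1980) for y in years}
areas = [1.0, 3.0]  # hypothetical relative areas of regions A and B

# Aggregate spatially first, then compute the signal from the aggregated series.
tas_tot = {y: area_weighted_mean([tas_a[y], tas_b[y]], areas) for y in years}

tas_a_fut3 = temperature_signal(tas_a, "FUT3")
tas_tot_fut3 = temperature_signal(tas_tot, "FUT3")
pr_a_fut3 = precipitation_signal(pr_a, "FUT3")
```

The order of operations mirrors the text: values are averaged over the region for every year, and only then are the 30-year means and the signal computed.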

Seasonal means (temperature) and sums (precipitation) are calculated in each grid point for every year and member, followed by calculation of the temporal means for each 30-year period (REF, FUT1-3). This allows calculating the standard deviation of signals for both ensembles and every grid point. The variabilities are compared using the ratio of standard deviations:

$$ \text{SDR} = \sigma(\text{CRCM5-LE}) \, / \, \sigma(\text{CORDEX}). $$
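For a single grid point, the ratio can be sketched as below. The function name and the signal values are hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, since the text does not specify the estimator.

```python
from statistics import stdev


def sdr(le_signals, cordex_signals):
    """Ratio of sample standard deviations, sigma(CRCM5-LE) / sigma(CORDEX)."""
    return stdev(le_signals) / stdev(cordex_signals)


# Synthetic temperature signals at one grid point (K); not values from the study.
le = [3.0, 3.2, 3.4, 3.6, 3.8]
cordex = [2.0, 2.8, 3.6, 4.4, 5.2]
ratio = sdr(le, cordex)
```

In this synthetic case the ratio is well below 1, which would correspond to internal variability explaining only a small part of the multi-model spread.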

For this comparison the EC-EARTH_r1_RACMO22E run is not part of the CORDEX ensemble (see chapter 2). Additionally a two-sample F-Test is applied at the significance level of 5%. When the Null Hypothesis of this test cannot be rejected, we assume that the two distributions have similar variances. The test also accounts for the different sample sizes of the two ensembles (50 and 21). To see how similarities in spatial patterns change over time between the two ensembles, a simple pattern correlation is calculated for moving window 30-year periods.
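A two-sample F-test on equal variances could be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this study: the test is taken to be two-sided, the function names and synthetic samples are hypothetical, and the F distribution is evaluated through a hand-written regularized incomplete beta function to stay within the standard library (in practice, `scipy.stats.f.cdf` would be the usual choice). The different sample sizes enter through the degrees of freedom, 49 and 20 for 50 and 21 members.

```python
import math
from statistics import variance


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction of the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even and odd coefficients of the continued fraction for step m.
        for aa in (
            m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1)),
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_test_equal_variances(sample_1, sample_2, alpha=0.05):
    """Two-sided F-test; returns (F, p_value, equal_variances_not_rejected)."""
    df1, df2 = len(sample_1) - 1, len(sample_2) - 1
    f_stat = variance(sample_1) / variance(sample_2)
    # CDF of the F distribution expressed through the incomplete beta function.
    cdf = betainc(df1 / 2.0, df2 / 2.0, df1 * f_stat / (df1 * f_stat + df2))
    p_value = min(1.0, 2.0 * min(cdf, 1.0 - cdf))
    return f_stat, p_value, p_value >= alpha


# Synthetic signals only: 50 "members" with small spread, 21 "models" with large spread.
le = [3.0 + 0.01 * (i % 10) for i in range(50)]
cordex = [3.0 + 0.3 * (i % 7) for i in range(21)]
f_stat, p_value, equal_var = f_test_equal_variances(le, cordex)
```

A grid point would be shown as "Equal VAR = true" when the third return value is `True`, that is, when the Null Hypothesis is not rejected at the chosen level.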

4 Results

4.1 Composition of the CORDEX ensemble

The CORDEX ensemble in this study consists of 22 different combinations of GCM, GCM member and RCM, with eight GCMs and five RCMs in total. Not all combinations of these have been realized, leaving about half of the GCM (-member)/RCM matrix blank (Table 1). The composition is quite heterogeneous, with a slight dominance of EC-EARTH and HadGEM2-ES for the GCMs and CCLM4-8-17 and REMO2015 for the RCMs. EC-EARTH is the only model with different downscaled members, and fortunately there is even a pair of two members downscaled with the same RCM (RACMO22E). Additionally, there are two simulations using the first member of the CanESM2 ensemble, downscaled with CCLM4-8-17 and REMO2015, giving us insight into the role of CRCM5’s contribution to the signals of the CanESM2-CRCM5-LE. Yet overall, the sampling is relatively random, which makes systematic analysis of the influence of GCM (-member) and RCM on the variability extremely difficult. Nevertheless, it seems valuable to take a look at the variabilities inside the CORDEX ensemble to better assess the capacity of the ensemble for the variance comparison with the CRCM5-LE.

To get an overview of the influence of the components (GCM/RCM) of each simulation, the climate change signals of temperature and precipitation for the far future FUT3 (2070–2099) are displayed in scatterplots for winter (DJF, Fig. 2) and summer (JJA, Fig. 3) for the TOT domain. A general clustering of the simulations sharing the same GCM can be observed, although there are large differences between the GCM cluster extents.

Fig. 2

Scatterplot of the signals (2070–2099 minus 1980–2009) of temperature and precipitation for winter (DJF) in the combined domain TOT (area weighted mean of all other subregions, land grid points only). The color of the marker denotes the GCM and the symbol denotes the RCM. The members r1 and r3 of EC-EARTH have a black frame to distinguish them from member r12 simulations. The grey points show the signals of the 50 CRCM5-LE members

Fig. 3

Same as Fig. 2, but for summer (JJA)

In winter, the differences between RCM simulations driven by the same global model range from 0.1 K in CanESM2 to 0.9 K in MIROC5 for temperature and from 3.6 percentage points (pp) in MIROC5 to 5.1 pp in CanESM2 for precipitation (Fig. 2). These ranges are usually similar to the CRCM5-LE extent. The EC-EARTH members r1 and r3 are close to each other, as are the two RACMO22E simulations (r1 and r12). The two CanESM2 simulations fit quite well into the CRCM5-LE, although they lie at the colder end of the cloud.

In summer, the spread of temperature signals of the same GCM downscaled by different RCMs ranges from 0.2 K in CNRM-CM5 to 1.5 K in HadGEM2-ES, while the spread for precipitation signals ranges from 4.5 pp in CNRM-CM5 to 36 pp in CanESM2 (Fig. 3). These GCM ranges are larger than in winter, when large-scale circulations, which are mainly driven by the GCM, are more important. The RCM CCLM4-8-17 shows the strongest decreases in precipitation regardless of the driving GCM, except for CNRM-CM5, which generally seems to have a larger influence on the RCM output than other GCMs. The two CanESM2 simulations show large differences and span a larger range, both in temperature and precipitation, than the CRCM5-LE. The combination of the rather warm and dry CanESM2 with the also rather dry CCLM4-8-17 (usually the driest RCM when driven by the same GCM) results in an extreme decrease of precipitation, accompanied by a strong warming signal. EC-EARTH is the only GCM with different members (1 × r1, 1 × r3, 4 × r12), giving insight into the variability of another multi-member ensemble. The two simulations of members r1 and r3 have colder and less dry signals than the r12 simulations, yet still lie at the edge of the r12 cluster. The two simulations using the same RCM and only different members of EC-EARTH (r1_RACMO22E and r12_RACMO22E) are very close. The CRCM5-LE cluster is at the very warm and dry end of the CORDEX ensemble, but the MIROC5_CCLM4-8-17, HadGEM2-ES_CCLM4-8-17 and CanESM2_CCLM4-8-17 simulations show similar or even stronger signals.

The GCMs usually dominate the signal of the simulations, but there are some cases where the RCM can significantly impact the resulting signal. These findings for the TOT domain can be found in a similar manner in the subdomains as well, although differences occur of course. For example, the positive winter precipitation signals mostly originate from Northern European subregions like Scandinavia, whereas the Mediterranean shows mostly negative signals (Figs. S2 and S3, SM). On the other hand, the same contrasting signals (mostly positive in SC, mostly negative in MD) result in a more or less balanced signal spread in TOT for summer (Figs. S4 and S5, SM).

In general, this CORDEX ensemble consists of a number of GCMs and RCMs with a wide range of signals. Although it is not a perfectly composed multi-model ensemble (which would be necessary for a truly structured framework), the analysis suggests that its composition gives a fair representation of the sources of uncertainty in a multi-model ensemble. It is therefore suited for the comparison with the 50-member single model large ensemble.

4.2 Comparison of variability in signals of CRCM5-LE and CORDEX

To better assess and quantify the fraction of internal variability in the CORDEX ensemble, we compare the standard deviations of the CRCM5-LE and the CORDEX ensemble. The variability between EURO-CORDEX models is analyzed on the grid point level, while other publications usually only mark areas where models agree on the sign of change and significant changes, without quantifying the uncertainty of the respective ensemble (Jacob et al. 2014; Vautard et al. 2014; Rajczak and Schär 2017).

4.2.1 Temperature

Figures 4 and 5 show the standard deviation of signals of CRCM5-LE and CORDEX for all three future periods and the respective SDR values for winter and summer temperature over Europe (respective figures for SON and MAM: Figs. S6 and S7, SM). The CRCM5-LE variability shows a large scale gradient from lower values in the West to higher values in the (North-) East, which might be associated with increasing continental climate as seen in Köppen-Geiger climate classifications (Beck et al. 2018). In winter CORDEX shows a similar, yet not so clear gradient, whereas in summer higher values seem to be found in southern parts of Europe. In both seasons, the most obvious gradient in CORDEX appears in mountainous regions like the Alps, the Pyrenees and the Scandinavian Mountains. Additionally, the variability increases from FUT1 to FUT3, especially in winter. The CRCM5-LE in contrast shows rather small variability in these mountainous regions.

Fig. 4

Rows 1–2: standard deviation of the winter temperature (tas-DJF) signals of all models in each ensemble for the three future periods. Note the logarithmic scale. Row 3: SDR (σ [CRCM5-LE]/σ [CORDEX]) for the respective periods. Row 4: Two-sample F-Test on equal variances at 5% significance level; For white grid points, the Null Hypothesis of equal variances cannot be rejected (Equal VAR = ‘true’)

Fig. 5

Rows 1–2: standard deviation of the summer temperature (tas-JJA) signals of all models in each ensemble for the three future periods. Note the logarithmic scale. Row 3: SDR (σ [CRCM5-LE]/σ [CORDEX]) for the respective periods. Row 4: Two-sample F-Test on equal variances at 5% significance level; For white grid points, the Null Hypothesis of equal variances cannot be rejected (Equal VAR = ‘true’)

The SDR mainly lies well below 1 in both seasons for most of Europe. This result suggests that for temperature, only a small part of the variability in the CORDEX ensemble can be explained by internal variability. For most parts, the SDR is smaller in summer than in winter. This is a result of two effects acting in the same direction: On the one hand, the overall CRCM5-LE variability is higher in winter; on the other hand, the CORDEX variability is smaller in winter. This mainly affects the British Isles, Scandinavia and other parts of Northern Europe, where SDR values can even exceed 1, especially in the early and mid-future. In these cases, the internal variability estimated from CRCM5-LE is larger than the CORDEX multi-model variability. While this result would be rather unexpected in a systematic framework, the current imbalanced composition of CORDEX could lead to an underestimation of either (or both) RCM and GCM contributions to the total ensemble spread. In this context, it is not clear whether CRCM5-LE over- or underestimates the average internal variability of the CORDEX models. It is also not clear to which extent the true internal variability of the CORDEX ensemble is fully sampled by the available simulations.

A two-sample F-Test reveals the grid points with similar variances in both ensembles (Figs. 4 and 5, lowest rows). Empirical analysis shows that these are generally grid points with SDR values between 0.7 and 1.5 (similar to findings of Deser et al. 2012b). The share of grid points with similar variance decreases in both seasons for further future periods, with generally higher values in winter. Interesting to note is how the British Isles and parts of Norway fail the test for FUT1 because of the high variance in CRCM5-LE, showing similar variances for FUT2 and FUT3 with increasing CORDEX variability and relatively stable CRCM5-LE variability.

4.2.2 Precipitation

The precipitation signals are calculated for each member individually in percent change, so the standard deviation over these members is also expressed in percent change. In winter, the general patterns of CRCM5-LE and CORDEX are quite similar with higher variability in southern parts of Europe (Fig. 6). Northern Africa shows the largest standard deviations of relative changes, because the absolute sums of precipitation are relatively small here. A remarkable band of high variability stretches from the southeastern parts of Spain over coastal France to the Italian Alps in both ensembles. The variability in CRCM5-LE on the Iberian Peninsula is higher than in CORDEX, leading to high SDR values in FUT1. Other high SDR values can be found in Northern Africa and some mainly coastal areas in FUT1 and FUT2, whereas in FUT3 most of Europe shows SDR values around 1 and below.

Fig. 6

Rows 1–2: standard deviation of the winter precipitation (pr-DJF) signals of all models in each ensemble for the three future periods. Note the logarithmic scale. Row 3: SDR (σ [CRCM5-LE]/σ [CORDEX]) for the respective periods. Row 4: Two-sample F-Test on equal variances at 5% significance level; For white grid points, the Null Hypothesis of equal variances cannot be rejected (Equal VAR = ‘true’)

Some differences in the variability of signals can be observed in summer between the ensembles, despite a general increasing West–East and North–South gradient. The variability in CORDEX is larger in all future periods and almost all of Europe (Fig. 7). Topography does not seem to be a significant factor in CORDEX, while CRCM5-LE displays especially low variability in mountainous regions, as already seen for temperature in winter. This results in SDR values around 1 in FUT1, decreasing to values well below 0.5 for FUT3. For FUT1 in summer, the number of grid points with similar variability is comparable to the number found in the map for winter (Fig. 6), but decreases significantly until FUT3 in contrast to the winter season. The respective figures for SON and MAM can be found in the SM, Figs. S8 and S9.

Fig. 7

Rows 1–2: standard deviation of the summer precipitation (pr-JJA) signals of all models in each ensemble for the three future periods. Note the logarithmic scale. Row 3: SDR (σ [CRCM5-LE]/σ [CORDEX]) for the respective periods. Row 4: Two-sample F-Test on equal variances at 5% significance level; For white grid points, the Null Hypothesis of equal variances cannot be rejected (Equal VAR = ‘true’)

The pattern correlations between the standard deviation maps of both ensembles, as a function of the time horizon, show two different behaviors for temperature and precipitation (Fig. 8). Correlations are generally high for precipitation as well as for summer and fall temperatures. For all precipitation seasons, a decrease of correlation can be observed, which fulfills the expectation of a decreasing contribution of internal variability to the overall variability in the further future.

Fig. 8

Pattern correlations of the standard deviations of CRCM5-LE and CORDEX for all seasons with a 30-year moving window; the x-axis denotes the 30-year period beginning with the respective year (2020–2049, 2021–2050,…)
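The moving-window pattern correlation can be sketched as below. Two points are assumptions, since the text only speaks of a "simple pattern correlation": the correlation is taken to be an unweighted Pearson correlation over the flattened grid points, and the last window is taken to end in 2099, the final simulated year. Function names and the example maps are hypothetical.

```python
import math


def pattern_correlation(map_a, map_b):
    """Unweighted Pearson correlation between two flattened maps of equal length."""
    n = len(map_a)
    mean_a = sum(map_a) / n
    mean_b = sum(map_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    var_a = sum((a - mean_a) ** 2 for a in map_a)
    var_b = sum((b - mean_b) ** 2 for b in map_b)
    return cov / math.sqrt(var_a * var_b)


def moving_windows(first_start=2020, last_end=2099, length=30):
    """Inclusive (start, end) year pairs: (2020, 2049), (2021, 2050), ..."""
    return [(s, s + length - 1) for s in range(first_start, last_end - length + 2)]


windows = moving_windows()

# Synthetic standard deviation "maps" with four grid points each.
sd_le = [0.1, 0.2, 0.3, 0.4]
sd_cordex = [0.3, 0.5, 0.7, 0.9]
r = pattern_correlation(sd_le, sd_cordex)
```

For each window, the standard deviation maps of both ensembles would be recomputed from the signals of that period and then correlated, giving one value per starting year.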

Temperature seasons show a remarkable behavior. The pattern correlation increases in all seasons during the first half of the twenty-first century, and remains more or less stable in tas-DJF and tas-MAM, while dropping significantly for tas-JJA and tas-SON afterwards. The patterns of temperature thus do not seem to follow the expectation of decreasing correlation with time, and even increase for winter and spring. Further research is needed in this direction, especially if these results can be reproduced with other initial condition ensembles in the future.

To give an overview, Fig. 9 shows the regionally averaged SDR grid point values for all seasons and future periods for the subregions from Fig. 1. In general, the contribution of internal variability is much higher for precipitation than for temperature, and it decreases significantly the further the future period lies ahead. Annual and summer temperature/precipitation and fall temperatures have significantly lower SDR values than the other seasons. The share of boxes with at least 2/3 of the grid points confirming the F-Test on equal variances is especially high for temperatures in FUT1 (12 boxes) and precipitation in FUT1 (32) and FUT2 (15). Ratios above 1 mainly appear in FUT1 for spring temperatures and fall precipitation. The threshold of 2/3 is chosen on the basis of similar existing concepts like robustness of change as a function of the numbers of climate models agreeing in the sign of change (Jacob et al. 2014).
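The 2/3 criterion for marking a subregion can be sketched as follows. The function names and the per-grid-point flags are hypothetical, and exact fractions are used only to avoid rounding issues at the threshold.

```python
from fractions import Fraction


def similar_variance_share(not_rejected):
    """Share of grid points where the F-test did not reject equal variances."""
    return Fraction(sum(bool(flag) for flag in not_rejected), len(not_rejected))


def region_is_marked(not_rejected, threshold=Fraction(2, 3)):
    """True if at least `threshold` of the grid points show similar variances."""
    return similar_variance_share(not_rejected) >= threshold


# Synthetic flags for a subregion with 12 land grid points.
flags_marked = [True] * 8 + [False] * 4    # share is exactly 2/3
flags_unmarked = [True] * 7 + [False] * 5  # share is below 2/3
marked = region_is_marked(flags_marked)
unmarked = region_is_marked(flags_unmarked)
```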

Fig. 9

SDR in all regions for annual and seasonal temperatures (a–c) and seasonal precipitation (d–f) in the three future periods. The white dots indicate no rejection of the Null Hypothesis (of equal variances) of the two-sample F-Test at a significance level of 5% in at least 67% of the grid points in this subregion

5 Discussion

5.1 Composition of the CORDEX ensemble

The composition of the 22-member CORDEX ensemble is defined by the available datasets that match the preconditions described in the Data part of this study (chapter 2), leading to an ensemble of opportunity as a result. Although there have been efforts to fill the matrix of GCM-RCM-combinations, totally systematic analysis to separate contributions from GCMs and RCMs to the total ensemble spread is difficult—if not impossible—due to the sparsity of the matrix, the imbalance between GCM/RCM combinations, and the lack of several members for each combination to better discriminate model uncertainty from internal variability (except for EC-EARTH, all GCMs only have one member). The uncertainty of the CORDEX ensemble signals is thus a combination of model response uncertainty and internal variability.

Overall, the spread of signals in CORDEX can mostly be explained by the different GCMs used for long-term projections, as already described by Kendon et al. (2010). Especially for larger GCM samples (EC-EARTH and HadGEM2-ES), the GCM dominance can be observed more robustly. There have been attempts to separate out the noise components in climate model ensemble signals. For example, Saffioti et al. (2017) showed that the removal of atmospheric circulation variability largely decreases the spread of trends in an initial condition ensemble as well as a multi-model ensemble of GCMs. Since there is no better multi-model ensemble available so far (in terms of sampling different models and members), the CORDEX ensemble can be seen as a first order approximation of the uncertainty in a multi-GCM/RCM ensemble. To better assess the uncertainties between signals in RCMs and their driving GCMs over Europe, further research is needed. For instance, considering several RCMs driven by the same GCM could help to better understand the uncertainty due to the choice of the RCM, while doing the same along an RCM column (in Table 1) would allow assessing the RCM sensitivity to boundary conditions. Nevertheless, we tried to assess the contribution of internal variability to the CORDEX ensemble spread by comparing with CRCM5-LE, where the internal variability is sampled in a systematic manner.

5.2 Comparison of signals in CRCM5-LE and CORDEX

The comparison of a single model large ensemble, comprising 50 initial condition members of the CanESM2-CRCM5 model chain (CRCM5-LE), with a multi-model ensemble of 21 different EURO-CORDEX models was conducted to better quantify the contribution of natural variability to climate change signal uncertainty. In general, the CRCM5-LE ensemble shows stronger climate change signals for temperature than the CORDEX ensemble, especially for summer, where CRCM5-LE signals are very dry and warm. The negative precipitation signals in summer and fall are not due to a general dry bias of the CRCM5-LE data, since neither the CanESM2-driven nor the ERA-Interim-driven CRCM5 simulation showed significant dry biases for most parts of Europe from 1980 to 2012, especially not for the Mediterranean, where the severe decreases are projected (Leduc et al. 2019). The CanESM2_CCLM4-8-17 simulation shows even drier signals for these seasons than the already relatively dry CRCM5-LE, while the CanESM2_REMO2015 simulation fits well into the CORDEX ensemble. The choice of RCM for downscaling CanESM2 thus seems to have a significant influence on the summer precipitation signals. A similar range can also be found for the five HadGEM2-ES simulations.

5.3 Signal variability comparison

Grid-point based analysis showed that the CORDEX variability is usually much higher than the CRCM5-LE variability for temperature (Figs. 4 and 5). However, both ensembles generally show an increase in temperature variability towards the more continental climates in the East, which suggests that this gradient in CORDEX is at least partly due to internal variability. Only in the near future during winter do larger parts of Europe show similar variabilities in both ensembles. For the British Isles and Norway, CRCM5-LE even shows higher variability in FUT1 than CORDEX. A significant difference between the two ensembles is that mountainous areas in CORDEX often show the highest standard deviations for temperature, while in CRCM5-LE they show rather low values. This is probably because of the different orographies and snow representations used in the different RCMs in CORDEX, since it is mostly visible during DJF and MAM, when the largest snow packs are present. For summer and winter precipitation, the patterns of equal variances are much more scattered over all of Europe (Figs. 6 and 7). While almost no grid points show similar variances for summer in FUT3, the internal variability can still reach a variance similar to the multi-model variance in large parts of Europe for winter precipitation.

The analysis of pattern correlations between the signals of the two ensembles gave a twofold result. While the expectation of decreasing pattern correlations was met for precipitation, the temporal dynamics for temperature did not show a clear tendency (Fig. 8). Further research may be needed to clarify the relationships between patterns in this context.
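A pattern correlation between two signal maps can be sketched as follows. This is a minimal illustration, not the procedure used for Fig. 8: it assumes an unweighted Pearson correlation over all grid points that are valid in both fields, and the function name `pattern_correlation` and the synthetic fields are hypothetical.

```python
import numpy as np

def pattern_correlation(field_a, field_b):
    """Unweighted Pearson correlation between two maps of equal shape.

    Grid points that are NaN in either field (e.g. masked sea points)
    are ignored. Area weighting is deliberately left out of this sketch.
    """
    a = np.asarray(field_a, dtype=float).ravel()
    b = np.asarray(field_b, dtype=float).ravel()
    valid = np.isfinite(a) & np.isfinite(b)
    return float(np.corrcoef(a[valid], b[valid])[0, 1])

# Synthetic ensemble-mean signal maps on a hypothetical 10 x 12 grid.
rng = np.random.default_rng(3)
common_pattern = rng.normal(0.0, 1.0, size=(10, 12))
signal_le = common_pattern + rng.normal(0.0, 0.5, size=(10, 12))
signal_mm = common_pattern + rng.normal(0.0, 0.5, size=(10, 12))

r = pattern_correlation(signal_le, signal_mm)
print(r)
```

Applying such a function to each future period in turn would give the kind of time evolution discussed above; on grids with strongly varying cell areas, an area-weighted variant would be more appropriate.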

In general, the contribution of internal variability is much higher for precipitation signals than for temperature signals. Additionally, the influence of internal variability decreases significantly for later future periods (Fig. 9). Nevertheless, in many regions the contribution lies between 0.25 and 0.5 for seasonal temperature, and between 0.5 and 1.0 for seasonal precipitation. SDR values around or even above one seem implausible at first glance: even if internal variability plays an important role in the uncertainty, the combination of model response uncertainty and internal variability in the CORDEX ensemble should generally be larger than the internal variability alone in CRCM5-LE. Such values probably occur when the CORDEX ensemble cannot capture the whole range of internal variability because of its limited sampling, or when the internal variability of the CRCM5-LE is not representative of other models (GCM/RCM combinations). Further research is needed on the comparison of the representations of internal variability in different large ensembles.
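The ratio discussed here can be illustrated with a short sketch. It assumes that the SDR is the standard deviation of the climate change signals across the large ensemble members divided by the standard deviation across the multi-model ensemble members, computed per grid point. The function name `sdr`, the array shapes and all numbers are hypothetical and serve only as synthetic input.

```python
import numpy as np

def sdr(le_signals, mm_signals):
    """Standard deviation ratio: large-ensemble spread / multi-model spread.

    Both inputs have the ensemble member on axis 0. Trailing axes
    (e.g. y, x) are kept, so the result holds one ratio per grid point.
    """
    sd_le = np.std(le_signals, axis=0, ddof=1)  # internal variability only
    sd_mm = np.std(mm_signals, axis=0, ddof=1)  # model response + internal variability
    return sd_le / sd_mm

# Synthetic temperature signals (K) on a hypothetical 4 x 5 grid.
rng = np.random.default_rng(0)
le_signals = 3.0 + rng.normal(0.0, 0.3, size=(50, 4, 5))
model_offset = rng.normal(0.0, 0.6, size=(22, 1, 1))  # one offset per model
mm_signals = 2.5 + model_offset + rng.normal(0.0, 0.3, size=(22, 4, 5))

ratio = sdr(le_signals, mm_signals)
print(ratio.shape, float(ratio.mean()))
```

Because a sample standard deviation estimated from about 20 members is itself noisy, individual grid points can yield ratios near or above one even when the underlying multi-model spread is larger, which is in line with the sampling argument made above.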

Deser et al. (2012b) conducted a similar experiment with 21 CMIP3 models and 40 CCSM3 members, computing the ratio between the standard deviations of trends from 2005 to 2060 across the globe. They also found ratios above 0.75 and 1.0 for large parts of Europe for annual temperature and precipitation, with the latter generally showing much higher ratios. The ratios found by Deser et al. (2012b) for Europe are usually higher than the ones calculated for the ensembles in this study. This might originate from the different models and methods: while we used dynamically downscaled regional models for Europe, they used GCMs. Additionally, they quantified the precipitation trend variability in mm/day, while we used relative changes.

To evaluate the influence of spatial resolution on the regionalized SDR values, we conducted a small methodological experiment, which cannot directly explain the differing results but helps to better identify possible sources. For the values in Fig. 9, the SDR is calculated at the grid point scale and averaged over the subregions afterwards (Method M1). An alternative is to first average the temperature/precipitation values over each subregion and then compute the signal, variability and SDR from these spatially averaged values (Method M2). This is a possible way to “simulate” a coarser resolution of the underlying data, as would be obtained from a very coarse GCM. The differences between the two methods are shown in Fig. 10. A comparison of the results on the annual time scale (Deser et al. 2012b only show annual results) reveals no difference between the two methods for tas-Y and shows higher SDR values with M1 for pr-Y. Thus, a coarser-resolution data set does not produce higher ratios in this experiment. This contradicts the hypothesis that differences in the spatial resolution of the applied models could explain the differing results. The differences between the two studies in terms of the models used and the methods applied make it even more difficult to identify the reasons behind the differing results.
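The difference between the two orders of operation, M1 and M2, can be sketched as follows. This is an illustration under stated assumptions rather than the code behind Fig. 10: the regional mean is unweighted, the SDR is taken as large-ensemble spread divided by multi-model spread, and the function names and synthetic fields are hypothetical.

```python
import numpy as np

def change_signal(ref, fut, relative=False):
    """Climate change signal: absolute difference, or relative change in %."""
    if relative:
        return 100.0 * (fut - ref) / ref
    return fut - ref

def sdr_m1(le_ref, le_fut, mm_ref, mm_fut, relative=False):
    """M1: SDR at every grid point, then averaged over the region."""
    sd_le = np.std(change_signal(le_ref, le_fut, relative), axis=0, ddof=1)
    sd_mm = np.std(change_signal(mm_ref, mm_fut, relative), axis=0, ddof=1)
    return float(np.mean(sd_le / sd_mm))

def sdr_m2(le_ref, le_fut, mm_ref, mm_fut, relative=False):
    """M2: regional mean of the variable first, then signal, spread and SDR."""
    def regional(field):
        return field.mean(axis=(1, 2))  # unweighted mean over (y, x)
    le_sig = change_signal(regional(le_ref), regional(le_fut), relative)
    mm_sig = change_signal(regional(mm_ref), regional(mm_fut), relative)
    return float(np.std(le_sig, ddof=1) / np.std(mm_sig, ddof=1))

# Synthetic fields with shape (member, y, x) for one hypothetical subregion.
rng = np.random.default_rng(2)
shape_le, shape_mm = (50, 6, 6), (22, 6, 6)
le_ref = 10.0 + rng.normal(0.0, 0.2, size=shape_le)
le_fut = le_ref + 3.0 + rng.normal(0.0, 0.3, size=shape_le)
mm_ref = 10.0 + rng.normal(0.0, 0.2, size=shape_mm)
model_offset = rng.normal(0.0, 0.6, size=(22, 1, 1))  # one offset per model
mm_fut = mm_ref + 2.5 + model_offset + rng.normal(0.0, 0.3, size=shape_mm)

m1 = sdr_m1(le_ref, le_fut, mm_ref, mm_fut)
m2 = sdr_m2(le_ref, le_fut, mm_ref, mm_fut)
print(m1, m2, m1 - m2)
```

The sign and size of `m1 - m2` in this sketch depend entirely on how spatially correlated the synthetic noise is, so the numbers do not reproduce the results in Fig. 10. For spatially uniform fields the two methods give the same value.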

Fig. 10 Difference of regional SDR outcome between methods M1 and M2 (M1 minus M2); M1 results are shown in Fig. 9

6 Conclusions

Natural variability (represented by the model-internal variability of a single model large ensemble) can play a major role in the magnitude of variability of future climate projections, depending on the variable, season, region and time horizon considered. These findings are important because climate modelers often face criticism for the large uncertainty of ensemble projections, with the criticism implying that the variety of model results is a consequence of the models’ inability to correctly represent climate processes (model response uncertainty). If natural variability can explain a large part of the model spread, then the differences in climate signals from different models might not only be a result of insufficient or competing models, but might also be partly explained by natural variability. Following the idea that natural variability is inherent to the chaotic nature of the climate system and therefore cannot be diminished, a certain part of the uncertainty of climate projections will be irreducible, even if scenario definitions become more precise and models improve (see also Deser et al. 2012b).

The implications for the interpretation of multi-model ensembles in cases with similar variabilities (e.g. precipitation in DJF) may become clearer through a short thought experiment. First, we need to accept the natural variability of the CRCM5-LE as a fair approximation that can be adopted for other models. Then two things become apparent:

(A) We can add the CRCM5-LE variability as a “cloud” of internal variability (gray dots in e.g. Fig. 3) around each of the dots for the CORDEX models. This blurs the model response uncertainty dramatically in some cases. It means that a single realization, which is usually all that is available in CORDEX, probably does not depict the model response very well.

(B) On the other hand, if the spread of the CORDEX ensemble can be totally (or to a large part) explained by natural variability, model response uncertainty may be interpreted as negligible in these cases.

These conclusions of course depend strongly on the length of the time period, variable, season and region considered, and are not meant to be universally valid for multi-model ensembles like CORDEX.
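Point (A) of the thought experiment can be sketched in a few lines. The sketch assumes, as stated above, that the large-ensemble spread is transferable to every other model. It works on one regional-mean signal per member; the function name `internal_variability_cloud` and all numbers are hypothetical and synthetic.

```python
import numpy as np

def internal_variability_cloud(mm_signals, le_signals):
    """Center the large-ensemble spread on each multi-model signal.

    Returns an array of shape (n_models, n_le_members). Row i is the
    hypothetical range model i might show if it had as many members as
    the large ensemble and the same internal variability.
    """
    mm_signals = np.asarray(mm_signals, dtype=float)
    le_signals = np.asarray(le_signals, dtype=float)
    le_anomalies = le_signals - le_signals.mean()
    return mm_signals[:, None] + le_anomalies[None, :]

# Synthetic regional-mean precipitation signals (%), purely illustrative.
rng = np.random.default_rng(1)
le_signals = rng.normal(5.0, 4.0, size=50)
mm_signals = rng.normal(8.0, 5.0, size=22)

cloud = internal_variability_cloud(mm_signals, le_signals)

# 5th to 95th percentile of each model's cloud.
low, high = np.percentile(cloud, [5, 95], axis=1)

# For each model, count how many *other* models fall inside its cloud range.
inside = (mm_signals[None, :] >= low[:, None]) & (mm_signals[None, :] <= high[:, None])
np.fill_diagonal(inside, False)
overlaps = inside.sum(axis=1)
print(overlaps)
```

A large number of overlaps for most models would correspond to case (B), where the multi-model spread is hard to distinguish from internal variability. Few overlaps would point to a detectable model response uncertainty.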

As a short outlook, the existing regional single model ensembles (Fischer et al. 2013; Aalbers et al. 2018) still need to be analyzed in more depth, since they show large potential for a better understanding of climate change uncertainty. Additionally, the results presented here should be verified with more single model ensembles, and the differences between these kinds of ensembles need to be characterized, e.g. to see whether their representations of natural variability are similar (see also Xie et al. 2015).