1 Introduction

The amount, distribution, and intensity of precipitation have major impacts on ecosystems and society: heavy precipitation may lead to large damages caused by floods, debris flows, or landslides, while the absence of precipitation may cause droughts and affects water and hydropower supply. Consequently, precipitation is regarded as one of the most relevant meteorological variables for society, and its regional alteration under global warming is currently one of the most discussed topics in climate change research.

However, simulating precipitation is challenging because of the wide range of processes involved. There are three factors which affect precipitation (Sawyer 1956): (1) cloud processes and convection, (2) the interaction of the atmospheric flow with the surface, and (3) the large-scale atmospheric circulation. In particular, cloud processes such as phase transitions are still only partly understood and are one of the major sources of uncertainty in climate simulations (e.g., Stocker et al. 2013).

Decreasing the horizontal grid spacing in climate models to \(0.11^{\circ }\) can help to improve factors (2) and (3) by better representing surface characteristics (e.g., orography and coastlines) and by more accurately solving the equations of motion. Several studies investigated the influence of model grid spacing on precipitation. Giorgi and Marinucci (1996) showed that precipitation amount, intensity, and distribution are sensitive to grid spacing. By investigating seasonal mean precipitation in nine regional climate model (RCM) simulations from the ENSEMBLES project with 25 and 50 km grid spacing, Rauscher et al. (2010) found that the spatial patterns and temporal evolution of summertime (but not wintertime) precipitation are improved in most 25 km simulations. The improvement at higher resolution was especially visible in topographically complex regions, which is in line with findings by Chan et al. (2013). Chan et al. (2013) further emphasize the importance of highly resolved observational data sets to capture regional-scale climate signals.

Major improvements in representing cloud processes and convection (factor 1) can be expected when convection-permitting models, using a grid spacing finer than 4 km, are used (e.g., Weisman et al. 1997; Kendon et al. 2012; Prein et al. 2015). On these grids error-prone convection parameterization schemes can be avoided by resolving deep convection explicitly. Convection-permitting simulations might also alter the projected climate change signals, especially of sub-daily extreme precipitation (Kendon et al. 2014; Mahoney et al. 2012). However, the drawback of this kind of simulation is that it is computationally very demanding. Therefore, transient climate simulations on convection-permitting grids are currently not feasible on continental-scale domains. Chan et al. (2013) investigated the simulated precipitation of models with 50, 12, and 1.5 km grid spacing over the Southern United Kingdom. The 50 km model underestimates mean precipitation over mountainous regions and the simulated precipitation intensity is too weak. Both biases are reduced in the 12 and 1.5 km models. On a daily time scale, they found no evidence that the skill of the 1.5 km model is superior to the skill of the 12 km model. This is consistent with previous findings (e.g., Prein et al. 2013a; Ban et al. 2014; Fosser et al. 2014), which show that added value of convection-permitting simulations can be predominantly found on sub-daily timescales.

Previous European ensemble RCM initiatives defined a target grid spacing of \(0.44^{\circ }\) in case of the PRUDENCE project (Christensen et al. 2007; Jacob et al. 2007) and up to \(0.22^{\circ }\) in case of the ENSEMBLES project (van der Linden and Mitchell 2009). The European branch of the COordinated Regional climate Downscaling EXperiment (CORDEX) called EURO-CORDEX (Jacob et al. 2014) is the first initiative in which multiple RCMs are used to simulate transient climate change with \(0.11^{\circ }\) horizontal grid spacing for an entire continent. In parallel, similar simulations on a \(0.44^{\circ }\) grid are conducted.

In this study we present a comparative evaluation of precipitation from \(0.11^{\circ }\) and \(0.44^{\circ }\) grid spacing EURO-CORDEX simulations by applying scale sensitive and intensity dependent statistical methods. The analysis focuses on the entire ensemble in order to achieve robust results with regard to the effect of model resolution, but does not aim for an in-depth analysis of the performance of single models. A similar set of simulations was already used for a standard evaluation (Kotlarski et al. 2014) and an analysis of heat waves (Vautard et al. 2013). Both studies could not identify added value in the skill of the high-resolution models to simulate regionally and seasonally averaged quantities. Compared to Kotlarski et al. (2014) we focus solely on the evaluation of precipitation and we investigate model performance on a daily and local scale, rather than averages over long periods and larger regions. Another major difference is the usage of highly resolved regional precipitation data sets in this study, which have approximately a ten times higher station density than the E-OBS data set that was used in Kotlarski et al. (2014). This is essential for local-scale analyses (Prein and Gobiet 2015).

Our major research questions are:

  • Is there improved skill in simulated precipitation if the horizontal grid spacing of climate models is decreased from \(0.44{^\circ }\) to \(0.11{^\circ }\)?

  • On which spatial scales do differences occur?

  • Are the differences dependent on the intensity of precipitation?

  • What are the main sources of differences?

To answer these questions, six statistical methods are used. We begin with analyzing biases in simulated seasonal mean and extreme precipitation (Sect. 3.1). Then the location and total area of grid cells where a majority of the 0.11\({^\circ }\) models improve/deteriorate seasonal mean and extreme precipitation biases compared to the 0.44\({^\circ }\) models are evaluated (Sect. 3.3). Further, the ability of the RCMs to simulate seasonal average spatial patterns of precipitation is investigated for different horizontal scales (Sect. 3.4) and for different precipitation intensities (Sect. 3.5). In Sect. 3.6 differences in daily precipitation patterns are analyzed by accounting for spatial displacements, and finally, 3-hourly and daily precipitation distributions are compared to observations (Sect. 3.7).

2 Data and methods

2.1 Models

Simulations from eight different models of the EURO-CORDEX ensemble (or model versions in the case of WRF) are analyzed within the 19-year period 1989 to 2007 (see Table 1). With each model a pair of simulations with \(0.44^{\circ }\) (approximately 50 km) and \(0.11^{\circ }\) (approximately 12.5 km) horizontal grid spacing has been performed. Both simulations of each pair have a similar setup except for the grid spacing and the associated time step. The only exception is REMO, in which rain advection was used in the \(0.11^{\circ }\) simulation but not at \(0.44^{\circ }\).

All models, except ARPEGE, are RCMs and are forced by the European Centre for Medium-Range Weather Forecasts Interim Reanalysis (ERA-Interim) at their lateral boundaries and by sea surface temperature in the interior of the EURO-CORDEX domain. In contrast, ARPEGE is a global climate model (GCM); its temperature, wind speed, and specific humidity are nudged towards ERA-Interim outside the EURO-CORDEX domain, similar to an RCM with a global relaxation zone. Inside the domain no nudging was applied in any of the models. The sea surface temperature used in the simulations is taken from ERA-Interim. An overview of the greenhouse gas and aerosol forcing used can be found in the Online Resource 1 Table A1.

Table 1 List of models
Table 2 List of observations

The EURO-CORDEX domain covers the entire European continent and large parts of Northern Africa and therefore includes a wide range of climate zones (Fig. 1a). The boundaries of this domain are given in a rotated coordinate system with the rotated North Pole at \(198.0^{\circ }\) East and \(39.25^{\circ }\) North and the top left corner of the domain at \(331.79^{\circ }\) East and \(21.67^{\circ }\) North. The domain extends 106 grid cells to the East and 103 to the South with a grid spacing of \(0.44^{\circ }\). In general a zone of a few hundred kilometers was added around the EURO-CORDEX domain to account for the relaxation zone and to prevent spurious boundary effects from entering the analysis.

2.2 Observations

Highly resolved observational data sets are an elementary ingredient for the detection of added value in high-resolution (\({\le}0.11^{\circ }\)) models (e.g., Chan et al. 2013; Prein and Gobiet 2015). However, for daily precipitation on the pan-European scale only the E-OBS gridded data set is available, which has a rather coarse grid spacing of \(0.22^{\circ }\), a low station density in some regions, and known deficiencies, especially with regard to extremes, in orographically complex areas, and areas where the station density is low (Haylock et al. 2008; Hofstra et al. 2009, 2010). These shortcomings motivated us to use regional precipitation data sets from several weather services in Europe. A comparison of those regional data sets with E-OBS can be found in Prein and Gobiet (2015).

In total eight gridded data sets are used, which cover Switzerland, the Alps, Germany, France, the Carpathians, Sweden, Norway, and Spain (see Fig. 1a; Table 2). Except for Switzerland and France, the data sets are solely based on station data, provided on a daily basis, and cover the entire simulated period 1989 to 2007. The Swiss data set (RdissagH) is derived from a combination of surface stations and four weather radar images and has an hourly frequency starting on May 1, 2003. The French data set (SAFRAN) is a regional reanalysis in which observations were assimilated. It is originally provided on an hourly basis. The Alpine data set includes areas in Germany and France, which overlap with the observational data sets of these countries. For the analysis we compare the simulated precipitation with single observational data sets (region by region) and do not account for differences between observational data sets in the overlapping areas.

All precipitation data sets are affected by systematic errors of rain gauge measurements. The most severe source of errors is caused by wind field deformations around the gauge and the induced under-catch of precipitation particles. The resulting underestimation depends on the type and intensity of precipitation, the type of gauge, and the wind speed. In case of rain the errors are on average 3 % but can be as large as 20 % (Sevruk and Hamon 1984). Snow measurements are usually affected by much larger errors, which can be up to 80 % in case of non-shielded gauges (against 40 % in case of shielded gauges) for wind speeds of 5 m/s and temperatures above −8 °C (Goodison et al. 1997). Additionally, systematic errors occur by interpolating the point measurements onto a grid. In case of the Alpine EURO4M-APGD data set this leads to an underestimation of high intensities in the range of 10–20 % (smoothing effect) and an overestimation of low intensities (moist extension into dry regions) (Isotta et al. 2013).

These observational errors have to be kept in mind throughout the study. For simplicity, we still use the terms “bias” and “error” when we compare the simulations to observations and use “differences”, “improvements”, “deteriorations” and so on when the two model grid spacings are compared with each other.

2.3 Evaluation methods

Common evaluation grids To compare the \(0.11^{\circ }\) and \(0.44^{\circ }\) simulations with observations, they have to be available on a common evaluation grid. The most suitable evaluation grid depends on the underlying research question. First of all, the grid spacing of the observational data set defines the finest scale on which comparisons are meaningful. In this study the grid spacings of the observational data sets are in all cases smaller than or equal to that of the fine-gridded simulations (\(0.11^{\circ }\)). Further, one may decide to evaluate on the grid of the coarse-gridded (\(0.44^{\circ }\)) or on the grid of the fine-gridded simulations. The former is the “fairer” option with regard to the \(0.44^{\circ }\) simulations, since it compares only features that are resolvable by all models. The latter option (evaluating on the \(0.11^{\circ }\) grid) penalizes the coarse-gridded simulations, because even a perfect \(0.44^{\circ }\) simulation would feature pattern errors because of missing sub-grid-scale features. However, from an end user’s viewpoint this option can provide valuable insights, since precipitation data is frequently used on very small scales, e.g., as a driver for hydrological simulations. Even though the comparison is somewhat “unfair” for the coarse-gridded simulations, it is not trivial for the \(0.11^{\circ }\) models to produce meaningful information on scales smaller than \(0.44^{\circ }\). Therefore, most analyses of this study are conducted on the \(0.44^{\circ }\) grid, but differences to the analyses on the fine grid are discussed and depicted where needed.

Technically, the different data sets are transferred to the evaluation grid by a conservative resampling procedure (Suklitsch et al. 2008). In the first step all grids are artificially refined (if necessary) to a grid spacing, which is at least three times finer than the one of the evaluation grid, in order to reduce sampling errors. Thereby, all smaller grid cells retain the value from the larger grid cell they originate from. After refining, all grid cells whose centers are inside a grid-box of the evaluation grid (\(0.44^{\circ }\) or \(0.11^{\circ }\) regular lon/lat grid) are averaged. This method can be used to transfer finer to coarser grids and vice versa, while spatial averages and patterns are conserved.
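As an illustration, the two steps (refinement, then averaging of all refined cells whose centers fall into an evaluation grid box) can be sketched in Python as follows. This is a minimal sketch and not the implementation of Suklitsch et al. (2008): it assumes regular, axis-aligned grids with square cells given by a lower-left corner and a spacing, and the function name `conservative_resample` and all of its arguments are hypothetical. Plain lists are used to keep the sketch free of dependencies.

```python
import math


def conservative_resample(src, src_x0, src_y0, src_dx,
                          tgt_x0, tgt_y0, tgt_dx, tgt_nx, tgt_ny):
    """Resample a regular 2-D field onto a regular evaluation grid.

    src[j][i] is the value of the source cell whose lower-left corner is
    (src_x0 + i * src_dx, src_y0 + j * src_dx). The evaluation grid has
    tgt_nx x tgt_ny cells of size tgt_dx starting at (tgt_x0, tgt_y0).
    """
    # Step 1: refinement factor so that the refined spacing is at least
    # three times finer than the evaluation grid spacing. The small
    # epsilon guards against floating-point round-up (an assumption of
    # this sketch).
    r = max(1, math.ceil(3.0 * src_dx / tgt_dx - 1e-9))
    fine_dx = src_dx / r

    sums = [[0.0] * tgt_nx for _ in range(tgt_ny)]
    counts = [[0] * tgt_nx for _ in range(tgt_ny)]

    # Step 2: every refined cell keeps the value of its parent cell and is
    # assigned to the evaluation grid box that contains its center.
    for j, row in enumerate(src):
        for i, value in enumerate(row):
            for sj in range(r):
                for si in range(r):
                    cx = src_x0 + (i * r + si + 0.5) * fine_dx
                    cy = src_y0 + (j * r + sj + 0.5) * fine_dx
                    ti = math.floor((cx - tgt_x0) / tgt_dx)
                    tj = math.floor((cy - tgt_y0) / tgt_dx)
                    if 0 <= ti < tgt_nx and 0 <= tj < tgt_ny:
                        sums[tj][ti] += value
                        counts[tj][ti] += 1

    # Evaluation boxes without any contributing cell are marked as NaN.
    return [[sums[j][i] / counts[j][i] if counts[j][i] else float("nan")
             for i in range(tgt_nx)] for j in range(tgt_ny)]
```

The same function handles both directions: a fine field is averaged onto a coarser grid, and a coarse field is replicated onto a finer grid, with the domain mean unchanged in both cases as long as the grids are aligned.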

Intensity dependent analysis Climate models usually have intensity dependent errors in their simulated precipitation (e.g., Themeßl et al. 2011). This is because heavy precipitation is often caused by small-scale processes (e.g., deep convection) while processes leading to light rain are more large-scale processes (e.g., stratiform precipitation in a warm front) that can be resolved in a \(0.44^{\circ }\) model. Therefore, it can be expected that decreasing the horizontal grid spacing of climate models is especially beneficial for high precipitation intensities.

This is examined by evaluating total and extreme precipitation separately. All values above the 97.5th percentile are called extreme and are selected in observations and simulations independently. This means that extremes do not have to match in time. Additionally, analyses are performed for different intensity classes (Sect. 3.5) and for different intensity thresholds (Sect. 3.6).
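The independent selection of extremes can be illustrated with a short Python sketch. It rests on assumptions that are not specified in the text: the percentile is computed from all daily values of the series passed in, linear interpolation between order statistics is used, and the extreme precipitation of a series is summarized by the mean of the selected values. The function names are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("empty series")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)                      # floor, since pos is non-negative
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def mean_extreme(daily, q=97.5):
    """Mean of all values above the series' own q-th percentile."""
    threshold = percentile(daily, q)
    extremes = [v for v in daily if v > threshold]
    return sum(extremes) / len(extremes) if extremes else float("nan")


def extreme_bias(sim_daily, obs_daily, q=97.5):
    """Simulated minus observed extreme precipitation.

    The thresholds are derived separately for each series, so the
    selected days do not have to coincide in time.
    """
    return mean_extreme(sim_daily, q) - mean_extreme(obs_daily, q)
```

Because each series defines its own threshold, the order of the days has no influence on the result; only the upper tails of the two distributions are compared.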

Scale-dependent spatial correlation analysis Seasonal extreme and mean precipitation patterns are evaluated by using the Pearson product-moment correlation coefficient (Pearson 1895). Information about scale dependence of correlation coefficients is derived by smoothing out smaller-scale precipitation patterns with a square boxcar averaging method. Thereby, the smoothed field \(R_{i,j}\) is calculated from the original field \(A_{i,j}\) as follows:

$$\begin{aligned} R_{i,j}=\frac{1}{w^2}\sum _{k=0}^{w-1}\sum _{l=0}^{w-1} A_{i+k-(w-1)/2,j+l-(w-1)/2}, \quad \begin{array}{ll} i= \frac{w-1}{2},\ldots ,N-\frac{w+1}{2}\\ j=\frac{w-1}{2},\ldots ,M-\frac{w+1}{2} \end{array} \end{aligned}$$
(1)

where w is the side length of the square smoothing window and N and M denote the number of rows and columns, respectively. If the smoothing window contains points that are outside the evaluation domain, the nearest edge points are used instead to derive the smoothed result.
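A minimal Python sketch of Eq. (1), followed by the Pearson correlation of two fields, is given below. It assumes an odd window side length w and fields stored as lists of rows without missing values; the function names `boxcar_smooth` and `pearson` are hypothetical. Indices that fall outside the domain are clamped to the nearest edge point, as described above.

```python
import math


def boxcar_smooth(field, w):
    """Square boxcar average with side length w (odd), edges replicated."""
    if w % 2 != 1:
        raise ValueError("w must be odd")
    n_rows, n_cols = len(field), len(field[0])
    half = (w - 1) // 2
    smoothed = []
    for i in range(n_rows):
        out_row = []
        for j in range(n_cols):
            total = 0.0
            for k in range(-half, half + 1):
                ii = min(max(i + k, 0), n_rows - 1)   # clamp to domain
                for l in range(-half, half + 1):
                    jj = min(max(j + l, 0), n_cols - 1)
                    total += field[ii][jj]
            out_row.append(total / (w * w))
        smoothed.append(out_row)
    return smoothed


def pearson(a, b):
    """Pearson product-moment correlation of two equally shaped fields."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)
```

Scale-dependent correlations would then be obtained by calling `pearson(boxcar_smooth(sim, w), boxcar_smooth(obs, w))` for a sequence of increasing w.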

In addition to this method, two further methods were tested to derive scale-dependent information. Since the three methods lead to very similar results, only the results of the square boxcar averaging are shown here.

Patterns of daily precipitation fields Even if the patterns of simulated daily precipitation fields are very realistic, evaluation with traditional methods like squared errors or correlation coefficients may indicate very low quality of the simulation. This is due to the chaotic nature of precipitation cells, leading to simulated patterns that do not match the exact location and time of observed precipitation cells, although the spatial and temporal frequencies and averages may be very realistic (double penalty problem, e.g., Prein et al. 2013a).

To avoid this problem we apply the fractions skill score (FSS) method (Roberts and Lean 2008), which is based on the assumption that a useful simulation has a realistic spatial frequency of precipitation. Therefore, the fractional coverages in neighboring grid cells (cells within a square window with side length n centered on a grid point) in the observation and simulation are used to calculate a Fractions Brier Score (FBS):

$$\begin{aligned} \text {FBS}=\frac{1}{N} \sum _N \left( \frac{1}{m}\sum _m I_S -\frac{1}{m}\sum _m I_O \right) ^2 \end{aligned}$$
(2)

where m is the number of grid boxes in the neighborhood (\(m=n \cdot n\)), N is the number of neighborhood windows in the domain (number of grid cells), and \(I_O\) (\(I_S\)) is the indicator if the precipitation in a grid box is above a threshold (1 = yes, 0 = no). Finally, the FSS is computed as follows:

$$\begin{aligned} \text {FSS}=1-\frac{\text {FBS}}{\frac{1}{N} \left[ \sum _N \left( \frac{1}{m} \sum _m I_S \right) ^2 + \sum _N \left( \frac{1}{m} \sum _m I_O \right) ^2 \right] }. \end{aligned}$$
(3)

Here the FBS is divided by its largest possible value, which is obtained when observation and simulation do not overlap at all. A perfect simulation has an FSS of 1 while a complete mismatch results in an FSS of 0. The FSS is a function of horizontal scale (side length n of the square window) and precipitation threshold. The statistical value, which will be investigated here, is the difference (\(0.11^{\circ }\) minus \(0.44^{\circ }\)) between the median FSS from all precipitation days (>1 mm in the observation) within a season.
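Equations (2) and (3) can be sketched in Python for a single day as follows. The treatment of windows that extend beyond the domain is not specified above; this sketch assumes that cells outside the domain count as not exceeding the threshold while m stays equal to n · n. It also assumes an odd n, and the function names are hypothetical.

```python
def neighbourhood_fractions(field, threshold, n):
    """Fraction of cells above threshold in the n x n window around each cell."""
    if n % 2 != 1:
        raise ValueError("n must be odd")
    rows, cols = len(field), len(field[0])
    half = n // 2
    result = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            hits = 0
            # Only in-domain cells can exceed the threshold (assumption:
            # cells outside the domain are treated as dry).
            for ii in range(max(i - half, 0), min(i + half, rows - 1) + 1):
                for jj in range(max(j - half, 0), min(j + half, cols - 1) + 1):
                    if field[ii][jj] > threshold:
                        hits += 1
            out_row.append(hits / (n * n))
        result.append(out_row)
    return result


def fss(sim, obs, threshold, n):
    """Fractions skill score for one pair of daily fields."""
    f_s = [v for row in neighbourhood_fractions(sim, threshold, n) for v in row]
    f_o = [v for row in neighbourhood_fractions(obs, threshold, n) for v in row]
    count = len(f_s)
    fbs = sum((s - o) ** 2 for s, o in zip(f_s, f_o)) / count
    worst = (sum(s * s for s in f_s) + sum(o * o for o in f_o)) / count
    if worst == 0.0:
        return float("nan")            # no cell exceeds the threshold
    return 1.0 - fbs / worst
```

Two fields with a single displaced precipitation cell obtain an FSS of 0 at grid scale (n = 1), but the score rises once the window is large enough to contain both cells, which is how the method tolerates spatial displacements.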

3 Results

3.1 Precipitation biases and spatial error variability

First, we investigate the climatological errors of mean (Fig. A1 in the Online Resource 1) and extreme (days with values above the 97.5th percentile) precipitation (Fig. 2) compared to observations on the common \(0.44^{\circ }\) evaluation grid. Simulated minus observed mean seasonal precipitation is calculated for total precipitation and for extremes on a grid point basis. From the resulting biases the median and the 25th and 75th percentiles over all grid boxes of an evaluation domain are derived. The difference between the 75th and 25th percentiles (Q75 minus Q25) is hereafter denoted as spatial error variability.
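The derivation of the median bias and the spatial error variability from two seasonal fields can be sketched in Python as follows. The sketch assumes fields on the common evaluation grid without missing values and quartiles based on linear interpolation (the `inclusive` method of the standard library); the interpolation rule is an assumption, and the function name is hypothetical.

```python
import statistics


def spatial_error_stats(sim, obs):
    """Median bias and spatial error variability over all grid boxes.

    sim and obs are seasonal precipitation fields (lists of rows) on the
    same evaluation grid.
    """
    # Simulated minus observed precipitation on a grid point basis.
    biases = [s - o
              for s_row, o_row in zip(sim, obs)
              for s, o in zip(s_row, o_row)]
    q25, q50, q75 = statistics.quantiles(biases, n=4, method="inclusive")
    return {"median": q50, "q25": q25, "q75": q75,
            "variability": q75 - q25}  # Q75 minus Q25
```

A small spatial error variability indicates that the bias is similar across the evaluation domain, whereas a large value points to biases that differ strongly between grid boxes even if the median bias is close to zero.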

The \(0.11^{\circ }\) simulations tend to produce heavier extreme precipitation than their \(0.44^{\circ }\) counterparts (symbols in Fig. 2 are below the diagonal), but this cannot be generalized. For example, the \(0.11^{\circ }\) median June, July, and August (JJA) extreme precipitation in REMO is lower in all regions (symbol is above the diagonal) while in the RCA4 \(0.11^{\circ }\) simulation it is always higher (below the diagonal).

Improvements of the spatial error variability can be predominantly found in mountainous regions like the Alps (panel a and b), Norway (panel i and j), Spain in December, January, and February (DJF) (panel e), France (panel k and l), and the Carpathians in JJA (panel n). These are also the regions where most of the \(0.11^{\circ }\) simulations are improving the median extreme precipitation bias (symbols located in the green area of Fig. 2). Deteriorations of the spatial error variability are found in Sweden during DJF (panel g) and mixed results prevail in the other regions and seasons.

Results for March, April, and May (MAM) and September, October, and November (SON) (not shown) are frequently in between those of DJF and JJA. The main characteristics of biases and spatial error variabilities of mean precipitation (Online Resource Fig. A1) are similar to those of extreme precipitation. This means that models that underestimate extreme precipitation usually also underestimate total precipitation sums. The differences in the median biases between the two model resolutions are also similar to those of extreme precipitation.

In summary, biases in extreme and mean precipitation averaged over larger regions are not clearly improved in the \(0.11^{\circ }\) simulations. This means that simulations with \(0.44^{\circ }\) grid spacing might be sufficient if regional average precipitation is of interest.

3.2 Precipitation biases versus model resolution differences

In Fig. 3 we show the relation between the seasonal absolute biases in the mean and extreme precipitation of the \(0.44^{\circ }\) simulations and the precipitation differences between the \(0.11^{\circ }\) and \(0.44^{\circ }\) simulations.

In DJF and JJA (Fig. 3 upper/lower panel) the differences between the \(0.11^{\circ }\) and \(0.44^{\circ }\) simulations are typically smaller than the biases in the \(0.44^{\circ }\) simulations (y-axis ratios are smaller than one). For mean precipitation (green) the differences between the \(0.11^{\circ }\) and \(0.44^{\circ }\) simulations are between 20 and 50 %/80 % (DJF/JJA) of the \(0.44^{\circ }\) simulation biases. For extreme precipitation this ratio is higher and typically between 50 and 100 % with outliers up to 190 %. This means that, theoretically, the potential for added value is higher for extreme than for mean precipitation.

For mean precipitation the largest ratios are found for the Alps, the Carpathians, France, and Spain while the lowest appear in Norway and Sweden. For extreme precipitation Germany, Sweden, and the Carpathians have the highest and Norway, Spain, and the Alps the lowest ratios. The reason for this is primarily the magnitude of the absolute biases in the \(0.44^{\circ }\) simulations, because the biases vary strongly between different regions in Europe while the grid spacing differences are more uniform. This can already be seen in Fig. 2 and Fig. A1 where the symbols tend to align along the diagonal and do not scatter too much. However, these two figures cannot be directly compared to Fig. 3 because they do not show absolute biases and therefore positive and negative biases can cancel out by spatial averaging.

Replacing the precipitation of the \(0.44^{\circ }\) simulations with those of the \(0.11^{\circ }\) simulations in the divisors leads to similar results (not shown).
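One possible reading of the ratio discussed in this section is sketched below in Python. It assumes that absolute values are taken at each grid cell before spatial averaging, so that positive and negative values cannot cancel, and that the ratio is formed from the two spatial means; the exact aggregation used for Fig. 3 is not spelled out in the text, and the function name is hypothetical.

```python
def difference_to_bias_ratio(p011, p044, obs):
    """Mean absolute resolution difference divided by mean absolute bias.

    p011, p044, and obs are seasonal precipitation values at the grid
    cells of one region on the common evaluation grid (flat lists).
    """
    n = len(obs)
    # Mean absolute difference between the two model resolutions.
    mean_abs_diff = sum(abs(f - c) for f, c in zip(p011, p044)) / n
    # Mean absolute bias of the coarse-gridded simulation (the divisor).
    mean_abs_bias = sum(abs(c - o) for c, o in zip(p044, obs)) / n
    return mean_abs_diff / mean_abs_bias
```

A ratio below one means that the change introduced by the finer grid is smaller than the error of the coarse simulation, so even a change in the right direction everywhere could remove only part of the bias.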

3.3 Analysis of grid cell biases

Here we investigate how well spatial patterns of extreme and mean precipitation are represented in the EURO-CORDEX \(0.11^{\circ }\) and \(0.44^{\circ }\) simulations by detecting regions of consistent improvements or deteriorations in the \(0.11^{\circ }\) simulations. The term consistent improvement/deterioration is used if more than six out of the eight \(0.11^{\circ }\) simulations (more than 75 %) show smaller/larger absolute biases at a specific grid cell than their \(0.44^{\circ }\) counterparts.
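The classification rule can be sketched in Python as follows. The data layout (one list of absolute grid cell biases per model, with the models in the same order for both resolutions) and the function name are assumptions of this sketch.

```python
def classify_cells(abs_bias_011, abs_bias_044, min_fraction=0.75):
    """Label each grid cell as improved, deteriorated, or mixed.

    abs_bias_011[m][c] and abs_bias_044[m][c] are the absolute biases of
    model m at grid cell c for the two grid spacings.
    """
    n_models = len(abs_bias_011)
    n_cells = len(abs_bias_011[0])
    labels = []
    for c in range(n_cells):
        better = sum(1 for m in range(n_models)
                     if abs_bias_011[m][c] < abs_bias_044[m][c])
        worse = sum(1 for m in range(n_models)
                    if abs_bias_011[m][c] > abs_bias_044[m][c])
        # "More than 75 %": with eight models, at least seven must agree.
        if better > min_fraction * n_models:
            labels.append("improved")
        elif worse > min_fraction * n_models:
            labels.append("deteriorated")
        else:
            labels.append("mixed")
    return labels


def area_fraction(labels, label):
    """Share of grid cells carrying the given label."""
    return labels.count(label) / len(labels)
```

The percentages of consistently improved or deteriorated area reported below correspond to `area_fraction` evaluated for one region and season, under the simplifying assumption that all grid cells represent equal areas.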

European Alps In the Alps extreme precipitation patterns are spatially and temporally highly variable (Fig. 4a, e, i and m) with two distinct hot spots around the Tessin and the Julian Alps (the sub-regions are indicated in Fig. 1b). In addition, the Ligurian Alps and the north-eastern Adriatic coast are highly affected by extreme precipitation in SON and DJF.

Extreme precipitation in the Tessin is well simulated in the multi-model-mean (except for DJF where an overestimation is dominant in the entire Western Alps) while extreme precipitation is underestimated in the Julian and Ligurian Alps.

Consistent improvements are found in 30–40 % of the evaluation grid cells, while consistent deteriorations are only found in 1–8 % (Fig. 4, right column).

In DJF (panel a–d) the domain wide average extreme precipitation bias is close to zero in both resolutions (Fig. 2a), but regionally large differences occur. The minimum and maximum values of the ensemble mean bias are larger in the \(0.11^{\circ }\) ensemble (panel b). However, there are large areas where biases are consistently improved (panel d). Added value is particularly visible south- and northward of the Alpine divide, while there are small areas of deteriorations (precipitation overestimation) along the Alpine divide.

In MAM (panel e–h) the domain wide average extreme precipitation bias is close to zero as in winter, and the largest differences between the \(0.44^{\circ }\) and \(0.11^{\circ }\) ensemble occur in the Western Alps. Consistent improvements in the \(0.11^{\circ }\) ensemble can be found along the entire Alpine chain, the Ligurian Alps and Adriatic coast.

In JJA (panel i–l) the domain wide average extreme precipitation is underestimated in both ensembles and most added value can be found in this season. Not only the spatial mean but also the minimum and maximum biases are improved. There are large areas in the southern and western parts of the Alps where extreme precipitation is consistently improved in the \(0.11^{\circ }\) runs (39 % of the entire area). In the \(0.44^{\circ }\) ensemble too much precipitation is produced along the Alpine divide and too little southward. Both error patterns are nicely corrected in the \(0.11^{\circ }\) simulations.

SON is the season with highest extreme precipitation in the Alps (panel m–p). Although there is a general underestimation of about 10 %, the basic patterns are well simulated in the fine and coarse gridded ensemble. Nevertheless, the \(0.11^{\circ }\) simulations have consistent improvements especially in the mountainous and coastal areas of the domain.

The basic error characteristics of mean precipitation (Online Resource 1 Fig. A2) are similar to those of extremes. The simulations are too dry in SON and JJA southward of the Alps and too wet in the Alps. In JJA the \(0.11^{\circ }\) simulations substantially mitigate the dry bias. In DJF and MAM too wet conditions are simulated in and northwards of the Alps. In all seasons mean precipitation is consistently improved in large areas by the \(0.11^{\circ }\) simulations (between 30 and 37 % of the evaluation domain). The location and the amount of improved areas are very similar to those of extreme precipitation.

Germany In Germany the season with the highest extreme precipitation amounts is JJA and the season with the lowest is DJF. There is only one major hot-spot, which is located at the borders to Austria and the Czech Republic and which is clearly related to topography. A minor hot-spot can be found in the western part of the Central German Uplands. Northern Germany shows a uniform gradient where extreme precipitation decreases from west to east (except for JJA).

In DJF (panel a–d in Fig. 5), but also in the transition seasons (not shown) this large-scale gradient is too weak in the EURO-CORDEX simulations, which leads to a growing overestimation of extreme precipitation towards the eastern part of Northern Germany. Areas which are consistently improved and deteriorated by the \(0.11^{\circ }\) runs are approximately in balance. One pattern, which is clearly improved, is the overestimation of extreme precipitation in the southern part of Germany.

In JJA the EURO-CORDEX simulations overestimate extreme precipitation in Central Germany. Extremes are underestimated in the southeast and particularly in the hot-spot region. Consistently improved and deteriorated areas are both small and added value is primarily found in the mountainous areas, while in the flat land, the results are often even deteriorated. Also in MAM and SON (not shown) the consistently improved areas are small and no clear advantage of the \(0.11^{\circ }\) simulations can be detected. Generally, added value of fine-grid spacings is tied to mountainous regions in Germany.

For JJA and SON mean precipitation (Online Resource 1 Fig. A3, SON not shown) improved and deteriorated areas in the fine-gridded runs are also small and balance each other. As for extremes, mean DJF precipitation (Online Resource 1 Fig. A3) is consistently improved in Southern Germany and additionally in Eastern Germany. For MAM (Fig. 10c) larger parts of Central Germany and the northeastern coast of Germany have a better representation of mean precipitation in the \(0.11^{\circ }\) ensemble.

Spain In Spain a north-south precipitation gradient with high amounts of mean and extreme precipitation in the north is present. Most of the precipitation is falling during DJF while JJA is very dry. Hot-spots of extreme precipitation are the Pyrenees, the Cantabrian Mountain Chain, and parts of the Mediterranean Coast during DJF and SON.

In DJF (panel a–d in Fig. 6), the intensity of extreme precipitation is too low in the \(0.44^{\circ }\) simulations. This is improved by the \(0.11^{\circ }\) runs. Consistent added value covers 43 % of the total territory, while deterioration is only found in 4 % of the region.

In JJA (panel e–h), the underestimation in the \(0.44^{\circ }\) simulations is also improved by the \(0.11^{\circ }\) runs. The consistently improved areas cover 30 % of Spain and are mainly located along the coastlines and the Pyrenees.

Similar or even larger improvements can be found in MAM and SON (not shown) where consistent improvements in the fine-gridded models appear in 44 % of the domain and have similar patterns as in DJF. For mean precipitation (Online Resource 1 Fig. A4) the \(0.11^{\circ }\) ensemble shows the largest advantages in SON, where 54 % of the domain is improved, followed by MAM (44 %), DJF (38 %), and JJA (20 %). The locations of the improved areas are similar to those of extreme precipitation.

Norway and Sweden The precipitation patterns in Norway and Sweden are dominated by the Scandinavian Mountains, which reach from southern Norway up to the North Cape. The mean and extreme precipitation patterns are quite homogeneous in Sweden, which is located downstream of the coastal mountain range. The band of most extreme precipitation follows the Norwegian coastline and is divided into two hot-spots. One is located in Western Norway and the second in the south of Northern Norway. The seasons with the highest mean and extreme precipitation are SON and DJF while spring has the lowest values.

In DJF (panel a–d in Fig. 7) the \(0.44^{\circ }\) simulations underestimate extreme precipitation in large parts of the domain but especially in the two hot-spot regions. The \(0.11^{\circ }\) runs have smaller biases but still underestimate extreme precipitation in large areas. The most consistent improvements can be found along the Atlantic coast and the Norwegian Mountains (25 % of the total area) but improvements barely occur downstream of the Scandinavian Mountains. Deteriorations are found in 12 % of the area but differences are small.

Smaller biases are found in JJA (panel e–h). Both ensembles still underestimate extreme precipitation in Norway and also most added value is found here. In Sweden no clear benefits are visible.

Simulated extreme precipitation in MAM shows similar shortcomings as in JJA and patterns in SON are comparable to those in DJF (not shown). Mean precipitation is underestimated along the Atlantic coast and overestimated in Sweden in all seasons (Online Resource 1 Fig. A5). Contrary to extreme precipitation, consistent improvements in the \(0.11^{\circ }\) simulations are not restricted to mountainous areas, but are also found in the flat areas of Sweden.

France In France extreme precipitation is heaviest in the south and is located in the Western Alps, the Pyrenees, the Central Massif, and Corsica. The season with the heaviest extremes is SON while during JJA extremes are weakest.

In DJF (panel a–d in Fig. 8) the \(0.44^{\circ }\) models underestimate extreme precipitation in southern France, where heaviest extremes occur, and overestimate them elsewhere. The \(0.11^{\circ }\) simulations can mitigate the dry bias consistently while biases in the rest of France remain the same.

Similar patterns can be seen in JJA (panel e–h) where a pronounced dry bias is apparent in the \(0.44^{\circ }\) simulations. Again the \(0.11^{\circ }\) models are found to reduce this dry bias consistently.

Also in MAM and SON (not shown) the same bias patterns occur in the \(0.44^{\circ }\) simulations and similar improvements can be found in the \(0.11^{\circ }\) runs. This is similar for mean precipitation in DJF and MAM (Online Resource 1 Fig. A6) but in JJA and SON all of France is too dry in the \(0.44^{\circ }\) simulations, which is improved in the \(0.11^{\circ }\) simulations. Consistent improvements are predominantly located in the South and along the Atlantic coast.

Carpathians Compared to the other investigated regions, the Carpathians feature moderate extreme precipitation amounts. The most intense season is JJA where extremes of up to \(40\,\hbox {mm}\,\hbox {d}^{-1}\) occur along the entire Carpathian Mountain chain.

In DJF (panel a–d in Fig. 9) and MAM (not shown) the models are too wet in the entire region except for the Southwest. The bias patterns are similar in both model resolutions except for a shift of precipitation. In the \(0.11^{\circ }\) simulations extreme precipitation tends to fall upstream of the Carpathians, which leads to deteriorations in their foothills towards the Southwest and to improvements above the mountains.

In JJA, the \(0.44^{\circ }\) simulations are too dry in the entire region except in the mountains. The \(0.11^{\circ }\) simulations show heavier extremes, which is a consistent improvement in 30 % of the region.

SON (not shown) is characterized by a wet bias in the mountains, which is improved by the \(0.11^{\circ }\) simulations in the Northeast. For mean precipitation (Online Resource 1 Fig. A7) the patterns are similar to those for extremes but the relative magnitudes of the biases are larger.

Areas with improved mean and extreme precipitation Very often the areas of consistent improvements in mean and extreme precipitation are similar. However, there are some notable exceptions, which are worth discussing.

Figure 10 shows the differences between the \(0.11^{\circ }\) and \(0.44^{\circ }\) multi-model-means (\(0.11^{\circ }\) minus \(0.44^{\circ }\)) relative to observations for mean (panel a) and extreme (panel b) MAM precipitation in Norway and Sweden. The patterns of extreme precipitation are very similar to JJA (Fig. 7h). Consistent improvements due to the fine-gridded models can predominantly be found in the mountainous region of Norway. However, for mean precipitation also large parts of the flat regions in Sweden are improved, which are located downstream of the Scandinavian Mountains. This is because extreme precipitation in Sweden is predominantly caused by south-easterly flow, which advects moist air from the Baltic Sea (Hellström 2005), while mean precipitation is more related to a zonal flow where a rain shadowing effect is present that is caused by the Scandinavian Mountains. Therefore, in the \(0.11^{\circ }\) simulations more precipitation is generated over the Scandinavian Mountains, which leads to less mean precipitation downstream in Sweden and thereby reduces the overall wet bias in the \(0.44^{\circ }\) models.

A similar behavior can be seen in Germany also during MAM (Fig. 10c, d). Extreme precipitation only improves in hilly and mountainous regions in the \(0.11^{\circ }\) models but mean precipitation also gets better in flat areas.

In panel e the net-improved-areas (improved minus deteriorated areas) of the \(0.11^{\circ }\) simulations are summarized for mean and extreme precipitation. In the Alps the net-improved-areas of mean and extreme precipitation are similar (difference \(<\)5 %). In Germany improvements are larger in mean precipitation during MAM (as shown in panel c–d) and in extreme precipitation in SON. Also in Sweden large differences are visible, especially in DJF and MAM (as shown in panel a–b), and the net-improved-areas of mean precipitation are always larger than those of extreme precipitation. In Spain, Norway, and the Carpathians improvements are partly larger for means and partly for extremes, with differences of up to 20 % in the Carpathians during SON. Differences in France are small. More generally speaking, these results do not suggest that extreme precipitation is improved more than mean precipitation.
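The net-improved-area statistic can be illustrated with a minimal sketch. It assumes the criterion stated in the caption of Fig. 4 (a grid cell counts as consistently improved when more than 75 % of the \(0.11^{\circ }\) runs have a smaller absolute error than their \(0.44^{\circ }\) counterparts) and, as a simplification that is not taken from the paper, it treats all grid cells as having equal area. The function name and the array layout are hypothetical.

```python
import numpy as np

def net_improved_area(abs_bias_fine, abs_bias_coarse, agreement=0.75):
    """Return (improved, deteriorated, net) area fractions in percent.

    abs_bias_fine, abs_bias_coarse: arrays of shape (n_models, ny, nx)
    holding the absolute bias of each fine-grid run and of its
    coarse-grid counterpart. Equal-area grid cells are an assumption.
    """
    fine = np.asarray(abs_bias_fine, dtype=float)
    coarse = np.asarray(abs_bias_coarse, dtype=float)
    # Share of model pairs in which the fine-grid run has the smaller error.
    frac_better = (fine < coarse).mean(axis=0)
    # Share of model pairs in which the fine-grid run has the larger error.
    frac_worse = (fine > coarse).mean(axis=0)
    improved = 100.0 * (frac_better > agreement).mean()
    deteriorated = 100.0 * (frac_worse > agreement).mean()
    return improved, deteriorated, improved - deteriorated
```

On a regular latitude-longitude grid an area weighting (for example by the cosine of latitude) could replace the plain mean over grid cells.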

Figure 10e also demonstrates that the \(0.11^{\circ }\) simulations outperform the \(0.44^{\circ }\) runs with regard to extreme and mean precipitation in all regions and seasons (with some exceptions in Germany during JJA and Sweden during DJF and JJA). The largest net-improved-area fractions can be found in Spain, followed by the Alps and Norway. In the Alps, the Carpathians, and France the season with the largest net-improved-area is JJA, while in the other regions JJA is among the seasons with the smallest net-improved-areas.

If we perform the same statistical analysis on a \(0.11^{\circ }\) evaluation grid (Online Resource 1 Fig. A8) most of the features described above stay the same but two remarkable differences deserve to be highlighted. First, on the \(0.44^{\circ }\) evaluation grid, the area of consistently improved summertime extreme precipitation is rarely larger than that of mean precipitation. In contrast, on the \(0.11^{\circ }\) evaluation grid the improved areas in JJA are larger for extreme precipitation (except in Sweden and the Carpathians). Second, hardly any improved areas are found in Germany on the coarse grid, but clear added value is indicated by the results of the evaluation on the fine grid. Also in other regions, improved areas are larger on the fine evaluation grid.

The reason for this is illustrated using the example of Germany during DJF in Fig. 11. Compared to Fig. 5a–d we can see that more fine-scale structures can be captured by the \(0.11^{\circ }\) models. This results in larger consistently improved areas compared to the analysis on a \(0.44^{\circ }\) evaluation grid and demonstrates that the \(0.11^{\circ }\) simulations produce realistic precipitation patterns beyond the grid spacing of the \(0.44^{\circ }\) models.

Summing up, the \(0.11^{\circ }\) simulations are found to consistently improve extreme and mean precipitation biases on grid point scale over large parts of Europe but especially in mountainous areas. Since heaviest precipitation is observed in the mountains these improvements can be valuable for flood protection or river runoff studies.

3.4 Scale dependence of spatial correlation coefficients

In contrast to the investigation of biases in Sects. 3.1, 3.2 and 3.3 here we focus on the spatial correlation between simulated and observed precipitation patterns on different spatial scales. Therefore, we calculate the Pearson product-moment correlation coefficient (insensitive to regionally averaged biases) for extreme and mean precipitation fields. Information about the scale dependence is derived by smoothing the fields with the method described in Sect. 2.3.
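As an illustration of this procedure, the following sketch computes the Pearson correlation between two fields after smoothing both with a moving-window mean. The smoothing method of Sect. 2.3 is not reproduced here; the square window with edge padding, the function names, and the odd window width are assumptions made only for this example.

```python
import numpy as np

def box_smooth(field, width):
    """Moving-window mean over width x width cells (odd width assumed).

    Edge padding is an assumption; it is a stand-in for the smoothing
    method of Sect. 2.3, which is not reproduced here.
    """
    field = np.asarray(field, dtype=float)
    half = width // 2
    padded = np.pad(field, half, mode="edge")
    ny, nx = field.shape
    out = np.zeros((ny, nx))
    # Sum all shifted copies of the field that fall inside the window.
    for dy in range(width):
        for dx in range(width):
            out += padded[dy:dy + ny, dx:dx + nx]
    return out / width**2

def pattern_correlation(sim, obs, width=1):
    """Pearson correlation between two 2-D fields after smoothing both."""
    s = box_smooth(sim, width).ravel()
    o = box_smooth(obs, width).ravel()
    return np.corrcoef(s, o)[0, 1]
```

Because the Pearson coefficient removes the field means, adding a constant offset to the simulated field leaves the result unchanged, which is the insensitivity to regionally averaged biases noted above. Repeating the call with increasing `width` gives the scale dependence.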

In the Alpine region (Fig. 12a–d), spatial correlation coefficients are improved in the \(0.11^{\circ }\) simulations uniformly across all investigated scales and all seasons except JJA. In JJA there is a constant decrease in improvement until approximately 400 km where half of the \(0.11^{\circ }\) simulations improve and the other half deteriorates the correlation coefficients.

In Germany the majority of \(0.11^{\circ }\) simulations improve the correlation coefficients in MAM and SON (panel f and h). In MAM there is no clear spatial dependence but in SON improvements are increasing on large scales. Deteriorations are found in DJF and especially in JJA. In the latter not a single \(0.11^{\circ }\) simulation was able to improve the correlation coefficients of its \(0.44^{\circ }\) counterpart.

In Spain almost all \(0.11^{\circ }\) simulations feature higher correlation in SON (panel l) and more than 75 % in DJF and MAM (panel i and j). In JJA there is a clear gradient, where more than half of the \(0.11^{\circ }\) ensemble improves the correlation coefficients below 400 km.

Clear improvements can be found in Sweden during DJF and especially during MAM (panel m and n). In the latter the entire \(0.11^{\circ }\) ensemble has higher correlation coefficients. In SON (panel p) improvements and deteriorations occur in equal shares, while in JJA (panel o) more than 75 % of the \(0.11^{\circ }\) ensemble has smaller correlation coefficients.

In Norway improvements and deteriorations in SON occur in equal shares (panel t). In DJF and MAM (panel q and r) improvements dominantly occur for scales above approximately 200 km. In JJA (panel s) improvements on scales below 450 km are found for more than half of the \(0.11^{\circ }\) simulations while results on larger scales are mostly deteriorating.

No clear scale dependency is found in the Carpathians (panel u–x). More than 75 % of the \(0.11^{\circ }\) simulations have higher correlation coefficients in all seasons except MAM.

Improvements in all seasons and at all scales can be found in France (panel y to bb). During MAM and SON nearly all \(0.11^{\circ }\) models have higher correlation coefficients whereas in DJF and JJA the ratio is approximately 75 %.

For mean precipitation (Online Resource 1 Fig. A9) generally less scale dependence is found than for extremes. The spread is smaller, and improvements are more consistent. In nearly all seasons and regions more than 75 % of the \(0.11^{\circ }\) simulations are found to improve the spatial correlation coefficients of their \(0.44^{\circ }\) counterparts. The only exception is JJA in Germany.

Summing up, most \(0.11^{\circ }\) simulations are found to improve spatial correlation coefficients over a wide range of scales. This means that spatial patterns, like the location of precipitation hot-spots or areas with weaker precipitation, are better represented at spatial scales from the meso scale (\({\sim}50\,\hbox {km}\)) to the regional scale (\({\sim}400\,\hbox {km}\)). The typically weak spatial-scale dependency of the pattern correlation coefficients might be related to the spatial extent of the orographic features in the investigated regions, which have a similar size to the spatial scales investigated in Fig. 12. Stronger scale dependencies might be present on synoptic to continental scales.

3.5 Intensity dependence of spatial correlation coefficients

While the spatial-scale dependencies of correlation coefficients were analyzed in Sect. 3.4, here we focus on their intensity dependence. Usually different synoptic situations lead to different precipitation intensities and therefore model errors are often intensity-dependent. For this investigation grid cell precipitation was binned in 2.5 % classes for values above the 50 % percentile. The 0–50 % percentiles, which include mostly no to weak precipitation values, were additionally binned into one class. Thereafter, the resulting spatial correlation coefficients were calculated for each precipitation class (bin).
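The binning can be sketched as follows. This is one possible reading of the description above, not the authors' implementation: at every grid cell the time series is ranked, the values falling into each percentile class are averaged, and the resulting fields of simulation and observation are correlated class by class. The function names, the rank-based class assignment, and the array layout (time, y, x) are assumptions.

```python
import numpy as np

# One class for the 0-50 % range, then 2.5 % wide classes up to 100 %.
PERCENTILE_EDGES = np.concatenate(([0.0], np.linspace(50.0, 100.0, 21)))

def class_fields(precip, edges=PERCENTILE_EDGES):
    """Mean precipitation per percentile class at every grid cell.

    precip has shape (nt, ny, nx); the result has shape (n_classes, ny, nx).
    Classes are assigned by rank, so every class holds at least one value.
    """
    ranked = np.sort(np.asarray(precip, dtype=float), axis=0)
    nt = ranked.shape[0]
    fields = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        i0 = min(int(np.floor(lo * nt / 100.0)), nt - 1)
        i1 = max(i0 + 1, int(np.ceil(hi * nt / 100.0)))
        fields.append(ranked[i0:i1].mean(axis=0))
    return np.stack(fields)

def correlation_by_class(sim, obs, edges=PERCENTILE_EDGES):
    """Spatial Pearson correlation between sim and obs for each class."""
    fields_sim = class_fields(sim, edges)
    fields_obs = class_fields(obs, edges)
    return np.array([
        np.corrcoef(a.ravel(), b.ravel())[0, 1]
        for a, b in zip(fields_sim, fields_obs)
    ])
```

With short time series neighbouring classes may share a value, so a long record is needed for the 2.5 % classes to be distinct.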

In the Alps, Spain, and France (Fig. 13a–d, i–l and y–bb) the \(0.11^{\circ }\) simulations improve the correlation more for high intensities (except for JJA in Spain, and MAM in France). For the other regions improvements are predominantly larger for light precipitation.

In the Alps more than 75 % of the \(0.11^{\circ }\) simulations show improvements in all seasons and for all intensities (except for light rain in DJF). In Germany (panels e–h) light to moderate intensities are improved in all seasons. In JJA most models show only small differences between the two resolutions. In Spain (panels i–l) most fine gridded simulations feature higher correlation coefficients on all scales (except in SON for light precipitation). In JJA there is a strong intensity dependence where largest improvements occur for low intensities. Differences in the correlation coefficients are rather small in Sweden (panels m–p) while in Norway (panels q–t) large improvements for light precipitation are found. In JJA no intensity dependence is visible, while in the other seasons improvements are smaller for higher intensities. In the Carpathians light precipitation is more improved during JJA (panel w), while no clear intensity-dependence is found in the other seasons. In France extremes are more improved than light precipitation in all seasons except in MAM.

If we repeat this analysis on a \(0.11^{\circ }\) evaluation grid, the intensity dependencies remain the same, but the higher correlation coefficients of the \(0.11^{\circ }\) simulations are even more pronounced (Online Resource 1 Fig. A10).

Summing up, spatial correlation coefficients of mean and extreme precipitation are larger in most of the \(0.11^{\circ }\) simulations over a wide range of precipitation intensities. Because precipitation intensity is strongly related to the synoptic situation, this finding indicates that the fine gridded simulations improve the representation of precipitation patterns for a variety of weather situations.

3.6 Daily spatial precipitation structure

Until now we analyzed precipitation in climatological fields (e.g., median, mean, extreme). Here we are directly comparing observed with simulated precipitation patterns on a day-to-day basis. This can elucidate further added value, since in climatological fields daily model errors may cancel out.

Evaluating precipitation patterns on daily timescales can be challenging because of double penalty problems (e.g., Prein et al. 2013a). Here the FSS method is applied, which is able to avoid the double penalty problem by allowing spatial displacements (see Sect. 2.3 or Roberts and Lean (2008) for more details).
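For orientation, the fractions skill score of Roberts and Lean (2008) can be sketched in a few lines. Both fields are converted to binary exceedance maps, the fraction of exceeding cells is computed in a square neighbourhood around every grid cell, and the mean squared difference of the two fraction fields is normalized by its largest possible value. The zero padding at the domain edge, the function names, and the odd neighbourhood width are assumptions of this example; the exact settings of Sect. 2.3 are not reproduced.

```python
import numpy as np

def neighbourhood_fraction(binary, width):
    """Fraction of exceeding cells in a width x width window (odd width).

    Cells outside the domain are counted as non-exceeding (assumption).
    """
    binary = np.asarray(binary, dtype=float)
    half = width // 2
    padded = np.pad(binary, half, mode="constant", constant_values=0.0)
    ny, nx = binary.shape
    out = np.zeros((ny, nx))
    for dy in range(width):
        for dx in range(width):
            out += padded[dy:dy + ny, dx:dx + nx]
    return out / width**2

def fss(sim, obs, threshold, width):
    """Fractions skill score for one threshold and one neighbourhood width."""
    frac_sim = neighbourhood_fraction(np.asarray(sim) >= threshold, width)
    frac_obs = neighbourhood_fraction(np.asarray(obs) >= threshold, width)
    mse = np.mean((frac_sim - frac_obs) ** 2)
    mse_ref = np.mean(frac_sim ** 2) + np.mean(frac_obs ** 2)
    # The score is undefined when neither field exceeds the threshold.
    return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan
```

A single wet cell displaced by one grid cell gives a score of 0 for a width of 1 and a score of 2/3 for a width of 3, which shows how larger neighbourhoods tolerate displacements instead of penalizing them twice.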

The differences in the median FSSs (\(0.11^{\circ }\) minus \(0.44^{\circ }\) simulations, see Fig. 14) are mostly positive, meaning that the \(0.11^{\circ }\) models have higher skill in simulating daily patterns of precipitation than their \(0.44^{\circ }\) counterparts. Only for moderate precipitation thresholds (1–10 mm/day) and horizontal scales beyond 400 km can some small deteriorations be identified in Germany (panel m–p), DJF in Spain (panel y), and France during DJF (panel i). Usually, improvements are seen on small horizontal scales (below 200 km) for thresholds up to 5 mm/day and for all scales for thresholds between 5 and 30 mm/day. Largest improvements are found for moderate to intense daily precipitation sums (10–30 mm/day) and allowed displacements larger than 200 km.

Comparing the different regions, largest improvements are found in the Alps (panels a–d) and in Norway (panels q–t), whereas less improvement is found in Germany (panels m–p) and Sweden (panels u–x). In Germany and Sweden improvements are similar in different seasons. In the Alps and the Carpathians largest improvements are found in JJA and lowest in DJF whereas in Norway the opposite is the case. In Spain the transition seasons show largest improvements. This is in good agreement with findings in Sect. 3.3 (see Fig. 10e).

These results indicate that the \(0.11^{\circ }\) simulations are not only capable of improving climatological average precipitation but also precipitation on a daily basis. This means that the fine gridded simulations yield improved precipitation patterns and intensities on the weather timescale. For studies related to, e.g., hydrology or droughts this is important, since they require a correct representation and sequence of weather conditions.

3.7 Daily and 3-hourly precipitation distributions

In this section the shape of simulated daily and 3-hourly precipitation distributions (hourly only available for Switzerland) on grid point basis (\(0.44^{\circ }\) evaluation grid) is compared to observations. Contrary to the analyses in the previous sections, temporal or spatial mismatches do not affect the results of this analysis since the distributions only depend on the frequency of precipitation intensities irrespective of where or when they occur in a season.

Figure 15 shows that the \(0.11^{\circ }\) models tend to have higher extreme precipitation values than the \(0.44^{\circ }\) simulations. This is beneficial in MAM where the \(0.11^{\circ }\) simulations improve the representation of extreme precipitation in all regions except the Carpathians and the Alps (thick red lines are closer to the diagonal than the thick white lines). In DJF, however, the \(0.11^{\circ }\) models only improve extremes in Norway while they deteriorate their representation elsewhere. The regions with the most consistent improvements across seasons are Germany and Norway. In the Carpathians no improvements are seen (except for SON).

The \(0.11^{\circ }\) simulation spread (red shaded areas) is smaller than the spread of the \(0.44^{\circ }\) simulations (blue contours) during JJA, except for the Carpathians. The spread does not change in DJF while MAM and SON show mixed results.

Extreme precipitation events often have small spatial and temporal extents. Therefore, this evaluation is very sensitive to the underlying temporal and spatial resolution. On a \(0.11^{\circ }\) evaluation grid (Online Resource 1 Fig. A11) the shown improvements are more pronounced. This indicates that the \(0.11^{\circ }\) simulations reproduce extreme events on scales smaller than \(0.44^{\circ }\) more realistically.

To investigate the difference between the observed and simulated precipitation distributions on a sub-daily (3-hourly) scale we have used the RdisaggH data set (Table 2), which provides data for Switzerland within the period May 2003 to December 2007. Figure 16 shows that in Switzerland the distribution of the median of the \(0.11^{\circ }\) models is always closer to the observed distribution than the median of the \(0.44^{\circ }\) models (except for daily DJF and JJA, panel b and f). Additionally, also the simulated spread is smaller in the \(0.11^{\circ }\) ensemble (except for daily and hourly DJF and daily MAM).

In general, improvements in the \(0.11^{\circ }\) simulations are larger for 3-hourly precipitation than for daily values and for high intensities. Especially the maximum values are well represented in all seasons except DJF 3-hourly. Remarkable is the improvement in SON where daily extremes are overestimated in the \(0.44^{\circ }\) runs. This is corrected in the \(0.11^{\circ }\) simulations. At the same time 3-hourly precipitation maxima are underestimated by the \(0.44^{\circ }\) models, which is improved in the \(0.11^{\circ }\) simulations as well. These improvements can only be achieved when precipitation intensity is increased on short time scales while precipitation duration is decreased.

If the same evaluation in Switzerland is performed on the \(0.11^{\circ }\) evaluation grid (Online Resource 1 Fig. A12), improvements in the \(0.11^{\circ }\) simulations become larger (except for DJF and SON daily).

Investigating the dry-day frequency and moderate precipitation intensities (below \(25\,\hbox {mm}\,\hbox {d}^{-1}\)), which are barely visible in Fig. 15, reveals that for most regions and seasons the simulated dry-day frequency is too low (except for Spain in all seasons and France and the Carpathians during JJA and SON; Fig. A13). The dry-day frequency tends to be equal or lower in the \(0.11^{\circ }\) simulations compared to the \(0.44^{\circ }\) models (except for DJF in Sweden). Moderate precipitation intensities (between \(0.1\,\hbox {mm}\,\hbox {d}^{-1}\) and \(25\,\hbox {mm}\,\hbox {d}^{-1}\)) tend to be slightly more frequent in the high-resolution models (Fig. A13).
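The two frequencies discussed here can be computed as in the following sketch. The wet-day threshold of \(0.1\,\hbox {mm}\,\hbox {d}^{-1}\) is inferred from the lower bound of the moderate class given above and is an assumption, as are the function names and the array layout with time as the first axis.

```python
import numpy as np

def dry_day_frequency(daily_precip, wet_threshold=0.1):
    """Percentage of days below the wet-day threshold (mm/d) per grid cell."""
    daily = np.asarray(daily_precip, dtype=float)
    return 100.0 * np.mean(daily < wet_threshold, axis=0)

def moderate_day_frequency(daily_precip, lower=0.1, upper=25.0):
    """Percentage of days with lower <= precipitation < upper (mm/d)."""
    daily = np.asarray(daily_precip, dtype=float)
    return 100.0 * np.mean((daily >= lower) & (daily < upper), axis=0)
```

Comparing these percentages between the two ensembles and the observations corresponds to the low-intensity part of the distributions that is barely visible in Fig. 15.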

4 Summary

In this study mean and extreme (above 97.5 %) precipitation in 16 evaluation experiments from the EURO-CORDEX initiative with horizontal grid spacings of \(0.11^{\circ }\) and \(0.44^{\circ }\) (8 each) are compared to highly resolved observation data sets in 7 European regions (Alps, Germany, France, Sweden, Norway, Spain, and the Carpathians). The main goal was to find out where differences between the fine and the coarse gridded simulations occur and if these differences result in an improved or deteriorated representation of precipitation in the \(0.11^{\circ }\) models.

Our evaluation strategy focused on:

  1. investigating spatial and seasonal median biases and spatial error ranges in the seven investigated sub-regions (Sects. 3.1, 3.2),

  2. assessing spatial distribution of seasonal mean biases and the evaluation of consistent improvements/deteriorations of seasonal mean absolute biases in the \(0.11^{\circ }\) simulations compared to the \(0.44^{\circ }\) models on the grid cell scale (Sect. 3.3),

  3. evaluating spatial pattern correlation coefficients as a function of spatial scales (Sect. 3.4) and precipitation intensities (Sect. 3.5),

  4. analyzing precipitation structures and intensities on a daily basis (Sect. 3.6),

  5. and investigating the simulation of daily and 3-hourly precipitation distributions (Sect. 3.7).

In general, no added value was found in regional and seasonal mean and median precipitation (cf. Figs. 2, 4, 5, 6, 7, 8 and 9). The \(0.11^{\circ }\) simulations tend to increase precipitation by reducing the dry-day frequency and by increasing the frequency and intensity of light, moderate, and especially extreme precipitation (cf. Figs. 15, 16, and Fig. A13). Analyzing precipitation differences on a local (e.g., grid cell) basis (cf. Figs. 4, 5, 6, 7, 8 and 9) reveals that the \(0.11^{\circ }\) simulations produce more precipitation especially in areas that are upstream (regarding the predominant westerly wind direction in Europe) of mountain ranges and simulate less precipitation in downstream areas (precipitation shadowing effect). This effect is best visible during DJF because of the strong synoptic-scale flow. Examples are shown for the Carpathians (Fig. 9d and A7 d), Sweden (Fig. A5 d and Fig. 10a, b), the Alps (Fig. A2 d), and Spain (Fig. 6d and Fig. A4 d). These orographically induced differences in the \(0.11^{\circ }\) simulations tend to consistently reduce the precipitation biases in most of the \(0.11^{\circ }\) models and affected regions. Therefore, the regions with the largest areas of consistently improved biases have topographically complex features (e.g., the Alps, Norway, Spain) or are directly affected by mountain ranges (cf. Fig. 10e) such as Sweden, which is shielded by the Scandinavian Mountains towards the West. The strong influence of mountains on the improved precipitation features in the \(0.11^{\circ }\) simulations is also shown in the decrease of spatial error ranges (predominant blue colors in Fig. 2 and A1 in the Alps, Spain or Norway) and the higher improvements in the FSS statistics (cf. Fig. 14 Alps and Norway).

Spatial correlation coefficients for different precipitation intensities show that the \(0.11^{\circ }\) simulations are superior to their \(0.44^{\circ }\) counterparts in representing precipitation patterns for light precipitation in virtually all regions and seasons (see Fig. 13). These improvements get even larger for high precipitation intensities in the Alps, Spain, and France while they tend to get smaller or stay unaltered in the other regions. The spatial-scale dependence of the correlation coefficients is generally weak (cf. Fig. 12 and Fig. A9). This might be related to the too small spatial extent of the regional data sets, which is typically on the order of a few hundred kilometers and therefore beyond the synoptic scale. Strongest scale dependencies of extreme precipitation occur in mountain regions (Alps, Spain, Norway) during JJA. Improvements in correlation coefficients are limited to scales below approximately 400 km. This is probably related to the predominance of convective storms that are the major source of extreme precipitation during JJA in these regions.

A clear result from our analysis is that added value in the \(0.11^{\circ }\) simulations is not restricted to extreme precipitation but is partly even larger in mean precipitation statistics on local scales. An example is shown by the improvements of biases in Sweden during MAM (see Fig. 10a, b).

Improvements in the \(0.11^{\circ }\) simulations are more pronounced when evaluations are performed on a \(0.11^{\circ }\) evaluation grid (all data is remapped on a common \(0.11^{\circ }\) instead of a \(0.44^{\circ }\) grid; compare e.g., Fig. 11 with Fig. 5a–d or Fig. 13 with Fig. A10). This indicates that the \(0.11^{\circ }\) models are found to produce realistic precipitation patterns beyond the grid spacing of the \(0.44^{\circ }\) simulations.

5 Discussion

There are some important differences between the presented results and the findings in the EURO-CORDEX standard evaluation paper by Kotlarski et al. (2014). The results agree that there is no added value in seasonally and regionally averaged mean precipitation (cf. Fig. A1). However, they disagree in that Kotlarski et al. (2014) did not find improvements in the spatial pattern correlation of mean seasonal precipitation, whereas such improvements are shown here (e.g., Fig. A9). Furthermore, Kotlarski et al. (2014) found a general wet bias in most seasons and over most of Europe, which cannot be confirmed by our findings (cf. Fig. A1).

The reason for these differences is probably the use of different observational data sets. Kotlarski et al. (2014) use the E-OBS gridded data set (Haylock et al. 2008), while we use gridded regional data sets, which have a finer grid spacing, higher observation station densities, and are partially corrected for precipitation undercatch. The differences between the E-OBS and the regional data sets as well as the implications for model evaluation are shown in Prein and Gobiet (2015). By using the same observational data sets for the European Alps and Spain, Casanueva et al. (2015) show improvements in the spatial pattern correlation and biases similar to those shown here.

Kotlarski et al. (2014) did not explicitly address the added value of a finer grid spacing and left this topic for further analysis. However, they stated that they would expect benefits for quantities such as daily precipitation intensities and small-scale spatial climate variability in topographically structured terrain, which is confirmed in the present study.

Our results are consistent with previous studies that addressed the added value of smaller horizontal grid spacings in simulating precipitation. Rauscher et al. (2010) showed improving spatial patterns and temporal evolution of summertime precipitation for the ENSEMBLES simulations by comparing 25 km grid spacing simulations with 50 km gridded simulations. Largest improvements have been found in topographically complex regions (Rauscher et al. 2010), which is also confirmed by a study of Chan et al. (2013). The reason why Rauscher et al. (2010) did not find improvements in DJF precipitation might be the coarser-grid spacing of the 25 km simulations and the usage of a different precipitation data set (E-OBS).

Jacob et al. (2014) state that the biggest differences in the climate change signals between the EURO-CORDEX fine-gridded (\(0.11^{\circ }\)) and coarse-gridded (\(0.44^{\circ }\)) simulations occur in the change pattern for heavy precipitation events. They find a smoother shift from weak to moderate and high intensities. They relate the more detailed spatial patterns of the \(0.11^{\circ }\) grid spacing simulations to better resolved physical processes like convection and heavy precipitation, and to a better representation of surface characteristics and their spatial variability, which is supported by our findings.

5.1 Sources for added value

In this subsection we will investigate why the \(0.11^{\circ }\) simulations are able to improve the representation of precipitation compared to their \(0.44^{\circ }\) counterparts. Therefore, we try to get insights in differences between the following three factors, which affect precipitation (Sawyer 1956):

  • large-scale atmospheric circulation by comparing the simulation of sea level pressure (Fig. 17),

  • cloud processes and convection by analyzing the convective-to-total precipitation ratio (Fig. 18a–d), and

  • the interaction of the atmospheric flow with the surface (particularly with the orography) by comparing the variability in the 700 hPa vertical wind speed (Fig. 19e–h).

The simulated differences in sea level pressure are typically below 0.6 hPa (see contours in Fig. 17d, h). Even though areas of consistent improvements are detectable in the \(0.11^{\circ }\) simulations (especially during DJF over the Mediterranean and Eastern Europe), the differences between the \(0.11^{\circ }\) and \(0.44^{\circ }\) simulations are an order of magnitude smaller than the biases in the simulated sea level pressure (Fig. 17b, c and f, g). These differences can partly contribute to improvements found for DJF but are probably too small to be the major source of added value. For this evaluation all simulations, except those of the REMO model, were used (REMO data was not available).

The effect of changing the grid spacing on cloud processes and convection is estimated by comparing the convective-to-total precipitation ratio between the \(0.11^{\circ }\) and \(0.44^{\circ }\) simulations of the CCLM-CLMCOM, WRF-IPSL-INERIS, and RCA4-SMHI models (the data for the other models were not available). Convective precipitation is produced by the deep convection schemes (related to sub-grid-scale convection) while large-scale precipitation is explicitly resolved on the model grid. In DJF (Fig. 18a–c) no major differences are seen over land areas (except for the southwest of the Iberian Peninsula). During JJA the \(0.11^{\circ }\) runs tend to reduce the proportion of convective precipitation in most of the investigated areas (Fig. 18d–f). This is in line with findings by Rauscher et al. (2010), who analyzed the ENSEMBLES RCMs. There is no visible relationship between changes in the convective-to-total precipitation ratio and consistently improved areas (dashed contours). The lower ratio of precipitation generated by the deep convection parameterization schemes of the \(0.11^{\circ }\) models means that more precipitation is explicitly generated by the model dynamics.
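The ratio itself is simple to compute from the two precipitation components that models with a deep convection scheme typically provide. The sketch below assumes arrays with time as the first axis and hypothetical variable names; the actual output variable names of the individual models are not given in the text.

```python
import numpy as np

def convective_ratio(convective, large_scale):
    """Convective-to-total precipitation ratio per grid cell.

    Both inputs have shape (nt, ny, nx). Cells without any precipitation
    are returned as NaN instead of dividing by zero.
    """
    conv = np.asarray(convective, dtype=float).sum(axis=0)
    total = conv + np.asarray(large_scale, dtype=float).sum(axis=0)
    ratio = np.full(total.shape, np.nan)
    np.divide(conv, total, out=ratio, where=total > 0)
    return ratio
```

The quantity compared between resolutions is then the difference of this ratio, computed after both fields have been brought to a common grid.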

Fig. 1
figure 1

EURO-CORDEX domain (colored area in a) and orography therein (contour). Colored overlays depict the evaluated regions. The Alpine data set includes areas in Germany and France (dashed regions). b depicts the locations of important sub-regions, which are discussed in the text

Fig. 2
figure 2

Scatter plots showing \(0.11^{\circ }\) (x-axis) against \(0.44^{\circ }\) (y-axis) simulated median extreme precipitation biases for DJF (left column) and JJA (right column) averaged over the Alps, Germany, Spain, and Sweden, (top down left) and Norway, France, and the Carpathians (top down right). Symbol colors show differences in the spatial error variability (Q75 \(-\) Q25; \(0.11^{\circ }\) minus \(0.44^{\circ }\)) in percent (relative to \(0.44^{\circ }\)). A \(0.11^{\circ }\) simulation has a smaller (larger) absolute bias if its symbols is located in the green (red) areas of the plot

Fig. 3
figure 3

Absolute precipitation differences in precipitation (\(0.11^{\circ }\) minus \(0.44^{\circ }\) simulations) divided by the absolute biases in the \(0.44^{\circ }\) simulations. Results for mean/extreme precipitation are shown in green/red. Upper/lower plots show JJA/DJF

Fig. 4
figure 4

Observed extreme precipitation (mean of all values above the 97.5 percentile) in the Alps (first column). The second (third) column shows the relative biases in the \(0.11^{\circ }\) (\(0.44^{\circ }\)) multi-model-mean. Filled contours in the fourth column show differences between the \(0.11^{\circ }\) minus the \(0.44^{\circ }\) multi-model-mean relative to the observation. Red (blue) shaded areas depict regions where more then 75 % of the \(0.11^{\circ }\) (\(0.44^{\circ }\)) simulations have smaller errors than the corresponding \(0.44^{\circ }\) (\(0.11^{\circ }\)) runs. Below the first three columns the mean, maximum (Max), and minimum (Min) values are displayed while below the fourth panel the areal coverage of improved (red; IMPRO) and deteriorated (blue; DETER) shaded areas in the \(0.11^{\circ }\) simulations are shown. The thick black contour line shows the 800 m height level in the \(0.11^{\circ }\) orography

Fig. 5

Same as in Fig. 4 but for Germany in DJF and JJA

Fig. 6

Same as in Fig. 4 but for Spain in DJF and JJA

Fig. 7

Same as in Fig. 4 but for Sweden and Norway in DJF and JJA

Fig. 8

Same as in Fig. 4 but for France in DJF and JJA

Fig. 9

Same as in Fig. 4 but for the Carpathians in DJF and JJA

Fig. 10

a–d are similar to the right column of Fig. 4. a and b show results for Norway and Sweden and c and d for Germany in MAM. Statistics for mean precipitation are depicted in panels a and c and for extreme precipitation in panels b and d. e depicts an overview of the net consistently improved areas (improved minus deteriorated areas in the \(0.11^{\circ }\) simulations)

As a proxy for the interaction between the atmosphere and the orography we investigate the standard deviation of hourly vertical wind speed at 700 hPa (Fig. 19). Since vertical wind speed is not a standard output variable in the CORDEX framework and an hourly frequency is beneficial (at lower frequencies upward and downward motions might cancel out), we investigate data from a 4-year-long (2006 to 2009) simulation with the CCLM-CLMCOM model. The choice of the 700 hPa level is a compromise between being high enough not to intersect with orography and low enough to still see a strong influence of orography on vertical motions. In DJF we see much higher variability in the \(0.11^{\circ }\) simulations (panel a) than in the \(0.44^{\circ }\) runs (panel b), especially over mountainous regions. This is most likely related to the better resolved orography and therefore steeper slopes in the \(0.11^{\circ }\) simulations. In Fig. 19c areas with higher vertical wind standard deviation in the \(0.11^{\circ }\) run overlap with, or are surrounded by, consistently improved areas (dashed contours). During JJA the synoptic-scale flow is generally weaker but the stratification of air masses is typically more unstable than in DJF. Again, vertical wind speed is more variable in the fine-gridded simulations (panel d). In contrast to DJF, the largest variability is not constrained to mountainous regions but covers almost all land regions south of 50° North. In large parts of this area consistent improvements in the \(0.11^{\circ }\) simulations can be found. In summary, the standard deviations of vertical wind speed are larger in the \(0.11^{\circ }\) simulations, and the areas with high standard deviation are far less confined to mountainous regions than in the \(0.44^{\circ }\) simulations.
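The reason for preferring hourly output can be illustrated with a small synthetic example. The sketch below is not based on model data. It assumes a hypothetical vertical wind series consisting of a pure 24-hour oscillation with an amplitude of 0.5 m/s, and compares the standard deviation of the hourly values with that of the daily means.

```python
import math
import statistics

HOURS_PER_DAY = 24
N_DAYS = 10

# Hypothetical hourly vertical wind (m/s): upward and downward motion
# alternate within each day (assumed for illustration only).
hourly_w = [
    0.5 * math.sin(2.0 * math.pi * hour / HOURS_PER_DAY)
    for hour in range(N_DAYS * HOURS_PER_DAY)
]

# Daily means: upward and downward phases cancel within each day.
daily_w = [
    sum(hourly_w[day * HOURS_PER_DAY:(day + 1) * HOURS_PER_DAY]) / HOURS_PER_DAY
    for day in range(N_DAYS)
]

std_hourly = statistics.pstdev(hourly_w)  # retains the sub-daily variability
std_daily = statistics.pstdev(daily_w)    # close to zero after averaging

print(f"hourly std: {std_hourly:.3f} m/s, daily-mean std: {std_daily:.3f} m/s")
```

In this idealized case the variability vanishes almost completely once the data are averaged to daily values, which is the cancellation effect referred to above. Real vertical wind fields would of course cancel only partially.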

It is important to mention that the CCLM is a non-hydrostatic model, which is able to simulate vertical movements due to atmospheric instabilities (buoyancy effect). Non-hydrostatic processes (e.g., deep convection) have scales smaller than approximately 10 km (e.g., Kalnay 2003). Such processes start to be resolved in the \(0.11^{\circ }\) run but are unresolved in the \(0.44^{\circ }\) run.

Fig. 11

Same as in Fig. 4 but for Germany in DJF evaluated on a \(0.11^{\circ }\) evaluation grid

Fig. 12

Differences (\(0.11^{\circ }\) minus \(0.44^{\circ }\) simulations) in spatial correlation coefficients of extreme precipitation as a function of smoothing window size. Alps, Germany, Spain, Sweden, Norway, Carpathians, and France are shown in columns (from left to right) and DJF, MAM, JJA, and SON are depicted in top down order. The thick lines show the median model. Dark (light) shaded areas depict the Q25–Q75 (Q0–Q100) distance. Blue (red) colors indicate higher (lower) correlation coefficients in the \(0.11^{\circ }\) simulations

Fig. 13

Same as in Fig. 12 but for different precipitation intensities (x-axis)

Fig. 14

Median differences in FSSs (\(0.11^{\circ }\) minus \(0.44^{\circ }\)) for daily precipitation events. Blue (red) colors indicate higher (lower) FSSs in the \(0.11^{\circ }\) simulations. From left to right DJF, MAM, JJA, and SON are displayed, while top down the Alps, Carpathians, France, Germany, Norway, Sweden, and Spain are shown
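How the FSS depends on the precipitation threshold and on the allowed displacement can be illustrated with a minimal sketch. It assumes the commonly used definition, in which both fields are converted to neighborhood exceedance fractions and FSS = 1 − Σ(P_mod − P_obs)² / (ΣP_mod² + ΣP_obs²). The function names, the zero padding at the domain edges, and the 5 × 5 test fields are assumptions for illustration and do not reproduce the evaluation code of this study.

```python
def neighborhood_fractions(field, threshold, half_width):
    """Fraction of cells at or above the threshold in a square window around
    each cell. Cells outside the domain count as non-exceeding (zero padding)."""
    n_rows, n_cols = len(field), len(field[0])
    window_cells = (2 * half_width + 1) ** 2
    fractions = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            hits = 0
            for di in range(-half_width, half_width + 1):
                for dj in range(-half_width, half_width + 1):
                    ii, jj = i + di, j + dj
                    inside = 0 <= ii < n_rows and 0 <= jj < n_cols
                    if inside and field[ii][jj] >= threshold:
                        hits += 1
            row.append(hits / window_cells)
        fractions.append(row)
    return fractions


def fss(model, obs, threshold, half_width):
    """Fractions skill score of a model field against an observed field."""
    p_mod = neighborhood_fractions(model, threshold, half_width)
    p_obs = neighborhood_fractions(obs, threshold, half_width)
    pairs = [(m, o) for rm, ro in zip(p_mod, p_obs) for m, o in zip(rm, ro)]
    num = sum((m - o) ** 2 for m, o in pairs)
    den = sum(m ** 2 + o ** 2 for m, o in pairs)
    return 1.0 - num / den if den > 0 else float("nan")


def single_event(row, col, size=5, amount=20.0):
    """Hypothetical daily field (mm/day) with one wet cell."""
    return [[amount if (i, j) == (row, col) else 0.0 for j in range(size)]
            for i in range(size)]


obs_field = single_event(2, 2)
model_field = single_event(2, 3)  # same event, displaced by one grid cell

print(fss(model_field, obs_field, threshold=10.0, half_width=0))
print(fss(model_field, obs_field, threshold=10.0, half_width=1))
```

With no neighborhood (`half_width=0`) the displaced event receives no credit, whereas a window that spans the displacement yields a clearly positive score. This is why the scores are evaluated as a function of both intensity threshold and spatial scale.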

Summing up, we have made it plausible that the major drivers for the added value in the \(0.11^{\circ }\) simulated precipitation are the better resolved model orography and the fact that in the fine-gridded simulations the larger scales in convection are captured by the resolved-scale dynamics, which turns out to be beneficial for the model performance. Added value can therefore commonly be found in regions with complex orography (Pyrenees, Alps, Scandinavian Mountains) or in their surroundings (e.g., rain shadow effect in Sweden, Po valley). A similar result has been found by Beck et al. (2004). They performed regional downscaling over the European Alps with 12 km horizontal grid spacing but with a smoothed model orography representative of a 50 km grid spacing and found that the improvements in their unsmoothed 12 km simulations (compared to a 50 km grid spacing simulation) can be largely attributed to the strong surface forcing in the Alps. Prein et al. (2013b) also showed that a grid spacing of at most 12 km is necessary to reproduce observed precipitation patterns in the headwaters region of the Colorado River, and Chan et al. (2013) showed comparable results for the southern part of Great Britain.

6 Conclusions

The results presented in this study strongly suggest that the EURO-CORDEX \(0.11^{\circ }\) hindcast simulations add value to the representation of extreme and mean precipitation compared to their \(0.44^{\circ }\) counterparts by:

  1. consistently (more than 6 out of 8 simulations) reducing seasonal biases on the grid scale in large parts (up to 50 % of the total area) of the investigated regions,

  2. improving the seasonal mean spatial patterns of precipitation especially for high precipitation intensities (above \(\sim\)90 percentile) in the Alps, France, and Spain and low intensities (below \(\sim\)80 percentile) in Germany, Sweden, and Norway,

  3. simulating more realistic daily precipitation patterns (spatial distribution and intensity of precipitation) especially for intensities above 10 mm/day and when displacements beyond 200 km are allowed,

  4. adding skillful information beyond the grid spacing of the \(0.44^{\circ }\) simulations,

  5. improving the representation of daily and especially 3-hourly precipitation distributions in Switzerland.

However, on regional scales (e.g., the Alps, the Carpathians) added value in precipitation biases tends to cancel out by averaging. Therefore, the added value is most pronounced on local scales below \({\sim}400\,\hbox {km}\).

The primary reasons for the detected added value seem to be the improved representation of orography and, during JJA, the capture of larger scales in convection by the resolved-scale dynamics. This can be concluded from the locations where biases are reduced and the generally larger improvements in mountainous regions (Alps, Spain, and Norway). Improvements are, however, not confined to mountainous areas even though they can be related to orography (e.g., rain shadow effects).

Fig. 15

Daily quantile-quantile plots of precipitation rates in the Alps, Germany, Sweden, Norway, Spain, France, and the Carpathians (from left to right). DJF, MAM, JJA, and SON are shown top down. The thick white line and the blue shaded area show the median value and the Q0–Q100 interval of the \(0.44^{\circ }\) ensemble. The thick red line and the red hatched area depict the median value and the Q0–Q100 interval of the \(0.11^{\circ }\) ensemble

Fig. 16

Same as in Fig. 15 but for 3-hourly (left column) and daily (right column) quantile-quantile plots of precipitation rates in Switzerland for the period May 2003 to December 2007

Fig. 17

Same as in Fig. 4 but for mean sea level pressure in DJF and JJA

Fig. 18

Seasonal mean of the convective-to-total precipitation ratio in the \(0.11^{\circ }\) and \(0.44^{\circ }\) simulations and their difference (from left to right). Data from the CCLM-CLMCOM, WRF-IPSL-INERIS, and RCA4-SMHI models are used. Shaded regions depict the consistently improved areas from Figs. 4, 5, 6, 7, 8 and 9. Results for DJF/JJA are shown in the first/second row

Fig. 19

Same as Fig. 18 but for the standard deviation (STDDEV) of the hourly vertical wind velocity at 700 hPa of a four year long CCLM-CLMCOM simulation

The added value is larger when analyses are performed on a \(0.11^{\circ }\) evaluation grid instead of a \(0.44^{\circ }\) grid. This is not a trivial result because it requires that the \(0.11^{\circ }\) simulations generate skillful information beyond the grid spacing of the \(0.44^{\circ }\) simulations. Improvements in simulated JJA extreme precipitation are especially enhanced on the finer evaluation grid because of the small-scale nature of these events (e.g., convective thunderstorms).

The detection of added value in the \(0.11^{\circ }\) simulations strongly depends on the availability and accessibility of fine-gridded and high-quality observational data sets. There is an urgent need for a Europe-wide effort to combine existing national data sets into one single homogeneous data set, which is internally consistent and provides an estimate of uncertainty accounting for interpolation, under-catch, and under-sampling errors.

In conclusion, simulated precipitation from the EURO-CORDEX \(0.11^{\circ }\) models can be of great value for the assessment of climate change impacts because these models are found to reduce errors of both mean and extreme precipitation, particularly on small scales. Future investigations are planned to assess whether simulations with the \(0.11^{\circ }\) models are also capable of improving precipitation when forced by boundary conditions from GCM simulations. This would allow us to analyze how errors induced by the GCM simulations (e.g., biases, misrepresentation of synoptic conditions) propagate into the RCM simulation and affect the RCM precipitation and the added value detected in this paper. It is crucial to analyze whether the RCMs are able to compensate for errors in the lateral boundary conditions from GCMs. Diaconescu et al. (2007) showed that in their RCM simulations errors in the lateral boundary conditions neither increased nor were amplified. If large-scale errors were present in the lateral boundary conditions, the representation of small-scale features in their RCM was rather poor. Exceptions could be found at locations where strong small-scale surface forcing was present. The EURO-CORDEX initiative provides a perfect framework to deepen this analysis and apply it to a large ensemble of GCM-driven transient regional climate simulations.