1 Introduction

During the last few years tremendous advances have been made towards understanding regional climate predictability and in improving the reliability of regional climate models (RCMs) (e.g., Jones et al. 1994; Schaeffer et al. 2002; Vidale et al. 2003; Wang et al. 2004; Christensen et al. 2007). In projects conducting long term simulations or climate projections, RCMs are currently operated at horizontal grid spacings between 50 and 25 km [e.g., PRUDENCE (Christensen and Christensen 2007), ENSEMBLES (Hewitt 2005) and NARCCAP (http://www.narccap.ucar.edu)]. One of the foci of these projects is the quantification of error characteristics and uncertainties of RCMs at their current spatial resolutions (e.g., Jacob et al. 2007). For IPCC AR5, regional climate projections for all land areas of the earth, with particular focus on Africa, are planned (http://wcrp.ipsl.jussieu.fr/RCD_Projects/CORDEX/CORDEX.html).

Recently, even finer grid spacings became computationally feasible, which is particularly useful in orographically complex regions like the European Alps. It has been shown that a higher resolution makes it possible to investigate climate and climate change in smaller subregions than before (e.g., subregions of the Alpine area) (Suklitsch et al. 2008). It is expected that high resolution RCMs can more accurately reproduce heavy precipitation events, which are likely to become increasingly important in a warmer future climate (Christensen and Christensen 2004), and that higher spatial model resolution yields more accurate precipitation patterns (Hohenegger et al. 2008). Additionally, high resolution climate scenarios are strongly requested by the climate impact research community.

Therefore, a horizontal resolution of RCMs of 7 to 10 km is currently becoming increasingly important and is likely to become standard in the near future. Several regionally focused projects have already produced such high resolution climate scenarios at 10 km grid spacing [e.g., reclip:more (Loibl et al. 2007) and its successor reclip:century for the Greater Alpine region (http://foresight.ait.ac.at/reclip/), a project in which regional climate scenarios for Germany were produced (Jacob et al. 2008), or CECILIA (http://www.cecilia-eu.org/) for Central and Eastern Europe], but the corresponding quantification of error characteristics and uncertainty of climate scenarios is still missing. A particular concern is whether the decreasing spatial scale of analysis goes along with larger model errors. This matter has been discussed for global climate models, e.g., in Reichler and Kim (2008) and Kim and Reichler (2008). Since knowledge about model errors forms the basis for the interpretation of model results, this study focuses on the quantification of error ranges of RCMs at high resolution over a particularly demanding area, the European Alpine region. The analysis is conducted using a large ensemble (62 members) constructed of four different RCMs under various configurations and aims at the general quantification of high resolution RCM error ranges, rather than at the analysis of the performance of one single model. In order to be able to base this analysis on a large variety of models and configurations, relatively short simulation periods are analyzed (1 year plus 3 to 4 months spin up).

This paper is structured as follows: In Sect. 2 the experimental setup is described, together with a short introduction to every model used in this study. Additionally, a description of the atmospheric conditions during the simulation period is given. Section 3 is devoted to the reference data and the evaluation regions. In Sect. 4 we present the results obtained in this study, first for the entire Greater Alpine region, followed by the results within subregions. The paper then closes with conclusions in Sect. 5.

2 Experimental setup

2.1 Models

The four models used in this study are: CCLM (Böhm et al. 2006), MM5 (Dudhia 1993), WRF (Skamarock et al. 2005) and REMO (Jacob and Podzun 1997; Jacob 2001; Jacob et al. 2007). The different setup options used in each experiment are summarized in Table 1.

  • CCLM. The COSMO model in CLimate Mode is the German community climate model. It is based on the primitive hydro-thermodynamical equations describing a compressible non-hydrostatic flow in a moist atmosphere without any scale approximations. Much information about CCLM and its applications in the CLM community is compiled in a special issue of Meteorologische Zeitschrift (vol. 17, no. 4; e.g., Rockel and Geyer 2008; Feldmann et al. 2008). The model version used in the present study is 4.0. All simulations feature Runge–Kutta numerics and most of them use the Kain–Fritsch convection scheme.

  • MM5. The Mesoscale Model of the 5th Generation has the longest running history: it evolved from a hydrostatic model in the early 1970s that was later documented by Anthes and Warner (1978). Over the years multiple-nest capability, non-hydrostatic dynamics (Dudhia 1993), and more parameterization options (including soil–vegetation–atmosphere-transfer models) were implemented along with several numerical modifications and optimizations. In 2004 further development was suspended in favor of the next generation model WRF (see next paragraph). In its latest version (3.7.4) MM5 solves the governing coupled partial differential equations (capturing the atmosphere) by means of finite differencing schemes: second-order centered finite differences and first-order upstream schemes are used for spatial discretization on a staggered grid (Arakawa-B grid). Temporal discretization is achieved by a second-order leapfrog scheme with time-splitting to handle sound waves on shorter time steps. In the vertical direction the governing equations are discretized in unequally distributed steps defined by a terrain-following sigma-pressure coordinate, which allows for implicit treatment of vertical sound waves and vertical diffusion. This latest model version was used in all but one experiment.

  • WRF. The Weather Research & Forecasting model is a community model. In this study version 2.2.1 with the Advanced Research WRF dynamical core is used for all experiments. WRF is developed specifically for high resolution modeling applications and offers the user community a broad range of physical options. The WRF model solves the fully compressible non-hydrostatic Euler equations in flux form on a hybrid terrain following vertical coordinate system using Runge–Kutta split-explicit time integration on an Arakawa-C type grid. It conserves mass, momentum, entropy and scalars using flux form prognostic equations. For details please refer to Skamarock et al. (2005).

  • REMO. The REgional climate MOdel is a regional hydrostatic climate model and is used in different regions all over the world. REMO is based on the “Europamodell”, the former numerical weather prediction model of the German Weather Service (Majewski 1991). Further development of the model took place at the Max Planck Institute for Meteorology, where the physical parameterizations from ECHAM4 (Roeckner et al. 1996) were implemented into the Europamodell code (Jacob and Podzun 1997; Jacob 2001). REMO solves the hydrostatic Euler equations with a finite difference method on a hybrid terrain following vertical coordinate system using the leapfrog time integration on an Arakawa-C grid.

Table 1 Setups of each experiment in this study, separated by regional climate model

2.2 Model configuration and ensemble construction

The ensemble of simulations evaluated in this study consists of 62 members. It covers four RCMs, various physical parameterizations, two-step and one-step nesting approaches, various methods of feeding lateral boundary conditions into the models, various domain sizes, varying vertical resolution, a configuration with large scale nudging, different ways of initializing soil moisture and a few other configurations.

All simulations are driven by the same lateral boundary conditions from the ERA-40 Re-Analysis (Uppala et al. 2005). These boundary conditions are regarded as “perfect” in this study and errors in the downscaling results are interpreted as RCM errors. In future climate projections, errors from the driving global climate models and the reaction of RCMs to these errors have to be regarded as well.

The model grid spacing in the evaluation domain of all simulations is about 10 km (for details see Table 1). However, the nesting strategies, i.e., the way the information is brought from the coarse resolution of ERA-40 (∼100 × 120 km) to the final resolution of ∼10 × 10 km, differ. In this respect, the ensemble can be split into two groups: in the case of CCLM and REMO the larger part of the experiments was done with a single downscaling step, in which the data were downscaled directly from ERA-40 to the final 10 km horizontal resolution. In MM5 and WRF a two-step nesting strategy is applied, where an additional intermediate resolution domain is simulated on a 30 × 30 km grid, and some of the MM5 simulations feature two-way nesting (see Table 2c). This also has an impact on the update interval for the lateral boundary conditions of the 10 km domain. While for the two-step nesting experiments of WRF and MM5 the lateral boundary conditions are updated with the coarse grid time step (180 s), the update interval is by design limited to 1 h for the two-step nesting experiments of CCLM and REMO. In the case of the one-step nesting experiments the update frequency is limited to the temporal resolution of the ERA-40 driving data, i.e., 6 h.

A further important distinction can be made regarding the domain sizes. Leduc and Laprise (2009), for example, have shown in a “perfect boundary condition” experiment that the results vary strongly with varying domain size, particularly with respect to the small scales. In this study this subject has also been treated by varying the domain size in some of the experiments. As will be discussed later the choice of the domain size has a stronger effect on the one-step than on the two-step nesting experiments. The model domains are shown in Fig. 1.

Fig. 1
Model domains used in this study. Colors correspond to each of the models. Blue CCLM, yellow MM5, green WRF, red REMO. MM5 and WRF domains are identical. The two outermost domains (dash-dotted) were simulated using a coarser spatial resolution (see text)

2.3 Simulation period

All simulations were carried out for the time period September (REMO, CCLM) or October (MM5, WRF) 1998 to December 1999. The evaluation period is the full year 1999. One year is a rather short period for climate simulations and regional climate models do not perform equally well in each year (e.g., Evans et al. 2005). However, since most of the typical synoptic patterns in the Alpine region are covered by the year 1999, including dry spells and heavy precipitation events (an overview of the atmospheric conditions during 1999 is given below), we expect the results to be roughly representative. Several other studies in the past employed rather short simulations (months or seasons) to evaluate error characteristics of RCMs (e.g., Giorgi and Bi 2000; Alexandru et al. 2007; Leduc and Laprise 2009). However, the restrictions that originate from short evaluation periods have to be kept in mind. These restrictions are a certain bias in the error characteristics stemming from the deviation of 1999's weather from the climatological mean and the fact that long term processes like slow drifts in soil moisture are not captured. The former restriction is qualitatively discussed in the following paragraphs.

Climate is subject to year-to-year variability. In the Alpine region, the variability of annual means amounts to about ±3 hPa for mean sea level pressure, ±1 K for temperature, and ±20% for precipitation (Auer et al. 2007). In order to judge the representativeness of the model performances in the year 1999, we give an overview of the atmospheric conditions in 1999 and compare them to the climatological mean. The following analysis is based on the reference datasets used for model evaluation. We compare temperature and precipitation of the year 1999 with the period 1971 to 1998. This rather unusual period is prescribed by the precipitation dataset, which is available until 1999. Since the last year is the one under evaluation, it is excluded from the climatological mean.
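The comparison of a single year with a reference period can be sketched as follows. This is a minimal illustration only, not the processing chain used in this study: the function name `anomaly`, the dictionary layout and all numbers are hypothetical, and the real analysis is carried out on gridded seasonal data rather than on one value per year.

```python
from statistics import mean


def anomaly(values_by_year, target_year, ref_start, ref_end, relative=False):
    """Anomaly of target_year against the mean of ref_start..ref_end (inclusive).

    With relative=True the anomaly is returned in percent of the reference
    mean (as used for precipitation), otherwise as an absolute difference
    (as used for temperature).
    """
    reference = [v for y, v in values_by_year.items() if ref_start <= y <= ref_end]
    if not reference:
        raise ValueError("no data in the reference period")
    climatology = mean(reference)
    difference = values_by_year[target_year] - climatology
    return 100.0 * difference / climatology if relative else difference


# Hypothetical annual mean temperatures (degrees C); the evaluated year is
# excluded from the reference period by choosing ref_end < target_year.
temperatures = {1996: 7.0, 1997: 8.0, 1998: 9.0, 1999: 9.0}
print(anomaly(temperatures, 1999, 1996, 1998))  # 1.0
```

Keeping the evaluated year out of the reference period, as done in the text, prevents the anomaly from being damped by the very year that is being assessed.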

In many parts of the Alpine region the year 1999 began warm and there was only one short period with strong frost in mid February, which leaves a few cold anomalies in the winter season (DJF) in Fig. 2a. There was another cold air intrusion in the Alpine region in mid April, but with very hot temperatures by the end of May this was turned into a strong warm anomaly for the spring season (MAM; Fig. 2b). This anomaly reaches +1.8 K in the various subregions. In total, there were three heat waves during 1999: one at the end of May, one in July and the last one in August. September brought unprecedented high temperatures to the Alpine region, followed by the first strong cold period in mid October. Towards the end of the year temperatures were rather normal, leaving a slight warm anomaly in autumn (SON; Fig. 2d) of 0.3 to 0.6 K [see Table 1 in Online Resource]. Generally, 1999 was a year with extreme conditions in both “directions” (cold and warm), which gives confidence that a wide range of weather situations is covered. In the annual mean, 1999 was warmer than the climate normal by 1.0 K. This fits well with the aim of this study, which focuses on the performance of the RCMs with regard to future climate simulations where generally warmer conditions are expected.

Fig. 2
Differences of daily mean temperature (a–d) and relative differences of daily precipitation sums (e–h) in the year 1999 as compared to the period 1971–1998. Top to bottom seasons winter (DJF), spring (MAM), summer (JJA) and autumn (SON)

In terms of precipitation the year 1999 was rather moist in the western and northern parts and drier than normal in the southern and eastern parts of the Alps, particularly during winter (see Fig. 2e and Table 1 in Online Resource). The anomalies range from −25% in the south to +64% in the north-west. Outstanding events in spring 1999 are flash floods in the western parts (Switzerland, Western Austria) due to convective storms in May (more than +30% in mean precipitation and +15% in intensity). During summer (JJA; Fig. 2g) precipitation sums are higher than average within the Alps and lower than average further south and north, leaving, e.g., a dry anomaly of −6% in the south-west. Autumn (Fig. 2h) was also rather dry, particularly in the south-east, but there was a heavy precipitation event at the end of October which brought flash floods mainly to Southern France and Northern Italy. The relative anomaly in the corresponding subregions amounts to ∼+30% in terms of both mean precipitation and frequency. As with temperature, the rest of the year was normal. Again, the general tendency of 1999 (wet in the north-west and dry in the south) fits well with conditions that are expected in future climate (e.g., Christensen and Christensen 2007; Gobiet et al. 2006) and the occurrence of wet and dry extremes ensures a reasonable sampling of a wide range of weather conditions.

3 Reference data

Finding suitable datasets to evaluate model results at a horizontal resolution of 10 km is a difficult matter. However, since the effective resolution of the models is at least four times the grid spacing (Δx), e.g., 7Δx for WRF according to Skamarock (2004) and 4Δx for MM5 according to Kapper (2009), observational datasets with 20 to 30 km horizontal resolution should be well suited. Therefore we use the following datasets for evaluation:

  • Temperature. For 2 m air temperature we use the E-OBS dataset (version 1) created in the framework of the ENSEMBLES project (Haylock et al. 2008). This dataset has a horizontal resolution of about 25 km. To achieve this resolution the data are first interpolated to a 0.1° master grid and then averaged to the final resolution. The dataset provides daily values of mean, minimum and maximum temperature. In our analysis all three parameters are investigated. Inconsistencies in the height assignment due to the different resolutions of model and observation data are taken care of by resampling the model data to the grid of the evaluation dataset, on which we carry out the comparison. For a more detailed explanation of the resampling process refer to Suklitsch et al. (2008). This procedure is also applied to the orography of both the model and the evaluation dataset. The resulting difference in orography between the two datasets is then multiplied by the climatological lapse rate of −6.5 K/km in order to resolve the remaining inconsistencies.

  • Precipitation. For this parameter we use the daily precipitation dataset of the Swiss Federal Institute of Technology described in Frei and Schär (1998), hereafter called the ETHZ dataset. The underlying method is similar to the one laid out in Frei et al. (2006). The authors warn against high uncertainties, especially in winter, due to measurement problems with solid precipitation (e.g., wind drift) which cause an underestimation of precipitation. This underestimation of winter precipitation can reach up to 40% at stations higher than 1,500 m; the lowest measurement errors occur in summer at low level stations (see Frei et al. 2003 for more details). Despite these uncertainties this is the best precipitation dataset available in the Alpine region. However, this dataset does not cover the whole model domain. The E-OBS dataset would also provide daily precipitation sums and extends over the whole modeled region. Nonetheless we prefer the ETHZ dataset, because it is based on far more stations than E-OBS. The spatial resolution of this dataset is about 20 km. Based on this dataset we also analyze frequencies and intensities of precipitation, where we disregard days with precipitation <1 mm.

  • Mean sea level pressure. To evaluate the RCMs’ performance on the synoptic scale, sea level pressure is included in our analysis. We use the ERA-40 Re-Analysis dataset (Uppala et al. 2005). While this dataset is much coarser than the models, it is sufficient to get an idea of whether the models deviate from the driving model in terms of synoptic patterns.
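The height correction described for temperature above can be illustrated with a short sketch. It assumes that the correction is applied to the model temperature and that the height difference is taken as reference orography minus model orography; the text only states that the orography difference is multiplied by the lapse rate, so this sign convention, the function name and the example values are assumptions.

```python
LAPSE_RATE_K_PER_KM = -6.5  # climatological lapse rate quoted in the text


def height_corrected_temperature(t_model, z_model_m, z_obs_m,
                                 lapse_rate=LAPSE_RATE_K_PER_KM):
    """Shift a model temperature from the model height to the reference height.

    Assumed sign convention: the height difference is (z_obs - z_model), so a
    model grid cell lying above the reference orography is warmed.
    """
    dz_km = (z_obs_m - z_model_m) / 1000.0
    return t_model + lapse_rate * dz_km


# Hypothetical grid cell: model orography 200 m above the reference orography
corrected = height_corrected_temperature(t_model=5.0, z_model_m=1200.0, z_obs_m=1000.0)
print(round(corrected, 2))  # 6.3
```

With this convention a 200 m height mismatch alone accounts for 1.3 K, which is of the same order as the temperature biases discussed later, so the correction matters for the evaluation.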

3.1 Subregional analysis

The climate of the Alpine region features very strong regional gradients, particularly in precipitation fields (e.g., Böhm et al. 2005). Generally the Alps act as a precipitation barrier between a rather moist northern and a rather dry southern side. In the future, this contrast might get even stronger (e.g., Gobiet et al. 2006; Christensen and Christensen 2007). Thus a subregional analysis is mandatory. The high resolution of 10 km grid spacing makes it possible to split the model domain into subregions despite its rather small size. In this study we use six subregions as shown in Fig. 3. These subregions give a reasonably good differentiation of the domain. To obtain these subregions we used the clustering method described in Suklitsch et al. (2008). Since the dataset used for clustering (ETHZ) does not cover the full model domain at the eastern edge, we extended the subregions further in that direction. This is in our opinion a valid step, since these areas are mostly plains and therefore should not feature strong variability. Additionally, a seventh subregion at the western edge of the model domain, which consisted of only 15 grid points, was merged with the neighboring subregion in the north in order to avoid too small subregions. In Sect. 4.2 the subregionally resolved analysis of model results is presented.

Fig. 3
Subregions used for detailed analysis of the regional performance of the different RCMs. The names used in the text and tables are displayed as overlays.

4 Results

4.1 Full domain results

To get an idea of the overall performance of the models and their errors we look at the annual cycle of biases of mean sea level pressure, temperature and precipitation averaged over the entire Alpine region. Mean sea level pressure gives us an estimate of whether or not the models deviate from the driving data. The analysis is focused on the error ranges of the RCMs rather than on the performance of single simulations or models. A more detailed analysis of the CCLM, MM5 and WRF results used in this study is given in Suklitsch et al. (2008) and Awan et al. (2010), respectively. In order to roughly distinguish significant error ranges from internal RCM variability, sensitivity experiments with perturbed initial conditions have been performed with one of the models (CCLM), following the methodology described in Giorgi and Bi (2000). Three simulations for one winter (December) and one summer month (June) have been conducted. The average range between these sensitivity simulations amounts to 0.4 K in monthly mean 2 m temperature and to 0.1 mm/day in the monthly mean of daily precipitation sums. Though this estimation of internal variability is by no means comprehensive, it gives a first idea of the significance of the results presented below.

4.1.1 Mean sea level pressure

Depending on the model, the bias of mean sea level pressure shows different characteristics. In Fig. 4 this bias is displayed for the four models in terms of the 2.5th and 97.5th percentiles and the median of the ensemble. The grey shaded area shows the 2.5 to 97.5 percentile interval (also known as the “inner 95th percentile range”, further simply called the “error range”) of each sub-ensemble corresponding to the four models. The darker this grey shade, the more models share the same bias. Additionally the error range of the full multimodel ensemble is shown as black solid lines. This indicates whether or not a model produced an outlier. Concentrating on the colored dashed lines in Fig. 4, which give us the median bias of each model’s ensemble, one sees that the CCLM ensemble features a weak overall bias and the REMO ensemble develops a fairly uniform bias of ∼−0.8 hPa. WRF and MM5 on the other hand show a bias with a distinct annual cycle. During winter (DJF) mean sea level pressure is underestimated in both models (MM5 only slightly with ∼−0.3 hPa, WRF more pronounced with ∼−1.9 hPa), whereas during summer (JJA) it is overestimated in MM5 (∼+1.0 hPa) and still underestimated in WRF (∼−0.7 hPa). Taking the full model ensemble into consideration, one ends up with a weak median bias of −0.1 hPa in summer and a stronger negative bias in the other three seasons [−0.2, −0.3 and −0.5 hPa in autumn (SON), spring (MAM) and winter, respectively; see Table 2]. The seasonal error range of the entire ensemble amounts to −2.1 to +1.6 hPa (Table 2).
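The “error range” defined above (2.5th to 97.5th percentile of the ensemble biases, together with the median) can be sketched as follows. The interpolation rule between ranked values is an assumption, since the text does not specify one; linear interpolation is used here. The function names and the bias values are hypothetical.

```python
from statistics import median


def percentile(sorted_values, p):
    """Percentile p (0..100) of an ascending list, using linear interpolation."""
    if not sorted_values:
        raise ValueError("empty sample")
    position = (len(sorted_values) - 1) * p / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def error_range(biases):
    """Return (2.5th percentile, median, 97.5th percentile) of ensemble biases."""
    values = sorted(biases)
    return percentile(values, 2.5), median(values), percentile(values, 97.5)


# Hypothetical seasonal mean sea level pressure biases (hPa) of a small ensemble
example_biases = [-1.9, -0.8, -0.7, -0.3, -0.1, 0.2, 1.0]
low, mid, high = error_range(example_biases)
print(low, mid, high)
```

Using the inner 95% interval instead of the minimum and maximum makes the range less sensitive to a single outlying ensemble member, which matters for an ensemble whose members are not independent.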

Fig. 4
Annual cycle of the bias of mean sea level pressure of the four participating models. Blue CCLM, yellow MM5, green WRF, red REMO. Dashed lines median of each model’s ensemble bias, solid lines error range (as defined in text) of each model. Black solid lines error range (as defined in text) of the full model ensemble. The more intensive the grey shade, the more ensemble members share the same bias

Table 2 Five percentile values of the model bias for the full model ensemble gained by averaging over the whole (common) model domain

4.1.2 Temperature

The annual cycles of the temperature bias (Fig. 5) again show a twofold pattern: CCLM and REMO feature cold biases in winter, which disappear in the REMO results and remain to a lesser extent (−0.5 K) in the CCLM results in summer. WRF and MM5 show an opposite annual cycle with a pronounced cold temperature bias in summer, which is smaller in MM5 and reversed to a warm bias in WRF in winter. The WRF model features the largest error range within a single model’s ensemble. The median of the full ensemble lies at −1.1 K on the annual time scale, which indicates that all models have problems correctly reproducing temperature in this mountainous area. According to Table 2, even at the 75th percentile we get a bias of −0.1 K, meaning that more than 75% of all simulations share a cold bias averaged over the year. On the seasonal time scale, the median temperature bias does not reach positive values either. The seasonal error range lies between −3.0 and +1.7 K.

Fig. 5
Same as Fig. 4, but for mean air temperature

4.1.3 Precipitation

For precipitation, the CCLM and REMO ensembles again show rather small error ranges and only small biases most of the year. This time they are joined by MM5, which shows a very similar annual cycle (Fig. 6). These three models share the same dry anomaly in the annual cycle of the bias in September, which might be caused by one single heavy precipitation event linked to the passage of a short wave trough that is not resolved by these models. The WRF ensemble features a large error range, particularly in the summer half year. Looking at the full ensemble one can see that the biases of the single-model ensembles cancel each other out, so that over the year no precipitation bias remains (see Table 2). On the seasonal time scale one gets a bias of +0.4 mm/day (+13.4%), +0.5 mm/day (+14.4%), +0.4 mm/day (+10.8%) and −0.1 mm/day (−3.9%) in winter, spring, summer and autumn, respectively. The error range in summer is larger than in the other seasons, which indicates higher uncertainty of precipitation during summer, most likely due to a larger impact of the parameterized convection and more regionally caused precipitation (i.e., smaller forcing by the lateral boundary conditions) in summer than in winter. The seasonal error range lies between −1.0 mm/day (−28.4%) and +2.4 mm/day (+67.6%).
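The precipitation biases above are quoted both in mm/day and in percent. A minimal sketch of how the two are related is given below, assuming that the relative bias is expressed with respect to the observed mean; the function name and the numbers are illustrative and not taken from the study.

```python
def precipitation_bias(model_mean, observed_mean):
    """Return (absolute bias in mm/day, relative bias in percent of the observed mean)."""
    if observed_mean <= 0.0:
        raise ValueError("observed mean must be positive for a relative bias")
    absolute = model_mean - observed_mean
    relative = 100.0 * absolute / observed_mean
    return absolute, relative


# Hypothetical seasonal means in mm/day
absolute, relative = precipitation_bias(model_mean=4.0, observed_mean=2.0)
print(absolute, relative)  # 2.0 100.0
```

Because the relative bias is scaled by the observed mean, the same absolute bias translates into a larger percentage in dry seasons or dry subregions, which should be kept in mind when comparing relative biases across subregions.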

Fig. 6
Same as Fig. 4, but for daily precipitation sums

4.2 Subregional results

In this section we break down the model domain into different subregions as laid out in Sect. 3.1. This makes it possible to investigate whether the error characteristics at smaller scales resemble those on the larger scale or not.

4.2.1 Temperature

Figure 7 gives a very condensed overview of the bias of 2 m mean temperature within each subregion, for each season and the full year, and for each experiment. At first glance one can identify four blocks which correspond to the four regional climate models used in this study. While CCLM and MM5 predominantly feature cold biases, REMO and WRF show small warm or cold biases. Concentrating on the different subregions, one comes to the conclusion that no subregion is captured best by all models in the same period. For instance, CCLM has mostly small biases of less than −0.75 K in subregion NE on the annual basis, while WRF in the same region partly even has a pronounced warm bias of more than +2.25 K. When averaged over the full model ensemble (lowest row in Fig. 7), biases between −1.75 and +0.3 K remain on both seasonal and annual time scales (see Table 3). The subregional seasonal error range varies from −3.2 to +2.0 K.

Fig. 7
Bias of mean air temperature of each ensemble member and the ensemble mean within each subregion plus subregional mean. Columns subregions according to Fig. 3, rightmost column represents the mean over all subregions. Rows experiments, lowest row represents the ensemble mean. Within each box, the seasonal and annual bias is given according to the legend in the upper right corner. Numerical values for the subregional means are given in Table 3

Table 3 Mean bias of daily mean, minimum and maximum temperature for the full model ensemble consisting of 62 experiments in the different subregions on the seasonal and annual time scale
Table 4 Same as Table 3, but for daily mean, frequency and intensity of precipitation

Besides the mean bias of daily mean temperature, Table 3 also lists the mean biases of daily minimum and maximum temperature. Generally, minimum temperatures tend to be less cold biased than maximum temperatures. In winter minimum temperatures are even warm biased in the ensemble mean. As a result the ensemble mean diurnal cycle of temperature is dampened. However, several simulations and subregions show different characteristics (see Figs. 2 and 3 in Online Resource). The subregional seasonal error range for daily minimum temperature lies between −3.0 and +1.1 K; the one for daily maximum temperature ranges from −4.2 to +1.7 K.

4.2.2 Precipitation

In Fig. 8 we show an overview of the relative bias for subregional daily mean precipitation. Similar to temperature, one can make out the four models quite easily: MM5 and WRF produce too wet conditions in all subregions during most seasons, CCLM and REMO show mixed results.

Fig. 8
Relative bias of daily precipitation sums of each ensemble member and the ensemble mean within each subregion plus subregional mean. Columns subregions according to Fig. 3, rightmost column represents the mean over all subregions. Rows experiments, lowest row represents the ensemble mean. Within each box, the seasonal and annual bias is given according to the legend in the upper right corner. Numerical values for the subregional means are given in Table 4

One prominent feature is the negative precipitation bias in subregion NW, which appears only in the latter two models. The reason for this dry bias is the vicinity of the subregion to the inflow boundary. As already stated in Sect. 2.2, the models CCLM and REMO are updated only every sixth hour at the lateral boundaries (in the case of the one-step nesting experiments). Additionally these two models have to build up a repository for cloud water from scratch, because they are nested directly into ERA-40, which does not deliver cloud water variables at the lateral boundaries. The other two models get these variables from their coarse domain at every time step. The processes which build up cloud and rain droplets take time, during which the weather systems progress further east. This hypothesis is supported by CCLM and REMO experiments with increased domain size and even more notably by the two-step nesting experiments of CCLM (experiment numbers 2030 and 2031 in Fig. 8), which show no dry bias.

Another prominent feature is the massive overestimation of precipitation during winter in MM5 and WRF in subregions W-Alps, E-Alps and SW which, to a lower extent, is also visible in the CCLM and REMO simulations. WRF has a strong wet bias also in spring and summer in these subregions. This indicates problems in the correct representation of orographically induced precipitation in most RCMs. The subregional seasonal error range of daily precipitation sums ranges from −45.7 to +94.7%, corresponding to −2.0 to +3.1 mm/day.

We also compare two other precipitation parameters, intensity and frequency. It has to be noted that these parameters are calculated for wet days, defined as days with at least 1 mm of precipitation. These parameters demonstrate that mainly a positive frequency bias contributes to the wet bias (see Figs. 4 and 5 of Online Resource). Intensity is reproduced well in most simulations of all models. In the case of frequency the pattern of biases is very similar to that of the mean precipitation bias. In subregion SE precipitation occurs too often throughout all seasons, while precipitation events are less intense than observed. This subregion is dominated by plain areas. In contrast, subregion SW features complex topography. In that subregion the mean intensity of events is strongly overestimated in summer and winter, and it also rains and snows too often. The transitional seasons, autumn and especially spring, are captured well in all subregions with respect to intensity. For these two seasons the relative bias hardly exceeds 10%, which lies within the range of measurement errors. The subregional seasonal error range for frequency lies between −34.2 and +47.3%; the one for intensity lies between −49.4 and +48.2%.
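The wet-day parameters can be sketched as follows, using the 1 mm threshold stated above. The exact definitions are assumptions for illustration: frequency is taken as the fraction of days that are wet and intensity as the mean precipitation on wet days. The function name and the sample series are hypothetical.

```python
def wet_day_statistics(daily_precip_mm, threshold_mm=1.0):
    """Return (wet-day frequency, wet-day intensity in mm/day).

    A wet day is a day with at least threshold_mm of precipitation.
    Frequency is the fraction of wet days among all days; intensity is the
    mean precipitation over wet days only (0.0 if there are none).
    """
    n_days = len(daily_precip_mm)
    if n_days == 0:
        raise ValueError("empty precipitation series")
    wet_days = [p for p in daily_precip_mm if p >= threshold_mm]
    frequency = len(wet_days) / n_days
    intensity = sum(wet_days) / len(wet_days) if wet_days else 0.0
    return frequency, intensity


# Hypothetical four-day series: two days fall below the 1 mm threshold
print(wet_day_statistics([0.0, 0.5, 1.0, 3.0]))  # (0.5, 2.0)
```

With these definitions the mean precipitation is approximately the product of frequency and intensity (exactly so if the small amounts on dry days are neglected), which is why a wet bias in the mean can be attributed to either parameter, as done in the text.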

5 Discussion and conclusions

In this study we aim at quantifying error ranges (defined as the interval between the 2.5th percentile and the 97.5th percentile) of regional climate models when operated at high resolution (10 km grid spacing) in the Alpine region. To this end, a total of 62 one-year simulations with 4 different regional climate models were conducted and evaluated for the year 1999. This rather short evaluation period was chosen in favor of a large model ensemble and has been justified by comparison with long-term climate averages. Of course, one has to keep in mind that choosing a single year for simulation increases the sampling uncertainty due to specific error characteristics within this year. As a consequence, one has to be careful when translating these results to simulation periods of several decades and has to consider that the sampling error most likely results in an overestimation of error ranges (assuming that errors in specific years would cancel out to some degree when averaged over several years). It could, however, also result in an underestimation if the investigated year happens to overrepresent weather situations that are well simulated by all 4 models. We consider this latter case rather unlikely, since we did not find worse model performance in longer-term simulations performed with CCLM and MM5 (not shown).
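The error range defined above can be sketched in a few lines. The percentile bounds (2.5th and 97.5th) are taken from the text. The linear interpolation between ranked values is an assumption, since the interpolation method is not stated here, and the bias values are hypothetical.

```python
def percentile(values, q):
    """Percentile q (0 to 100) with linear interpolation between ranks.

    Assumption: linear interpolation; the study's method is not stated.
    """
    if not values:
        raise ValueError("need at least one value")
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    return data[lower] + frac * (data[upper] - data[lower])


def error_range(biases, lower_q=2.5, upper_q=97.5):
    """Interval between the 2.5th and 97.5th percentile of ensemble biases."""
    return percentile(biases, lower_q), percentile(biases, upper_q)


# Hypothetical temperature biases (K) of nine ensemble members.
biases_k = [-3.0, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
low, high = error_range(biases_k)
print(f"error range: {low:+.2f} K to {high:+.2f} K")
```

Using percentiles instead of the minimum and maximum keeps a single outlying experiment from dominating the reported range, which matters for an ensemble built from many different model configurations.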

One particular focus of this study is the question of whether error ranges depend heavily on the scale of the evaluation region. To answer this question, we split the model domain into several subregions with a median size of ∼100,000 km² and compare the results with those obtained for the Greater Alpine region (“GAR”, ∼680,000 km²), which roughly corresponds to the scale of analysis in recent projects like PRUDENCE and ENSEMBLES.

Simulated temperatures are predominantly cold biased. The reasons for this are highly model dependent. Some models are better at reproducing minimum temperatures, others at reproducing maximum temperatures. For example, CCLM and MM5 have problems reproducing daily maximum temperatures, which is probably related to underestimation of snow cover in winter and spring, and to overestimation of precipitation frequency and cloud cover in summer. In terms of minimum temperature, REMO is cold biased in the mountainous subregions and warm biased in orographically less complex subregions. In MM5 and WRF, minimum temperatures are warm biased in winter, which might be due to problems in developing a strong inversion layer during nighttime. However, a detailed analysis of the deficiencies of each model is beyond the scope of this study.

In terms of precipitation, models develop a larger error range in seasons in which convection is a dominant factor (i.e., summer) when averaged over the entire Alpine region. However, this effect disappears in the subregional analysis, where the error ranges for summer and winter are nearly the same. In winter, precipitation is particularly overestimated in mountainous subregions. Large parts of this wet bias are related to an overestimation of frequency rather than intensity. In subregions close to the inflow boundary dry biases can occur, depending on the update interval of the lateral boundary conditions and on the presence of hydrometeor variables in them.

Generally, large precipitation biases do not occur in the same subregions as large temperature biases, and one cannot identify any subregion that is captured best by all models in terms of both precipitation and temperature. Likewise, it is hard to pick out one best model for all subregions, though REMO has the smallest area-averaged temperature bias and CCLM and REMO feature the smallest area-averaged precipitation biases.

The question of whether error characteristics worsen when analyzed at smaller scales is addressed in Fig. 9. This figure shows the central 95% range (2.5th to 97.5th percentile) and the interquartile range of the biases, evaluated over the GAR and separately within the smaller subregions. For temperature, the error ranges do not increase at smaller scales in any season except winter. With respect to precipitation biases, the error ranges increase by 28% when the evaluation is done within subregions.

Fig. 9

Comparison of biases for temperature (left) and precipitation (right) obtained by averaging over the whole Greater Alpine Region (“GAR”) as displayed in Fig. 2 and the separate subregions displayed in Fig. 3, respectively. The biases are shown in terms of percentiles of the ensemble of all experiments (see legend in the lower left corner). Column A is winter, B is spring, C is summer, D is autumn and E the full year

The subregional seasonal error range over the entire Alpine region for the bias of temperature lies between −3.2 and +2.0 K. The subregional seasonal error range for daily precipitation sums varies from −2.0 (−45.7%) to +3.1 mm/day (+94.7%).

The results of this study demonstrate that, as far as temperature is concerned, high resolution RCMs are applicable in relatively small-scale climate hindcast simulations with a quality comparable to that on well-investigated larger scales. For precipitation, which is a much more demanding parameter, the quality is moderately degraded on smaller scales. The results also give some confidence for the application of high resolution RCMs in future climate simulations. However, they cannot be mapped directly to future simulations, since they disregard errors of a global climate model and the RCMs' reaction to them. Furthermore, the presented error ranges should not be confused with uncertainty in projected climate change: the former relates to the range of differences between simulations and observations, the latter to the range of differences between pairs of simulations performed with the same model (i.e., future scenario simulation and past control simulation). In the latter case systematic model biases cancel out, which narrows the uncertainty range considerably. However, in applications where RCM output is directly fed into climate change impact investigations (e.g., crop models, river discharge models, etc.), the presented RCM error ranges should be considered and, where they exceed the acceptable range, empirical–statistical post-processing methods (e.g., Themeßl et al. 2010) should be applied.