1 Introduction

Atmospheric-Ocean General Circulation Models (AOGCMs) constitute the primary and most comprehensive tools to study future climate. However, due to the large number and complexity of processes to be represented, the long simulations needed for climate studies, and the need of ensemble simulations to provide robust statistical estimates, computational constraints severely restrict the horizontal grid mesh used in the discretized equations. Present horizontal grid intervals of the atmospheric component of AOGCMs are usually between 125 and 400 km (Randall et al. 2007); these are insufficient to resolve the fine-scale structure of several climatic processes.

The method most often used to perform affordable high-resolution regional climate simulations is the nesting regional climate modelling technique; it consists of using time-dependent large-scale atmospheric fields and ocean surface boundary conditions to drive a high-resolution atmospheric model integrated over a limited-area domain (Giorgi et al. 2001). The atmospheric driving data are either derived from lower resolution General Circulation Models (GCMs) simulations or analyses of observations (reanalyses). Typical Regional Climate Models (RCMs) horizontal grids for climate simulations are about 50 km, although long-term simulations are increasingly being performed using grids of 10 km (Kanamitsu and Kanamaru 2007; Suklitsch et al. 2010). For a detailed description of potential merits and limitations of nested RCMs, refer to Laprise et al. (2008) and Rummukainen (2010). Alternative methods to obtain regional climate information also exist, such as variable-resolution global models, time-slices of high-resolution global models and empirical-statistical techniques (e.g. Christensen et al. 2007), but these will not be addressed in this paper.

RCMs have been used in a broad spectrum of applications such as the reconstruction of recent-past climate on the regional scale (e.g., Mesinger et al. 2006; Kanamitsu and Kanamaru 2007), the downscaling of low-resolution global simulations in seasonal prediction investigations (e.g., Rauscher et al. 2007; Seth et al. 2007; De Sales and Xue 2010) and the study of processes and mechanisms in the regional scale (Pielke et al. 1999; Roebber and Gyakum 2003). During the last decade, RCMs have become increasingly used for dynamical downscaling of climate-change projections (Christensen et al. 2007; and references therein), by driving RCMs with GCM-simulated climate-change projections.

In any of these applications, the RCMs objective is to simulate small-scale climate processes that are absent in the coarser resolution simulation providing the driving data. This implies that, from a practical viewpoint, the key issue in the nesting regional modelling technique evaluation is to determine whether RCM simulations improve the representation of climatic statistics compared to the driving data (Prömmel et al. 2010). Generally, studies evaluating the relative skill of RCMs and the driving fields are designated as added value (AV) studies (Bärring and Laprise 2005; Rockel et al. 2010).

Despite of the great importance of identifying AV in RCM simulations, the AV issue has not received much attention till recently (Laprise 2005; Feser and von Storch 2005). In recent years, however, the AV problem received increased attention and become the main subject of several studies (Castro et al. 2005; Sotillo et al. 2005; Duffy et al. 2006; Feser 2006; Kanamitsu and Kanamaru 2007; Rauscher et al. 2007; Sanchez-Gomez et al. 2009; Winterfeldt and Weisse 2009; De Sales and Xue 2010; Prömmel et al. 2010) and a central topic in a number of workshops (Bärring and Laprise 2005; Rockel et al. 2010). Although some authors (e.g. Liang et al. 2008) believe that the existence of AV generated by the RCM downscaling technique was already demonstrated for some particular measures (e.g. reduction of precipitation biases), evidence found in a large number of articles do not tend to univocally support this view, rather suggesting that AV remains an important open question for the community.

For example, when downscaling reanalyses data, studies generally show that RCMs add value to their driving data for surface variables (e.g., surface temperature and 10-m wind speed) in regions characterized by small-scale orographic features such as mountainous regions (Feser 2006; Prömmel et al. 2010) and coastal areas (Sotillo et al. 2005; Winterfeldt and Weisse 2009); but little AV and even degradation is sometimes found in regions with no important small-scale physiographic forcings (Winterfeldt and Weisse 2009). Long-term large-scale features (i.e., general circulation) are generally reasonably well reproduced by RCMs (Feser 2006; Sanchez-Gomez et al. 2009), but degradation of large-scale fields arises when considering shorter time scales (e.g., daily mean) (Castro et al. 2005; Sanchez-Gomez et al. 2009).

Somewhat similar results are found when using GCM-simulated lateral boundary conditions (LBCs). According to Seth et al. (2007) and De Sales and Xue (2010), RCMs generally improve the simulation of precipitation compared to GCMs in regions where small-scale surface forcings are important and/or GCMs do not performed very well, but RCMs can degrade the simulated climate in those regions where GCMs perform well and/or large-scale forcings are dominant. De Sales and Xue (2010) also showed that the AV of RCMs is strongly dependent on the region considered; in their study, the improvement on the representation of the Andes Mountain Range by the RCM compared to the GCM was a key factor to adding value to the simulation of low-level moisture fluxes and precipitation in South America.

The value added by RCMs seems to depend on a variety of factors. A key factor is related with the climatic variable considered in the assessment, understanding the term “climatic variable” in a broad sense as some statistical measure of a variable computed for a given season and region. So some remaining questions are: For which climate statistics should one hope to find AV from dynamical downscaling? How the AV depends on the temporal scale of the climatic variable? Where and when can some AV be found for monthly-mean values? The objective of this article is to examine these issues by making a systematic characterization of a necessary condition to be satisfied by climate statistics in order that AV be generated through the use of the RCM technique. Given that the ansatz behind the dynamical downscaling technique is that an RCM, driven by large-scale atmospheric fields at its LBC, generates fine scales that are dynamically consistent with these, this paper will focus specifically on the fine-scale information generated by the use of nested high-resolution RCM. The presence and magnitude of fine-scale variance required to adequately describe a given climate statistics will then be used to quantify the potential of RCMs to add value.

Our study will be performed using precipitation data simulated by several RCMs; this will allow to determine which of the findings are inherent to the downscaling technique and which are specific to a particular model. Datasets based on observations will also be analysed in order to highlight limitations of RCMs performance when possible as well as to indicate disagreements among observed datasets. The use of precipitation is justified because it is a variable that displays a wide range of temporal and spatial scales, and thus a variable that tends to maximize the potential AV. It is also a key variable because some of the most important societal impacts of climate change will probably result from changes in precipitation (Trenberth et al. 2003; Gutowski et al. 2007).

The paper is organized as follows. The next section discusses in more detail the issue of added value and the objectives of this article. Section 3 presents a brief description of the data used. Section 4 describes the method used to analyze the dependence of the precipitation field on various temporal and spatial scales, together with the manner in which statistics are computed. Results are presented in Sect. 5 with some general results of the method and specific results of the characterization of AV as function of several parameters. Some discussion of the results and conclusions is given in Sect. 6.

2 Added value issue

2.1 General characterization

Figure 1 shows a diagram adapted from Orlanski (1975) and von Storch (2005) illustrating the characteristic temporal and horizontal spatial scales of atmospheric processes, together with the range of scales represented by climate models. Grey shaded areas are regions of dominant spectral power resulting from the composite of a broad range of atmospheric variables. Due to the space and time truncation, numerical models can only resolve explicitly a part of the atmospheric processes; smaller scale phenomena are at best accounted for in an average sense through the subgrid-scale parameterisations. Differences between RCM- and GCM-resolved scales can be conceptually seen in Fig. 1 by comparing the areas boxed in by the blue and red solid lines. These boxes were constructed using as lower limit the temporal and spatial intervals of discretization, and as the upper limit the entire computational domain and length of simulations. For a typical RCM the lower limits are located roughly at 50 km and 5 min., and the upper limit at spatial scales of 10,000 km. For a standard GCM the lower limit is here taken as 300 km and 30 min., and the upper limit as 40,000 km. It should be noted that timestep and grid spacing of models only constitute a lower limit to temporal and spatial resolution (Pielke 1991).

Fig. 1
figure 1

Characteristic temporal and horizontal spatial scales of atmospheric processes (in black) and the range of scales represented in RCMs (blue line) and GCMs (red line). Light- red and blue shaded regions represent added value of type 1 and 2, respectively. Blue squares represent temporal and spatial scales of the data produced with the multi-resolution method

Figure 1 highlights that the main potential advantage of a RCM over a GCM is related with the representation of spatial scales smaller than 300 km and/or temporal scales smaller than 30 min that are absent in the GCMs. The enhanced horizontal resolution of an RCM implies some potential advantages compared to a lower resolution GCM: (i) a more accurate discretization of equations; (ii) a broader range of fine-spatial scales explicitly resolved; and (iii) an improvement in the representation of surface forcings such as topography, lakes, coastal regions and others. Figure 1 also shows that scales larger than the RCM domain are not within the resolved scale interval of RCM; hence planetary scales are only felt insofar as they are provided by the driving data through the LBC.

The original paradigm of the nesting technique is that RCMs can be used as sophisticated magnifying glasses (MG) where “the generated small scales accurately represent those that would be present in the driving data if they were not limited by resolution” (Laprise et al. 2008; hereafter referred as MG hypothesis). The idea behind this hypothesis is that an RCM can be used to represent small scales that would be present in a desirable but in practice unaffordable high-resolution GCM (HRGCM). The evaluation of the MG hypothesis has been addressed without having to use observed data, through a systematic approach developed by Denis et al. (2002): the “Big Brother Experiment” (BBE). In its idealized version, the BBE consists in comparing two high-resolution simulations generated by using different configurations of the same model: a simulation conducted with a high-resolution global model (referred to as “big brother”) and a RCM simulation (referred to as “little brother”) run at the same resolution and with the same discretisation and parameterisation as the big brother, but forced by low-resolution LBCs derived from filtering out the big-brother simulated fine scales. The low-resolution data resulting from filtering mimics the situation when driving an RCM from GCM data. The use of the same model, run in different configurations while using the same physics, dynamics and numerics, permits to circumvent errors due to the model itself, through the BBE perfect model approach. As a consequence, the differences between the little and the big brothers can be attributed solely to the nesting technique and to differences in LBCs due to the use of low-resolution data to drive the little brother.

A low-cost version of the BBE was obtained by replacing the HRGCM by a very large domain high-resolution RCM; it was successfully used to show that the little brother tends to replicate the magnitude and spatial distribution of small-scale climate statistics present in the big brother, at least in mid-latitude climates (see Laprise et al. 2008 and references therein). The BBE was also very useful to study the influence of a variety of parameters in RCMs setup such as the impact of the size of the domain used to run the model (e.g., Leduc and Laprise 2009) and the impact of LBC errors (Diaconescu et al. 2007).

2.2 Added value concept

The evaluation of the MG hypothesis with the BBE does not depend on the model performance, i.e. model’s skill at reproducing the observed climate, which has definite advantages as discussed above; it has also its downside. Satisfying the MG hypothesis does not imply that the high-resolution RCM-derived statistics are closer to observed statistics than those that would be produced by a low-resolution GCM; hence the conclusions are mute about whether RCM provide any real added value compared to coarser resolution GCM. Indeed, as discussed by several authors for both GCMs (Boer and Lazare 1988; Boville 1991; Boyle 1993) and RCMs (Giorgi and Marinucci 1996), higher resolution simulations do not necessarily produce results closer to the observed values, in part because the approximations in models do not converge monotonically with resolution and the performance is strongly dependent on the behaviour of parameterizations.

These days, the most popular paradigm used to evaluate RCMs is through a pragmatic consideration about their usefulness, evaluating if RCMs are able to add value (AV) to, i.e. improve, the simulation of climate statistics compared to those produced by GCMs. The AV hypothesis has important differences compared to the MG hypothesis. First, it introduces the necessity of using observed data in its validation, thus inducing important constraints due to the scarceness of fine-scale observations and the limited number of variables (e.g., precipitation, temperature, surface pressure) available for validation. Second, the evaluation of the AV hypothesis implies the assessment of the performance of the RCM downscaling technique but also of the relative performance of the RCM and its driving data (i.e., the AV is model dependent). It is generally very difficult to determine if an improvement (degradation) of a given climate statistics comes from advantages (disadvantages) of the RCM downscaling technique or because the RCM performed better (worse) than the GCM; there may also be compensating errors between the driving data and the RCM that may result in apparent improvements, but for wrong reasons. From this viewpoint, it seems that the only way of using the AV paradigm to obtain some intrinsic characteristics of the downscaling technique is through the use of a large ensemble of RCM-GCM pairs of simulations in order to extract common behaviours.

In general, RCMs could simulate more realistic climate than the lower resolution driving data by adding value in two different ways. First by adding climate variability in scales that are not explicitly resolved by GCMs (hereafter referred to as added value of type 1 (AV1), as indicated by the light-blue shaded region in Fig. 1. A second way is by improving the simulation of climate in those scales that are common to both RCMs and GCMs, hereafter referred to as added value of type 2 (AV2), as indicated by the light-red shaded region in Fig. 1. The separation of AV in two components can be helpful because of the different methodological approach needed to assess both types of AV. The estimation of AV coming from the additional climate variability in scales only resolved in RCM simulations (AV1) is ultimately an evaluation of the performance of RCMs to simulate small-scale variability. On the other hand, the evaluation of the improvements produced by the RCM in the range of scales resolved by both models (AV2) can be done by comparing results from the RCM and the GCM with large-scale analyses of observations to determine which one produces better performance. This classification can be complemented with the one proposed by Castro et al. (2005) in which RCM dynamical downscaling technique is separated into four distinct types according to the LBCs used to drive the RCM.

Due to the limited domain size of RCMs, the lack of two-way interaction between the regional domain and the rest of the globe, and the lateral boundary condition issues, it is not clear whether RCMs actually improve or degrade the larger scales; hence AV2 has not been clearly identified and it is still a debated topic in the modelling community (e.g., Laprise et al. 2008), although recent results indicate that there may be improvements under specific circumstances (Veljovic et al. 2010). In the following we will concentrate on AV1 although it will be generally refer to as AV.

2.3 Potential added value concept

It is important to note that in some cases the absence of added value might be related to the failure of the assumptions from which AV is expected. For example, from Fig. 1 it seems clear that very little AV can be found when analyzing monthly-mean precipitation data in regions without important surface forcings because monthly scales are predominantly associated with large-spatial scales that are probably well resolved by GCMs. A necessary condition for RCMs to produce AV1 is that the contribution of the simulated fine-scale details on the climate statistics of interest is not negligible. That is, if the RCM does not produce any climatic information at small scales then, by definition, there is no AV1.

The study of the relative contribution of fine scales in a given climate statistics has lead to the concept of “potential added value” (PAV) as discussed in Bresson and Laprise (2009). The term “potential” in this definition accounts for the fact that the presence of small scales is not a sufficient condition to have AV1. A simple example is an RCM that generates small scales but with little resemblance between simulated and observed patterns or amplitude. Then, we can argue that the small scales are not skilful and do not add any real value to the coarser GCM climate, even if they suggest a large PAV. Clearly however, the presence of PAV in RCM simulations is a prerequisite for, although not a definite proof of AV1.

In this article, a perfect-model approach was developed to study the PAV. The idea behind the PAV concept is that the high-resolution (e.g., 50-km grid spacing) precipitation field simulated by a RCM will be aggregated into a coarse-resolution (e.g., 300-km grid spacing) in order to generate what we can call a “virtual GCM” field. The important hypothesis behind this framework is that the virtual GCM can be interpreted to represent more or less the same statistics as those resulting from a climate model operating at similar grid spacing. Evidently, a virtual GCM differs from a real GCM due to a number of reasons, among them that the virtual GCM fields are influenced through the upscaling of fine-scale processes that are resolved in the high-resolution RCM simulation but would be absent in a low-resolution GCM simulation.

Di Luca (2009) compared the statistics of extremes (e.g., 95th percentile) of a virtual GCM with those of a real GCM, for the precipitation as simulated by the Canadian Global Climate Model (CGCM3) and the Canadian RCM (CRCM) driven by the CGCM3. The evaluation focused on time scales larger than a day and on spatial scales that are common to both models, i.e., the CRCM was aggregated at the CGCM3 resolution to generate virtual GCM datasets. Using as reference daily observational time series, the comparison showed that seasonal biases are of the same magnitude in the CGCM3 and the CRCM. Both models display also comparable skills to simulate the frequency and intensity of observed daily values, but the CGCM3 generally shows larger 95th percentile values. It is important to note that the performance of GCMs, under the assumption that model precipitation output represents an areal mean (see Chen and Knutson (2008) for a detailed discussion), is evaluated using observations which are always made at finer spatial resolutions than the GCMs resolutions. This implies that GCMs are evaluated and sometimes “calibrated” using observed statistics that are similar in nature as virtual GCM statistics, containing for example the upscale of fine-scale processes that occur in the real climate system.

The aim of this study is to develop some simple measures to characterize the PAV of the precipitation field as simulated by a number of RCMs and as represented in reanalyses and observations. The dependence of PAV on several parameters will be evaluated: the choice of the temporal scale of the data (ranging from 3-hourly to 16 days means), the region (e.g., complex topography region versus flat region), and the season (e.g., mostly convective in summer versus stratiform in winter precipitation).

3 Data

The potential added value as defined in the last section is dataset dependent. Four different but not independent sources of high-resolution precipitation data (HRD) are used in this study. RCM-simulations are used to evaluate the PAV suggested by models. One reanalysis and two gridded observed datasets are used to estimate the PAV of changing resolution of data in the real climate system. Observed datasets are more reliable in the conterminous United States due to the higher density of stations compared to Canada and oceanic regions and, for this reason, the region of study is located in continental United States. Particularly at fine temporal scales and over complex terrains, observed datasets cannot be fully trusted and will not be considered as a ground truth.

3.1 NARCCAP simulations

The RCM simulations to be used in this study are those from the North American Regional Climate Change Assessment Program (NARCAAP; http://www.narccap.ucar.edu/; Mearns et al. 2009). In NARCCAP, six RCMs were run with a horizontal grid spacing of about 50 km over similar North American domains covering Canada, United States and most of Mexico, during the 25 years between 1980 and 2004. The RCM simulations to be used are those of contemporary climate using driving data derived from the National Centers for Environmental Prediction (NCEP)—Department of Energy (DOE) Atmospheric Model Intercomparison Project II (AMIP-II) global reanalysis (R-2; Kanamitsu et al. 2002). Table 1 gives the acronyms, full names and the modelling group of each RCM, together with the number of grid points within the computational domain of each model available for the analysis and the map projection and the number of vertical levels. The computational domain of the CRCM RCM is shown in Fig. 2 and a brief description of each NARCCAP RCM can be found at http://narccap.ucar.edu/data/rcm-characteristics.html.

Table 1 Acronyms, full names and modelling group of RCMs involved in the NARCCAP project
Fig. 2
figure 2

Computational domain and topographic field as represented in the CRCM together with the specification of the seven regions of interest. All regions have the same dimensions (i.e., 6.4° by 6.4°)

3.2 CPC gridded precipitation

An interesting source of high spatial resolution precipitation data for daily and longer time scales is given by the gridded Climate Prediction Center’s (CPC) product derived using stations from the Unified Raingauge Database (URD) (Higgins et al. 2000). This dataset consists in daily analyses, gridded at 0.25° by 0.25°, from over 8,000 stations each day covering the period 1948–1998 with no missing values. The dataset covers the domain 20–60° N, 140°–60° W over continental United States with an heterogeneous density of stations, higher in the eastern part of United States but with relatively good coverage in all continental United States.

3.3 UWash gridded precipitation

This daily gridded precipitation dataset was obtained from the Surface Water Modeling group at the University of Washington (UWash) from their web site at http://www.hydro.washington.edu/Lettenmaier/Data/gridded/ and is described by Maurer et al. (2002). Within the coterminous United States, it uses daily totals of precipitation from the National Oceanic and Atmospheric Administration Cooperative stations, also included in the URD database, to produce a 1/8° gridded dataset using the synergraphic mapping system algorithm of Shepard (1984). In order to better capture local variations due to complex terrain, each grid cell of the 1/8° gridded dataset is adjusted using monthly-mean values computed with the parameter-elevation regressions on independent slopes model (PRISM). PRISM (for more details see Daly et al. 1994) is an analytical model that uses statistical relations between the observed precipitation and several topographical parameters (e.g., elevation, steepness of the terrain, orientation of the slope, and others) derived from a digital elevation model (DEM) in order to provide gridded precipitation products better adapted over elevated terrains where rain gauge data are sparse. The influence of PRISM has little effect on the adjusted precipitation in flat regions and so UWash is expected to be similar to CPC in these regions.

3.4 NARR reanalyses

The North American Regional Reanalysis (NARR, http://www.emc.ncep.noaa.gov/mmb/rreanl/index.html) is a product created at NCEP that combines, in a dynamically consistent way, the simulated fields by the NCEP regional Eta model (Mesinger 2000) driven at its LBCs by the R-2 reanalysis, together with numerous additional observed datasets through the use of the NCEP Data Assimilation System (Mesinger et al. 2006). NARR has a grid spacing of 32 km and 45 layers in the vertical, and reanalysis fields are available every 3 h between 1979 and 2003, over a large domain covering Canada, United States and Mexico. According to Mesinger et al. (2006), in addition to its higher resolution, one of the main advantages of these reanalyses is the assimilation of latent heating profiles derived from precipitation analyses (Lin et al. 1999). The precipitation dataset assimilated in NARR is a daily, 1/8° analysis obtained by gridding rain gauge observations from the URD using the orographic adjustment technique PRISM already discussed (Mesinger et al. 2006).

4 Methodology

4.1 Multi-resolution approach

To analyze the scale dependence of the above-described data, the multi-resolution (MR) approach is used. The MR method (see Mallat 1989 for details) has been used in several studies to analyze the temporal (Howell and Mahrt 1997; Vickers and Mahrt 2002) and spatial (Zepeda-Arce et al. 2000; Harris et al. 2001) variability of atmospheric variables. The MR method consists in the application of numerical filters in order to aggregate the original high-resolution time-varying precipitation fields into lower-resolution temporal and spatial scales. In both the temporal and spatial dimensions, the filtering is performed by aggregation of the original precipitation field into several lower resolution grids. A total of five spatial scales (~0.4°, 0.8°, 1.6°, 3.2° and 6.4°) and 8 temporal scales (ranging between 3 h and 16 days) resolution datasets are considered. As it will be explained in detail later, the dependence of several precipitation statistics on spatial scales will be used to determine the relative importance of small scales and define various PAV quantities.

4.1.1 Spatial scale analysis

In this study, a slightly different version of the MR method of Mallat (1989) is developed by aggregating the original HRD precipitation fields on some common lower resolution grid meshes. The precipitation aggregation is performed on various resolution meshes occupying regions of 6.4° by 6.4° (i.e., about 550 km by 550 km at a latitude of 40°) as a compromise between two opposing needs in relation to their size: first, that regions be large enough to estimate climate statistics at a range of spatial scales spanning the minimum resolved by current GCMs, and second, that they be small enough to represent fairly homogeneous regions across North America in order to analyze the dependence of results on different surface forcings. Figure 2 shows the seven regions selected for the analysis, together with the topography field as represented in the CRCM. In the following, regions are denoted by adding to LON the west longitude of their centre (e.g., the region centred on −118.0° of longitude is called LON118).

The finest scale of the MR analysis is done over grid meshes with 0.4° of grid spacing, which was chosen such as to be finer than the grid spacing of all NARCCAP RCMs; on this scale the precipitation field is identical to that simulated by the RCM, ensuring that the full information of each RCM is retained. The number of RCM grid points contributing to the aggregation at each scale depends on each RCM due to their specific map projections and horizontal grid spacing. Table 2 shows the minimum and maximum number of grid points inside 6.4° by 6.4° regions (see Fig. 2) together with the mean grid spacing inside each region for each HRD.

Table 2 Minimum and maximum number of grid points and the corresponding effective grid spacing in the 6.4° by 6.4° regions of Fig. 2 for each high-resolution dataset

NARR and CPC grid spacings are smaller than 0.4°. In those grid boxes with more than one NARR or CPC grid points, the 0.4° grid spacing value is obtained simply by computed the arithmetic average of the points every 3 h. Hereafter, the finest scale will be denoted as 0.4°, but it should be clear that for models it does represent the data at the original grid spacing of each NARCCAP RCM (i.e., in the range 0.44°–0.51°). As an example of the spatial distribution of grid points inside a region, Fig. 3a presents the location of CRCM grid points (blue squares) together with the 0.4°-grid mesh (red crosses) over the 6.4° by 6.4° LON118 region.

Fig. 3
figure 3

Location of the new grid points (red crosses) defined by aggregating the original high-resolution fields (blue dots) over grid meshes with grid spacings given by 0.4° (a), 0.8° (b), 1.6° (c), 3.2° (d) and 6.4° (e). Data correspond to the CRCM model for the LON118 region. Blue dots represent the grid points of the CRCM model in its original grid mesh

The second scale next to the finest is obtained by aggregating the original precipitation field of each HRD over grid boxes defined by a grid mesh with a horizontal grid spacing of 0.8°. The upscaling at the 0.8° scale is made by simply computing, at each time interval, the average of all HRD-points inside each 0.8° grid box (i.e. by computing a simple arithmetic area-average value). As shown in Fig. 3b, grid boxes at the 0.8° scale contain a variable number of the original RCM grid points, that vary between 2 and 4 in the case of the CRCM.

In a similar way, other spatial scales are calculated by aggregating the original precipitation field over grid meshes characterizes by horizontal grid spacings of 1.6°, 3.2° and 6.4°, as illustrated in Fig. 3c, d and e, respectively. The 6.4° scale corresponds to the coarser spatial scale and it is obtained by averaging, at each time step, the precipitation rate values of all the RCM grid points inside the region (see Fig. 3e).

Giorgi (2002) used a similar filtering approach to study the spatial-scale dependence of interannual climate variability of temperature and precipitation over several regions around the Earth. Starting with a 0.5° grid spacing dataset, he computed a two-dimensional running spatial average at various spatial scales.

4.1.2 Temporal scale analysis

The temporal scale analysis is performed in a similar way as the spatial one. In this case, the finest temporal scale corresponds to the 3-hourly time series of archived precipitation for any given grid point. The second temporal scale, the 6-h scale, is obtained by simply computing the arithmetic average between two consecutive 3-hourly data and thus reducing in a factor of two the total number of data in the time series. Similarly, six other temporal scales are defined.

For each year between 1981 and 2000 (1998 for CPC gridded precipitation), we selected two subsets of 128 days in order to generate cold- and warm-season time series (a long season of 128 days was chosen in order to be able to represent the amount of 3-hourly data as a power of 2). Cold season is defined by the four months between November and February and warm season is defined by those months between April and June. For these periods, a total of 19 cold seasons and 20 warm seasons are obtained.

4.1.3 Spatiotemporal scale analysis

The spatial and temporal scale filtering are then applied simultaneously to each HRD in order to obtain a spatio-temporal multi-scale dataset composed by a total of 40 (five spatial scales and eight temporal scales) time-varying fields. For any given HRD, the multi-scale dataset is denoted as Prn,m with index n, varying between 0 and 4, identifying the spatial scale and index m, varying between 0 and 7, denoting the temporal scales. As already mentioned, the five spatial scales are associated with grid spacings of 0.4°, 0.8°, 1.6°, 3.2° and 6.4°, and temporal scales vary from 3 h (m = 0) to 384 h (m = 7). Each dataset Prn,m is illustrated in Fig. 1 according to their minimum temporal and spatial scale. Filled blue squares denote those datasets with spatial grid spacings smaller than ~3.2° (~275 km at 40° of latitude) that can only be represented by standard RCMs. Datasets denoted with non-filled blue squares correspond to those with spatial scales larger or equal than 3.2° that can be represented by both, RCMs and GCMs.

4.2 Multi-scale statistics

In order to compare results of the precipitation field at different resolutions, we will calculate a number of statistics over each region.

Grid point statistics (q n,m95 ): for each time series Prn,m (red crosses in Fig. 3) at spatial scale n and temporal scale m, the corresponding temporal histograms are calculated by partitioning the interval of possible wet events outcomes into subsets of 0.1 mm/day width, and then divided by the total number of outcomes to obtain the frequency in each bin. 95th percentiles are then computed for each grid point frequency distribution. Wet events are defined here as those events with a mean precipitation rate larger than 0.1 mm/day (as done in Lenderink et al. 2007).

Spatialmean statistics (q n,mmean,95 ): for each spatial scale n, the spatial-mean 95th percentiles computed by putting in the same histogram (pooling) events from all grid points and then computing the 95th percentile (similar to method 3 of quantiles computations in Déqué and Somot (2008).

Spatial-maximum statistics (q n,mmax,95 ): in each region, the maximum value of the grid-point 95th percentile distribution at spatial scale n is taken.

In this way, (q n,mmean,95 ) constitutes a regional measure by representing the mean statistics over the entire region and (q n,mmax,95 ) a local measure at one grid point. Differences between the regional and local measures arise from the presence of spatial gradients in the precipitation distributions. Several mechanisms can generate these gradients in instantaneous fields; when considering climatic statistics computed from 20-year data however, they are quite probably due to the existence of stationary forcings. It should be noted that the spatial-mean quantity is roughly equivalent to what would be obtained by applying a Fourier transform at a similar wavenumber.

In this paper results are presented only for the 95th percentile, but the analysis was conducted also for other quantities and some of these results will be summarized in the next section.

5 Results

5.1 Multiscale intensity-frequency distributions

The process of aggregating precipitation data in space (time) acts as a spatial (temporal) filter that tends to smooth out the extremes in any given field (time series), thus narrowing the intensity–frequency distribution. As a result, systematic changes are introduced in the original high-resolution precipitation field (time series) as it is upscaled into lower resolution fields (time series):

  • Local maximum values in the lower resolution dataset are always smaller than or equal to those in the original dataset. That is, higher-order percentiles (e.g., 95th percentile) tend to be smaller in the coarser resolution datasets than at higher resolution.

  • The absolute number of dry events (those events with precipitation rate smaller than 0.1 mm/day) tends to decrease when the precipitation field is aggregated into lower resolution grid meshes.

  • Low and moderate precipitation rates tend to be more frequent in lower resolution datasets, compensating the deficit in dry and heavier events.

The general changes suggested in this three points can be illustrated by showing the spatial-mean intensity distributions in the NARR data aggregated at several temporal and spatial resolution (see Fig. 4). Results correspond to the LON86 region but similar results are obtained in other regions (not shown). For 3-hourly data in cold (Fig. 4a) and warm (Fig. 4b) seasons, dry events represent on the order of 30–70% of the total events, with a larger value in the high horizontal resolution dataset (70.5% in cold and 66.9% in warm season) compared to the coarser one (46.5% in cold and 32.3% in warm season). Low and moderate precipitation events (those between 0.1 and 16 mm/day) are more frequent in the aggregated data at 6.4° grid spacing (42.8% in cold and 54.7% in warm season) compared to the 0.4° horizontal interval dataset (22.5% in cold and 20.5% in warm season). Finally, 3-hourly events with precipitation rates higher than 64 mm/day show a relative frequency more than an order of magnitude larger in the 0.4° grid spacing than in the 6.4° data (0.38% vs. 0.005% in cold and 0.57% vs. 0.03% in warm seasons); that is, heavier precipitation events are more frequent in high-resolution precipitation field (see Fig. 4a, b).

Fig. 4
figure 4

LON86 spatial-mean intensity distributions of precipitation rate as simulated by the NARR reanalysis for 3-hourly data in cold (a) and warm (b) seasons and for 16-days data in cold (c) and warm (d) seasons. Colors are associated with 0.4° (red), 1.6° (green) and 6.4° (blue) spatial scales. Only frequencies greater than 0.01% are shown

A similar behaviour is found when computing intensity-frequency distributions for several spatial resolution datasets for 16-day cumulated periods (Fig. 4c, d). In this case, the temporal aggregation tends to filter out the more extreme simulated precipitation (both the no-precipitation and heavier events), thus producing an increase in the relative frequency of low- to moderate-precipitation events for every spatial scale. As a result, differences in relative frequencies between different horizontal resolution datasets are strongly reduced, showing that time averaging can limit the effect of changing the spatial resolution of the data. Nevertheless, in both seasons, heavier precipitation events (those larger than 8 mm/day) are more frequent in the higher resolution dataset.

The dissimilar sensitivity to changes in spatial resolution exhibited by different precipitation intensities has important implications in AV studies. It suggests that different statistics will show different potential for added value depending on which part of the distribution is sampled. That is, higher moments of the distribution (e.g., intensity and frequency of heavier precipitation rate events) show a much larger sensitivity to changes in resolution than central moments (e.g., low-moderate precipitation rate events). As already mentioned in Sect. 4.2, we will use the 95th percentile of the wet-event distribution in order to assess the PAV for the several HRD.

5.2 Regional (spatial-mean) potential added value results

Top panels in Fig. 5 show the spatial-mean 95th percentile (q mean,95, see Sect. 4.2 for computation details) of 3-hourly precipitation over the region LON86 as a function of spatial scales. Left and right panels show cold- and warm-season results, and the different curves represent percentiles as calculated from NARR and NARCCAP-RCM simulations. In both seasons and independently of the HRD considered, there is an increase of the (q mean,95) value as the spatial scale increases. Quantitative changes, however, are significantly different when considering different HRDs. For example, in cold season, the WRFP model suggests an increase of 28 mm/day in (q mean,95) between the 6.4° scale (~25 mm/day) and the 0.4° spatial scale (~53 mm/day). The CRCM model shows a change of only 8 mm/day between the same spatial scales (~15 and ~23 mm/day, respectively). It is also clear from Fig. 5 that the spread between models tends to be larger as the horizontal scale of the data decreases; that is, the model uncertainty associated with the estimation of (q mean,95) is higher as the horizontal resolution of the data increases.

Fig. 5
figure 5

Spatial-mean 95th percentile as a function of spatial scales for 3-hourly precipitation in cold (a) and warm (b) seasons. Also shown is the 16-days spatial-mean 95th percentile in cold (c) and warm (d) seasons. Results correspond to the LON86 region. Symbols and colors denote the HRD used in each case. Red line represents CPC results (only available for 16-days results)

Figure 5c, d show (q mean,95) for 16-days precipitation datasets. In this case, differences between the (q mean,95) value in high- and low-resolution datasets are greatly reduced and the spatial-scale dependence of the (q mean,95) is very low. Differences between the several dataset estimations of the spatial-mean 95th percentile are somewhat less important than in the 3-hourly case, and the change of (q mean,95) between 0.4° and 6.4° seems to be quite similar in all RCMs.

The difference between small and large spatial scale climatic statistics can be highlighted by defining the PAV measure as

$$ {\text{PAV}}^{m} = q_{{95,{\text{mean}}}}^{0,m} - q_{{95,{\text{mean}}}}^{3,m} , $$

where q 0,m95,mean and q 3,m95,mean represent the spatial-mean 95th percentile at temporal scale m and spatial grid spacings of approximately 0.4° and 3.2°, respectively (i.e., a jump in resolution of around 8 in the linear horizontal dimension). The PAVm quantity measures the difference between the representation of q m95,mean at fine (i.e., RCM’s) horizontal scale and its large-scale approximation at the temporal scale m. Assuming that the 3.2° spatial scale can be interpreted as a good proxy of the statistics estimated from a GCM at 3.2° grid spacing, then PAVm can be used to estimate the potential added value of a RCM over a GCM as discussed in Sect. 2.3.

A near zero value of the PAV quantity means that, for the quantity of interest (e.g., spatial-mean 95th percentile), the high-resolution estimation does not add extra information over the coarse resolution one. Analogously, PAV ~ 0 can be interpreted as if the application of spatial filters in order to approximate the high-resolution precipitation field at lower resolutions doesn’t filter out any fine-scale variability. Sufficient conditions for PAV = 0 are given by a spatially uniform intensity-frequency distribution field or a field that only contains variability at scales larger than 3.2°.

It should be clear, as was discussed in the introduction, that a non-zero value for PAVm does not necessarily mean that the RCM is adding value, because the small-scale variability may not necessarily be skilful. That is, PAVm ≠ 0 is a necessary, but not sufficient, condition for a high-resolution adding value of type 1 to lower resolution fields.

A relative measure of spatial-mean PAVm can also be obtained by defining

$$ r{\text{PAV}}^{m} = {\frac{{{\text{PAV}}^{m} }}{{q_{{95,{\text{ mean}}}}^{0,m} }}} = 1 - {\frac{{q_{{95,{\text{ mean}}}}^{3,m} }}{{q_{{95,{\text{ mean}}}}^{0,m} }}} , $$

so that 0 ≤ rPAVm ≤ 1. The rPAVm quantity evaluates the proportion of fine spatial scale 95th percentile (q 0,m95,mean ) that is not accounted by its large-scale part (q 3,m95,mean ). Thus rPAV ~ 0 suggests that no fine-scale information is needed to determine q 0,m95,mean , and rPAV ~ 1 means that q 0,m95,mean is solely determined by the fine-scale information.

5.2.1 High temporal resolution data

The improvement in the representation of surface forcings such as topography, lakes and coastal regions due to the higher resolution of RCMs compared to GCMs is expected to strongly influence the added value. A simple but partial assessment of this dependence can be performed by evaluating the PAV in regions with significantly different surface conditions. We expect that the most important forcing is the topographic one (see Fig. 2) over the western regions characterized by complex topography, with higher absolute values and larger gradients than eastern regions. It should be clear however, that differences between regions are not limited to surface forcings but can also be related with other stationary forcings such as the planetary-scale waves (e.g., summertime subsidence in the West Coast) or the moisture sources (e.g., Gulf of Mexico low-level jet in the Great Plains).

Figure 6 shows the 3-hourly PAV (top panels) and rPAV (bottom panels) as a function of regions (from west to east) for the spatial-mean 95th percentile (q mean,95). In cold season (see Fig. 6a), NARCCAP-RCMs show PAV values on the order of 12 mm/day with little variations between the regions, but showing some higher values in eastern regions (those regions to the east of LON98). Differences between RCM PAV estimations are on the order of ±5 mm/day (i.e., ~50%) with a slightly larger spread in eastern regions. PAV values as estimated from the NARR dataset (black line) are also on the order of 12 mm/day, showing a large resemblance with the NARCCAP ensemble-mean (grey line) latitudinal profile.

Fig. 6
figure 6

3-hourly regional PAV measure as a function of regions for the 95th percentile for a cold season and b warm season. Also shown is the relative PAV measure for c cold season and d warm season. Symbols denote individual NARCCAP-RCMs results and lines denote the NARCCAP-ensemble mean (grey) and NARR results (black)

In warm season (Fig. 6b), most RCMs show significant differences between eastern and western regions, with maximum values in LON92 and LON86 regions, and minimum values to the west of LON105. The maximum near central regions is probably due to a relative decrease of the influence of convective activity toward eastern regions. PAV values are on the order of 10 mm/days in western regions and on the order of 25–30 mm/day in eastern regions. Differences between RCM estimations are approximately ±5 mm/day (i.e., ~50%) in western regions and ±15 mm/day (i.e., >50%) in eastern regions. In this season, the ECPC RCM shows much larger values of PAV than others RCMs, particularly in western regions with values three times larger than every other RCM, mainly due to much larger values of q mean,95 for fine spatial scales (not shown). It is also evident that in this season the NARR tends to produce the lowest PAV values for all regions considered. In this case, NARR-low PAV values are related with a tendency to produce very low 95th percentile at fine spatial scales.

Figure 6c, d show the 3-hourly rPAV measure as a function of regions for cold and warm season, respectively. NARCCAP ensemble-mean cold-season values are on the order of 0.4, suggesting that around 40% of the fine scale q mean,95 comes from fine-scale variability that is absent in the large-scale part. In warm season, the NARCCAP ensemble-mean value is on the order of 0.6, showing that a larger part of the fine scale q mean,95 comes from fine spatial scale variability. That is, in all regions, warm-season rPAV values are higher than cold-season values, showing that fine spatial scale variability of precipitation is relatively more important in warm season due to the finer scale of precipitation systems in summer (i.e., convection systems dominates) compared to winter (i.e., synoptic systems dominate). Again, in all regions and particularly in the warm season, NARR tends to produce the lowest rPAV values of all datasets with an average over regions of 0.3 and 0.4 in cold and warm seasons, respectively.

Interesting changes in the regional behaviour are noted when analyzing the rPAV measure. In both seasons, the ensemble-mean of rPAV shows higher values in western regions (0.45 in cold and 0.6 in warm seasons) compared to eastern region (0.35 in cold and 0.5 in warm seasons). As already stated, western regions are characterized by more important surface forcings than eastern regions and so the larger rPAV values in western regions are probably induced by a fine-scale orographic component. In the warm season, there is also a decrease of rPAV from central to east regions maybe related with a relative decrease of the convective activity towards the Atlantic coast.

The spread between the rPAV as estimated from different RCMs is somewhat smaller compared to the PAV quantity, suggesting that absolute values can be very different but the “scaling” properties of precipitation are similar for the several models.

5.2.2 Low temporal resolution data

Figure 7 shows PAV (top panels) and rPAV (bottom panels) for the spatial-mean 95th percentile for the 16-days temporal scale. In both seasons, differences between q mean,95 at fine and large scales are much smaller than in the 3-hourly case, with PAV values generally smaller than 2 mm/day.

Fig. 7
figure 7

As in Fig. 6 but for the 16-days regional PAV and rPAV measures. Red and blue lines represent CPC and UWash results, respectively

Cold-season results show that there is a very good agreement, particularly in eastern regions, between results produced by NARCCAP RCMs, NARR, CPC and UWash observations. In warm season and assuming that UWash observations represent the most reliable source of information, it seems that NARCCAP RCMs tend to produce an overestimation of the PAV quantity over western regions, with a large overestimation by the ECPC RCM. In eastern regions, NARCCAP RCM values are in good agreement with those from CPC and UWash, with NARR data tending to produce a slight underestimation compared to observed values. The underestimation of NARR is also noted when studying daily data (not shown), suggesting that the low values over eastern regions at 3-h (Fig. 6b, d) are, at least partially, related with an underestimation of the potential AV of the NARR data.

More interesting is the behaviour of the rPAV measure (bottom panels in Fig. 7). As for the absolute PAV measure, rPAV decreases significantly compared to the 3-hourly values. In both seasons, the rPAV ensemble-mean value decreases by a factor of ~3–4 compared to the 3-hourly values (from 40 to 15% in cold season and from 60 to 15% in warm season). This decrease is due to the fact that the application of the temporal filter induces a different change in high and low spatial resolution 95th percentiles. As shown in Fig. 5, the relative change of the fine spatial resolution q mean,95 between 3-h and 16-day period (by a factor of 6–10) is much more important than the same change for the coarse-resolution q mean,95 (by a factor of 3 only).

The 16-days NARCCAP ensemble mean rPAV measure still shows higher values in mountainous compared to non-mountainous regions, with values of 17 and 9%, respectively, for cold season, and 24 and 13%, respectively, for warm season. In the cold season, NARCCAP ensemble mean results are in very good agreement with those obtained using the observed datasets. In the warm season, however, CPC and NARR show almost identical values of rPAV no matter the region considered, suggesting no clear influence of surface forcings in this season. In contrast, UWash mean values over mountainous and non-mountainous regions are of 20 and 14% respectively, indicating that there is some impact of surface forcings in agreement with NARCCAP mean results. Whereas all datasets suggest similar values for the rPAV in non-mountainous regions, the differences between datasets arise in the representation of rPAV in mountainous regions. Given that the PRISM algorithm has exhibited a superior performance than others geostatistical methods in distributing point measurements of precipitation (see Daly et al. 1994), differences in mountainous regions may be interpreted as an underestimation of CPC rPAV compared to UWash data. The reasons of this underestimation are not well known but could be related with a misrepresentation of stations in these regions. The CPC station density is highest in the eastern two-thirds of the United States with lowest values over western regions (Higgins et al. 2008) where the complex topography would demand for higher densities.

5.3 Local (spatial-maximum) potential added value results

So far, we have analyzed a regional measure by computing the PAV quantity using spatial-mean percentiles (q n,m95,mean ). In this section, we present results obtained by using a more local measure of the fine spatial scale variability by computing the PAV quantity with q n,m95,max (see Sect. 4.2). The use of the PAV measure computed from q n,m95,max could be interpreted as an estimation of the maximum value that can be obtained from RCM simulations by considering individual grid-point (i.e., local) results over a given region. As was already stated, differences between q n,m95,mean and q n,m95,max arise mainly due to the presence of horizontal gradients of stationary forcings and so they should be more important in those regions with complex topography.

5.3.1 High temporal resolution data

Top panels in Fig. 8 show the 3-hourly q n,m95,max PAV for the different models as a function of regions for cold (Fig. 8a) and warm (Fig. 8b) seasons. In most of the regions, the PAV values computed from q n,m95,max are around twice as large as in the spatial-mean case (q n,m95,mean ) (see top panels in Fig. 6), with the exception of the LON118 region that shows PAV values on the order of six times larger than in the mean case. In both seasons, the spread between the several HRD estimations is somewhat larger than in the mean case, with absolute differences on the order of ±15 mm/day.

Fig. 8
figure 8

As in Fig. 6 but for the 3-hourly local PAV and rPAV measures

Bottom panels in Fig. 8 show the spatial-maximum rPAV for cold (Fig. 8c) and warm (Fig. 8d) seasons. rPAV values are also higher in the spatial-maximum case than in the spatial-mean case (see bottom panels in Fig. 6), but differences are very dependent on the region and the season considered. In cold season and mountainous regions, the NARCCAP ensemble-mean rPAV value is ~40% for the regional measure and ~65% for the local measure. For the same season and non-mountainous regions, NARCCAP mean rPAV value is ~30 and ~45% for the regional and local measures, respectively. Similar results are found for the NARR dataset.

In the warm season, the NARCCAP mean rPAV values is ~70% (60%) in mountainous regions and ~60 (~55%) in non-mountainous regions for the local (regional) measure. In this season, much smaller values on the rPAV measure are estimated when using the NARR dataset. As it will be clear in the next section when including in the analysis CPC results, differences between NARCCAP and NARR arise because NARR tend to slightly underestimate rPAV values in both eastern and western regions, and the NARCCAP ensemble-mean tends to overestimate rPAV values, particularly in western regions.

5.3.2 Low temporal resolution data

Figure 9 shows PAV (top panels) and rPAV (bottom panels) for the spatial-maximum 95th percentile for the 16-day temporal scale. As in the spatial-mean case, PAV values are much smaller than in the 3-hourly case, generally smaller than 6 mm/days with the exception of western regions in cold season (see Fig. 9a).

Fig. 9
figure 9

As in Fig. 6 but for the 16-days local PAV and rPAV measures. Red and blue lines represent CPC and UWash results, respectively

Interestingly, in both seasons, q n,m95,max rPAV results (Fig. 9c, d) show that the relative importance of small-scale features in western regions is quite well preserved after the temporal averaging, with a NARRCAP ensemble-mean rPAV value of ~55% (versus ~65% in 3-hourly data) in cold season and of ~60% (versus ~70% in 3-hourly data) in warm season. The fact that rPAV is barely sensitive to temporal average shows that locally the surface forcing component of rPAV (i.e., stationary forcings) in mountainous regions plays an important role (non-stationary convective activity will, for example, tend to be cancelled out in temporal average). In regions with lower influence of surface forcings, a larger decrease of rPAV is noted when comparing 3-hourly and 16-day spatial-maximum values, with rPAV values near ~25 and ~45% (~30 and ~60%) for 16-day and 3-hourly data, respectively in cold (warm) season.

To what extent the results obtained for the 95th percentile can be extrapolated to other climate statistics? As mentioned, an analysis similar to this one was conducted for other climate statistics such as temporal mean, wet-events statistics and other percentiles. For example, the spatial mean of the temporal average is conserved for changes in the spatial resolution of the data and so the PAV associated with this quantity is nil. However, the spatial maximum of the temporal mean is not conserved (i.e., locally, the mean value can be different) and can be used to estimate the associated PAV. Results (not shown) suggest almost identical results as for the 95th percentile, with slight decrease in rPAV values. That is, for local measures, the sensitivity of the temporal mean to changes in resolution tends to be similar to those found in high-order percentiles.

6 Discussion

The use of RCMs to dynamically downscale large-scale atmospheric fields in present and future climate conditions has gained popularity during the last 20 years. There is still a need, however, to objectively quantify the added value obtained by the RCM downscaling technique. For example, specific knowledge about where and with respect to which climate statistics RCMs can produce more skilful results than GCMs constitutes a very useful information for climate-scenarios users such as those performing impact and adaptation studies. Studies trying to validate the RCM downscaling technique are also essential to highlight the importance of developing RCMs and the use of its products instead of those coming from lower resolution GCMs in some particular applications.

This article concentrated on the characterization of a necessary condition that RCM-simulated climate statistics must satisfy in order to generate some AV: that the climate statistics of interest contain some fine spatial scale variability that is absent in coarser GCMs. This prerequisite condition and its dependence on several factors (seasons, regions, etc.) was assessed in the context of a perfect-model framework, designated as potential added value framework, that includes:

  1. 1.

    The multi-resolution method is used to aggregate at several spatial and temporal scales the original high-resolution precipitation fields simulated by six RCMs (NARCCAP; Mearns et al. 2009) and as represented by a reanalysis (NARR; Mesinger et al. 2006) and two observation gridded datasets (CPC, Higgins et al. 2000; UWash, Maurer et al. 2002). The MR technique is particularly suitable for the precipitation variable due to its non-periodicity (both in time and space), which allows performing a local analysis that cannot be done with, for example, Fourier-based procedures.

  2. 2.

    95th percentiles are computed from each of the several datasets defined by the MR technique based on two different methods: one that estimates the spatial mean (regional) 95th percentile over a given region and a second that estimates the maximum (local) 95th percentile computed from individual grid points over a given region.

  3. 3.

    Potential added value (PAV) measures are then defined as the difference between 95th percentiles estimated at large (GCM scale) and small (RCM scale) spatial scales for every high-resolution dataset.

The methodology appears to be robust to small changes in spatial scales and the location and size of regions. Several sensitivity tests were performed by slightly changing these three parameters and PAV values changes were on the order of 5–10%, rarely exceeding 15% in mountainous regions for longest temporal scales due to the lesser number of data. In any case, regional and seasonal dependence of PAV measures remains the same after the slight changes in parameters.

The bulk of results found by applying this methodology are summarized in Fig. 10 for the regional (Fig. 10a, b) and the local (Fig. 10c, d) rPAV measures. Results are shown for the NARCCAP RCM ensemble-mean and for NARR, CPC and UWash datasets when available. NARCCAP ensemble error bars are estimated by using the standard deviation computed from the ensemble of NARCCAP-RCM estimations. In general, results tend to confirm some statements generally outlined with respect to the advantages of using high-resolution RCMs. For the regional measure we obtain:

Fig. 10
figure 10

Ensemble-mean values for the rPAV measure computed from the regional 95th percentile for a 3-hourly and b 16-days data. Also shown is the local 95th percentile for 3-hourly (c) and 16-days (d). Colors denote NARCCAP (grey squares), NARR (black crosses), CPC (red diamonds, when available) and UWash (blue triangles, when available) results. West and east designate the mean value obtained for regions to the west and east of the −98° of longitude, respectively. Error bars are given by the standard deviation of the several NARCCAP-RCM estimations

  • PAV is much higher for short temporal scales due to the influence of transient forcings (e.g., convection) that tend to be filtered out by the time-averaging process. rPAV is 3–4 times larger in 3-hourly (see Fig. 10a) data than in 16-day mean data (see Fig. 10b).

  • PAV is higher in warm compared to cold season due to the larger fraction of precipitation falling from small-scale systems (e.g., convection) in warm season (see Fig. 10a, b).

  • Regions of complex topography (i.e., western regions) induce an extra component of rPAV, no matter the season or the temporal scale considered. Its relative importance is larger for long-term mean quantities and cold season due to the relatively minor importance of transient PAV sources (see Fig. 10a, b).

  • Assuming that the UWash precipitation analysis constitutes the most reliable estimation of the real climate PAV, then the NARCCAP-RCMs ensemble-mean constitutes a very good approximation of the PAV measures with a slight overestimation of PAV in warm season and western regions. NARR tends to produce a slight underestimation of PAV values in warm season and in eastern regions.

When assessing the local measure some differences appear:

  • No matter the region and season considered, there is an increase in rPAV values compared to the spatial mean rPAV estimations.

  • The relative importance of the orographic component in the rPAV measure is larger than in the spatial mean case (see Fig. 10c, d), particularly for longer temporal scales.

Results point out that the potential of RCMs to add some value can be very limited when considering time–averaged statistics for regional measures. For example, the spatial-mean rPAV for 16-day means data is on the order of 10–15% for non-mountainous regions in both warm and cold seasons.

The estimated PAV was derived from the precipitation field, a variable that is particularly characterized by a flat power spectrum with a sizable variance in a wide range of spatial scales. PAV is expected to be less important for variables with a steeper power spectrum (e.g., geopotential height, temperature, sea level pressure), but this speculation remains to be confirmed and quantified.