Introduction

A number of climate simulation studies based on regional climate models (RCMs) nested within global climate models (GCMs) have been carried out over Europe, North America, and China and now, through the COordinated Regional climate Downscaling EXperiment (CORDEX) [1], over most regions of the world. The purposes of these programs are multifold and include exploring the uncertainty in regional model simulations based on multiple RCMs driven by multiple GCMs and provision of scenarios of climate change for the impacts and adaptation research communities (e.g., [2, 3] for Europe). The major program of this type over North America, the North American Regional Climate Change Assessment Program (NARCCAP), similarly has multiple goals. In this paper, we explore how the results of the NARCCAP [4•] have been used for impacts and adaptation research. In particular, we examine why the researchers who chose to use NARCCAP results did so, how they used the NARCCAP data, and what the major outcomes of their research were. We also consider how these experiences point to possible improvements of such programs (e.g., CORDEX) so that their value for impacts users is increased. We first present a brief overview of NARCCAP and a summary of the ways in which researchers have used the output. We then attempt to assess lessons learned from these articles. To our knowledge, this is the first such attempt for any of the RCM-GCM programs.

The Issue of Spatial Scale and Impacts Research

The issue of the mismatch of spatial scale of climate models and the needs of impacts researchers has long been with us [5]. It is clear that some form of downscaling is needed for most impacts use of climate model output. There are generally two different types of downscaling approaches for rendering global climate model results useful for impacts research: statistical and dynamical techniques, which have been well reviewed in the literature. The reader is directed to these reviews [6, 7, 8 and 9•] for more information. Nesting regional climate models in GCMs is the major dynamical approach. Large statistical downscaling projects using the GCMs of the Coupled Model Intercomparison Project phases 3 and 5 (CMIP3 and CMIP5) [10, 11•] have been produced to support impacts research.

One critical issue is determining the approach to use for downscaling for impacts assessment. Statistical techniques are computationally much cheaper than dynamical downscaling, but it is hard to determine whether they really add value to the climate scenario. They do, however, allow the use of many GCMs (which is often important, as stakeholders need to consider the range of future projections). Most statistical downscaling techniques amount to “training” a transfer function that relates high spatial resolution observations (often gridded) to lower spatial resolution model output for a historical period and then applying the same transfer function(s) to GCM future climate scenarios. The weak link is the assumption that the same transfer function that was relevant in the historical period is applicable to the future. In our experience, many user groups recognize this weakness, but are faced with the choice between using a small number (in many past studies, one) of RCM scenarios and a larger number of statistically downscaled GCM scenarios that provide an idea of the range of variability. There is, therefore, a critical need for structured RCM studies that produce a sufficient range of future scenarios (essentially RCM × GCM combinations).
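The transfer-function idea can be illustrated with a minimal sketch, assuming the simplest possible form: a straight line fitted by least squares between coarse model output and fine-scale observations. The function names and the numbers are hypothetical, and operational methods are considerably more elaborate. The point is only that the parameters are fitted once on the historical period and then reused unchanged for the future, which is the stationarity assumption discussed above.

```python
def fit_transfer_function(coarse_hist, fine_obs):
    """Fit fine = intercept + slope * coarse by least squares (historical period)."""
    n = len(coarse_hist)
    mean_x = sum(coarse_hist) / n
    mean_y = sum(fine_obs) / n
    sxx = sum((x - mean_x) ** 2 for x in coarse_hist)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(coarse_hist, fine_obs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def apply_transfer_function(params, coarse_values):
    """Apply the historical fit unchanged; this is where stationarity is assumed."""
    intercept, slope = params
    return [intercept + slope * x for x in coarse_values]


# Hypothetical values for illustration only.
gcm_hist = [10.0, 12.0, 14.0, 16.0]   # coarse model output, historical period
obs_hist = [8.0, 11.0, 14.0, 17.0]    # fine-scale observations, same period
params = fit_transfer_function(gcm_hist, obs_hist)

gcm_future = [18.0, 20.0]             # coarse model output, future period
downscaled_future = apply_transfer_function(params, gcm_future)
```

Nothing in the fitted parameters can reveal whether the historical relationship still holds under the future climate, which is why the assumption cannot be verified from the training data alone.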

Basics of NARCCAP

NARCCAP was developed to examine the uncertainties in future regional climate associated with dynamical downscaling and to provide to the climate impact and adaptation communities relatively high-resolution (50 km) climate change scenarios (for numerous variables) with sub-daily (3 hourly) output frequency for use in assessing the impacts of climate change [4•]. Four different GCMs (from the CMIP3 set of simulations) provided boundary conditions for six different RCMs, using a balanced fractional factorial matrix sampling design, for 30 years of a current period (1971–2000) and 30 years of the future (2041–2070) using the SRES A2 emissions scenario [12]. In addition, 25 years of simulation with each RCM was produced using boundary conditions from the NCEP R2 reanalysis [13]. In all, 11 sets of current and future climate simulations were produced, covering nearly half the 4 × 6 GCM/RCM matrix. In addition, two global atmospheric model time slices at the same resolution were produced for the NCAR GCM (CAM3) and the GFDL atmospheric model (AM2.1). A large volume of data (∼40 TB), comprising 53 different variables at 3-h intervals, was produced and made available to the larger climate research community (URL:www.narccap.ucar.edu).
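As a rough illustration of what a balanced half-fraction of a 4 × 6 matrix can look like, the sketch below assigns each of six RCMs a distinct pair of GCMs, so that every RCM is driven by two GCMs and every GCM drives three RCMs. The labels (RCM1, gcm1, and so on) are placeholders, and the pairings are not the actual NARCCAP assignments. The construction yields 12 cells, whereas the text above reports that 11 sets of simulations were produced.

```python
from collections import Counter
from itertools import combinations


def balanced_half_fraction(rcms, gcms):
    """Pair each RCM with one distinct 2-GCM combination (hypothetical design)."""
    gcm_pairs = list(combinations(gcms, 2))
    if len(gcm_pairs) != len(rcms):
        raise ValueError("need exactly one GCM pair per RCM for this construction")
    return [(rcm, gcm) for rcm, pair in zip(rcms, gcm_pairs) for gcm in pair]


# Placeholder labels, not the real model names or pairings.
rcms = [f"RCM{i}" for i in range(1, 7)]
gcms = [f"gcm{j}" for j in range(1, 5)]

design = balanced_half_fraction(rcms, gcms)
runs_per_rcm = Counter(rcm for rcm, _ in design)
runs_per_gcm = Counter(gcm for _, gcm in design)
```

Balance matters because it allows the RCM and GCM contributions to the spread of results to be separated statistically even though only half of the matrix is filled.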

Overview of the Use of NARCCAP Output

The NARCCAP dataset has been used for various purposes, as indicated by the number of output users (over 1000), the number of published papers and reports that employed the NARCCAP output (over 110), and the number of times the program and/or output has been cited (about 1000). What is most important, however, is exploring what we have learned about the generation and use of future climate projections using RCMs and the implications of this use for impacts.

We divided the publications based on standard categories in climate and climate change research, and in relation to the number of articles in each group: (1) climate analysis (including evaluation and projections of future climate and exploration of added value), (2) impacts (using NARCCAP output to determine the effect of climate change on various resource systems), (3) extremes (on various time scales), (4) development of statistical methods, and (5) others (including educational purposes and use of NARCCAP as a test bed for developing evaluation systems or tools). Assessments of the areas other than impacts will be covered in future articles. Over 30 articles and reports have been produced in the impacts area (although new articles continue to appear).

The use of the data in impacts research is highly diverse. Published research covers hydrology, water resources, distribution of bird species, road safety, urban storm water structural design, chemical weathering, freeze-thaw cycles, species habitats, forest drought, human health, pollen production, agriculture, and wildfire risk. We examined all these articles to determine (1) why the NARCCAP data were used for the study, (2) exactly how they were used, and (3) what lessons can be learned from these articles.

Making NARCCAP Output Useable in Impacts Studies

Maximizing Usability—the NARCCAP Website and Data Management

The NARCCAP program took a hands-on approach to data management in an effort to maximize the usability of the data to impacts users, providing guidance on accessing and using data through workshops, extensive documentation on the website, a video tutorial, mailing lists for announcements and discussions, and consultation to anyone who contacted the data management team. This was an important feature of the program because the ways that model output is used in impacts studies differ considerably from its use in climate analysis. Atmospheric scientists are typically interested in looking at the behavior of a model across regional or larger areas; they have specialized tools and computer systems for dealing with numeric analysis on the large scale, and the datasets are archived in a form that is tuned to that use pattern. Impacts users, by contrast, are more often interested in a small set of mainly surface variables in a single region or a single location. This “hands-on” aspect of the NARCCAP experience has been and continues to be used to inform the development of new scientific data gateways and access tools that will better enable the use of climate model output in impacts and other research.

Bias Correction

A common problem impacts users face when using climate model output is the presence of biases in the climate model simulations. If the raw model outputs are used directly as inputs for an impacts model, the results will in general not resemble the impacts model output obtained when actual observed climate is used as input (e.g., [14]). This issue of bias correction has received considerable attention over the past 15 years or so. Wood et al. [15] showed that the problem of bias exists for both GCMs and RCMs.

Historically, the simplest and most common method of bias correction has been the “delta method” [16], wherein the current climate is represented by observational data and the future climate is constructed by adjusting the observations by the difference between the current and future model runs. (For precipitation, this is a multiplicative adjustment based on ratios instead of differences.) The delta method is simple and preserves the temporal structure and other features of the observed data, but because it only shifts the distribution of values, it is limited in its ability to project changes in extremes, variability, and so on.
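The delta method as described here can be sketched in a few lines. This is a minimal illustration with hypothetical values, using a single change factor for the whole series; in practice the change factors are commonly computed separately for each month or season. The function names are illustrative and do not come from any of the cited studies.

```python
def mean(values):
    return sum(values) / len(values)


def delta_additive(obs, model_current, model_future):
    """Shift observations by the modeled mean change (e.g., temperature)."""
    delta = mean(model_future) - mean(model_current)
    return [x + delta for x in obs]


def delta_multiplicative(obs, model_current, model_future):
    """Scale observations by the modeled ratio of means (e.g., precipitation)."""
    ratio = mean(model_future) / mean(model_current)
    return [x * ratio for x in obs]


# Hypothetical values for illustration only.
obs_temp = [14.0, 15.5, 13.0]
future_temp = delta_additive(obs_temp, [16.0, 17.0, 15.0], [18.5, 19.5, 17.5])

obs_precip = [0.0, 4.0, 10.0]
future_precip = delta_multiplicative(obs_precip, [2.0, 4.0, 6.0], [3.0, 5.0, 7.0])
```

Because every observed value receives the same shift or scaling, the sequence of wet and dry days and the shape of the observed distribution are carried into the future unchanged, which is the limitation noted above.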

Over the past decade, a host of methods have been developed to correct the biases in current and future simulations. The bias-corrected climate model output is then used directly as input to impacts models. Most statistical downscaling methods perform a bias correction along with downscaling (e.g., the widely used Bias Correction and Spatial Disaggregation or BCSD method [15]).

While these methods have been widely used to correct biases in global model output (e.g., [10]), they are also useful as post-processors to RCM output. Most commonly, methods such as BCSD or the more recent Multivariate Adaptive Constructed Analogs (MACA) method [17] (typically applied on a daily basis) are applied to gridded GCM or RCM output fields using gridded observations as climatology. This post-processing is used to correct the entire distribution of values (usually of temperature and precipitation) at each point (i.e., grid box). Various comparisons of these methods have been made [18].
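Correcting "the entire distribution of values" at a grid box is commonly done by some form of quantile mapping. The sketch below is a generic empirical version, not the BCSD or MACA algorithm itself: each model value is converted to its position in the model's historical distribution and replaced by the observed value at the same position. The function names and data are hypothetical, and values outside the historical model range are simply clamped, which is a simplification.

```python
from bisect import bisect_right


def to_probability(sorted_sample, x):
    """Position of x in the sample, scaled to [0, 1], with linear interpolation."""
    n = len(sorted_sample)
    if x <= sorted_sample[0]:
        return 0.0
    if x >= sorted_sample[-1]:
        return 1.0
    i = bisect_right(sorted_sample, x) - 1
    frac = (x - sorted_sample[i]) / (sorted_sample[i + 1] - sorted_sample[i])
    return (i + frac) / (n - 1)


def to_value(sorted_sample, p):
    """Inverse of to_probability: sample value at scaled position p."""
    n = len(sorted_sample)
    pos = p * (n - 1)
    i = min(int(pos), n - 2)
    frac = pos - i
    return sorted_sample[i] + frac * (sorted_sample[i + 1] - sorted_sample[i])


def quantile_map(values, model_hist, obs_hist):
    """Replace each model value by the observed value at the same quantile."""
    model_sorted = sorted(model_hist)
    obs_sorted = sorted(obs_hist)
    return [to_value(obs_sorted, to_probability(model_sorted, v)) for v in values]


# Hypothetical grid-box series: the model runs too warm and too variable.
model_hist = [2.0, 4.0, 6.0, 8.0, 10.0]
obs_hist = [1.0, 2.0, 3.0, 4.0, 5.0]
corrected = quantile_map([4.0, 5.0, 10.0], model_hist, obs_hist)
```

Applying the historical mapping to a future simulation assumes that the model's bias at each quantile does not change over time.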

Since the developers of NARCCAP did not provide a standard set of bias-corrected results, the users of the data were obliged to decide how to deal with the biases themselves. Chen et al. [19], for example, used the output from NARCCAP to test six different bias correction methods and assessed the impacts of the methods on hydrology using a lumped empirical hydrology model. They found that distribution-based methods are always superior to mean adjustment methods. Another important issue is that the methods assume stationarity (i.e., that the biases in the current period will be similar in the future), which is impossible to directly verify. Teutschbein and Seibert [20] provide some evidence for the appropriateness of assuming stationarity based on a split sample design and application to hydrologic basins in Sweden. The issue of stationarity of biases is far from resolved, however, and this issue emerges in some of the impacts research discussed here.

Review of the Research

Hydrology/Water Resources

The greatest number of impacts articles (currently 14) concern hydrology and water resources. Most such applications attempt to assess the implications of climate change and variability over the next approximately half a century for hydrology (mostly stream flow) and/or water resources (e.g., reservoir system reliability). The approach taken by most such studies is to use the (bias corrected) NARCCAP future climate simulations to force a hydrological model and assess changes relative to similar simulations forced with NARCCAP output for the historical period. Most such studies recognize the major advantage of NARCCAP regarding the higher spatial resolution of the simulations, but they also recognize the need to remove the effects of the RCM biases (see, e.g., [15]).

Among studies that have used the NARCCAP output in this way are those of Bürger et al. [21], who projected future changes in the hydrology of the upper Columbia River watershed; Grillakis et al. [22], who assessed future discharge changes in Spencer Creek, southern Ontario; Qiao et al. [23], who studied future hydrologic changes in the lower Missouri River basin; Shrestha et al. [24] and Zhang et al. [25], who both studied future discharge changes in the Upper Assiniboine River of southern Saskatchewan; and Sulis et al. [26], who performed a similar application for the des Anglais River basin of southwestern Quebec.

Takle et al. [14] applied NARCCAP output without bias correction to simulate stream flow changes at multiple locations in the Upper Mississippi River basin and found a large range of stream flow projections, at least some of which were attributable to the effects of bias. Moreover, this research compared results using GCM input versus the NARCCAP results. Chen et al. [27•] conducted a study that focused on the effects of alternative empirical downscaling approaches compared to NARCCAP output for determining changes in hydrological mean and extreme flows.

One of the most comprehensive hydrology studies to use the NARCCAP output is the 20 Watersheds Study [28•, 29•]. Hydrology and water quality models were applied to 20 different large river basins throughout the USA to characterize the sensitivity of stream flow, nutrient loading, and sediment loading to a suite of mid-twenty-first century scenarios. Six different NARCCAP sets (current and future) of simulations based on different GCM-RCM pairings were used, as well as two statistical downscaling approaches starting with the same GCMs: bias correction spatial disaggregation (BCSD) [10] and the delta approach. These were applied to two different hydrologic models at five of the 20 different basins throughout the USA. Both the selection of underlying GCM and downscaling method significantly affected the changes in stream flow and water quality. Importantly, the effect of the variability between downscaling of a single GCM with different RCMs on resultant change in hydrology can be of the same order of magnitude as the effect of the ensemble variability between GCMs (on changes in hydrology). The six NARCCAP scenarios (i.e., six pairs of current and future simulations) were also used at all 20 basins. Just considering the six NARCCAP scenarios, results for change in total flow volume varied considerably at most locations, with some scenarios producing increased and others decreased flow. Figure 1 illustrates these results for the six different NARCCAP scenarios for the 20 watersheds. It is striking that at all basins except the one in Alaska, both increases and decreases in flow are projected depending on the scenario used. However, the study did not discuss whether one method of downscaling or any particular NARCCAP model result was considered preferable over the others. Such a determination would require much more in-depth process level analysis of the climate model simulations.

Fig. 1 Total simulated future stream flow volume relative to current conditions based on NARCCAP current and future climate simulations for the 20 watersheds (from data displayed in Table 7-7 in [28•]). Climate scenario designations indicate the regional model acronym (upper case) followed by the driving global model acronym (lower case) according to NARCCAP convention [32•]. GFDL_slice refers to the time slice with the GFDL global atmospheric model.

Sulis et al. [26] provided an example of combining the results of the NARCCAP simulations with other ensembles of RCMs to examine multiple sources of uncertainty in the hydrologic response in the des Anglais catchment in southwest Quebec. They used additional simulations of the Canadian RCM (CRCM) nested in two additional GCMs as well as five simulations of CRCM driven by five different realizations of the Canadian Global Climate Model version 3 (CGCM3). All simulations were bias corrected using a quantile-quantile approach. Each source of uncertainty represented by simulations of the different subgroups of climate models contributed significantly to the variability of discharge results.

Fire Risk and Forest Drought

Two articles focused on fire risk and changes in fire season, using different fire risk indices. Luo et al. [30] used the Haines Index (HI) to determine current and future effects of climate on fire in the western USA for the month of August using six different GCM-RCM combinations. The HI is calculated using the 700–500 mb lapse rate and dew point depression at 700 mb. No bias correction was performed. While there is clearly variability in the model results, there was general agreement that total and consecutive days of high fire risk (HI > 5) would increase in the west, although the range of the latter was quite large.
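A minimal sketch of the index calculation follows. The two inputs match the description above (700–500 mb temperature difference and 700 mb dew point depression), but the class boundaries in the code are an assumption: they are the values usually quoted for the high-elevation variant of the Haines Index and should be checked against the original definition before any real use. The function names and the sample series are hypothetical.

```python
def haines_index_high(t700, t500, td700):
    """Haines Index (2 to 6) from 700/500 mb temperatures and 700 mb dew point, in deg C.

    Class boundaries are assumed (commonly quoted high-elevation values).
    """
    stability = t700 - t500      # lapse-rate term
    moisture = t700 - td700      # dew point depression term
    a = 1 if stability < 18 else 2 if stability < 22 else 3
    b = 1 if moisture < 15 else 2 if moisture < 21 else 3
    return a + b


def total_and_longest_run(daily_index, threshold=5):
    """Count days with index > threshold and the longest consecutive run of them."""
    total = longest = current = 0
    for value in daily_index:
        if value > threshold:
            total += 1
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return total, longest


# Hypothetical daily index values for part of one August.
august_hi = [6, 6, 4, 6, 5, 6, 6, 6]
total_days, longest_run = total_and_longest_run(august_hi)
```

Because the index is built from fixed thresholds on upper-air temperature and humidity, uncorrected model biases in those fields translate directly into biases in the count of high-risk days.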

Liu et al. [31] used the Keetch-Byram Drought Index (KBDI) to examine fire risk in the continental USA, primarily using one of the NARCCAP scenarios, but making comparisons for some regions with all scenarios. No bias correction was used in calculating KBDI, which uses daily maximum temperature and precipitation as meteorological inputs. The results are typical for research about changes in fire risk, but the variability of results on the large regional scale based on the different scenarios was large. An important contributor, in particular for the summer KBDI change values, was the change in summer maximum temperature. This is not surprising given that Mearns et al. [32•] demonstrated that the RCMs, not the driving GCMs, were dominant in the determination of summer climate change, when forcings are more localized.

Williams et al. [33] devised a forest drought stress index (FDSI) for the southwestern USA, using long time series of tree ring data and applied it to future climate conditions using the full suite of CMIP3 simulations, as well as seven of the NARCCAP model combinations. They state that supplementing the CMIP3 suite with NARCCAP is desirable because of the higher resolution in this area of complex topography and that the NARCCAP results provide added value ([33] Supplementary Material). The NARCCAP results provide added credence to the CMIP3 results. All simulations point to very high increases in the FDSI by mid-twenty-first century. The most important climate variables affecting this response are the warm season vapor pressure deficit (VPD) and cold season precipitation. Interestingly, although the CMIP3 suite of models is much larger than that of NARCCAP, because many of the CMIP3 models did not provide variables that could be used to calculate VPD, the final calculations of changes in FDSI are based on seven NARCCAP current and future simulation pairs and simulations from only ten of the CMIP3 models. Thus, the NARCCAP results contributed substantially to the final conclusion of high confidence in substantial increases in drought stress by the mid-twenty-first century in the southwestern USA.

Human Health

One of the more certain effects of climate change will be increased heat stress in many areas, and several articles made use of the NARCCAP dataset to explore these future conditions [34–37]. Both Li et al. [36] and Zhou et al. [34] used just one NARCCAP scenario to illustrate methods development, the latter for a sophisticated means of bias correction for some sites in Alabama and the former for establishing relationships between morbidity and heat stress in Milwaukee.

Grundstein et al. [35] examined changes in wet bulb globe temperature (WBGT) throughout the USA, using three NARCCAP scenarios to establish changes in the important threshold event of WBGT > 32.3 °C, at which outdoor athletic activity should cease. The physically based model of WBGT used requires inputs of air temperature, relative humidity, wind speed, global solar radiation, and surface pressure. Their approach to bias correction was to use a modified “delta” approach wherein the WBGT was calculated using actual climate observations and model current and future output. Then, the difference between the model future and model current WBGT was added to the observed WBGT. Looking at the ensemble average of the NARCCAP model simulations used, they found an increase in the frequency of oppressive days of between 15 and 30 days per year across broad swaths of the country.

Jones et al. [37] combined an extreme heat index with anticipated change in population growth for the A2 scenario and devised a person-exposure index to indicate the combined effect of population growth and increased heat. They used all 11 NARCCAP RCM scenarios for maximum temperature (bias corrected using the method of McGinnis et al. [38]) and found that in terms of exposure, which would increase substantially by mid-twenty-first century, population growth is just about as important as the change in incidence of temperatures above 35 °C. The NARCCAP output was of interest for this project because of the added value of higher resolution regional climate model simulations (discussed briefly in the paper) and the importance of temperature extremes, which are assumed to be better represented by higher resolution simulations.
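One simple way to see how population growth and warming can contribute comparably to exposure is to measure exposure in person-days (population multiplied by the number of days above 35 °C) and split its change into a climate term, a population term, and an interaction term. The sketch below assumes that formulation; whether it matches the exact index of Jones et al. [37] is not established by the text above, and all numbers are hypothetical.

```python
def count_hot_days(daily_tmax, threshold=35.0):
    """Number of days with maximum temperature strictly above the threshold."""
    return sum(1 for t in daily_tmax if t > threshold)


def exposure_decomposition(pop_base, days_base, pop_future, days_future):
    """Split the change in person-days of exposure into its components."""
    d_pop = pop_future - pop_base
    d_days = days_future - days_base
    return {
        "climate": pop_base * d_days,        # warming with population held fixed
        "population": d_pop * days_base,     # growth with climate held fixed
        "interaction": d_pop * d_days,       # both changing together
        "total": pop_future * days_future - pop_base * days_base,
    }


# Hypothetical grid cell: population grows by half, hot days rise from 10 to 15.
parts = exposure_decomposition(1_000_000, 10, 1_500_000, 15)
hot_days = count_hot_days([34.0, 35.0, 36.5, 40.0])
```

In this made-up case the climate and population terms are equal, which is the kind of balance the study reports; with real data the split varies from place to place.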

Other Impacts Areas

Since the different impacts areas in which NARCCAP results have been used are so varied, we here highlight a few additional impacts studies that illustrate particular points.

Few impacts studies selected a subgroup of NARCCAP simulations based on their putative quality. However, Cotton et al. [39] used a subset of the NARCCAP current and future simulations to investigate the effect of climate change on twenty-first century chemical weathering rates. In this case, two of the NARCCAP model results were selected based on how well they reproduced annual mean precipitation compared with observations. Foresee and Ahmad [40] used five NARCCAP current and future simulations to assess storm water infrastructures in one basin in southern Nevada. Two of the model results were eliminated because their 6-h 100-year return event values (the depth of which is the design standard for storm water facilities in this area) were higher than the observed values (from the North American Regional Reanalysis (NARR)). It was assumed that the NARCCAP values should be lower than those of the NARR since the resolution of the NARCCAP simulations is lower (50 km) than that of the NARR (32 km). While it is true that reproduction of extremes largely varies systematically with spatial resolution, there can be other reasons for over- or underestimating extremes, such as the mean error in precipitation. It is worth mentioning that one clear advantage of using higher resolution model simulations is better reproduction of precipitation extremes [41, 42]. For the three remaining climate change scenarios, the 6-h 100-year return event values increased substantially, although the range was quite large. This result has significant implications for adaptation planning.
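For readers unfamiliar with return events, the sketch below shows one textbook way to estimate a 100-year return level from a series of annual maxima. It assumes a Gumbel distribution fitted by the method of moments, which is a simplification chosen for brevity; the text above does not state which distribution or fitting method the study used, and the sample values are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def gumbel_fit_moments(annual_maxima):
    """Method-of-moments estimates of Gumbel location and scale."""
    n = len(annual_maxima)
    mean = sum(annual_maxima) / n
    variance = sum((x - mean) ** 2 for x in annual_maxima) / (n - 1)
    scale = math.sqrt(6.0 * variance) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale


def return_level(location, scale, return_period_years):
    """Value exceeded on average once every return_period_years."""
    non_exceedance = 1.0 - 1.0 / return_period_years
    return location - scale * math.log(-math.log(non_exceedance))


# Hypothetical annual maxima of 6-h precipitation depth (mm).
maxima = [20.0, 25.0, 30.0, 35.0, 40.0]
location, scale = gumbel_fit_moments(maxima)
depth_100yr = return_level(location, scale, 100)
```

With only a few decades of data, a 100-year estimate is an extrapolation well beyond the sample, so the spread among scenarios reflects sampling uncertainty as well as model differences.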

Two articles focused on agricultural impacts, but they examined several important issues in addition to the effect of climate change on crops. Diffenbaugh and Scherer [43] examined changes in suitability for wine grapes in California and other western areas using three different ensembles: NARCCAP, CMIP3, and a collection of simulations with the RCM RegCM3. They found that the effect of intra-ensemble spread on suitability is similar in the NARCCAP and CMIP3 ensembles, indicating that high-resolution uncertainty is as large as large-scale climate uncertainty. Also, bias correction appears to reduce all spreads. Glotter et al. [44] used four RCM simulation sets from NARCCAP, two of the Weather Research and Forecasting Model (WRF) and two of CRCM, driven by two different GCMs: the NCAR Community Climate System Model version 3 (CCSM3) and CGCM3. They applied bias correction to simulations from both GCMs and RCMs and used those results in a maize crop model for an area of the upper Midwest. They concluded that the dynamical downscaling did not have much effect on the final calculations of changes in yield. However, this region of application does not include the types of regions (coastlines and complex topography) where the added value of higher resolution climate change information has been established over North America (see ‘Discussion and Conclusions’ section).

The NARCCAP results were also widely presented in the recently released US National Climate Assessment [45]. The dataset was one of three sources of information about future climate change in the report. For example, Wilder et al. [46] used mainly the NARCCAP results in their discussion of climate change and extreme temperatures in the Southwest, because of the higher spatial resolution.

Discussion and Conclusions

The Value of NARCCAP to Impacts Researchers

The main feature common to all the studies discussed here is their need for regional and finer scale information that is not directly available from GCM simulations. While there are various datasets that provide statistical downscaling of these coarser simulations, concerns remain regarding the assumption of stationarity required by these techniques and the limited number of variables downscaled. Greater confidence in the dynamically downscaled results and the assumption of added value are key reasons for their use. There have been several papers specifically demonstrating the added value of the NARCCAP simulations particularly for coastal areas and regions of complex topography [47•, 48• and 49•].

The main value of the NARCCAP model simulations was allowing impacts researchers to explore the uncertainty in climate change impacts due to the effect of the spatial scale of the climate projections (e.g., [14, 21, 23, 25, 27•, 43]). Such complete exploration of this important issue has heretofore not been possible because, prior to NARCCAP, the necessary high-resolution scenarios were not available in sufficient number. The high spatial resolution of the NARCCAP simulations was the main reason given by authors for using the dataset. It also enabled exploration of other critical research issues in a more robust way. The adequate reproduction of extreme events, particularly extremes of precipitation, which is related to spatial scale, is critical to a number of these studies (e.g., [39, 46, 50]). Furthermore, the NARCCAP dataset served as a basis for exploring a host of bias correction techniques, which was necessary for many of the studies (e.g., [19]). Use with other ensembles [25, 33, 43] as well as contrasting results with those obtained using GCMs as the source of climate input [14, 43] expanded the important research questions that were explored.

An additional characteristic of NARCCAP that was important to some of the studies was its geographic extent, covering most of North America. While there have been many regional modeling experiments in certain regions of North America, there has never been such an extensive set of simulations over such a large area. This was crucial, for example, for the US EPA 20 Watersheds Study [28•].

A number of the impacts calculations were novel and previously unexplored, particularly those that required sophisticated and high-frequency outputs from climate models (e.g., [33, 35, 39]). This availability of a wide range of high-frequency variables (53 variables at 3-h intervals) was another major reason given by the authors for using the NARCCAP dataset.

The third reason authors gave for why they used NARCCAP, that the NARCCAP datasets were developed for use by impacts researchers, addresses one of the important elements of credibility and usability of scenarios [51, 52•]. This concerns the trust or confidence in the products of particular producers of future climate information. The NARCCAP team devoted effort to building this trust by making resources aimed at impacts researchers available on the project website, soliciting their participation in workshops, and offering hands-on support in accessing and using the data.

Some authors referred to all three reasons for using NARCCAP (e.g., [34]). In contrast, some of the articles do not explain why NARCCAP is used, perhaps because the authors consider the reasons self-evident.

Moving Forward

We are at a critical point in the formation and application of climate scenarios over North America in impacts and adaptation work because an increasing number of climate scenario products are available, and decisions need to be made regarding what climate change information will be used for the next US National Climate Assessment [53, 54]. The work of Maurer et al. [10, 11•] shows that relatively simple empirical downscaling can be used on a wide array of GCMs (both the CMIP3 and CMIP5 sets), and thus cover a considerable range of the uncertainty represented in GCMs. However, datasets such as these are limited by the number of variables made available. They still provide only temperature and precipitation and do not provide sub-daily values needed by many applications. While GCM-RCM simulations provide results that have demonstrable added value, it is easier and cheaper to sample all GCMs using the empirical downscaling approach, although the added value of these results is hard to determine. A desirable way forward would be to continue to use both methods and more rigorously compare the results and determine the credibility of the different downscaling approaches [55]. Work on further empirical downscaling of regional climate model results should also be encouraged (e.g., [21]). Some of the NARCCAP studies compared the effect of different downscaling and bias correction techniques on impacts calculations, particularly hydrology [27•, 26], but they made no determination of preference across the methods, only a characterization of the resulting uncertainty.

While NARCCAP produced much useful and usable information for the impacts and adaptation communities, the resulting datasets have certain limitations that constrained how the data could be effectively used by these communities. The four GCMs used to drive the RCMs do not represent well the range of climate sensitivities of the full set of CMIP3 models. Thus, NARCCAP results could not be used to represent the broad climate model uncertainty of CMIP3. Furthermore, the quality of the boundary conditions of the GCMs was not considered in their selection. The selection of the GCMs was based more on opportunity and desirable compromise with the international partners of NARCCAP. Additionally, while the spatial resolution of NARCCAP is a considerable advance even over the CMIP5 GCMs, many processes and landscape features would benefit from even higher resolutions. The dramatic increase in computer power over the past 8 years makes 150-year simulations over continental domains at higher resolutions (25 or even 10 km) quite feasible.

We recommend, within the context of CORDEX, the establishment of a North American program using an experimental design that samples a matrix of RCMs and GCMs (from CMIP5) more carefully so that the full range of climate sensitivity is sampled and quality control of boundary conditions is established. We also recommend a higher spatial resolution of at least 25 km. With such a carefully constructed framework, it would not necessarily be required to produce as large a dynamically downscaled system of simulations as has been done with the empirically downscaled approach. Such a framework would greatly enhance the value of this next set of RCM simulations for the impacts and adaptation communities.