1 Introduction

The intensification of extreme heat events, both as a consequence of current anthropogenic climate change and as projected for the future, continues to capture the attention of a world that remains vulnerable to such events (see Anderson et al., and references therein, in this issue, for a study on the effects of heat extremes on human mortality and its context within the scientific literature). Much work has been devoted to characterizing changes in the statistics of such extremes as observed (Donat et al. 2013, 2014) or as simulated by climate models (Sillmann et al. 2013a, b; Kharin et al. 2013), and to the portion of an individual event’s severity that can be attributed to anthropogenic interference in our climate. As the climate warms, the tail behavior of temperature distributions has often been shown to be the canary in the coal mine, in the sense of revealing statistically significant changes even before changes in mean quantities emerge from the noise of internal variability (Zhang et al. 2011). In fact, Sanderson et al. in this issue show that this is indeed the case in the type of simulations we use here.

Comparisons of heat extreme behavior among different future scenarios in the CMIP5 models have been presented before in both primary literature (Sillmann et al. 2013a, b; Kharin et al. 2013) and assessment reports (Collins et al. 2013), but often using a broader brush in depicting differences and focusing less than we attempt to do here on the uncertainties affecting the statistical estimation of these quantities and their changes. Here we set out to characterize as precisely as possible, albeit within a single climate model’s behavior, the gains or losses in terms of heat extreme occurrence and intensity that we can expect under two alternative scenarios: one, RCP8.5, does not apply any mitigation policy to human emissions of greenhouse gases, reaching a global radiative forcing of about 8.5 W m−2 by the end of the century; the other, RCP4.5, does impose stringent mitigation measures and thus limits that forcing to about 4.5 W m−2. The global mean temperature is projected by CESM1-CAM5 to reach about 2 °C above its 1986–2005 baseline value at the end of the 21st century under RCP4.5 forcing. Under RCP8.5, warming of 2 °C is attained by mid-century, eventually exceeding 3 °C by 2100 (see Table S1 for details of the model’s transient global climate response).

This study of avoided impacts is part of a larger project exploring the Benefit of Reducing Anthropogenic Climate changE (BRACE, O’Neill and Gettelman, this issue) from an RCP8.5 world to an RCP4.5 world, with respect to many different types of impacts. Other BRACE studies focus specifically on extremes (Fix et al.), and heat extremes in particular (Oleson et al.; Marsha et al.; Jones et al.), together with their impacts on, for example, human health (Anderson et al.) or crop exposure to damaging events (Tebaldi and Lobell).

2 Data and methods

Like most other studies under this project, we use two large initial condition ensembles run with NCAR-DOE CESM1-CAM5 (Hurrell et al. 2013; Meehl et al. 2013). This is the same version of the model for which CMIP5 simulations are available, and the one evaluated by Sillmann et al. (2013a) in terms of fidelity to the observed behavior of climate extreme indices.

The larger 30-member ensemble (RCP8.5 henceforth) is initialized in 1920 by numerically perturbing atmospheric conditions. Historical forcings estimated from observations are applied until 2005, and are followed by external forcings according to RCP8.5 (van Vuuren et al. 2011) for all 30 members (Kay et al. 2014). An additional experiment (RCP4.5 henceforth) was constructed for the BRACE project from 15 of the historical members by applying RCP4.5 forcings (van Vuuren et al. 2011) from 2006 onward (Sanderson et al., this issue). The availability of such large initial condition ensembles – a characteristic seldom found in fully coupled model simulations because of their computational cost and storage requirements – allows a more thorough exploration of the effects of internal variability on the projected changes. As a result, our statistical analysis, even though aimed at relatively rare events, has large power to detect changes within a single scenario as time evolves (we will sometimes refer to changes of this type as within-scenario), and between the two scenarios at concurrent times. The availability of the large ensemble for historical conditions also allows a precise estimate of present conditions, which serves as a baseline. Of course, internal variability is only one of the uncertainty sources affecting future projections; model structural choices represent an important additional one that cannot be addressed by our use of this model alone. Nonetheless, we consider this a useful first step in characterizing benefits of mitigation, and we hope more models will run the type of ensembles used here so as to put our results into a larger and more robust context.

We choose four quantities that address the changing behavior of heat extremes both as isolated occurrences and as prolonged events, akin to heat waves: the annual maxima of maximum and minimum daily temperatures and the annual maxima of 3-day average maximum and minimum daily temperatures. Note that the first two indices, TXx and TNx, respectively, are part of the suite identified by the Expert Team on Climate Change Detection Indices, ETCCDI (Zhang et al. 2011; Alexander et al. 2006) as indicators of a changing climate. They are regularly monitored in observations and calculated from model output in a standardized fashion, making a substantial part of our study comparable to multi-model studies in the literature (Donat et al. 2013, 2014). The availability of these indices computed from observations allows us to perform a basic validation exercise of the model output over the period 1951–2003 (common to the observations). We report results of the validation in the supplementary material. As can be gauged from those, large biases in the climatological means should be acknowledged, but – arguably more relevant for our comparison of future changes – better results are obtained when validating trends and changes within the observed period. For those, observed metrics fall within the range of the ensemble results in most cases (at most grid points) for both maximum and minimum temperature extremes.
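
As an illustration only, the four indices for one year at one grid point could be computed along the following lines. This is a minimal sketch, not the ETCCDI reference implementation: the function names and the `TXx_3day`/`TNx_3day` labels are hypothetical, and each year is treated in isolation (3-day windows do not span the year boundary), which is an assumption.

```python
from typing import Dict, List, Sequence


def running_mean(values: Sequence[float], window: int) -> List[float]:
    """Means of every run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        # Slide the window forward by one day.
        total += values[i] - values[i - window]
        means.append(total / window)
    return means


def annual_heat_indices(daily_tmax: Sequence[float],
                        daily_tmin: Sequence[float]) -> Dict[str, float]:
    """Four heat-extreme indices from one year of daily data (degrees C)."""
    return {
        "TXx": max(daily_tmax),                         # hottest day
        "TNx": max(daily_tmin),                         # warmest night
        "TXx_3day": max(running_mean(daily_tmax, 3)),   # hypothetical label
        "TNx_3day": max(running_mean(daily_tmin, 3)),   # hypothetical label
    }
```

By construction the 3-day value can never exceed the corresponding 1-day value, since a mean of three days is bounded by the hottest of them.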

We consider the 1-day quantities mainly relevant for assessing changes in the physical climate system, while the 3-day quantities have a more direct relevance for impacts: for example, the Chicago heat wave of 1995 was considered particularly devastating because of high temperatures during three consecutive nights, and past studies have used the 3-day minimum temperature metric as a definition of heat wave (Meehl and Tebaldi 2004). We make statistical inference on the 20-year return values of the four quantities using a standard approach from extreme value theory (EVT), i.e., modeling them through a Generalized Extreme Value (GEV) distribution (Coles 2001). Because of the large sample sizes that we form by sampling the RCP4.5 and RCP8.5 members, we can achieve high precision in our estimates of the GEV parameters, thus detecting statistically significant differences in the statistics of these extremes robustly and early, as we will describe later. The availability of these many replications also makes the application of the GEV distribution more appropriate than in other studies, by better satisfying the assumption of quasi-stationarity of the data sample: we construct samples by using 10-year windows around each date of interest, but replicate the sampling across all ensemble members available, so that the sample of extremes is formed by 300 annual maxima for RCP8.5 and 150 annual maxima for RCP4.5. This approach assumes quasi-stationarity over a decade, which is not, strictly speaking, true. However, the trends in temperature within a decade are relatively small compared to internal variability, so the errors from this approximation are negligible. We remark that the number of simulations under both RCP4.5 and RCP8.5 permits selection of a shorter time period yet results in a larger sample than any previous GEV-based temperature extreme analysis known to us.
Formally, for each time window t1 ≤ t ≤ t1 + 9 we compute annual maxima of the four variables for all the ensemble members available and use the resulting 300 (RCP8.5) or 150 (RCP4.5) values to fit the parameters of the GEV distribution by the L-moments method of Hosking and Wallis (1997). We describe the details of the statistical analysis in the supplementary material, but here we note that previous studies have successfully performed goodness-of-fit tests on the use of the GEV distribution for temperature extremes (Wehner 2010 and references therein).
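
A minimal sketch of an L-moment GEV fit is given below for illustration. It uses the widely quoted closed-form approximation for the shape parameter and Hosking's sign convention (k > 0 means a bounded upper tail); it is not the code used in this study, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def sample_l_moments(data: Sequence[float]) -> Tuple[float, float, float]:
    """Return (l1, l2, t3): first two sample L-moments and the L-skewness."""
    x = sorted(data)
    n = len(x)
    if n < 3:
        raise ValueError("need at least three values")
    # Unbiased probability-weighted moments b0, b1, b2 (0-based ranks).
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    l1 = b0
    l2 = 2.0 * b1 - b0
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    return l1, l2, l3 / l2


def fit_gev_lmom(data: Sequence[float]) -> Tuple[float, float, float]:
    """Fit GEV location xi, scale alpha and shape k by matching L-moments."""
    l1, l2, t3 = sample_l_moments(data)
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c      # approximation to the exact solution
    if abs(k) < 1e-6:                    # Gumbel limit (k = 0)
        alpha = l2 / math.log(2.0)
        return l1 - 0.5772156649 * alpha, alpha, 0.0
    g = math.gamma(1.0 + k)
    alpha = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    xi = l1 - alpha * (1.0 - g) / k
    return xi, alpha, k
```

In this sketch `data` would be the pooled 300 (or 150) annual maxima for one grid point and one 10-year window.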

We focus on 20-year return values of the four measures of extreme heat, interpretable as those levels exceeded with a 5 % chance in a given year.
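
To make the definition concrete, the sketch below evaluates a T-year return value as the quantile of a fitted GEV distribution with non-exceedance probability 1 − 1/T. The parameterization follows Hosking's convention (location xi, scale alpha, shape k); the function names are illustrative assumptions.

```python
import math


def gev_cdf(x: float, xi: float, alpha: float, k: float) -> float:
    """Probability that the annual maximum does not exceed x."""
    if abs(k) < 1e-9:                       # Gumbel limit
        return math.exp(-math.exp(-(x - xi) / alpha))
    z = 1.0 - k * (x - xi) / alpha
    if z <= 0.0:                            # x lies outside the support
        return 1.0 if k > 0 else 0.0
    return math.exp(-(z ** (1.0 / k)))


def gev_return_value(xi: float, alpha: float, k: float,
                     return_period: float = 20.0) -> float:
    """Level exceeded with probability 1/return_period in any given year."""
    if return_period <= 1.0:
        raise ValueError("return_period must be greater than 1 year")
    y = -math.log(1.0 - 1.0 / return_period)
    if abs(k) < 1e-9:                       # Gumbel limit
        return xi - alpha * math.log(y)
    return xi + alpha * (1.0 - y ** k) / k
```

For a 20-year return value, `gev_cdf` evaluated at the returned level is 0.95, i.e., a 5 % chance of exceedance in a given year.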

We compute baseline values at each grid point using the 10 years between 1996 and 2005 across the historical portion of the 30 RCP8.5 members. We then compute 20-year return values at three time periods along the 21st century: 2020–2029, from now on identified simply as 2025; 2040–2049, i.e., 2045; and 2070–2079, i.e., 2075 (note that RCP4.5 members only run to 2080). For all three periods we then quantify changes from the baseline and compare them between the two scenarios to derive a measure of avoided impacts in heat intensity through mitigation.
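
The pooling of annual maxima across ensemble members within a 10-year window can be sketched as follows. The data layout (one dictionary per member, mapping year to the annual maximum at a single grid point) and the function name are assumptions made for the example.

```python
from typing import Dict, List


def pooled_annual_maxima(ensemble: List[Dict[int, float]],
                         start_year: int,
                         n_years: int = 10) -> List[float]:
    """Concatenate annual maxima from all members over one time window.

    `ensemble` holds one {year: annual_maximum} mapping per member.
    A KeyError is raised if a member does not cover the whole window.
    """
    years = range(start_year, start_year + n_years)
    return [member[year] for member in ensemble for year in years]
```

With 30 members and the window starting in 1996 this yields the 300 baseline values; with 15 members it yields 150 values per window.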

For the same three periods in the future we also compute the new return period (i.e., the new expected frequency, or chance of occurrence) of the current (simulated) 20-year return values and measure the change as a risk ratio, defined as the ratio of the new frequency to the old frequency. A risk ratio of 1 indicates no change in frequency. A risk ratio of 20 is the largest obtainable value, and indicates that the 20-year return value in the current period becomes a yearly value in the future (by construction, the highest frequency obtainable in our analysis is one occurrence per year). Therefore, when we compare the two scenarios by this metric we address avoided impacts in terms of differences in future frequency of heat events of current magnitude. The risk ratio is closely linked to a measure popular in the event attribution literature, the FAR or Fraction of Attributable Risk, where the probability of occurrence of a given event is evaluated under control and anthropogenically changed conditions, and the ratio of these two probabilities is used to “attribute” the fraction of the event’s chance of occurrence that was enhanced due to anthropogenic interference in the climate. In our case the attribution is either to the enhanced radiative forcing under a given scenario compared to present-day conditions or to the differential in radiative forcing when comparing RCP4.5 and RCP8.5. In our analysis we cannot distinguish global radiative forcing differences from regionalized differences due to, for example, short-lived climate forcers (aerosols) and land use changes. See Xu et al. and Lawrence and O’Neill in this issue for studies addressing regional forcers explicitly.
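
The risk ratio and its link to the FAR can be sketched as below. The code assumes a GEV fitted to the future sample in Hosking's convention and uses the standard relation FAR = 1 − P0/P1, which equals 1 − 1/RR; the function names are hypothetical.

```python
import math
from typing import Tuple


def gev_exceedance_prob(x: float, xi: float, alpha: float, k: float) -> float:
    """Annual probability that the maximum exceeds x under a GEV."""
    if abs(k) < 1e-9:                       # Gumbel limit
        return 1.0 - math.exp(-math.exp(-(x - xi) / alpha))
    z = 1.0 - k * (x - xi) / alpha
    if z <= 0.0:
        # Above the upper bound (k > 0) or below the lower bound (k < 0).
        return 0.0 if k > 0 else 1.0
    return 1.0 - math.exp(-(z ** (1.0 / k)))


def risk_ratio(baseline_return_value: float,
               future_params: Tuple[float, float, float],
               baseline_period: float = 20.0) -> float:
    """New annual frequency of the baseline event divided by 1/baseline_period."""
    p_future = gev_exceedance_prob(baseline_return_value, *future_params)
    return p_future * baseline_period


def far_from_risk_ratio(rr: float) -> float:
    """Fraction of Attributable Risk, FAR = 1 - P0/P1 = 1 - 1/RR."""
    return 1.0 - 1.0 / rr
```

Because an annual exceedance probability cannot be larger than 1, the risk ratio computed this way is capped at 20 for a 20-year baseline event.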

3 Results

3.1 Projected changes in return values

Figure 1 shows the simulated present-day 20-year return values for the four quantities of interest. These values constitute our baseline, and we compute changes – and their statistical significance – for the two separate scenarios, RCP4.5 and RCP8.5, and the three decades centered around 2025, 2045 and 2075. We compare these two sets of changes to estimate the avoided impacts (or from the opposite perspective, the benefits) of mitigation, i.e., of following the lower scenario, RCP4.5. The geographic patterns and relative magnitude of these extreme heat measures from CESM1.0 are similar to a CMIP5 multi-model average (Kharin et al. 2013). Maximum and minimum temperature events of the same length show similar geographic features, while 1- and 3-day events show similar absolute values (an effect of the persistence of temperatures from day to day, especially in the summer months), but the intensity of 1-day events generally extends over larger adjacent regions. We will see that despite these similarities in current behavior, changes in intensity and frequency affect these indices differently. Here and in the following we mask out the oceans to highlight impacts over land, since impacts on marine ecosystems from heat stress are related to sea surface temperatures and ocean layers’ temperature rather than air temperature.

Fig. 1

Magnitude, in degrees C, of current 20-year events for 1-day (left) and 3-day (right) maximum (top) and minimum (bottom) temperature extremes. These 20-year events for current climate are computed using the 10-year window 1996–2005 from CESM1 historical simulation consisting of 30 initial condition ensemble members

We now consider changes in the return values of Fig. 1 as the 21st century unfolds under the two scenarios.

Figures 2 and 3 show changes in daily and 3-day metrics for maximum temperatures for the three future periods under both scenarios, and differences between the two scenarios (Figures S1 and S2 show corresponding results for minimum temperature). Only statistically significant changes are shown as colored regions, with other regions left blank. (We define a change to be statistically significant – compared to either the current period or between the two scenarios – if the interval extending 2 standard deviations on both sides of the value of the metric does not include zero. Standard deviations are computed using the parametric bootstrap technique from Hosking and Wallis (1997) described in the supplementary information.) Due to the statistical power granted by the size of the ensembles, statistically significant changes within-scenario are pervasive even when comparing the baseline period of 1996–2005 to the closest future period of the 10 years around 2025, but differences between scenarios do not emerge over the majority of the land areas until the middle of the 21st century. Both of these characteristics hold true for all four metrics considered. A comparison of within-scenario results reveals that warming of maximum temperature extremes (i.e., daytime extremes) is larger than that of minimum temperature extremes (i.e., nighttime extremes); compare Figs. 2 and 3 with Figures S1 and S2, which use a common color scale. We note that this relative sensitivity to global warming of the diurnal (and 3-day) cycle is opposite to that of the seasonal cycle, where cold winter temperature extremes are projected to warm substantially more than hot summer temperature extremes (Kharin et al. 2013).
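
For illustration, a parametric bootstrap of the 20-year return value and the two-standard-deviation test might look like the sketch below. The actual procedure is the one in the supplementary information; here the number of bootstrap replicates, the function names, and the combination of the two standard deviations in quadrature (which treats the two estimates as independent) are all assumptions.

```python
import math
import random
import statistics


def fit_gev_lmom(data):
    """L-moment GEV fit; returns (xi, alpha, k) in Hosking's convention."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    l2 = 2.0 * b1 - b0
    t3 = (6.0 * b2 - 6.0 * b1 + b0) / l2
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c
    if abs(k) < 1e-6:                       # Gumbel limit
        alpha = l2 / math.log(2.0)
        return b0 - 0.5772156649 * alpha, alpha, 0.0
    g = math.gamma(1.0 + k)
    alpha = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    return b0 - alpha * (1.0 - g) / k, alpha, k


def gev_quantile(p, xi, alpha, k):
    """Value with non-exceedance probability p (0 < p < 1)."""
    y = -math.log(p)
    if abs(k) < 1e-9:
        return xi - alpha * math.log(y)
    return xi + alpha * (1.0 - y ** k) / k


def bootstrap_return_value(data, return_period=20.0, n_boot=200, seed=0):
    """Return (estimate, bootstrap standard deviation) of the return value."""
    rng = random.Random(seed)
    prob = 1.0 - 1.0 / return_period
    params = fit_gev_lmom(data)
    replicates = []
    for _ in range(n_boot):
        sim = []
        while len(sim) < len(data):
            u = rng.random()
            if u > 0.0:                     # avoid log(0) in gev_quantile
                sim.append(gev_quantile(u, *params))
        # Refit on the simulated sample and store its return value.
        replicates.append(gev_quantile(prob, *fit_gev_lmom(sim)))
    return gev_quantile(prob, *params), statistics.stdev(replicates)


def significant_change(value_a, sd_a, value_b, sd_b):
    """True if the +/- 2 SD interval around the difference excludes zero."""
    sd_diff = math.sqrt(sd_a ** 2 + sd_b ** 2)   # independence assumed
    return abs(value_b - value_a) > 2.0 * sd_diff
```

Each replicate draws a synthetic sample of the same size from the fitted distribution, so the spread of the refitted return values reflects the sampling uncertainty of a 300- or 150-value sample.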

Fig. 2

Change in the intensity of 20-year events for 1-day maximum temperature events. The three future periods around 2025, 2050 and 2075 are shown along the rows. Maps along the left column show changes under RCP4.5. Maps along the central column show changes under RCP8.5. Maps in the right column show differences between the corresponding maps to the left, i.e., the avoided impacts in the intensity of extremes from mitigation. All colored areas have statistically significant values (see text for definition). Areas where changes are not significant are left blank (as are the oceans). RCP4.5 results are based on 15 ensemble members, RCP8.5 results on 30 ensemble members

Fig. 3

Like Fig. 2, but for extremes computed from annual maxima of 3-day average maximum temperature

Due to the larger changes in maximum compared to minimum temperature extremes, the benefits of mitigation are correspondingly larger for maximum than for minimum temperature metrics by mid-21st century (compare the right-hand columns of Fig. 2 to Figure S1 or of Fig. 3 to Figure S2). A comparison between 1- and 3-day projected changes shows a larger warming for the latter, in both maximum and minimum temperature extremes. However, despite this higher within-scenario warming of the 3-day metrics, the benefits of mitigation are almost identical (compare the right-hand columns of Figs. 2 and 3, and, similarly, S1 and S2).

A quantitative analysis tells an even more compelling story. Consider the projected changes for 1-day metrics only in areas where the changes are statistically significant. By 2025 under RCP8.5, about half of the land regions experience a change of at least 1 °C in the 20-year return values of both minimum and maximum daily temperature extremes; that fraction is only 18 % under RCP4.5 for minimum temperature and 31 % for maximum temperature. However, only 2 % of the land regions would experience a benefit of that magnitude from mitigation at this time, since the new return values have increased substantially everywhere even under RCP4.5, and therefore the differences between scenarios are generally smaller than 1 °C. As the century progresses, changes of much larger magnitudes affect an increasingly large portion of land regions. Under RCP8.5 by mid-century nearly all land areas see changes of at least 1 °C, and three-quarters of the land area is projected to warm by 2 °C. More than a quarter of the land areas experience changes of at least 3 °C under RCP8.5. Changes under RCP4.5 are substantially smaller everywhere, so that by mid-century close to 50 % of the land areas experience warming that is reduced by 1 °C or more compared to RCP8.5. By the end of the century about 95 % of land regions would see benefits of 1 °C or more under the lower emissions scenario, and 50 % or more of the land areas would benefit by at least 2 °C in terms of reduced magnitude of the change. Of the land area, 6 % would benefit by 3 °C or more in projected extreme minimum temperatures and 13 % would benefit by this amount for extreme maximum temperatures. Note that by this time in the century, under RCP8.5 more than a quarter of the land area sees warming of extremes by 5 °C or more. Under RCP4.5, almost no part of the land surface shows increases of this magnitude.

Avoided impacts are very similar for 3-day metrics. As noted above, although the magnitudes of their current return values are smaller (Fig. 1), these types of prolonged extremes are projected to warm more than their single-day equivalents. As Table S2 details, the effect of mitigation on the fraction of land areas exceeding a given threshold is markedly larger for 3-day heat events. For example, by the end of the 21st century under RCP8.5, the fraction of land experiencing warming of 5 °C or more in 3-day maximum temperatures is around 50 % but is only about 25 % under RCP4.5. As the patterns in all figures indicate, the Northern Hemisphere regions – in particular the northern tier of North America, Europe and Northern Eurasia – and central South America are the regions that should expect the largest changes, and hence the largest potential benefits.

3.2 Projected changes in return periods

Next, we look at a different characterization of the expected changes and the gains from mitigation, inspired by the concept of risk ratio. According to its definition, we keep the intensity of an event fixed at its current value (Fig. 1), and measure changes in its frequency over future periods. Thus, we consider the current 20-year events in the four metrics, and estimate their changing frequency over the three future periods and the two scenarios. Defining the risk ratio as RR_t = F_t/F_0, where F_0 is by definition 1/20 (the annual frequency of the events of interest in the current period) and F_t is the frequency at time t = 2025, 2050, 2075, we show maps and compute summary statistics of RR_t. Values of RR_t = 1 indicate no change in frequency, while at the opposite end of the spectrum, values of RR_t = 20 indicate that current 20-year events become annual events in the future, i.e., events expected to occur with 100 % chance every year. (As already noted, our method’s highest frequency is annual; this does not preclude the possibility that in reality – at least the model’s reality – the 20-year events could happen more than once a year.) As described in the supplementary material, we apply a parametric bootstrap technique to compute the variance of our future estimates (keeping the 1/20 annual frequency for current events constant) and, on that basis, estimate the significance of the changes in frequency.

Figures 4 and 5 show estimated future risk ratios for daily maximum temperature and 3-day average maximum temperature. These figures complement the projected changes in intensity shown in Figs. 2 and 3. Figures S3 and S4 show corresponding results for minimum temperature metrics, for the quantities in the bottom panels of Fig. 1, and complement the changes in intensity in Figures S1 and S2.

Fig. 4

Risk ratio values, and their differences, for extremes in 1-day maximum temperature. As in Figs. 2 and 3, the different future periods around 2025, 2050 and 2075 are displayed along the rows. Left column shows results for RCP4.5, middle column for RCP8.5, and the last column the difference between them. Areas are colored only when changes are statistically significant

Fig. 5

Like Fig. 4, but for extremes computed from the annual maxima of 3-day average maximum temperature

Similarly to Figs. 2 and 3, we show changes within each scenario in the first two columns of each figure, and their difference in the third column, for the three future periods along each row. Changes are shown according to the color scale only if they, or their difference, are statistically significant. Note that changes in risk ratio are statistically significant nearly everywhere by 2025, due to the large sample sizes.

Figures 4 and 5 reveal that the larger changes in intensity in maximum temperatures compared to minimum temperatures, and for 3-day compared to 1-day indices (within scenario) are confirmed for changes in frequency as well. Contrary to the effect on intensity changes, however, mitigation reduces the future risk ratio of minimum temperature extremes more than maximum temperature extremes (comparing Figures 4 and 5 to Figures S3 and S4).

The geographic pattern of the changes in risk ratios appears almost orthogonal to the pattern of changes in intensity. For the latter, the most affected areas are the northern latitudes, consistent with the fact that warming is larger in these areas, a well-known pattern of change under anthropogenic forcings (Tebaldi and Arblaster 2014). In contrast, changes in risk ratio appear to be larger in the tropics, due to the interplay of the size of the shift in the temperature distributions with their width: as is well known, temperature variability is small at low latitudes, and therefore even relatively small shifts in the location of the distribution cause large changes in its quantiles (which determine the new frequency of the old return values).
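
The interplay between shift and width can be illustrated with a deliberately simplified calculation. Assume, purely for the example, that annual maxima follow a Gumbel distribution (a GEV with zero shape) whose location moves by `shift` while its scale stays fixed; the new exceedance probability of the old T-year level is then 1 − (1 − 1/T)^exp(shift/scale). The numbers used in the test are made up and are not model output.

```python
import math


def risk_ratio_after_shift(shift: float, scale: float,
                           return_period: float = 20.0) -> float:
    """Risk ratio of the old return level after a pure location shift.

    Assumes a Gumbel distribution with unchanged scale (illustrative only).
    """
    p0 = 1.0 / return_period
    p_new = 1.0 - (1.0 - p0) ** math.exp(shift / scale)
    return p_new / p0
```

For the same 1 °C shift, a narrow distribution (scale 0.5 °C) gives a risk ratio of roughly 6, while a wide one (scale 2 °C) gives less than 2, which is the qualitative contrast between low and high latitudes described above.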

We quantitatively summarize changes in frequency, at the three periods in the future and for the two scenarios, by the fraction of the land area with future risk ratio values at or above a given threshold. We choose four thresholds, RR* = 2, 4, 10 and 20, corresponding to changes in frequency that make the current 20-year event a 10-, 5-, 2- or 1-year event, respectively (i.e., 2, 4, 10 or 20 times more frequent). Table 1 reports the percentage of land area affected by changes equal to or larger than three of these thresholds.
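
A summary of this kind can be sketched as follows. The cosine-of-latitude area weighting assumes a regular latitude-longitude grid and is our assumption for the example, as are the function names and the nested-list data layout.

```python
import math
from typing import List, Sequence


def new_return_period(rr: float, baseline_period: float = 20.0) -> float:
    """Return period of the baseline event once it is rr times more frequent."""
    return baseline_period / rr


def land_fraction_exceeding(values: Sequence[Sequence[float]],
                            latitudes: Sequence[float],
                            land_mask: Sequence[Sequence[bool]],
                            threshold: float) -> float:
    """Percentage of land area where values[i][j] >= threshold.

    Rows correspond to `latitudes` (degrees); cells are weighted by cos(lat).
    """
    total = 0.0
    hit = 0.0
    for row_vals, row_mask, lat in zip(values, land_mask, latitudes):
        weight = math.cos(math.radians(lat))
        for value, is_land in zip(row_vals, row_mask):
            if not is_land:
                continue                    # oceans are masked out
            total += weight
            if value >= threshold:
                hit += weight
    if total == 0.0:
        raise ValueError("land mask selects no cells")
    return 100.0 * hit / total
```

Applied to a risk ratio map, the thresholds 2, 4, 10 and 20 correspond to new return periods of 10, 5, 2 and 1 years, as given by `new_return_period`.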

Table 1 Fraction of the land area (in % of the total) where the Risk Ratio is at least as large as X (X = 4, 10 or 20, translating into a 4-, 10- or 20-fold increase in the frequency of the current 20-year event). The difference between these fractions under the two scenarios is reported in the third, right-most column under each metric. In this case we simply report the difference between the corresponding numbers to the left, since reporting statistics about differences in Risk Ratio would have been less interpretable

First we note that by 2025, under both scenarios, all events of interest become at least twice as frequent (thus we omit these results from the table rather than showing 100’s and 0’s uniformly filling the respective cells). At this early period, more than half of the land area sees changes in frequency that are at least 4-fold for 1-day events, under both scenarios, and more than 90 % of the land area sees changes of that magnitude for 3-day events, under both scenarios. Projected changes of 10 or greater in risk ratio for daily events are minimal, but under RCP8.5, risk ratios of 10 or more are projected for 37 % of the land area for 3-day extreme minimum temperatures and for 21 % of the land area for 3-day extreme maximum temperatures. (Similarly to the effect that makes low-latitude areas experience larger changes than higher-latitude areas, the effect on the frequency of 3-day extremes is larger than on daily extremes because their probability distributions show less variability.) Differences between scenarios indicate that emission mitigation would have an effect even by this early time period on the frequency of what are currently rare temperatures.

By 2050, and even more so by 2075, changes in frequency that have made current rare events at least 4 times as frequent are pervasive, for all metrics and for both scenarios. There are, however, large gains to be had from mitigation in terms of reducing the increase in frequency of all these events over large portions of the land areas. For example, by 2050 under RCP8.5, more than two thirds of the land area sees current 20-year events become biennial (2-year) events for 1-day metrics, while only around 30 % of the land area is similarly afflicted under RCP4.5. The most striking differences between scenarios are realized by the end of the century, when, under RCP8.5, more than half the land area experiences the current 20-year events every year (up to almost 80 % of the area when considering 3-day minimum temperature, in fact) while only between about 10 and 25 % of the area is affected by such severe changes under RCP4.5.

4 Discussion and conclusions

Using two large initial condition ensembles of CESM1 simulations under two alternative scenarios, we estimate robustly and precisely the parameters of Generalized Extreme Value distributions for four types of hot temperature extreme events, in both current and future conditions, to calculate long-period return values. We choose annual maxima of high and low daily temperatures and annual maxima of their 3-day averages to address extremes as isolated events and as persistent events having the character of heat waves. We focus on the characterization of 20-year events in the current part of the simulation (1996–2005) and for three future periods (around 2025, 2050 and 2075). Other types of indices, which, for example, combine temperature and humidity (Fischer and Schär 2010), or characterize the exceedance of absolute thresholds (Pal and Eltahir 2015; Tebaldi and Lobell, this issue), may better address specific types of impacts, but we propose this study as a first-order evaluation of heat extreme impacts and the benefits of their mitigation. Due to the availability of many replicates for each climate model simulation, we accurately represent the statistics of these periods under an assumption of quasi-stationarity by sampling windows of 10-year intervals around those dates. The goal of this analysis is to quantify the benefits – in terms of avoided impacts – that the world would experience by limiting climate change along the emissions path of RCP4.5 rather than RCP8.5. Of course our analysis is limited by the use of a single model, but within those limits we benefit from the high statistical power of the large ensembles. Therefore, even if the absolute magnitudes of the projected changes may not apply literally to the real world, and uncertainties related to model structural choices are not accounted for, we are confident that the statistical significance of the differences in future extreme high temperatures between the two scenarios considered is robust.
The scope of this paper does not include a thorough validation of the model performance (but see the supplementary material for a first-order evaluation of two metrics related to annual maxima). We refer to the overall reassuring results of Sillmann et al. (2013a) for an extensive validation of the CMIP5 models (of which the version of CESM used here is one) with regard to various indices of temperature extremes (two of them coinciding with the daily metrics used here). But we also point out that GCMs only provide a large-scale picture of the climate and cannot be taken to represent observed behavior with fidelity, especially in extreme quantities at local scales. Hence, the magnitude of the return values measured at the ground may differ from those predicted by the GCMs due to local effects (e.g., landscape, urban effects, and local soil moisture), which is one main reason why we focus on relative changes rather than absolute projected values. GCMs nevertheless provide a valuable indication of the direction of change, in a coherent global picture with regional features that could not easily be achieved through alternative modeling exercises.

We find that extremes of all four types grow substantially in intensity and frequency under both scenarios, and even as soon as 2025 large portions of the land areas – to which our analysis is confined – see statistically significant changes, especially in the frequency of the current 20-year events. Changes in intensity start slowly but grow to several degrees Celsius by the end of the century for the vast majority of the land areas under RCP8.5, and the gains from mitigating climate change are substantial in lowering the fraction of the land affected by these largest changes. However, benefits of mitigation are also found as early as 2050 and are especially sizeable for 3-day average temperature extremes. The longer-duration extremes are somewhat lower in magnitude than their daily counterparts but also exhibit lower variability. Hence, the quantiles of their extreme distributions are more sensitive to shifts towards warmer temperatures, resulting in larger increases in projected future frequency and more sensitivity to mitigation in this regard. There are differences in the details of the results depending on the definition of extreme temperatures and changes: whether we consider 1- vs. 3-day events, minimum vs. maximum temperature, and whether the changes are expressed in terms of intensity or frequency. Nonetheless, the overall message is robust: mitigation of climate change from the RCP8.5 pathway to the RCP4.5 pathway translates into large benefits from avoided extreme heat events starting as early as 2050.