1 Introduction

The global-mean surface air temperature, expressed as the difference T from an unperturbed steady state, is widely used as an indicator of the magnitude of global climate change, both in observations and in simulations of the past and future. Changes in T occur as a result of unforced (internally generated) variability of the climate system on all timescales, and in response to radiative forcing of the climate system.

Anthropogenic radiative forcing, mainly due to well-mixed greenhouse gases and tropospheric aerosols, has increased monotonically and rather smoothly during the “historical” period i.e. since the mid-nineteenth century, the period for which we have instrumental estimates of T (Fig. 1a), and is projected to continue to rise during the present century at a rate which depends on the emissions scenario. For example, the Fifth Assessment Report (AR5) of the Intergovernmental Panel on Climate Change considered a set of scenarios under which the nominal radiative forcing at 2100 (relative to pre-industrial, regarded as the unperturbed steady state) ranges between 2.6 and 8.5 W m−2 (e.g. Fig. 12.4 of Collins et al. 2013).

When integrated with historical changes in radiative forcing agents, coupled atmosphere–ocean general circulation models (AOGCMs) show an ensemble-mean historical warming trend due to anthropogenic forcing that is very similar to the observed, as many studies have demonstrated (recently assessed by Bindoff et al. 2013). For example, in Fig. 2a we compare the ensemble-mean T from the “historical” simulations of 16 AOGCMs of the Coupled Model Intercomparison Project Phase 5 (CMIP5, Table 1, black line) with an observational estimate (HadCRUT4, green line, Morice et al. 2012). We use one integration of each AOGCM, and the historical T in each is the difference from its parallel control experiment with constant pre-industrial atmospheric composition.

Fig. 1

a Timeseries of historical annual-mean radiative forcing F(t) assessed by Myhre et al. (2013) (AR5, Sect. 1) and for volcanic aerosol alone diagnosed from the HadCM3-A sstPiHistVol experiment (Sect. 3). The first years of the six named major volcanic eruptions are indicated by the vertical lines. b Comparison of the AR5 estimate of historical volcanic forcing with the HadCM3-A sstPiHistVol estimate (Sect. 3)

Fig. 2

Global-mean surface air temperature T simulated by the ensemble mean of CMIP5 AOGCMs, compared with the ensemble means of estimates made from the historical forcing timeseries of Myhre et al. (2013) using constant TCRP (\(1/\rho\)) (Sect. 1) and using the step model (Sect. 2), (a) historical experiment (anthropogenic and natural forcings), relative to the time-mean of 1961–1990, also showing the HadCRUT4 observational T (Morice et al. 2012), (b) historicalNat experiment (natural forcings only), relative to control, also showing results with forcing diagnosed from the HadCM3-A sstPiHistVol experiment (Sect. 3)

The historical simulations also include natural forcing, due to variability of solar irradiance and to aerosol injected into the stratosphere by explosive volcanic eruptions (henceforth referred to as “volcanic aerosol”). For a few years following an eruption, volcanic aerosol causes a net negative radiative forcing (“volcanic forcing”) of the climate system, by reflection of sunlight (shortwave radiation), partly offset by absorption of outgoing longwave radiation by the volcanic aerosol (Oman et al. 2005; Forster and Taylor 2006). The timeseries of volcanic forcing for the historical period from the AR5 (Fig. 1a; Myhre et al. 2013) shows a maximum magnitude of −3.6 W m−2 for the Krakatau eruption of 1883. The Pinatubo eruption of 1991 was the next largest, and there have been no such large events since. Other estimates of time-dependent volcanic F(t) are similar (e.g. Figs. 25 of Forster et al. 2013).

Table 1 CMIP5 AOGCMs used in this work

The CMIP5 ensemble-mean T shows a sudden global-mean cooling of 0.1–0.3 K caused by the negative forcing from each historical volcanic eruption (Fig. 2a). The observed T timeseries has larger interannual variability than the ensemble mean, because unforced interannual variability is independent in each model and so partly cancels on averaging; the interannual standard deviation of T would be reduced by a factor of \(\sqrt{16}\) in the mean if it were the same in all the individual models. Against this background, it is not simple to evaluate the cooling due to each volcano in reality. A thorough evaluation of the AOGCMs compared with observations involves removing the influence of and considering the relationship with unforced modes of variability such as El Niño (Thompson et al. 2009; Driscoll et al. 2012; Ding et al. 2014; Maher et al. 2015). In this work, our interest is the factors affecting the magnitude of cooling as simulated by the AOGCMs.

With time-dependent forcing F(t) that increases at a roughly constant rate, experiments with AOGCMs show that \(F(t)=\rho T(t)\) is a fairly good approximation on timescales from about 10 years to several decades (Raper et al. 2002; Gregory and Forster 2008; Gregory et al. 2015), where the climate resistance ρ (W m−2 K−1) is a model-dependent property of the climate system. The physical interpretation of this simple model is that the radiative forcing F is balanced by heat loss αT to space and heat uptake \(\kappa T\) by the ocean, which holds the great majority of the heat capacity of the climate system (Levitus et al. 2001; Church et al. 2013). The climate feedback parameter α and the ocean heat uptake efficiency κ are both positive, and \(\rho =\alpha +\kappa\). In this picture, T is a surface skin temperature, with negligible thermal inertia, determined by the Earth energy balance \(F=N+\alpha T\), where N is the net downward radiative heat flux at the top of the atmosphere, and \(N=\kappa T\) if we neglect heat storage other than in the ocean. We call \(F=\rho T\) the “zero-layer model” (Bouttes et al. 2013), because it does not include any finite heat capacity.

We call \(1/\rho\) the “transient climate response parameter” (TCRP, K W−1 m2, Gregory et al. 2015). It is the increase in global-mean temperature per unit increase in radiative forcing during time-dependent climate change. The standard benchmark for the predicted AOGCM response to anthropogenic forcing is the transient climate response (TCR, Cubasch et al. 2001), evaluated under the idealised 1pctCO2 scenario, in which the atmospheric CO2 concentration increases at 1 % yr−1. The TCR is defined as T after 70 years, the time of \(\text{2 }\times \text{ CO }_{2}\) i.e. double the initial concentration. The TCR and TCRP are related by \(\hbox {TCR}=F_{\mathrm {2\times }}\times \hbox {TCRP} =F_{\mathrm {2\times }}/\rho\), where \(F_{\mathrm {2\times }}\) is the radiative forcing of \(\text{2 }\times \text{ CO }_{2}\). (We use the term “TCRP” rather than “transient climate sensitivity”, which has also been suggested e.g. Held et al. 2010, to avoid confusion with the equilibrium climate sensitivity, and following an analogy with the relationship between the equilibrium climate sensitivity in K and the climate sensitivity parameter in K W−1 m2.)

Gregory and Forster (2008) and Held et al. (2010) pointed out that in observations and simulations of historical climate change the response of T to volcanic forcing is much smaller than would be expected from \(\rho\) calculated for CO2 forcing from idealised scenarios of CO2 increase. We demonstrate this by comparing the AOGCM ensemble-mean historical T with the ensemble mean of estimates derived from the AR5 historical forcing according to the zero-layer model \(F=\rho T\) (Fig. 2a, black and blue lines) using ρ for each model from its own 1pctCO2 experiment (Table 1). The time-profile of anthropogenic warming is reasonably well-reproduced by the zero-layer model, with somewhat overestimated magnitude—this could be because the TCRP is larger at the higher CO2 concentration at which it is evaluated under 1pctCO2 (Gregory et al. 2015). However, the short-lived pronounced volcanic cooling is hugely exaggerated by the zero-layer model.

As a simple measure of this effect, for each of the six major eruptions named in Fig. 2, we compute the “volcanic cooling” \(\varDelta T=\hbox {min}(T(t),T(t+1))-\hbox {mean}(T(t-2),T(t-1)),\) where t is the year of the eruption, for both the zero-layer and the AOGCM T. This quantity measures the maximum cooling caused by the volcanic forcing with respect to the years immediately before. According to a one-parameter regression (requiring an intercept of zero) of the ensemble-mean \(\varDelta T\) from the zero-layer model against the AOGCM ensemble-mean \(\varDelta T\), the zero-layer model overestimates the cooling by a factor of about five.
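The two calculations described above can be sketched in a few lines of Python. This is only an illustration: the function names and the temperature values are made up here, and the choice of the AOGCM \(\varDelta T\) as the independent variable follows the wording "zero-layer against AOGCM".

```python
from statistics import mean


def volcanic_cooling(T, t):
    """Delta T = min(T(t), T(t+1)) - mean(T(t-2), T(t-1)) for eruption year t.

    T maps calendar year to annual-mean temperature anomaly (K).
    """
    return min(T[t], T[t + 1]) - mean([T[t - 2], T[t - 1]])


def slope_through_origin(x, y):
    """Least-squares slope b of y = b * x, with the intercept fixed at zero."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Made-up anomalies (K) around a hypothetical eruption in 1991.
T_example = {1989: 0.10, 1990: 0.14, 1991: 0.05, 1992: -0.08}
dT_example = volcanic_cooling(T_example, 1991)

# Made-up Delta T values (K): x from an AOGCM, y from the zero-layer model.
dT_aogcm = [-0.1, -0.2, -0.3]
dT_zero_layer = [-0.5, -1.0, -1.5]
factor = slope_through_origin(dT_aogcm, dT_zero_layer)
```

With these invented numbers the slope is 5, chosen only to mirror the "factor of about five" quoted above; it is not a reproduction of the paper's data.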

Fig. 3

Global-mean surface air temperature change T with respect to control simulated by the ensemble of HadCM3 histVol experiments (Sect. 2), and estimated by the step model using various combinations of forcing and response (Sects. 2, 3 and 4). For the dotted red line, the AR5 forcing was adjusted by adding a constant so that it had the same time-mean as the HadCM3 forcing. The HadCM3 ensemble mean is shown by the thick black line and the envelope (from maximum to minimum in each year) of the four integrations by the grey shading

Table 2 Experiments carried out with the HadCM3 AOGCM, indicating in which section of the paper each is first discussed and the number of integrations in cases of an ensemble

Held et al. (2010, their Fig. 3) show that the overestimate by the zero-layer model can be explained by its neglect of the heat capacity \(C_{u}\) of the upper ocean, which is important for episodic forcing (like volcanoes), but not for gradual multidecadal forcing change (like 1pctCO2). They demonstrate this by using the one-layer model (so called by Geoffroy et al. 2013, and “upper-layer model” by Gregory et al. 2015)

$$\begin{aligned} F-\alpha T = N&=C_{u}\frac{\mathrm {d}T}{\mathrm {d}t}+\gamma T \\ \Rightarrow \quad C_{u}\frac{\mathrm {d}T}{\mathrm {d}t}&= F - \alpha T - \gamma T = F - \rho T, \end{aligned} \tag{1}$$

where \(\gamma T\) is the rate of heat loss from the upper ocean to the deep ocean beneath, which is treated as an infinite heatsink. In the limit \(C_{u}\rightarrow 0\), the upper-layer model becomes the zero-layer model \(F=\rho T\), if we identify γ with κ so that \(\rho =\alpha +\gamma\).

If a forcing F is imposed instantaneously at \(t=0\) and held constant, \(T=(F/\rho )(1-e^{-\rho t/C_{u}})\) in the upper-layer model. From CMIP5 abrupt4xCO2 experiments, in which CO2 is quadrupled at the start and subsequently held constant, Geoffroy et al. (2013) found that \(C_{u}\) has the heat capacity of a few tens of metres of water, and the response timescale \(C_{u}/\rho\) is \(4.1\pm 1.0\) years (\(\tau _{f}\) in their Table 4); Held et al. (2010) assume 4 years. For times which are much shorter than this, \(C_{u}\,\mathrm {d}T/\mathrm {d}t=Fe^{-\rho t/C_{u}}\simeq F\) and \(\rho T \ll F\), i.e. most of F is absorbed by the upper-ocean heat capacity (of course volcanic forcing is negative, and “absorption” means loss of heat in this case) rather than by heat loss to the deep ocean or through climate feedback. For volcanic forcing which is large in magnitude for only a year or two, the TCRP and the zero-layer model are therefore not applicable.
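To make the partition of F concrete, the sketch below evaluates the upper-layer step response one year after the forcing is switched on. The 4-year timescale is the value assumed by Held et al. (2010); the values of \(\rho\) and F are illustrative assumptions, and the function name is hypothetical.

```python
import math


def upper_layer_step_response(F, rho, C_u, t):
    """T(t) = (F / rho) * (1 - exp(-rho * t / C_u)) for constant F imposed at t = 0."""
    return (F / rho) * (1.0 - math.exp(-rho * t / C_u))


rho = 1.6        # W m-2 K-1 (illustrative value)
tau = 4.0        # yr, response timescale C_u / rho (Held et al. 2010 assumption)
C_u = tau * rho  # W yr m-2 K-1
F = -1.0         # W m-2, illustrative negative (volcanic-like) step forcing

t = 1.0  # yr
T = upper_layer_step_response(F, rho, C_u, t)

# Fraction of F balanced by rho * T (climate feedback plus deep-ocean heat uptake).
fraction_rho_T = rho * T / F
# Fraction of F going into the upper-ocean heat capacity, C_u * dT/dt / F.
fraction_storage = math.exp(-rho * t / C_u)
```

Under these assumptions about 78 % of the forcing is still being taken up by the upper-ocean heat capacity after one year, and only about 22 % is balanced by \(\rho T\).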

Considering the Earth energy balance \(F=N+\alpha T\), we see that the prediction of T in response to volcanic forcing is affected by uncertainties in the forcing F and the feedback \(\alpha\), as well as in the ocean heat uptake N. Two issues in particular have been identified in previous studies of the AOGCM simulation of volcanic response (e.g. Wigley et al. 2005; Boer et al. 2007; Bender et al. 2010), namely that volcanic F is not exactly known in AOGCMs because it is not usually diagnosed, and that \(\alpha\) for volcanic forcing might not be the same as for CO2. As an alternative to analysis of the AOGCM experiments, in this work we estimate the AOGCM response to episodic volcanic forcing using the “step” model (described in the next section), with which we compare the influences on the AOGCM response of N, F and \(\alpha\) in Sects. 2, 3 and 4 respectively. In Sect. 5 we draw conclusions about the TCRP as applied to volcanic forcing, including comparison with the results of Merlis et al. (2014).

2 Ocean heat uptake

The “step” model (Good et al. 2011) has the advantage that it avoids fitting any parametric form to the AOGCM results, unlike the upper-layer model and the upwelling–diffusion model of Wigley et al. (2005). Instead, it uses the response of the AOGCM itself to constant forcing instantaneously imposed (i.e. a “step-change”). The step model relies on the assumption of linear systems theory (Good et al. 2015) that the response of the system depends linearly on forcing, so that the response to a sum of forcings equals the sum of responses to individual forcings. Therefore the response X(t) of any climate variable to a forcing scenario F(t) can be estimated as the sum of responses to a series of t annual forcing increments \(F(t)-F(t-1)\), with \(F(0)=0\), according to

$$\begin{aligned} X(t) = \sum _{t^{\prime }=1}^{t}\,X_{s}(t-t^{\prime }+1)\frac{F(t^{\prime })-F(t^{\prime }-1)}{F_{s}} \end{aligned} \tag{2}$$

where the sum is over years, and \(X_{s}(t)\) is the response after t years to a constant forcing \(F_{s}\) imposed at \(t=0\).
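The sum above translates directly into code. The following is a minimal sketch, not the authors' implementation: the function name and the numerical values are invented for illustration, and lists are indexed so that element 0 holds the value for year 1.

```python
def step_model(F, X_s, F_s):
    """Estimate X(t) for t = 1..len(F) by summing scaled step responses.

    F   : annual forcings F(1), F(2), ... (W m-2); F(0) = 0 is implied.
    X_s : step response X_s(1), X_s(2), ... to the constant forcing F_s.
    F_s : magnitude of the step forcing (W m-2).
    """
    n = len(F)
    if len(X_s) < n:
        raise ValueError("step response must be at least as long as the forcing")
    F_full = [0.0] + list(F)  # F_full[k] is F(k), with F(0) = 0
    X = []
    for t in range(1, n + 1):
        total = 0.0
        for tp in range(1, t + 1):
            increment = F_full[tp] - F_full[tp - 1]
            # X_s[t - tp] is X_s(t - tp + 1) because the list starts at year 1.
            total += X_s[t - tp] * increment / F_s
        X.append(total)
    return X


# Made-up step response (K) to a made-up step forcing of 4 W m-2.
X_s_example = [1.0, 1.5, 1.8, 2.0]
F_s_example = 4.0

# A one-year pulse of -2 W m-2 followed by a return to zero forcing.
pulse = step_model([-2.0, 0.0, 0.0, 0.0], X_s_example, F_s_example)

# A constant forcing equal to F_s should simply return the step response.
constant = step_model([4.0, 4.0, 4.0, 4.0], X_s_example, F_s_example)
```

For the pulse, each year's value reduces to \((F/F_{s})(X_{s}(t)-X_{s}(t-1))\), because only the first two increments are non-zero.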

For the sake of argument, let us idealise the forcing due to a single volcanic eruption as a step of F (<0) in year \(t=1\) and a step of \(-F\) in year \(t=2\) back to the initial level, i.e. a pulse lasting for a year. According to Eq. 2, the response to the pulse with respect to the mean state is

$$\begin{aligned} X(t) = \frac{F}{F_{s}} (X_{s}(t)-X_{s}(t-1)) \end{aligned} \tag{3}$$

(only these two terms in the sum are non-zero). Although we do not use this equation in a form with continuous time, it is interesting to note that for small \(\delta t\)

$$\begin{aligned} \frac{X(t)}{\delta t} = \frac{F}{F_{s}} \frac{\mathrm {d}X_{s}}{\mathrm {d}t} \end{aligned} \tag{4}$$

and in the limit \(\delta t\rightarrow 0\) the response to a delta-function pulse is the time-derivative of the step response.
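The intermediate step can be written out explicitly. Assuming, for illustration only, that the pulse of amplitude F lasts for an interval \(\delta t\) instead of one year, the same two-term argument gives

$$\begin{aligned} X(t)&= \frac{F}{F_{s}}\left( X_{s}(t)-X_{s}(t-\delta t)\right) \\ \Rightarrow \quad \frac{X(t)}{\delta t}&= \frac{F}{F_{s}}\,\frac{X_{s}(t)-X_{s}(t-\delta t)}{\delta t} \rightarrow \frac{F}{F_{s}}\frac{\mathrm {d}X_{s}}{\mathrm {d}t} \quad \text{as } \delta t\rightarrow 0. \end{aligned}$$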

Since \(X_{s}(0)=0\) (before the forcing is switched on), \(X(1)=(F/F_{s}) X_{s}(1)\). Hence N in the year of the eruption is simply the first-year \(N_{s}(1)\) in response to the step forcing, scaled by the ratio \(F/F_{s}\) of the forcings, and \(N(1)/F=N_{s}(1)/F_{s}\). In the ensemble mean of CMIP5 abrupt4xCO2 experiments for the models of Table 1, \(N_{s}(1)/F_{s}=0.83\) (estimated from the linear fit shown in Fig. 5 of Gregory et al. 2015). Thus the step model agrees with the upper-layer model in predicting \(N\simeq F\) during a short volcanic eruption. That is, a much larger proportion of F is absorbed by the ocean than in the case of the gradually increasing 1pctCO2 forcing, for which \(N/F=\kappa /\rho \simeq 0.36\) in the model mean at the time of \(\text{2 }\times \text{ CO }_{2}\) (Table 1).

Running the step model with each AOGCM’s abrupt4xCO2 \(T_{s}(t)\) and the AR5 forcing timeseries (Fig. 1a), we estimate the ensemble-mean T(t) for the CMIP5 historical and “historicalNat” experiments (solid red lines in Fig. 2); the latter has the same natural forcing as the former, but no anthropogenic forcing. As Held et al. (2010) found with the upper-layer model, the volcanic cooling in the step model is much smaller than given by constant TCRP. However, it is still somewhat larger than in the AOGCMs. Computing \(\varDelta T\) as defined in Sect. 1 for the historical ensemble, we find that the step model overestimates the volcanic cooling by about 60 % on average. Moreover, the step-model simulations for historicalNat show a long-term negative trend not present in the AOGCM ensemble mean.

Since the CMIP5 historical experiments include solar forcing as well as volcanic, we have carried out a “histVol” experiment, comprising an ensemble of four integrations with time-dependent historical volcanic aerosol 1860–2010 and no other forcing agents using the HadCM3 AOGCM (Gordon et al. 2000). This model was included in CMIP3 (the previous generation of the Coupled Model Intercomparison Project), performs well in comparison to many more recently developed AOGCMs (Reichler and Kim 2008) and is computationally relatively inexpensive by today’s standards. (The experiments referred to in this paper are listed in Table 2.) Given \(\rho =1.6\) W m−2 K−1 for HadCM3 under CO2 forcing (Gregory and Forster 2008), the zero-layer model predicts a maximum cooling of about 2 K during the largest events, whereas HadCM3 cools by 0.3 K at most relative to its control (Fig. 3). Using the HadCM3 abrupt4xCO2 \(T_{s}(t)\) and the AR5 volcanic F(t), the step model gives a maximum cooling of about 0.6 K, and a long-term negative trend (Fig. 3), qualitatively similar to the CMIP5 historicalNat ensemble.
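As a rough check of the zero-layer figure, assuming that the relevant peak forcing is the AR5 maximum of about −3.6 W m−2 quoted in Sect. 1 for Krakatau (an assumption made here for illustration), the arithmetic is

$$T=\frac{F}{\rho }\approx \frac{-3.6\ \mathrm {W\,m^{-2}}}{1.6\ \mathrm {W\,m^{-2}\,K^{-1}}}\approx -2.3\ \mathrm {K},$$

which is consistent with the cooling of about 2 K stated above.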

In summary, the results of the step model for CMIP5 and HadCM3 confirm that the zero-layer model’s neglect of upper-ocean heat capacity is the main reason for the excessive cooling it predicts in response to volcanic forcing, but indicate that it is not the whole explanation.

3 Volcanic forcing

Fig. 4

Relationship between volcanic aerosol optical depth and volcanic forcing assessed by Myhre et al. (2013) (AR5) and diagnosed from HadCM3-A experiments with historical time-dependent volcanic aerosol (sstPiHistVol, Sect. 3), constant volcanic aerosol approximately as at the peak of the Pinatubo eruption (sstPiPin, Sect. 4), and HadGEM2-A AMIP experiments (Sect. 3). Both the AOD and the forcing are shown as differences from the state of the model in the control experiment

Fig. 5

Relationship between ensemble-mean annual-mean N (difference from the control) from either the CMIP5 historicalNat or the HadCM3 histVol experiment and annual-mean F from the HadCM3-A sstPiHistVol experiment. The lines are regressions of N against F for \(F<-0.22\) W m−2, which is the threshold indicated by the dotted line

CMIP5 does not include experiments which can be used to diagnose the volcanic forcing simulated by AOGCMs, unlike for CO2, and the lack of knowledge of F is an obstacle to analysis of the T response (Bender et al. 2010). Therefore we run a pair of experiments (sstPi and sstPiHistVol, Table 2) to diagnose the historical volcanic forcing with HadCM3-A (the atmosphere general circulation model, or AGCM, component of HadCM3), prescribing constant climatological sea surface boundary conditions (temperature and sea-ice) from the coupled HadCM3 control experiment. Experiment sstPi has volcanic aerosol prescribed at the constant level of the HadCM3 control, which is typical of the long-term mean; \(F=0\) by definition if this is regarded as the unperturbed state. Experiment sstPiHistVol has time-varying historical volcanic aerosol.

Since the sea surface conditions are the same in the two experiments, global-mean T is almost the same (0.002 K cooler in sstPiHistVol; the effect of land temperature change is very small). Hence \(\alpha T\) cancels out in the difference in the energy balance \(F=N+\alpha T\) between the two experiments, and F(t) is diagnosed as the difference in N(t) (Hansen et al. 2005; Held et al. 2010; Andrews 2014). There is statistical uncertainty in this estimate of F due to unforced variability in N, whose interannual standard deviation is 0.17 W m−2 in the HadCM3 atmosphere model with constant boundary conditions. Consequently we regard forcing with magnitude \(|F|<1.65\times 0.17/\sqrt{2}=0.22\) W m−2 as insignificant (at the 10 % level) in the ensemble mean of the two integrations of sstPiHistVol.

The HadCM3 volcanic forcing (Fig. 1a) is positive in years when there is no volcanic aerosol (the majority of years, between major eruptions), because there is a non-zero concentration in the control. The alternative assumption, made in the AR5, that zero volcanic aerosol implies zero forcing, is incorrect for the real world, because the unperturbed natural state of the climate system includes the effects of occasional eruptions. The permanent cessation of volcanic eruptions would produce a climatic warming, so must imply a positive forcing (Gregory 2010; Gregory et al. 2013). In the long term, the small positive forcing in years with no volcanic aerosol is balanced by the small number of years with large negative forcing. On the other hand, if an AOGCM control integration does not include volcanic forcing, \(F=0\) for zero volcanic aerosol, and the negative time-mean volcanic forcing in the historical experiment produces a negative trend in ocean heat content.

Apart from this offset, the HadCM3 volcanic forcing timeseries is similar to the AR5 (Fig. 1a) and they are strongly correlated (0.95), as expected because they were derived by different methods from the same volcanic aerosol timeseries (Sato et al. 1993, updated). However, regression of annual means shows that the HadCM3 forcing is only 77 % of the AR5 (Fig. 1b). Because global-mean T is not exactly zero in sstPiHistVol (see above), the magnitude of F will be underestimated, but this effect is only 3 % (not shown). Regression of F against global-mean volcanic aerosol optical depth (AOD) gives a slope of \(-24.6\pm 0.2\) W m−2 for AR5 (blue line in Fig. 4), consistent with the AR5 formula of −25 W m−2 per unit AOD (Table 8.SM.8 in the supplementary material of Myhre et al. 2013), following the results of Hansen et al. (2005, their equation 2a) from the GISS AOGCM for Pinatubo. For HadCM3, the slope is only \(-19.0\pm 0.5\) W m−2 (solid black line).

We have examined this relationship also in a pair of experiments carried out by Andrews (2014) with HadGEM2-A, the AGCM of HadGEM2-ES (Collins et al. 2011), using the observationally derived time-dependent sea surface boundary conditions for 1979–2008 of the Atmospheric Model Intercomparison Project Phase II (AMIP; Taylor et al. 2000; Hurrell et al. 2008). One experiment includes all forcing agents, both anthropogenic and natural, and the other anthropogenic only. By the same argument as for sstPiHistVol, the difference in N between these experiments is the natural radiative forcing. To obtain the volcanic contribution, we subtract the AR5 estimate of solar forcing. The AMIP period includes two major volcanic eruptions (El Chichon and Pinatubo), and the regression slope of volcanic F against AOD is \(-17.0\pm 1.0\) W m−2 (red line in Fig. 4), weaker than in HadCM3 and further from the AR5 formula.

We propose that the difference from the AR5 formula is caused by rapid tropospheric adjustment (Myhre et al. 2013; Sherwood et al. 2015) in our AGCMs in response to volcanic aerosol. By including such adjustment, the AGCM diagnosis gives an effective radiative forcing. The AR5 formula is based on results from a version of the GISS AOGCM (Hansen et al. 2005) in which tropospheric adjustment to volcanic aerosol was apparently smaller. We apply the method of approximate partial radiative perturbation (APRP, Taylor et al. 2007) to estimate the shortwave effect of rapid cloud adjustment in HadCM3-A. It is a positive quantity, and arises from a reduction of cloud fraction and the planetary albedo when volcanic aerosol is imposed. With this positive adjustment subtracted, the HadCM3 forcing has a regression slope of \(-26.6\pm 0.5\) W m−2 against the AOD, much closer to the AR5 value (Fig. 4). Thus shortwave cloud adjustment would be sufficient to explain the difference, but there may also be longwave cloud adjustment, which APRP cannot be used to evaluate.

Rapid adjustment in shortwave cloud radiative effect (CRE) has previously been noted in response to CO2 forcing in AGCMs (Gregory and Webb 2008) and AOGCMs (Andrews et al. 2012; Zelinka et al. 2013), due to reduction in cloudiness. In HadCM3, there is a positive shortwave cloud adjustment of 1.7 W m−2 included in the net \(\text{4 }\times \text{ CO }_{2}\) forcing of 7.7 W m−2. It is interesting and worthy of further investigation that both negative shortwave volcanic and positive longwave CO2 forcing produce a positive shortwave cloud forcing adjustment.

In view of the differences between the AR5 and sstPiHistVol F(t), we rerun the step model for HadCM3 histVol. First, we use the AR5 F(t) adjusted to have the same small time-mean (\(-0.04\) W m−2) as the HadCM3 sstPiHistVol F(t); this adjustment removes the long-term cooling trend (Fig. 3, dotted red line), which is due to the more negative time-mean (\(-0.24\) W m−2) of the AR5 F(t). In fact before the Krakatau eruption there is now a warming trend, because zero volcanic aerosol implies positive forcing. Second, using the sstPiHistVol F(t) instead of the AR5, the step-model estimate of the HadCM3 volcanic cooling becomes smaller in magnitude (dashed red line), and thus more similar to the AOGCM.

Rerunning the step model for the CMIP5 historicalNat ensemble using the HadCM3-A sstPiHistVol F(t) removes the long-term cooling trend and reduces the magnitude of the volcanic cooling (Fig. 2b), just as in HadCM3. This suggests that rapid adjustments may occur in CMIP5 AOGCMs that reduce the magnitude of volcanic forcing, and that this set of CMIP5 AOGCMs is near to steady state with time-mean historical volcanic forcing (like HadCM3), rather than with zero volcanic forcing (like CMIP3 models; Gregory 2010).

We found above that \(N_{s}(1)/F_{s}=0.83\) in the CMIP5 abrupt4xCO2 ensemble in the first year of forcing. In the historicalNat ensemble mean, for years with significant volcanic forcing (\(F<-0.22\) W m−2 in the AR5 F timeseries, the threshold obtained from HadCM3, Sect. 3), the slope of the regression of N(t) against HadCM3 volcanic F(t) is \(0.89\pm 0.08\) (red line in Fig. 5), consistent with the expected value. This is further, although circumstantial, evidence that the volcanic forcing in the CMIP5 ensemble mean, as in HadCM3, is less than the AR5 formula. A similar conclusion has been reached by Larson and Portmann (2016), who use an inverse application of the step model to estimate AOGCM forcings. Their method assumes that the same climate feedback applies to volcanic forcing and CO2, which is the issue we examine in the next section.

4 Climate feedback

Fig. 6

Relationship between means for years 1–10 (years 9 and 10 are too close to be distinguished in the plot) of the change in global-mean downward radiative fluxes at the top of the atmosphere and the change in global-mean surface air temperature T in the ensemble mean of HadCM3 abruptPin integrations. Time runs from right to left, because the climate is cooling down. The lines show linear regressions of the radiative fluxes against T for years 1–5, whose slopes give the components of climate feedback (W m−2 K−1) shown in the key. A positive parameter indicates a negative feedback on climate change, because \(\alpha T\) opposes F if \(\alpha >0\). The net forcing F and N for the first year are marked in blue in (b). “SW” is shortwave, “LW” longwave, “CRE” cloud radiative effect, “APRP” approximate partial radiative perturbation. In (a) the clear-sky effect is the change in the radiative flux diagnosed with clouds ignored (referred to as Method II by Cess et al. 1993) and the CRE is the difference between the all-sky (i.e. with the cloud fraction simulated by the GCM) and the clear-sky fluxes. In (b) the APRP method is used to decompose the change in SW radiative flux under all-sky conditions into contributions from changes in cloud-free air, cloud and surface albedo

The magnitude of cooling due to volcanic forcing is affected by the climate feedback parameter according to \(T=(F-N)/\alpha\). Therefore a third possible contribution to the smaller cooling than expected from the TCRP is that \(\alpha\) for volcanic forcing is larger than for CO2 forcing. The assumption that \(\alpha\) is the same for both kinds of forcing is a motivation for evaluating climate feedbacks from observations and simulations of the response to the Pinatubo and other large eruptions (Soden et al. 2002; Forster and Collins 2004; Wigley et al. 2005; Merlis et al. 2014). There is evidence in support of the assumption, but there are uncertainties arising from forcing and unforced variability (Bender et al. 2010).

To evaluate \(\alpha\) for volcanic forcing in HadCM3, we carried out a ten-year “step volcano” experiment, called “abruptPin” (analogous to abrupt4xCO2, Table 2), with constant volcanic aerosol as it was in January 1992, approximately the peak of the Pinatubo eruption. An advantage of this method is that we do not need to know the forcing in advance; since \(N=F-\alpha T\), regression of annual-mean N versus T gives both the forcing as the N-intercept and \(-\alpha\) as the slope (Gregory et al. 2004). The relationship between N and T is not quite linear (Fig. 6), which indicates that \(\alpha\) is not constant, as has been found under constant \(\text{4 }\times \text{ CO }_{2}\) in most AOGCMs (Winton et al. 2010; Andrews et al. 2012; Gregory et al. 2015). Unlike CO2, volcanic forcing is short-lived, and the feedbacks in response to a “step volcano” sustained for many years are not relevant to historical eruptions. We therefore use only the first five years in the regression, although the results are not greatly different for ten years, since non-linearity is not pronounced on this timescale. The experiment comprised an ensemble of 14, to obtain adequate signal/noise.
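The regression method can be illustrated with a short sketch. It follows from the energy balance \(F=N+\alpha T\), rearranged as \(N=F-\alpha T\). The data below are synthetic and noise-free, generated from assumed values of F and \(\alpha\) purely to show that the intercept and slope recover them; they are not output from abruptPin, and the function name is hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Assumed "true" values used only to generate the synthetic data.
F_true = -2.7      # W m-2
alpha_true = 1.8   # W m-2 K-1

# Synthetic annual-mean T (K) for five years of cooling, and N = F - alpha * T.
T_years = [-0.30, -0.50, -0.65, -0.75, -0.80]
N_years = [F_true - alpha_true * Ti for Ti in T_years]

F_est, slope = linear_fit(T_years, N_years)
alpha_est = -slope  # the slope of N against T is -alpha
```

With real model output the points scatter about the line because of unforced variability, which is why an ensemble is needed to obtain a usable estimate.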

Before considering the feedback, we check that the forcing is consistent with the results of Sect. 3. From the regression, \(F=-2.7\pm 0.1\) W m−2 (Fig. 6b, cf. \(+7.7\) W m−2 for abrupt4xCO2 in HadCM3). The global-mean AOD is 0.148, so the forcing is \(-18.4\pm 0.1\) W m−2 per unit AOD, nearly the same as in sstPiHistVol. We presume it is slightly different because the geographical distribution of volcanic aerosol is variable in the historical timeseries and has a small effect on the forcing. APRP shows that there is a positive shortwave cloud adjustment of \(0.9\pm 0.1\) W m−2 to the forcing (from the intercept of this term in Fig. 6b), whereas the forcing adjustment from surface shortwave absorption is negligible. Excluding the adjustment increases the forcing to \(-25\) W m−2 per unit AOD, in agreement with the AR5 formula (as in Sect. 3).

We also evaluated the forcing of abruptPin following the method of sstPiHistVol (Sect. 3), as the perturbation to N in experiment sstPiPin (Table 2), which has the constant volcanic aerosol of abruptPin in HadCM3-A with control sea surface conditions. By this method we obtain the same value of forcing (to one decimal place) as from abruptPin, and the value lies on the relationship between forcing and AOD found in sstPiHistVol (Fig. 4). Although in different ways, both methods estimate F as N under the influence of volcanic aerosol and in the absence of climate change; they are therefore expected nearly to agree, and have previously been found to do so (e.g. Hansen et al. 2005; Andrews et al. 2012).

The slope of the regression for abruptPin (Fig. 6) gives \(\alpha =1.81\pm 0.22\) W m−2 K−1, significantly larger (by about 40 %) than its value of \(1.25\pm 0.04\) W m−2 K−1 from the first 20 years of abrupt4xCO2 in HadCM3 (Andrews et al. 2015). (Non-linearity is statistically insignificant during these years; regression using only the first five gives \(\alpha =1.30\pm 0.15\) W m−2 K−1.) We note that \(\alpha\) in abruptPin is consistent with \(\alpha =1.8\) W m−2 K−1 estimated for infinitesimal changes in CO2 relative to the control; in HadCM3, \(\alpha\) is larger for smaller step-increases in CO2 concentration (Good et al. 2011, 2012; Gregory et al. 2015).

The ratio \(N_{s}(1)/F_{s}=(-2.16\pm 0.12)/(-2.71\pm 0.13)=0.80\pm 0.06\) in abruptPin (shown in Fig. 6b, standard errors) agrees with the regression slope of N versus F in histVol (\(0.82\pm 0.10\), blue line in Fig. 5), in accordance with the step model (Sect. 2). For abrupt4xCO2 in HadCM3, the ratio is \(N_{s}(1)/F_{s}=(6.38\pm 0.16)/(7.73\pm 0.14)=0.83\pm 0.03\), i.e. the fraction of forcing taken up is slightly greater for positive forcing, but not significantly different. This is contrary to the expectation that negative forcing would cause the ocean to lose heat more readily, by destabilising the vertical temperature profile (e.g. Stouffer and Manabe 1999; Bouttes et al. 2013; Merlis et al. 2014). Clearly this is not an important effect in HadCM3 for this magnitude of forcing.
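The quoted ratios and their standard errors can be reproduced with first-order error propagation. This is a sketch under the assumption that the errors in \(N_{s}(1)\) and \(F_{s}\) are independent and combine in quadrature; the text does not state how the uncertainties were actually combined, and the function name is hypothetical.

```python
import math


def ratio_with_error(a, sigma_a, b, sigma_b):
    """Return a / b and its standard error.

    Assumes independent errors, combined in quadrature as relative errors
    (first-order propagation).
    """
    r = a / b
    sigma_r = abs(r) * math.sqrt((sigma_a / a) ** 2 + (sigma_b / b) ** 2)
    return r, sigma_r


# Values in W m-2 taken from the text.
pin_ratio, pin_error = ratio_with_error(-2.16, 0.12, -2.71, 0.13)  # abruptPin
co2_ratio, co2_error = ratio_with_error(6.38, 0.16, 7.73, 0.14)    # abrupt4xCO2
```

Rounded to two decimal places, this gives 0.80 ± 0.06 for abruptPin and 0.83 ± 0.03 for abrupt4xCO2, matching the values in the text.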

We expect that the HadCM3 T(t) in histVol should be consistent with the feedbacks shown in abruptPin. To test this, we use abruptPin (instead of abrupt4xCO2) for \(T_{s}(t)\) with F(t) from HadCM3-A sstPiHistVol in the step model. Since abruptPin was only 10 years long, we set \(T_{s}(t)=T_{s}(10)\) in Eq. 2 for all \(t>10\), which means that the cooling caused by a volcanic eruption abruptly vanishes 10 years later. By this time it is actually quite small anyway, because in response to any constant forcing the differences \(|T_{s}(t)-T_{s}(t-1)|\) diminish with time (Fig. 6), so the response to a pulse forcing (Eq. 3) likewise decreases. This is consistent with our expectation that the influence of a brief forcing will gradually be forgotten as time passes and the system returns asymptotically to its unperturbed state.
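
The effect of holding \(T_{s}\) constant beyond the end of the step experiment can be sketched as follows. The sketch assumes that the step model sums the step response scaled by each annual increment in forcing, in the manner of Good et al. (2011); the exact indexing of Eq. 2 may differ, and the three-year step response used here is hypothetical, chosen only to keep the example short.

```python
def step_model(forcing, step_response, step_forcing):
    """Estimate T(t) as a sum of scaled step responses to annual forcing increments.

    forcing[i]       : annual-mean forcing in year i+1 (zero before the series starts)
    step_response[j] : T_s in year j+1 after a step of size step_forcing
    Beyond the end of the step record, T_s is held at its final value.
    """
    last = len(step_response) - 1
    temps = []
    for t in range(len(forcing)):
        total = 0.0
        for s in range(t + 1):
            previous = forcing[s - 1] if s > 0 else 0.0
            increment = forcing[s] - previous
            lag = min(t - s, last)          # clamp to the last available year of T_s
            total += increment / step_forcing * step_response[lag]
        temps.append(total)
    return temps


# Hypothetical three-year step response (K) to a step of -2.71 W m-2.
step_forcing = -2.71
step_response = [-0.30, -0.38, -0.42]

# One-year pulse of forcing followed by zero forcing.
pulse = [-2.71, 0.0, 0.0, 0.0, 0.0]
response = step_model(pulse, step_response, step_forcing)
print(response)
```

In this example the response to the pulse is \(T_{s}(t)-T_{s}(t-1)\) in each year, and it becomes exactly zero once both terms fall in the range where \(T_{s}\) is held constant, which is the abrupt vanishing described above.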

With this combination of inputs, the step model gives the closest of its estimates to the HadCM3 histVol T(t) (blue line in Fig. 3). This confirms that the smaller climate sensitivity to volcanic forcing during the historical period than for elevated CO2 is also a reason for the overestimation of volcanic cooling by the TCRP in HadCM3, although less important than ocean heat uptake and volcanic forcing. We are not able to test this possibility for other AOGCMs, since similar experiments to abruptPin have not been carried out.

5 Conclusions

The zero-layer model of time-dependent climate change \(T=F/\rho\) gives a fairly accurate reproduction of changes in global-mean surface air temperature T(t) as observed during the CMIP5 “historical” period (since the latter part of the 19th century) and as simulated by CMIP5 AOGCMs in response to the smoothly varying anthropogenic part of the forcing F(t). The transient climate response parameter used in the zero-layer model (TCRP, the increase in T per unit increase in F, in K W−1 m2) is evaluated from idealised climate-change experiments with CO2 increasing at 1 % yr−1. The TCRP is the reciprocal of the climate resistance \(\rho\) (W m−2 K−1), which is the sum of the climate feedback parameter \(\alpha\) and the ocean heat uptake efficiency \(\kappa\).
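
For concreteness, the relations among \(\rho\), \(\alpha\), \(\kappa\) and the TCRP can be written as a few lines of code. The sketch assumes that ocean heat uptake in the zero-layer model is \(N=\kappa T\), so that the fraction of forcing taken up is \(\kappa /\rho\). The value of \(\kappa\) and the forcing are hypothetical, and applying the abrupt4xCO2 value of \(\alpha\) here is purely illustrative.

```python
def zero_layer(forcing, alpha, kappa):
    """Zero-layer model: T = F / rho with rho = alpha + kappa, and N = kappa * T."""
    rho = alpha + kappa            # climate resistance, W m-2 K-1
    temperature = forcing / rho    # K
    heat_uptake = kappa * temperature  # W m-2
    return temperature, heat_uptake


alpha = 1.25    # W m-2 K-1 (HadCM3 abrupt4xCO2 value quoted in Sect. 4; illustrative here)
kappa = 0.60    # W m-2 K-1 (hypothetical)
forcing = 3.7   # W m-2 (hypothetical, slowly increasing forcing at some time)

tcrp = 1.0 / (alpha + kappa)                      # K W-1 m2
temperature, heat_uptake = zero_layer(forcing, alpha, kappa)
uptake_fraction = heat_uptake / forcing           # equals kappa / rho
print(f"TCRP = {tcrp:.2f} K W-1 m2, T = {temperature:.2f} K, N/F = {uptake_fraction:.2f}")
```

With these illustrative numbers the uptake fraction is below 40 %, in line with the behaviour described for gradually increasing CO2 forcing.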

The zero-layer model overestimates, by a factor of about five, the sudden cooling in the AOGCMs caused by the short-lived large negative forcing from volcanic aerosol following explosive volcanic eruptions. This could be due to errors in any of the factors involved, namely F, \(\alpha\) and \(\kappa\). With reference to the two-layer model of the Earth energy balance (Gregory 2000; Held et al. 2010; Geoffroy et al. 2013; Gregory et al. 2015), Held et al. (2010) attributed the overestimate to the zero-layer model’s neglect of the relatively small upper-layer heat capacity, which is unimportant for warming on multidecadal timescales forced by increasing CO2, but dominates the response of T to an impulsive forcing. Consequently \(\kappa\) does not correctly account for the ocean heat loss.

We confirm that most of the error is due to this effect by using the step model (Good et al. 2011) to estimate the AOGCM response to historical volcanic forcing from the AOGCM response to a step-change in CO2 forcing. In the step model and the AOGCMs, ocean heat content change is more than 80 % of the volcanic forcing during the first year, because the upper ocean readily gives up heat, whereas heat uptake is less than 40 % of CO2 forcing gradually increasing over decades, because it is limited by the less efficient thermal coupling to the deeper ocean. The difference is not due to the sign of the forcing (negative volcanic, positive CO2), but its timescale. However, although much closer than the zero-layer model, the step model overestimates the cooling simulated by AOGCMs by about 60 %, and we explain this remaining discrepancy in terms of forcing and feedback.

The AR5 formula (\(-25\) W m−2 per unit AOD) overestimates the magnitude of the volcanic forcing in HadCM3 (\(-19\) W m−2 per unit AOD) by about 30 %, and in HadGEM2 (\(-17\) W m−2 per unit AOD) by about 50 %. We have shown that this can be explained in HadCM3 by a rapid positive shortwave cloud adjustment which reduces the magnitude of the negative volcanic forcing. This may be a model-specific result, but the step-model simulations suggest that the CMIP5 ensemble-mean volcanic forcing is more similar to that of HadCM3 than to the AR5 estimate, which was based on the GISS AOGCM (Hansen et al. 2005).

We find that the climate feedback parameter \(\alpha\) is about 40 % greater (climate sensitivity parameter smaller) for volcanic forcing than for \(\text{4 }\times \text{ CO }_{2}\) in HadCM3. This could be related to the different natures of the forcing, or it might be that \(\alpha\) depends on CO2 concentration (Jonko et al. 2012; Meraner et al. 2013; Caballero and Huber 2013; Gregory et al. 2015), for which the value applicable in the historical period is nearer to the pre-industrial CO2 concentration than to \(\text{4 }\times\text{ CO }_{2}\). Because volcanic perturbations to climate in the historical period are short-lived and comparable in magnitude to unforced interannual variability, a large ensemble is needed to evaluate \(\alpha\) with this precision, and consequently we cannot determine whether the CMIP5 models also show this effect.

It would be useful for investigation of volcanic forcing and feedback in CMIP6 if ensemble experiments were conducted with historical volcanic aerosol as the only forcing agent in each AOGCM, to diagnose the climate response, and with the corresponding AGCMs with prescribed sea surface conditions, to diagnose the radiative forcing. This would reveal whether other models also exhibit a cloud adjustment and a lower climate sensitivity for volcanic forcing, and offer the opportunity for analysis of the processes involved. Overestimated historical volcanic forcing, and overestimated climate sensitivity to such forcing, are possible explanations for the need to scale down the forcing in simple climate models in order to reproduce AOGCM results for volcanic cooling (Meinshausen et al. 2011; Lewis and Curry 2015).

An alternative to the use of a large ensemble to investigate the climate response to volcanic eruptions of the size that is typical of the historical period might be to improve the signal/noise by multiplying historical volcanic aerosol in an AOGCM by a large factor. However, this procedure may give inappropriate values for the feedback and forcing. For example, Jones et al. (2005) simulated a “supervolcano” of roughly the size of the Toba eruption 72 ka ago, which was two orders of magnitude greater than Pinatubo, using HadCM3 with Pinatubo volcanic aerosol multiplied by 100. The peak forcing was about \(-60\) W m−2, only 20 times greater than Pinatubo, and the climate feedback parameter \(\alpha \simeq 4\) W m−2 K−1, more than twice the value (half the sensitivity) that we found for Pinatubo in HadCM3. The mechanisms for this are not a subject of the present work, but we note that \(\alpha\) for volcanic aerosol of ten times the magnitude of Pinatubo exceeds 2 W m−2 K−1 in HadGEM2 and MPI-ESM1.1 (a revised version of MPI-ESM of Giorgetta et al. 2013) as well.

If \(\alpha\) for volcanic forcing is larger than for CO2 (its efficacy is less than unity, in the terms of Hansen et al. 2005), the method of Forster and Taylor (2006), used to evaluate the forcings in historical CMIP5 experiments by Forster et al. (2013), will underestimate the magnitude of the volcanic forcing, because it assumes that \(\alpha\) is the same for all forcing agents. If \(\alpha\) is not the same, the response to historical volcanic eruptions cannot be used to estimate the effective or equilibrium climate sensitivity that applies to future CO2-forced climate change.
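
The sign of the underestimate can be seen in a short derivation, assuming that the method diagnoses the forcing as \(F=N+\alpha T\) with \(\alpha\) taken from CO2-forced experiments. The subscripts "est", "vol" and "CO2" are labels introduced here for illustration:

$$\begin{aligned} F_{\mathrm {est}}=N+\alpha _{\mathrm {CO2}}\,T,\qquad F_{\mathrm {vol}}=N+\alpha _{\mathrm {vol}}\,T \quad \Rightarrow \quad F_{\mathrm {est}}-F_{\mathrm {vol}}=(\alpha _{\mathrm {CO2}}-\alpha _{\mathrm {vol}})\,T. \end{aligned}$$

For volcanic cooling \(T<0\), and if \(\alpha _{\mathrm {vol}}>\alpha _{\mathrm {CO2}}\) the right-hand side is positive, so the diagnosed forcing \(F_{\mathrm {est}}\) is less negative than \(F_{\mathrm {vol}}\), i.e. smaller in magnitude.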

Merlis et al. (2014) considered the possibility of using the climate response to volcanic (Pinatubo) forcing to place constraints on the TCRP for CO2 increase in the GFDL-CM2.1 AOGCM. As we have seen, the volcanic cooling itself is not correctly predicted by the TCRP, but they follow an alternative and novel approach in their analysis by using the upper-layer model (\(C_{u}\,\mathrm {d}T/\mathrm {d}t=F-\rho T\), Eq. 1) in a time-integral form

$$\begin{aligned} \int _{0}^{\tau }\,F\,\mathrm {d}t-\rho \int _{0}^{\tau }\,T\,\mathrm {d}t = \int _{0}^{\tau }\,C_{u}\frac{\mathrm {d}T}{\mathrm {d}t}\,\mathrm {d}t = C_{u}\left[ T(\tau )-T(0)\right] = 0 \Rightarrow \int _{0}^{\tau }\,F\,\mathrm {d}t=\rho \int _{0}^{\tau }\,T\,\mathrm {d}t, \end{aligned}$$

where the time-integral runs from \(t=0\) when the volcanic forcing begins, to \(t=\tau\), sufficiently long after the eruption that the recovery of T from the volcanic cooling is complete (they assume 15 years). Because the upper ocean heat content is the same at the end as at the beginning, its heat capacity \(C_{u}\) is irrelevant in this integral form, which therefore agrees with the zero-layer model \(F=\rho T\).
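
The independence from \(C_{u}\) can be checked numerically with a minimal sketch that integrates the upper-layer model for a one-year pulse of forcing and then recovers \(\rho\) from the ratio of the two time-integrals. All parameter values are hypothetical, the forward-Euler scheme is chosen only for simplicity, and this is not the procedure of Merlis et al. (2014) applied to AOGCM output.

```python
def integrate_upper_layer(rho, c_u, pulse_forcing, pulse_years, tau, dt=0.01):
    """Integrate C_u dT/dt = F - rho*T from T(0)=0 and accumulate the time-integrals.

    Returns (integral of F dt, integral of T dt, T at time tau).
    """
    n_steps = int(round(tau / dt))
    temperature = 0.0
    integral_f = 0.0
    integral_t = 0.0
    for n in range(n_steps):
        time = n * dt
        forcing = pulse_forcing if time < pulse_years else 0.0
        integral_f += forcing * dt
        integral_t += temperature * dt
        # Forward-Euler step of the upper-layer heat balance
        temperature += dt * (forcing - rho * temperature) / c_u
    return integral_f, integral_t, temperature


true_rho = 2.0       # W m-2 K-1 (hypothetical)
results = {}
for c_u in (3.0, 6.0):   # W yr m-2 K-1 (hypothetical upper-layer heat capacities)
    int_f, int_t, final_t = integrate_upper_layer(
        rho=true_rho, c_u=c_u, pulse_forcing=-3.0, pulse_years=1.0, tau=15.0
    )
    results[c_u] = (int_f / int_t, final_t)
    print(f"C_u = {c_u}: rho estimate = {int_f / int_t:.3f}, T(tau) = {final_t:.4f} K")
```

The estimate of \(\rho\) approaches the prescribed value for either heat capacity, with a small residual that arises because \(T(\tau )\) is not exactly zero at the end of the integration.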

The duration of the volcanic forcing is small compared with \(\tau\), so a reasonably accurate picture is that the forcing and the consequent sudden drop in T take place almost instantaneously at the start, and for most of the time-integration F is zero, while T is recovering back towards zero. This means that the time-integral method mainly measures the climate feedback parameter and the TCRP which apply to the relaxation towards the steady state following the perturbation, rather than during the perturbation itself, which happens very quickly. Merlis et al. (2014) find that the TCRP from the time-integral method is 5–15 % smaller than the TCRP for gradual CO2-forced warming in their AOGCM. It could be that this difference is partly due to a larger \(\alpha\) in response to volcanic forcing, as we have found.

In summary, we conclude that the zero-layer model and the TCRP are not applicable to the rapid cooling due to volcanic forcing, mostly because of the importance of the upper-ocean heat capacity on short timescales. However, simple models may also overestimate the volcanic cooling simulated by AOGCMs because the climate sensitivity parameter is smaller for volcanic forcing, and the volcanic effective radiative forcing is reduced by rapid adjustment.