A refined model for the Earth's global energy balance

A commonly-used model of the global radiative budget assumes that the radiative response to forcing, R, is proportional to global surface air temperature T, R = λT. Previous studies have highlighted two unresolved issues with this model: first, the feedback parameter λ depends on the forcing agent; second, λ varies with time. Here, we investigate the factors controlling R in two atmosphere–slab ocean climate models subjected to a wide range of abrupt climate forcings. It is found that R scales not only with T, but also with the large-scale tropospheric stability S (defined here as the estimated inversion strength area-averaged over ocean regions equatorward of 50°). Positive S promotes negative R, mainly through shortwave cloud and lapse-rate changes. A refined model of the global energy balance is proposed that accounts for both temperature and stability effects. This refined model quantitatively explains (1) the dependence of climate feedbacks on forcing agent (or equivalently, differences in forcing efficacy), and (2) the time evolution of feedbacks in coupled climate model experiments. Furthermore, the relationship between R and S found in observations is similar to that in the models, lending confidence that the refined energy balance model is applicable to the real world.


Introduction
The response of the climate system to external forcing is often interpreted using the global top-of-atmosphere energy balance framework (Gregory et al. 2002), which states that the net radiative imbalance N equals the sum of the effective radiative forcing F (Sherwood et al. 2015) and the radiative response R, which is assumed to scale with the global surface air temperature anomaly T:

$$N = F + R = F + \lambda T \qquad (1)$$

We define downward fluxes as positive, and the anomalies are relative to an unperturbed equilibrium state with N = F = 0. Our sign convention implies that the proportionality constant λ, denoted the feedback parameter, must be negative in a stable climate system, so that R opposes F.
The global energy balance (1) is widely used to quantify forcing, feedbacks, and climate sensitivity in climate model experiments, historical observations, and paleoclimate data (see Knutti et al. 2017, and references therein). While simple and powerful, the relationship (1) also suffers from known limitations. First, the value of λ (i.e., the magnitude of the climate feedbacks) depends on the forcing agent (Joshi et al. 2003; Hansen et al. 2005; Forster et al. 2007; Modak et al. 2016), leading to difficulties in interpreting the energy budget in the historical period, where multiple forcing agents drove climate change (e.g., Marvel et al. 2016; Medhaug et al. 2017). Second, λ can also vary in time; large variations in λ occurred during the historical period (Gregory and Andrews 2016; Zhou et al. 2016; Andrews et al. 2018), and in most coupled climate models, climate feedbacks evolve towards more positive values over time under CO2 forcing (e.g., Murphy 1995; Senior and Mitchell 2000; Winton et al. 2010; Andrews et al. 2012; Armour et al. 2013; Andrews et al. 2015; Proistosescu and Huybers 2017; Ceppi and Gregory 2017). These issues suggest that the radiative response may depend on variables other than just global surface temperature.
Recent studies have explained the time dependence of λ in terms of sea surface temperature (SST) patterns and their impacts on tropospheric stability, with increasing stability favoring more negative cloud and lapse-rate feedbacks (Zhou et al. 2016; Ceppi and Gregory 2017; Andrews and Webb 2018). Tropospheric stability has long been recognized as a key control on low cloud amount (e.g., Klein and Hartmann 1993; Wood and Bretherton 2006), and has been used to make quantitative predictions of low cloud responses to external forcing (e.g., Qu et al. 2015b; Myers and Norris 2016; Brient et al. 2016). Such predictions have generally been restricted to low-cloud subsidence regions, however, and in the absence of a quantitative understanding of how large-scale stability changes affect the global energy budget, we are unable to account for the "pattern effect" in the energy balance relationship (1). Furthermore, it is unclear whether SST patterns can also be invoked to explain the dependence of λ on forcing agent.
Here we propose an improved energy balance relationship that helps interpret the two aforementioned issues in a consistent way. We perform experiments with two global climate models to demonstrate that the dependence of λ on forcing agent and time can be explained by a common dependence of the radiative response on the large-scale stability of the troposphere, independent of the forcing agent or time scale. This allows us to quantitatively account for the radiative impact of SST patterns, via changes in stability, in the energy balance relationship.
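As a point of reference for the classical relationship (1) discussed in this section, the following minimal sketch evaluates the net imbalance and the equilibrium warming it implies. The function names and the numerical values are illustrative assumptions, not quantities taken from the experiments.

```python
def net_imbalance(forcing, feedback, temperature):
    """Classical energy balance: N = F + lambda * T (downward fluxes positive)."""
    return forcing + feedback * temperature


def equilibrium_warming(forcing, feedback):
    """Temperature anomaly at which N = 0; needs a negative feedback parameter."""
    if feedback >= 0:
        raise ValueError("feedback parameter must be negative in a stable climate")
    return -forcing / feedback


# Illustrative numbers only (hypothetical, not from the paper).
F_EXAMPLE = 4.0        # effective radiative forcing, W m-2
LAMBDA_EXAMPLE = -1.0  # feedback parameter, W m-2 K-1
T_EQ = equilibrium_warming(F_EXAMPLE, LAMBDA_EXAMPLE)
```

With a constant feedback parameter, the equilibrium warming depends only on the ratio of forcing to feedback; the limitations discussed above arise precisely because that constancy does not hold.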

The equilibrium radiative response to a range of forcing agents
In this section we demonstrate the dependence of the climate feedback parameter on the forcing agent in the perturbed equilibrium that is reached by the climate system if there is no change in ocean heat transport. We use two atmospheric models, CAM4 (Neale et al. 2010) and HadAM3 (Pope et al. 2000). These models are run either with prescribed SSTs and sea ice concentration, or coupled to a mixed-layer "slab" ocean, which simulates sea surface conditions. Where necessary, we refer to the atmosphere-slab ocean models as CAM4-SOM and HadSM3, respectively. For brevity, we refer to the experiments with prescribed sea surface conditions as "atmosphere-only" and to the atmosphere-slab ocean experiments as "slab".

Models
CAM4 is run at a latitude/longitude resolution of 1.9° × 2.5° with 24 vertical levels, while HadAM3's horizontal resolution is 2.5° × 3.75° with 19 levels. The slab ocean models' energy budget includes a prescribed monthly climatology of ocean heat flux convergence, mimicking the effect of ocean heat transport, to maintain a realistic spatiotemporal distribution of SST. The depth of the slab is set to 50 m everywhere in HadSM3, whereas it varies spatially in CAM4-SOM, being determined from a reference coupled atmosphere-ocean simulation.

Control parameter values and aerosol treatment
The default parameter values used in our simulations are summarized in Table 1. CAM4 uses prescribed aerosol mixing ratios, set to an 1850 monthly climatology (Neale et al. 2010); the aerosol forcing experiments described in the next section use perturbations relative to this climatology.
HadAM3 uses an idealized representation of aerosols, with prescribed uniform vertical distributions over land and ocean (Cusack et al. 1998).

Forcing agents
The slab models are subjected to a variety of forcing agents, including greenhouse gases (CO2, CH4), solar irradiance (S0), tropospheric sulphate aerosol (SO4), black carbon aerosol (BC), volcanic aerosol (VOLC), ocean heat uptake (OHU), and idealized, uniform surface forcings (UNIF). The forcing agents and magnitudes, as well as the experiment names, are listed in Table 2. Additional details are listed below for the VOLC, OHU, and UNIF experiments.
- For VOLC we use the January 1992 aerosol loading, near its peak following the Pinatubo eruption in June 1991. Because volcanic forcing typically lasts for a few years only, we assess the response to volcanic forcing using a 20-member ensemble of 2-year simulations, with the ensemble members initialized from successive years of the respective control simulations.
- The OHU forcings are taken from the multi-model mean of the CMIP5 abrupt4xCO2 experiment, averaged over years 1-20 and 21-150. For practical reasons, they are applied jointly with a 4×CO2 forcing; we find that OHU in isolation causes a runaway "snowball Earth" response in CAM4 (Rugenstein et al. 2016a), owing to the large negative forcings near the sea ice margins. The details of the OHU calculation are provided in "Appendix".
- Finally, the uniform surface forcings are prescribed as extra terms in the surface energy budget. These "ghost" forcings (Hansen et al. 1997; Alexeev et al. 2005; Ceppi and Shepherd 2017) are applied separately in the tropics (equatorward of 30°; UNIF_T, Table 2) and in the extratropics (poleward of 30°; UNIF_ET), covering half of the Earth's area in each case. The local forcing magnitude is set to ±7 W m⁻², yielding a global effective forcing comparable to that of a doubling of CO2 (Table 2).
Note that some forcing cases have not been run for both models (Table 2). Namely, the representation of aerosol in HadAM3 is too limited to allow us to run the SO4 and BC cases, and we found that CAM4-SOM quickly enters a snowball Earth-type runaway response in negative forcing experiments such as 0.5×CO2, −1.5% S0 or −UNIF_ET.

Experimental design
All forced slab simulations are branched from the same date in the reference control experiment, with the forcing switched on at the start of the simulations and held constant thereafter. The simulations are run to steady state, which is typically reached within 20 years (Fig. 1). These simulations are run for 50 years in total, and the new equilibrium climate is calculated as the climatology of years 31-50, except for VOLC where the response is calculated from the ensemble-mean year 2 climate. The control climatology is also based on a 20-year average. For any variable X, we denote the slab control climatology as X_ctl, while the perturbed climatology is X_pert. The total response of that variable in the forced slab runs is then ΔX_cpl = X_pert − X_ctl. This total response can be decomposed into an SST-mediated component, ΔX_SST, and a fast adjustment, ΔX_adj (Hansen et al. 1997). The adjustment is regarded as part of the forcing (Gregory and Webb 2008; Sherwood et al. 2015), whereas we are interested in the SST-mediated component, which drives the radiative response.
We therefore perform additional atmosphere-only simulations to separate the fast adjustments from the SST-mediated response. These atmosphere-only simulations are run for a minimum of 20 years with the same set of forcings as the slab runs, but keeping SSTs and sea ice fixed to their slab control climatologies. Although the sea surface conditions are the same as for the slab control, the removal of coupled feedback means that the atmosphere-only control state (without forcing agents) differs slightly from the slab control. Therefore we also run an atmosphere-only control experiment.
Table 2 (caption, partially recovered; columns include forcing agent and experiment name) The forcing values are derived from atmosphere-only experiments as in Hansen et al. (1997). The values are those plotted in Fig. 3c, d. Experiments that cause a snowball Earth response in CAM4 are marked "*"; those that were not run are marked "-".

Denoting the atmosphere-only perturbed climatology as X_atm,pert and its control as X_atm,ctl, we can then write

$$\Delta X_{\mathrm{adj}} = X_{\mathrm{atm,pert}} - X_{\mathrm{atm,ctl}} \qquad (2)$$

$$\Delta X_{\mathrm{SST}} = \Delta X_{\mathrm{cpl}} - \Delta X_{\mathrm{adj}} \qquad (3)$$

Note that if X is the net TOA radiative flux N, then Eq. 2 gives the effective radiative forcing F, while Eq. 3 gives the radiative response R. In the remainder of the paper, the results will be calculated following Eq. 3 unless otherwise noted, and we will drop the subscript SST when referring to the SST-mediated responses.
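The separation of a response into a fast adjustment and an SST-mediated part can be sketched as below. This is a minimal illustration that assumes the adjustment is the perturbed-minus-control difference of the atmosphere-only climatologies and that the SST-mediated part is the remainder of the slab response; the function name and the numbers are hypothetical.

```python
def decompose_response(x_pert, x_ctl, x_atm_pert, x_atm_ctl):
    """Split a slab-model response into adjustment and SST-mediated parts.

    x_pert, x_ctl         : slab perturbed and control climatologies
    x_atm_pert, x_atm_ctl : atmosphere-only (fixed SST and sea ice) climatologies
    """
    total = x_pert - x_ctl               # total response in the slab runs
    adjustment = x_atm_pert - x_atm_ctl  # response with sea surface conditions held fixed
    sst_mediated = total - adjustment    # what remains is mediated by SST change
    return total, adjustment, sst_mediated


# Hypothetical global-mean net TOA fluxes N (W m-2): the slab runs are in
# steady state, so N is zero in both the control and the perturbed climate.
total_n, forcing_f, response_r = decompose_response(
    x_pert=0.0, x_ctl=0.0, x_atm_pert=4.0, x_atm_ctl=0.25
)
```

Applied to N, the adjustment term plays the role of the effective radiative forcing F and the SST-mediated term that of the radiative response R, so at equilibrium the two cancel.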
Additional atmosphere-only simulations are performed to assess the responses to uniform and patterned SST changes. These simulations are based on the atmosphere-only control described above and are also run for 20 years. Further details on these simulations are provided in Sect. 3 where these results are discussed.

Results
In the energy balance relationship of Eq. 1, the climate feedback parameter is λ = R/T, i.e. the radiative response normalized by warming. In both models, there is a wide range of R/T in the slab experiments, contradicting the assumption that λ is a constant of the climate system (Fig. 2). Although HadSM3 produces less negative R/T than CAM4-SOM, the two models are generally very similar in terms of the dependence of R/T on the forcing agent: for example, the UNIF_T and VOLC experiments yield more negative R/T, whereas UNIF_ET gives less negative values; and 4×CO2 + OHU yields more negative R/T than 4×CO2 without OHU. An interpretation for the dependence of R/T on forcing agent will be provided in Sect. 5.1.

Radiative response to globally-uniform SST change
The range of R/T that we find in equilibrium climate change for a range of forcing agents indicates that the assumption of proportionality R ∝ T is not accurate. However, T explains most of the variation of R across the slab experiments (Fig. 3a, b; colored symbols). The correlation coefficients between R and T are −0.95 in both CAM4-SOM and HadSM3.
The results from a set of atmosphere-only experiments with globally uniform SST changes are also included in Fig. 3 (black dots), in which global SST perturbations ranging between −4 and +10 K in 2-K increments are added to the control state while keeping sea ice fixed. The relationship between R and T in these simulations is overall consistent with the results from the slab experiments. (Note that the uniform-ΔSST experiment results were corrected for the lack of an ice-albedo feedback, for consistency with the slab experiments; see the "Appendix".) The uniform-ΔSST experiments predict that R/T is roughly constant in CAM4, and linearly dependent on temperature in HadAM3 (Fig. 3c, d). Thus, the atmosphere-only experiments reveal that even for idealized, uniform SST perturbations, the relationship between R and T can be nonlinear (black curve in Fig. 3b). A kernel decomposition of the radiative changes, following Soden et al. (2008), indicates that the nonlinearity is primarily associated with the cloud response (not shown). This nonlinearity constitutes one limitation of the classical energy balance framework in Eq. 1.
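The distinction between a constant and a temperature-dependent R/T can be illustrated by fitting linear and quadratic polynomials to R as a function of T. The sketch below uses synthetic data with hypothetical coefficients; it is not the fitting procedure used for Fig. 3, only an example of the idea.

```python
import numpy as np


def fit_response(t, r, degree):
    """Least-squares polynomial fit of R against T.

    Returns the coefficients (highest power first) and the RMS residual.
    """
    coeffs = np.polyfit(t, r, degree)
    residual = r - np.polyval(coeffs, t)
    return coeffs, float(np.sqrt(np.mean(residual ** 2)))


# Uniform perturbations from -4 K to +10 K in 2-K steps.
t_uniform = np.arange(-4.0, 11.0, 2.0)

# Synthetic response whose R/T varies linearly with T (hypothetical values).
A1, A0 = 0.05, -1.0  # W m-2 K-2 and W m-2 K-1
r_uniform = (A1 * t_uniform + A0) * t_uniform

linear_coeffs, linear_rms = fit_response(t_uniform, r_uniform, 1)
quad_coeffs, quad_rms = fit_response(t_uniform, r_uniform, 2)
```

When R/T depends linearly on T, the quadratic fit recovers the two coefficients while the linear fit leaves a systematic residual, which is one way to diagnose the nonlinearity described above.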

Radiative response to SST patterns of change
The radiative responses in some of the slab experiments depart substantially from the relationship expected from the uniform-ΔSST experiments; for a given T, the deviations amount to several W m⁻² in some experiments. The differences between the slab and uniform-ΔSST experiments are much more striking when considering R/T (Fig. 3c, d). In the slab experiments, R/T is not constant but shows no obvious monotonic dependence on T. To the extent that the radiative responses can be linearly decomposed into mean and pattern components of the SST change (as will be demonstrated later, Sect. 3.3), the deviations must be associated with the SST pattern (Andrews et al. 2015; Gregory and Andrews 2016; Zhou et al. 2016; Ceppi and Gregory 2017). Since the classical energy balance framework (Eq. 1) assumes that the radiative response scales with mean temperature only, this "pattern effect" constitutes a second, arguably more fundamental, limitation of the classical framework.

Combination of the radiative responses to uniform and patterned SST change
We will next demonstrate that the radiative responses in the slab experiments can be partitioned into mean and pattern components of the SST response, and that these components of the radiative response are governed by distinct physical processes. In Fig. 4a we compare the actual R from the slab experiments with R_m + R_p, which is the sum of the mean SST-driven component R_m (predicted from the linear or quadratic fits in Fig. 3a, b) and the pattern component R_p, obtained from a separate set of atmosphere-only experiments. For these experiments, we calculate the SST anomalies in the equilibrium slab climatology for each month and gridpoint, subtract the global-mean SST anomaly to form a pattern which has zero global mean by construction, and add this pattern to the control atmosphere-only climatology, keeping sea ice fixed.
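The construction of a zero-mean SST pattern can be sketched as follows. This is a simplified illustration for a single month on a regular latitude-longitude grid, assuming cosine-latitude area weights and ignoring the land-sea mask; the grid and the anomaly field are hypothetical.

```python
import numpy as np


def global_mean(field, lat_deg):
    """Area-weighted mean of a (lat, lon) field using cos(latitude) weights."""
    weights = np.cos(np.deg2rad(lat_deg))[:, None]
    return float(np.average(field, weights=np.broadcast_to(weights, field.shape)))


def sst_pattern(sst_anomaly, lat_deg):
    """Remove the global-mean anomaly, leaving a pattern with zero global mean."""
    mean_anomaly = global_mean(sst_anomaly, lat_deg)
    return sst_anomaly - mean_anomaly, mean_anomaly


# Hypothetical anomaly field: 2 K everywhere plus extra warming at low latitudes.
lat = np.linspace(-87.5, 87.5, 36)
n_lon = 72
anomaly = 2.0 + 1.5 * np.cos(np.deg2rad(lat))[:, None] * np.ones((1, n_lon))

pattern, mean_anomaly = sst_pattern(anomaly, lat)
```

In the experiments described above, such a pattern would then be added to the control SST climatology for each month; that step is omitted here.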
The relationship in Fig. 4a is generally close to the one-to-one line, although errors are larger in HadAM3, which tends to produce SST patterns of larger amplitude compared with CAM4. In Fig. 4b, the results from panel (a) are normalized by the total temperature anomalies taken from the slab experiments. The predicted R/T values are within 15% of the actual values, with the exception of the −UNIF_ET case in HadAM3, which features an anomalously positive albedo feedback (not shown). Overall, however, the sum of the mean and pattern responses accurately predicts R and R/T. Similar linearity of the responses was documented in previous work with the CAM5 model (Zhou et al. 2016, 2017). We can therefore separately investigate the responses to mean and pattern SST changes in order to understand the full radiative responses.
We have already explained the radiative responses to changes in mean SST, R_m, in terms of global-mean temperature (Fig. 3). In the pattern experiments, the global-mean temperature changes are small by construction (the standard deviations across experiments are 0.04 and 0.26 K in CAM4 and HadAM3, respectively), so R_p cannot be explained in terms of T_p. Instead, we propose that the radiative impact of the SST patterns comes through changes in near-global tropospheric stability, S (in K). Here we define S as the area-average change in estimated inversion strength (EIS; Wood and Bretherton 2006) over ocean areas between 50°S and 50°N. S in our definition is therefore not strictly global, but we find a stronger relationship between S and R if the high latitudes are excluded. We speculate this is because large stability changes occur at high latitudes in association with changes in sea ice extent, but these changes are not reflected in the processes controlling radiation, primarily cloud cover, as discussed below.
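The stability index S defined above is an area average over a masked region, which can be sketched as below. The sketch assumes that the EIS change has already been computed on a regular latitude-longitude grid (the EIS formula itself is not reproduced) and uses cosine-latitude weights; the grid, mask, and values are hypothetical.

```python
import numpy as np


def stability_index(delta_eis, lat_deg, ocean_mask, lat_limit=50.0):
    """Area-weighted mean EIS change over ocean points with |lat| <= lat_limit.

    delta_eis  : (lat, lon) array of EIS change in K
    ocean_mask : (lat, lon) boolean array, True over ocean
    """
    lat_deg = np.asarray(lat_deg, dtype=float)
    in_band = (np.abs(lat_deg) <= lat_limit)[:, None]
    weights = np.cos(np.deg2rad(lat_deg))[:, None] * (ocean_mask & in_band)
    return float(np.sum(weights * delta_eis) / np.sum(weights))


# Hypothetical test field: 0.5 K inside the band, large values elsewhere.
lat = np.arange(-88.75, 90.0, 2.5)
n_lon = 10
ocean_mask = np.ones((lat.size, n_lon), dtype=bool)
ocean_mask[:, 0] = False  # pretend the first longitude column is land

delta_eis = np.where(np.abs(lat)[:, None] <= 50.0, 0.5, 10.0) * np.ones((1, n_lon))
delta_eis[:, 0] = 99.0    # land values must not influence the index

S_EXAMPLE = stability_index(delta_eis, lat, ocean_mask)
```

Because land points and latitudes poleward of the limit receive zero weight, the large values placed there leave the index unchanged.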
The relationship between radiative response and stability is demonstrated in Fig. 5 for the pattern experiments (colored symbols). In both models, the relationship is negative, and remarkably linear. That increasing stability promotes a negative radiative response is consistent with the findings of Zhou et al. (2016), Ceppi and Gregory (2017), and Andrews and Webb (2018), who ascribed the stability effect to the cloud and (to a lesser extent) lapse-rate feedbacks. We confirm these findings by performing a kernel decomposition of R_p (Fig. 6), which reveals that the stability effect is dominated by shortwave (SW) cloud feedback, with some cancellation by the longwave (LW) cloud feedback, and a smaller contribution from the lapse-rate feedback. SW cloud feedback also explains the stronger sensitivity of R to S in CAM4 relative to HadAM3.
The negative cloud-radiative response occurs primarily because increased stability favors more low cloud in the global mean (Fig. 7). On local scales, this relationship between low cloud fraction and tropospheric stability is very well established observationally (Klein and Hartmann 1993; Wood and Bretherton 2006), and is present in most global climate models, even though models tend to underestimate the magnitude of the cloud response (Qu et al. 2015b; Myers and Norris 2016). A novel aspect of our results is to relate the global responses of tropospheric stability and radiation in a quantitative way; to our knowledge, so far this had only been done locally. The relatively large radiative responses to stability variations (Fig. 5) reflect the key importance of low clouds for the global radiative budget and climate sensitivity (Bony and Dufresne 2005; Webb et al. 2013).
In principle, we do not expect the radiative responses to stability variations to be spatially uniform. For example, the latitudinal dependence of insolation means that even under the assumption of a uniform dependence of cloud cover on S, the radiative response would be largest in the tropics. We therefore expect that spatial variations in the dependence of R on stability are implicit in the regression slope in Fig. 5.

A refined energy balance model
Motivated by our findings, we propose a simple refinement of the energy balance relationship (1) to address both the deficiencies that we have demonstrated. We postulate that the radiative response can be formulated as

$$R = \tau T + \sigma S \qquad (4)$$

where τ and σ are both in units of W m⁻² K⁻¹. If τ and σ are constants, R depends linearly on T and S, but in general τ = τ(T) and σ = σ(S), as discussed later in this section.
The decomposition of the radiative response into temperature and stability components is not equivalent to decomposing into mean SST and SST pattern components. This is because uniform SST perturbations cause changes in both T and S (Fig. 8; see also Qu et al. 2015a). However, the relationship (4) can be used to interpret the radiative impacts of mean and pattern SST changes, as follows. We have shown that R = R_m + R_p, to a good approximation, so we formulate both components in terms of temperature and stability:

$$R_m = \tau T + \sigma S_m \qquad (5)$$

$$R_p = \sigma S_p \qquad (6)$$

where S_m is the stability change induced by the uniform change in SST. Note that since the SST pattern in isolation causes negligible changes in global-mean temperature, T is included only in Eq. 5 and is not subscripted. We parameterize the stability response to mean SST changes as a linear function of temperature:

$$S_m = \gamma T$$

The relationship seems closer to quadratic in the case of CAM4 (Fig. 8), but the linear approximation suffices for our purposes: the correlation coefficient between S_m + S_p and S (combining all experiments and models) is 0.96. We can then rewrite (5) as

$$R_m = (\tau + \sigma \gamma) T$$

and defining

$$\lambda_m = \tau + \sigma \gamma \qquad (7)$$

we obtain

$$R_m = \lambda_m T \qquad (8)$$

By taking the sum of (6) and (8), we obtain an alternative formulation of (4) which allows us to directly relate R to the decomposition into mean SST change and SST pattern discussed in Sect. 3.3:

$$R = \lambda_m T + \sigma S_p \qquad (9)$$

We calculated λ_m, σ, and γ from our mean and pattern experiments using the fits in Figs. 3, 5, and 8. We then derived τ using Eq. 7. The values of these parameters are listed in Table 3. Note that the quadratic fit in Fig. 3b suggests that λ_m (and hence also τ) is itself a linear function of T in HadAM3.
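The algebra of this derivation can be checked numerically. In the sketch below, `tau` and `sigma` stand for the temperature and stability coefficients of the refined model, `lambda_m` for the feedback parameter under uniform SST change, and `gamma` for the stability change per unit warming under uniform SST change. All numerical values are hypothetical and are not the fitted values of Table 3.

```python
def tau_from_fits(lambda_m, sigma, gamma):
    """Temperature coefficient implied by lambda_m = tau + sigma * gamma."""
    return lambda_m - sigma * gamma


def response_temperature_stability(tau, sigma, t, s):
    """Refined model in terms of temperature and total stability change."""
    return tau * t + sigma * s


def response_mean_pattern(lambda_m, sigma, t, s_pattern):
    """Same model in terms of mean warming and pattern-induced stability change."""
    return lambda_m * t + sigma * s_pattern


# Hypothetical parameter values (W m-2 K-1 for the first two; gamma is K per K).
LAMBDA_M, SIGMA, GAMMA = -1.0, -2.0, 0.25
TAU = tau_from_fits(LAMBDA_M, SIGMA, GAMMA)

# Hypothetical climate response: 4 K warming, 0.5 K pattern-induced stability change.
T_RESP, S_PATTERN = 4.0, 0.5
S_TOTAL = GAMMA * T_RESP + S_PATTERN  # uniform-warming part plus pattern part

R_FROM_T_AND_S = response_temperature_stability(TAU, SIGMA, T_RESP, S_TOTAL)
R_FROM_MEAN_AND_PATTERN = response_mean_pattern(LAMBDA_M, SIGMA, T_RESP, S_PATTERN)
```

The two formulations agree whenever the stability response to uniform warming is linear in T, which is the approximation made in the text.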
As a simple test of the refined energy balance model, we plot the actual radiative responses in the slab experiments against those predicted by Eq. 9 (Fig. 9). Consistent with the results in Fig. 4, the prediction tends to be slightly less accurate for HadAM3, but overall the relationship accurately predicts the range of R and R/T values in our experiments for both models. Note that the λ_m and σ parameters are independent of the results we are predicting, since they are derived from the atmosphere-only mean and pattern SST experiments, while the predicted values are from slab runs. Although not shown, the prediction based on Eq. 4 performs equally well.
Our revised energy balance model helps to interpret the results of Dessler et al. (2018), who proposed a variant of the classic model where the radiative response scales with 500 hPa temperature (T_500) rather than with surface air temperature T. Their finding that R correlates better with T_500 than with T reflects the fact that mid-tropospheric temperature responds to changes in both T and S. An advantage of the model proposed here is that the relationship between R and climate sensitivity is straightforward (since the model is based on global surface temperature), and furthermore additional physical insight is gained by considering the distinct processes associated with mean warming versus stability changes.

Implications of the refined model
We now discuss the significance of our refined energy balance model for three issues: the dependence of the feedback parameter on the forcing agent, the time variation of the feedback parameter, and the interpretation of the observed global radiative budget. These issues are discussed in turn in the next three subsections.

Dependence of the feedback parameter on forcing agent
The value of the classical feedback parameter λ (Eq. 1) is known to depend on the forcing agent (cf. Fig. 3c, d); equivalently, this dependence can also be interpreted in terms of differences in forcing efficacy, the global temperature response per unit (effective) radiative forcing, T/F, relative to that of CO2 (Joshi et al. 2003; Hansen et al. 2005; Forster et al. 2007; Winton et al. 2010; Rose et al. 2014; Marvel et al. 2016; Modak et al. 2016; Rugenstein et al. 2016a). Understanding the cause for differences in efficacy among forcing agents has been a long-standing question in climate dynamics.

Table 3 Values of τ, λ_m, σ, and γ derived from the atmosphere-only simulations. λ_m and γ are taken from the uniform-ΔSST simulations (Figs. 3, 8), σ is calculated from the pattern experiments (Fig. 5), and τ is calculated using Eq. 7

Fig. 9 a R in the slab experiments, versus the value predicted using Eq. 9 and the values in Table 3. b Same for R/T. The black lines denote the one-to-one relationship
Here we demonstrate that, at least for the climate models and forcing agents considered here, the forcing agent dependence of the feedback parameter can be explained in terms of the stability response to different forcings. Dividing Eq. 4 by T yields

$$\lambda = \frac{R}{T} = \tau + \sigma \frac{S}{T} \qquad (10)$$

indicating that the classical feedback parameter, λ = R/T, should be a linear function of the stability response per unit warming. If τ is a linear function of T rather than a constant, as is the case in HadAM3, we can substitute τT = (τ_1 T + τ_0)T in Eq. 4 before dividing by T, yielding

$$\frac{R}{T} - \tau_1 T = \tau_0 + \sigma \frac{S}{T} \qquad (11)$$

For HadAM3, τ_1 = 0.06 W m⁻² K⁻² and τ_0 = −0.91 W m⁻² K⁻¹ (Fig. 3b).
We confirm this by plotting R/T (Fig. 10a) and R/T − τ_1 T (Fig. 10b) against S/T for the slab experiments. In this representation, the intercept of the linear fit represents τ (or τ_0), while the slope corresponds to σ. The points lie close to the predicted relationships based on Eqs. 10 and 11. Our results therefore suggest that forcing agents cause different feedbacks, i.e. vary in efficacy, because they induce different SST patterns, and hence different stability responses per unit warming.
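The interpretation of intercept and slope can be illustrated with a straight-line fit of R/T against S/T. The sketch below uses synthetic, noise-free data built from hypothetical coefficients (named `tau_true` and `sigma_true` here for the temperature and stability coefficients); it only shows how the two coefficients would be read off such a plot.

```python
import numpy as np


def fit_feedback_vs_stability(r, t, s):
    """Fit R/T = intercept + slope * (S/T) by least squares.

    Returns (intercept, slope): the temperature coefficient and the
    stability coefficient of the refined energy balance model.
    """
    slope, intercept = np.polyfit(s / t, r / t, 1)
    return float(intercept), float(slope)


# Hypothetical coefficients and experiment responses.
tau_true, sigma_true = -0.5, -2.0
t_exp = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 3.5])                   # warming, K
s_exp = np.array([-0.2, -0.1, 0.0, 0.1, 0.2, 0.3]) * t_exp         # stability change, K
r_exp = tau_true * t_exp + sigma_true * s_exp                      # response, W m-2

tau_fit, sigma_fit = fit_feedback_vs_stability(r_exp, t_exp, s_exp)
```

Experiments with a larger stability response per unit warming fall further along the fitted line and therefore have a more negative R/T when the slope is negative.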
Among the forcing agents studied here, a good predictor of the variation in λ = R/T is the ratio of tropical to global effective forcing (Fig. 11): forcings that are more focused on the tropics tend to yield more negative λ, i.e. have lower efficacy (r = −0.92 if excluding the two outlier CAM4 experiments BC × 10 and VOLC, discussed below). Compared with uniform forcings, tropical forcings tend to cause enhanced free-tropospheric warming per unit global surface warming (higher S/T), because the tropics are generally close to neutral moist stability, and therefore well coupled with the free troposphere through convection, relative to other parts of the world. This interpretation is consistent with Zhou et al. (2017) and Andrews and Webb (2018), who showed that increasing SSTs in tropical ascent regions excites a negative global radiative response (consistent with positive S), while warming away from ascent regions mostly causes positive radiative changes.

Fig. 10 (caption fragment) For reference, the results from the uniform-ΔSST experiments are also included (black dots)

Fig. 11 Values of R/T, taken from the slab runs, versus the ratio of tropical to global effective forcing, calculated from atmosphere-only simulations with fixed SST and sea ice. The forcing ratio is defined so that a value of 1 means that the forcing is entirely in the tropics, where the tropics include the area between 30°S and 30°N. The global effective forcing values are listed in Table 2
The results in Fig. 11 provide a physical basis to interpret the low efficacy of solar and volcanic forcings (Fig. 11; Hansen et al. 2005; Marvel et al. 2016; Modak et al. 2016; Gregory et al. 2016), which are more focused on the tropics relative to CO2. They also account for the high efficacy of ocean heat uptake and other extratropical forcings (Winton et al. 2010; Rose et al. 2014; Rose and Rayborn 2016; Rugenstein et al. 2016a; Liu et al. 2018). We note, however, that two CAM4 experiments, BC × 10 and VOLC, have substantially lower R/T than expected given the meridional structure of these forcings. The VOLC experiment is not run to equilibrium (Sect. 2.4), which likely affects the pattern of the SST response (and therefore the change in S), since the SST pattern is likely to evolve in time. The BC × 10 forcing is mainly characterized by a pattern of land-sea contrast, rather than by a meridional contrast (not shown), and we speculate that this land-sea contrast causes a large stability response that is not captured by our simple index. In support of this reasoning, Qu et al. (2015a) found that land warming could cause a decrease in coastal stratocumulus cloud via the stability mechanism. We therefore conclude that the meridional structure of the forcing is an important but not the sole factor controlling forcing efficacy.
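One way to compute a tropical-to-global forcing ratio of the kind used as a predictor in this subsection is sketched below. It assumes the ratio is the tropical (30°S to 30°N) share of the area-weighted global forcing integral, so that a value of 1 corresponds to forcing confined to the tropics; the exact definition used for Fig. 11 may differ, and the latitude grid and forcing profiles are hypothetical.

```python
import numpy as np


def tropical_forcing_ratio(zonal_mean_forcing, lat_deg, tropics_limit=30.0):
    """Share of the area-weighted global forcing located within the tropics.

    Undefined when the global forcing integral is zero.
    """
    lat_deg = np.asarray(lat_deg, dtype=float)
    contribution = np.cos(np.deg2rad(lat_deg)) * np.asarray(zonal_mean_forcing, dtype=float)
    tropics = np.abs(lat_deg) <= tropics_limit
    return float(contribution[tropics].sum() / contribution.sum())


# One-degree latitude bands centred on -89.5, ..., 89.5.
lat = np.arange(-89.5, 90.0, 1.0)
in_tropics = np.abs(lat) <= 30.0

ratio_uniform = tropical_forcing_ratio(np.full(lat.shape, 3.7), lat)
ratio_tropical = tropical_forcing_ratio(np.where(in_tropics, 7.0, 0.0), lat)
ratio_extratropical = tropical_forcing_ratio(np.where(in_tropics, 0.0, 7.0), lat)
```

A spatially uniform forcing gives a ratio near one half because the band between 30°S and 30°N covers half of the Earth's surface.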

Apparent time dependence of the feedback parameter
Previous studies have proposed that large-scale stability changes are responsible for time variations in λ in the historical period (Zhou et al. 2016) and in CO2-forced model simulations (Ceppi and Gregory 2017; Andrews and Webb 2018). Here we demonstrate that these variations can be accounted for quantitatively by using the energy balance (4). We begin with historical variations in λ. Following Gregory and Andrews (2016) and Andrews et al. (2018), we define λ as the least-squares slope of R versus T, λ = dR/dT. We assess the evolution of λ by calculating dR/dT over sliding 30-year windows in the amip-piForcing experiment, where our two atmosphere models are forced with historical observed SSTs from 1871 to 2012 while keeping forcing agents at pre-industrial levels. Note that for both models, the results are averages over four ensemble members. Since there is no forcing, in these runs we can readily diagnose R as R = N. If the refined energy balance (4) holds, then we should be able to predict the time evolution of dR/dT using

$$\frac{\mathrm{d}R}{\mathrm{d}T} = \tau + \sigma \frac{\mathrm{d}S}{\mathrm{d}T} \qquad (12)$$

and

$$\frac{\mathrm{d}R}{\mathrm{d}T} - 2 \tau_1 T = \tau_0 + \sigma \frac{\mathrm{d}S}{\mathrm{d}T} \qquad (13)$$

which is Eqs. 10, 11 rewritten in differential form. Comparing the actual dR/dT with the predicted values, we find that Eqs. 12, 13 predict the time evolution well, despite an overall negative bias in CAM4 (Fig. 12). The results show that during the historical period, the stability response generally led to more negative feedbacks (more negative λ, lower climate sensitivity) compared to a case with no stability changes (Fig. 12, dashed lines) or compared to the expected response to CO2-only forcing (dotted lines).
Next we turn to the problem of increasing climate sensitivity over time under CO2 forcing. Figure 13a, b shows N versus T in simulations with atmosphere-ocean general circulation models (including a three-dimensional dynamical ocean model rather than a slab model), where the models are subjected to an abrupt 4×CO2 forcing. In this configuration, we refer to our models as CESM-CAM4 and HadCM3. The simulations are 250 years long in CESM-CAM4 and 100 years long in HadCM3. To minimize noise, we use ensemble averages. For CESM-CAM4 the experiment contains 12 ensemble members over the first 100 years, then 5 members over the remaining 150 years; for HadCM3 there are 7 members over the whole experiment. The CESM-CAM4 and HadCM3 abrupt 4×CO2 ensembles are described in more detail in Rugenstein et al. (2016b) and Andrews et al. (2015), respectively.
Since the forcing is abrupt and therefore constant, we can consider N instead of R, and we use λ = dN/dT. As highlighted by the red least-squares fits in Fig. 13a, b, dN/dT becomes less negative as time passes. We calculate the evolution as above, except that we use a sliding 1.2 K window (as in Rugenstein et al. 2016b) rather than a fixed time window; this maintains an adequate signal-to-noise ratio throughout the time series and yields cleaner results towards the later part of the runs, where T and N evolve very slowly in time.
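The two kinds of sliding regression described in this subsection, over fixed time windows and over fixed temperature intervals, can be sketched as follows. The window handling (step size, minimum number of points) and the synthetic time series are assumptions made for illustration, not the settings used for Figs. 12 and 13.

```python
import numpy as np


def ls_slope(x, y):
    """Least-squares slope of y against x."""
    return float(np.polyfit(x, y, 1)[0])


def sliding_time_slopes(t, r, window=30):
    """Slope of r versus t in every run of `window` consecutive years."""
    n_windows = len(t) - window + 1
    return np.array([ls_slope(t[i:i + window], r[i:i + window]) for i in range(n_windows)])


def sliding_temperature_slopes(t, n, width=1.2, step=0.1, min_points=3):
    """Slope of n versus t for all years whose t lies in a sliding interval of `width` K."""
    centers = np.arange(t.min() + width / 2, t.max() - width / 2 + 1e-9, step)
    slopes = np.full(centers.shape, np.nan)
    for k, center in enumerate(centers):
        selected = np.abs(t - center) <= width / 2
        if selected.sum() >= min_points:
            slopes[k] = ls_slope(t[selected], n[selected])
    return centers, slopes


years = np.arange(150)

# Synthetic "historical" series with a constant feedback of -1.2 W m-2 K-1.
t_hist = 0.005 * years + 0.2 * np.sin(years / 3.0)
r_hist = -1.2 * t_hist
time_slopes = sliding_time_slopes(t_hist, r_hist, window=30)

# Synthetic abrupt-forcing series: warming saturates, N declines linearly with T.
t_abrupt = 5.0 * (1.0 - np.exp(-years / 30.0))
n_abrupt = 7.0 - 1.4 * t_abrupt
centers, temp_slopes = sliding_temperature_slopes(t_abrupt, n_abrupt)
```

A temperature window contains few years early in an abrupt-forcing run, when warming is fast, and many years later on, which is why it keeps the regression well constrained when T and N change slowly.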
The sliding regressions over the 4×CO2 simulations again indicate that the refined energy balance predicts the evolution of λ well (Fig. 13c, d). The predictions look noisier towards the beginning of the runs, possibly due to residual noise in the results despite the use of ensemble averages, because there are fewer years in a given T interval at the start of the experiment. In HadAM3, part of the increase in λ over time is associated with the temperature dependence of τ (dashed line in Fig. 13d), but changes in stability mostly explain the evolution.
It is interesting to note that although CESM-CAM4 and HadCM3 produce similar feedback values under 4×CO2 forcing, they achieve these values through very different combinations of the uniform-SST and pattern responses. In CESM-CAM4, the normalized stability response, S/T, is near zero or negative, while it is substantially positive in HadCM3 (compare the thin solid and dashed lines in Fig. 13c, d, where the dashed lines correspond to predictions based on Eqs. 12, 13 but excluding the stability term, and recall R ∝ −S). This indicates that the two models produce substantially different patterns of SST response to CO2 forcing, highlighting the need for constraints on future patterns of SST change in response to forcing.

Observations of the Earth's radiation budget
Having demonstrated the relationship between stability and radiative budget in climate models, we now verify whether our findings apply to the real world. We use global satellite observations of net top-of-atmosphere radiative flux, N, based on the Clouds and the Earth's Radiant Energy System (CERES) Energy Balanced and Filled (EBAF) version 4.0 data product (Loeb et al. 2018). We analyze deseasonalized monthly data for the period March 2000-February 2017. We estimate R as N − F, where F is based on the IPCC AR5 forcing time series, revised and extended by Dessler and Forster (2018). Estimates of T and S are obtained from ERA5 reanalysis data (Hersbach and Dee 2016).
R is negatively correlated with S on monthly timescales, and this relationship is statistically significant (Fig. 14a). By comparison, T is a poor predictor of monthly variations in R (r = −0.14, not statistically significant; not shown). Since T and S tend to covary in monthly observations (r = 0.44), the relationship in Fig. 14a could include a response to T; however, we obtain a nearly identical result if the effect of T is regressed out from both S and R (Fig. 14b). Meanwhile, the relationship between R and T remains weak if S is regressed out (r = 0.15; not shown). It therefore appears that tropospheric stability is a key control on the global energy budget in the real world.
An implication of this result is that previous observational estimates of λ based on Eq. 1 (e.g., Gregory et al. 2002; Forster and Gregory 2006; Roe and Armour 2011; Otto et al. 2013; Kummer and Dessler 2014; Lewis and Curry 2015, 2018; Resplandy et al. 2018) may have been biased by not accounting for the role of stability variations. Our results also support the findings of Andrews et al. (2018), who showed that accounting for the impact of SST patterns (which we show to be mediated by stability) increases previous observational estimates of climate sensitivity, making them consistent with model-based estimates.
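The regression steps used in this section (removing the mean annual cycle, then regressing T out of both R and S before relating the residuals) can be sketched in Python as follows. This is a minimal sketch on synthetic data, not the analysis code used for Fig. 14: the function names, the random seed, and all coefficients in the synthetic series are hypothetical, and no CERES or ERA5 data are read.

```python
import numpy as np

def deseasonalize(x):
    """Subtract the mean annual cycle from a monthly time series."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for month in range(12):
        out[month::12] -= x[month::12].mean()
    return out

def residual(y, x):
    """Residual of the least-squares regression of y onto x (with intercept)."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

def slope_at_fixed_T(R, S, T):
    """Slope and correlation of R versus S after regressing T out of both."""
    R_fixed = residual(R, T)
    S_fixed = residual(S, T)
    slope = np.polyfit(S_fixed, R_fixed, 1)[0]
    r = np.corrcoef(S_fixed, R_fixed)[0, 1]
    return slope, r

# Synthetic monthly anomalies (hypothetical): S covaries with T, and R
# depends on both, plus an annual cycle and random noise.
rng = np.random.default_rng(0)
n_months = 240
season = np.tile(np.sin(2 * np.pi * np.arange(12) / 12), n_months // 12)
T = rng.normal(size=n_months)
S = 0.5 * T + rng.normal(size=n_months)
R = 0.3 * T - 2.0 * S + 0.5 * rng.normal(size=n_months) + season

T_d, S_d, R_d = deseasonalize(T), deseasonalize(S), deseasonalize(R)
slope, r = slope_at_fixed_T(R_d, S_d, T_d)
```

In this synthetic setup the slope at fixed T should lie close to the coefficient on S used to build R, even though S and T are correlated, which is the property that the "fixed T" regression relies on.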

Conclusions
The radiative response to forcing is commonly assumed to follow a simple linear dependence on global surface air temperature, R = λT. Using two global climate models, we demonstrate that a better model of the radiative response is obtained by including the effect of large-scale tropospheric stability S, quantified as the estimated inversion strength (EIS, in K), so that R depends linearly on both T and S. All other things being equal, positive S causes a negative R (a cooling effect), because of (a) increased low cloud cover (a negative shortwave cloud feedback), and (b) increased longwave emission to space from the upper troposphere (a negative lapse-rate feedback). The importance of the stability term in the refined energy balance model results from the fact that low clouds are a leading cause of differences in radiative feedback across climate models and forcing agents. The stability term quantitatively explains the impact of diverse SST patterns on the radiative response. By including this term, we show that differences in efficacy across a wide range of forcing agents are largely due to the associated SST patterns, which cause different stability responses. Forcings focused on the tropics tend to cause a more positive stability response, resulting in lower efficacy, compared with extratropical forcings. This helps to explain previous findings, e.g. the low efficacy of solar and volcanic forcing (Hansen et al. 2005; Marvel et al. 2016; Modak et al. 2016; Gregory et al. 2016), and the high efficacy of ocean heat uptake and other extratropical forcings (Winton et al. 2010; Rose et al. 2014; Rose and Rayborn 2016; Rugenstein et al. 2016a; Liu et al. 2018).
Furthermore, the impact of SST patterns on the time evolution of the feedback parameter (λ = R/T) can also be captured by the stability term in our refined model of the radiative response. In periods where the stability response per unit warming, S/T, is more positive, the radiative response per unit warming R/T is more negative, and vice versa. This explains both the historical variations in R/T given the observed evolution of SSTs (Gregory and Andrews 2016; Zhou et al. 2016; Andrews et al. 2018), and the increase in R/T over time in coupled models under CO2 forcing (e.g., Murphy 1995; Senior and Mitchell 2000; Williams et al. 2008; Winton et al. 2010; Andrews et al. 2012; Armour et al. 2013; Andrews et al. 2015; Proistosescu and Huybers 2017; Ceppi and Gregory 2017).
Finally, we show that the relationship between S and R is qualitatively similar in the real world and in climate models. In recent satellite observations of the radiative budget, most of the monthly variations in R are driven by S, and the two variables are well correlated on monthly timescales (r = −0.57). Because the stability response will affect the estimate of λ = R/T, this implies that the role of stability must be taken into account when quantifying climate sensitivity from historical observations. This could be done by diagnosing the two parameters in our refined energy balance model using multiple linear regression, an approach similar to that followed by low cloud observational studies (Klein et al. 2017, and references therein) but extended to global scales.
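One possible form of the multiple linear regression suggested above is sketched below in Python. It is a minimal sketch under stated assumptions: the names `fit_two_term_model`, `temp_coef` and `stab_coef` are hypothetical, an intercept is included to absorb any offset in the anomalies, and the data are synthetic with arbitrarily chosen coefficients. For comparison, the sketch also computes the slope of a simple regression of R onto T alone.

```python
import numpy as np

def fit_two_term_model(R, T, S):
    """Least-squares fit of R = temp_coef * T + stab_coef * S + intercept."""
    R, T, S = (np.asarray(a, dtype=float) for a in (R, T, S))
    design = np.column_stack([T, S, np.ones_like(T)])
    coef, *_ = np.linalg.lstsq(design, R, rcond=None)
    temp_coef, stab_coef, intercept = coef
    return temp_coef, stab_coef, intercept

# Synthetic anomalies (hypothetical numbers): S covaries with T, and R has
# known temperature and stability coefficients plus random noise.
rng = np.random.default_rng(1)
n = 2000
T = rng.normal(size=n)
S = 0.5 * T + rng.normal(size=n)
R = -1.2 * T - 1.8 * S + 0.3 * rng.normal(size=n)

temp_coef, stab_coef, intercept = fit_two_term_model(R, T, S)

# Simple regression of R onto T alone: the omitted stability term is
# partly attributed to T because S and T covary.
simple_slope = np.polyfit(T, R, 1)[0]
```

With these synthetic inputs the two-term fit should recover coefficients close to those used to generate R, whereas the single-predictor slope is more negative than the temperature coefficient, because part of the stability effect is aliased onto T.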

B. Albedo feedback in the uniform-ΔSST experiments
Owing to the constraint of fixed sea ice concentrations, the uniform-ΔSST experiments lack an ice-albedo feedback. Since we are interpreting the slab radiative responses as the sum of a uniform-ΔSST component and an SST pattern component, R = R_m + R_p (Fig. 4), we need to account for the missing ice-albedo feedback in the uniform warming simulations in order to explain the slab responses. We therefore use a modified radiative response, R_m plus a term proportional to T, where the proportionality coefficient is a pseudo-ice-albedo feedback. This coefficient is estimated as the difference between the mean albedo feedback in the slab runs and the mean albedo feedback in the uniform-SST runs, calculated in both cases using CAM5 radiative kernels (Soden et al. 2008; Pendergrass et al. 2017). We estimate it to be 0.30 W m−2 K−1 in CAM4 and 0.20 W m−2 K−1 in HadAM3.

Fig. 1 Evolution of global-mean surface air temperature anomalies T relative to the control climatology in the slab experiments. The symbols at year 50 denote averages over the last 20 years, the period we use to calculate responses. For VOLC, the circles denote the 20-member ensemble average in year 2 (Sect. 2.4). Here and in subsequent figures, open symbols indicate CAM4-SOM results, while filled symbols correspond to HadSM3

Fig. 2 R/T in CAM4-SOM and HadSM3 slab experiments. The black line represents the one-to-one relationship. Because some of the experiments were run with only one of the models (Table 2), the full set of results is shown in the plot margins for each model

Fig. 3 Top row: R versus T in a CAM4 and b HadAM3. Bottom row: R/T versus T in c CAM4 and d HadAM3. Colored circles denote results from the slab simulations, while the black circles are from

Fig. 4 Left: R versus R_m + R_p, the sum of the radiative responses in the uniform-ΔSST and patterned-SST simulations. Right: R/T versus (R_m + R_p)/T. The black line denotes the one-to-one relationship

Fig. 5 R_p versus S_p in patterned-SST simulations with a CAM4 and b HadAM3. The black lines denote least-squares fits to the data

Fig. 6 As in Fig. 5, but R_p is decomposed into contributions from a the Planck response, b the lapse rate, c relative humidity, d surface albedo, e longwave cloud-radiative effects, f shortwave cloud-radiative effects, g net cloud-radiative effects, and h the sum of all contributions. The decomposition is calculated with CAM5 radiative ker-

Fig. 7 Global-mean low cloud amount response versus S_p in the SST pattern experiments. Here low cloud amount is defined as the mass-weighted vertical average in the layers below 700 hPa

Fig. 10 R/T versus S/T in a CAM4-SOM and b HadSM3. The black lines denote the relationships predicted from Eqs. 10 and 11 using the parameter values from Table 3. For reference, the results from the uniform-ΔSST experiments are also included (black dots)

Fig. 12 Time evolution of R/T, the regression slope of R versus T, calculated over 30-year sliding windows in amip-piForcing simulations (Andrews et al. 2018) with a CAM4 and b HadAM3. For both models, the results are averages over four ensemble members. Black curves are the actual R/T values; solid red curves denote the values predicted from Eqs. 12, 13 using the values in Table 3; dashed lines also indicate predictions based on Eqs. 12, 13, but omitting the stability term; and dotted lines show R/T obtained from the atmosphere-slab ocean 4×CO2 simulations

Fig. 13 Top: N versus T in fully-coupled atmosphere-dynamical ocean 4×CO2 simulations. Black circles denote individual years. The simulations are 250 years long in CESM-CAM4 (a) and 100 years long in HadCM3 (b). For both models, the results are ensemble averages (see text). The red lines show the least-squares fits of N versus T over 1.2 K windows, for the first and last windows available in the time series. The lines are solid over the 1.2 K window used to calcu-

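Fig. 14 R versus S in observations (CERES-EBAF version 4.0) and reanalysis data (ERA5) during March 2000-February 2017. Black circles denote individual months; black lines are least-squares fits. The confidence intervals for the regression slopes are for a 95% confidence level. Following Santer et al. (2000), we account for the reduction in the number of degrees of freedom owing to autocorrelation in the time series before calculating confidence intervals. In panel b, the subscript "fixed T" indicates that T was regressed out from the respective variables, i.e. R_(fixed T) is the residual of the regression of R onto T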

Table 1
Control parameter values used in CAM4 and HadAM3. We only list those parameter values that are perturbed in our experiments. ppmv: parts per million by volume; ppbv: parts per billion by volume