Abstract
Recent assessments of climate sensitivity per doubling of atmospheric CO_{2} concentration have combined likelihoods derived from multiple lines of evidence. These assessments were very influential in the Intergovernmental Panel on Climate Change Sixth Assessment Report (AR6) assessment of equilibrium climate sensitivity, the likely range lower limit of which was raised to 2.5 °C (from 1.5 °C previously). This study evaluates the methodology of and results from a particularly influential assessment of climate sensitivity that combined multiple lines of evidence, Sherwood et al. (Rev Geophys 58(4):e2019RG000678, 2020). That assessment used a subjective Bayesian statistical method, with an investigator-selected prior distribution. This study estimates climate sensitivity using an Objective Bayesian method with computed, mathematical priors, since subjective Bayesian methods may produce uncertainty ranges that poorly match confidence intervals. Identical model equations and, initially, identical input values to those in Sherwood et al. are used. This study corrects Sherwood et al.'s likelihood estimation, producing estimates from three methods that agree closely with each other, but differ from those that they derived. Finally, the selection of input values is revisited, where appropriate adopting values based on more recent evidence or that otherwise appear better justified. The resulting estimates of long-term climate sensitivity are much lower and better constrained (median 2.16 °C, 17–83% range 1.75–2.7 °C, 5–95% range 1.55–3.2 °C) than in Sherwood et al. and in AR6 (central value 3 °C, very likely range 2.0–5.0 °C). This sensitivity to the assumptions employed implies that climate sensitivity remains difficult to ascertain, and that values between 1.5 °C and 2 °C are quite plausible.
1 Introduction
The Earth's climate sensitivity is a key measure of the longer-term climate response to external forcing. It is perhaps the most important ill-quantified climate system parameter. In principle, climate sensitivity represents the equilibrium change in mean surface temperature to a doubling of atmospheric CO_{2} concentration from preindustrial levels, once the deep ocean has reached a stable state. In practice it is normally estimated using some approximate measure, often derived from disequilibrium changes. Climate sensitivity has been estimated from various types of evidence, but none of these has narrowly constrained its value. The first five Assessment Reports by the Intergovernmental Panel on Climate Change (IPCC) relied heavily on estimates of climate sensitivity from global climate model (GCM) simulations. The 1.5–4.5 K likely range for climate sensitivity in the 2013 IPCC Fifth Assessment Report (AR5) was identical to the range presented in the landmark Charney (1979) report, with the great increase in GCM sophistication since 1979 not having led to any narrowing of the climate sensitivity range.
GCMs use semi-empirical approximations (parameterizations) to represent subgrid-scale cloud and convection processes that are known to be critical to determining the model's climate sensitivity, which varies by up to a factor of three among GCMs. In one well-regarded GCM, a simple change to how convective precipitation was parameterized^{Footnote 1} varied its climate sensitivity by a factor of two, with no obvious change in how well the model otherwise performed (Zhao et al. 2016). Changing the order in which the various parameterized atmospheric modules were updated in each time step was found to vary another GCM's climate sensitivity by a factor of up to two, with ambiguity existing regarding the optimum ordering (Donahue and Caldwell 2018). Moreover, the universal use in GCMs of deterministic parameterizations may bias their climate sensitivity upwards (Strommen et al. 2019). Such issues make the reliability of GCM-derived estimates of climate sensitivity questionable.
In the light of such issues, and the further widening of the range of GCM climate sensitivities in the latest (CMIP6) generation of GCMs (Zelinka et al. 2020), the IPCC Sixth Assessment Report (AR6) abandoned the previous reliance on GCM climate sensitivities. Instead, evaluation of climate sensitivity was approached by combining estimates based on different lines of evidence, such as process understanding (feedback analysis), the historical instrumental record, and paleoclimate data.
Combining different lines of evidence should, to the extent that they are independent, enable climate sensitivity to be estimated more precisely than from any single line of evidence (Stevens et al. 2016). A comprehensive attempt to do so was made by Sherwood et al. (2020, henceforth S20), a 92-page study. S20 was conducted under the auspices of the World Climate Research Programme's Grand Science Challenge on Clouds, Circulation and Climate Sensitivity and provides a very detailed investigation of climate sensitivity. As the most influential recent assessment, S20 was cited over twenty times in the relevant AR6 chapter, which approached climate sensitivity estimation on very similar lines to S20, albeit not using its formal probabilistic methods. There are in principle considerable strengths in S20's scientific approach. Its main results were derived by combining understanding from feedback analysis (Process evidence) with evidence from changes since circa 1850 (Historical evidence), and from cold and warm past periods (Paleoclimate evidence). S20 judged these three lines of evidence to be largely independent.
The contribution the present study makes to estimation of climate sensitivity is threefold. First, it identifies statistical problems in S20. The main methodological argument is that, when Bayesian methods are used, an Objective rather than a Subjective Bayesian approach should be taken. This means that rather than the investigator choosing the prior distribution, the prior distribution should be mathematically computed, based on the assumed statistical model relating to all the evidence to be analyzed (Bernardo 2009). S20 used a Subjective Bayesian statistical method, with an investigator-selected prior distribution. Such a method has been shown to be capable of producing unrealistic climate sensitivity estimates when used to combine differing types of evidence (Lewis 2018), and S20 provided no evidence that it did not do so in this case. Moreover, for all except Process evidence, S20 used a method of estimating likelihoods that turns out to be unsound. This study validates its own likelihood estimates by using multiple methods and cross-checking their results. S20's method is shown to often result in serious likelihood underestimation at higher climate sensitivity levels.
The second contribution of this study is that it develops and applies an Objective Bayesian approach to combining differing climate sensitivity evidence, using a mathematically computed prior distribution. The results using the methodology developed and the same input assumptions as S20 are then used to assess what effect the statistical problems identified in S20 have on its results. It is found that they bias S20's estimation of climate sensitivity downwards, although only to a minor extent even at the upper uncertainty bound when all three lines of evidence are combined.
This study's third contribution is to review and where appropriate revise the input assumptions used by S20, paying particular regard to more recent evidence, and to investigate the effect of the revised input assumptions on estimates of climate sensitivity using the developed Objective Bayesian methodology. Some of the revisions to input assumptions relate to the treatment in certain cases of CO_{2} forcing and/or the warming it causes. This study differs from S20 on two points: the appropriate scaling of CO_{2} forcing, and comparison of warming, where different changes in atmospheric CO_{2} concentration are involved; and the scaling of CO_{2} forcing where its use requires a different estimation basis from that on which the forcing estimate was derived. The combined effects of the revisions to S20's CO_{2}-related estimates and to other input assumptions result in a major reduction in estimated climate sensitivity.
The paper is structured as follows. Climate sensitivity measures are discussed in Sect. 2. Section 3 deals with statistical methods, while Sect. 4 reviews S20's input assumptions and proposes certain revisions to them. Section 5 sets out results based on S20's original input assumptions but using the corrected likelihood estimates, using alternatively S20's chosen Subjective Bayesian prior distribution or the Objective Bayesian (mathematically computed) prior distributions. Section 6 presents results using the revised input assumptions and Objective Bayesian prior distributions. Section 7 discusses the statistical issues and both their effects and the effects of using different input assumptions.
2 Climate sensitivity measures
The traditional measure of climate sensitivity is the equilibrium change in global mean surface temperature (GMST) following a doubling of the atmospheric CO_{2} concentration (equilibrium climate sensitivity, henceforth ECS). While the equilibrium involved allows for the deep ocean to reach a steady state, it excludes changes in slow components (e.g., ice sheets). Such an equilibrium is achievable in a GCM but not in the real climate system. The corresponding equilibrium change over timescales that allow for feedbacks from changes in the slow components to occur is called Earth system sensitivity (ESS).
Under the standard linear forcing-feedback framework, the excess of a change ΔF in effective radiative forcing (ERF) over the change in top-of-atmosphere (TOA) planetary radiative imbalance, ΔN, is equal and opposite to the climate system's radiative response ΔR, measuring all radiation downwards. ERF is a measure of the increase in TOA radiative imbalance resulting from a change in atmospheric composition, such as an increase in CO_{2} concentration, with surface temperature held constant but the atmosphere allowed to adjust to the change. ΔR is taken to be the product of the change ΔT in global mean near-surface air temperature (GMAT), or in GMST, and a fixed climate feedback parameter \(\lambda^{{{\text{fixed}}}}\). Accordingly:
\[ \Delta F - \Delta N = -\Delta R = -\lambda^{{\text{fixed}}} \, \Delta T \qquad (1) \]
Under this framework, it follows that for \(\Delta N = 0\), representing equilibrium:
\[ \Delta T = -\Delta F / \lambda^{{\text{fixed}}} \qquad (2) \]
and hence for the ERF from a doubling of CO_{2} concentration, F_{2×CO2}:
\[ {\text{ECS}} = -F_{{2 \times {\text{CO2}}}} / \lambda^{{\text{fixed}}} \qquad (3) \]
However, in GCMs (and the real climate system) the climate feedback parameter may not in fact be fixed, in which case a linear projection to \(\Delta N = 0\) will not provide an accurate estimate of ECS.
Rather than seeking to estimate ECS, S20 instead estimate an effective climate sensitivity S that corresponds to the effective sensitivity in GCMs derived from feedbacks occurring during the first 150 simulation years after an abrupt quadrupling of CO_{2} concentration from its preindustrial level (abrupt-4xCO2), treating climate feedbacks as being fixed.
For GCMs, S is normally derived by linearly regressing, usually using annual average values, changes (ΔN) in TOA radiative imbalance on changes (ΔT) in GMAT over abrupt-4xCO2 simulations, with changes being measured relative to values during an unforced preindustrial control simulation by the GCM. The ΔT and ΔN values are rescaled to reflect the ratio of F_{2×CO2} to the ERF from a quadrupling of CO_{2}, F_{4×CO2}. The slope of the regression line, λ, is a measure of the effective climate feedback parameter operating over the regression period. The regression line is continued forwards to \(\Delta N = 0\), indicating radiative equilibrium, with S being defined as the ΔT value at that point, and backwards to \(\Delta T = 0\), the rescaled ΔN value at that point providing an estimate, \(F_{{2 \times {\text{CO2}}}}^{{{\text{regress}}}}\), of F_{2×CO2}. Hence:
\[ S = -F_{{2 \times {\text{CO2}}}}^{{{\text{regress}}}} / \lambda \qquad (4) \]
If the climate feedback parameter is not fixed, in general λ differs from \(-F_{{2 \times {\text{CO2}}}} /{\text{ECS}}\), S differs from ECS, and \(F_{{2 \times {\text{CO2}}}}^{{{\text{regress}}}}\) differs from F_{2×CO2}.
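The regression procedure described above can be sketched numerically. The following Python fragment is a minimal illustration under stated assumptions, not the code used in S20 or in this study: the function name `effective_sensitivity`, the default `forcing_ratio` of 2, and the synthetic abrupt-4xCO2 series (a fixed feedback of −1.2 W m^-2 K^-1 and a quadrupling forcing of 8 W m^-2) are all hypothetical.

```python
import numpy as np


def effective_sensitivity(delta_t, delta_n, forcing_ratio=2.0):
    """Regress rescaled dN on rescaled dT; return (S, lambda, F_regress).

    delta_t, delta_n: annual-mean anomalies relative to the control run.
    forcing_ratio: assumed F_4xCO2 / F_2xCO2 used to rescale both series.
    """
    t = np.asarray(delta_t, dtype=float) / forcing_ratio
    n = np.asarray(delta_n, dtype=float) / forcing_ratio
    lam, intercept = np.polyfit(t, n, 1)  # slope and value of dN at dT = 0
    f_regress = intercept                 # regression estimate of F_2xCO2
    s = -f_regress / lam                  # dT at which the fitted line reaches dN = 0
    return s, lam, f_regress


# Synthetic 150-year series with a fixed feedback (hypothetical values).
rng = np.random.default_rng(0)
years = np.arange(1, 151)
true_lam, true_f4x = -1.2, 8.0
dt = (-true_f4x / true_lam) * (1.0 - np.exp(-years / 30.0))
dn = true_f4x + true_lam * dt + rng.normal(0.0, 0.3, years.size)

s, lam, f_regress = effective_sensitivity(dt, dn)
print(f"S = {s:.2f} K, lambda = {lam:.2f} W m^-2 K^-1, F_regress = {f_regress:.2f} W m^-2")
```

Because both series are divided by the same factor, the fitted slope λ does not depend on `forcing_ratio`; only S and the forcing intercept scale with it.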
In the vast majority of GCMs, the local slope of the relationship between ΔN and ΔT weakens over the course of 150-year abrupt-4xCO2 simulations, strongly suggesting that the model ECS exceeds S. Since feedbacks activated only on a long timescale affect the climate extremely slowly, S is more relevant than ECS (or ESS) to climate change over the next few centuries. However, deriving S from paleoclimate evidence, which reflects equilibrium changes, requires an estimate of the ECS to S ratio, with its excess over one being defined as \(\zeta = {\text{ECS}}/S - 1\).
To obtain a valid estimate for climate sensitivity to doubled CO_{2} concentration from data involving a different change in CO_{2} concentration, it is necessary to scale the temperature change involved by the ratio of F_{2×CO2} to the ERF associated with the particular change in CO_{2} concentration, even assuming that climate sensitivity is unaffected by the effect of the difference in CO_{2} concentration change on the climate state. S20 define their S in GCMs as the linearly regressed warming over years 1–150 after a quadrupling of CO_{2} concentration, extrapolated to zero ΔN and then divided by two. This scaling factor, while popular, is difficult to justify when the actual ratio of the ERF change involved (F_{4×CO2}) to F_{2×CO2} has been estimated with reasonable precision to be 2.10, so that F_{4×CO2} is 5% greater than twice F_{2×CO2} (Byrne and Goldblatt 2014; Etminan et al. 2016; Meinshausen et al. 2020).
S20 defend their division of abrupt-4xCO2 temperature changes by 2 (rather than 2.10) on the basis that it brings S estimated on their basis closer to estimated ECS in models with very long abrupt-2xCO2 simulations, which they estimate as 6% higher than S derived by halving temperature changes in those models' abrupt-4xCO2 simulations, implying \(\zeta = 0.06\). However, that argument conflicts with their valid desire for a measure that is as closely related as possible to scenarios of practical relevance. Moreover, the increase in S that S20 introduce by basing it on a biased scaling of F_{4×CO2} to F_{2×CO2} results in inconsistent estimation of S between their three lines of evidence, a serious flaw. The biased scaling only affects (via the resulting ζ estimate) their estimation of S from Paleoclimate evidence, since its estimation from both Process and Historical evidence is based on directly estimated F_{2×CO2}, and is independent of scaled F_{4×CO2}.
Here, S20's one-half scaling factor and the resulting 0.06 central ζ estimate for \({\text{ECS}}/S - 1\) are retained when investigating the effect of the Objective Bayesian statistical method on their results. However, in Sect. 4 the ζ estimate is revised to 0.135, the mean ζ estimate in both abrupt-2xCO2 and abrupt-4xCO2 long-run simulations (16 in all) by eleven GCMs (Rugenstein et al. 2020). No scaling from F_{4×CO2} to F_{2×CO2} is required for these calculations, since temperature changes are only compared within abrupt-4xCO2 and abrupt-2xCO2 simulations, not between them.
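The size of the two adjustments discussed in this section can be checked with simple arithmetic. The Python fragment below is an illustrative sketch only: the function names and the example warming and ECS values (6.0 K and 3.0 K) are hypothetical, while the factors 2, 2.10, 0.06 and 0.135 are those quoted above.

```python
def rescale_4x_warming(warming_4x, forcing_ratio):
    """Scale extrapolated abrupt-4xCO2 warming to a CO2 doubling."""
    return warming_4x / forcing_ratio


def ecs_to_s(ecs, zeta):
    """Convert ECS to effective sensitivity S using zeta = ECS/S - 1."""
    return ecs / (1.0 + zeta)


warming_4x = 6.0  # K, hypothetical extrapolated warming after quadrupling
s_halved = rescale_4x_warming(warming_4x, 2.0)     # division by two
s_rescaled = rescale_4x_warming(warming_4x, 2.10)  # division by the ERF ratio
excess = s_halved / s_rescaled - 1.0               # fractional excess from halving

ecs_example = 3.0  # K, hypothetical equilibrium estimate
s_zeta_s20 = ecs_to_s(ecs_example, 0.06)
s_zeta_revised = ecs_to_s(ecs_example, 0.135)

print(f"halving: {s_halved:.2f} K, ERF-ratio scaling: {s_rescaled:.2f} K, excess {excess:.1%}")
print(f"S from ECS: {s_zeta_s20:.2f} K (zeta=0.06), {s_zeta_revised:.2f} K (zeta=0.135)")
```

In this toy case, halving rather than dividing by 2.10 raises the inferred value by 5%, and moving ζ from 0.06 to 0.135 lowers the S implied by a given ECS by roughly 7%.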
3 Methods
Scientific knowledge regarding the properties of any real-world system, or of a simplified conceptual model used to represent it, emanates from observing aspects of the system behavior. The results of such observations or assessments based thereon ('data-variables') are typically numerical and somewhat uncertain, and are regarded as subject to random errors. Conceptual models of the system usually relate the data-variables used as input assumptions to system properties of interest that are regarded as fixed but unknown ('parameters'), assumed here to be represented by continuously valued variables. A key role of statistical inference is then to draw valid conclusions from data-variables regarding such parameters, as regards their values and associated uncertainty.
It is essential for scientific inference that the statistical methods used are calibrated, in the sense that the uncertainty ranges they generate closely approximate confidence intervals. That is, over the long run (over many applications of the method to different data sets) the true parameter value will lie below a properly derived (x%–y%) confidence interval in about x% of cases, and above it in about (100 − y)% of cases.
The data likelihood, a joint function of data and parameter values, plays a central role in statistical parameter inference. It represents the joint probability density of the observed data as a function of the parameter value(s). Provided errors in the observed data are independent, their joint probability density is the product of that for each data-variable. Usually, only the ratio of the data likelihood to its highest value matters.
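The long-run meaning of calibration can be illustrated by simulation. The Python sketch below is a toy example under assumed values, not part of the analysis in this study: a single observation with a normally distributed error of known standard deviation is drawn repeatedly, a 5%–95% interval is formed each time, and the fractions of cases in which the (hypothetical) true value falls below or above the interval are counted.

```python
import numpy as np

rng = np.random.default_rng(1)

theta_true = 3.0      # hypothetical fixed true parameter value
sigma = 1.0           # assumed known observational error standard deviation
n_trials = 200_000
z95 = 1.6448536269514722  # 95th percentile of the standard normal distribution

# One noisy observation per trial, and the 5%-95% interval built from it.
y = rng.normal(theta_true, sigma, n_trials)
lower = y - z95 * sigma
upper = y + z95 * sigma

frac_below = np.mean(theta_true < lower)  # true value lies below the interval
frac_above = np.mean(theta_true > upper)  # true value lies above the interval

print(f"below: {frac_below:.3f}, above: {frac_above:.3f} (nominal 0.050 each)")
```

For a calibrated method both fractions should be close to 5%, whatever true value is chosen.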
An important property of likelihood functions is that, where two likelihood functions concerning the same system and parameters but derived from independent data exist, the information they jointly contain about the parameter is representable by their product (Birnbaum 1962; Pawitan 2001 Sects. 2.3 and 7.2). This property is used when combining the three different lines of evidence.
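As a minimal illustration of this product rule, the Python sketch below multiplies two likelihood functions on a common parameter grid. The two normal-error estimates (3.0 ± 1.0 and 2.0 ± 0.5) and the helper name `normal_likelihood` are hypothetical, chosen only so that the result can be checked against the precision-weighted mean.

```python
import numpy as np


def normal_likelihood(theta, y, sigma):
    """Likelihood of theta for an observation y with normal error sd sigma."""
    return np.exp(-0.5 * ((y - theta) / sigma) ** 2)


theta = np.linspace(0.0, 6.0, 6001)  # parameter grid, spacing 0.001

like_a = normal_likelihood(theta, 3.0, 1.0)  # hypothetical evidence A
like_b = normal_likelihood(theta, 2.0, 0.5)  # hypothetical, independent evidence B

combined = like_a * like_b
combined /= combined.max()  # only the ratio to the maximum matters

peak = theta[np.argmax(combined)]
print(f"combined likelihood peaks at theta = {peak:.3f}")
```

For normal errors the combined likelihood peaks at the precision-weighted mean of the two estimates, here (3.0/1.0² + 2.0/0.5²)/(1/1.0² + 1/0.5²) = 2.2.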
3.1 Bayesian parameter inference
There are two main statistical paradigms, Frequentist and Bayesian (Bernardo and Smith 1994). In both, parameter inference revolves around likelihood functions. However, Bayesians treat fixed but uncertain parameters as having distributions representing degrees of belief, in effect as if random variables with a probability distribution, while Frequentists do not, notwithstanding Frequentist confidence distributions (Schweder and Hjort 2002, 2016).
Both Stevens et al. (2016) and S20 employ Bayesian methods for combining climate sensitivity evidence; Frequentist methods appear less suitable for this task. Bayesian methods provide a means of coherently updating personal beliefs about an unknown parameter with external evidence as to its value. However, for continuously valued parameters they do not in general provide calibrated estimates that properly reflect such evidence. By contrast, Frequentist confidence measures are derived from randomness in the data values and are intrinsically calibrated.
In the continuous case, from Bayes' theorem (Bayes 1763) the posterior probability density function (PDF), \(p_{{\varvec{\theta}}} \left( {{\varvec{\theta}}|{\varvec{y}}} \right)\), for a parameter (vector) θ on which observed data y depend, is proportional to the data likelihood \(p_{{\varvec{y}}} \left( {{\varvec{y}}|{\varvec{\theta}}} \right)\) (the probability density of the data treated as a function of θ, for fixed y) multiplied by the density of a 'prior distribution' (prior) for θ, \(p_{{\varvec{\theta}}} \left( {\varvec{\theta}} \right)\):
\[ p_{{\varvec{\theta}}} \left( {{\varvec{\theta}}|{\varvec{y}}} \right) = c \, p_{{\varvec{y}}} \left( {{\varvec{y}}|{\varvec{\theta}}} \right) p_{{\varvec{\theta}}} \left( {\varvec{\theta}} \right) \qquad (5) \]
(the subscripts indicating the variable each density is for). The constant c is such that \(p_{{\varvec{\theta}}} \left( {{\varvec{\theta}}|{\varvec{y}}} \right)\) integrates to unit probability; it is the reciprocal of \(\int {p_{{\varvec{y}}} ({\varvec{y}}|{\varvec{\theta}})p_{{\varvec{\theta}}} ({\varvec{\theta}})d{\varvec{\theta}}}\). If the parameter being estimated were a random variable with actual probability distribution \(p_{{\varvec{\theta}}} \left( {\varvec{\theta}} \right)\) then (5) would follow from the conditional probability lemma. However, this is not the case here (Fraser and Reid 2011).
The Bayesian equivalent of a confidence interval, a credible interval, reflects probability implied by the posterior PDF. Whether a credible interval is calibrated or not will among other things depend on the choice of prior, which the investigator is free to select. S20 select a prior that is uniform in λ-space, and therefore proportional to \(F_{{{2} \times {\text{CO2}}}} /S^{{2}}\) in S-space. In the common 'Subjective Bayesian' view adopted by S20, the prior is a probability distribution representing the investigator's degrees of belief about parameter values before incorporating information from the current data. There is no requirement that the posterior PDF be calibrated, and the resulting credible intervals may be far from actual confidence intervals (Fraser 2011). However, to avoid Bayesian inference providing misleading results, it is necessary to use a prior that provides correct calibration of posterior probabilities to frequencies and hence confidence intervals (Fraser et al. 2010; Lewis and Grünwald 2018).
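A grid-based application of Bayes' theorem with a prior that is uniform in λ, expressed in S-space, can be sketched as follows. This Python fragment is an illustration under assumed inputs, not a reproduction of S20's calculation: the forcing value `F2X`, the feedback estimate `LAM_OBS` and its standard deviation `LAM_SD` are hypothetical, and F_{2×CO2} is treated as exactly known for simplicity.

```python
import numpy as np

F2X = 4.0        # W m^-2, assumed and treated as exactly known in this sketch
LAM_OBS = -1.3   # W m^-2 K^-1, hypothetical feedback estimate
LAM_SD = 0.3     # W m^-2 K^-1, hypothetical normal error standard deviation
DS = 0.01        # K, grid spacing

s = np.arange(DS, 20.0 + 1e-9, DS)  # S grid from 0.01 K to 20 K
lam = -F2X / s                      # feedback implied by each S value

likelihood = np.exp(-0.5 * ((LAM_OBS - lam) / LAM_SD) ** 2)
prior = F2X / s**2                  # uniform-in-lambda prior, expressed in S-space

posterior = likelihood * prior
posterior /= posterior.sum() * DS   # normalise to unit total probability

cdf = np.cumsum(posterior) * DS
median_s = s[np.searchsorted(cdf, 0.5)]
print(f"posterior median S = {median_s:.2f} K")
```

Since this prior makes the posterior for λ normal about `LAM_OBS`, the posterior median for S should lie close to −F2X/LAM_OBS, about 3.08 K, which provides a check on the change of variables.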
In the alternative 'Objective Bayesian' view, in the absence of existing evidence regarding parameter values the prior should consist of a mathematical weighting function intended to have minimal influence on inference relative to the data, so that it is as noninformative as possible.
Under this approach, any existing probabilistic evidence concerning the parameter being estimated may appropriately be represented by a likelihood function for a notional observation, from which a posterior density has been calculated using a noninformative prior (Hartigan 1965), rather than by a posterior density.
Noninformative priors are mathematical priors that are generally intended to result in probability-matching posterior PDFs, which produce credible intervals that are (at least approximately) true confidence intervals, and they are often judged on that basis (Berger and Bernardo 1992, p. 36; Kass and Wasserman 1996). Typically, a noninformative prior primarily reflects how the expected informativeness of the data about the parameter value(s) varies with the parameter value(s). How informative data are expected to be about the parameter value(s) is represented by the (expected) Fisher information,^{Footnote 2} which has a key role in likelihood-based inference.
The gold-standard method for producing noninformative priors is reference analysis (Bernardo 1979; Berger and Bernardo 1992), which results in Bayesian inference that is objective in the sense that it only depends on observed data and model assumptions, as is the case for Frequentist inference (Bernardo 2011). In Objective Bayesianism as well as Subjective Bayesianism, however, subjective choices will still be made by the investigator in relation to the data and model used.
Under Subjective Bayesianism, independent new evidence about a parameter is incorporated by updating an existing posterior PDF (treated as the prior) by multiplying it by the data likelihood function for the new evidence and renormalizing to unit total probability, as S20 does. Such updating would be valid mathematically were the parameter a random variable, but it is not. Bayesian updating satisfies Subjective Bayesian axioms (Bernardo and Smith 1994) and results in coherent personal beliefs, but that does not imply that the resulting inference will exhibit satisfactory probability matching even if the existing posterior PDF was well calibrated.
Bayesian updating is in any event unsupportable for Objective Bayesianism, since the noninformative prior for the original likelihood will in general differ from that for the new likelihood. Bayesian updating would therefore produce inference that varied with the order in which different evidence was incorporated (Kass and Wasserman 1996). It can also result in quite poor frequentist calibration (Lewis 2018).
The order-dependence problem does not arise if Bayes' theorem is applied once only, to the joint likelihood function for the combined evidence, with a single noninformative prior being computed that reflects the nature of the combined evidence (Lewis 2013a, b). This is the method employed in the present study. As shown in Lewis (2018) and Lewis and Grünwald (2018), using such a single-step method when combining climate sensitivity evidence results in more realistic inference than using Bayesian updating, even when a noninformative prior is used to incorporate the first line of evidence, although the magnitude of the improvement will vary.
Where a univariate parameter, such as climate sensitivity, is the only parameter being estimated, a Jeffreys' prior (Jeffreys 1946), which in that case is normally also the reference prior, gives credible intervals that match confidence intervals more closely than any other prior (Welch and Peers 1963; Hartigan 1965), and is therefore the most appropriate prior to use for weighting the combined evidence-providing data likelihood functions. A Jeffreys' prior is proportional (with arbitrary scaling) to the square root of the Fisher information (or, for a multivariate parameter, of its determinant).
Fisher information for different likelihood functions combines additively, provided the likelihood functions are derived from independent data (Pawitan 2001, Sect. 8.4). Therefore, Jeffreys' prior for inference from the combined likelihood function is obtainable by adding in quadrature the Jeffreys' priors for the separate likelihood functions, after scaling each to equal the square root of the Fisher information. The probability matching of posterior PDFs derived by this method has been tested and found to be accurate in cases involving various probability distributions (Lewis 2013a, b), including in the context of combining evidence regarding climate sensitivity (Lewis 2018).
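The addition in quadrature can be sketched for two simple, hypothetical lines of evidence. In the Python fragment below, evidence A is a normal-error estimate of λ = −F_{2×CO2}/S and evidence B is a normal-error estimate of S itself; for an observation with normal error of standard deviation σ about a function g(S), the Fisher information for S is (dg/dS)²/σ². The forcing value, both standard deviations and all function names are assumptions made for illustration and are not taken from this study.

```python
import numpy as np

F2X = 4.0      # W m^-2, assumed and treated as exactly known
LAM_SD = 0.3   # W m^-2 K^-1, hypothetical error sd of evidence A (estimates lambda)
S_SD = 1.0     # K, hypothetical error sd of evidence B (estimates S directly)

s = np.arange(0.5, 20.0 + 1e-9, 0.01)  # S grid


def fisher_info_lambda_evidence(s_values):
    """Fisher information for S from a normal estimate of lambda = -F2X/S."""
    dlam_ds = F2X / s_values**2
    return (dlam_ds / LAM_SD) ** 2


def fisher_info_direct_evidence(s_values):
    """Fisher information for S from a normal estimate of S itself."""
    return np.full_like(s_values, 1.0 / S_SD**2)


info_a = fisher_info_lambda_evidence(s)
info_b = fisher_info_direct_evidence(s)

jeffreys_a = np.sqrt(info_a)                 # proportional to F2X / S^2
jeffreys_b = np.sqrt(info_b)                 # uniform in S
jeffreys_combined = np.sqrt(info_a + info_b)  # separate priors added in quadrature

print(jeffreys_combined[0] / jeffreys_a[0], jeffreys_combined[-1] / jeffreys_b[-1])
```

In this toy case the combined prior follows the 1/S² shape at low S, where evidence A is the more informative, and flattens towards a uniform prior at high S, where evidence B dominates.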
3.2 Statistical models
I use the same statistical models as S20. These derive from simple forcing-feedback physical models, for the various lines of evidence, as follows. Terms on the right-hand sides of (6) to (13) each represent the 'true' value of each variable, the best observational estimate of which is taken to include an additive error (ε) term, which has been omitted for clarity. These variables are termed "data-variables", as generally their estimated values are ultimately derived from observational data. Their estimated error characteristics are inputs to the statistical models.
For Process evidence, total climate feedback λ is taken as the sum of component feedbacks:
\[ \lambda = \lambda_{{\text{Planck}}} + \lambda_{{\text{WV+LR}}} + \lambda_{{\text{sfcAlbedo}}} + \lambda_{{\text{clouds}}} + \lambda_{{\text{stratospheric}}} + \lambda_{{\text{atmosComp}}} \qquad (6) \]
where λ_{Planck} is feedback from extra emission to space from vertically uniform warming, the anticorrelated water vapor (WV) and lapse rate (LR) feedbacks are combined into λ_{WV+LR}, λ_{sfcAlbedo} is surface albedo feedback, λ_{clouds} is cloud feedback, λ_{stratospheric} is feedback from changes in stratospheric water vapor and temperature, and λ_{atmosComp} is feedback from changes in atmospheric composition. The error/uncertainty term (ε_{λ}) for total feedback λ represents the sum of independent error terms for its components, with λ_{clouds} being likewise the sum of components, each subject to independent errors. Since errors are independent and are assumed to be normally distributed, ε_{λ} is also normally distributed, with variance equal to the sum of the variances of its components.
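The combination of independent, normally distributed component errors can be sketched as follows. The component means and standard deviations in this Python fragment are illustrative assumptions chosen for the sketch; they are not quoted from this text and should not be read as the input values used in S20 or in this study.

```python
import math

# Hypothetical component feedbacks: (mean, standard deviation), W m^-2 K^-1.
components = {
    "Planck": (-3.2, 0.10),
    "WV+LR": (1.15, 0.15),
    "sfcAlbedo": (0.30, 0.15),
    "clouds": (0.45, 0.33),
    "stratospheric": (0.0, 0.10),
    "atmosComp": (0.0, 0.15),
}

# Means add; for independent errors the variances (not the sds) add.
total_mean = sum(mean for mean, _ in components.values())
total_variance = sum(sd**2 for _, sd in components.values())
total_sd = math.sqrt(total_variance)

print(f"lambda = {total_mean:.2f} +/- {total_sd:.2f} W m^-2 K^-1")
```

With these assumed inputs the total standard deviation is dominated by the cloud term, whose variance accounts for more than half of the summed variance.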
Provided that the estimates of the climate feedback components are of the values they take over 150-year abrupt-4xCO2 GCM simulations, which they all are, the resulting climate feedback estimate will be on a basis consistent with its derivation over such simulations, and hence be of λ. As a result, S can be derived from this feedback estimate, using (4), without any adjustment being made to it. S20 do so by dividing the feedback estimate into (unadjusted) F_{2×CO2}, but (4) requires use of \(F_{{2 \times {\text{CO2}}}}^{{{\text{regress}}}}\). F_{2×CO2} should accordingly be multiplied by a scaling factor, γ, to convert it to \(F_{{2 \times {\text{CO2}}}}^{{{\text{regress}}}}\) (Sect. 4.1), so that:
\[ S = -\gamma F_{{2 \times {\text{CO2}}}} / \lambda \qquad (7) \]
For Historical evidence, differences in sea surface temperature (SST) change patterns may cause feedback estimated using (1), denoted \(\lambda_{{{\text{hist}}}} = - [\Delta F_{{{\text{Hist}}}} - \Delta N_{{{\text{Hist}}}} ]/\Delta T_{{{\text{Hist}}}}\), to differ from that over 150-year GCM abrupt-4xCO2 simulations, λ. An estimate of the effect, denoted \(\Delta \lambda = \lambda - \lambda_{{{\text{hist}}}}\), of such differences in SST change patterns (the historical pattern effect) is allowed for when computing S, but not when calculating an alternative measure, S_{hist}. Since the appropriate λ is thereby being used to estimate S, F_{2×CO2} should be scaled by γ, as for Process evidence. Using (7):
\[ S = \frac{\gamma F_{{2 \times {\text{CO2}}}}}{(\Delta F_{{\text{Hist}}} - \Delta N_{{\text{Hist}}})/\Delta T_{{\text{Hist}}} - \Delta \lambda} \qquad (8) \]
whereas
ΔF_{Hist} is the sum of forcing component ERFs, which (apart from F_{2×CO2}) have uncertainties that are independent of each other and of the likewise independent uncertainties in the other right-hand side terms in (8) and (9):
where \(\Delta F_{{\text{Hist}}}^{{\text{othGHG}}}\) includes well-mixed non-CO_{2} greenhouse gases, \(\Delta F_{{\text{Hist}}}^{{\text{O3}}}\) includes tropospheric and stratospheric ozone, \(\Delta F_{{\text{Hist}}}^{{\text{vapor}}}\) represents stratospheric water vapor and \(\Delta F_{{\text{Hist}}}^{{\text{BCsnow}}}\) represents black carbon on snow and ice.
When S is inferred from Process or Historical evidence using S20's assumptions, γ is set to one, as no adjustment to F_{2×CO2} was made in S20.
For Paleoclimate evidence, there are separate models for changes related to the Last Glacial Maximum (LGM), mid-Pliocene Warm Period (mPWP) and Paleocene–Eocene Thermal Maximum (PETM), all of which involve a division by (1 + ζ) to convert an estimate for ECS into one for S.
3.2.1 For the LGM
where \(\Delta F_{{{\text{LGM}}}}^{{{\text{CO2}{{/2 \times}}}}}\) represents the LGM − preindustrial CO_{2} forcing change as a fraction of that from CO_{2} doubling; \(\Delta F_{{{\text{LGM}}}}^{{{\text{exCO2}}}}\) represents the corresponding change in non-CO_{2} forcing; and α is a coefficient for state dependence in climate feedback.
3.2.2 For the mPWP
where \((1 + f_{{{\text{mPWP}}}}^{{{\text{ESS}}}} )\) represents the ratio of ESS to ECS for the mid-Pliocene, \(f_{{{\text{mPWP}}}}^{{{\text{CH4}}}}\) the forcing from methane relative to that from CO_{2}, and ΔCO2_{mPWP} the fractional increase in mid-Pliocene CO_{2} concentration over an assumed 284 ppm equivalent preindustrial well-mixed greenhouse gases state. A logarithmic CO_{2} forcing–concentration relationship holds over this range.
3.2.3 For the PETM
where β_{PETM} allows for possible state dependence of climate feedback and any slow feedbacks affecting ESS but not ECS, and \(f_{{{\text{PETM}}}}^{{{\text{CO2nonLog}}}}\) scales CO_{2} forcing from a logarithmic relationship with concentration, to correct for deviations therefrom at high concentration. S20 omit \(f_{{{\text{PETM}}}}^{{{\text{CO2nonLog}}}}\), so it is set to zero when S is inferred using S20's assumptions. Uncertainty in \(f_{{{\text{PETM}}}}^{{{\text{CO2nonLog}}}}\) is minute relative to that in ΔCO2_{PETM} and is therefore ignored.
Significant correlation is assumed to exist between errors in ΔT_{PETM} and ΔT_{mPWP} and between errors in ΔCO2_{PETM} and ΔCO2_{mPWP}, with errors in all other variables being independent.
The foregoing equations are rearrangements of S20 Eqs. (4), (18), (20), (21), (22), (23) and (24), with the additional γ and \(\left( {{1} + f_{{{\text{PETM}}}}^{{{\text{CO2nonLog}}}} } \right)\) terms.
S20 also considered, but neither created a statistical model for nor used in their main results, 'emergent constraints' on climate sensitivity. These depend on relationships between selected observationally constrainable variables and climate sensitivity in GCMs. In almost all cases, strong relationships found in one generation of GCMs have been statistically insignificant and/or substantially different in another GCM generation (Caldwell et al. 2014, 2018; Schlund et al. 2020), casting substantial doubt on their reliability. Biases common to all or most models are another concern.
3.3 Likelihood estimation for S
Likelihood estimation for a parameter is not straightforward when its value depends on multiple data-variables, even apart from the question of how to combine different lines of evidence. For each line of evidence separately, each set of data-variable value realizations corresponds to a unique value of the parameter and data-variable errors are independent, so the joint probability density for any set of data-variable values can be derived as the product of their PDFs at the values concerned, and assigned to the parameter value that they imply.
However, each value of the parameter, S, will correspond to an infinite set of combinations of differing data-variable values. Accordingly, producing a single likelihood corresponding to each S value requires some method of weighting probability densities for the different data-variable value combinations.
S20's likelihood estimation method involves sampling S uniformly and F_{2×CO2} pro rata to its PDF, the sample ratios providing λ samples, and for each line of evidence except Process (where the likelihood is analytically calculable) also sampling pro rata to its PDF each remaining data-variable involved other than ΔT. They take the likelihood of each resulting multivariate sample set as equal to the PDF of ΔT at the value implied by the sample set's λ and data-variable values. They bin the multivariate sample sets by their S values, and compute the S likelihood for each bin as the average of the likelihoods of the sample sets it contains.
While S20's likelihood estimation method may satisfactorily estimate the actual likelihood for S in simple cases, it is not clear why it would provide a realistic estimate of the likelihood where, for example, a data-variable has substantial asymmetrical uncertainty or S is related to it nonlinearly. Such circumstances arise with Historical evidence, due to the asymmetrical PDF for aerosol ERF, and with PETM (and to a lesser extent mPWP) evidence, due to the logarithmic relationship between CO_{2} concentration ratio and ERF. Investigation confirms that S20's method of likelihood estimation is indeed unsound (supplemental material S2). Their method causes substantial misestimation of the Historical and PETM likelihoods, and non-negligible misestimation of the mPWP likelihood. The PETM misestimation is worsened by a coding error that made the ΔCO2_{PETM} standard deviation used one tenth of its correct value (supplemental material S2). Therefore, I do not use S20's likelihood estimation method in this study.
Rather than relying on a single likelihood estimation method, I employ three alternative methods, with the resulting likelihoods cross-checked. Each method involves setting up S value bins on a fine (0.01 K) grid spanning 0−20 K.
The first and third likelihood estimation methods involve first randomly sampling all the data-variables involved in estimating S from the line of evidence concerned that are not treated as fixed, weighting the sampling pro rata to their PDFs. The S value that each resulting multivariate sample set's data-variable values implies is computed, and each sample set is allocated to the appropriate S bin. The number of sample sets in each bin then provides an estimated (posterior) PDF for S. Between 10^{7} and 10^{8} sample sets are drawn, depending on the case. Since this procedure requires a unique S value for each sample set, these likelihood estimation methods can only be used for single lines of evidence (for a single period in the case of paleoclimate evidence). This sampling method of deriving a posterior PDF for S has been widely used (Gregory et al. 2002; Otto et al. 2013; Lewis and Curry 2015, 2018). The method is prior-free, in the sense that no explicit prior selection is required; however, it is equivalent to Bayesian estimation using a noninformative prior.
The first likelihood estimation method effectively involves estimating, at each S-bin value, the probability-weighted likelihood integrated over data-variable space; it gives the highest weight to those combinations of data-variable values most likely to arise. This 'integrated likelihood' method is implemented by taking the sample sets generated and used to derive a PDF for S as set out in the preceding paragraph, and computing the likelihood for each sample set (as the product of the data-variable PDFs at their sampled values). The likelihood for the S value at each bin center is then derived as the simple average of the likelihoods of all sample sets in the bin.
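The sampling and binning steps just described can be sketched in a few lines. The following is a minimal illustration only, not the code used in this study. It assumes a two-variable Process-like case in which S = −F_{2×CO2}/λ, with λ and F_{2×CO2} both Normal and taking the S20 values quoted in Sect. 4 (−1.30 ± 0.44 and 4.0 ± 0.3). The function names, the sample count (smaller than the 10^{7} to 10^{8} stated above) and the random seed are assumptions made for the example.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd) distribution evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def integrated_likelihood(n_samples=1_000_000, seed=0, width=0.01, s_max=20.0):
    """Sampling-derived PDF and integrated likelihood for S = -F_2xCO2 / lambda."""
    rng = np.random.default_rng(seed)
    lam_mean, lam_sd = -1.30, 0.44   # feedback, W m^-2 K^-1 (illustrative inputs)
    f_mean, f_sd = 4.0, 0.3          # F_2xCO2, W m^-2 (illustrative inputs)

    # Sample each data-variable pro rata to its PDF and compute the implied S.
    lam = rng.normal(lam_mean, lam_sd, n_samples)
    f2x = rng.normal(f_mean, f_sd, n_samples)
    s = -f2x / lam

    # Likelihood of each sample set: product of the PDFs at the sampled values.
    lik = normal_pdf(lam, lam_mean, lam_sd) * normal_pdf(f2x, f_mean, f_sd)

    # Allocate sample sets to S bins on the 0-20 K grid; others are left out.
    n_bins = int(round(s_max / width))
    keep = (s >= 0.0) & (s < s_max)
    idx = np.minimum((s[keep] / width).astype(int), n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    lik_sum = np.bincount(idx, weights=lik[keep], minlength=n_bins)

    centers = (np.arange(n_bins) + 0.5) * width
    pdf = counts / (n_samples * width)            # bin counts -> PDF for S
    with np.errstate(invalid="ignore", divide="ignore"):
        lik_s = np.where(counts > 0, lik_sum / counts, np.nan)  # bin average
    return centers, pdf, lik_s


centers, pdf_s, lik_s = integrated_likelihood()
```

Bins that receive no samples are returned as NaN rather than zero, so that sparse regions are not mistaken for regions of zero likelihood. The PDF is deliberately not renormalized here, so its integral equals the fraction of samples falling within 0–20 K.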
The second method uses the profile likelihood (Pawitan 2001), a widely used measure that typically provides a close approximation to likelihood derived using more sophisticated methods. This method applies the entire weight to that combination of data-variable values which, at the S value concerned, maximizes the likelihood. The profile likelihood is derived by using an optimization algorithm to find, for each fine-grid value of S, the data values combination that maximizes the product of the data-variable PDFs for the line or lines of evidence concerned, allowing for any uncertainties that are common or correlated between data-variables.
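A profile likelihood can be illustrated on the same kind of two-variable toy case. The sketch below is an assumption-laden stand-in: it replaces the optimization algorithm with a brute-force search over a grid of F_{2×CO2} values, takes λ and F_{2×CO2} as independent Normal variables with the S20 Process values quoted in Sect. 4, and uses hypothetical function and variable names.

```python
import numpy as np


def profile_likelihood(s_grid, lam_mean=-1.30, lam_sd=0.44, f_mean=4.0, f_sd=0.3):
    """Profile likelihood for S = -F/lambda, maximizing over F at each S value."""
    # Grid of candidate F values; lambda is then fixed by the constraint on S.
    f_grid = np.linspace(f_mean - 5.0 * f_sd, f_mean + 5.0 * f_sd, 3001)
    s = s_grid[:, None]
    f = f_grid[None, :]
    lam = -f / s

    # Log of the product of the two Normal PDFs, dropping constant factors.
    log_lik = (-0.5 * ((lam - lam_mean) / lam_sd) ** 2
               - 0.5 * ((f - f_mean) / f_sd) ** 2)

    # Keep only the best-fitting combination at each S value.
    prof = np.exp(log_lik.max(axis=1))
    return prof / prof.max()


s_grid = np.arange(0.5, 10.0 + 1e-9, 0.01)
prof = profile_likelihood(s_grid)
s_hat = s_grid[np.argmax(prof)]   # expected to lie close to 4.0 / 1.30
```

A grid search is adequate with one nuisance variable, but it scales poorly with the number of data-variables, which is presumably why an optimizer is needed for the real lines of evidence.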
Finally, a likelihood is calculated using the data-doubling method (Efron 1993; Lewis 2018). This involves the supposition that the evidence involved represented an observational data set, and that an identical but independent data set had also been observed. A posterior PDF for S based on emulating the stronger evidence provided by such doubled data is computed. This can often be effected by halving the actual variance of each data-variable and sampling pro rata to the reduced-variance data-variable PDFs. An implied likelihood is then computed by dividing that PDF by the posterior PDF corresponding to the actual data set, derived by sampling pro rata to the actual data-variable PDFs. The validity of this 'doubled data likelihood' method follows directly from (5) if a noninformative prior is involved, since the same prior is noninformative for repeated observations from the same experiment, and the doubled data likelihood will equal the product of the original data likelihood with itself. Writing \(L(S)\) for the likelihood and \(\pi (S)\) for the noninformative prior, the doubled data posterior PDF is then proportional to \(L(S)^{2} \pi (S)\) and the actual data posterior PDF to \(L(S)\pi (S)\), so their ratio is proportional to \(L(S)\).
However, this method will not work satisfactorily where evidence is represented by a distribution for which emulating a doubled data version is problematic. That is the case for S20's Historical evidence, where the aerosol ERF distribution is highly asymmetrical and has no analytical form.
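The data-doubling idea can be checked on a case simple enough to need no sampling. The sketch below assumes a single Normal data-variable λ (−1.30 ± 0.44, as quoted for S20's Process evidence in Sect. 4) and a fixed F_{2×CO2} of 4.0 Wm^{−2}, so that the PDF for S that sampling would produce can be written down by a change of variables. All names are hypothetical, and treating F_{2×CO2} as fixed is a simplification made for the example.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd) distribution evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def s_space_pdf(s, lam_mean, lam_sd, f2x):
    """PDF of S = -f2x / lambda when lambda is Normal and f2x is fixed."""
    lam = -f2x / s
    return normal_pdf(lam, lam_mean, lam_sd) * f2x / s ** 2  # |d lambda / dS|


s_grid = np.arange(0.5, 20.0, 0.01)
lam_mean, lam_sd, f2x = -1.30, 0.44, 4.0

# PDF for the actual data, and for emulated doubled data (variance halved).
pdf_actual = s_space_pdf(s_grid, lam_mean, lam_sd, f2x)
pdf_doubled = s_space_pdf(s_grid, lam_mean, lam_sd / np.sqrt(2.0), f2x)

# Implied likelihood: ratio of the two PDFs, scaled to a maximum of one.
implied_lik = pdf_doubled / pdf_actual
implied_lik /= implied_lik.max()

# Directly computed likelihood for comparison, scaled the same way.
direct_lik = normal_pdf(-f2x / s_grid, lam_mean, lam_sd)
direct_lik /= direct_lik.max()
```

In this toy case the two curves agree to rounding error, because halving the variance of a Normal squares its likelihood while leaving the change-of-variables factor untouched. The aerosol ERF distribution has no such analytical form, which is consistent with the difficulty noted for Historical evidence.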
Before each derived PDF and likelihood is used it is generally smoothed with a spline-based method, retaining sufficient degrees of freedom to very closely match the shape of the unsmoothed original.
3.4 Noninformative prior and posterior PDF estimation
When either an integrated or doubled-data likelihood is used, a prior for S is derived by dividing the associated sampling-derived PDF by the estimated likelihood. This is an exact probability-matching prior by construction, and, since a Jeffreys' prior provides the closest probability matching, the derived prior is necessarily a noninformative Jeffreys' prior (provided the likelihood is valid).
The profile likelihood method only produces a likelihood, so it is necessary to separately derive a Jeffreys' prior, \(\pi_{{{JP}}} (S)\), to use therewith. The 'data-space movement' method used is based on a direct measure of the local informativeness of the data about the parameter. Details of this method, and of the calibration of all the Jeffreys' priors, are given in the supplemental material (S3).
A PDF for S can then be derived as the product of the profile likelihood and the related data-space movement prior. This PDF cannot account for probability outside the range of S values used, so it is normalized to unit probability over that range. References to profile likelihood method PDFs are to such PDFs derived directly, or after combining likelihoods and priors from different lines of evidence.
3.5 Applying the Objective Bayesian statistical methods to combining S20's evidence
The statistical models employed in S20 to link S to data-based evidence necessitate a more general approach to combining lines of evidence using a single combination-based noninformative prior than was employed in Lewis (2018) and Lewis and Grunwald (2018). The mechanics involved are detailed in the supplemental material (S4).
A key motivation for using several different likelihood and noninformative prior estimation methods is that comparison of their performance when combining lines of evidence can provide confidence that both those methods, and the combination methods used, are valid. For S20's Historical evidence, to which the doubled data method cannot successfully be applied, likelihoods from the integrated likelihood and profile likelihood methods are almost identical. However, the profile likelihood data-space movement prior from Historical evidence is poor, due to imperfect optimization and difficulty representing the informativeness of the aerosol forcing distribution used by S20, and profile likelihood based estimates of S from all lines of evidence combined cannot reliably be derived by a single optimization. I therefore examine the combined-evidence likelihoods, priors and posterior PDFs for S that the three methods produce when combining S20's Process evidence with LGM and mPWP Paleoclimate evidence. For the two sampling-based methods, doing so involves combining separate estimates from Process, LGM and mPWP evidence. For the profile likelihood method, these are one-step estimates from simultaneous inference using data from all three individual lines of evidence. For all methods, the combined-evidence posterior PDF for S is normalized to unit probability over 0–20 K.
Figure 1a compares the combined Process, LGM and mPWP evidence likelihood estimates from the aforementioned three methods. They are indistinguishable. Figure 1b compares the three related Jeffreys' priors from \(S = 1.5{\text{ K}}\) up. Below that level the integrated likelihood and doubled data sampling-derived priors are artefacted, due to the paucity of samples for Process evidence, and behave erratically. However, the likelihood is almost zero below \(S = 1.5{\text{ K}}\), and the total probability in that region is only 0.05%, so the effect on inference for S is negligible. The resulting posterior PDFs for S using the three methods (Fig. 1c) are indistinguishable. Their medians are all within 0.01 K of each other, their 5th percentiles are all 2.07 K and their 95th percentiles are all within ±0.03 K.
The very close agreement among combined-evidence inference for S using three different methods of deriving and combining likelihoods, noninformative Jeffreys' priors and PDFs provides strong support for the general validity of all the methods for this application.
The main results presented in Sects. 5 and 6 use the robust sampling-based integrated likelihood method, which in all cases produces a satisfactory likelihood and prior, for all lines of evidence.
4 Review and revision of S20 data-variable assumptions
I now make some revisions to S20's data-variable assumptions for various lines of evidence, which are justified on the basis of more recent evidence, by a preferable alternative interpretation of the same evidence, or because they remedy an error or omission. The scaling factor γ for F_{2×CO2} is included in these revisions. The original and revised estimates for all data-variables are set out in Tables 1, 2 and 3, with the reasons for changes. The evidence justifying each revision is reviewed in detail in the supplemental material (S5); evidence relating to a number of the unrevised data-variable estimates is also reviewed there. Results from applying the Objective Bayesian approach to inference using S20's assumptions and the revised assumptions are given in Sects. 5 and 6 respectively.
The units of all stated feedback values are Wm^{−2} K^{−1}. Uncertainties indicated by ± represent one standard deviation, with a Normal distribution, denoted N(mean, standard deviation), assumed.
4.1 F_{2×CO2} and its scaling when using Eq. (4)
S20 use the estimate of stratospherically-adjusted forcing from doubled CO_{2} of 3.80 Wm^{−2} per the simplified formula in Etminan et al. (2016), and add 5% for tropospheric adjustments, arriving at an ERF estimate for F_{2×CO2} = 4.0 ± 0.3 Wm^{−2}. Meinshausen et al. (2020) fitted Etminan et al.'s results more precisely, obtaining an F_{2×CO2} value 1.5% lower. Based on their more accurate formula, and using the same 5% tropospheric adjustment, F_{2×CO2} ERF was assessed at 3.93 ± 0.3 Wm^{−2} in AR6 (Forster et al. 2021: 7.3.2.1). The ratio of F_{4×CO2} to F_{2×CO2} per the Meinshausen et al. (2020) formula is 2.10, which is 5.0% higher than the factor of 2 implied by a log(concentration) relationship. I adopt these numbers in Sect. 6 when estimating S.
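The arithmetic behind these F_{2×CO2} values is easy to reproduce. The short check below uses only the figures quoted in the preceding paragraph; it assumes that the 1.5% reduction and the 5% tropospheric adjustment both act multiplicatively, and the variable names are hypothetical.

```python
import math

strat_adjusted = 3.80      # Wm^-2, stratospherically-adjusted forcing (Etminan et al. 2016)
tropo_adjustment = 1.05    # 5% added for tropospheric adjustments

# S20's ERF estimate, which they state as 4.0 Wm^-2.
f2x_s20 = strat_adjusted * tropo_adjustment

# Value 1.5% lower, with the same tropospheric adjustment (AR6: 3.93 Wm^-2).
f2x_ar6 = strat_adjusted * (1.0 - 0.015) * tropo_adjustment

# Under a purely logarithmic relationship the 4x/2x forcing ratio is exactly 2.
ratio_log = math.log(4.0) / math.log(2.0)
excess_over_log = 2.10 / ratio_log - 1.0   # fractional excess of the 2.10 ratio

print(round(f2x_s20, 2), round(f2x_ar6, 2), round(excess_over_log, 3))
```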
Care must be taken to use the appropriate F_{2×CO2} value when applying (4). S20 use their estimate of the actual ERF from a doubling of CO_{2} concentration. However, as stated in Sect. 3.2, when feedback is estimated using a linear model on a basis consistent with behavior during years 1–150 of abrupt4xCO2 simulations, normally by ordinary least squares regression, then F_{2×CO2} should be converted into an estimate of \(F_{{{{2 \times CO2}}}}^{{{\text{regress}}}}\), the ERF implied by the y-axis regression line intercept, by multiplying it by \(\gamma = F_{{{{2 \times CO2}}}}^{{{\text{regress}}}} /F_{{{{2 \times CO2}}}}^{{}}\). That is because S is defined in terms of the climate feedback λ arising from system behavior over years 1–150 after a (hypothetical or actual) quadrupling of CO_{2} concentration, and as per (4) equals \( - F_{{{{2 \times CO2}}}}^{regress} /\lambda\) (Sect. 2). When climate feedback weakens during the course of 150-year abrupt CO_{2} forcing simulations, as it does in the vast majority of GCMs, \(F_{{{{2 \times CO2}}}}^{{{\text{regress}}}}\) will underestimate the GCM's actual F_{2×CO2}. Therefore, dividing the actual F_{2×CO2} by climate feedback estimated over the whole simulation period is bound to overestimate S (supplemental material S1; compare red and black lines in Figure S1.1). For an ensemble of CMIP5 and CMIP6 GCMs, the \(F_{{{{2 \times CO2}}}}^{regress} /F_{{{{2 \times CO2}}}}^{{}}\) ratio is 0.86 ± 0.09 (supplemental material Table S1, rounding up the standard deviation, S5.1.5). I adopt this estimate.
S20 recognize this issue, conceding a similar overestimation of S, but neglect it, asserting incorrectly that it only affects feedback estimates from GCMs. This misconception results in S20's estimates of S from Process and Historical evidence being biased high. The issue is intrinsic to the use of (4) and the S20 definitions of S and λ, which involve a single λ value, estimated on a basis consistent with that obtained by regressing over 150-year abrupt4xCO2 simulations.
For Process evidence, the bias is self-evident, since almost all λ components are estimated by or on a basis consistent with regressing changes over abrupt4xCO2 simulations. In particular, low cloud feedback, the dominant cause of weakening feedback over abrupt4xCO2 simulations, and hence of \(\gamma < 1\), is so estimated (supplemental material S5.1.1, S5.1.3).
For Historical evidence, S20 estimate λ_{hist} as \( - (\Delta F_{{{\text{Hist}}}} - \Delta N_{{{\text{Hist}}}} )/\Delta T_{{{\text{Hist}}}}\) and divide it into −F_{2×CO2} to estimate S_{hist} (supplemental material Figure S1.1, blue line). They then adjust λ_{hist} by their estimate of the difference between λ in abrupt4xCO2 simulations and λ_{hist}, and estimate S by dividing the resulting λ estimate into −F_{2×CO2}, rather than (as should be done) into \( - F_{{{{2 \times CO2}}}}^{{{\text{regress}}}}\), thereby overestimating S (supplemental material Figure S1.1, red line).
Accordingly, in the statistical models S20 uses to estimate S from both Process and Historical evidence, F_{2×CO2} needs to be scaled by \(F_{{{{2 \times CO2}}}}^{regress} /F_{{{{2 \times CO2}}}}^{{}}\). I do so in Sect. 6. Paleoclimate estimation of S is unaffected, since that in effect estimates climate feedback from equilibrium changes, derives ECS by dividing it into −F_{2×CO2}, and then converts ECS into S (see (11), (12) and (13)).
4.2 Process evidence
The data-variable distributions adopted by S20, and as revised here, for estimating S from Process evidence are summarised in Table 1. The main changes made are to low cloud feedback, reflecting strong recent evidence that it is weaker than S20's assessment, and scaling F_{2×CO2} to \(F_{{{{2 \times CO2}}}}^{{{\text{regress}}}}\) (Sect. 4.1).
The significant revision in estimated tropical and mid-latitude (60°S–60°N) marine low cloud feedback is discussed in detail in the supplemental material (S5.1.3). In brief, both the S20 and the revised estimates depend primarily on observational estimates of low cloud response to cloud-controlling factors (CCF). S20's assessment of tropical marine low cloud feedback was based primarily on, and equals, an estimate from the Klein et al. (2017) review, which only took two CCF into consideration. For the 30–60° mid-latitude bands S20 also used GCM-derived evidence. The revised median estimate is the observationally-constrained 60°S–60°N value from Myers et al. (2021). They use a more comprehensive set of CCF and argue that their feedback estimate is more realistic than Klein et al.'s. Cesana and Del Genio (2021) likewise find the Klein et al. feedback estimate to be too high.
The revisions to the S20 median data-variable values change the central λ estimate from Process evidence alone from −1.30 ± 0.44 in S20 to −1.53 ± 0.44, while the maximum likelihood estimate of S changes from 3.08 K to 2.21 K.
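Both maximum likelihood values can be reproduced, to the precision quoted, from the central estimates given above by evaluating S = −γF_{2×CO2}/λ. The check below is a sketch under that assumption, using γ = 1 for S20's unscaled case and γ = 0.86 with F_{2×CO2} = 3.93 Wm^{−2} for the revised case; the function name is hypothetical.

```python
def sensitivity(f2x, lam, gamma=1.0):
    """S in K from F_2xCO2 (Wm^-2), feedback lam (Wm^-2 K^-1) and scaling gamma."""
    return -gamma * f2x / lam


# S20 assumptions: F_2xCO2 = 4.0, lambda = -1.30, no gamma scaling.
s_s20 = sensitivity(4.0, -1.30)

# Revised assumptions: F_2xCO2 = 3.93, lambda = -1.53, gamma = 0.86.
s_revised = sensitivity(3.93, -1.53, gamma=0.86)

print(round(s_s20, 2), round(s_revised, 2))
```

Of the reduction in S, the γ scaling alone accounts for a factor of 0.86, with the remainder coming from the stronger feedback and the slightly lower F_{2×CO2}.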
4.3 Historical evidence
The data-variable distributions adopted by S20, and as revised here, for estimating S from Historical evidence are summarised in Table 2. The changes made to S20's data-variable estimates are discussed in detail in the supplemental material (S5.2). The main changes made are to aerosol forcing, reflecting the most quantitative of the recent evidence that it is weaker than S20 assumed; to other forcings and the ΔGMAT–ΔGMST difference, reflecting AR6 assessments; to the historical pattern effect, reflecting evidence that most SST datasets indicate little unforced element; and scaling F_{2×CO2} to \(F_{{{{2 \times {\text{CO2}}}}}}^{{{\text{regress}}}}\).
The significant revision in estimated aerosol ERF is discussed in considerable detail in the supplemental material (S5.2.3). Briefly, S20 use the unconstrained aerosol forcing distribution from Bellouin et al. (2020; hereafter B20). That distribution is based on a complex theoretical formula that depends on a number of factors, all of which are estimated separately. There is considerable evidence suggesting that B20 overestimate aerosol forcing strength. The revised estimate median uses recent evidence (Gryspeerdt et al. 2019; Possner et al. 2020; Glassmeier et al. 2021) regarding just one of the factors involved in B20's calculations: the cloud liquid water path sensitivity factor used when adjusting the radiative forcing from aerosol-cloud interactions to an ERF basis. The revised median aerosol ERF estimate is derived by carrying out the same computation as in B20 except for changing the estimate of that sensitivity factor. The revised ERF estimate adopts a Gaussian uncertainty distribution with the same 5% bound as assessed in AR6; B20's theoretically-derived distribution assigns significant probability to extremely negative aerosol ERF values, but these appear inconsistent with observational evidence that is independent of the global temperature record.
Another significant revision is that to the Historical pattern effect feedback adjustment, discussed in detail in the supplemental material (S5.2.4). Briefly, S20's median estimate is based on that reported by Andrews et al. (2018), which was derived by comparing, in six GCMs, estimated feedback in abrupt4xCO2 simulations with that in fixed SST simulations with evolving observational historical SST patterns. S20 reduced the Andrews et al. (2018) estimate by 0.1 Wm^{−2} K^{−1} to allow for the possibility that the pattern effect may be smaller than reported in that study. However, substantial evidence now exists that the observational SST dataset used for all the fixed SST simulations assessed by Andrews et al. (2018) is an outlier in terms of the magnitude of the pattern effect that it gives rise to (Lewis and Mauritsen 2021; Zhou et al. 2021; Fueglistaler and Silvers 2021). The revised pattern effect adjustment reflects that evidence.
The revisions to the S20 data-variable median values reduce the estimates of S and S_{hist} that they imply from 5.82 to 2.16 K, and from 3.37 to 2.05 K, respectively.
4.4 Paleoclimate evidence
Paleoclimate evidence has the advantage of being largely independent of, and sometimes involving a much larger signal than, the historical period, but suffers from relating to different states of the Earth. The evidence is also derived from imprecise, geographically limited, and potentially biased proxies that provide estimates for only some relevant variables.
S20 evaluate evidence from climate transitions during the LGM, mPWP and PETM, but their main results exclude PETM evidence. For all three periods, an estimate of ECS is converted into one for S by dividing it by (1 + ζ), the distribution of which is revised (Sect. 2; supplemental material S5.3.1), as is the median F_{2×CO2} estimate (Sect. 4.1).
4.4.1 LGM
The best studied paleoclimate transition, and that most used for estimating climate sensitivity, is the change from the LGM, the coldest phase in the last ice age, some twenty thousand years ago to the preindustrial Holocene. A significant advantage of the LGM transition is that, unlike more distant periods, there is proxy evidence not only of changes in temperature and CO_{2} concentration but also of non-CO_{2} forcings, and that enables estimation of the effects on radiative balance of slow (ice sheet, etc.) feedbacks, which need to be treated as forcings in order to estimate ECS (and hence S) rather than ESS. Moreover, the temperature proxy evidence is sufficient to enable spatially-weighted global means to be estimated (Annan and Hargreaves 2013).
The two changes made to S20's data-variable estimates are discussed in detail in the supplemental material (S5.3.2). The revision to land ice and sea-level forcing adds an estimate of the omitted albedo change caused by sea-level fall exposing more land. The revision to ΔT brings it closer to the average of estimates from S20's cited sources. The revisions I make to S20's median LGM data-variable values, along with the revised ζ estimate, reduce the estimate of S that they imply from 2.63 K to 1.97 K.
4.4.2 mPWP
The mid-Pliocene warm period, approximately 3 Ma ago, was moderately warmer than preindustrial times, and in that respect a closer analogue than the LGM of conditions expected during this century. However, the temperature change involved was smaller than for the LGM, there is more uncertainty about CO_{2} levels, temperature proxies are more limited, and usable proxy-based estimates of non-CO_{2} forcing are unavailable.
The changes made to S20's data-variable estimates are discussed in detail in the supplemental material (S5.3.3). They relate to ΔT and the ESS–ECS ratio, in both cases reflecting ratios per the more recent PlioMIP2 project (Haywood et al. 2020). The revisions I make to S20's mPWP data-variable median values, along with the revised ζ estimate, reduce the estimate of S that they imply from 3.36 K to 2.33 K, more in line with estimates from the LGM and PETM.
4.4.3 PETM
The PETM temperature excursion period some 56 Ma ago was much warmer than the present, and differed geographically and orographically. S20 state that the PETM is arguably the best pre-Pliocene warm interval for estimating ECS; it has been fairly well studied and involves a large signal. Nevertheless, they excluded PETM evidence in their Baseline estimates "due to the large uncertainties and the danger of overconstraining the likelihood should these be underestimated". While the PETM uncertainties are substantial, S20 makes generous allowance for them, and any underestimation of uncertainties appears much more likely to cause overestimation than underestimation of S (supplemental material S5.3.4). In view of that and of the large signal involved in the PETM, I use it, but as an alternative to the mPWP rather than combining their evidence, since doing so provides very little benefit when the estimated data-variable error correlations between the two periods are allowed for (supplemental material S5.3.5).
The one revision I make to S20's PETM data-variable median values allows for the CO_{2} ERF–concentration relationship not being exactly logarithmic, which in combination with the revised ζ estimate reduces the estimate of S that they imply from 2.38 K to 1.99 K. This change is discussed in detail in the supplemental material (S5.3.4), where the evidence relating to several of S20's other PETM data-variable estimates is also discussed.
The data-variable distributions adopted by S20, and as revised here, for estimating S from Paleoclimate evidence for each period are summarised in Table 3.
5 Results using S20 datavariable assumptions: comparison using different methods
I now compare S20's results with those derived here using the same input assumptions and either a computed noninformative Jeffreys' prior or the same prior as used in S20. I start by comparing likelihoods, as these are the foundation of parameter inference and are unaffected by the prior used, and then discuss the computed Jeffreys' priors. Finally, the posterior PDFs and numerical percentile values for S produced in this study are presented and compared with those in S20.
5.1 Likelihoods
The likelihoods derived using the profile likelihood method, the sampling-based integrated likelihood and the data-doubling methods described in Sect. 3.3 agree very closely with each other (supplemental material Figures S1 to S3). They should also agree with the likelihoods shown in S20, as they are based on the same statistical models and data-variable assumptions. The integrated likelihoods derived in this study are shown by solid lines in Fig. 2; likelihoods from S20 are the same color but dotted. The overall Paleoclimate likelihood is that from combining evidence from the LGM and mPWP.
Figure 2a shows that mPWP paleoclimate evidence discriminates more strongly against very high S values than does LGM or, particularly, PETM evidence, despite its median value being the highest of the three. This is primarily because fractional uncertainty in forcing, and in those terms that effectively modify forcing, is lowest for mPWP evidence and highest for PETM evidence.
Figure 2b and c show that Paleoclimate evidence likelihoods downweight the possibility of very high S values most strongly, while Historical evidence does so very weakly. The latter primarily reflects the use of the unconstrained Bellouin et al. (2020) aerosol forcing distribution, which assigns substantial probability to strongly negative values.
S20's own likelihoods generally peak marginally earlier than those computed here, and thereafter decline faster. The difference is barely noticeable for the LGM, but rather larger for the mPWP and for the overall Paleoclimate (LGM and mPWP combined) evidence. The difference is major for the likelihoods from PETM and Historical evidence. This is particularly the case, in ratio terms, for S_{hist} (cyan lines in Fig. 2c). The differences arise because S20 employ an invalid method for deriving likelihoods (supplemental material S2). The virtual identity of the present study's estimated likelihoods using three different methods (two for Historical evidence) provides further confirmation that the S20 likelihoods are incorrect, and quantifies their inaccuracy.
When likelihoods from all lines of evidence used in S20 are combined multiplicatively (Fig. 2d), the resulting likelihood drops far more sharply than those from any individual line of evidence. The absolute difference between the combined S20 likelihoods and those computed here is relatively small in this case. This is because the S20 likelihoods are reasonably accurate below the likelihood maxima, and the combined likelihood drops to a low level before the errors in the S20 likelihoods (which partially cancel) grow very significant. By \(S = 5{\text{ K}}\), the S20 combined likelihood is ~ 25% lower than that calculated here, but by that point the 95th probability percentile has been reached.
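The sharpening produced by multiplicative combination can be seen in a small numerical sketch. The example below is purely illustrative: each line of evidence is represented by a Gaussian likelihood in λ mapped onto the S grid with a fixed F_{2×CO2} of 4.0 Wm^{−2}. The Process values are those quoted in Sect. 4.2, while the 'historical' and 'paleo' means and standard deviations are hypothetical placeholders, not values from this study or from S20.

```python
import numpy as np

F2X = 4.0                                      # Wm^-2, treated as fixed here
s_grid = np.arange(0.01, 20.0 + 1e-9, 0.01)    # 0.01 K grid over (0, 20] K
lam = -F2X / s_grid                            # feedback implied by each S value


def gaussian_lik(lam_values, mean, sd):
    """Gaussian likelihood in lambda, scaled to a maximum of one on the grid."""
    lik = np.exp(-0.5 * ((lam_values - mean) / sd) ** 2)
    return lik / lik.max()


# (mean, sd) in Wm^-2 K^-1; only the Process pair comes from the text.
lines = {
    "process": (-1.30, 0.44),
    "historical": (-1.00, 0.60),   # hypothetical
    "paleo": (-1.40, 0.50),        # hypothetical
}
liks = {name: gaussian_lik(lam, m, sd) for name, (m, sd) in lines.items()}

# Independent lines of evidence combine by multiplying their likelihoods.
combined = np.prod(list(liks.values()), axis=0)
combined /= combined.max()


def half_max_width(lik, step=0.01):
    """Width in K of the region where the likelihood is at least half its maximum."""
    return step * np.count_nonzero(lik >= 0.5)


widths = {name: half_max_width(lik) for name, lik in liks.items()}
combined_width = half_max_width(combined)
```

With these inputs the combined likelihood has a narrower half-maximum region than any of the three individual likelihoods, mirroring the behavior described for Fig. 2d.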
The likelihood difference for Process evidence is small, but of the opposite sign to the other cases; S20's likelihood peaks marginally later than that computed here and declines more slowly after the peak. The Process likelihood can also be derived using a formula that accurately approximates the distribution of the ratio of two normally-distributed variables, here λ and F_{2×CO2} (Raftery and Schweder 1993; Lewis 2018). The likelihood per that formula almost exactly matches the likelihoods derived using this study's three methods, but not S20's likelihood. Since S20's Process likelihood computation is not subject to the defects identified in their other likelihood computations, that suggests some other problem may exist in S20's statistical computations.
5.2 Computed Jeffreys' priors
Figure 3 shows the computed and calibrated integrated likelihood based Jeffreys' priors for each of the three main lines of evidence, before and after transformation to λ-space (which aids comparing them), and so-transformed priors for individual Paleoclimate periods. Since these priors are very similar to those estimated using the revised assumptions, comments on them are deferred until the latter are discussed in Sect. 6.
5.3 Posterior PDFs and percentiles for S
Figure 4a shows (solid lines) sampling-derived primary posterior PDFs for each main line of evidence, representing in each case the product of the estimated integrated likelihood and a Jeffreys' prior, normalizing to unit probability over the 0–20 K S range used. Posterior PDFs from using a uniform-in-λ prior, as in S20, with the same likelihoods are also shown (dashed lines). For Process evidence, for which Jeffreys' prior is uniform in λ, the two PDFs coincide. For Historical and particularly Paleoclimate evidence, use of a uniform-in-λ prior biases the posterior PDF towards lower S values and, in the Paleoclimate case, excessively constrains high S values.
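How a prior and a likelihood are turned into a normalized posterior PDF and percentile values can be sketched as follows. This is an illustration under simplifying assumptions: a Gaussian likelihood in λ (−1.30 ± 0.44) with F_{2×CO2} fixed at 4.0 Wm^{−2}, so that a uniform-in-λ prior becomes proportional to 1/S^{2} in S-space by the change of variables λ = −F_{2×CO2}/S. A uniform-in-S prior is included only for contrast and is not a prior used in this study; all names are hypothetical.

```python
import numpy as np


def posterior_summary(s_centers, likelihood, prior, probs=(0.05, 0.5, 0.95)):
    """Posterior PDF normalized over the grid, and the S values at given CDF levels."""
    step = s_centers[1] - s_centers[0]
    post = likelihood * prior
    post = post / (post.sum() * step)          # unit probability over the grid
    cdf = np.cumsum(post) * step               # CDF at each bin's upper edge
    upper_edges = s_centers + 0.5 * step
    return post, np.interp(probs, cdf, upper_edges)


F2X = 4.0
s_centers = np.arange(0.005, 20.0, 0.01)       # bin centers on the 0-20 K grid
lam = -F2X / s_centers
likelihood = np.exp(-0.5 * ((lam + 1.30) / 0.44) ** 2)

prior_uniform_lam = F2X / s_centers ** 2       # uniform in lambda, expressed in S
prior_uniform_s = np.ones_like(s_centers)      # for contrast only

post_lam, pct_lam = posterior_summary(s_centers, likelihood, prior_uniform_lam)
post_s, pct_s = posterior_summary(s_centers, likelihood, prior_uniform_s)

print("uniform-in-lambda 5/50/95%:", np.round(pct_lam, 2))
print("uniform-in-S      5/50/95%:", np.round(pct_s, 2))
```

Because the likelihood does not fall to zero at 20 K in this toy case, the uniform-in-S result depends strongly on the upper limit of the grid, whereas the 1/S^{2} decline of the uniform-in-λ prior suppresses that dependence.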
The primary combined evidence posterior PDF (Fig. 4b, solid red line) represents the product of the estimated Process, Historical and Paleoclimate likelihoods, being the combined likelihood, and Jeffreys' prior for the combined evidence. The PDF using a uniform-in-λ prior is also shown (solid blue line). The difference between these is much smaller than in the case of separate Historical or Paleoclimate evidence, reflecting the combined likelihood (Fig. 2d) being much narrower. The dotted cyan line shows S20's Baseline, uniform-in-λ prior based, PDF. It is very close to the uniform-in-λ prior based PDF derived here, which follows from the closeness of its combined evidence likelihood to that derived here, notwithstanding the substantial differences in the Paleoclimate and, particularly, Historical likelihoods at high S.
Figure 4c shows (solid cyan line) this study's posterior PDF for S_{hist}, derived by a sampling-based method, without normalization to unit probability over 0–20 K. The dashed orange line shows S20's corresponding "non-Bayesian S_{hist} PDF", derived directly (by sampling) from their (19), the equivalent of (9) here. When S20's PDF is scaled by a factor of 0.86 (Footnote 3) (dotted red line) it closely matches this study's sampling-based S_{hist} PDF (which equates to a Bayesian posterior PDF derived using a noninformative prior).
Although all S20's main probabilistic estimates are based on Bayesian analysis with a uniform-in-λ prior, no such results are given for S_{hist}. However, the solid black line in Fig. 4c shows a uniform-in-λ prior based PDF, using the accurate emulation of S20's S_{hist} likelihood (supplemental material Figure S2.1(a)). This PDF is much better constrained at high S_{hist} than the non-Bayesian sampling-derived PDFs, primarily reflecting S20's misestimation of the S_{hist} likelihood.
Table 4 presents results in the form of medians and 66%, 90% and 95% uncertainty ranges for posterior PDFs for S and S_{hist} on S20's datavariable assumptions, using this study's methods, with the comparative S20 results where available. It is evident from the high percentile S values that Paleoclimate evidence gives the strongest constraints on upper uncertainty bounds, with Historical evidence constraining them least. That is consistent with the relative shapes of the likelihood functions for the three lines of evidence.
Notwithstanding the difficulty the optimizationbased profile likelihood method has in deriving a satisfactory dataspace movement prior for S20's Historical evidence, the 'All combined' 5% to 95% percentile values from combining that method's results for each line of evidence are all within 0.05 K of those using the samplingbased integrated likelihood method and related Jeffreys' priors, as given in Table 4.
Compared to S20's Baseline combined evidence results, the Table 4 median estimate for S is approximately 0.13 K higher, and the 95% bound 0.35 K higher. The likely reasons for these differences being small are that (i) neither of S20's most seriously inaccurate likelihood estimates were used for its Baseline estimate, resulting in the combined likelihood that they used for that purpose deviating only modestly from that given by this study's methods until both have fallen to a moderate level, as well as being nearly identical to it up to the likelihood maximum; (ii) both S20's uniforminλ prior and this study's combinedevidence Jeffreys' prior fall sharply over the high likelihood (> 0.5 × its maximum) region, by about twothirds in S20's case, which means that differences in the likelihood and (still declining) prior used beyond that region have a minor effect on S estimates; and (iii) over the high likelihood region the Jeffreys' prior only increases by ~ 25% relative to a uniforminλ prior, which difference is only sufficient to produce small upward shifts in the median and higher percentile S estimates.
PDFs for S used in Table 4 were normalized to unit probability over 0–20 K, except in one case. As discussed in Sect. 3, for combined evidence virtually all probability lies within the 0–20 K range over which computations are performed and over which total probability is normalized to unity. Likewise, almost no probability for S lies outside 0–20 K when combining two or more lines of evidence, or using Paleoclimate evidence alone, and under 1% does for Process evidence alone. However, when using Historical evidence alone 30% of samples produce S values that are above 20 K or negative due to ΔR > 0, implying an unstable climate system, and 15% of samples do so for S_{hist}. The substantial proportion of sampled S and S_{hist} values exceeding 20 K primarily reflects the significant probability assigned by S20's data-variable assumptions to highly negative aerosol ERF values: \(\Delta F_{\text{Hist}}^{\text{aerosol}} < -2\ \text{W m}^{-2}\) in 17% of samples. Unnormalized results for S and S_{hist} from Historical evidence are therefore also given, without probability being restricted to any range of S values. This unrestricted basis, which correctly reflects the implications of the data-variable uncertainty distributions, is usual for sampling-based energy budget studies that derive S_{hist} (Gregory et al. 2002; Otto et al. 2013; Lewis and Curry 2015, 2018).
S20 did not provide any estimate of the transient climate response (TCR), a shorter-term climate sensitivity measure, but Table 4 does. TCR is estimated as for S_{hist} but omitting the deduction for ΔN in (9), a common method (Otto et al. 2013; Lewis and Curry 2015, 2018; Forster et al. 2021, Sect. 7.5.2.1), so that:
\(\text{TCR} = F_{2 \times \text{CO2}} \, \Delta T_{\text{Hist}} / \Delta F_{\text{Hist}} \quad (15)\)
The resulting median TCR estimate of 2.26 K exceeds the AR6 likely range, and there is a 7% probability that TCR exceeds 20 K. Moreover, if S20's estimates of the historical pattern effect are accurate then over half of it is unforced (supplemental material S5.2.4) and will have depressed \(\Delta T_{{{\text{Hist}}}}\). Amending (15) to correct for that would increase the implied S20 TCR estimate.
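The sampling-based calculation of S_{hist} and TCR, and the effect of restricting probability to 0–20 K, can be illustrated with a minimal Python sketch. It assumes the energy-budget forms S_{hist} = F_{2×CO2} ΔT/(ΔF − ΔN) and TCR = F_{2×CO2} ΔT/ΔF. The input distributions are arbitrary placeholders chosen for illustration, not S20's or this study's data-variable assumptions, and the function names are hypothetical. Negative sampled values are treated as lying above any finite bound, on the assumption that they represent an unstable system.

```python
import numpy as np


def energy_budget_samples(n=200_000, seed=0):
    """Draw illustrative samples of S_hist and TCR (placeholder inputs only)."""
    rng = np.random.default_rng(seed)
    f2x = rng.normal(4.0, 0.3, n)   # CO2 doubling forcing, W m^-2 (placeholder)
    d_t = rng.normal(1.0, 0.1, n)   # historical warming, K (placeholder)
    d_n = rng.normal(0.6, 0.15, n)  # change in radiative imbalance (placeholder)
    d_f = rng.normal(2.0, 0.6, n)   # total forcing change (placeholder)
    s_hist = f2x * d_t / (d_f - d_n)  # deduction for dN retained
    tcr = f2x * d_t / d_f             # deduction for dN omitted
    return s_hist, tcr


def quantile_lower(x, q):
    """Quantile without interpolation, so infinite values cause no NaNs."""
    xs = np.sort(x)
    return xs[int(q * (len(xs) - 1))]


def summarize(x, lo=0.0, hi=20.0):
    """Compare medians with and without restriction to [lo, hi]."""
    inside = (x >= lo) & (x <= hi)
    # Assumption: negative values are ranked above every positive value.
    ordered = np.where(x < 0.0, np.inf, x)
    return {
        "frac_outside": float(1.0 - inside.mean()),
        "median_restricted": float(quantile_lower(x[inside], 0.5)),
        "median_unrestricted": float(quantile_lower(ordered, 0.5)),
    }


if __name__ == "__main__":
    s_hist, tcr = energy_budget_samples()
    print("S_hist:", summarize(s_hist))
    print("TCR:   ", summarize(tcr))
```

Because restricting to 0–20 K discards only samples from the upper end of the ordering, the restricted median can never exceed the unrestricted one; the gap between them grows with the fraction of samples lying outside the range.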
Table 5 presents equivalent results from posterior PDFs based on uniform-in-λ priors, with the comparative S20 results where available. Where Process evidence is not used, the entire posterior probability will in theory be located immediately above zero S, resulting in all S percentiles being almost zero (supplemental material S7). This study's Table 5 results reflect imposing a restriction to \(S \ge 0.01{\text{ K}}\), which avoids the uniform-in-λ prior producing non-negligible probability at very low S values.
The S values are lower, particularly at high percentiles, than when using Jeffreys' priors, except for Process alone where the two types of prior and hence the S values are identical. This behavior reflects the fact that, while all the Jeffreys' priors decrease with S, they decline less rapidly than the uniform-in-λ prior. Equivalently, when transformed to λ-space, all the non-Process Jeffreys' priors increase with S, whereas the uniform-in-λ prior does not. The S values at higher percentiles derived here using a uniform-in-λ prior differ from those per S20's Baseline results, as would be expected given the identified differences between their likelihoods. However, the differences in S values are small, only reaching 0.1 K beyond the median. That is consistent with the differences in likelihoods only becoming significant beyond the medians; since the uniform-in-λ prior varies with S^{−2} the effect on the posterior PDFs of sizeable likelihood differences is muted at higher S values.
Comparing the present study's results using the two types of prior, the differences between their 95th percentile S values are quite significant when Historical and/or Paleoclimate evidence are used, either alone (differences of ~ 1 K) or in combination (a difference of 0.6 K, or 0.9 K using S20's results). When Process evidence is used in combination with Historical and/or Paleoclimate evidence the differences are smaller. When combining all three lines of evidence, the 95% bound is only 0.2 K lower when using a uniform-in-λ prior, and the difference in medians is under 0.1 K. That is mainly due to the combined evidence being much more informative about S, and constraining it more tightly, than any of the separate lines of evidence, and partly due to contributions to the combined Jeffreys' prior from Process and Historical evidence respectively being uniform-in-λ, and increasing only gently relative to a uniform-in-λ prior.
S20 also present various posterior PDFs based on a uniform-in-S prior. That prior is unsuitable for S estimation unless ΔT uncertainty dominates, which it almost never does, and will often result in uncertainty ranges that are far from being true confidence intervals. For Historical evidence, use of a uniform-in-S prior has been shown to be unsuitable and to result in seriously biased estimation (Annan and Hargreaves 2011; Lewis 2014). For Process evidence, the likelihood for λ is normal in λ-space, a case for which there is general agreement that use of a uniform prior is appropriate. When transformed to S-space, the resulting appropriate prior for S for Process evidence will (applying the Jacobian factor) be proportional to S^{−2}, very far from uniform.^{Footnote 4} Most damningly, use of a uniform-in-S prior would result, even for 'All combined' evidence, in the S values for all percentiles being an unbounded function of the upper bound of the S range over which normalization to unit probability occurs, and increasing without limit as that bound tends to infinity.^{Footnote 5}
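The dependence on the normalization bound can be illustrated numerically. The following Python sketch assumes a single likelihood that is normal in λ, with λ = F_{2×CO2}/S, and compares the posterior median for S under a uniform-in-S prior and under a uniform-in-λ prior (proportional to S^{−2} in S-space) as the upper bound of the S range is increased. The likelihood parameters and the function name are illustrative placeholders, not values or methods taken from S20 or from this study.

```python
import numpy as np


def posterior_median(s_max, prior, f2x=4.0, lam_mean=1.3, lam_sd=0.45,
                     n=400_001):
    """Posterior median of S on a grid over [0.01, s_max] K.

    The likelihood is assumed normal in lambda = f2x / S; all parameter
    values are placeholders chosen for illustration.
    """
    s = np.linspace(0.01, s_max, n)
    lam = f2x / s
    likelihood = np.exp(-0.5 * ((lam - lam_mean) / lam_sd) ** 2)
    if prior == "uniform_in_S":
        weight = np.ones_like(s)
    elif prior == "uniform_in_lambda":
        weight = f2x / s ** 2  # Jacobian |d lambda / d S|
    else:
        raise ValueError(f"unknown prior: {prior}")
    pdf = likelihood * weight
    cdf = np.cumsum(pdf)
    cdf /= cdf[-1]  # normalize to unit probability over the grid
    return float(s[np.searchsorted(cdf, 0.5)])


if __name__ == "__main__":
    for s_max in (20.0, 200.0, 2000.0):
        m_s = posterior_median(s_max, "uniform_in_S")
        m_lam = posterior_median(s_max, "uniform_in_lambda")
        print(f"upper bound {s_max:7.0f} K: uniform-in-S median {m_s:8.2f} K, "
              f"uniform-in-lambda median {m_lam:5.2f} K")
```

In this sketch the likelihood stays above zero as λ tends to zero, so under the uniform-in-S prior the probability mass in the upper tail grows in proportion to the upper bound and the median rises with it, whereas the S^{−2} weighting keeps the uniform-in-λ median essentially fixed.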
6 Results using the revised data-variable assumptions
Results are now presented using the data-variable assumptions as revised per Sect. 4, including rectification of the omission of the necessary scaling of F_{2×CO2} when using Process and Historical evidence ("the revised assumptions"), employing the integrated likelihood method with computed Jeffreys' priors.
6.1 Likelihoods
Figure 5 shows likelihoods derived using the revised assumptions (solid lines), with similarly calculated likelihoods using S20's data-variable assumptions in the same colors but dotted. The revised assumptions produce likelihoods that peak at lower S values, and are lower at high S values, than do S20's assumptions.
The likelihoods for both separate and combined evidence derived using the integrated likelihood method, the doubled data method, and the profile likelihood method are all almost identical when employing the revised assumptions (supplemental material Figures S4 to S7). This includes, unlike on S20's data-variable assumptions, likelihoods from the doubled data method applied to Historical evidence, confirming that it is the difficulty of incorporating the unusual historical aerosol forcing distribution employed by S20 that prevents the doubled data method from working satisfactorily in that case.
6.2 Computed Jeffreys' priors
Figure 6a shows the computed and calibrated Jeffreys' priors for S for each main line of evidence, with Paleoclimate represented by LGM evidence combined with either mPWP or PETM evidence, and for all main lines of evidence combined, from S = 1 K upwards. Below that level (or below ~ 1.5 K when using S20's data-variable assumptions) they start to become artefacted, due to the paucity of samples for Process and Historical evidence. The likelihood and the probability are both almost zero below those levels, so the effect on inference for S is negligible.
In all cases the priors decline rapidly with increasing S, reflecting declining informativeness of the evidence about S, making it difficult to compare their behavior at high S levels. Figure 6b shows these priors transformed into λ-space (that is, converted into priors for λ) but plotted against the corresponding values of S. Figure 6c shows the transformed priors for LGM, mPWP and PETM Paleoclimate evidence separately. Strictly, because for computational reasons the relatively small uncertainty in F_{2×CO2} is ignored when effecting the transformation, it is into priors for \(\overline{F_{2 \times \text{CO2}}}/S\), whereas \(\lambda = F_{2 \times \text{CO2}}/S\). The transformed priors show how informative the evidence is about λ at each S value, which varies comparatively little. Their shapes are all similar when using the revised data-variable assumptions to those when using S20's assumptions (Fig. 3b, c), except that the revised Historical evidence prior flattens out much earlier with rising S. For Process evidence, the prior should in both cases be uniform (constant) when transformed to λ-space, the same prior as used for all lines of evidence in S20, since the λ likelihood is for the sum of normally distributed variables. The drop in the transformed priors that occurs at low S is due to uncertainty in F_{2×CO2} not being removed by the transformation. At low S values, which arise where ΔR is large and ΔT is small, fractional uncertainty in ΔT translates to significant absolute uncertainty in λ, hence Paleoclimate evidence (which has large ΔT uncertainty) is relatively uninformative.
At low S values, Process evidence is most informative, and dominates the combined evidence prior. As S increases, mPWP and hence Paleoclimate evidence becomes increasingly informative, due mainly to fractional uncertainty in ΔR being low for the mPWP, and at high S dominates the combined evidence prior. Historical evidence is less informative than Process evidence at all S values, although only modestly so at high S when using S20's data-variable assumptions. The priors generally have slightly lower values when using the revised rather than S20's original data-variable assumptions. The reasons for this are discussed in the supplemental material (S6).
In all cases the priors derived using the doubled data and profile likelihood data-space movement methods are close to those derived using the primary integrated likelihood method over S of 1–10 K, outside which virtually zero probability lies (supplemental material Figures S8, S9). In some Paleoclimate cases there is minor divergence at \(S > 4{\text{ K}}\) when using the data-space movement method, but for combined Paleoclimate evidence little probability exists there. Although all the priors have been calibrated to equal (the square root of) Fisher information, those derived using the profile likelihood data-space movement method are almost identical with and without calibration, confirming the soundness of that method. These comments apply also (at S > 1.5 K) to the priors derived when using S20's data-variable assumptions (Fig. 3), save that only the integrated likelihood method produces a usable prior for S20's Historical evidence (see Sects. 3.3 and 3.5), and hence for the combined evidence.
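The relationship between Fisher information and a Jeffreys' prior for S can be illustrated in the simplest, Process-like case. The Python sketch below assumes a single observation of λ with normally distributed error and mean F_{2×CO2}/S, computes Fisher information for S as the expected value of minus the second derivative of the log-likelihood, and takes its square root. It is not the sampling-based computation used in this study; the constants `F2X` and `SIGMA` and the function names are hypothetical placeholders.

```python
import numpy as np

F2X = 4.0     # CO2 doubling forcing, W m^-2 (placeholder value)
SIGMA = 0.45  # standard deviation of the lambda estimate (placeholder value)


def log_lik(x, s):
    """Log-likelihood of observing lambda = x when the true sensitivity is s."""
    mu = F2X / s
    return -0.5 * ((x - mu) / SIGMA) ** 2 - np.log(SIGMA * np.sqrt(2 * np.pi))


def fisher_information(s, h=1e-3, n_x=20_001):
    """Expected value of minus the second derivative of log_lik w.r.t. s."""
    mu = F2X / s
    x = np.linspace(mu - 10 * SIGMA, mu + 10 * SIGMA, n_x)
    p = np.exp(log_lik(x, s))
    # Central finite difference for the second derivative in s.
    d2 = (log_lik(x, s + h) - 2 * log_lik(x, s) + log_lik(x, s - h)) / h ** 2
    dx = x[1] - x[0]
    return float(np.sum(p * (-d2)) * dx)


def jeffreys_prior(s_grid):
    """Unnormalized Jeffreys' prior: square root of Fisher information."""
    return np.sqrt(np.array([fisher_information(s) for s in s_grid]))


if __name__ == "__main__":
    s_grid = np.array([1.0, 2.0, 4.0, 8.0])
    prior = jeffreys_prior(s_grid)
    # In this simple case prior * S^2 should be constant, i.e. the prior is
    # proportional to S^-2 and hence uniform when transformed to lambda-space.
    print(prior * s_grid ** 2)
```

Under these assumptions the computed prior declines as S^{−2}, consistent with the Process evidence prior being uniform when transformed to λ-space.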
6.3 Posterior PDFs and percentiles for S
The estimated posterior PDFs based on the revised assumptions (Fig. 7, solid lines) all peak at lower S values, and are better constrained beyond their peaks, than those based on S20's assumptions (shown dotted).
Figure 7a shows the PDFs for separate lines of Paleoclimate evidence, derived using the revised data-variable assumptions and, for comparison, using S20's assumptions. Figure 7b shows PDFs from combining Paleoclimate LGM evidence with that from the mPWP or PETM, and for separate Process and Historical evidence. Figure 7c shows the final PDFs after combining Process, Historical and Paleoclimate evidence. The PDFs incorporating PETM evidence are almost identical to those incorporating mPWP evidence; they peak marginally earlier, and at a slightly higher level.
Figure 7d shows unnormalized posterior PDFs for S and S_{hist}. The PDFs for S are the same as in panel (b) except for the lack of normalization. When using the revised assumptions, there is little difference between the normalized and unnormalized PDFs, as only 2.0% of the probability lies beyond S = 20 K, down from 30% when using S20's assumptions.
Posterior PDFs computed using the doubled data method, or the profile likelihood method and its data-space movement prior, are visually identical to those using the primary integrated likelihood method shown in Fig. 7, save for a marginal difference in the peak PDF level for separate LGM and PETM evidence (supplemental material Figures S10 to S13). For separate evidence, the sampling-based integrated likelihood and doubled data methods both derive their PDFs directly by sampling, so they are bound to be identical in these cases. However, PDFs computed using the non-sampling profile likelihood method and its very differently derived data-space movement prior are completely independent of those from the integrated likelihood and doubled data methods.
Table 6 presents this study's primary results, in the form of medians and 66%, 90% and 95% uncertainty ranges from posterior PDFs, on the revised data-variable assumptions, derived using the integrated likelihood method and Jeffreys' priors. Results based on combining different pairs of lines of evidence, as well as all of them, are given. PDFs for S used in Table 6 were normalized to unit probability over 0–20 K; almost no probability (≤ 0.1%) lies outside that range.
As when using S20's assumptions, the S values at high percentiles confirm that Historical evidence is least important for constraining the upper uncertainty bounds, but Process evidence now constrains them almost as strongly as Paleoclimate evidence.
The limited revisions made to S20's assumptions reduce by one-third, from 3.23 to 2.16 K, the median estimate of S given by the combined evidence, using Jeffreys' priors and warm Paleoclimate evidence from the mPWP in both cases. The 83% and 95% uncertainty bounds reduce respectively from 4.1 to 2.7 K and from 5.05 to 3.2 K. If warm Paleoclimate evidence is instead taken from the PETM when the revised assumptions are used, the PDF percentiles from the median upwards reduce further, by ~ 0.05 K.
All the percentile values derived using the profile likelihood method (with the associated data-space movement prior) are within ± 0.02 K of those in Table 6, with medians identical, when combining evidence in two stages (supplemental material S4). Moreover, when using the revised assumptions, the profile likelihood optimization process can simultaneously combine Process, Historical, LGM, and either mPWP or PETM evidence, and hence produce, in a single step, a posterior PDF for all lines of evidence combined. The 1st to 99th percentile points of those two PDFs match those from the sampling-based integrated likelihood method within ± 0.02 K.
The median S values when omitting evidence from each of the three main lines in turn, with Paleoclimate evidence combining LGM evidence with that from either the mPWP or PETM, are all within 0.1 K of the average of the two 'All combined' values.
It is useful to establish how sensitive the combined-evidence results are to the various categories of revisions to data-variable assumptions. A few of the revisions might be regarded as more questionable since they are based wholly or partly on re-evaluation of existing evidence. That category includes LGM cooling and non-CO_{2} forcing, and also the revision to Historical aerosol forcing, which although largely based on newer evidence concerns a very poorly constrained forcing.
Table 7 divides the revisions made into six categories, starting with those that appear least debatable, being that to F_{2×CO2} together with appropriate adjustments (omitted in S20) to the calculation of the CO_{2} ERF estimates used, or which arise from alignment of the CO_{2} concentrations used for estimating the ECS to S ratio. Without these changes, the bases of S estimation are biased and are not consistent between lines of evidence.
The next set of changes relate primarily to the substitution of the AR6 Historical non-aerosol ERF time series estimates, and of the AR6 zero-mean estimate of the difference between Historical GMST and GMAT warming, for the estimates used by S20. Together, these two categories of revision reduce the median S estimate by almost 0.6 K, and the 95% uncertainty bound by almost 1 K.
The third category comprises all other revisions that are based entirely on newer evidence or later data, other than cloud feedback, the least well constrained feedback. The main changes involved are to scaling factors used for estimation of S from mPWP evidence, with those factors being derived from a more recent model intercomparison project than previously, and to the estimate of the Historical pattern effect. These revisions reduce the median S estimate by a further 0.25 K. Next, cloud feedback is revised, which reduces the median S estimate by almost 0.15 K more, to 2.25 K, with the 95% uncertainty bound now down to 3.4 K. Including the penultimate category, of revisions to LGM cooling and non-CO_{2} forcing, brings the median S estimate down by another 0.1 K, to 2.15 K.
Finally, and somewhat counterintuitively, revising the Historical aerosol ERF distribution, with the resulting S estimation basis now being the same as that in Table 6, does not further reduce the median S estimate. That estimate remains unchanged within computational uncertainty, although the 95% (and 97.5%) uncertainty bounds reduce by a further 0.05 K. Investigation suggests the principal cause is likely that, although the aerosol ERF distribution used by S20 has a median that (after scaling between periods using the AR6 time series) is some 0.2 Wm^{−2} more negative than for the adopted revised distribution, its mode is actually the less negative of the two. The shapes of the two distributions are such that the Historical S likelihoods resulting from their use, with other revisions to Historical data-variables having been made, are very similar below approximately 2 K. While highly negative aerosol ERF values, which correspond to high S values, are much more probable when using the S20 aerosol distribution, resulting in a larger Historical S likelihood, those high S values are almost ruled out by their low likelihood from Process and Paleoclimate evidence.
Table 8 presents sampling-derived medians and 66%, 90% and 95% uncertainty range bounds for S, for the three main lines of evidence and for each Paleoclimate period separately, using the revised and (in italics) the original S20 assumptions. S20 did not provide any similar estimates. Posterior PDFs for S were normalized to unit probability over 0–20 K except where stated; results are also given without excluding probability outside that range, where it is non-negligible.
Consistency between median S values from different lines of evidence is much improved when based on the revised assumptions; they span (without restriction to 0−20 K) only 1.9−2.4 K, compared to 2.4−6.1 K using S20's assumptions.
The median TCR is 1.54 K, within the AR6 likely range. The median TCR and S_{hist} estimates are respectively almost one-third lower, and almost 40% lower, than when using S20's assumptions.
7 Discussion
This study first identifies statistical problems in S20. Using a Subjective Bayesian statistical method involving an investigator-selected prior distribution, as S20 does, may produce unrealistic climate sensitivity estimation when used to combine differing types of evidence (Lewis 2018), even assuming that the data likelihood functions are correct. In this case, I found that the method S20 used for estimating likelihoods for all but Process evidence was in fact unsound, and that it underestimated likelihood at high S levels, substantially so in some cases. I also found that S20 used an uncertainty estimate for PETM CO_{2} forcing that was a factor of ten too low, due to an apparent coding error, further biasing their likelihood estimate (although not affecting their main results).
This study then develops an Objective Bayesian approach to combining differing climate sensitivity evidence that, unlike the method used in Lewis and Grünwald (2018), is not restricted to dealing with a particular simple statistical model. The approach involves computationally deriving Jeffreys' prior distributions that are designed to maximize the influence of the data on the results and to produce probabilistic estimates that are as close as possible to being confidence intervals, and thus are well calibrated. Three different inferential methods employed for this purpose each provide nearly identical estimated likelihoods and Jeffreys' priors, and final results. This result is very supportive of the validity of the methods used and of the results they produce.
The robustness of S20's results to the use of properly calibrated statistical methods and validly calculated likelihood estimates is then examined, using the Objective Bayesian methods developed in this study. It is shown that while S20's choice of prior and its likelihood misestimation lead to over-constraining of high S levels, based on S20's data-variable assumptions the downwards bias in S20's Baseline combined evidence results is modest: the median S estimate is approximately 0.13 K low, and the 95% uncertainty bound 0.35 K low. However, the bias in S20's No Process results is over twice as large.
The other main contribution of this study is to assess the impact of revising various input data-variable distributions used by S20, by:
(i) adjusting the F_{2×CO2} value used for inferring S from Process and Historical evidence to reflect the effect of climate feedback changing over GCM abrupt4xCO2 simulations, as should undoubtedly be done;
(ii) allowing for the CO_{2} concentration-ERF relationship being slightly non-logarithmic, and estimating the ECS to S ratio in a way that is unaffected by that relationship;
(iii) changing some of S20's other data-variable estimates to reflect more recent information; and
(iv) using arguably better justified (albeit not based purely on more recent information) alternative estimates for a few other data-variables.
I find that doing so results in substantially lower and better constrained estimates for S. The median S estimate when combining all lines of evidence, using the Objective Bayesian method and the LGM and mPWP for Paleoclimate evidence, reduces from 3.23 to 2.16 K.
All the revised data-variable estimates are not only defensible but, given the evidence now available, in my view are better justified than S20's original estimates. Moreover, omitting the only revisions dependent, to a greater or lesser extent, on re-evaluation of existing evidence only very modestly changes the combined evidence results, with the omission just of the revision of the Historical aerosol forcing having almost no effect on the results.
It therefore currently remains quite plausible that S is below 2 K. The truncation in S20's results of the lower bound for S does not appear justified given the range of data-variable estimates supported by relevant, mainly more recent, studies. There is 36% probability of S being under 2 K, considerably greater than the 26% probability of S exceeding 2.5 K, according to the revised data-variable assumptions 'All combined: Paleo LGM + mPWP' results; they also imply that it is extremely unlikely that S is below 1.5 K, and extremely unlikely that S is above 3.2 K.
The revised data-variable median Historical evidence estimates of S_{hist} and TCR are somewhat higher than the comparable estimates in Lewis and Curry (2018), of 1.66 K and 1.33 K respectively. The excess is mainly due to a stronger aerosol ERF change, even after revising S20's assumptions. Further revising S20's median aerosol ERF to match the change per the AR5 time series, extended post-2011 using AR6's annual changes, would reduce the Table 8 median S_{hist} and TCR to respectively 1.82 K and 1.40 K. Changing the base period to 1869–1882 to match Lewis and Curry (2018), avoiding the poorly observed 1861–1868 period, would further reduce those estimates, to 1.79 K and 1.37 K. The methane shortwave ERF adjustment, and greater estimated change in radiative imbalance, in AR6 can account for the small remaining differences.
Notes
Changing between a threshold convective scheme in which all convective condensate exceeding a threshold value is converted to precipitation, and a fractional removal scheme, in which only a fraction of such condensate is removed as precipitation.
Fisher information (I), in the case of a univariate parameter θ and assuming regularity conditions, is the expected value of minus the second derivative with respect to θ of the log-likelihood function (log p(x | θ)) for the data (x), at a given value of θ: \(I(\theta) = \int p(x|\theta)\left( -\frac{\partial^{2} \log p(x|\theta)}{\partial \theta^{2}} \right) dx\); in the multivariate parameter case it is a matrix, sometimes just called the (expected) information matrix: \((I(\theta))_{ij} = \int p(x|\theta)\left( -\frac{\partial^{2} \log p(x|\theta)}{\partial \theta_{i}\, \partial \theta_{j}} \right) dx\) (Bernardo and Smith 1994, p. 288).
So that over the range plotted in S20 the cumulative probability of the two PDFs is then equal.
Ignoring uncertainty in F_{2×CO2}, which has a negligible effect on the transformation over the 2.5%–97.5% uncertainty range.
Because use of a uniform-in-S prior means the PDF for S is a scaled version of its likelihood, which remains above zero as \(\lambda \to 0\) and hence \(S \to \infty\).
References
Andrews T, Gregory JM, Paynter D, Silvers LG, Zhou C, Mauritsen T, Webb MJ, Armour KC, Forster PM, Titchner H (2018) Accounting for changing temperature patterns increases historical estimates of climate sensitivity. Geophys Res Lett 45(16):8490–8499
Annan JD, Hargreaves JC (2011) On the generation and interpretation of probabilistic estimates of climate sensitivity. Clim Change 104(3):423–436. https://doi.org/10.1007/s10584-009-9715-y
Annan JD, Hargreaves JC (2013) A new global reconstruction of temperature changes at the Last Glacial Maximum. Clim Past 9(1):367–376. https://doi.org/10.5194/cp-9-367-2013
Bayes T (1763) An essay towards solving a problem in the doctrine of chances. Philos Trans R Soc Lond 53 (1763) 370–418; 54 (1764) 269–325. Reprinted in Biometrika 45 (1958), 293–315.
Bellouin N, Quaas J, Gryspeerdt E, Kinne S, Stier P, Watson-Parris D, Boucher O, Carslaw KS, Christensen M, Daniau AL, Dufresne JL (2020) Bounding global aerosol radiative forcing of climate change. Rev Geophys 58(1):e2019RG000660. https://doi.org/10.1029/2019RG000660
Berger JO, Bernardo JM (1992) On the development of reference priors (with discussion). In: Bernardo JM, Berger JO, Dawid AP, Smith AFM (eds) Bayesian Statistics 4. Oxford University Press, pp 35–60
Bernardo JM (1979) Reference posterior distributions for Bayesian inference (with discussion). J Roy Stat Soc Ser B 41:113–147
Bernardo JM (2009) Modern Bayesian inference: foundations and objective methods. In: Bandyopadhyay P, Forster M (eds) Philosophy of statistics. North Holland, Oxford, pp 263–306
Bernardo JM, Smith AFM (1994) Bayesian theory. Wiley, 608pp
Bernardo JM (2011) Modern Bayesian Inference: Foundations and Objective Methods, 263–306. In Philosophy of Statistics, P. Bandyopadhyay and M. Forster, eds. North Holland, 1253 pp
Birnbaum A (1962) On the foundations of statistical inference (with discussion). J Am Stat Assoc 57:269–332
Byrne B, Goldblatt C (2014) Radiative forcing at high concentrations of well-mixed greenhouse gases. Geophys Res Lett 41:152–160. https://doi.org/10.1002/2013gl058456
Caldwell PM, Bretherton CS, Zelinka MD, Klein SA, Santer BD, Sanderson BM (2014) Statistical significance of climate sensitivity predictors obtained by data mining. Geophys Res Lett 41:1803–1808. https://doi.org/10.1002/2014GL059205
Caldwell PM, Zelinka MD, Klein SA (2018) Evaluating emergent constraints on equilibrium climate sensitivity. J Clim 31:3921–3942
Cesana GV, Del Genio AD (2021) Observational constraint on cloud feedbacks suggests moderate climate sensitivity. Nat Clim Chang 11(3):213–218
Charney JG (1979) Carbon dioxide and climate: a scientific assessment. National Academies of Science Press, Washington, DC, p 22
Donahue AS, Caldwell PM (2018) Impact of physics parameterization ordering in a global atmosphere model. Journal of Advances in Modeling Earth Systems 10(2):481–499. https://doi.org/10.1002/2017MS001067
Efron B (1993) Bayes and likelihood calculations from confidence intervals. Biometrika 80:3–26
Etminan M, Myhre G, Highwood EJ, Shine KP (2016) Radiative forcing of carbon dioxide, methane, and nitrous oxide: a significant revision of the methane radiative forcing. Geophys Res Lett 43:12614–12623. https://doi.org/10.1002/2016GL071930
Forster P et al (2021) The Earth’s energy budget, climate feedbacks, and climate sensitivity. In: Climate Change 2021: The Physical Science Basis. Contribution of Working Group 1 to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V. et al. (eds.)]. Cambridge University Press
Fraser DA (2011) Is Bayes posterior just quick and dirty confidence? Stat Sci 26(3):299–316
Fraser DAS, Reid N (2011) On default priors and approximate location models. Br J Probab Stat 25(3):353–361
Fraser DAS, Reid N, Marras E, Yi GY (2010) Default priors for Bayesian and frequentist inference. J R Stat Soc B 72(5):631–654
Fueglistaler S, Silvers LG (2021) The peculiar trajectory of global warming. J Geophys Res Atmos 126:e2020JD033629. https://doi.org/10.1029/2020JD033629
Glassmeier F, Hoffmann F, Johnson JS, Yamaguchi T, Carslaw KS, Feingold G (2021) Aerosol-cloud-climate cooling overestimated by ship-track data. Science 371(6528):485–489
Gregory JM, Stouffer RJ, Raper SCB, Stott PA, Rayner NA (2002) An observationally based estimate of the climate sensitivity. J Clim 15:3117–3121
Gryspeerdt E et al (2019) Constraining the aerosol influence on cloud liquid water path. Atmos Chem Phys 19:5331–5347
Gulev SK et al (2021) Changing state of the climate system. In: Climate Change 2021: The Physical Science Basis. Contribution of Working Group 1 to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V. et al. (eds.)]. Cambridge University Press
Hamilton DS et al (2018) Reassessment of pre-industrial fire emissions strongly affects anthropogenic aerosol forcing. Nat Commun 9(1):1–12. https://doi.org/10.1038/s41467-018-05592-9
Hartigan JA (1965) The asymptotically unbiased prior distribution. Ann Math Statist 36(4):1137–1152
Haywood AM et al (2020) The Pliocene Model Intercomparison Project Phase 2: large-scale climate features and climate sensitivity. Clim Past 16(6):2095–2123. https://doi.org/10.5194/cp-16-2095-2020
Jeffreys H (1946) An invariant form for the prior probability in estimation problems. Proc Roy Soc A 186:453–461
Kass RE, Wasserman L (1996) The selection of prior distributions by formal rules. J Am Stat Assoc 91(435):1343–1370
Klein SA, Hall A, Norris JR, Pincus R (2017) Low-cloud feedbacks from cloud-controlling factors: a review. Shallow clouds, water vapor, circulation, and climate sensitivity, pp 135–157
Köhler P, Bintanja R, Fischer H, Joos F, Knutti R, Lohmann G, Masson-Delmotte V (2010) What caused Earth's temperature variations during the last 800,000 years? Data-based evidence on radiative forcing and constraints on climate sensitivity. Quat Sci Rev 29(1–2):129–145. https://doi.org/10.1016/j.quascirev.2009.09.026
Lee SS, Chu JE, Timmermann A, Chung ES, Lee JY (2021) East Asian climate response to COVID-19 lockdown measures in China. Sci Rep 11(1):1–9. https://doi.org/10.1038/s41598-021-96007-1
Lewis N (2013b) An objective Bayesian improved approach for applying optimal fingerprint techniques to estimate climate sensitivity. J Clim 26:7414–7429
Lewis N (2014) Objective inference for climate parameters: Bayesian, transformation of variables and profile likelihood approaches. J Clim 27:7270–7284. https://doi.org/10.1175/JCLI-D-13-00584.1
Lewis N (2018) Combining independent Bayesian posteriors into a confidence distribution, with application to estimating climate sensitivity. J Stat Plan Inference 195:80–92. https://doi.org/10.1016/j.jspi.2017.09.013
Lewis N, Curry JA (2015) The implications for climate sensitivity of AR5 forcing and heat uptake estimates. Clim Dyn 45:1009–1023. https://doi.org/10.1007/s00382-014-2342-y
Lewis N, Curry JA (2018) The impact of recent forcing and ocean heat uptake data on estimates of climate sensitivity. J Clim 31:6051–6071
Lewis N, Grünwald P (2018) Objectively combining AR5 instrumental period and paleoclimate climate sensitivity evidence. Clim Dyn 50(5):2199–2216
Lewis N, Mauritsen T (2021) Negligible unforced historical pattern effect on climate feedback strength found in HadISST-based AMIP simulations. J Clim 34(1):39–55
Lewis N (2013a) Modification of Bayesian updating where continuous parameters have differing relationships with new and existing data. arXiv:1308.2791 [stat.ME]
Liu P, Kaplan JO, Mickley LJ, Li Y, Chellman NJ, Arienzo MM, Kodros JK, Pierce JR, Sigl M, Freitag J, Mulvaney R (2021) Improved estimates of preindustrial biomass burning reduce the magnitude of aerosol climate forcing in the Southern Hemisphere. Sci Adv 7(22):eabc1379. https://doi.org/10.1126/sciadv.abc1379
McClymont EL et al (2020) Lessons from a high-CO2 world: an ocean view from ~3 million years ago. Clim Past 16(4):1599–1615. https://doi.org/10.5194/cp-2019-161
Meinshausen M, Nicholls ZRJ, Lewis J, Gidden MJ, Vogel E, Freund M et al (2020) The shared socioeconomic pathway (SSP) greenhouse gas concentrations and their extensions to 2500. Geosci Model Dev 13:3571–3605. https://doi.org/10.5194/gmd-13-3571-2020
Mülmenstädt J, Salzmann M, Kay JE, Zelinka MD, Ma PL, Nam C, Kretzschmar J, Hörnig S, Quaas J (2021) An underestimated negative cloud feedback from cloud lifetime changes. Nat Clim Chang 11(6):508–513
Myers TA, Scott RC, Zelinka MD, Klein SA, Norris JR, Caldwell PM (2021) Observational constraints on low cloud feedback reduce uncertainty of climate sensitivity. Nat Clim Chang 11(6):501–507
Otto A et al (2013) Energy budget constraints on climate response. Nat Geosci 6:415–416. https://doi.org/10.1038/ngeo1836
Paulot F, Paynter D, Winton M, Ginoux P, Zhao M, Horowitz LW (2020) Revisiting the impact of sea salt on climate sensitivity. Geophys Res Lett 47:e2019GL085601. https://doi.org/10.1029/2019GL085601
Pawitan Y (2001) In all Likelihood: Statistical Modeling and Inference Using Likelihood Ch. 3.4. Oxford Univ. Press, 514 pp
Possner A, Eastman R, Bender F, Glassmeier F (2020) Deconvolution of boundary layer depth and aerosol constraints on cloud water path in subtropical stratocumulus decks. Atmos Chem Phys 20:3609–3621
Raftery AE, Schweder T (1993) Inference about the ratio of two parameters, with application to whale censusing. Amer Stat 47(4):259–264
Rugenstein M, Bloch-Johnson J, Gregory J, Andrews T, Mauritsen T, Li C et al (2020) Equilibrium climate sensitivity estimated by equilibrating climate models. Geophys Res Lett 47:e2019GL083898. https://doi.org/10.1029/2019GL083898
Schlund M, Lauer A, Gentine P, Sherwood SC, Eyring V (2020) Emergent constraints on equilibrium climate sensitivity in CMIP5: do they hold for CMIP6? Earth Syst Dyn 11(4):1233–1258. https://doi.org/10.5194/esd-11-1233-2020
Schweder T, Hjort NL (2002) Confidence and likelihood. Scand J Stat 29:309–332
Schweder T, Hjort NL (2016) Confidence, likelihood, probability. Cambridge University Press, p 500
Sherwood SC, Webb MJ, Annan JD, Armour KC, Forster PM, Hargreaves JC, Hegerl G, Klein SA, Marvel KD, Rohling EJ, Watanabe M, Andrews T, Braconnot P, Bretherton CS, Foster GL, Hausfather Z, von der Heydt AS, Knutti R, Mauritsen T, Norris JR, Proistosescu C, Rugenstein M, Schmidt GA, Tokarska KB, Zelinka MD (2020) An assessment of Earth's climate sensitivity using multiple lines of evidence. Rev Geophys 58(4):e2019RG000678
Smith CJ et al (2021) Figure and data generation for Chapter 7 of the IPCC's Sixth Assessment Report, Working Group 1 (plus assorted other contributions). Version 1.0. https://doi.org/10.5281/zenodo.5211357; Accessed 15 Sept 2021
Stevens B, Sherwood SC, Bony S, Webb MJ (2016) Prospects for narrowing bounds on Earth’s equilibrium climate sensitivity. Earth’s Fut 4(11):512–522. https://doi.org/10.1002/2016EF000376
Strommen K, Watson PA, Palmer TN (2019) The impact of a stochastic parameterization scheme on climate sensitivity in EC-Earth. J Geophys Res Atmos 124(23):12726–12740
Tierney JE, Haywood AM, Feng R, Bhattacharya T, Otto-Bliesner BL (2019) Pliocene warmth consistent with greenhouse gas forcing. Geophys Res Lett 46:9136–9144
Welch BL, Peers HW (1963) On formulae for confidence points based on integrals of weighted likelihoods. J R Stat Soc Ser B 25:318–329
Zelinka MD, Myers TA, McCoy DT, Po-Chedley S, Caldwell PM, Ceppi P et al (2020) Causes of higher climate sensitivity in CMIP6 models. Geophys Res Lett 47:e2019GL085782. https://doi.org/10.1029/2019GL085782
Zhao M et al (2016) Uncertainty in model climate sensitivity traced to representations of cumulus precipitation microphysics. J Clim 29:543–560. https://doi.org/10.1175/JCLI-D-15-0191.1
Zhou C, Zelinka MD, Dessler AE, Wang M (2021) Greater committed warming after accounting for the pattern effect. Nat Clim Chang 11(2):132–136. https://doi.org/10.1038/s41558-020-00955-x
Zhu J, Poulsen CJ (2021) Last Glacial Maximum (LGM) climate forcing and ocean dynamical feedback and their implications for estimating climate sensitivity. Clim Past 17(1):253–267
Acknowledgements
I thank Judith Curry, Ross McKitrick, and Frank Bosse for commenting on drafts, and Mark Zelinka for contributing data. I thank Tamas Bodai and another (anonymous) reviewer for constructive comments that helped improve the submitted manuscript. I acknowledge the World Climate Research Programme's (WCRP) Working Group on Coupled Modelling and the climate modeling groups for producing and making their output available.
Funding
No funding has been received for this work.
Ethics declarations
Conflict of interest
The author has no relevant interests to disclose.
Data availability
All data generated or analysed during this study are included in this published article or are available from publicly accessible archive sites, from its supplementary information files or from its cited references. Raw CMIP5 and CMIP6 abrupt4xCO2 simulation data are available at https://esgf-node.llnl.gov/projects/esgf-llnl/.
Code availability
Code to generate this paper's results is available from the author on reasonable request.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Lewis, N. Objectively combining climate sensitivity evidence. Clim Dyn 60, 3139–3165 (2023). https://doi.org/10.1007/s00382-022-06468-x