An observation-based scaling model for climate sensitivity estimates and global projections to 2100

We directly exploit the stochasticity of the internal variability, and the linearity of the forced response to make global temperature projections based on historical data and a Green’s function, or Climate Response Function (CRF). To make the problem tractable, we take advantage of the temporal scaling symmetry to define a scaling CRF characterized by the scaling exponent H, which controls the long-range memory of the climate, i.e. how fast the system tends toward a steady-state, and an inner scale \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau \approx 2$$\end{document}τ≈2 years below which the higher-frequency response is smoothed out. An aerosol scaling factor and a non-linear volcanic damping exponent were introduced to account for the large uncertainty in these forcings. We estimate the model and forcing parameters by Bayesian inference which allows us to analytically calculate the transient climate response and the equilibrium climate sensitivity as: \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$1.7^{+0.3} _{-0.2}$$\end{document}1.7-0.2+0.3 K and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$2.4^{+1.3} _{-0.6}$$\end{document}2.4-0.6+1.3 K respectively (likely range). Projections to 2100 according to the RCP 2.6, 4.5 and 8.5 scenarios yield warmings with respect to 1880–1910 of: \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$1.5^{+0.4}_{-0.2}K$$\end{document}1.5-0.2+0.4K, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$2.3^{+0.7}_{-0.5}$$\end{document}2.3-0.5+0.7 K and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$4.2^{+1.3}_{-0.9}$$\end{document}4.2-0.9+1.3 K. These projection estimates are lower than the ones based on a Coupled Model Intercomparison Project phase 5 multi-model ensemble; more importantly, their uncertainties are smaller and only depend on historical temperature and forcing series. The key uncertainty is due to aerosol forcings; we find a modern (2005) forcing value of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$[-1.0, -0.3]\, \,\,\mathrm{Wm} ^{-2}$$\end{document}[-1.0,-0.3]Wm-2 (90 % confidence interval) with median at \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$-0.7 \,\,\mathrm{Wm} ^{-2}$$\end{document}-0.7Wm-2. Projecting to 2100, we find that to keep the warming below 1.5 K, future emissions must undergo cuts similar to RCP 2.6 for which the probability to remain under 1.5 K is 48 %. RCP 4.5 and RCP 8.5-like futures overshoot with very high probability.


Introduction
The atmosphere is a complex system involving turbulent processes operating over a wide range of scales starting from millimeters at the Kolmogorov dissipation scale up to the size of the Earth, spanning over 10 orders of magnitudes in space. The dynamics are sensitive to initial conditions and there are deterministic predictability limits that are roughly equal to the eddy turn-over time (lifetime) of structures. For planetary scale structures in the atmosphere, the overall deterministic prediction limit of about 10 days corresponds to the scaling transition timescale w from the weather regime to the macroweather regime (Lovejoy and Schertzer 2013a).
The atmospheric components of GCMs exhibit the same weather-macroweather scaling transition as the atmosphere and similar predictability limits. Beyond this horizon, the internal variability has to be interpreted stochastically so that a single GCM run is only one realization of the random 1 3 process; at these timescales, weather models effectively become stochastic macroweather generators. For projections over multi-decadal timescales and beyond, multi-model ensembles (MME) that include several models are used. The mean of the MME is taken to obtain the deterministic forced component of temperature variability and average out the internal variability .
Emergent properties of the Earth's climate, i.e. properties which are not specified a priori, are then inferred from GCM simulations. The equilibrium climate sensitivity (ECS) is such a property; it refers to the expected temperature change after an infinitely long time following a doubling in carbon dioxide ( CO 2 ) atmospheric concentration. Another is the transient climate response (TCR), which is defined as the change in temperature after a gradual doubling of CO 2 atmospheric concentration over 70 years at a rate of 1% per year. However, it is not clear whether such emergent properties from computational models can be taken as genuine features of the natural world. The difficulty is that each GCM has its own climate ("structural uncertainty") and this leads to very large discrepancies in ECS and TCR between GCMs; this underscores the need for qualitatively different approaches which can narrow down the properties of the real climate directly from observations. The ecological consequences of global warming could be dire; therefore, better constraining climate sensitivity is of utmost importance in order to meet the urgency of adjusting economical and environmental policies. Since the United States National Academy of Sciences report (Charney et al. 1979), the likely range for the ECS has not changed and remains [1.5, 4.5] K . A likely range corresponds to a 66% confidence interval (CI) and a very likely range corresponds to a 90% CI (Mastrandrea et al. 2010). In the Fourth Assessment Report (AR4) of the Intergovernmental Panel on Climate Change (IPCC), the lower limit was revised upward by 0.5 K, in accordance with the very likely range for ECS of GCMs from the Coupled Model Intercomparison Project phase 3 (CMIP3) ([2.1, 4.4] K). In the Fifth Assessment Report (AR5), the lower limit returned to that of AR3 and earlier assessment reports because of new observationbased results, while the very likely range of CMIP phase 5 (CMIP5) GCMs ([1.9, 4.5] K) remained very close to the CMIP3 one.
In this paper, we extend the approach of Hébert and Lovejoy (2018) to make climate projections through 2100. The approach is based on historical data and a simple model of the system memory based on scaling symmetries. The output of our model is then evaluated against the instrumental record using Bayes' rules in order to obtain a probabilistic estimate of its parameters.
The paper is structured in 3 sections : methods and material, results and conclusion. The methods and material section is divided in three sub-sections. First, we introduce the linear response framework and then describe the scaling climate response function considered. Secondly, the radiative forcing, temperature and GCM simulations used will be described, and thirdly, we explain the method used for the estimation of the model parameters. The results section is also divided into three parts. The first sub-section presents the probability distribution functions for the parameters and applies them to decompose the anthropogenic and natural forced signals, and the internal variability. The second subsection estimates the ECS and TCR with the parameters found, and the third uses the same parameters to produce global projections to 2100 which are better constrained than a 32 CMIP5 GCMs MME with which they are compared.

The linear response framework
The approach used in this study builds on the work of Hasselmann and other authors who worked within the linear response framework applied to the climate (Budyko, Sellers, Schwarz, Li and Jarvis, Held et al., Von Hateren, Rypdal and Rypdal, Dijkstra, Geoffroy et al., Marshall et al.). Below we first provide a review of this work for context. The reader solely interested in the current approach can jump to Sect. 2.2 without loss of continuity.
The internal components of the Earth system are often far from thermodynamic equilibrium, yet, taken as a whole, the Earth is not so far from an energy balance with outer space, and, at any moment, the difference between the incoming and outgoing energy fluxes is stored in the soil, ocean and atmosphere. The deviations from energy balance are typically small-at the level of a few percent-and this justifies the linear response framework. In this paper, we consider "zero-dimensional" energy balance models in which the Earth's globally averaged temperature T(t) and forcing F(t) are anomaly time series, i.e. they represent deviations from their reference values.
The earliest linearized temperature response models were the Budyko-Sellers energy balance models based on the heat equation (Budyko 1969;Sellers 1969). These were originally one dimensional (zonally averaged) models, which, when globally averaged, are equivalent to the single "box" model (Hasselmann et al. 1993). Global energy balance box models are models of the temperature based on a homogeneous "box". The box has a spatially uniform temperature that stores energy according to its heat capacity, density and size. If there is a single box, and one asssumes Newton's law of cooling and that the heat crossing the surface is proportional to the first derivative of the order differential relationship with temperature, then, when perturbed, the Earth's temperature will relax in an exponential way to its new steady-state temperature. When extra boxes are added, they mutually exchange heat, leading to a total response that is the sum of exponentials. Hasselmann et al. (1993Hasselmann et al. ( , 1997 already noted that it was desirable to use the more general linearized framework of response functions. This, they argued, was because empirical box models with a small number of degrees of freedom "lose the detailed information on the climate state and therefore cannot be readily constrained to conform to the detailed linearized dynamics of a more realistic CGCM climate model." In this context, the response functions are called "climate response functions" (CRFs) and, following Hasselmann et al. (1993), we have a choice between the equivalent impulse CRFs G (t) and step CRFs G (t) . The subscript " " indicates Dirac delta function and " " is for its integral, the step (Heaviside) function ( = 1 for t > 0 and = 0 for t ≤ 0). Hasselmann et al. (1993) and especially Hasselmann et al. (1997) already pointed out the advantages of the step CRF that relates the forcing and temperature via: where the F � (t) is the time derivative of the forcing F(t) and s is the equilibrium climate sensitivity (ECS) with units of K per doubling of CO 2 (see Eq. 15 below). While Hasselmann et al. (1993) incorporated the sensitivity into the definition of G (t) , writing the response with the separate factor s has the advantage that G (t) is dimensionless and s and F(t) have their usual dimensions. For generality, we have also extended the range of integration to cover the entire past. In advocating Eq. 1, Hasselmann et al. (1997) pointed out that "The formulation of the climate response in terms of a response integral (i.e. step response) rather than in the traditional form of a differential equation for a box model has further advantages: it is not limited to simple low-order differential equations...". Another advantage, later emphasized by Marshall et al. (2014Marshall et al. ( , 2017, ) is that, physically, basing the theory on G (t) is equivalent to studying the temperature response in classical CO 2 doubling experiments; other advantages are discussed below. The equivalence between the impulse and step CRFs arises because the step function (t) is the integral of (t) so that: The temperature response in terms of G (t) rather than G (t) can thus easily be obtained by integrating Eq. 1 by parts to yield: (1) In the integration by parts, we used Eq. 2 and the boundary conditions G (0) = 0 and F(−∞) = 0 . Since causality requires G (t) = G (t) = 0 when t ≤ 0 , the former condition G (0) = 0 is satisfied by physical systems. Similarly, the relation F(−∞) = 0 is not a restriction as it can be regarded simply as the definition of a convenient reference level of the forcing.
Unfortunately, without more assumptions or information, the linear framework of Eq. 1 (or Eq. 3) is unmanageably general. In order to make progress, Hasselmann et al. (1997) proposed a response function consisting of a sum of N exponentials -effectively an N box model (although without using differential equations: the boxes were only implicit). Nevertheless, they ultimately chose N = 3 out of practical necessity-so as to fit GCM outputs. Following the more usual procedure of deriving the impulse responses from linear differential equations (where impulse CRFs are called "Green's functions"), Li and Jarvis (2009) used Laplace transforms to explicitly show that polynomial forcings of nth ordered differential equations (with constant coefficients and with n an integer), can quite generally be reduced to sums of exponentials. However, in the application part of their paper, they nevertheless used the value N = 3. The N exponential model was later advocated by van Hateren (2013), by the IPCC AR5 (2013, section 8.SM.11.2), and more recently by Frederiksen and Rypdal (2017). However, each exponential has its own amplitude and time constant so that even if we exclude the climate sensitivity parameter, the N exponential models have 2N parameters. This rapidly becomes an unmanageably large number so that in practice N = 2 or N = 3 are often chosen.
An interesting exception is van Hateren (2013) who used N = 6 , but avoided fitting the implicit 2N = 12 independent parameters by linking the amplitudes and time constants by a power law, calling the resulting four parameter model a "fractal climate response" model. His resulting step CRF model specifies four parameters: low and high frequency truncations, a logarithmic oscillation frequency and an overall scaling exponent. In the model we develop below, we eliminate the unnecessary oscillations and low frequency cutoff, thus reducing the step CRF to only two parameters. Using van Hateren's empirically fitted parameters yields an impulse response that over the range of about 6 months to 1000 years is within a factor of ≈ 2 of our two-parameter model described below. For comparison, the key exponent H (denoted by q in his notation) was estimated as −0.15 compared to our value of −0.5 +0.4 −0.5 (below). An approach related to the scaling CRF we develop below is that of Procyk et al. (2020) which is based on the Earth energy balance. A fractional generalization of conventional box models was considered and solved using Mittag-Leffler functions, often called generalized exponentials, also characterized by a scaling exponent H which was estimated as H ∈ [0.33, 0.44] and is equivalent to the negative of the SCRF's H (below).
Although these authors proposed exponentials largely on mathematical grounds, the majority of linear response theory applications attempt to give physical interpretations of their parameters, especially their time constants, and these have not been very satisfactory. If each exponential can be modelled by a box that effectively stores heat, then it is not clear what the box should represent physically. If one chooses the atmosphere (e.g. Dijkstra 2013), then one obtains a short relaxation time of the order of days, whereas if one chooses the ocean, then a wide range of time scales can be obtained depending on the thickness of the relevant ocean layer.
Several estimates of the fast , which would correspond to the rapidly equilibriating mixed-layer of the ocean, find values below 10 years: = 8.5 ± 2.5 years (Schwartz 2008), ≈ 4 years , ∈ [1, 6] years (Geoffroy et al. 2013), = 4.3 ± 0.6 years (Rypdal and Rypdal 2014). The estimates of the slow component, which would correspond to the deep ocean, are widely divergent and Geoffroy et al. (2013) concludes it should be between 60 and 700 years, an interval spanning an order of magnitude well above what can be directly probed from historical observations. The IPCC AR5 suggests a double exponential model with = 8.4 years and = 409.5 years.
But the box models are overly simple: in reality, the earth is highly heterogeneous with dynamical processes redistributing energy over a huge number of degrees of freedom, and a multitude of storage mechanisms, covering a wide ranges of scales. What we really need is a phenomenological model that is approximately valid for the globally averaged temperature, an equation that is valid at time scales longer than the weather scales (about 10 days). Once we accept that our equation is at best valid for averages (globally and over weather scales), it is no longer necessary to constrain the model to a single-or even a small number of degrees of freedom-slabs or boxes.
When F(t) is a step function, Eqs. 1 and 3 describe a linear system that is perturbed by a forcing and that subsequently relaxes to a new steady-state temperature. The problem is that the range of time scales over which the system reacts to a forcing is huge; indeed, due to scaling symmetries, as recognized by Rypdal (2012) andvan Hateren (2013), the response is closer to a power law. We therefore seek to go beyond exponentials, while still restricting our attention to CRFs that correspond to processes which can relax to a stable state of energy balance. To see what constraints such "generalized" relaxation imposes, the step CRF is particularly convenient.
For example, for a physical system, a finite step forcing: must give a finite response. Since in Eq. 1 we included the extra sensitivity factor s , without loss of generality we can consider only normalized step CRF such that: Since sF 0 is the new steady-state temperature, s is the usual ECS. In addition, G should be constrained to functions such that a steady state is established monotonically; we should exclude step responses that oscillate or that overshoot the steady state before returning to it: In addition, the response to a positive forcing should be positive so that combining these constraints, and using Eqs. 5, 6 and causality ( G (0) = 0 ) we obtain : Systems whose step CRFs respect Eqs. 6 and 7 thus define physically plausible generalized relaxation processes. As an example, the classical relaxation box/exponential model yields: where is the "relaxation time", the characteristic time associated with the return to a state of energy balance. We see that the approach to the steady-state is exponentially fast. Although the box model is usually specified via a differential equation, from the above, we see that it could equivalently be specified by G , box (t) ; indeed, it is easy to verify that G , box (t) satisfies the standard box-model relaxation equation: Taking derivatives, we also confirm that G , box (t) = G � , box (t) is indeed the impulse response for the operator.
Other simple CRF's have been proposed, notably the Dirac function itself with a lag t 0 : Which implies: This has been used by Lean and Rind (2008) (with t 0 = 10 years as part of multiple regression), and directly by Lovejoy (2014) who varied t 0 in the range 0-20 years. In both cases, the forcing was at annual or longer temporal resolution so the response was not really instantaneous as Eq. 10 implies.

Scaling relaxation processes
The Dirac CRF has essentially no memory, and the box model has an exponentially decaying one, neither is satisfactory, unless they are combined into a model with multiple response times. To be more realistic, the CRF should satisfy scaling symmetries (respected for example by the GCMs in control runs); we should therefore choose functions that correspond to scaling (power law) relaxation processes. The simplest scaling CRF (SCRF) that satisfies Eqs. 5 and 6 is: where the requirement H < 0 is needed so that lim t→∞ G (t) = 1 and the truncation at a small timescale is necessary so that G (0) = 0 . This step SCRF describes a power law relaxation process, thus converging more slowly, and realistically, to a steady-state than typical exponential models ( Fig. 1 ), with scaling exponent H ( H < 0 ) and a corresponding impulse SCRF: so that the impulse CRF is a also a truncated power law. Rypdal (2012) already proposed a similar CRF with H > 0 , which has the advantage of not needing the (12) truncation at small time scales. This allows the modelling of the high-frequency with the same scaling by the simple addition of a random white noise forcing to the deterministic forcing. This came at the expense of divergence at large time scales, the runaway Green's function effect , since any finite increase in forcing would lead to an ever increasing temperature response, i.e. an infinite ECS. An interesting alternative to model high-and lowfrequency regimes with scaling regimes can be obtained by the fractional generalization of the energy-balance equation, leading to a CRF with a similar (convergent) low-frequency behaviour as the SCRF (Procyk et al. 2020).
It is straightforward to analytically calculate the expected temperature increase using the truncated SCRF in Eq. 13 in response to specific forcing scenario such as to recover TCR and ECS (See Appendix A for details). With the same forcing scenario as ECS, we also define ECS 500 as the expected temperature 500 years after the CO 2 doubling rather than at infinity. This is a more relevant ECS measure from a human perspective and helps to illustrate the contribution to the ECS of the very long-memory beyond 500 years. The ratio of TCR to ECS (Eq. A10) changes from unity for H → −∞ to zero for H ≥ 0 where ECS diverges while TCR remains finite; conversely, the fraction of warming left between 500 years and infinity ( (ECS − ECS 500 )ECS −1 ) goes from zero to unity when H goes from negative infinity to zero and greater (Fig. 2).

Radiative forcing data
In this paper, we consider three sources of external forcing: solar and volcanic which are natural, and anthropogenic forcing which involves several forcing agents produced Fig. 1 A nondimensional comparison of the step SCRF (Eq. 12, red) with the classical two-box exponential step CRF (Eq. 8, blue). The nondimensional (step) forcing (black) is also shown, and the difference between this forcing and the response is the rate of energy storage. The parameters for the two-box exponential are the best estimates from Geoffroy et al. (2013): fast = 4.1 years , slow = 249 years , C = 7.3 year Wm −2 K −1 and C 0 = 106 year Wm −2 K −1 while for the SCRF we use = 2 years and show the curves for different H values indicated on the graph Fig. 2 The analytical ratio between TCR and ECS is shown as a function of the scaling exponent H for a high frequency cutoff = 2 years . The ratio of TCR to ECS 500 is also shown for equilibrium defined at 500 years (black) along with the true equilibrium at infinity (blue) and the leftover warming fraction (red) between 500 years and infinity by humans. The forcing is usually expressed in W m −2 ; however the climate sensitivity is commonly measured in K per doubling of CO 2 . Therefore, it is convenient to also define forcing as a fraction of the forcing imparted by a doubling of CO 2 concentration: F 2× CO 2 . The generally accepted (approximate) carbon dioxide concentration to forcing relationship is: where F CO 2 is the forcing due to carbon dioxide, is the carbon dioxide concentration and 0 is its pre-industrial value which we take to be 277 ppm . Therefore,

a) Greenhouse Gas Forcing
Anthropogenic influences on the climate have been recognized as the main driver of the global warming characteristic of the last century, and the related forcing is mostly due to historical changes in atmospheric composition. Future anthropogenic forcing is prescribed in four scenarios, the Representative Concentration Pathways (RCPs), established by the IPCC for CMIP5 simulations : RCP 2.6, RCP 4.5, RCP 6.0 and RCP 8.5 (Meinshausen et al. 2011), shown in Fig. 3. They are named according to the total radiative forcing in Wm −2 expected in the year 2100 and are motivated by complex economic projections, expected technological developments, and political decisions. The scenarios allow us to verify and compare results from our observations-based SCRF model with CMIP5 simulations. Generally, RCP 6.0 was left out of the analysis since fewer CMIP5 modeling groups performed the associated runs.
The measure of anthropogenic forcing F Ant used in this paper is the carbon dioxide equivalent F CO 2 EQ series given in the RCP scenarios. It corresponds to the combined effective radiative forcing produced by Long Lived Greenhouse Gases (GHG) F GHG : carbon dioxide, methane, nitrous oxide and fluorinated gases, controlled under the

Fig. 3 (top)
The anthropogenic aerosol forcing series used, F Aer RCP (blue) and F Aer Q a (black), are shown over the historical period and over the projection period until 2100 for RCP 2.6 (solid), RCP 4.5 (dashed), and RCP 8.5 (dotted); F Aer Q a was extended with the F Aer RCP series. (bottom) The greenhouse gas forcing series F GHG (blue) and the total anthropogenic forcing series, adding F Aer RCP (black) or F Aer Q a (red) to F GHG , are shown over the historical period and projection period for the 3 RCP scenarios considered, as above Kyoto protocol, ozone depleting substances, controlled under the Montreal Protocol, and aerosols.
Chapter 8 of the IPCC AR5 reports the increase from 1750 to 2005 of effective radiative forcings with a very likely (90%) confidence interval (CI); we summarized them in Table 1 to evaluate the relative uncertainty of the different anthropogenic forcing agents. Note that we will report likely and very likely (symmetrical) CI at the 66% and 90% confidence level, respectively, throughout this work (i.e. ±1 and ±1.645 standard deviations respectively), in accordance with the IPCC. The largest forcing increase stems from GHG, in particular carbon dioxide, and it has a relatively small uncertainty.
b) Aerosol forcing There are also negative contributions to anthropogenic forcing from aerosols' direct effect and indirect cloud albedo effects, both with very high relative uncertainties. The total anthropogenic change in effective radiative forcing is certainly positive, due to the strong GHG forcing, but the large uncertainty on aerosol forcing strongly dominates the total uncertainty. We therefore introduce the aerosol linear scaling factor as an extra parameter to scale aerosol forcing (see Eq. 21 below).
The aerosol forcing in the RCP files F Aer RCP is given implicitly; it can be obtained by subtracting the combined effective radiative forcing from gases controlled under the Kyoto protocol, F Kyt , and from those controlled under the Montreal protocol, F Mtl (Fig. 3) from the CO 2EQ forcing. F Mtl is given in CFC-12 equivalent concentration and we use the relation from Ramaswamy et al. (2001) to convert back to W m −2 .
The very likely CI given for the modern value, defined in 2005, of total aerosol forcing in the IPCC AR5 is [−1.9, −0.1] W m −2 , but Stevens (2015) (S15) demonstrates that a forcing more negative than −1 W m −2 is implausible and suggests, combined with results from Murphy et al. (2009) tightening the upper bound to −0.3 W m −2 , that the interval be revised to S15 proposes a three parameter model to derive the aerosol direct and indirect forcing directly from anthropogenic sulfur dioxide emissions Q a : where Q a is the annual anthropogenic sulfur dioxide emissions, is the direct effect coefficient, is the indirect effect coefficient and Q n is the mean natural sulfur dioxide atmospheric source strength. For a given set of parameters, we can obtain a forcing series which is highly correlated (r = 0.95) with F Aer RCP , and also with Q a (r = −0.996 ) itself since the log term is approximately linear near Q n . Therefore, we also include in the analysis an alternate aerosol forcing series for comparison which takes Q a as a linear surrogate for the total aerosol forcing such that : where the coefficient * is taken to be equal to −9.3 × 10 −6 W m −2 Tg −1 yr so that the modern (2005) value of F Aer Qa and F Aer RCP are both about −1.0 W m −2 . This method allows us to derive an aerosol forcing series (Fig. 3) directly from sulfur dioxide emission data that does not rely on GCMs. We decided to present as the main result estimates made using F AerRCP given it is based on a more accepted aerosol forcing series closer to that used in the MME we use for comparison, and results based on F AerQa are presented alongside to show the imporant downward impact on all warming metrics that would result if it were shown to be a more reliable indication of aerosol forcing. c) Neglected anthropogenic forcing Left out from our analysis are other sources of anthropogenic forcing for which only the direct radiative forcing increase is reported in AR5: tropospheric ozone, stratospheric ozone, statospheric water vapour from CH 4 , surface albedo from land use changes, surface albedo from black carbon aerosol on snow and ice and contrails. They were neglected given the uncertainty on the shape of the response and their small combined impact. Their total should be positive and therefore, the median estimates of sensitivity presented in this paper will possibly be biased high by a small amount.

d) Solar forcing
The two main natural forcings are solar and volcanic, but others include natural aerosols such as mineral dust and sea salt which will not be considered as they are not externally forced and depend on the internal variability of the system.
We use the recommended solar forcing F S for CMIP5 experiments shown in Fig. 4 (along with the volcanic Volcanic forcing F V 1 (blue) is shown alongside two damped versions. The black one is linearly damped by a constant 0.5 coefficient while the red one, F V 0.6 , is damped using Eq. 18 with = 0.6 . The solar forcing F S (orange) has been shifted down by −1.5 and amplified by a factor of 10 for clarity forcing). It is reconstructed by regressing sunspot and faculae time series with total solar irradiance (TSI) (Wang et al. 2005). To obtain the solar perturbation to radiative forcing, the TSI is divided by 4 due to the spherical geometry of the Earth, multiplied by the average co-albedo of the Earth (about 0.7) and the average value of the two 11-year solar cycles from 1882 to 1904 is removed to obtain an anomaly.
For the future, we take the solar cycle 23 (the last one before 2008) and reproduce it to extend the series as was recommended for the CMIP5 experiments. e) Volcanic forcing Contrary to other forcings, there was no standard volcanic forcing series prescribed for CMIP5 experiments. The volcanic forcing F V used here was derived from the optical depth V using the approximate relation for instantaneous forcing. The series for V and the relation to forcing were obtained from the Goddard Institute for Space Science (GISS) website (Sato 2012). The volcanic forcing covers the period from 1850 to 2012 and it was kept null for its extension into the future as was prescribed for CMIP5 experiments. We extend it back to 1765 using the optical depth reconstruction of Crowley et al. (2008). To obtain the radiative forcing, the series was multiplied by a factor of −24 W m −2 so that the total forcing following the Pinatubo eruption from 1991 to 1996 is equal to the Sato (2012) dataset over the same period.
The response to volcanic forcing is crucial in improving the estimation of parameters, especially the scaling exponent H, as it provides insight into the climate system on fast timescales (inter-annual); within this scaling model, there is no characteristic scale and it has implications for longer timescales. Volcanic forcing is peculiar as it is strong and highly intermittent. The intermittency can be quantified by the parameter C 1 which corresponds to the fractal codimension (i.e. 1 minus the fractal dimension) characterizing the sparseness of volcanic "spikes" of mean amplitude (see Lovejoy and Schertzer 2013b;Lovejoy and Varotsos 2016), with large eruptions producing negative forcing spikes of magnitude greater than F 2× CO 2 ; on the other hand, the instantaneous temperature response is weaker than expected using linear response theory.
It was found that even though volcanic forcing dwindles away quickly, it has noticeable effects on the climate at decadal timescales and longer by sharply reducing the ocean heat intake (Church et al. 2005;Stenchikov et al. 2009). This, in fact, corresponds to the physical mechanism behind the long-range memory to forcing in the linear response framework, and it acts to reduce the instantaneous impact of the volcanic forcing. Gregory et al. (2016) found that this reduction in ocean heat intake explains most of the discrepancy between forcing and response, but also added that "the magnitude of the volcanic forcing [...] may be smaller in AOGCMs (by 30 % for the HadCM3 AOGCM) than in off-line calculations that do not account for rapid cloud adjustment".
The volcanic response appears to be non-linear as the intermittency ("spikiness", sparseness of the spikes) parameter C 1 changes from about C 1F V ≈ 0.2 for the input volcanic forcing to C 1T ≲ 0.1 for the temperature response :the latter is therefore much less intermittent than the former. Theoretically, a linear response model with a power-law Green's function cannot alter the intermittency parameter, although these estimates are sensitive to finite size effects and internal variability (Lovejoy and Varotsos 2016).
Since the volcanic forcing is too strong and too intermittent, using it within the SCRF framework requires the use of an effective volcanic forcing series if we are to use it in the linear response framework. The following theoretically motivated non-linear relation damps the amplitude of the volcanic forcing while also reducing its intermittency parameter : where F V is the damped effective volcanic forcing, is the damping exponent and < F V > is the mean F V . The damping exponent required to reduce the intermittency parameter of the volcanic temperature response, C 1T V , can be theoretically calculated using the relation : where MF is the multifractality index of the volcanic forcing (e.g. Lovejoy and Schertzer 2013b). For MF ≈ 1.6 , we find ≲ 0.65 . This calculation might underestimate , because the temperature variability also includes the response to the less intermittent forcing (anthropogenic and solar) as well as the internal variability, which is quasi-Gaussian with C 1 ≈ 0 . Given the difficulty in estimating C 1 and MF to calculate , we introduce as a free parameter and estimate it directly.
Another avenue would be to simply introduce a linear scaling factor to reduce the amplitude of the volcanic forcing, but this would not change the nondimensional spikiness. The two methods are approximately equal for medium size eruptions, but strong peaks get reduced further in the nonlinear damping case (see Fig. 4). From the point of view of maximizing the variance explained by the forced response in the temperature record, the linear damping method would produce very similar result, but we choose the non-linear damping method simply because it is better justified theoretically and decreases the intermittency parameter C 1 .

Surface air temperature data and cmip5 simulations
Our analysis was performed on 5 observational records of surface air temperature each spanning at least the period  (Smith et al. 2008) and Berkeley Earth Surface Temperature (BEST) (Rohde et al. 2013). The HadCRUT4 dataset is a combination of the seasurface temperature records compiled by the Hadley Centre of the UK Met Office with the land surface air temperature records compiled by the Climate Research Unit in East Anglia; the C and W dataset uses HadCRUTv4 as raw data, but aims to address coverage bias by infilling missing data by kriging; the dataset with land air temperature anomalies interpolated over sea-ice was used. GISTEMP is produced by the Goddard Institute for Space Studies by combining the Global Historical Climate Network version 3 (GHCNv3) land surface air temperature records with the Extended Reconstructed Sea Surface Temperature version 4 (ERSST) and the temperature dataset from the Scientific Community on Antarctic Research (SCAR). NOAA National Climatic Data Center also uses GHCNv3 and ERSST, but with different quality controls and bias adjustements. BEST uses its own land surface air temperature product combined with a modified version of HadSST.
The CMIP5 models used are presented in Table 2. The 32 selected GCMs have historical simulation outputs available for the period from 1860 to 2005 and outputs of scenario runs over 2005-2100 for RCP 2.6, RCP 4.5 and RCP 8.5.

Parameter estimation
We have now presented the SCRF and we need to establish a procedure to estimate its parameters s , , and H as well as the forcing parameters and . To achieve this, we relate the temperature and forcing data introduced above through the SCRF in a multi-parameter Bayesian estimation scheme.
The radiative forcing data is used to produce a theoretical forced temperature response by means of convolution with the SCRF. The convolution was implemented numerically as a discrete sum : where T F is the forced temperature response, H is the scaling exponent, s is the sensitivity to integrated radiative forcing, is the high-frequency cutoff and F(t i ) is the annual forcing series F(t) linearly interpolated to a resolution such that t � = t � i+1 − t � i = 0.05 year ; this resolution is much smaller than ≈ 2 years (see below) and it was found to be sufficient to produce a temperature response with negligible error compared to the analytic response for different polynomial forcing scenarios while remaining computationally efficient.
Due to the assumption of linearity, the forcing series used can be written as the sum of the constituent forcings : This includes the two extra parameters discussed earlier: the aerosol linear scaling factor and the volcanic non-linear damping exponent . This allows us to take into account the uncertainty on the forcing when estimating model parameters; the uncertainties on F GHG and F S are thus neglected, because they are overwhelmed by the uncertainty on F Aer . The uncertainties add in quadrature and therefore, if we leave out the uncertainty on F GHG when adding to the uncertainties of the forcing from the aerosol direct and indirect effects, we only underestimate the total uncertainty (at the 90% confidence level) by about 0.01 F 2× CO 2 (using the errors given in Table 1 for the total change over the 1750-2005 period). The amplitude of F V is sensitive to and its parameter spread is informative of the related uncertainty in the volcanic forcing.
All the series used for parameter estimation were adjusted to the same anomaly level so that their mean values were zero over the reference period of 1880-1910. There are thus five parameters to determine. A time-dependent forced response T Forced (s, H, , , ;t) is calculated for each parameter combination and removed from the temperature series to obtain a series of residuals which represent an estimator ̂ T Internal of the historical internal variability.
where T Obs is an observational temperature dataset. This allows us to calculate the likelihood function L to be maximized which corresponds to the probability Pr of the internal variability to follow our assumed error model : The error model we use is a fractional Gaussian noise (fGn) with zero mean (see Lovejoy et al. , 2017, and therefore the residuals are not independently distributed. This model approximates well the scale dependence of the internal variability, and so even if it is misspecified, it will be superior to non-parametric approaches (Poppick et al. 2017;. Using Bayes' rule, we can derive a probability distribution function (PDF) for the parameters of interest: where (s, H, , , ) corresponds to the prior distribution for the parameters. The prior distribution for the high-frequency cutoff is taken to be a Dirac-delta at 2 years. From the point of view of the parameter estimation, this choice has a marginal effect on the results since it is the ratio of H to which is most determinant for sensitivity estimates at equilibrium, and H is allowed to vary. This weak assumption makes the estimation procedure more manageable by reducing the number of free parameters to four : H, s , and . See Appendix B for a detailed discussion concerning this choice of , and the sensitivity of the result to this choice.
Given no previous knowledge of s and , we simply assume a non-informative uniform prior over the range of parameters considered. We know that the scaling exponent H should be negative since the ECS would diverge if H ≥ 0 , and therefore, we need a prior in H which discards such nonphysical values. For each value of H, there is a unique TCR to ECS ratio which can be calculated (shown on Fig. 2). We make use of this fact to derive a prior in H based on the TCR to ECS ratios from an ensemble of 23 CMIP5 GCMs (Yoshimori et al. 2016). In order to minimize the impact of this choice, we take a wide uniform distribution covering the mean ratio plus or minus 4 times the standard deviation, NorESM1-ME (32) i.e. between 0.22 and 0.91, which yields a corresponding uniform distribution in H between −1.1 and −0.1 . Regarding , we take as the prior distribution a normal distribution N(1.00, 0.55) which has a 90% CI coherent with the modern value for F Aer from the IPPC AR5, [−1.9, −0.1] W m −2 , since the modern value of F Aer ≈ −1.0 W m −2 in the series we used. The efficacy of the parameter estimation method is tested on synthetic temperature series in Appendix C.

Scaling climate response function
In this section, using Bayes' theorem as described above, we derived a PDF for the parameters of the SCRF from the mean likelihood of the five observational datasets available: HadCRUTv4, C&W, GISTEMP, NOAAGlobalTemp and BEST. Two different series were used for the aerosol forcing in equation 21: F Aer RCP and F Aer Q a , and the results are compared. Estimates and CIs are rounded to the resolution used for the likelihood function of the variable.
The PDF for the aerosol linear scaling factor , using F Aer RCP (Fig. 5: bottom left, solid line), yields a 90% CI of [0.1, 1.3] with median at 0.8; for comparison, a uniform prior in , and a prior consistent with S15, would yield the median values of 0.6 and 0.7 with CIs of [0.2, 1.2], and [0.4, 0.9] respectively; the posterior result is therefore not sensitive to the prior assumption. Given that in 2005 F Aer RCP ≈ −1 W m −2 , the negative value of translates into confidence intervals for the modern aerosol forcing. It is interesting to note that we recovered a 90% CI close to what had been argued by S15, namely [−1.0, −0.3] W m −2 , thus reinforcing his claim to decrease the uncertainty on aerosol forcing using a completely independent approach, although more comprehensive estimates still support the wider range from the IPCC's AR5 (Bellouin et al. 2020). On the other hand, the result using F Aer Q a supports a significantly weaker (better constrained) aerosol forcing with an median value at 0.2 and a 90% of [−0.2, 0.6] (Fig. 5: bottom left, dashed). The sulfur dioxide emission series Q a , and therefore F AerQ a , shows a sharp increase between 1950 and 1980, sharper than F AerRCP . Meanwhile, the global mean temperature only decreases marginally and therefore, for F AerQ a needs to be smaller for a good agreement.
The volcanic damping exponent was found (when using F Aer RCP ) to have a 90% CI of [0.25, 0.85] with median value at 0.55 (Fig. 5: bottom right, solid line) and using F Aer Q a yielded a slightly higher median of 0.60 with a 90% CI of [0.30, 0.90] (Fig. 5: bottom right, dashed line). The datasets which tend towards a lower (i.e. smoother volcanic forcing), namely NOAAGlobalTemp and GISTEMP, are also those with stronger statio-temporal filtering, and therefore, a smoother volcanic response. These results confirm that volcanic forcing is generally overpowered since = 1 has practically null probability in both cases. This means that using the original volcanic forcing series described above without the non-linear damping does not reproduce well, within the SCRF model presented, the cooling events observed in the instrumental records following eruptions: the cooling would be too strong. It also shows that the volcanic contribution is essential since values near zero, ; for each case the probabilities over the remaining parameters were integrated out. Shown alongside are the corresponding PDFs for the parameter estimation based on both F Aer RCP (solid) and F Aer Q a (dashed). The average PDFs (purple) from the five observational datasets are shown after the prior distribution has been applied; the one using F Aer RCP is shown as the main result with shading and darker 5% tails which effectively corresponds to a constant (weak) mean volcanic forcing, are also ruled out. We also obtain from Eq. 19, taking MF ≈ 1.6 and C 1F V ≈ 0.2 , that the intermittency parameter C 1 of the effective volcanic forcing, and also the linear volcanic temperature response, is C 1T V = 0.07 +0.08 −0.05 at the 90% confidence level for the F Aer RCP result, and C 1T V = 0.09 +0.07 −0.06 for the F Aer Q a result. The most crucial parameter in our model is its scaling exponent H which is the main determinant for ECS estimates made later. We found, when using F Aer RCP , a 90% CI of [−1.0, −0.1] , with median value at −0.5 (Fig. 5: top left, solid line). Using F Aer Q a did not significantly change the median estimate of H and the 90% CI (Fig. 5: top left, dashed line). The purpose of H is somewhat similar to that of an ocean diffusivity parameter in a one-dimensional SCM and we see that in fact, the datasets using HadSST: HadCRUTv4, C&W and BEST, yielded PDFs closer to each others than to those using ERSST: NOAAGlobalTemp and GISTEMP.
We can therefore identify and separate the anthropogenic and natural components of the forced variability, T Anthro and T natural respectively, from the observational temperature series T Obs to obtain the internal variability T Internal , see Fig. 6.
T Anthro and T Natural are obtained by the convolution of the associated forcing with the SCRF; solar and volcanic forcing in the case of T Natural , and greenhouse gases and aerosol forcing in the case of T Anthro . Our result confirms that much of the 20 th century warming is indeed human-induced, while much of the decadal scale variability can be attributed to natural forcing, and internal processes. In the projection period, after 2015, T Natural brings a positive contribution to the temperature anomaly since the prescribed solar forcing is a repetition of the anomalously high solar cycle 23. In reality, the natural forcing for the future will be quite different than those prescribed here for conformity with CMIP5 experiments.

Climate sensitivity
The paleoclimate record has been used to estimate a likely range for ECS (equivalent to our parameter s) by comparing the temperature and forcing changes over past periods such as the last millennium (Hegerl et al. 2006: [1.9, 4.3] K) and the last glacial maximum (LGM) (Schmittner et al. 2011:[1.4, 3.5] K; von der Heydt et al. 2014: [2.0, 2.6] K) There is evidence that the climate sensitivity depends on the mean climate state and therefore the modern ECS does not necessarily correspond to past ECS (Crucifix 2006;Yoshimori et al. 2011). In fact, there are many methodological challenges in order to fairly compare those different (H, , , , t) estimates, and in their extensive review on the subject, PALEOSENS Project Members (2012) considered 21 different paleoclimate studies in order to derive a [2.2, 4.8] K likely range for ECS.
An alternative approach developed to obtain ECS from the instrumental period are one-dimensional simple climate models (SCMs) with few parameters, among which the ECS is specified rather than being measured as an emergent property. The outputs of SCMs are then evaluated against the observational record, usually within a Bayesian framework, to find the best parameter combination and derive a range for the ECS : Recently, Sherwood et al. (2020) have conducted a comprehensive studies of all those lines of evidence and concluded that an ECS below 2 K was hard to reconcile with feedback-, historial-and paleoclimate-based estimates, while estimates paleoclimate-based estimates provide the strongest evidence against an ECS above 4.5 K. Their Bayesian Fig. 6 (top) The mean observational temperature series T obs (green and shifted up by 0.5 K) has T Natural removed and the residual T Obs − T Natural (red) is compared with T Anthro (black) calculated using F Aer RCP ; they are highly correlated (r=0.94). (bottom) T Anthro is removed from T Obs and the residual T Obs − T Anthro (orange) is compared with T Natural (blue); the are correlated with r = 0.51. Once T Natural is also removed, we obtain T Internal (green and shifted up by 0.5 K), which is thus uncorrelated with T Natural . The error (shaded) given for each curve correponds to the 90% CI derived from the estimated parameters, except in the case of the observational data where it correponds to the spread between the datasets (1.645 standard deviation, by analogy with a 90% CI) analysis yielded a likely range for ECS of [2.6, 3.8] K, with 3.1 K at median, and a very like range [2.3, 4.7] K.
Similarly to SCMs, with probabilistic estimates of our model parameters it is straightforward to calculate the associated temperature response to any forcing scenario; this allows us to derive PDFs for common measures of climate sensitvity: TCR and ECS. In addition, we define ECS 500 as the temperature change 500 years after a step-function doubling in carbon dioxide concentration instead of at infinity.
Using Eq. A9 for each parameter combination with the associated probability assigned, we derived the PDFs for TCR shown in Fig. 7 using a uniform prior in TCR. Our result is slightly lower and better constrained than the one given in the IPCC AR5: a [1.0, 2.5] K likely range with best value at around 1.8 K. Using F Aer RCP , we found a median TCR of 1.7 K with a likely range of [1.4, 2.0] K, and when using F Aer Q a the median is revised downward to 1.3 K with an even slimmer likely range of [1.1,1.5] K.
The PDFs for ECS (Fig. 8) are calculated with a uniform prior distribution in ECS and a likely range subset of the corresponding IPCC AR5 likely range of [1.5,4.5] K was found. Using F AerRCP , the ECS PDF found is skewed towards higher ECS with a likely range of [1.8,3.7] K with its median at 2.3 K. The result for F AerQ a yields a likely range of [1.5,2.7] K with its median at 1.8 K. Both results are coherent with the IPCC AR5 likely range, albeit on its lower side. A large fraction of the expected warming for the models with very high ECS only occurs at long timescale as a consequence of the slow convergence time when H → 0 − . This is obvious when considering ECS 500 , calculated with Eq. A5, for we obtain more symmetrical distributions than for ECS: a likely range of [1.7, 2.8] K with its median at 2.2 K for F AerRCP , and a likely range of [1.5, 2.1] K with its median at 1.7 K. The median ECS we found when using F Aer RCP , which is derived from GCM experiments, is 0.6 K higher than when using F Aer Q a which is entirely based on historical observations. The latter is close to other observation-based estimates with low-ECS (Ring et al. 2012;Skeie et al. 2014) while the former is closer to high-ECS observation-based estimates (Meinshausen et al. 2009;Bodman et al. 2013;Otto et al. 2013;Johansson et al. 2015) as well as that of Sherwood et al. (2020), but both are lower than the 3 K best value reported in AR5 which is very close to CMIP3 and CMIP5 GCMs based estimates. All the ECS results are summarized in Table 3

Projections to 2100
Using Eq. 20, we are now able to use the SCRF to reconstruct the forced temperature variability over the historical period and make projections for the coming century according to the RCP scenarios. We compare these purely observations-based projections with those from the CMIP5 MME considered (32 GCMs). The CI given for the MME correspond to the spread between the different GCMs, i.e., the structural uncertainty, since a full probabilistic characterization of uncertainty is generally not possible with GCMs given the hundreds of degrees of freedom involved. In addition, we also show the SCRF projections after recalibrating the sensitivity coefficienct s on the historical part of the MME, thus confirming that its forced response is also linear and that the memory in the MME projections is close to that of the SCRF. Furthermore, the ability of the SCRF method was shown for the majority of individual CMIP5 models by calibrating parameters over the low-emission scenario RCP 2.6 and projecting the medium-and high-emission scenarios, RCP 4.5 and RCP 8.5 (Hébert 2017).
Over the 1860-2000 period, the reconstructed forced temperature variability produced by the SCRF, whether using F AerRCP or F AerQ a , and the mean of the CMIP5 MME track each other closely (Fig. 9, top left). There is only a small gap between the two over the 1915-1960 period when the CMIP5 MME is consistently warmer, but generally by less Fig. 7 The PDF for TCR is calculated using F Aer RCP (solid) and F Aer Q a (dashed). The associated likely intervals (66%) (bars under the axis) are tighter than the IPCC likely range (gray shading) with lower median After 2000, the SCRF reconstruction accurately follows the so-called hiatus while the CMIP5 MME overshoots. This was also shown by  with a simple Dirac CRF and an effective sensitivity to CO 2 . In Schmidt et al. (2014), the overshoot of the CMIP5 MME is attributed to a combination of conspiring factors, mainly errors in volcanic and solar input, in representation of aerosols and in the evolution of El-Niño. In fact, an impulse-response model, similar to what we are using here, is used by Schmidt et al. (2014) to accurately retroadjust the CMIP5 projection ensemble. We did not investigate the effect of those adjustments on the CMIP5 MME for future projections and simply considered the original GCM results.
The low-emission scenario, RCP 2.6, is of particular interest since the dominant anthropogenic forcing starts decreasing around the mid-2040s and allows us to observe the "warming in the pipeline" (Hansen et al. 2011) due to the large thermal inertia of oceans, which is modeled here by the long-range memory to radiative forcing of the SCRF. This means that the upper bound of the projection (higher H) keeps increasing up to 2100 even though the forcing has been decreasing since 2045, while the lower bound (lower H) has less memory and begins to decrease as early as 2055. The median SCRF surface temperature projection for F Aer RCP (or F Aer Q a ; hereafter, the SCRF result using F AerQ a is given in brackets after the result using F AerRCP ) under RCP 2.6 exhibits a behaviour close to a stabilization over the 2043-2100 period, reaching, in 2100, 1.5 +0.4 −0.2 K (or 1.3 +0.3 −0.2 K ) at the 90% confidence level compared to 1.7 ± 0.8K for the MME (Fig. 9, top right).
The behaviour of the SCRF projection is in fact similar to the MME result over 2043-2100 in that regard, which has an average projection which also shows stabilization, and thus we can project the MME rather well after recalibrating the sensitivity coefficient s . Out of the 32 CMIP5 models analyzed and numbered for reference in Table 2, 12 showed a significant cooling trend between 2043 and 2100 (2,5,10,11,13,15,16,18,22,24,28,29) while 9 showed a significant warming trend over the same period (4,6,7,8,12,21,25,26,30); the remaining 11 models did not show a significant trend in either direction.
The forcing of the middle scenario, RCP 4.5, stabilizes in the mid 2060s, but the temperature projections, in Fig. 9 (bottom left), continue rising until 2100, both with the SCRF and the CMIP5 MME, and reaching 2.3 +0.7 −0.5 K (or 1.9 +0.4 −0.3 K ), at the 90 % confidence level, and 2.6 ± 0.8K respectively. Both projections for the business as usual scenario, RCP 8.5, show warming at a staggering rate up to 4.1 +1.3 −0.9 K (or 3.4 +0.7 −0.5 K ) and 4.8 ± 1.3K , respectively, in 2100 as shown on Fig. 9 (bottom right). For the RCP 4.5 and RCP 8.5 scenarios, the SCRF projection has 25% (44%) and 15% (46%) less uncertainty than the MME spread and the median is colder by 0.3 K (or 0.7 K) and 0.7 K (or 1.4 K) respectively. In fact, the SCRF projection approximately corresponds to the lower half (or the lower quarter) of the CMIP5 MME projection range. International negotiations often invoke 1.5 K and 2 K threshold not to be crossed in order to avoid a major climatic upheaval. The probability when those temperature are reached under each scenario can be calculated for the SCRF and the CMIP5 MME (Fig. 10). For the latter, we assumed the global mean temperature of the MME every year to be normally distributed in order to obtain the probability of having crossed the given threshold at that time. Generally, the increase in probability as a function of years is sharper for the SCRF than for the CMIP5 MME given the smaller uncertainty on the projections.
For the low-emission scenario RCP 2.6, it is likely that the 1.5 K threshold would be exceeded in 2100 according to the SCRF projections with 54% probability (or 19%), while it was slightly more likely for the CMIP5 MME with 67% probability. It is very likely (>97%) that the 2 K threshold would not be crossed in 2100 according to the SCRF projection while the CMIP5 MME still shows 26% probability of exceeding.
For the medium-emission scenario RCP 4.5, the SCRF asserts it is extremely likely (>95%) that the global temperature will be above 1.5 K as early as 2038 (or 2059); for the CMIP5 MME, it becomes extremely likely after 2050. The 2 K threshold on the other hand will not as certainly be crossed in 2100 according to the SCRF projections as it has 94% (or 40%) of overshooting, similarly to the CMIP5 MME which shows a 89% probability of overshoot in 2100.
For the high-emission scenario RCP 8.5, all methods show a high probability of a warming exceeding 2 K before 2100. According to the SCRF, the risk of overshooting 1.5 K is negligible before 2024 (or 2028), but extremely likely after 2036 (or 2047), similarly to the CMIP5 MME which reaches the 95% probability of overshooting 1.5 K in 2038. The 2 K threshold is also extremely likely to be crossed about 15 years later in 2055 for both the SCRF and CMIP5 MME (or 2068). Fig. 9 The median forced temperature variability is projected using the SCRF, with the parameters calculated using F Aer RCP (blue) or F Aer Q a (green), and compared with the CMIP5 MME projection (black) and a SCRF projection of the MME (orange) as well as the historical temperature observations (red); 90% CI are indicated along the projections (shaded) and for the year 2100 (thick vertical lines). The historical period (top left) and the projections until 2100, for RCP 2.6 (top right), RCP 4.5 (bottom left) and RCP 8.5 (bottom right), are shown Fig. 10 The probability for the global mean surface temperature of exceeding a 1.5 K threshold (top), and a 2 K threshold (bottom) are given as a function of years for the SCRF, using F Aer RCP (solid) and using F Aer Q a (dashed), and for the CMIP5 MME (circles). The three RCP scenarios are considered for each case: RCP 2.6 (blue), RCP 4.5 (black) and RCP 8.5 (red)

Conclusion
Multidecadal climate projections rely almost exclusively on deterministic global climate models (GCMs) in spite of the fact that there are still very large structural uncertainties between Coupled Model Intercomparison Project phase 5 (CMIP5) GCMs, i.e. each has its own climate, rather than the real world climate. Climate skeptics have argued that IPCC projections are untrustworthy precisely because they are entirely GCM based. While this conclusion is unwarranted, it underscores the need for independent and qualitatively different approaches. It is therefore significant that the alternative GCM-free approach we present here yields comparable results albeit with smaller uncertainty.
This motivated us to elaborate a model, based on the scaling of climate processes, for the response of the global mean air surface temperature of the Earth to external forcing: the scaling climate response function (SCRF). The forced component of temperature variability is reconstructed from external forcing within the linear response framework with a power-law scaling Green's function truncated at highfrequency. The stochastic component ultimately due to the internal turbulent dynamics of the atmospheric system is approximated by a fractional Gaussian noise process, as was proposed in the ScaLing Macroweather Model (SLIMM) . Similarly, GCMs yield stochastic internal variability with an approximately linear mean forced response (Meehl et al. 2004), and we showed that in fact the SCRF model can project their forced response rather accurately.
Our model is robust and by restricting the parameter space, it allows for a full probabilistic characterization of uncertainty by Bayesian inference. A by-product of our analysis is a better constrained aerosol forcing since we found the aerosol linear scaling factor to be within a 90 % CI of [0.1, 1.3] for the RCP aerosol forcing F Aer RCP . This supports a revision of the global modern aerosol forcing 90 % confidence interval to a narrower [−1.3, −0.1] W m −2 , similar to Stevens (2015). On the other hand, we obtain a very weak aerosol forcing if instead we use F Aer Q a , which was reconstructed directly from sulfur dioxide emissions using a linearized version of Stevens' proposed model (Eq. 16). While the difference between the aerosol series might arise from a misreprensation of aerosol effects in GCMs which were used to produce F AerRCP , errors could also arise because of deviations from linearity with respect to SO 2 emissions due to other aerosol species not explicitly taken into account to produce F AerQ a .
Following others (Church et al. 2005;Stenchikov et al. 2009;Lovejoy and Varotsos 2016), we also found that the volcanic forcing was generally over-powered and overly intermittent, or too "spikey", to produce results, within the SCRF framework, consistent with instrumental data. An effective volcanic forcing with lower-intermittency was obtained with a non-linear damping by the exponent . It was found to be within [0.25,0.85] at the 90% confidence level using F AerRCP (or [0.30,0.90] using F AerQ a ), yielding a corresponding CI of [0.02,0.15] (or [0.03,0.16] using F AerQ a ) for the intermittency parameter C 1 of the effective volcanic forcing, which is compatible with the lower intermittency of the temperature.
Our analysis supports better constrained TCR and ECS likely range than the IPCC AR5. When using F Aer RCP (or F Aer Q a ), the range shrinks from [1.0, 2.5] K to [1.4, 2.0] K for the TCR (or [1.2, 1.5] K) and from [1.5, 4.5] K to [1.8, 3.7] K for the ECS (or [1.5, 2.7] K); the median estimates also decrease from 1.8 K to 1.7 K (or 1.4 K) for the TCR and from 3.0 K to 2.4 K (or 1.8 K) for the ECS. This agrees with other recent observation-based studies (Otto et al. 2013;Skeie et al. 2014, andJohansson et al. 2015) which also support a downward revision of the ECS upper 17% bound by at least half a degree. In addition, the ECS 500 was found to be significantly smaller, 2.2 +0.6 −0.5 K (or 1.7 +0.4 −0.2 K ), than the ECS. This implies that if the ECS is on the higher end of the CI, then a large fraction of the warming would be experienced hundreds of years after a potential stabilization of anthropogenic forcing. An important and rather conservative claim supported by this evidence is therefore that the upper 5% ECS bound and median of AR5 can be safely revised downward to 4.0 K and 2.5 K. The lower 5% bound of 1.5 K, on the other hand, remains reliable.
Our ECS likely range is therefore better constrained than paleoclimate estimates such as Hegerl et al. (2006) and PALEOSENS Project Members (2012) who found [1.9, 4.3] K and [2.2, 4.8] K respectively. Our method is strictly based on modern instrumental data and the low uncertainty could be an "epoch bias", i.e. if climate sensitivity depends on the climate mean state, then an estimate based on only one period would not necessarily be representative.
Our historical approach also decreases the uncertainty on projections by more than a factor of two, even with the large uncertainty associated with aerosol forcing, compared to the structural uncertainty of a multi-model ensemble (MME) of CMIP5 global climate models for RCP scenarios. However, the uncertainties compared are qualitatively different since it is not possible to charaterize probabilistically the entire set of parameters involved in GCMs. The structural uncertainty in a MME is rather based on the dispersion between the GCM runs, each produced with a definite set of parameters. Our approach on the other hand does not yield any structural uncertainty since we assumed a single CRF which is able to produce a wide range of climate responses.
The SCRF projections to 2100 are entirely independent of the GCMs. Still, they are within the uncertainty bounds of the latter, effectively providing an independent confirmation of the GCM projections. This eliminates one of the key climate skeptic arguments: projections are not reliable since they are solely GCM-based. This conclusion is therefore important not only for scientists but also for policy makers, stressing the need for mitigation and adaptation measures. The SCRF is a the first of a new family of approaches that directly model the temperature at monthly scales and longer, see for example Procyk et al. (2020) that, in addition to scaling, directly exploits the principles of energy-balance (the fractional energy-balance equation) and produces projections and climate sensitivity estimates in a similar range, although with a better estimation of the scaling exponent H by also utilizing the macroweather high-frequency response.
According to our projections made to 2100, to avert a 1.5 K warming, future emissions will be required to undergo drastic cuts similar to RCP 2.6, for which we found a 46% probability to remain under the said limit; it is virtually certain that RCP 4.5 and RCP 8.5-like futures would overshoot. Even a 2.0 K warming limit would surely be surpassed by 2100 under RCP 8.5 and probably also under RCP 4.5, with only a 6% chance of remaining under the limit. The safest option remains RCP 2.6 which we project to remain under 2.0 K with very high confidence. The question remains whether it is at all realistic given that it relies strongly on the massive deployment of speculative negative emission technologies.
On the other hand, our model has obvious limitations since it assumes a linear stationary relationship between forcing and temperature, neglecting nonlinear interactions which could arise as the system evolves, as it currently warms. In particular, so-called tipping points could be reached in the coming century which would lead to a breakdown of the linear model proposed. Such potential behaviours are of critical value for improving future projections, but they have not yet been observed with high confidence even in GCMs. This underlines the need to exploit paleoclimate archives to achieve a better understanding of low-frequency natural variability, namely the transition scale from the macroweather regime to the climate regime. In this study, we have assumed the increased variability in the climate regime to be strictly a result of forcing, but internal modes of variability could also have a significant contribution for longer timescales.
More relevant to human activities and adaptation policies are regional projections which are also almost entirely produced by GCMs. In addition to the discrepancy between their global mean response, CMIP5 GCMs show widely varying spatial patterns of warming over the last century between themselves, and with significant differences from those observed in gridded instrumental temperature datasets (Hébert and Lovejoy 2018). Future work should explore the possibility of data driven models at the regional scale for climate projection. Already, it was found by Lovejoy and de Lima (2015) that in the macroweather regime, statistical space-time factorization holds for temperature and precipitation, both in instrumental datasets and in GCMs. This implies the possibility of developing linear response models for regional projections, although the main obstacle foreseen will be to identify the forced signal in the stronger regional internal variability. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.

Appendix: Response of the SCRF to theoretical forcing scenarios
In this section, we calculate the expected temperature response for different forcing scenario under the SCRF of Eq. 13. The expected forced temperature response for an N th -order polynomial forcing function F poly = N ∑ i=0 a i t i (t) can be written as: where a i is the i th -order coefficient of the forcing polynomial and the Heaviside function (t) makes the forcing null for negative times (the forcing is assumed to be zero for t < 0 ). This is convenient not only since the ECS and TCR experiments are specific cases, but also because it also allows to calculate the theoretical response for any forcing which can be approximated by a polynomial. Of course, any orthogonal basis of functions could be used. Note that ∀H ∈ {0, −1, −2, ..., −N} , where N is the order of the polynomial, there is a singularity, but the limit at such points is the same from both sides so that we can redefine it the value of the limit without further complications.
To measure ECS, the forcing scenario used has a 0 = F and a i = 0 for all i > 0 , which corresponds to the following step-function : where F corresponds to a doubling in CO 2 and (t) is again the Heaviside step function. We can calculate the (A1) as the temperature keeps increasing monotonically. As the scaling exponent approaches 0 from the left, H → 0 − , the convergence time increases and then diverges for a positive exponent. We also define ECS for a more concrete horizon of 500 years : This allows us to calculate the warming fraction left for a given scaling exponent H between half a millennium and infinity.
To calculate TCR, a different forcing scenario is used where a doubling of carbon dioxide takes place over 70 years, increasing at a rate of 1% per year, and then held constant. The forcing produced by carbon dioxide is proportional to the logarithm of its concentration and therefore, this exponentially increasing carbon dioxide scenario corresponds to a linear increase in forcing with a 1 = F −1 tr and a i = 0 ∀i ≠ 1 : where tr is the time over which the doubling takes place, that is 70 years, and F is the final change in forcing. We can again calculate the temperature response analytically with the SCRF : The temperature change at T tr (t = tr ) , that is when the forcing reaches its final value, is then defined as the TCR. This measure is not representative of the ECS, although they become equal for sufficiently large negative values of the scaling exponent H, but it is even more relevant for near-term projections as the TCR will be the main contribution to global warming as long as the exponential rise in carbon dioxide concentration continues. If we take t → ∞ , we notice that the left term in the bracket vanishes and we are left with Eq. A6. We thus see that the steadystate response does not depend on the forcing scenario as T tr (t → ∞) = T step (t → ∞) , but only on the change in forcing. It is convenient to analytically calculate the ratio of TCR to ECS as a function of H and : (A7) ECS 500 = T step (500 years) temperature change for this step-increase in forcing at any time following the increase: which can be rewritten as: To obtain the ECS, we only need to take t → ∞: ECS is thus the temperature change at infinity when a steady-state is reached; in this model this requires H < 0 . If H ≥ 0 , we effectively have a runaway greenhouse model This ratio is shown on Fig. 2

Appendix: inner scale of the linear response
In this section, we present an argument in favour of the inner scale of the linear response at approximately 2 years. We mentioned that the physical mechanism which we approximate by integrating the radiative forcing history with the SCRF is the accumulation of heat, mostly, in the oceans which in turn drives the forced temperature response in the atmosphere and at the sea-surface. Therefore, we expect a strong correlation between the two at large time scales where the forced variability overwhelms the internal one, and thus a similar large scale scaling behaviour which should be dominated by the anthropogenic effects. It may seem surprising that up until now, the fundamental issue of land-ocean coupling has, to our knowledge, not been investigated empirically. The likely reason is that it requires the use of Haar, or other appropriately defined, fluctuations. The Haar fluctuation over an interval t is simply the difference between the mean over the first and second half of the interval. It is a convenient way to characterize variability as a function of time scale in real space and it is valid whether fluctuations increase or decrease with scale, which is necessary here in order to avoid spurious statistical issues at low-frequencies (Lovejoy and Schertzer 2012) Figure 11 compares the mean Haar fluctuations as a function of timescale of the global sea-surface temperature (SST), the land-surface air temperature (LST) and the global combined land-ocean surface air temperature (SAT). The forced component of variability dominates for large timescales and we see that SST, LST and SAT have similar forced variability, higher for LST and weaker for SST as expected, and a scaling behaviour akin to the forcing for time scales above 20 years. For short timescales, the internal variability dominates and this leads to diverging behaviour between the SST, LST and the SAT since the turbulent dynamics of the ocean and the atmosphere are different. The driving energy rate density, the turbulent that determines the weather-macroweather transition scale w , is about 10 −3 W Kg −1 in the atmosphere, but only about 10 −8 W Kg −1 in the ocean surface layer (Lovejoy and Schertzer 2010). The transition time scale is proportional to − 1 3 so that the weather-macroweather transition in the SST is visible around 2 years rather than at a sub-monthly scale in the atmosphere, about 10 days.
The SAT should scale similarly to LST in the macroweather regime since they are both atmospheric fields with similar turbulent dynamics, but we observe a discrepancy on Fig. 11 : as expected the monthly LST series do not show the sub-monthly transition, but the scaling of SAT below 2 years shows a flattening. It is possible this is just a bias resulting from the inclusion of SST as

Fig. 11
The mean Haar fluctuations (in units of K) are shown as a function of timescale t (in years), adjusted to annual resolution from monthly resolved series, for LST (blue), SAT (black) and SST (red), and the shaded areas correspond to the standard deviation between all the datasets considered. Vertical dashed lines indicate the domains over which internal or forced variability dominates, or where they are of similar magnitude. At high-frequency, the scaling behaviours differ due to diverging turbulent dynamics of the internal variability. The same analysis performed for the forcing series (orange; in number of CO 2 doublings) shows a similar scaling behaviour at low-frequency, where the forced variability dominates, between the forcing and the temperature series. The series used are described in the data sub-section air-surface temperature in global SAT, which could artificially decrease the high-frequency variability of SAT, but more investigation needs to be done to confirm the claim. The model presented is not expected to reproduce the internal variability, which is treated as an independant stochastic process, and the high-frequency cutoff was introduced as the smallest scale over which the linear response approximation is expected to hold. As mentioned above, this should translate in correlation between the ocean and the atmosphere forced responses. Figure 12 shows the mean pairwise correlation between the first-order Haar fluctuations of SST, LST and SAT at different timescales. We observe a steep increase to high correlation between all pairs up to 2 years; the SAT series are composed from the LST and SST series and as such it is not surprising to find high correlation, but the completely independent LST and SST series confirm the same transition. This supports our interpretation that as we consider larger time scales, the internal variability averages out and the air surface temperature becomes proportional to the ocean temperature as a consequence of the low-frequency dominance of the forced response from integrated radiative forcing on both. This motivates us to fix the model parameter = 2 years for simplicity.
Result for ECS and ECS 500 are calculated for ∈ {1, 2, 3} years in order to evaluate the sensitivity of the result to this choice (Table 4). We also give an average of the three results weigthed by their relative likelihood (which yields weights of 0.38,0.33 and 0.29 respectively), and the result is in fact very close to the = 2 years value used. The likelihood function therefore varies slowly along the direction of , with the 3 values almost equally probable, and does not allow to constrain the parameter well by itself. The parameter estimation with a smaller yields an H closer to zero, and therefore the high ECS tail thickens as ECS increases as H → 0 − , and as ultimately diverges for H ≥ 0 . However, the convergence time also increases as H → 0 − and so the larger part of the response will happen at multi-centennial, multi-millenial and even much longer timescales. Therefore, the choice of can have a sizable impact for the theoretical ECS at infinity, but the effect will be marginal for any centennial projections and quasi-equilibrium climate sensitivity such as ECS 500 .

Appendix: Numerical estimation of the SCRF parameters
This appendix provides extra information concerning the SCRF model parameter estimation technique and tests the efficacy of the method on synthetic time series.
The parameters for the fractional Gaussian noise (fGn) (which is used as a model for the internal variability) Fig. 12 The average correlation coefficient r between the first-order Haar fluctuations of LST-SAT (red), SST-SAT (black), and LST-SST (blue) are shown as a function of timescale t in years. It was calculated for an ensemble of all possible pairs between the LST, SST and SAT datasets considered, and the shaded areas correspond to the one standard deviation. There is a rapid increasing trend in correlation between land and ocean as a function of timescale under 2 years and a slowly increasing trend above 2 years. To improve the statistics of the fluctuations at large timescale, the fluctuations from reversed series are also considered; this means there are only 2 fluctuations for timescales above half of the series length and only 1 or -1 correlation values are possible; thus the sharp increase above 40 years is not quantitatively significant Fig. 13 The forced temperature response is removed from the C&W instrumental temperature series to obtain the associated internal variability series (blue). The best volatility and Hurst exponent H ′ parameters to describe the series as an fGn process were estimated as 0.10 and −0.26 respectively. Three single realizations of an fGn process with the same parameters are shown (black, red and orange) are shown as examples. Each series has null mean and fluctuates about its associated reference line (dashed) are estimated with an iterative procedure. A first estimate is made from the internal variability obtained from a likely combination of parameters; it is then used as the error model to find the parameter combination with maximum likelihood whose associated internal variability serves to obtain the second estimate of the fGn parameters; the maximization process is repeated until the parameters converged, which occurred after only 2 steps. Figure 13 shows the internal variability from the Cowtan & Way (C&W) instrumental temperature series and three other realizations of the same estimated fGn process (Table 5). For CMIP5 GCMs, it would be simpler since the fGn parameters could be estimated directly from unforced control runs which correspond to realizations of the internal variability in each GCM. The following Mathematica 10.4 (Wolfram Research Inc 2016) functions were used to perform the above calculations : LogLikelihood[proc,data], FractionalGaussianNoise-Process [ , ,h] and EstimatedProcess [data,proc]. Note that the Hurst exponent h used within Mathematica 10.4 describes the scaling behaviour of the associated fractional Brownian motion obtained by integrating the fGn. The notation H � = h − 1 corresponds to the associated parameter in  which directly describes the scaling associated with the fluctuations of the fGn itself. H ′ should not be confused with the scaling exponent H In addition, the re-normalized product of the individual PDFs, after they were re-interpolated at 0.01 increments, are shown for H (red) and (orange). The PDFs for the individual realizations have been multiplied by a factor of 5 for clarity used to produce the forced temperature response using the SCRF.
To show the efficacy of our method, we test it on simulations generated by adding realizations of fGn, with = 0.1 and H � = −0.25 , onto forced temperature response generated with Eq. 20. Given that it is computationally heavy to calculate the likelihood of a fGn, we restrict the test simulations to = 1 , s = 1 , H ∈ {−0.2, −0.6} and ∈ {0.5, 0.8} . This yields 4 parameter combinations to which are added 25 realizations of fGn with known parameters ( , , h) = (0, 0.1, 0.75) , shown on Fig. 14. Figure 15 shows the resulting PDF for H and with known and the best s found. The likelihood function was calculated ∀H ∈ [−1.1, 0.3] and ∀ ∈ [0.2, 1.0] both in increments of 0.05. For each combination of H and , the best s is found; first an estimate s 0 is made by least-square regression and then a maximum likelihood estimate is made ∀s ∈ [0.85s 0 , 1.15s 0 ] in increments of 0.05s 0 . The likelihood function is then interpolated to find the best s and its associated PDF using Bayes' rule with a uniform prior distribution; the relative error s for each s , defined as s = s s −1 where s is the one standard deviation defined from the PDF, was found to be very small and consistent between the different parameter combinations (H, ) . Results are summarized for each set of simulations considered in Table 6. Therefore, we can recover the parameters accurately with this Bayesian approach with a fGn error model; the 90% confidence intervals are effective, but they remain large and widely varying for single realizations because of the stochastic internal variability.
Decreasing the resolution in H to 0.1 does not alter the result significantly and for the parameter estimation from observational data it will be sufficient to calculate the likelihood function ∀s ∈ [0.085s 0 , 1.15s 0 ],∀H ∈ [−1.5, 0.2] in increments of 0.1, ∀ ∈ [0.2, 1.0] in increments of 0.05 and ∀ ∈ [−0.3, 2.] in increments of 0.1. Since the relative error on s is small and consistent, we neglect the uncertainty on this parameter for simplicity and only consider the best value for each H, and parameter combination. The resulting 3-dimensional likelihood surface was then interpolated to higher resolution with a second-order interpolation and used to calculate the probabilities of the climate sensitivities and projections corresponding to each parameter combination. Table 6 Summary of results for test simulations of four (H, ) parameter combinations. The median H and given along with the associated confidence intervals are obtained from the combination of the PDFs of the 25 individual realizations of fGn The % correct corresponds to the number of realizations for which the true parameter lies in the 90% CI. The average relative error s for all realizations is given with its one standard deviation