# Scaling fluctuation analysis and statistical hypothesis testing of anthropogenic warming

## Authors

- S. Lovejoy
DOI: 10.1007/s00382-014-2128-2

Cite this article as: Lovejoy, S. Clim Dyn (2014) 42: 2339. doi:10.1007/s00382-014-2128-2

## Abstract

Although current global warming may have a large anthropogenic component, its quantification relies primarily on complex General Circulation Model (GCM) assumptions and codes; it is desirable to complement this with empirically based methodologies. Previous attempts to use the recent climate record have concentrated on “fingerprinting” or otherwise comparing the record with GCM outputs. By using CO_{2} radiative forcings as a linear surrogate for all anthropogenic effects we estimate the total anthropogenic warming and the (effective) climate sensitivity, finding Δ*T*_{anth} = 0.87 ± 0.11 K and \(\uplambda_{{2{\text{x}}{\text{CO}}_{2} ,{\text{eff}}}} = 3.08 \pm 0.58\,{\text{K}}\). These are close to the IPCC AR5 values Δ*T*_{anth} = 0.85 ± 0.20 K and \(\uplambda_{{2{\text{x}}{\text{CO}}_{2} }} = 1.5\!-\!4.5\,{\text{K}}\) (equilibrium climate sensitivity), and are independent of GCMs, radiative transfer calculations and emission histories. We statistically formulate the hypothesis of warming through natural variability by using centennial scale probabilities of natural fluctuations estimated using scaling fluctuation analysis on multiproxy data. We take into account two nonclassical statistical features—long range statistical dependencies and “fat tailed” probability distributions (both of which greatly amplify the probability of extremes). Even in the most unfavourable cases, we may reject the natural variability hypothesis at confidence levels >99 %.

### Keywords

Anthropogenic warming · Scaling · Natural climate variability · Statistical testing

## 1 Introduction

Well before the advent of General Circulation Models (GCM's), Arrhenius (1896) proposed that greenhouse gases could cause global warming, and he even made a surprisingly modern quantitative prediction. Today, GCM's are so much the dominant tool for investigating the climate that debate centers on the climate sensitivity to a doubling of the CO_{2} concentration, which—whether “equilibrium” or “transient”—is defined as a purely theoretical quantity accessible only through models. Strictly speaking—short of a controlled multicentennial global scale experiment—it cannot be empirically measured at all. A consequence is that not enough attention has been paid to directly analyzing our ongoing uncontrolled experiment. For example, when attempts are made to test climate sensitivity predictions from the climate record, the tests still rely on GCM defined “fingerprints” (e.g. Santer et al. 2013; see also the review in section 9.2.2 of the 4th Assessment Report (AR4) of the Intergovernmental Panel on Climate Change (IPCC)) or on other comparisons of the record with GCM outputs (e.g. Wigley et al. 1997; Foster and Rahmstorf 2011). This situation can easily lead to the impression that complex GCM codes are indispensable for inferring connections between greenhouse gases and global warming. An unfortunate side effect of this reliance on models is that it allows GCM skeptics to bring into question the anthropogenic causation of the warming. If only for these reasons, it is desirable to complement model based approaches with empirically based methodologies.

But there is yet another reason for seeking non-GCM approaches: the most convincing demonstration of anthropogenic warming has not yet been made—the statistical comparison of the observed warming during the industrial epoch against the null hypothesis of natural variability. To be as rigorous as possible, we must demonstrate that the probability that the current warming is no more than a natural fluctuation is so low that the natural variability hypothesis may be rejected with high levels of confidence. Although the rejection of the natural variability hypothesis would not “prove” anthropogenic causation, it would certainly enhance its credibility. Until this is done, there will remain some legitimate grounds for doubting the anthropogenic provenance of the warming. Such statistical testing requires knowledge of the probability distributions of natural fluctuations over roughly centennial scales (i.e. the duration of the industrial epoch CO_{2} emissions). To achieve this using GCM's, one would need to construct a statistical ensemble of realistic pre-industrial climates at centennial scales. Unfortunately, the GCM variability at these (and longer) scales under natural (especially solar and volcanic) forcings is still the object of active research (e.g. the “Millennium” simulations). At present, the variability at these long time scales is apparently somewhat underestimated (Lovejoy 2013), so that it is premature to use GCM's for this purpose. Indeed, at the moment, the only way of estimating the centennial scale natural variability is to use observations (via multicentennial length multiproxies) and a (modest) use of scaling ideas.

The purpose of this paper is thus to establish an empirically based GCM-free methodology for quantifying anthropogenic warming. This involves two parts. The first part is to estimate both the total amplitude of the anthropogenic warming and the (empirically accessible) “effective” climate sensitivity. It is perhaps surprising that this is apparently the first time that the latter has been directly and simply estimated from surface temperature data. Two innovations were needed. First, we used a stochastic approach that combines all the (nonlinear) responses to natural forcings as well as the (natural) internal nonlinear variability into a single global stochastic quantity *T*_{nat}(*t*) that thus takes into account all the natural variability. In contrast, the anthropogenic warming (*T*_{anth}(*t*)) is treated as deterministic. The second innovation is to use the CO_{2} radiative forcing as a surrogate for all anthropogenic forcings. This includes not only the relatively well understood warmings due to the other long lived Green House Gases (GHG’s) but also the poorly understood cooling due to aerosols. The use of the CO_{2} forcing as a broad surrogate is justified by the common dependence (and high correlations) between the various anthropogenic effects due to their mutual dependencies on global economic activity (see Fig. 2a, b below).

The method employed in the first part (Sect. 2) leads to conclusions not very different from those obtained from GCM’s and other model based approaches. In contrast, the main part of the paper (Sect. 3), outlines the first attempt to statistically test the null hypothesis using the statistics of centennial scale natural fluctuations estimated from pre-industrial multiproxies. To make the statistical test strong enough, we use scaling ideas to parametrically bound the tails of the extreme fluctuations using extreme (“fat-tailed”, power law) probability distributions and we scale up the observed distributions from 64 to 125 years using a scaling assumption. Even in the most unfavourable cases, we may reject the natural variability hypothesis at confidence levels >99 %. These conclusions are robust because they take into account two nonclassical statistical features which greatly amplify the probability of extremes—long range statistical dependencies and the fat tails.

## 2 A stochastic approach

### 2.1 A simple stochastic hypothesis about the warming

Within the scientific community, there is a general consensus that in the recent epoch (here, since 1880) anthropogenic radiative forcings have dominated natural ones, so that solar and volcanic forcings and changes in land use are relatively unimportant in explaining the overall warming. This conclusion applies to centennial scales, but by using fluctuation analysis on global temperatures it can be extended to somewhat shorter time scales [i.e. anthropogenic forcing is dominant for periods longer than ≈20–30 years for the global average temperature (Lovejoy et al. 2013b)].

Equation 1 accordingly decomposes the measured mean global temperature anomaly:

\(T_{globe} (t) = T_{anth} (t) + T_{nat} (t) + \varepsilon (t)\)   (1)

where *T*_{globe} is the measured mean global temperature anomaly, *T*_{anth} is the deterministic anthropogenic contribution, *T*_{nat} is the (stochastic) natural variability (including the responses to the natural forcings) and ε is the measurement error. The latter can be estimated from the differences between the various observed global series and their means; it is nearly independent of time scale (Lovejoy et al. 2013a) and sufficiently small (≈±0.03 K) that we ignore it.

While Eq. 1 appears straightforward, it requires a few comments. The first point is that the anthropogenic contribution *T*_{anth}(*t*) is taken to be deterministic whereas the natural variability *T*_{nat}(*t*) is assumed to be stochastic. The second point is that this definition of *T*_{nat}(*t*) includes the responses to volcanic, solar and any other natural forcings, so that *T*_{nat}(*t*) does *not* represent pure “internal” variability. While at first sight this may seem reasonable, it is actually quite different from the usual treatments, in which solar and volcanic forcings and the corresponding responses are deterministic and stochasticity is restricted to (“pure”) internal variability (see e.g. Lean and Rind 2008). One reason for the classical approach is that there is enough data to make reconstructions of past forcings; if they can be trusted, these hybrid model–data products allow GCM's to model and isolate the corresponding responses. However, we suspect that another reason for these deterministic treatments—especially in the case of volcanic forcing—is that the intermittency of the process is so large that it is often assumed that the generating process could not be stationary. If it were true that solar and volcanic processes were nonstationary, then their statistics would have to be specified as functions of time, and little would be gained by lumping them in with the internal variability—which, even in the presence of large anthropogenic forcing, is quite plausibly stationary since, as assumed in GCM climate modelling, the effect of anthropogenic forcings is essentially to change the boundary conditions but not the internal dynamics.

This stationarity assumption can in any case be checked empirically, by estimating *T*_{nat} and verifying directly that it has the same industrial and pre-industrial statistics.

The wide one standard deviation bounds show that the variability of the process is large: although the RMS amplitude of the volcanic forcing over the industrial period is roughly a factor ≈2 lower than over the pre-industrial period (compare the dashed and solid green lines), it nevertheless generally lies within the one standard deviation bounds (red) of the stochastic multifractal process (i.e. the dashed green line generally lies between the thin red lines).

### 2.2 CO_{2} radiative forcing as a linear surrogate for anthropogenic effects

We now estimate the anthropogenic contribution *T*_{anth}. The main contribution is from CO_{2}, for which there are fairly reliable reconstructions from 1880 as well as reliable in situ measurements from Mauna Loa and Antarctica from 1959. In addition, there is general agreement about its radiative forcing (*R*_{F}) as a function of concentration \(\uprho_{{CO_{2} }}\):

\(R_{{F,CO_{2} }} = R_{{F,2xCO_{2} }} \log_{2} \left( {\uprho_{{CO_{2} }} /\uprho_{{CO_{2} ,pre}} } \right)\)   (2)

where \(\uprho_{{CO_{2} ,pre}}\) is the pre-industrial concentration and \(R_{{F,2xCO_{2} }} \approx 3.7\,{\text{W/m}}^{2}\) is the forcing increase per CO_{2} doubling; the basic logarithmic form is a semi-analytic result from radiative transfer models, and the values of the parameters are from the AR4. Beyond CO_{2}, the main other anthropogenic forcings are from other long-lived greenhouse gases (warming) as well as the effect of aerosols (cooling). While the global GHG forcing since 1880 is reasonably well estimated, that is not the case for aerosols, which are short lived, poorly mixed (regionally concentrated), and whose effects (especially the indirect ones) are poorly understood (see below).
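
The logarithmic CO_{2} forcing law described above is easy to check numerically. The sketch below is a minimal illustration, assuming the standard AR4 value of ≈3.7 W/m² per doubling and an illustrative pre-industrial concentration of 277 ppm (the latter is an assumption for the example, not a value quoted in the text):

```python
import math

def co2_forcing(rho, rho_pre=277.0, rf_2x=3.7):
    """Radiative forcing (W/m^2) from CO2 via the logarithmic law.

    rho, rho_pre: CO2 concentrations in ppm; rf_2x: forcing per doubling.
    The 277 ppm pre-industrial value is an illustrative assumption.
    """
    return rf_2x * math.log2(rho / rho_pre)

# 2012 Mauna Loa concentration cited in the text (393.8 ppm):
rf = co2_forcing(393.8)
print(round(rf, 2))  # close to the ~1.9 W/m^2 quoted in Sect. 2.2
```

With the 2012 Mauna Loa concentration of 393.8 ppm cited in the text, this reproduces the ≈1.9 W/m² CO_{2} forcing quoted in Sect. 2.2.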

Fortunately, CO_{2} levels provide a convenient surrogate for the latter (over the period 1880–2004, \(\log_{2}\uprho_{{CO_{2} }}\) varies by only ≈0.5—half an octave in \(\uprho_{{CO_{2} }}\)—so that \(\uprho_{{CO_{2} }}\) and \(\log_{2}\uprho_{{CO_{2} }}\) are linear to within ±1.5 % and there is little difference between using \(\uprho_{{CO_{2} }}\) or \(R_{{F,{\text{CO}}_{2} }}\) as a surrogate). The strong connection with the economy can be seen by using the recent Frank et al. (2010) CO_{2} reconstruction from 1880 to 2004 to estimate \(\log_{2} \left( {\uprho_{{CO_{2} }} /\uprho_{{CO_{2} ,pre}} } \right)\): Fig. 2a shows its correlation with the global Gross Domestic Product (GDP; correlation coefficient \(r_{{RFCO_{2} ,GDP}} = 0.963\)). Also shown is the annual global production of sulfates, a proxy for the total (mostly sulfate) aerosol production. The high correlation coefficient (\(r_{{RFCO_{2} ,sulfate}} = 0.983\)) indicates that whatever cooling effect the aerosols have, it is likely to be roughly linear in \(\log_{2} \left( {\uprho_{{CO_{2} }} /\uprho_{{CO_{2} ,pre}} } \right)\). Also shown in the figure [using data from Myhre et al. (2001)] is the total forcing of all GHG's (including CO_{2}); we find the very high correlation \(r_{{RFCO_{2} RF,GHG}} = 0.997\). This justifies the simple strategy adopted here of considering \(R_{{F,CO_{2} }}\) to be a well measured linear surrogate for *R*_{F,anth} (i.e. the two are considered to be equal to within a constant factor).

To compare \(R_{{F,{\text{CO}}_{2} }}\) with the total GHG radiative forcing (*R*_{F,GHG}) as well as the total anthropogenic RF (including aerosols, *R*_{F,anth}), we present Fig. 2b. We see that \(R_{{F,{\text{CO}}_{2} }}\) and *R*_{F,GHG} are closely related, with regressions yielding a near proportionality:

\(R_{{F,GHG}} \approx 1.6\,R_{{F,CO_{2} }}\)   (3)

consistent with the ≈1.9 and ≈3.1 W/m² values given below.

The main uncertainty in the total anthropogenic forcing, *R*_{F,anth}, is from the direct and indirect cooling effects of aerosols, and is still under debate. Recent estimates (for both effects) are ≈−1.2 W/m^{2} (AR4), ≈−1.0 W/m^{2} (Myhre 2009) and ≈−0.6 W/m^{2} (Bauer and Menon 2012) (all with large uncertainties). Using the Mauna Loa estimate for \(\uprho_{{CO_{2} }}\) in 2012 (393.8 ppm, http://co2now.org/), these estimates can be compared to ≈1.9 W/m^{2} for CO_{2} and ≈3.1 W/m^{2} for all GHG (the above relation). Using the *R*_{F,anth} data in Myhre et al. (2001) we obtain:

\(R_{{F,anth}} \approx 0.645\,R_{{F,CO_{2} }}\)   (4)

implying a substantial net aerosol cooling at the end of the Myhre et al. (2001) series (1995). If the most recent cooling estimates (Bauer and Menon 2012) are correct (−0.6 W/m^{2}), the amplitude of the cooling is diminished by 60 %, so that in Eq. 4 we obtain a proportionality constant ≈1.25 rather than 0.645.
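
The proportionality constants above can be checked with a line of arithmetic. The sketch assumes a GHG-to-CO_{2} forcing ratio of ≈1.63 (inferred from the ≈3.1 and ≈1.9 W/m² values quoted above; an inference, not a value stated as such):

```python
# Arithmetic check of the proportionality constants quoted above,
# assuming R_F,GHG ~= 1.63 * R_F,CO2 (the ratio of the ~3.1 and ~1.9 W/m^2
# values in the text; an inference, not a stated value).
ghg_ratio = 1.63                  # R_F,GHG / R_F,CO2
anth_ratio_strong = 0.645         # R_F,anth / R_F,CO2 (Myhre et al. 2001 aerosols)
aerosol_ratio = ghg_ratio - anth_ratio_strong   # implied aerosol cooling / R_F,CO2

# If the aerosol cooling amplitude is diminished by 60 % (Bauer and Menon 2012):
anth_ratio_weak = ghg_ratio - 0.4 * aerosol_ratio
print(round(anth_ratio_weak, 2))  # close to the ~1.25 quoted in the text
```

The result (≈1.24) agrees with the ≈1.25 constant quoted in the text, supporting the internal consistency of the quoted numbers.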

### 2.3 The instrumental data and the effective climate sensitivity

Assuming that *T*_{anth} is linear in *R*_{F,anth} (i.e. \(T_{anth} \propto R_{{F,CO_{2} }}\)), we can define the “effective” climate sensitivity λ to a doubling of CO_{2} by:

\(T_{anth} (t) = \uplambda_{{2{\text{xCO}}_{2} ,{\text{eff}}}} \log_{2} \left( {\uprho_{{CO_{2} }} (t)/\uprho_{{CO_{2} ,pre}} } \right)\)   (5)

It can be estimated by regressing *T*_{globe}(*t*) against \(\log_{2} \left( {\uprho_{{CO_{2} }} (t)/\uprho_{{CO_{2} ,pre}} } \right)\); the slope yields \(\uplambda_{{2{\text{xCO}}_{2} ,{\text{eff}}}}\) and the residuals yield *T*_{nat}(*t*). As mentioned above, empirical estimates of the annually, globally averaged surface temperatures do not perfectly agree with each other; the differences between the series may be used to quantify the uncertainty in the estimates. For example, in this analysis we used data over the period 1880–2008 from three sources: the NOAA NCDC (National Climatic Data Center) merged land, air and sea surface temperature dataset (abbreviated NOAA NCDC below) on a 5° × 5° grid (Smith et al. 2008), the NASA GISS (Goddard Institute for Space Studies) dataset (Hansen et al. 2010) (from 1880, on a 2° × 2° grid) and the HadCRUT3 dataset (Rayner et al. 2006) (on a 5° × 5° grid); as mentioned earlier, these series only agree to within about ±0.03 K even at centennial scales. There are several reasons for the differences: HadCRUT3 is a merged product created out of the HadSST2 Sea Surface Temperature (SST) dataset and its companion dataset of atmospheric temperatures over land, CRUTEM3 (Brohan et al. 2006). Both the NOAA NCDC and the NASA GISS data were taken from http://www.esrl.noaa.gov/psd/; the others from http://www.cru.uea.ac.uk/cru/data/temperature/. The NOAA NCDC and NASA GISS datasets are both heavily based on the Global Historical Climatology Network (Peterson and Vose 1997), and have many similarities including the use of sophisticated statistical methods to smooth and reduce noise. In contrast, the HadCRUT3 data are less processed, with corresponding advantages and disadvantages. Analysis of the space–time densities of the measurements shows that they are sparse (scaling) in both space and time (Lovejoy and Schertzer 2013). Even without other differences between the data sets, this strong sparseness means that we should not be surprised that the resulting global series are somewhat dependent on the assumptions about missing data.

The mean *T*_{globe}(*t*) series is shown in Fig. 3a as a function of \(\log_{2} \left( {\uprho_{{CO_{2} }} (t)/\uprho_{{CO_{2} ,pre}} } \right)\); the result is indeed quite linear, with slope equal to the effective climate sensitivity to CO_{2} doubling. We find:

\(\uplambda_{{2{\text{xCO}}_{2} ,{\text{eff}}}} = 2.33 \pm 0.22\,{\text{K}}\)   (6)

Over the periods 1880–2004 (using CO_{2} from the reconstruction) and 1959–2004 (using the mean of the instrumental Mauna Loa and Antarctica CO_{2}), the correlation coefficients are respectively \(r_{{RFCO_{2} ,T}}\) = 0.920 and 0.968. Note that this simple direct estimate of \(\uplambda_{{2{\text{x}},{\text{CO}}_{2} }}\) can be compared with several fairly similar but more complex analyses (notably multiple regressions which include CO_{2}); see Lean and Rind (2008), Muller et al. (2013). By use of the proportionality constants between *R*_{F,anth} and \(R_{{F,CO_{2} }}\) we can estimate the effects of a pure CO_{2} doubling. For the strongly cooling aerosols (Myhre et al. 2001) we obtained 0.645 (Eq. 4), whereas for the weakly cooling aerosols (Bauer and Menon 2012) we obtained 1.25. These lead to the pure CO_{2} doubling estimates \(\uplambda_{{2{\text{x}},{\text{CO}}_{2} ,{\text{pure}}}}\) = 3.61 ± 0.34 and 1.86 ± 0.18 K respectively.
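
The regression underlying the sensitivity estimate is straightforward to sketch. The series below are synthetic stand-ins (the real temperature and CO_{2} data are not reproduced here), with the zero-lag sensitivity of 2.33 K per doubling reported in Sect. 2.4 built in:

```python
import numpy as np

# Illustrative sketch of the Sect. 2.3 regression: T_globe vs log2(CO2/CO2_pre).
# The series below are synthetic stand-ins (the real NOAA/GISS/HadCRUT data
# and the CO2 reconstruction are not reproduced here).
rng = np.random.default_rng(0)
years = np.arange(1880, 2005)
rho_pre = 277.0                                    # assumed pre-industrial ppm
rho = rho_pre * 2 ** (0.5 * (years - 1880) / 124)  # ~half an octave over 1880-2004
x = np.log2(rho / rho_pre)

lam_true = 2.33                                    # K per CO2 doubling (zero lag)
T_globe = lam_true * x + rng.normal(0.0, 0.1, years.size)  # + "T_nat" noise

# Slope = effective climate sensitivity; residuals estimate T_nat (Eq. 1).
lam_eff, intercept = np.polyfit(x, T_globe, 1)
T_nat = T_globe - (lam_eff * x + intercept)
print(round(lam_eff, 2))
```

The recovered slope is close to the built-in 2.33 K, and the residuals play the role of the natural variability estimate used in Sect. 3.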

If we plot the temperatures in the usual way as functions of time, we obtain Fig. 3b, c, where we also show the anthropogenic contribution estimated with \(\uplambda_{{2{\text{x}},{\text{CO}}_{2} ,{\text{eff}}}}\) from Eq. 6 and *T*_{anth} from Eq. 5. It follows the temperatures very well, and we can already see that the residuals (*T*_{nat}(*t*)) are fairly small. Using these estimates of the anthropogenic contribution, we can estimate the total change in temperature as Δ*T*_{anth} = 0.85 ± 0.08 K over the entire industrial period (see the discussion below). Note that the same methodology can be used to analyze the postwar cooling and the recent “pause” in the warming; this is the subject of current work in progress.

### 2.4 The time-lagged sensitivities

A significant part of the effect of *R*_{F} is to warm the oceans (Lyman et al. 2010), so that we expect a time lag between the forcing and the atmospheric response; for example, with GCM's, Hansen et al. (2005) find a lag of 25–50 years, and Lean and Rind (2008) empirically find a lag of 10 years (of course, the situation is not quite so simple due to feedbacks). By considering the time lagged cross correlation between \(R_{{F,CO_{2} }}\) and *T*_{globe} (Fig. 4), it is found that the cross correlations are so high (maximum 0.94) that the maximally correlated lag is not well pronounced. To clarify this, we also calculated the corresponding curves for the cross correlation of the temperature fluctuations (ΔT, differences) at 5 year resolution. The fluctuations are more weakly correlated than the temperatures themselves, so this is a bit more sensitive to varying lags. In all cases, the maximum lies roughly between lags of zero and 20 years. However, the effective climate sensitivity to doubling CO_{2} increases from 2.33 ± 0.22 K (zero lag) to 3.82 ± 0.54 K with a 20 year lag (see Fig. 3c for a comparison of the zero lag anthropogenic and empirical global temperatures). If we use a Bayesian approach and assign equal a priori probabilities to all lags between zero and 20 years, we obtain the estimate \(\uplambda_{{2{\text{x}},{\text{CO}}_{2} ,{\text{eff}}}}\) = 3.08 ± 0.58 K, which is (unsurprisingly) quite close to the 10 year lag value (Fig. 4). Note that we could use a general linear relation between forcings and responses via Green's functions, but this would require additional assumptions and is not necessary at this point.
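
The lag scan can be sketched as follows; as in the text, difference fluctuations make the best-correlated lag stand out more clearly than the trending series themselves. All series are synthetic stand-ins with a 10 year lag built in:

```python
import numpy as np

# Sketch of the Sect. 2.4 lag analysis: scan time-lagged cross correlations
# between a forcing-like series and its delayed response. Differences
# (rather than the trending series) sharpen the correlation maximum.
rng = np.random.default_rng(1)
n, true_lag = 125, 10                          # years; 10 yr lag for illustration
forcing = np.linspace(0.0, 1.0, n) + 0.05 * rng.normal(size=n)
response = np.r_[np.full(true_lag, forcing[0]), forcing[:-true_lag]]
response = response + 0.02 * rng.normal(size=n)

def lagged_diff_corr(x, y, lag):
    """Correlation of the differences of x[t] with those of y[t + lag]."""
    dx, dy = np.diff(x), np.diff(y)
    return np.corrcoef(dx[: dx.size - lag], dy[lag:])[0, 1]

corrs = [lagged_diff_corr(forcing, response, lag) for lag in range(21)]
best_lag = int(np.argmax(corrs))
print(best_lag)  # recovers the 10 yr lag built into the synthetic response
```

On the raw (trending) series the correlation is high at every lag, mirroring the paper's observation that the maximally correlated lag is not well pronounced; differencing removes the shared trend.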

### 2.5 Effective and equilibrium climate sensitivities

Our estimate of \(\uplambda_{{2{\text{x}}{\text{CO}}_{2} ,{\text{eff}}}}\) has the advantage of being independent not only of GCM's but also of assumptions about radiative transfer and about historical (non-CO_{2}) GHG and aerosol emissions. However, \(\uplambda_{{2{\text{x}}{\text{CO}}_{2} ,{\text{eff}}}}\) is an “effective” sensitivity, both because it uses CO_{2} as a surrogate for all the anthropogenic *R*_{F} and because it is not the usual “equilibrium climate sensitivity”, defined as “the equilibrium annual global mean temperature response to a doubling of equivalent atmospheric CO_{2} from pre-industrial levels” (AR4). Since only GCM's can truly attain “equilibrium” [and this only asymptotically, in a slow power law manner (Lovejoy et al. 2013a)], this climate sensitivity is really a theoretical/model concept that can at best only be approximated with real world data. From an empirical point of view, whereas the effective climate sensitivity is the actual sensitivity of our current (uncontrolled) experiment, the equilibrium and transient sensitivities are the analogues for various (impractical) controlled experiments.

Because of the differences in the definitions of climate sensitivity, it would be an exaggeration to claim that we have empirically validated the GCM based results, even though our value \(\uplambda_{{2xCO_{2} ,eff}}\) = 3.08 ± 0.58 (taking into account the uncertainty in the lag) is very close to literature values (c.f. the AR5 range 1.5–4.5 K, the AR4 range 2–4.5 K, and the value 3 ± 1.5 K adopted by the National Academy of Sciences (1979) and the AR1–3 reports). It is not obvious whether effective or equilibrium sensitivities are more relevant for predicting the temperature rise in the twenty-first century.

## 3 Statistical analysis

### 3.1 The stationarity of the residuals *T*_{nat} and comparison with the pre-industrial *T*_{nat}

We now verify that the industrial epoch residuals (*T*_{nat}(*t*)) have statistics very similar to those of *T*_{globe} in pre-industrial epochs (when *T*_{anth} = 0), so that, as hypothesized in Eq. 1, they could all be realizations of the same stochastic process. As a first confirmation of this, in the top two curves of Fig. 5 we plot both *T*_{globe} and *T*_{nat} estimated from the residuals \((\text{i.e.}\ T_{nat} (t) = T_{globe} (t) -\uplambda_{{2xCO_{2}, eff}} \log_{2} ( {\uprho_{{CO_{2} }} (t)/\uprho_{{CO_{2} ,pre}} } ))\). Even without any formal statistical analysis, we see—as expected—that whereas *T*_{globe} is clearly increasing, *T*_{nat} is roughly flat. However, for Eq. 1 to be verified, we also require that the residuals have statistics similar to those of the preindustrial fluctuations, when *T*_{anth} = 0 and *T*_{globe} = *T*_{nat}. In order to establish this, we must use multiproxy reconstructions, which are the only source of annual resolution preindustrial global scale temperatures.

Several multiproxy reconstructions were considered; their fluctuation statistics agree well for Δ*t* >≈ 100 years (preindustrial). However, one of these series (Ljungqvist 2010) was at 10 year resolution and was not suited for the present study, which required annual series. It was therefore replaced by the Ammann and Wahl (2007) update of the original Mann et al. (1998) reconstruction which, although having somewhat smaller multicentennial variability, was statistically not too different (see Fig. 6 for a comparison of the probability distributions of the differences at lags of 1 year). This shows that at 1 year resolution, fluctuations from the different multiproxies have nearly the same probability distributions, although with slightly different amplitudes (c.f. the left–right shift on the log–log plot). Changes in the amplitude arise from varying degrees of spatial averaging, so that—given the different types and quantities of data contributing to each multiproxy—these amplitude differences are not surprising (see Lovejoy and Schertzer 2013). In the figure we also show the residuals of the unlagged estimate of *T*_{nat}. At this scale the residuals have slightly larger variability (see the comparison of the standard deviations as functions of scale in Fig. 7), although beyond Δ*t* ≈ 4 years it falls within the epoch to epoch variations of the mean of the multiproxies.

We can now make a first comparison between the industrial epoch residuals and the pre-industrial anomalies; see the bottom three curves in Fig. 5. To mimic the 125 year industrial period, the multiproxies were divided into three 125 year pre-industrial periods (1500–1624, 1625–1749, 1750–1875) as shown, each with its overall mean removed. We see that while the industrial epoch temperatures increase strongly as functions of time, the amplitudes and visual appearances of the residuals and the multiproxies are strikingly similar.

We now turn to the problem of making this similitude quantitative. The traditional way to characterize the variability over a wide range of scales is by spectral analysis. It is typically found that climate spectra are dominated by red noise “backgrounds” which, over wide ranges, are roughly power laws (scaling), indicating that over the range there is no characteristic scale and (in general) that there are long range statistical dependencies (e.g. correlations; see Lovejoy 2014 for a recent overview and discussion). However, spectral analysis has disadvantages, the most important of which is that its interpretation is not as straightforward as for real-space alternatives. This has led to the development of wavelets and other methods of defining fluctuations [e.g. Detrended Fluctuation Analysis (Peng et al. 1994)]. However, Lovejoy and Schertzer (2012b) show that the simple expedient of defining fluctuations over intervals Δ*t* by the differences in the means over the first and second halves of the interval (“Haar fluctuations”) is particularly advantageous since—unlike differences, which on (ensemble) average do not decrease—Haar fluctuations can both increase and decrease. The critical distinction between increasing and decreasing fluctuations corresponds to a spectral exponent greater or less than β = 1 (ignoring small intermittency corrections). In regions where the Haar fluctuations increase they are proportional to differences; in regions where they decrease, they are proportional to averages, so that the interpretation is very straightforward.
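
A minimal implementation of the Haar fluctuation just described (the difference of half-interval means) might look like this; the white noise check illustrates the reference slope of −0.5 mentioned in Sect. 3.2:

```python
import numpy as np

# Minimal sketch of Haar fluctuation analysis: the Haar fluctuation at scale
# dt is the difference between the means of the first and second halves of
# each interval of length dt; the RMS over all non-overlapping intervals
# gives the fluctuation amplitude at that scale. (The paper additionally
# applies a "calibration" factor of 2, omitted here.)
def haar_rms(series, dt):
    """RMS Haar fluctuation of `series` at (even) time scale `dt`."""
    n = (len(series) // dt) * dt
    blocks = np.asarray(series[:n], dtype=float).reshape(-1, dt)
    half = dt // 2
    fluct = blocks[:, half:].mean(axis=1) - blocks[:, :half].mean(axis=1)
    return np.sqrt(np.mean(fluct ** 2))

# For Gaussian white noise, the RMS Haar fluctuation decreases as dt^(-1/2),
# i.e. a log-log slope of H = -0.5:
rng = np.random.default_rng(2)
noise = rng.normal(size=2 ** 16)
for dt in (4, 16, 64):
    print(dt, round(haar_rms(noise, dt), 3))
```

Applying the same function to an increasing (trending) series instead gives fluctuations that grow with scale, the behaviour the paper associates with β > 1.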

### 3.2 Fluctuation analysis of the industrial residuals and preindustrial multiproxies

Figure 7 compares the RMS difference fluctuations of the industrial period residuals and the preindustrial multiproxies. Up to Δ*t* ≈ 10 years they are quite close to each other (and slowly decreasing); then they rapidly diverge, with the RMS preindustrial differences (σ_{Δt}) remaining roughly constant (σ_{Δt} ≈ 0.20 ± 0.03 K) until about 125 years. Figure 8 shows the corresponding figure for the Haar fluctuations. Again we find that the industrial and preindustrial curves are very close up to ≈10 years, followed by a divergence due to the high decadal and longer scale industrial period variability. Note that the preindustrial Haar fluctuations decrease slowly until ≈125 years. When we consider the RMS residuals, we find they are mainly within the one standard deviation error bars of the epoch to epoch multiproxy variability, so that, as predicted (Eq. 1), removing the anthropogenic contribution gives residuals *T*_{nat} with statistics close to those of the pre-industrial multiproxies (Fig. 8).

For the (preindustrial) multiproxies we see that between ≈10 and 125 years the RMS differences are roughly constant; this is expected given the slight decrease of the Haar fluctuations (Fig. 8) over this range (see the “Appendix” for a discussion). The solid line at the right (at scales >125 years) has a slope ≈0.4; it shows the increase in the variability in the climate regime. From the graph, at 125 years the RMS difference may be estimated as 0.20 ± 0.03 K.

The Haar fluctuations were multiplied by a “calibration” factor of 2 so that they would be close to the difference fluctuations (Fig. 7). Note that a straight line of slope *H* corresponds to a power law spectrum with exponent β = 1 + 2*H*, so that a flat line has spectrum E(ω) ≈ ω^{−1} and hence long range statistical dependencies (for comparison, Gaussian white noise has slope −0.5). The roughly log–log linear decline of the multiproxy variability out to Δ*t* ≈ 125 years is the (fluctuation cancelling, decreasing) macroweather regime; the rise beyond it is the “wandering” climate regime (Lovejoy 2013).

### 3.3 Estimating the probability that the warming is due to natural variability

Regressing \(R_{{F,{\text{CO}}_{2} }}\) against the global mean temperature leads to satisfactory results in the sense that the residuals and preindustrial multiproxies are plausibly realizations of the same stochastic process. Moreover, this result is not too sensitive to the exact method of estimating *T*_{anth} and *T*_{nat}—the 20 year lagged residuals are a bit better, although using simply a linear regression of *T*_{globe} against time is substantially worse; see Fig. 8. From the point of view of determining the probability that the warming is natural, the key quantity is the total anthropogenic warming Δ*T*_{anth} = *T*_{anth}(2004) − *T*_{anth}(1880). Using the \(\log_{2}\uprho_{CO_2}\) method (Fig. 3a) we find Δ*T*_{anth} ≈ 0.85 ± 0.08 K and, with a 20 year lag, ≈0.90 ± 0.13 K (the zero lag northern hemisphere value is 0.94 ± 0.09 K). With a Bayesian approach, assuming equal a priori probabilities of any lag between 0 and 20 years, we obtain Δ*T*_{anth} ≈ 0.87 ± 0.11 K; for comparison, the linear in time method gives ≈0.75 ± 0.07 K (essentially the same as the AR4 estimate, which used a linear fit to the HadCRUT series over the period 1900–2004). We can also estimate an upper bound—the total range Δ*T*_{globe,range} = *Max*(Δ*T*_{globe}) ≈ 1.04 ± 0.03 K—so that (presumably) Δ*T*_{anth} < Δ*T*_{globe,range}.

The probability that the warming is natural is estimated by first determining the probability distribution of the natural fluctuations at the largest directly accessible scale (Δ*t* = 64 years, Fig. 9), and then using the scaling of the distributions and RMS fluctuations to deduce the form at Δ*t* = 125 years (see the “Appendix”). We find the 125 year RMS temperature difference \(\left\langle {\Delta T(125)^{2} } \right\rangle^{1/2} =\upsigma_{125} = 0.20 \pm 0.03\,{\text{K}}\) (Fig. 7). Theoretically, spatial and temporal scaling are associated with probabilities with power law “fat” tails (i.e. *Pr*(Δ*T* > *s*) ≈ *s*^{−q_D} for the probability of a fluctuation exceeding a threshold *s*; *q*_{D} is an exponent); hence in Fig. 10 we compare *q*_{D} = 4, 6 and *q*_{D} = ∞ (a pure Gaussian). We see that the former two values bracket the distributions (including their extremes) over the whole range of large fluctuations (the extreme 3 %).

Stated succinctly, our statistical hypothesis on the natural variability is that its extreme probabilities (*Pr* < 3 %) are bracketed by a modified Gaussian with *q*_{D} between 4 and 6 and with standard deviation (and uncertainties) given by the scaling of the multiproxies in Fig. 7: σ_{125} = 0.20 ± 0.03 K. For large enough probabilities (small *s*), the modified Gaussian is simply a Gaussian, but below a probability threshold (above a critical threshold *s*_{qD}) the logarithmic slope is equal to −*q*_{D}; i.e. it is a power law (see the “Appendix” for details). With this, we can evaluate the corresponding probability bounds for various estimates of Δ*T*_{anth}. These probabilities are conveniently displayed in Fig. 10 by boxes. For example, the AR4 Δ*T*_{anth} = 0.74 ± 0.18 K (thick red box) yields a probability (*p*): 0.009 % < *p* < 0.6 % whereas the (unlagged) \(\log_{2}\uprho_{{CO_{2} }}\) regression (filled red box) yields 0.0009 % < *p* < 0.2 % and the 20 year lag (dashed blue) yields 0.002 % < *p* < 0.2 %, the northern hemisphere yields 0.009 % < *p* < 0.1 % with most likely values (using *q*_{D} = 5) of 0.08, 0.08, 0.03, 0.03 % respectively. In even the most extreme cases, the hypothesis that the observed warming is due to natural variability may be rejected at confidence levels 1 − *p* > 99 %, and with the most likely values, at levels >99.9 %. The other cases considered do not alter these conclusions (Fig. 10).
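
The probability bounds can be sketched numerically with a simplified version of the modified Gaussian: Gaussian in the core, with a power law tail of exponent *q*_{D} grafted on where the two log-slopes approximately match. This tangent-matching construction is an illustrative assumption (the paper's exact form is defined in its Appendix), so the numbers are indicative only:

```python
import math

# Hedged numeric sketch of the Sect. 3.3 probability bound: a Gaussian core
# crossing over to a power-law ("fat") tail Pr(dT > s) ~ s^(-qD). The
# crossover point s_q = sigma*sqrt(qD) is where the log-slopes of the two
# tails approximately match -- a simplified construction, not the paper's
# exact modified Gaussian.
def gauss_tail(s, sigma):
    """Pr(X > s) for a centred Gaussian of standard deviation sigma."""
    return 0.5 * math.erfc(s / (sigma * math.sqrt(2.0)))

def modified_gauss_tail(s, sigma, qD):
    """Gaussian tail up to s_q = sigma*sqrt(qD); power law of exponent qD beyond."""
    s_q = sigma * math.sqrt(qD)
    if s <= s_q:
        return gauss_tail(s, sigma)
    return gauss_tail(s_q, sigma) * (s_q / s) ** qD

sigma_125 = 0.20                      # K, the 125 yr RMS fluctuation (Fig. 7)
dT_anth = 0.87                        # K, the Bayesian lag-averaged warming
for qD in (4, 5, 6):
    p = modified_gauss_tail(dT_anth, sigma_125, qD)
    print(qD, f"{100 * p:.4f} %")     # probabilities well below 1 %
```

With σ_{125} = 0.20 K and Δ*T*_{anth} = 0.87 K, all three *q*_{D} values give probabilities well under 1 %, consistent with the >99 % confidence levels quoted above (though not identical to the paper's bracketed values, since the construction here is simplified).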

## 4 Conclusions

Two aspects of anthropogenic global warming are frequent sources of frustration. The first is the lack of a quantitative theory of natural variability with which to compare the observed warming Δ*T*_{anth}; the second is the near exclusive reliance on GCM’s to estimate it. In this paper we have argued that since ≈1880, anthropogenic warming has dominated the natural variability to such an extent that straightforward empirical estimates of the total warming can be made. The one favoured here, using CO_{2} radiative forcing (*R*_{F}) as a surrogate for all anthropogenic *R*_{F}, gives both an effective sensitivity \(\uplambda_{{2xCO_{2} ,eff}}\) and a total anthropogenic increase Δ*T*_{anth} (3.08 ± 0.58 K and 0.87 ± 0.11 K) comparable to the AR4 and AR5 estimates (1.5–4.5 K and 0.74 ± 0.18 K, the latter for the slightly shorter period 1900–2005). The method was justified because we showed that over a wide range of scales, the residuals have nearly the same statistics as the preindustrial multiproxies. An additional advantage of this approach is that it is independent of many assumptions and uncertainties, including radiative transfer calculations, GCMs and emission histories. The main remaining uncertainty is the duration of the lag between the forcing and the response.
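The empirical estimate summarized above amounts to a regression of temperature against \(\log_{2}\uprho_{{CO_{2} }}\), so that the slope is directly the effective doubling sensitivity. A minimal sketch under that assumption (the function name and the synthetic CO_{2} and temperature series are illustrative, not the paper’s data or code):

```python
import math
import random

def fit_effective_sensitivity(temps, co2):
    """Ordinary least squares for T = a + lam * log2(rho_CO2 / rho_0).
    The slope lam is the effective climate sensitivity (K per CO2
    doubling), using log2(CO2) as a linear surrogate for the total
    anthropogenic forcing. Illustrative sketch, not the paper's code."""
    x = [math.log2(c / co2[0]) for c in co2]
    n = len(x)
    mx, my = sum(x) / n, sum(temps) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, temps))
    lam = sxy / sxx
    return lam, my - lam * mx  # (sensitivity in K, intercept)

# Synthetic example: CO2 rising from ~290 to ~400 ppm over 130 "years",
# with a true sensitivity of 3.0 K per doubling plus Gaussian residuals
# standing in for the natural variability.
random.seed(0)
co2 = [290.0 + 0.85 * i for i in range(130)]
temps = [3.0 * math.log2(c / 290.0) + random.gauss(0.0, 0.1)
         for c in co2]
lam, intercept = fit_effective_sensitivity(temps, co2)
```

Because the anthropogenic signal dominates the residuals over this range, the fitted slope recovers the true sensitivity closely; in the paper the same logic applied to the observational record yields \(\uplambda_{{2xCO_{2} ,eff}} = 3.08 \pm 0.58\,{\text{K}}\).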

Whether one estimates Δ*T*_{anth} using the empirical method proposed here or using a GCM based alternative, when Δ*T*_{anth} is combined with the scaling properties of multiproxies we may estimate the probabilities as functions of time scale and test the hypothesis that the warming is due to natural variability. Our statistical hypothesis, supported by the multiproxy data, is that due to the scaling there are long range correlations in the temperature fluctuations coupled with nonclassical “fat tailed” probability distributions which bracket the observed probabilities. Both effects lead to significantly higher probabilities of extremes than would be expected from classical “scale bound” (exponentially decorrelated) processes with “thin” (e.g. Gaussian or exponential) tails. However, even in the most extreme cases, we are still able to reject the natural variability hypothesis at confidence levels >99 %, and with the most likely values, at levels >99.9 %. Finally, fluctuation analysis shows that the variability of the recent period solar forcing was close to preindustrial levels (at all scales), and that volcanic forcing variabilities were a factor of ≈2 weaker (at all scales), so that they cannot explain the warming either.

In the AR5, the IPCC estimated our confidence in the truth of the anthropogenic warming hypothesis as 95–100 %. While our new result is easily compatible with this, it is complementary rather than equivalent: whereas the IPCC focuses on determining how much confidence we have in the truth of anthropogenic warming, the approach outlined here estimates our confidence in the falsity of natural variability. There is a fundamental asymmetry between the two: no theory can ever be *proven* true beyond a somewhat subjective “reasonable doubt”, but a theory can effectively be *disproven* by a single decisive experiment. In the case of anthropogenic warming, our confidence is based on a complex synthesis of data analysis, numerical model outputs and expert judgements; yet no numerical model is perfect, no two experts agree on everything, and the IPCC confidence quantification itself depends on subjectively chosen methodologies. In comparison, our approach makes no use of numerical models or expert judgement; instead it directly evaluates the probability that the warming is simply a giant, century long natural fluctuation. While students of statistics know that the statistical rejection of a hypothesis cannot be used to conclude the truth of any specific alternative, in many cases, including this one, the rejection of one greatly enhances the credibility of the other.

## Acknowledgments

P. Dubé, the president of the Quebec Skeptical Society, is thanked for helping to motivate this work. An anonymous reviewer of an earlier version of this paper is thanked for the opinion that a GCM free approach to anthropogenic warming cannot work, concluding: “go get your own GCM”. This work was unfunded and there were no conflicts of interest.