Climate Dynamics

Volume 42, Issue 9, pp 2339–2351

Scaling fluctuation analysis and statistical hypothesis testing of anthropogenic warming



DOI: 10.1007/s00382-014-2128-2

Cite this article as:
Lovejoy, S. Clim Dyn (2014) 42: 2339. doi:10.1007/s00382-014-2128-2


Although current global warming may have a large anthropogenic component, its quantification relies primarily on complex General Circulation Model (GCM) assumptions and codes; it is desirable to complement this with empirically based methodologies. Previous attempts to use the recent climate record have concentrated on “fingerprinting” or otherwise comparing the record with GCM outputs. By using CO2 radiative forcings as a linear surrogate for all anthropogenic effects we estimate the total anthropogenic warming and (effective) climate sensitivity, finding: ΔTanth = 0.87 ± 0.11 K, \(\uplambda_{{2{\text{x}}{\text{CO}}_{2} ,{\text{eff}}}} = 3.08 \pm 0.58\,{\text{K}}\). These are close to the IPCC AR5 values of ΔTanth = 0.85 ± 0.20 K and the (equilibrium) climate sensitivity \(\uplambda_{{2{\text{x}}{\text{CO}}_{2} }} = 1.5\!-\!4.5\,{\text{K}}\), and are independent of GCM models, radiative transfer calculations and emission histories. We statistically formulate the hypothesis of warming through natural variability by using centennial scale probabilities of natural fluctuations estimated using scaling, fluctuation analysis on multiproxy data. We take into account two nonclassical statistical features: long range statistical dependencies and “fat tailed” probability distributions (both of which greatly amplify the probability of extremes). Even in the most unfavourable cases, we may reject the natural variability hypothesis at confidence levels >99 %.


Anthropogenic warming · Scaling · Natural climate variability · Statistical testing

1 Introduction

Well before the advent of General Circulation Models (GCM’s), Arrhenius (1896) proposed that greenhouse gases could cause global warming, and he even made a surprisingly modern quantitative prediction. Today, GCM’s are so much the dominant tool for investigating the climate that debate centers on the climate sensitivity to a doubling of the CO2 concentration which, whether “equilibrium” or “transient”, is defined as a purely theoretical quantity accessible only through models. Strictly speaking, short of a controlled multicentennial global scale experiment, it cannot be empirically measured at all. A consequence is that not enough attention has been paid to directly analyzing our ongoing uncontrolled experiment. For example, when attempts are made to test climate sensitivity predictions from the climate record, the tests still rely on GCM defined “fingerprints” [e.g. Santer et al. 2013, or the review in section 9.2.2 of the 4th Assessment Report (AR4) of the Intergovernmental Panel on Climate Change (IPCC)] or on other comparisons of the record with GCM outputs (e.g. Wigley et al. 1997; Foster and Rahmstorf 2011). This situation can easily lead to the impression that complex GCM codes are indispensable for inferring connections between greenhouse gases and global warming. An unfortunate side effect of this reliance on models is that it allows GCM skeptics to bring into question the anthropogenic causation of the warming. If only for these reasons, it is desirable to complement model based approaches with empirically based methodologies.

But there is yet another reason for seeking non-GCM approaches: the most convincing demonstration of anthropogenic warming has not yet been made, namely the statistical comparison of the observed warming during the industrial epoch against the null hypothesis of natural variability. To be as rigorous as possible, we must demonstrate that the probability that the current warming is no more than a natural fluctuation is so low that the natural variability hypothesis may be rejected with high levels of confidence. Although the rejection of the natural variability hypothesis would not “prove” anthropogenic causation, it would certainly enhance its credibility. Until this is done, there will remain some legitimate grounds for doubting the anthropogenic provenance of the warming. Such statistical testing requires knowledge of the probability distributions of natural fluctuations over roughly centennial scales (i.e. the duration of the industrial epoch CO2 emissions). To achieve this using GCM’s one would need to construct a statistical ensemble of realistic pre-industrial climates at centennial scales. Unfortunately the GCM variability at these (and longer) scales under natural (especially solar and volcanic) forcings is still the object of active research (e.g. “Millennium” simulations). At present, the variability at these long time scales is apparently somewhat underestimated (Lovejoy 2013) so that it is premature to use GCM’s for this purpose. Indeed, at the moment, the only way of estimating the centennial scale natural variability is to use observations (via multicentennial length multiproxies) and a (modest) use of scaling ideas.

The purpose of this paper is thus to establish an empirically based GCM-free methodology for quantifying anthropogenic warming. This involves two parts. The first part is to estimate both the total amplitude of the anthropogenic warming and the (empirically accessible) “effective” climate sensitivity. It is perhaps surprising that this is apparently the first time that the latter has been directly and simply estimated from surface temperature data. Two innovations were needed. First, we used a stochastic approach that combines all the (nonlinear) responses to natural forcings as well as the (natural) internal nonlinear variability into a single global stochastic quantity Tnat(t) that thus takes into account all the natural variability. In contrast, the anthropogenic warming (Tanth(t)) is treated as deterministic. The second innovation is to use the CO2 radiative forcing as a surrogate for all anthropogenic forcings. This includes not only the relatively well understood warmings due to the other long lived Greenhouse Gases (GHG’s) but also the poorly understood cooling due to aerosols. The use of the CO2 forcing as a broad surrogate is justified by the high correlations between the various anthropogenic effects, which arise from their common dependence on global economic activity (see Fig. 2a, b below).

The method employed in the first part (Sect. 2) leads to conclusions not very different from those obtained from GCM’s and other model based approaches. In contrast, the main part of the paper (Sect. 3) outlines the first attempt to statistically test the null hypothesis using the statistics of centennial scale natural fluctuations estimated from pre-industrial multiproxies. To make the statistical test strong enough, we use scaling ideas to parametrically bound the tails of the extreme fluctuations using extreme (“fat-tailed”, power law) probability distributions and we scale up the observed distributions from 64 to 125 years using a scaling assumption. Even in the most unfavourable cases, we may reject the natural variability hypothesis at confidence levels >99 %. These conclusions are robust because they take into account two nonclassical statistical features which greatly amplify the probability of extremes: long range statistical dependencies and the fat tails.

2 A stochastic approach

2.1 A simple stochastic hypothesis about the warming

Within the scientific community, there is a general consensus that in the recent epoch (here, since 1880) anthropogenic radiative forcings have dominated natural ones, so that solar and volcanic forcings and changes in land use are relatively unimportant in explaining the overall warming. This conclusion applies to centennial scales but by using fluctuation analysis on global temperatures it can be extended to somewhat shorter time scales [i.e. anthropogenic dominant for periods longer than ≈20–30 years for the global average temperature (Lovejoy et al. 2013b)].

Let us therefore make the hypothesis that anthropogenic forcings are indeed dominant (skeptics may be assured that this hypothesis will be tested and indeed quantified in the following analysis). If this is true, then it is plausible that they do not significantly affect the statistical type or amplitude of the natural variability so that a simple model may suffice:
$$T_{globe} (t) = T_{anth} (t) + T_{nat} (t) + \varepsilon (t) \tag{1}$$
Tglobe is the measured mean global temperature anomaly, Tanth is the deterministic anthropogenic contribution, Tnat is the (stochastic) natural variability (including the responses to the natural forcings) and ε is the measurement error. The latter can be estimated from the differences between the various observed global series and their means; it is nearly independent of time scale (Lovejoy et al. 2013a) and sufficiently small (≈±0.03 K) that we ignore it.

While Eq. 1 appears straightforward, it requires a few comments. The first point is that the anthropogenic contribution Tanth(t) is taken to be deterministic whereas the natural variability Tnat(t) is assumed to be stochastic. The second point is that this definition of Tnat(t) includes the responses to volcanic, solar and any other natural forcings, so that Tnat(t) does not represent pure “internal” variability. While at first sight this may seem reasonable, it is actually quite different from the usual treatments of solar and volcanic forcings and the corresponding responses, which are deterministic and where stochasticity is restricted to (“pure”) internal variability (see e.g. Lean and Rind 2008). One of the reasons for the classical approach is that there is enough data to allow one to make reconstructions of past forcings. If they can be trusted, these hybrid model-data products allow GCM’s to model and isolate the corresponding responses. However, we suspect that another reason for these deterministic treatments, especially in the case of volcanic forcing, is that the intermittency of the process is so large that it is often assumed that the generating process could not be stationary. If it were true that solar and volcanic processes were nonstationary then their statistics would have to be specified as functions of time. In this case, little would be gained by lumping them in with the internal variability, which even in the presence of large anthropogenic forcing is quite plausibly stationary since, as assumed in GCM climate modelling, the effect of anthropogenic forcings is essentially to change the boundary conditions but not the internal dynamics.

However, it is quite likely that the basic solar and terrestrial stochastic processes responsible for variable solar output and volcanic activity are unchanged over the last millennium, yet that the corresponding stochastic realizations of these processes are highly intermittent, scaling and multifractal, giving a spurious appearance of nonstationarity (multifractals have nonclassical scaling behaviours: unlike quasi-Gaussian processes, each statistical moment is characterized by a different exponent and there are strong resolution dependencies). While the basic analyses were presented in Lovejoy and Schertzer (2012c) we revisit and reanalyze them here. Consider Fig. 1a which shows the Gao et al. (2008) volcanic reconstruction from 500 to 2000 A.D. along with three realizations of a multifractal process with identical statistical parameters [estimated by the analysis of the reconstructions in Lovejoy and Schertzer (2012c)], calibrated so that the overall process (but not each realization!) has the observed mean. It is very hard to distinguish the reconstruction from the three independent realizations. Since by construction the multifractal process is stationary, this strongly supports the hypothesis that the mechanism behind terrestrial volcanism during the last 1500 years has not changed. Similar conclusions apply to the solar output (excluding the 11 year cycle) although, since its intermittency is much smaller, this is perhaps less surprising. Further support for this comes from the fluctuation analysis in Fig. 1b which compares the RMS fluctuations of the reconstruction over the (mostly) pre-industrial period 1500–1900 and the industrial period 1880–2000 with the RMS fluctuations of the corresponding multifractal simulations.
We see that although the amplitude of the industrial period fluctuations is a factor ≈2 lower than for the pre-industrial period, this is well within what is expected due to the (very high) natural variability of volcanic processes (note that the fluctuations isolate the variability as a function of time scale; they are independent of the absolute level of the forcing; for more analysis, see Lovejoy and Schertzer 2012c; Lovejoy et al. 2014). Finally, Fig. 1c shows the corresponding analyses for the volcanic reconstruction as well as two solar reconstructions, with the same basic conclusions: they may all be considered stationary and there is nothing unusual about the statistics in the recent epoch when compared to the pre-industrial epoch. In any event, we shall see below that Eq. 1 can be justified ex-post-facto by empirically estimating Tnat and verifying directly that it has the same industrial and pre-industrial statistics.
Fig. 1

a The 1500 year Gao et al. (2008) volcanic reconstruction of the radiative forcing (over the period 500–2000 A.D.) along with three multifractal simulations with the measured parameters [C1 = 0.2, H = −0.3, α = 1.8; estimated in Lovejoy and Schertzer (2012c)]. The simulations differed only by their random seeds and were calibrated to have the same average forcing value (0.15 W/m²). The fact that the reconstruction is essentially indistinguishable from these statistically stationary multifractal simulations strongly supports the hypothesis that the basic volcanism responsible for eruptions over this period is constant. The reconstruction is in the upper right, the others are “fakes”. b The RMS fluctuations for the Gao et al. (2008) reconstruction (green, thick) for the period 500–2000 (solid) and 1880–2000 (dashed; see c for the slightly different curve for the period 1500–1900). The fluctuations over a lag Δt are defined by the difference of the average over the first and second halves of the interval (“Haar” fluctuations, see Sect. 3.1). Also shown is the ensemble average (thin black line) of ten realizations of the multifractal process with the parameters given in a. The thin dashed black lines indicate the one standard deviation bounds of the log of the RMS fluctuations estimated from the realization to realization variability for 500 year simulated segments. The thin red lines are the bounds for 100 year segments (they are wider since the variability is less averaged out than for the 500 year bounds). c The RMS radiative forcing fluctuations for the Gao et al. (2008) volcanic reconstruction (since 1500) as well as the same from sunspot based solar reconstructions (Wang et al. 2005; Krivova et al. 2007) (from 1610). The full lines are for the period up to 1900, the dashed lines for the period since 1880. One can see that the industrial and preindustrial solar fluctuations are nearly the same. In contrast, the amplitude of the volcanic forcing fluctuations has decreased by a factor ≈2 in the recent period (note that this does not imply a change in the amplitude of the forcing itself). For a more complete analysis of the fluctuations over the whole period, see Lovejoy and Schertzer (2012c)

The wide one standard deviation bounds show that the variability of the process is very large: although the RMS amplitude of the volcanic forcing over the industrial period is a factor ≈2 lower than over the pre-industrial period (compare the dashed and solid green lines), it nevertheless lies generally within the one standard deviation bounds (red) of the stochastic multifractal process (i.e. the dashed green line generally lies between the thin red lines).
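The Haar fluctuation used in Fig. 1b, c (the difference between the averages over the two halves of an interval of length Δt) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sign convention (second half minus first half), the use of disjoint intervals, and the function names `haar_fluctuation` and `rms_haar` are ours, and the input is synthetic white noise rather than the Gao et al. (2008) reconstruction.

```python
import math
import random


def haar_fluctuation(series, start, lag):
    """Haar fluctuation over [start, start + lag): mean of the second
    half of the interval minus mean of the first half (lag must be even)."""
    half = lag // 2
    first = series[start:start + half]
    second = series[start + half:start + lag]
    return sum(second) / half - sum(first) / half


def rms_haar(series, lag):
    """RMS Haar fluctuation at a given even lag, using disjoint intervals."""
    if lag < 2 or lag % 2 != 0:
        raise ValueError("lag must be an even integer >= 2")
    starts = range(0, len(series) - lag + 1, lag)
    squares = [haar_fluctuation(series, s, lag) ** 2 for s in starts]
    if not squares:
        raise ValueError("series is shorter than the lag")
    return math.sqrt(sum(squares) / len(squares))


# Synthetic stand-in for a forcing series (white noise, NOT real data).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
for lag in (2, 8, 32, 128):
    print(lag, round(rms_haar(noise, lag), 3))
```

For uncorrelated noise the RMS fluctuation decreases with increasing lag; comparing such curves between two epochs, as in Fig. 1b, c, is then a matter of applying `rms_haar` to each segment separately.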

2.2 CO2 radiative forcing as a linear surrogate for anthropogenic effects

The first step in testing Eq. 1 is to empirically estimate Tanth. The main contribution is from CO2, for which there are fairly reliable reconstructions from 1880 as well as reliable in situ measurements from Mauna Loa and Antarctica from 1959. In addition, there is general agreement about its radiative forcing (RF) as a function of concentration \(\uprho_{{CO_{2} }}\):
$$R_{{F,CO_{2} }} = R_{{F,2xCO_{2} }} \log_{2} \left( {\uprho_{{CO_{2} }} /\uprho_{{CO_{2} ,pre}} } \right);\;R_{{F,2xCO_{2} }} = 3.7\,{\text{W}}/{\text{m}}^{2} ;\;\uprho_{{CO_{2} ,pre}} = 277\,{\text{ppm}} \tag{2}$$
where \(R_{{F,2xCO_{2} }}\) is the forcing for CO2 doubling; the basic logarithmic form is a semi-analytic result from radiative transfer models, the values of the parameters are from the AR4. Beyond CO2, the main other anthropogenic forcings are from other long-lived greenhouse gases (warming) as well as the effect of aerosols (cooling). While the reconstruction of the global GHG forcing since 1880 is reasonably well estimated, that is not the case for aerosols which are short lived, poorly mixed (regionally concentrated), and whose effects (especially the indirect ones) are poorly understood (see below).
However, all the key anthropogenic effects are functions of economic activity, and the CO2 levels provide a convenient surrogate for the latter [over the period 1880–2004, \(\log_{2}\uprho_{{CO_{2} }}\) varies by only ≈0.5 (half an octave in \(\uprho_{{CO_{2} }}\)), so that \(\uprho_{{CO_{2} }}\) and \(\log_{2}\uprho_{{CO_{2} }}\) are linearly related to within ±1.5 % and there is little difference between using \(\uprho_{{CO_{2} }}\) or \(R_{{F,{\text{CO}}_{2} }}\) as a surrogate]. The strong connection with the economy can be seen by using the recent Frank et al. (2010) CO2 reconstruction from 1880 to 2004 to estimate \(\log_{2} \left( {\uprho_{{CO_{2} }} /\uprho_{{CO_{2} ,pre}} } \right)\): Fig. 2a shows its correlation with the global Gross Domestic Product (GDP; correlation coefficient \(r_{{RFCO_{2} ,GDP}} = 0.963\)). Also shown is the annual global production of sulfates, which is a proxy for the total (mostly sulfate) aerosol production. The high correlation coefficient (\(r_{{RFCO_{2} ,sulfate}} = 0.983\)) indicates that whatever cooling effect the aerosols have, it is likely to be roughly linear in \(\log_{2} \left( {\uprho_{{CO_{2} }} /\uprho_{{CO_{2} ,pre}} } \right)\). Also shown in the figure [using data from Myhre et al. (2001)] is the total forcing of all GHG’s (including CO2); we find the very high correlation \(r_{{RFCO_{2} RF,GHG}} = 0.997\). This justifies the simple strategy adopted here of considering \(R_{{F,CO_{2} }}\) to be a well measured linear surrogate for RF,anth (i.e. the two are considered to be equal to within a constant factor).
Fig. 2

a The annual world sulfate aerosol production from 1880 to 1990 [top, pink, from Smith et al. (2004)], the total Greenhouse Gas radiative forcing from 1880 to 1995 [orange, from Myhre et al. (2001), including CO2], and the world Gross Domestic Product (GDP, 1880–2000, blue, from J. Bradford DeLong of the Department of Economics, U.C. Berkeley), all nondimensionalized by their maximum values (6.9 × 10⁷ metric tons/year, 2.29 W/m², $4.1 × 10¹³ respectively). The regression lines have slopes corresponding to an increase of 2.8 × 10⁸ metric tons of sulfate for each CO2 doubling, an increase of GHG forcing by 6.63 W/m² for each CO2 doubling, and an increase of GDP by $1.1 × 10¹⁴ for every CO2 doubling. The correlation coefficients are 0.983, 0.997, 0.963 for sulfate production, total GHG forcing and GDP respectively. b Over the period 1880–1995, the relationship between the radiative forcing of CO2 (\({\text{R}}_{{{\text{F}},{\text{CO}}_{2} }}\)), the radiative forcing of all the long lived Greenhouse Gases (including CO2: RF, GHG) and the total radiative forcing of all the anthropogenic emissions including aerosols; data from Myhre et al. (2001). For reference, current (2012) \(R_{{FCO_{2} }}\) is estimated as ≈1.9 W/m². The slopes and correlation coefficients are: 1.79 and 0.997 (top) and 0.645 and 0.944 (bottom)

Concentrating on the total GHG radiative forcing (RF,GHG) as well as the total anthropogenic RF (including aerosols, RF,anth) we present Fig. 2b. We see that \(R_{{F,{\text{CO}}_{2} }}\) and RF,GHG are closely related with regressions yielding:
$$R_{F,GHG} = - 0.190 \pm 0.019 + (1.793 \pm 0.027)R_{{F,CO_{2} }} \tag{3}$$
(as in Fig. 2a, \(r_{{RFCO_{2} RF,GHG}} = \, 0.997\)) so that \(R_{{F,{\text{CO}}_{2} }}\) may be considered “enhanced” by the other GHG by ≈79 %. Although ozone, biomass and other effects contribute, the main additional contribution (and uncertainty) in the total anthropogenic RF,anth is from the direct and indirect cooling effects of aerosols, and is still under debate. Recent estimates (for both effects) are ≈−1.2 W/m² (AR4), −1.0 W/m² (Myhre 2009) and ≈−0.6 W/m² (Bauer and Menon 2012) (all with large uncertainties). Using the Mauna Loa estimate for \(\uprho_{{CO_{2} }}\) in 2012 (393.8 ppm), these estimates can be compared to ≈1.9 W/m² for CO2 and ≈3.1 W/m² for all GHG (the above relation). Using the RF,anth data in Myhre et al. (2001) we obtain:
$$R_{F,anth} = 0.034 \pm 0.033 + (0.645 \pm 0.048)R_{{F,CO_{2} }} \tag{4}$$
with \(r_{{CO_{2} ,anth}} = \, 0.944\) (Fig. 2b). This is tantamount to assuming −1.5 W/m² for aerosol cooling at the end of the Myhre et al. (2001) series (1995). If the most recent cooling estimates (Bauer and Menon 2012) are correct (−0.6 W/m²), the amplitude of the cooling is diminished by 60 %, so that in Eq. 4 we obtain a proportionality constant ≈1.25 rather than 0.645.
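As a check on the numbers quoted above, the logarithmic forcing law and the two regressions can be evaluated directly. The following Python sketch uses only the central values given in the text (3.7 W/m², 277 ppm, 393.8 ppm and the regression coefficients); the function names are ours, the quoted uncertainties are ignored, and applying the 1880–1995 regressions to the 2012 concentration is an extrapolation.

```python
import math

RF_2XCO2 = 3.7   # W/m^2 per CO2 doubling (value quoted in the text, from AR4)
RHO_PRE = 277.0  # ppm, pre-industrial CO2 concentration quoted in the text


def rf_co2(rho_ppm):
    """CO2 radiative forcing (W/m^2) from the logarithmic forcing law."""
    return RF_2XCO2 * math.log2(rho_ppm / RHO_PRE)


def rf_ghg(rf):
    """Total long-lived GHG forcing from its regression on the CO2 forcing."""
    return -0.190 + 1.793 * rf


def rf_anth(rf):
    """Total anthropogenic forcing (including aerosols) from its regression."""
    return 0.034 + 0.645 * rf


rf_2012 = rf_co2(393.8)  # Mauna Loa concentration for 2012 quoted in the text
print(f"CO2:  {rf_2012:.2f} W/m^2")
print(f"GHG:  {rf_ghg(rf_2012):.2f} W/m^2")
print(f"anth: {rf_anth(rf_2012):.2f} W/m^2")
```

The first two values come out close to the ≈1.9 and ≈3.1 W/m² quoted in the text.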

2.3 The instrumental data and the effective climate sensitivity

If we take \(R_{{F,CO_{2} }}\) to be a well-measured linear surrogate for RF,anth (i.e. \(T_{anth} \propto R_{{F,CO_{2} }}\)) we can define the “effective” climate sensitivity λ to a doubling of CO2 by:
$$T_{anth} (t) =\uplambda_{{2xCO_{2} ,eff}} \log_{2} \left( {\uprho_{{CO_{2} }} (t)/\uprho_{{CO_{2} ,pre}} } \right) \tag{5}$$
In order to empirically test Eq. 1, it therefore suffices to perform a regression of Tglobe(t) against \(\log_{2} \left( {\uprho_{{CO_{2} }} (t)/\uprho_{{CO_{2} ,pre}} } \right)\); the slope yields \(\uplambda_{{2{\text{xCO}}_{2} ,{\text{eff}}}}\) and the residues Tnat(t). As mentioned above, empirical estimates of the annually, globally averaged surface temperatures do not perfectly agree with each other; the differences between the series may be used to quantify the uncertainty in the estimates. In this analysis, we used data over the period 1880–2008 from three sources: the NOAA NCDC (National Climatic Data Center) merged land, air and sea surface temperature dataset (abbreviated NOAA NCDC below) on a 5° × 5° grid (Smith et al. 2008), the NASA GISS (Goddard Institute for Space Studies) dataset (Hansen et al. 2010) (from 1880 on a 2° × 2° grid) and the HadCRUT3 dataset (Rayner et al. 2006) (on a 5° × 5° grid). As mentioned earlier, these series only agree to within about ±0.03 K even at centennial scales. There are several reasons for the differences. HadCRUT3 is a merged product created out of the HadSST2 Sea Surface Temperature (SST) dataset and its companion dataset of atmospheric temperatures over land, CRUTEM3 (Brohan et al. 2006). Both the NOAA NCDC and the NASA GISS data were taken from; the others from. The NOAA NCDC and NASA GISS are both heavily based on the Global Historical Climatology Network (Peterson and Vose 1997), and have many similarities including the use of sophisticated statistical methods to smooth and reduce noise. In contrast, the HadCRUT3 data are less processed, with corresponding advantages and disadvantages. Analysis of the space–time densities of the measurements shows that they are sparse (scaling) in both space and time (Lovejoy and Schertzer 2013).
Even without other differences between the data sets, this strong sparseness means that we should not be surprised that the resulting global series are somewhat dependent on the assumptions about missing data.
The mean and standard deviation of the Tglobe(t) series are shown in Fig. 3a as functions of \(\log_{2} \left( {\uprho_{{CO_{2} }} (t)/\uprho_{{CO_{2} ,pre}} } \right)\); the result is indeed quite linear with slope equal to the effective climate sensitivity to CO2 doubling. We find:
$$\uplambda_{{2{\text{x}},{\text{CO}}_{2} ,{\text{eff}}}} = 2.33 \pm 0.22\,{\text{K}} \tag{6}$$
(note that for the northern hemisphere only, \(\uplambda_{{2{\text{x}},{\text{CO}}_{2} ,{\text{eff}}}} = 2.59 \pm 0.25\,{\text{K}}\) so that hemispheric differences are not very large). For 5 year averages for 1880–2004 (the CO2 from the reconstruction) and 1959–2004 (using the mean of the instrumental Mauna Loa and Antarctica CO2), the correlation coefficients are respectively \(r_{{RFCO_{2} ,T}}\) = 0.920, 0.968. Note that this simple direct estimate of \(\uplambda_{{2{\text{x}},{\text{CO}}_{2} }}\) can be compared with several fairly similar but more complex analyses (notably multiple regressions which include CO2), see Lean and Rind (2008), Muller et al. (2013). By use of the proportionality constants between RF,anth and \(R_{{F,CO_{2} }}\) we can estimate the effects of a pure CO2 doubling. For the strongly cooling aerosols of Myhre et al. (2001) we obtained 0.645 (Eq. 4), whereas for the weakly cooling aerosols of Bauer and Menon (2012) we obtained 1.25. Dividing the effective sensitivity by these constants leads to the pure CO2 doubling estimates \(\uplambda_{{2{\text{x}},{\text{CO}}_{2} ,{\text{pure}}}}\) = 3.61 ± 0.34 K and 1.86 ± 0.18 K respectively.
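The regression step and the rescaling to a pure CO2 doubling can be illustrated with a short Python sketch. It is not the analysis performed here: the CO2 and temperature series below are synthetic (a made-up smooth concentration rise and Gaussian noise), the helper `ols` is ours, and an intercept is included as an assumption to absorb the arbitrary baseline of temperature anomalies. Only the final rescaling uses numbers from the text.

```python
import math
import random

RHO_PRE = 277.0  # ppm, pre-industrial CO2 concentration quoted in the text


def ols(x, y):
    """Ordinary least squares fit y ~ intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Synthetic illustration only: hypothetical CO2 rise and a temperature
# series built with a known sensitivity plus Gaussian "natural" noise.
rng = random.Random(1)
true_lambda = 2.33
rho = [290.0 + 87.0 * (i / 124) ** 2 for i in range(125)]  # hypothetical ppm
x = [math.log2(r / RHO_PRE) for r in rho]
t_globe = [true_lambda * xi + rng.gauss(0.0, 0.1) for xi in x]

intercept, lam_eff = ols(x, t_globe)
# Residuals of the regression play the role of T_nat(t).
t_nat = [ti - (intercept + lam_eff * xi) for xi, ti in zip(x, t_globe)]
print(f"recovered effective sensitivity: {lam_eff:.2f} K")

# Rescaling the quoted 2.33 +/- 0.22 K by the two proportionality constants.
for label, k in (("strong aerosol cooling", 0.645), ("weak aerosol cooling", 1.25)):
    print(f"{label}: {2.33 / k:.2f} +/- {0.22 / k:.2f} K per pure CO2 doubling")
```

The last loop reproduces the arithmetic behind the two pure CO2 doubling estimates: dividing both the central value and its uncertainty by the proportionality constant.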
Fig. 3

a The mean global temperature estimated from the NASA GISS, NOAA NCDC and HadCRUT3 databases as a function of the logarithm of the mean CO2 concentration from Frank et al. (2010). The dashed lines represent the one standard deviation variations of the three series at 1 year resolution, the thick line is the mean with a 5 year running average. Also shown is the linear regression with the effective climate sensitivity to CO2 doubling: 2.33 ± 0.22 K. b Five year running average of the average temperature. The brown line is the estimate of Tanth(t) from Eq. 5 with \(\uplambda_{{2{\text{xCO}}_{2} }}\) = 2.33 K (Eq. 6) and the difference (residue) is the estimate of the natural variability Tnat(t). Also shown is the regression of the latter with time (straight line) as well as the overall estimates ΔTanth = 0.85 ± 0.08 K for the unlagged relation and the overall range ΔTglobe,range = 1.04 ± 0.03 K which presumably bounds ΔTanth. c The comparison of the mean global temperature series (red), one standard deviation limits (dashed, all from the three surface series discussed above, all with a 5 year running average), compared with the unlagged (brown, corresponding to a) and 20 year lagged (blue) estimates obtained from \(\log_{2}\uprho_{{CO_{2} }}\) versus Tglobe regressions as discussed in the text

If we plot the temperatures in the usual way as functions of time, we obtain Fig. 3b, c where we also show the anthropogenic contribution estimated with \(\uplambda_{{2{\text{x}},{\text{CO}}_{2} ,{\text{eff}}}}\) from Eq. 6 and Tanth from Eq. 5. It follows the temperatures very well, and we can already see that the residues (Tnat(t)) are fairly small. Using these estimates of the anthropogenic contribution, we can estimate the total change in temperature as ΔTanth = 0.85 ± 0.08 K over the entire industrial period (see the discussion below). Note that the same methodology can be used to analyze the postwar cooling and the recent “pause” in the warming; this is the subject of current work in progress.

2.4 The time lagged sensitivities

It may be objected that the most immediate consequence of RF is to warm the oceans (Lyman et al. 2010), so that we expect a time lag between the forcing and the atmospheric response. For example, using GCM’s, Hansen et al. (2005) find a lag of 25–50 years, and Lean and Rind (2008) empirically find a lag of 10 years (of course, the situation is not quite so simple due to feedbacks). By considering the time lagged cross correlation between \(R_{{F,CO_{2} }}\) and Tglobe (Fig. 4) it is found that the cross correlations are so high (with maximum 0.94) that the maximally correlated lag is not well pronounced. To clarify this, we also calculated the corresponding curves for the cross correlation of the temperature fluctuations (ΔT, differences) at a 5 year resolution. The fluctuations are more weakly correlated than the temperatures themselves, so that this is a bit more sensitive to varying lags. In all cases, we can see that the maximum is roughly between a lag of zero and 20 years. However, the effective climate sensitivity to doubling CO2 increases from 2.33 ± 0.22 K (zero lag) to 3.82 ± 0.54 K with a 20 year lag (see Fig. 3c for a comparison with the zero lag anthropogenic and empirical global temperatures). If we use a Bayesian approach and assign equal a priori probabilities to all the lags between zero and 20 years, then we obtain the estimate \(\uplambda_{{2{\text{x}},{\text{CO}}_{2} ,{\text{eff}}}}\) = 3.08 ± 0.58 K which is (unsurprisingly) quite close to the 10 year lag value (Fig. 4). Note that we could use a general linear relation between forcings and responses using Green’s functions, but this would require additional assumptions and is not necessary at this point.
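The lagged analysis can be sketched as follows in Python. This is a toy illustration under explicit assumptions: the “forcing” and “temperature” are synthetic (the temperature is simply the forcing delayed by 10 steps and scaled by a hypothetical sensitivity of 2.0), the helper names are ours, and the equal-prior estimate is taken to be the plain average of the regression slopes over lags 0 to 20, which is one simple reading of the procedure described above.

```python
import math


def lagged_pairs(forcing, temperature, lag):
    """Pair forcing at time t with temperature at time t + lag (lag >= 0)."""
    if lag == 0:
        return list(forcing), list(temperature)
    return list(forcing[:-lag]), list(temperature[lag:])


def correlation_and_slope(x, y):
    """Pearson correlation and least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy), sxy / sxx


# Synthetic illustration only (NOT the data of Fig. 4).
forcing = [math.sin(i / 7.0) for i in range(200)]
temperature = [2.0 * forcing[max(i - 10, 0)] for i in range(200)]

results = {lag: correlation_and_slope(*lagged_pairs(forcing, temperature, lag))
           for lag in range(0, 21)}
best_lag = max(results, key=lambda lag: results[lag][0])
equal_prior_slope = sum(s for _, s in results.values()) / len(results)
print("lag with maximum correlation:", best_lag)
print("equal-weight average of slopes:", round(equal_prior_slope, 2))
```

In this toy case the correlation peaks sharply at the true lag; with the real series the text notes that the peak is broad, which is why the lags between zero and 20 years are treated as equally probable.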
Fig. 4

The green curve is the cross correlation coefficient of the lagged \(R_{{FCO_{2} }}\) [from the CO2 reconstruction of Frank et al. (2010)] and the global mean temperatures (averaged at 5 year resolution) with dashed lines indicating one standard deviation variations (as estimated from the three global mean temperature series). As can be seen, the cross correlations are so high that the maximally correlated lag is not well pronounced. To bring out the maximum more clearly, we also calculated (red) the corresponding curves for the cross correlation of the fluctuations (differences) of 5 year averages. We can see that the maximum is roughly between a lag of zero and 20 years. However, the effective climate sensitivity to doubling CO2 (purple, divided by 10) increases from 2.33 ± 0.22 K (zero lag) to 3.82 ± 0.54 K with a 20 year lag

2.5 Effective and equilibrium climate sensitivities

Our estimate of \(\uplambda_{{2{\text{x}}{\text{CO}}_{2} ,{\text{eff}}}}\) has the advantage of being independent not only of GCM’s, but also of assumptions about radiative transfer and about (non CO2) GHG and aerosol emission histories. However, \(\uplambda_{{2{\text{x}}{\text{CO}}_{2} ,{\text{eff}}}}\) is an “effective” sensitivity both because it uses CO2 as a surrogate for all the anthropogenic RF, and also because it is not a usual “equilibrium climate sensitivity” defined as “the equilibrium annual global mean temperature response to a doubling of equivalent atmospheric CO2 from pre-industrial levels” (AR4). Since only GCM’s can truly attain “equilibrium” [and this only asymptotically in a slow power law manner (Lovejoy et al. 2013a)], this climate sensitivity is really a theoretical/model concept that can at best only be approximated with real world data. From an empirical point of view, whereas the effective climate sensitivity is the actual sensitivity to our current (uncontrolled) experiment, the equilibrium and transient sensitivities are the analogues for various (impractical) controlled experiments.

Because of the differences in the definitions of climate sensitivity, it would be an exaggeration to claim that we have empirically validated the GCM based results, even though our value \(\uplambda_{{2xCO_{2} ,eff}}\) = 3.08 ± 0.58 (taking into account the uncertainty in the lag) is very close to literature values (c.f. the AR5 range 1.5–4.5 K, the AR4 range 2–4.5 K, and the value 3 ± 1.5 K adopted by the National Academy of Sciences (1979) and the AR1–3 reports). It is not obvious whether effective or equilibrium sensitivities are more relevant for predicting the temperature rise in the twenty-first century.

3 Statistical analysis

3.1 The stationarity of the residuals Tnat and comparison with the pre-industrial Tnat

While the linearity of Fig. 3a, c is encouraging (even impressive), its interpretation as representing an anthropogenic component is only credible if the residuals (Tnat(t)) have statistics very similar to those of Tglobe in pre-industrial epochs (when Tanth = 0) so that as hypothesized in Eq. 1, they could all be realizations of the same stochastic process. As a first confirmation of this, in the top two curves of Fig. 5 we plot both Tglobe and Tnat estimated from the residuals \((\text{i.e.}\ T_{nat} (t) = T_{globe} (t) -\uplambda_{{2xCO_{2}, eff}} \log_{2} ( {\uprho_{{CO_{2} }} (t)/\uprho_{{CO_{2} ,pre}} } ))\). Even without any formal statistical analysis, we see—as expected—that whereas Tglobe is clearly increasing, Tnat is roughly flat. However, for Eq. 1 to be verified, we also require that the residuals have similar statistics to the preindustrial fluctuations when Tanth = 0 and Tglobe = Tnat. In order to establish this, we must use multiproxy reconstructions which are the only source of annual resolution preindustrial global scale temperatures.
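The residual calculation described in this paragraph can be sketched in a few lines. This is only an illustration under stated assumptions: the CO2 and temperature series are synthetic, the pre-industrial concentration `rho_pre = 277.0` ppm is an assumed parameter, the function name is hypothetical, and an intercept is fitted because temperature anomalies have an arbitrary reference level.

```python
import numpy as np

def effective_sensitivity_and_residuals(t_globe, rho_co2, rho_pre):
    """Least-squares fit of T_globe against log2(rho_CO2 / rho_pre).

    Returns (lambda_eff, intercept, t_nat), where t_nat is the residual
    series, i.e. the estimate of the natural variability.
    """
    x = np.log2(np.asarray(rho_co2, dtype=float) / rho_pre)
    y = np.asarray(t_globe, dtype=float)
    lam, intercept = np.polyfit(x, y, 1)   # slope = K per CO2 doubling
    t_nat = y - (lam * x + intercept)
    return lam, intercept, t_nat

# Synthetic demonstration data (NOT the series used in the paper).
rng = np.random.default_rng(0)
years = np.arange(1880, 2005)
rho_pre = 277.0                                          # ppm, assumed
rho = rho_pre + 100.0 * ((years - 1880) / 124.0) ** 2    # made-up CO2 curve
true_lam = 3.0                                           # K per doubling, made up
t_globe = true_lam * np.log2(rho / rho_pre) + 0.1 * rng.standard_normal(years.size)

lam, intercept, t_nat = effective_sensitivity_and_residuals(t_globe, rho, rho_pre)
print(f"fitted sensitivity: {lam:.2f} K per doubling")
print(f"residual standard deviation: {t_nat.std():.3f} K")
```

With real data, `t_nat` would be the series whose statistics are compared with the pre-industrial multiproxies.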
Fig. 5

The three lower curves are the means of the three multiproxies discussed in the text over three consecutive 125 year periods starting in the year 1500 with their standard deviations indicated. Each segment had its overall mean removed and was displaced by 0.3 K in the vertical for clarity. The fourth curve from the bottom is from the (unlagged) residuals with respect to the CO2 regression in Fig. 3a (1880–2004). The top (dashed) curve is the annual resolution mean temperature. Whereas the curves from the three multiproxy epochs are quite similar to the residuals in the recent epoch, the actual recent epoch temperature shows a fairly systematic increase

Following the analysis in Lovejoy and Schertzer (2012a), the more recent multiproxies (mostly those developed after 2003) were argued to be more faithful to the low frequency (multicentennial) variability. In particular, when compared to ice core paleotemperatures the low frequencies in Huang (2004), Moberg et al. (2005) and Ljungqvist (2010) were found to be more realistic with fluctuations starting to increase in amplitude for Δt ≳ 100 years (preindustrial). However, one of these series (Ljungqvist 2010) was at 10 year resolution and was not suited for the present study which required annual series. It was therefore replaced by the Ammann and Wahl (2007) update of the original (Mann et al. 1998) reconstruction which, although having somewhat smaller multicentennial variability, was statistically not too different (see Fig. 6 for a comparison of the probability distributions of the differences at lags of 1 year). This shows that at 1 year resolution, fluctuations from the different multiproxies have nearly the same probability distributions although with slightly different amplitudes (cf. the left–right shift on the log–log plot). Changes in the amplitude arise due to varying degrees of spatial averaging so that, given the different types and quantities of data contributing to each multiproxy, these amplitude differences are not surprising (see Lovejoy and Schertzer 2013). In the figure we also see the residuals of the unlagged estimate of Tnat. At this scale the residuals have slightly larger variability (see the comparison of the standard deviations as functions of scale in Fig. 7), although after Δt ≈ 4 years, it falls within the epoch to epoch variations of the mean of the multiproxies.
Fig. 6

The temperature differences for Δt = 1 year for the three multiproxies (red, 1500–1900) compared with the (unlagged) residuals from Fig. 1. “Pr” indicates Pr(ΔT > s), which is the probability that a random temperature difference ΔT exceeds a fixed threshold s. The smooth curves are the Gaussians with the same standard deviations. We see that the multiproxies are quite close to each other, with some small variations in amplitude (about 10 % between each curve) but not much in shape
Fig. 7

The root mean square difference fluctuations for the mean of the three global surface series [top right, magenta, 1880–2004; from Lovejoy and Schertzer (2012a)]; in the notation of Sect. 3; σΔt. The corresponding (long blue) curve is for the northern hemisphere multiproxies from 1500 to 1900 and the dashed lines show the one standard deviation error bars estimated from the three 125 year epochs indicated in Fig. 5 indicating the epoch to epoch variability. For periods less than about 10 years the fluctuations are roughly the same so that there is no significant difference in the northern hemisphere multiproxies and the global instrumental series. Their divergence beyond 10 years is due to global warming in the recent period

We can now make a first comparison between the industrial epoch residuals and the pre-industrial anomalies; see the bottom three curves in Fig. 5. To mimic the 125 year industrial period, the multiproxies were divided into 3 × 125 year pre-industrial periods (1500–1624, 1625–1749, 1750–1874) as shown, each with its overall mean removed. We see that while the industrial epoch temperatures increase strongly as functions of time, the amplitudes and visual appearances of the residuals and the multiproxies are strikingly similar.

We now turn to the problem of making this similitude quantitative. The traditional way to characterize the variability over a wide range of scales is by spectral analysis. It is typically found that climate spectra are dominated by red noise “backgrounds” and over wide ranges, these are roughly power laws (scaling) indicating that over the range, there is no characteristic scale and (in general) that there are long range statistical dependencies (e.g. correlations; see Lovejoy 2014 for a recent overview and discussion). However, spectral analysis has disadvantages, the most important of which is that its interpretation is not as straightforward as for real-space alternatives. This has led to the development of wavelets and other methods of defining fluctuations [e.g. Detrended Fluctuation Analysis (Peng et al. 1994)]. However, Lovejoy and Schertzer (2012b) show that the simple expedient of defining fluctuations over intervals Δt by the differences in the means over the first and second halves of the interval (“Haar fluctuations”) is particularly advantageous since, unlike differences, which on (ensemble) average do not decrease, Haar fluctuations can both increase and decrease. The critical distinction between increasing and decreasing fluctuations corresponds to a spectral exponent greater or less than β = 1 (ignoring small intermittency corrections). In regions where the Haar fluctuations increase they are proportional to differences; in regions where they decrease, they are proportional to averages, so that the interpretation is very straightforward.
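To make the definition concrete, the following sketch computes Haar fluctuations (mean of the second half of an interval minus mean of the first half) alongside ordinary difference fluctuations. The function names and the use of overlapping windows are illustrative choices, not taken from the paper.

```python
import numpy as np

def haar_fluctuations(series, lag):
    """Haar fluctuation for every window of `lag` samples (lag even):
    mean of the second half minus mean of the first half."""
    x = np.asarray(series, dtype=float)
    if lag < 2 or lag % 2 != 0 or lag > x.size:
        raise ValueError("lag must be an even integer with 2 <= lag <= len(series)")
    half = lag // 2
    csum = np.concatenate(([0.0], np.cumsum(x)))   # csum[k] = sum of x[:k]
    n_win = x.size - lag + 1                       # number of window positions
    start = csum[:n_win]
    mid = csum[half:half + n_win]
    end = csum[lag:lag + n_win]
    return ((end - mid) - (mid - start)) / half

def difference_fluctuations(series, lag):
    """Ordinary difference fluctuation x(t + lag) - x(t)."""
    x = np.asarray(series, dtype=float)
    return x[lag:] - x[:-lag]

ramp = np.arange(16.0)              # steadily increasing series
alternating = np.tile([1.0, -1.0], 8)

print(haar_fluctuations(ramp, 4))          # every value is 2.0
print(difference_fluctuations(ramp, 4))    # every value is 4.0
print(haar_fluctuations(alternating, 4))   # every value is 0.0
```

On the linear ramp the Haar fluctuation is exactly half the difference at the same lag, while on the rapidly alternating series the two half-means cancel; this is the sense in which Haar fluctuations behave like differences where variability grows with scale and like averages where it shrinks.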

3.2 Fluctuation analysis of the industrial residuals and preindustrial multiproxies

In Fig. 7, first note the comparison of the RMS difference fluctuations of the three surface series (1880–2008) with those of the three multiproxies (1500–1900). Up until Δt ≈ 10 years they are quite close to each other (and slowly decreasing), then they rapidly diverge with the RMS preindustrial differences (σΔt) remaining roughly constant (σΔt ≈ 0.20 ± 0.03) until about 125 years. Figure 8 shows the corresponding figure for the Haar fluctuations. Again we find that the industrial and preindustrial curves are very close up to ≈10 years followed by a divergence due to the high decadal and longer scale industrial period variability. Note that the preindustrial Haar fluctuations decrease slowly until ≈125 years. When we consider the RMS residuals we find they are mainly within the one standard deviation error bars of the epoch to epoch multiproxy variability so that as predicted (Eq. 1) removing the anthropogenic contribution gives residuals Tnat with statistics close to those of the pre-industrial multiproxies (Fig. 8).
Fig. 8

The RMS Haar fluctuations for the surface series (magenta, top) and the multiproxies from 1500 to 1900 (long, thick green) with the green straight lines showing (roughly) the one standard deviation error bars estimated from the three 125 year epochs (1500–1624, 1625–1749, 1750–1874) indicated in Fig. 5. The difference between the preindustrial multiproxies and industrial epoch surface temperatures is due to global warming. These are compared with the residuals from 1880 to 2004 obtained after subtracting the anthropogenic contribution obtained from the regression in Fig. 3a (thin black line), from the corresponding residuals for a 20 year lag between forcing and temperature (thick black line), and for a linear CO2 concentration versus temperature relation (dashed line). Both the lagged and unlagged \(\log_{2}\uprho_{{CO_{2} }}\) residuals are generally within the one standard deviation limits, although the 20 year lagged residuals are a little closer to the mean

For the (preindustrial) multiproxies we see that between ≈10 and 125 years, the RMS differences are ≈constant; this is expected due to the slight decrease of the Haar fluctuations (Fig. 8) over this range (see the “Appendix” for a discussion). The solid line at the right (at scales > 125 years) has a slope ≈0.4; it shows the increase in the variability in the climate regime. From the graph at 125 years the RMS difference may be estimated as 0.20 ± 0.03 K.

The Haar fluctuations were multiplied by a “calibration” factor = 2 so that they would be close to the difference fluctuations (Fig. 7). Note that a straight line slope H corresponds to a power law spectrum exponent 1 + 2H so that a flat line has spectrum \(E(\omega) \approx \omega^{-1}\), and hence long range statistical dependencies (for comparison Gaussian white noise has slope −0.5). The roughly log-log linear decline of the multiproxy variability to about Δt ≈ 125 years is the (fluctuation cancelling, decreasing) macroweather regime, the rise beyond it, the “wandering” climate regime (Lovejoy 2013).
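The relation between the log-log slope H and the spectral exponent 1 + 2H can be checked numerically on Gaussian white noise, for which the text quotes a slope of −0.5 (and hence a flat spectrum, exponent 0). The sketch below is illustrative only: the function names, the synthetic series, the choice of lags and the use of overlapping windows are all assumptions.

```python
import numpy as np

def rms_haar(x, lag):
    """Root mean square Haar fluctuation at an even lag (in samples)."""
    x = np.asarray(x, dtype=float)
    half = lag // 2
    csum = np.concatenate(([0.0], np.cumsum(x)))
    n_win = x.size - lag + 1
    start = csum[:n_win]
    mid = csum[half:half + n_win]
    end = csum[lag:lag + n_win]
    fluct = (end - 2.0 * mid + start) / half   # second-half mean minus first-half mean
    return np.sqrt(np.mean(fluct ** 2))

def fluctuation_exponent(x, lags):
    """Slope H of log10(RMS Haar fluctuation) against log10(lag)."""
    rms = np.array([rms_haar(x, lag) for lag in lags])
    slope, _ = np.polyfit(np.log10(lags), np.log10(rms), 1)
    return slope

rng = np.random.default_rng(42)
white = rng.standard_normal(2 ** 14)   # synthetic Gaussian white noise
lags = 2 ** np.arange(1, 10)           # 2, 4, ..., 512 samples

H = fluctuation_exponent(white, lags)
beta = 1.0 + 2.0 * H                   # spectral exponent implied by H
print(f"H = {H:.2f}, beta = {beta:.2f}")
```

Applied to a real temperature series, a slope near zero would indicate the flat-line case with a spectrum close to \(\omega^{-1}\).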

3.3 Estimating the probability that the warming is due to natural variability

Regressing \(R_{{F,{\text{CO}}_{2} }}\) against the global mean temperature leads to satisfactory results in the sense that the residuals and preindustrial multiproxies are plausibly realizations of the same stochastic process. However, this result is not too sensitive to the exact method of estimating Tanth and Tnat: the 20 year lagged residuals are a bit better, although using simply a linear regression of Tglobe against time is substantially worse; see Fig. 8. From the point of view of determining the probability that the warming is natural, the key quantity is therefore the total anthropogenic warming ΔTanth = Tanth(2004) − Tanth(1880). Using the \(\log_{2}\uprho_{CO_2}\) method (Fig. 3a) we find ΔTanth ≈ 0.85 ± 0.08 K and with a 20 year lag ≈0.90 ± 0.13 K (the zero lag northern hemisphere value is 0.94 ± 0.09 K). With a Bayesian approach, assuming equal a priori probabilities of any lag between 0 and 20 years, we obtain ΔTanth ≈ 0.87 ± 0.11 K; for comparison, for the linear in time method, we obtain ≈0.75 ± 0.07 K (essentially the same as the AR4 estimate which used a linear fit to the HadCRUT series over the period 1900–2004). We can also estimate an upper bound, the total range ΔTglobe,range = Max(Tglobe) − Min(Tglobe) ≈ 1.04 ± 0.03 K, so that (presumably) ΔTanth < ΔTglobe,range.
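The text does not spell out the Bayesian calculation. As a rough plausibility check, one can treat the two quoted endpoint estimates (zero lag and 20 year lag) as an equal-weight mixture and compute its mean and standard deviation. This two-point shortcut is an assumption made here for illustration; the function name is hypothetical, and the original calculation presumably considers all lags between 0 and 20 years.

```python
import math

def equal_weight_mixture(estimates):
    """Mean and standard deviation of an equal-weight mixture of
    distributions given as (mean, standard deviation) pairs."""
    n = len(estimates)
    mean = sum(m for m, _ in estimates) / n
    second_moment = sum(s ** 2 + m ** 2 for m, s in estimates) / n
    return mean, math.sqrt(second_moment - mean ** 2)

# Endpoint values quoted in the text (in K): zero lag and 20 year lag.
mean, sd = equal_weight_mixture([(0.85, 0.08), (0.90, 0.13)])
print(f"mixture: {mean:.3f} +/- {sd:.3f} K")
```

The mixture gives about 0.875 ± 0.11 K, consistent to within rounding with the quoted ΔTanth ≈ 0.87 ± 0.11 K.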

We now estimate the probability distribution of temperature differences from the multiproxies first over the shorter lags with reliable estimates of extremes (up to Δt = 64 years, Fig. 9), and then using the scaling of the distributions and RMS fluctuations to deduce the form at Δt = 125 years (see the “Appendix”). We find the 125 year RMS temperature difference \(\left\langle {\Delta T(125)^{2} } \right\rangle^{1/2} =\upsigma_{125} = 0.20 \pm 0.03\,{\text{K}}\) (Fig. 7). Theoretically, spatial and temporal scaling are associated with probabilities with power law “fat” tails (i.e. \(\Pr(\Delta T > s) \approx s^{-q_{D}}\) for the probability of a fluctuation exceeding a threshold s; qD is an exponent), hence in Fig. 10 we compare qD = 4, 6 and qD = ∞ (a pure Gaussian). We see that the former two values bracket the distributions (including their extremes) over the whole range of large fluctuations (the extreme 3 %).
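An empirical exceedance probability of the kind discussed here can be estimated by ranking the absolute differences at a given lag. The following sketch is illustrative: the function name is hypothetical, the rank-based estimator assumes no ties, and the example series is made up.

```python
import numpy as np

def exceedance_probability(series, lag):
    """Empirical Pr(|dT| >= s) for absolute differences at a given lag.

    Returns (s, pr): thresholds sorted in increasing order and, for each,
    the fraction of differences at least that large (assuming no ties).
    """
    x = np.asarray(series, dtype=float)
    diffs = np.abs(x[lag:] - x[:-lag])
    s = np.sort(diffs)
    n = s.size
    pr = (n - np.arange(n)) / n   # n/n, (n-1)/n, ..., 1/n
    return s, pr

# Tiny made-up series whose lag-1 differences are 1, 2, 3, 4.
s, pr = exceedance_probability([0.0, 1.0, 3.0, 6.0, 10.0], lag=1)
for threshold, prob in zip(s, pr):
    print(f"Pr(|dT| >= {threshold:.0f}) = {prob:.2f}")
```

Plotting `pr` against `s` on log-log axes shows a power law tail as a straight line whose slope is minus the tail exponent.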
Fig. 9

This shows the total probability of random absolute pre 1900 temperature differences exceeding a threshold s (in K), using all three multiproxies to increase the sample size (compare this to Fig. 6 which shows that the distributions are very similar in form for each of the multiproxies). To avoid excessive overlapping, the latter were compensated by multiplying by the lag Δt (in years, shifting the curves to the right successively by \(\log_{10} 2 \approx 0.3\)); the data are the pooled annual resolution multiproxies from 1500 to 1900. The blue double headed arrow shows the displacement expected if the difference amplitudes were constant for four octaves in time scale (corresponding to negative H for Haar fluctuations, H = 0 for differences, see Fig. 7 for the standard deviations; each octave is indicated by a vertical tick mark on the arrow). The (dashed) reference curves are Gaussians with corresponding standard deviations and with (thin, straight) tails (Pr ≲ 3 %) corresponding to bounding \(s^{-4}\) and \(s^{-6}\) behaviors
Fig. 10

The probability of anthropogenic warming by ΔTanth as functions of the number of standard deviations for the five cases discussed in the text. Also shown for reference is the equivalent temperature fluctuation using the mean standard deviation at 125 years. The vertical sides of the boxes are defined by the one standard deviation limits of ΔTanth/σ, the horizontal sides by the qD = 4 (upper) and qD = 6 (lower) limits; the middle curve (qD = 5) is the mean (and most likely) exponent. The classical statistical hypothesis (Gaussian, corresponding to qD = ∞) is indicated for reference. The AR4 ΔTanth = 0.74 ± 0.18 is indicated by the thick red line and using \(\log_{2}\uprho_{{CO_{2} }}\) as a surrogate for the RF followed by linear regression (ΔTanth = 0.85 ± 0.08; the AR5 value for 1880–2012 is 0.85 ± 0.20) is shown in the filled orange box. The other cases are shown by dashed lines: \(\log_{2}\uprho_{{CO_{2} }}\) but with a 20 year lag, linear regression of Tglobe against time and the upper bound on ΔTanth = 1.04 ± 0.03

Stated succinctly, our statistical hypothesis on the natural variability is that its extreme probabilities (Pr < 3 %) are bracketed by a modified Gaussian with qD between 4 and 6 and with standard deviation (and uncertainties) given by the scaling of the multiproxies in Fig. 7: σ125 = 0.20 ± 0.03 K. For large enough probabilities (small s), the modified Gaussian is simply a Gaussian, but below a probability threshold (above a critical threshold \(s_{q_{D}}\)) the logarithmic slope is equal to −qD; i.e. it is a power law (see the “Appendix” for details). With this, we can evaluate the corresponding probability bounds for various estimates of ΔTanth. These probabilities are conveniently displayed in Fig. 10 by boxes. For example, the AR4 ΔTanth = 0.74 ± 0.18 K (thick red box) yields a probability (p): 0.009 % < p < 0.6 % whereas the (unlagged) \(\log_{2}\uprho_{{CO_{2} }}\) regression (filled red box) yields 0.0009 % < p < 0.2 % and the 20 year lag (dashed blue) yields 0.002 % < p < 0.2 %, the northern hemisphere yields 0.009 % < p < 0.1 % with most likely values (using qD = 5) of 0.08, 0.08, 0.03, 0.03 % respectively. In even the most extreme cases, the hypothesis that the observed warming is due to natural variability may be rejected at confidence levels 1 − p > 99 %, and with the most likely values, at levels >99.9 %. The other cases considered do not alter these conclusions (Fig. 10).
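A simplified version of such a “modified Gaussian” can be sketched as follows. The exact construction is given in the paper's Appendix, which is not reproduced here, so this sketch makes its own assumptions: the distribution is taken as two-sided, the switch from Gaussian to power law is placed at the threshold where the Gaussian exceedance probability equals 3 %, and the two pieces are matched by continuity at that point. All function names are hypothetical, and the printed values are not expected to reproduce the paper's figures exactly.

```python
import math

def gaussian_exceedance(s, sigma):
    """Two-sided Gaussian probability Pr(|dT| > s)."""
    return math.erfc(s / (sigma * math.sqrt(2.0)))

def critical_threshold(sigma, p_tail=0.03):
    """Bisection for s_c such that gaussian_exceedance(s_c, sigma) = p_tail."""
    lo, hi = 0.0, 10.0 * sigma
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gaussian_exceedance(mid, sigma) > p_tail:
            lo = mid      # probability still too large: move threshold up
        else:
            hi = mid
    return 0.5 * (lo + hi)

def modified_gaussian_exceedance(s, sigma, q_d, p_tail=0.03):
    """Gaussian for s <= s_c, power law with exponent q_d beyond s_c
    (assumed continuous at s_c; see the lead-in for the assumptions)."""
    s_c = critical_threshold(sigma, p_tail)
    if s <= s_c:
        return gaussian_exceedance(s, sigma)
    return p_tail * (s / s_c) ** (-q_d)

sigma_125 = 0.20   # K, RMS 125 year difference quoted in the text
delta_t = 0.85     # K, unlagged estimate of the anthropogenic warming

s_c = critical_threshold(sigma_125)
p_by_q = {q: modified_gaussian_exceedance(delta_t, sigma_125, q) for q in (4, 5, 6)}
p_gauss = gaussian_exceedance(delta_t, sigma_125)

for q, p in p_by_q.items():
    print(f"qD = {q}: p = {100.0 * p:.4f} %")
print(f"pure Gaussian: p = {100.0 * p_gauss:.5f} %")
```

Under these assumptions the fat tailed probabilities are much larger than the pure Gaussian value yet still well below 1 %, which is the qualitative point made in the text.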

4 Conclusions

Two aspects of anthropogenic global warming are frequent sources of frustration. The first is the lack of a quantitative theory of natural variability with which to compare the observed warming ΔTanth, the second is the near exclusive reliance on GCM’s to estimate it. In this paper we have argued that since ≈1880, anthropogenic warming has dominated the natural variability to such an extent that straightforward empirical estimates of the total warming can be made. The one favoured here—using CO2 radiative forcing (RF) as a surrogate for all anthropogenic RF—gives both effective sensitivities \(\uplambda_{{2xCO_{2} ,eff}}\) and total anthropogenic increases ΔTanth (3.08 ± 0.58 and 0.87 ± 0.11 K) comparable to the AR4, AR5 estimates (1.5–4.5 K and 0.74 ± 0.18 K for the slightly shorter period 1900–2005). The method was justified because we showed that over a wide range of scales, the residuals have nearly the same statistics as the preindustrial multiproxies. An additional advantage of this approach is that it is independent of many assumptions and uncertainties including radiative transfer, GCM and emission histories. The main uncertainty is the duration of the lag between the forcing and the response.

Whether one estimates ΔTanth using the empirical method proposed here, or using a GCM based alternative, when ΔTanth is combined with the scaling properties of multiproxies we may estimate the probabilities as functions of time scale and test the hypothesis that the warming is due to natural variability. Our statistical hypothesis, supported by the multiproxy data, is that due to the scaling there are long range correlations in the temperature fluctuations coupled with nonclassical “fat tailed” probability distributions which bracket the observed probabilities. Both effects lead to significantly higher probabilities than would be expected from classical “scale bound” (exponentially decorrelated) processes and/or with “thin” (e.g. Gaussian or exponential) tails. However, even in the most extreme cases, we are still able to reject the natural variability hypothesis with confidence levels >99 %, and with the most likely values, at levels >99.9 %. Finally, fluctuation analysis shows that the variability of the recent period solar forcing was close to preindustrial levels (at all scales), and that volcanic forcing variabilities were a factor ≈2 times weaker (at all scales), so that they cannot explain the warming either.

In the AR5, the IPCC estimated our confidence in the truth of the anthropogenic warming hypothesis as 95–100 %. While our new result is easily compatible with this, it is really more complementary than equivalent. Whereas the IPCC focuses on determining how much confidence we have in the truth of anthropogenic warming, the approach outlined here estimates our confidence in the falsity of natural variability. But there is a fundamental asymmetry: whereas no theory can ever be proven to be true beyond a somewhat subjective “reasonable doubt”—a theory can effectively be disproven by a single decisive experiment. In the case of anthropogenic warming, our confidence is based on a complex synthesis of data analysis, numerical model outputs and expert judgements. But no numerical model is perfect, no two experts agree on everything, and the IPCC confidence quantification itself depends on subjectively chosen methodologies. In comparison, our approach makes no use of numerical models nor experts, instead it attempts to directly evaluate the probability that the warming is simply a giant century long natural fluctuation. While students of statistics know that the statistical rejection of a hypothesis cannot be used to conclude the truth of any specific alternative, nevertheless—in many cases including this one—the rejection of one greatly enhances the credibility of the other.


P. Dubé, the president of the Quebec Skeptical Society, is thanked for helping to motivate this work. An anonymous reviewer of an earlier version of this paper is thanked for the opinion that a GCM free approach to anthropogenic warming cannot work, concluding: “go get your own GCM”. This work was unfunded and there were no conflicts of interest.

Copyright information

© Springer-Verlag Berlin Heidelberg 2014