# A new statistical approach to climate change detection and attribution

DOI: 10.1007/s00382-016-3079-6

- Cite this article as:
- Ribes, A., Zwiers, F.W., Azaïs, JM. et al. Clim Dyn (2017) 48: 367. doi:10.1007/s00382-016-3079-6

## Abstract

We propose here a new statistical approach to climate change detection and attribution that is based on additive decomposition and simple hypothesis testing. Most current statistical methods for detection and attribution rely on linear regression models where the observations are regressed onto expected response patterns to different external forcings. These methods do not use physical information provided by climate models regarding the expected response magnitudes to constrain the estimated responses to the forcings. Climate modelling uncertainty is difficult to take into account with regression based methods and is almost never treated explicitly. As an alternative to this approach, our statistical model is only based on the additivity assumption; the proposed method does not regress observations onto expected response patterns. We introduce estimation and testing procedures based on likelihood maximization, and show that climate modelling uncertainty can easily be accounted for. Some discussion is provided on how to practically estimate the climate modelling uncertainty based on an ensemble of opportunity. Our approach is based on the “*models are statistically indistinguishable from the truth*” paradigm, where the difference between any given model and the truth has the same distribution as the difference between any pair of models, but other choices might also be considered. The properties of this approach are illustrated and discussed based on synthetic data. Lastly, the method is applied to the linear trend in global mean temperature over the period 1951–2010. Consistent with the last IPCC assessment report, we find that most of the observed warming over this period (+0.65 K) is attributable to anthropogenic forcings (+0.67 \(\pm\) 0.12 K, 90 % confidence range), with a very limited contribution from natural forcings (\(-0.01\pm 0.02\) K).

### Keywords

Detection · Attribution · Climate change · Optimal fingerprint

## 1 Introduction

Detection and attribution of climate change has received much attention over the last two decades because it seeks to assess whether recent observed changes are consistent with internal climate variability only, or are consistent with the expected response to different combinations of external forcings and internal variability (Santer et al. 1995; Mitchell et al. 2001; Hegerl et al. 2007; Bindoff et al. 2013). The IPCC’s definitions of these notions have varied somewhat over time, partly in order to make them suitable for all IPCC working groups. The definitions used in the 5th assessment report (IPCC 2013, hereafter AR5) were taken from the IPCC guidance paper on detection and attribution by Hegerl et al. (2010), and were stated as follows. “Detection of change is defined as the process of demonstrating that climate [...] has changed in some defined statistical sense, without providing a reason for that change. [...] Attribution is defined as the process of evaluating the relative contributions of multiple causal factors to a change or event with an assignment of statistical confidence”. The definition of attribution previously used by WGI in the AR4 (IPCC 2007, hereafter AR4) was somewhat more precise and mentioned three required conditions. A change Y was attributed to a forcing or a combination of forcings X if Y was detectable, consistent with the expected response to the cause X, and inconsistent with alternative, physically plausible causes. Note that “consistent with” may be understood in different ways, which will be discussed further below. Taken together, these two definitions imply that detection and attribution (D&A) requires some knowledge of the statistical properties of the internal (unforced) climate variability and of the expected response to one or several external forcings. They also imply that D&A relies on statistical inference approaches.

Several statistical models have been used in D&A. Over the past two decades, the most commonly used method has relied on linear regression models where observations are regressed onto expected responses to external forcings, with various levels of complexity. Regression based models for D&A were first proposed by Hasselmann (1979), Hasselmann (1993), who introduced the “Optimal Fingerprints” terminology. The links with classical linear regression were further highlighted by Allen and Tett (1999a). More recently, this statistical model was made somewhat more complex so as to account for uncertainty in the estimates of the expected responses due to internal variability (there is uncertainty from this source because the expected responses are estimated from finite ensembles of forced simulations). This is referred to as the Total Least Squares approach (TLS, see Allen and Stott 2003; Ribes et al. 2013).

In all of these studies, a key assumption underlying the use of such a regression approach is that uncertainty in the simulated response to a specified external forcing is dominated by uncertainty in the amplitude of the spatio-temporal pattern of response rather than in the pattern itself. The justification for this assumption is generally that the magnitude of the response is often uncertain while the spatio-temporal pattern of response is more constrained by physical understanding. As an illustration, uncertainties in the amplitude of the response to increasing concentrations of well-mixed greenhouse gases (GHGs) have remained relatively large for more than three decades: about a factor of 2.5 for the transient climate response, and 3 for the equilibrium climate sensitivity (Knutti and Hegerl 2008; IPCC 2013), related to large uncertainties in feedbacks. In contrast, the response to GHGs is robustly expected to be larger over land than over oceans, and to be amplified in the Arctic region. With a very high level of confidence, it is also expected to strengthen with time, as a direct consequence of the historical evolution of emissions and concentrations. Robust, large scale features of response patterns have also been described for other variables—e.g. the temperature of the free atmosphere and some aspects of the precipitation response—and for other forcings, such as natural forcings (e.g. due to the well-known historical occurrence of major volcanic eruptions) and anthropogenic aerosols (as a direct consequence of the location and timing of their emissions). These arguments may justify the assumption that the amplitude of the response is a dominant source of uncertainty, which is subsequently treated as unknown in statistical regression models.

Despite these uncertainties, physical knowledge does provide some information about the amplitude of the response to many forcings. As a very naive illustration, the sign of the response, e.g. in terms of the mean surface temperature, is known in many cases. Although uncertainty remains substantial, physical constraints often allow us to discard wide ranges of values for the magnitude of the response to a forcing. Regression based statistical models, by considering the amplitude of the response as being unknown, do not take these constraints into account, although the simulated response magnitudes are subsequently used to interpret the fitted model. This weakness was previously pointed out by Berliner et al. (2000), who argued for a Bayesian treatment. Huber and Knutti (2012) also proposed to incorporate prior knowledge on the magnitude to disentangle contributions from various external forcings. Furthermore, while a few features of the response pattern are well-known qualitatively, uncertainties may stand in the way of their quantification. The land-sea warming ratio, for instance, or the amount of Arctic amplification, vary from one model to another. Many other aspects of the spatial patterns of response also vary considerably among models (e.g. Shin and Sardeshmukh 2011).

Some forcings are quite uncertain, both in magnitude and pattern. For example, aerosol forcing is particularly uncertain (Boucher et al. 2013). Greenhouse gas forcing also has substantial uncertainties if effective radiative forcings are considered rather than radiative forcings (Myhre et al. 2013). The associated time-series have also been shown to be quite uncertain, particularly for aerosols (e.g. Rotstayn et al. 2015), suggesting that it may not be sufficient to represent forcing uncertainty through the use of a scaling coefficient.

In addition to uncertainty in the forcing itself, global scale feedbacks like the water vapour feedback are likely to enhance or reduce the signal everywhere, consistent with a regression framework. In contrast, the cloud feedback, which is the most uncertain feedback in the response to increasing GHGs (Dufresne and Bony 2008), is highly variable over space, and able to induce local and regional changes in the atmospheric circulation (Stevens and Bony 2013) that could contribute substantially to uncertainties in the response patterns.

Further, possible unknown feedbacks within the climate system have been mentioned as factors that contribute uncertainty to the response magnitude. However, if we acknowledge that unknown feedbacks may impact the magnitude of the response as simulated by climate models, then it seems reasonable to expect that these feedbacks may also alter the expected patterns of response as they will probably act differently over different regions—e.g. feedbacks involving specifically the land surface, snow, sea ice, etc.

In terms of D&A, Ribes and Terray (2013) recently reported that inter-model discrepancies in the simulated response patterns were substantial, and potentially large enough to be detrimental to D&A results (see also Jones et al. 2013). This study showed in particular that, as a consequence of the discrepancies in response patterns simulated by different models, the use of regression based methods may lead to nonphysical D&A results such as negative scaling factors, even in cases where the sign of the response is not ambiguous. It therefore suggested that better accounting for physical knowledge could be of interest in D&A, and that climate modelling uncertainty should not be neglected. Note that “climate modelling uncertainty” here includes uncertainties in climate model parameters, and in the representation of physical processes in models, but does not include sampling uncertainty related to internal variability within climate model experiments.

A few approaches have previously been proposed to account for climate modelling uncertainty within a regression framework, using Errors-In-Variables approaches (Huntingford et al. 2006). However, statistical inference is then much more complicated (e.g. maximum likelihood estimates are not explicit), and there are remaining issues in the uncertainty analysis (Hannart et al. 2014). More importantly, if climate modelling uncertainty is explicitly taken into account in the spatio-temporal response pattern, we see no clear reason why the response magnitude should be treated differently. We argue that both the magnitude and the pattern could be treated similarly in order to reflect inter-model uncertainty. In this way, all the information provided by physical models on the response to each forcing could be appropriately taken into account, together with the associated modelling uncertainty. As further illustrated in this work, the climate modelling uncertainty in the estimated response to a given forcing may well be much larger in the magnitude than in the pattern—e.g. it may cover a range of values as large as the factor of 2.5 on the transient sensitivity to GHG forcing that is acknowledged in the IPCC AR5. In such cases, the method we propose will come close to the usual linear regression methods.

The main goal of this study is to describe an alternative to the use of linear regression based statistical models in D&A. This alternative basically proposes a symmetric treatment of the magnitude and the pattern of the response to each forcing. Our method involves simple hypothesis testing to check each of the three conditions mentioned in the IPCC AR4 attribution definition. In particular, we address important questions such as the consistency between observed and simulated responses, not only in terms of the response magnitudes, but also in terms of the response patterns. We illustrate a few properties of this method and show that it is effective in particular in cases where linear regression is not, e.g. where responses are collinear or weak.

A second objective of this work is to provide a new statistical framework to deal with climate modelling uncertainty in D&A. This is done by considering the responses simulated by a wide range of climate models, and the corresponding uncertainty, as representing what we know about the physically plausible response to a forcing. Given modelling uncertainty, observations are used to further constrain the response to each forcing. In this way, our method also allows assessment of the contributions of different combinations of external forcings, consistent with the more recent definition of attribution (Hegerl et al. 2010). It is shown that accounting for climate modelling uncertainty in this way leads to a simpler and more accurate statistical treatment than under the EIV approach. Note that if modelling uncertainty is ignored, the method presented in this paper also leads to a simpler and more accurate treatment than under the widely used TLS approach.

An important requirement to deal with climate modelling uncertainty is to be able to estimate it from available ensembles of climate models. This requirement applies whatever the statistical method used. The inference method presented here assumes that such estimates are available. Since this is a challenging task, we provide a brief discussion of how such estimates might be derived. Our estimation is based on the “*models are statistically indistinguishable from the truth*” paradigm, where the difference between any given model and the truth has the same distribution as the difference between any pair of models. However, many different methods might be considered, and their efficiency might depend on the variable under scrutiny, so we provide no definitive recommendation in this respect. In particular, the chosen paradigm may not be appropriate for cases where the models have large common errors and are unable to simulate the pattern or magnitude of a change.

In Sect. 2, we present how an attribution statement may be made based on a single univariate variable if information on the magnitude of the response is considered. Section 3 describes the general statistical framework as well as the proposed inference and hypothesis testing methods. Section 4 deals with a few implementation issues, in particular those related to the estimation of the variance-covariance matrix of climate modelling uncertainty. Section 5 compares our method with traditional linear regression methods, and provides a few illustrations of the method using synthetic data, in order to describe important properties. Lastly, Sect. 6 shows a first application to real-world data by focusing on the linear trend in global mean temperature over the 1951-2010 period.

## 2 D&A based on a single observation of a scalar diagnostic

This section is intended to provide a short illustration of how detection and attribution may be applied to a scalar quantity such as the linear trend in the global mean temperature. When considering a single variable, response patterns are not really defined (except for the sign of the response), and D&A will only focus on the magnitude of the (observed and simulated) changes. We stress that if more than one causal factor were to be considered, a regression model could not be used, as several regression coefficients cannot be inferred from a single scalar observation. The discrimination between two forcings, if any, necessarily relies on the magnitude of the observed change. We also argue that such a very simple analysis could be very relevant to determining how much an observation exceeds the range of internal variability, or to assessing whether this observation is consistent with an ensemble of simulated responses. It may also be particularly appropriate for ruling out a weak forcing as a sufficient explanation of an observed change. An illustration of such a single scalar based D&A analysis applied to real-world data is provided in Sect. 6, based on the linear trend in global mean temperature over the 1951–2010 period.

The question of detection, first, is primarily related to assessing whether the observed indicator of climate change, say *y*, is consistent with internal variability only. To address this question, D&A studies usually compare observations with internal variability as simulated in unforced climate model runs (Hegerl and Zwiers 2011). In regression formulations, the regression residuals are interpreted as an observed realization of internal variability, and are additionally compared with model simulated internal variability to assess the ability of the models to simulate such variability (Hegerl and Zwiers 2011). Then, considering a pool of unforced control run segments covering the same period, the question reduces to a simple comparison of *y* with the same indicator computed from each simulated segment, say \(a_1, \ldots , a_p\). More precisely, the detection question could be rephrased as “Does *y* come from the same population as the \(a_i\)’s?” Assuming a Gaussian distribution for the \(a_i\)’s, which is commonplace at least for mean temperature, this question can easily be addressed with a Student *t* test. Note that we do not provide mathematical details in this section, as the statistical framework will be presented in Sect. 3 under more general assumptions. This simple way of performing detection has been used in several studies, e.g. Santer et al. (2013), Terray et al. (2011), or even much earlier, as discussed by Hegerl and North (1997).

Regarding attribution, and based on the IPCC AR4 definition (IPCC 2007), two more questions have to be considered. The first regards the consistency of *y* with the expected response to all forcings considered. To evaluate consistency, climate models are therefore also required to provide estimates of the expected response to all forcings (Hegerl and Zwiers 2011). If we assume that a sample \(b_1, \ldots , b_q\) of simulated responses is available from a pool of climate models, the question is, from a statistical point of view, identical to the previous one, and could be tackled with the same simple tool, i.e. a Student *t* test. Note that here, consistency is meant in a broad sense, i.e. including climate modelling uncertainty. The last question—consistency with alternative, physically plausible causes—demands a third sample of climate model simulations, e.g. only driven by a subset of external forcings. Denoting this sample \(c_1,\dots ,c_r\), the statistical treatment could, again, be the same.
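The three scalar tests above can be sketched in a few lines of code. The test statistic below is the prediction-interval form of the Student *t* test for a single observation against a sample; all numerical values are made up for illustration and are not the paper's data:

```python
import numpy as np
from scipy import stats

def single_obs_t_test(y, sample):
    """Test whether the scalar observation y comes from the same Gaussian
    population as `sample` (prediction-interval form of the Student t test)."""
    a = np.asarray(sample, dtype=float)
    p = a.size
    t = (y - a.mean()) / (a.std(ddof=1) * np.sqrt(1.0 + 1.0 / p))
    pval = 2.0 * stats.t.sf(abs(t), df=p - 1)
    return t, pval

# Illustrative (made-up) numbers: observed trend y vs. three simulated samples.
y = 0.65                                                   # observed GMST trend (K)
a = np.random.default_rng(1).normal(0.00, 0.08, size=30)   # unforced control segments
b = np.random.default_rng(2).normal(0.66, 0.12, size=25)   # all-forcings responses
c = np.random.default_rng(3).normal(0.02, 0.05, size=20)   # natural-only responses

p_detect = single_obs_t_test(y, a)[1]   # detection: y vs. internal variability only
p_all = single_obs_t_test(y, b)[1]      # consistency with the response to all forcings
p_nat = single_obs_t_test(y, c)[1]      # are natural forcings a sufficient explanation?
```

With numbers of this kind, the detection and natural-only tests reject while the all-forcings test does not, mirroring the three conditions of the AR4 attribution definition.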

## 3 Statistical framework and inference approach

### 3.1 Statistical model

Our statistical model can be written as

\[ Y^* = \sum _{i=1}^{{n_f}} X^*_i , \qquad (1) \]

\[ Y = Y^* + \varepsilon _Y , \qquad (2) \]

\[ X_i = X^*_i + \varepsilon _{X_i} , \quad i=1,\dots ,{n_f}, \qquad (3) \]

where \(Y^*\) is the true climate change, \(X^*_i\) is the true response to forcing *i* among the \({n_f}\) forcings considered, *Y* is the observation, \(\varepsilon _Y\) describes noise in observations, \(X_i\) is the simulated response to forcing *i*, and lastly, \(\varepsilon _{X_i}\) is the deviation between the simulated and true response to forcing *i*. Note that all these variables are vectors of the same dimension *n*. Multiple factors could contribute to \(\varepsilon _Y\), including observational error and internal variability. Multiple factors could also contribute to \(\varepsilon _{X_i}\), including climate model uncertainty, forcing uncertainty and internal variability. The random variables \(\varepsilon _{X_1}, \dots , \varepsilon _{X_{{n_f}}}\) and \(\varepsilon _Y\) are all assumed to be independent and to follow Gaussian distributions with zero means and known variance-covariance matrices (respectively \({\varSigma }_{X_1},\dots ,{\varSigma }_{X_{n_f}}\), and \({\varSigma }_Y\), hereafter variance matrices). These strong assumptions are further discussed below.

Equation (1) assumes additivity, i.e. that the response to a subset of forcings is the sum of the responses to each forcing taken individually. This strong assumption has been used in most previous D&A methods (see also Shiogama et al. 2013). Equation (2) describes how the external response is altered by noise in the observations *Y*. Noise refers here to both internal variability and measurement errors, possibly including instrumental errors, errors related to observation adjustments and sampling errors associated with the configuration of the observing network and its evolution over time (Brohan et al. 2006; Morice et al. 2012). References to observational uncertainty in this article refer to \(\varepsilon _Y\). Previous studies have suggested that internal variability tends to be the dominant source of observational uncertainty, at least for global near-surface temperature (Jones and Stott 2011). Equation (3) provides a symmetrical representation for climate model output. \(X_i\) is typically the mean response to forcing *i* simulated by a set of climate models. The multimodel mean will be mainly considered, but we refer to Sect. 4 for a comprehensive discussion on deriving a response *X* from a multimodel ensemble. The noise \(\varepsilon _{X_i}\) that contaminates \(X_i\) may be related to both internal variability within the climate model runs and other sources of variation such as forcing and modelling uncertainty, with the latter often being larger than the former. Discussion on how to estimate and combine these two terms is also provided in Sect. 4.
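The additive model (1)–(3) is straightforward to simulate, which is useful for checking inference procedures on synthetic data. A minimal sketch with two forcings; the response shapes and covariances are purely illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
n, nf = 10, 2                                # vector dimension, number of forcings

# Hypothetical "true" responses to each forcing (illustrative values only).
X_star = [np.linspace(0.0, 0.6, n),          # e.g. an anthropogenic-like response
          np.linspace(0.0, -0.1, n)]         # e.g. a natural-like response
Y_star = sum(X_star)                         # Eq. (1): additivity of responses

Sigma_Y = 0.05 * np.eye(n)                   # observational noise + internal variability
Sigma_X = [0.02 * np.eye(n), 0.01 * np.eye(n)]   # climate modelling uncertainty

Y = Y_star + rng.multivariate_normal(np.zeros(n), Sigma_Y)         # Eq. (2)
X = [X_star[i] + rng.multivariate_normal(np.zeros(n), Sigma_X[i])  # Eq. (3)
     for i in range(nf)]
```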

One important difference with respect to regression based methods is that no scaling factor \(\beta\) adjusting *X* on *Y* is assumed unknown. Usually, such adjustments are useful as *X* and *Y* may not be directly comparable, e.g. may be given in different units, etc. This is obviously not the case in D&A. In particular, the most commonly used approach aims at assessing, in addition to scaling factors, the consistency between model output and observations by testing whether \(\beta\) is consistent with 1. This means that the quantities *X* and *Y* are considered to be directly comparable. By removing these scaling factors \(\beta\), we assume that the cohort of available climate models can appropriately simulate the response magnitude, with some modelling uncertainty, in addition to the assumption that they are able to appropriately simulate the response patterns, again with some modelling uncertainty. The latter assumption is implicit under the EIV approach. It is also implicit under the TLS approach, but the source of response pattern uncertainty is then limited to internal variability only. Our model thus proposes a more symmetric treatment of uncertainty in the magnitude and the pattern of the change. Note that the climate modelling uncertainty will be estimated from an ensemble of opportunity (e.g. Taylor et al. 2012).

A few important remarks should be made concerning this statistical model.

First, this model may be regarded as a linear Gaussian model. This alternative point of view is discussed in Appendix 8.1, and helps us derive some statistical properties. However, we will still focus on the previous point of view, i.e. (1)–(3), in the following, as the maximization of the likelihood is simpler in this way.

Second, the assumption that the variance matrices \({\varSigma }_Y, {\varSigma }_{X_1}, \dots , {\varSigma }_{X_{n_f}}\) are known is strong. For example, in the single scalar case discussed in Sect. 2 it corresponds to using Gaussian tests instead of Student tests. This strong assumption, however, is consistent with previous attempts to account for climate modelling uncertainty in D&A, in particular Huntingford et al. (2006) and Hannart et al. (2014). Practically, these matrices need to be estimated from multi-model ensembles, as discussed in Sect. 4. Overall, this study proposes a “plug-in” method, where these matrices are estimated first, and then considered as fixed in the main statistical model. Providing a more comprehensive statistical treatment that also accounts for uncertainties in the estimation of these matrices would be very attractive, but is also very challenging and is beyond the scope of this paper. Note that efforts have been made in the standard D&A approach to deal with uncertainty in the covariance matrix related to internal variability (e.g. Allen and Tett 1999b; Ribes et al. 2013; Hannart 2015). It is more challenging here, as the largest uncertainty will, in many cases, be related to climate modelling uncertainty, which can only be estimated from a very limited sample of climate models. To our knowledge, previous EIV implementations have used the same plug-in approach.

Third, all errors are assumed to follow a Gaussian distribution. Considering non-Gaussian distributions might be very attractive, but is beyond the scope of this paper, although one possibility might be to transform data to bring its distribution closer to being Gaussian.

### 3.2 Inference method

The proposed inference method is based on the method of maximum likelihood (Le Cam 1990). After writing the likelihood function for the model, we derive maximum likelihood estimates and propose exact confidence intervals, without resorting to asymptotic theory. As all random variables are assumed to follow a Gaussian distribution, we consider the \(-2\) log-likelihood function (i.e. the logarithm of the likelihood, multiplied by \(-2\)) and minimize it, instead of maximizing the likelihood—this makes the calculation easier.

The \(-2\) log-likelihood of the data (*Y*, *X*) can be written (up to an additive constant that plays no role) as

\[ \ell (X^*_1,\dots ,X^*_{n_f}) = \Big(Y - \sum _{i=1}^{{n_f}} X^*_i\Big)' \, {\varSigma }_Y^{-1} \, \Big(Y - \sum _{i=1}^{{n_f}} X^*_i\Big) + \sum _{i=1}^{{n_f}} (X_i - X^*_i)' \, {\varSigma }_{X_i}^{-1} \, (X_i - X^*_i) . \]

### 3.3 Maximum likelihood estimators (MLEs)

This demonstrates that MLEs can be derived explicitly under our model as opposed to the EIV model, where MLEs can only be obtained numerically, with no guarantee that the maximum is actually reached (Hannart et al. 2014) and no possibility to derive their distribution explicitly. Note also that the maximization of this likelihood corresponds to least squares minimization. In this way, the estimators derived above can be of interest even if the Gaussian assumption is not satisfied in (2)–(3).
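Because the \(-2\) log-likelihood is a quadratic form in the \(X^*_i\), its explicit minimizer can be cross-checked against a generic numerical optimizer. The sketch below assumes (our reading of the model, not necessarily the paper's exact equations) that each MLE takes the precision-weighted form \(\widehat{X}^*_i = X_i + {\varSigma }_{X_i}({\varSigma }_Y + \sum _j {\varSigma }_{X_j})^{-1}(Y - \sum _j X_j)\), and verifies it numerically on synthetic inputs:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, nf = 4, 2
Y = rng.normal(size=n)
X = [rng.normal(size=n) for _ in range(nf)]
Sig_Y = 0.5 * np.eye(n)
Sig_X = [(0.3 + 0.2 * i) * np.eye(n) for i in range(nf)]

def neg2ll(flat):
    """-2 log-likelihood of Model (1)-(3), up to an additive constant."""
    Xs = flat.reshape(nf, n)
    r = Y - Xs.sum(axis=0)
    val = r @ np.linalg.solve(Sig_Y, r)
    for i in range(nf):
        d = X[i] - Xs[i]
        val += d @ np.linalg.solve(Sig_X[i], d)
    return val

# Candidate closed-form MLE (assumed form): precision-weighted update of each X_i.
S = Sig_Y + sum(Sig_X)
resid = Y - sum(X)
X_hat = np.array([X[i] + Sig_X[i] @ np.linalg.solve(S, resid) for i in range(nf)])

# Generic numerical minimization of the same -2 log-likelihood, started from X.
opt = minimize(neg2ll, np.concatenate(X), method="BFGS")
```

On this example the optimizer and the closed form agree to numerical precision, consistent with the claim that the MLEs are explicit under this model.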

### 3.4 Distribution of MLEs

As linear combinations of the Gaussian vectors \(X_1,\dots ,X_{n_f}\) and *Y*, \(\widehat{X}^*_i\) and \(\widehat{Y}^*\) all follow Gaussian distributions. From (12) and (13), and noting that \({\text {E}}(Y-X)=0\), it is also easy to prove that they are unbiased estimates. Thus their distributions will be fully determined by their variances. The variance of \(\widehat{Y}^*\) can be deduced from (11), leading to

\[ {\text {Var}}\big(\widehat{Y}^*\big) = {\varSigma }_Y \big( {\varSigma }_Y + {\varSigma }_X \big)^{-1} {\varSigma }_X , \qquad \text{where } {\varSigma }_X = \sum _{i=1}^{{n_f}} {\varSigma }_{X_i} . \]

The above equations show that the distribution of the MLEs is explicit under our statistical model, which will allow the computation of exact confidence regions or hypothesis tests, assuming we know the variance matrices. Confidence regions for \(X^*_i\) allow the quantification of uncertainties in the contribution of a particular forcing to the observed changes. In particular, if *Y* is a time series of *n* observations, attributable trends have been commonly used (e.g. Stott et al. 2006; Jones et al. 2013; Gillett et al. 2013; Bindoff et al. 2013) to estimate the relative contributions of several forcings to a change. Within our statistical framework, estimates and uncertainty analysis on trends attributable to a given forcing can be derived respectively from (13) and (17).

Under like-for-like assumptions, similar results on uncertainty analysis are not known under TLS or EIV models. Instead, the computation of confidence intervals on \(\beta\) must be based on approximate results from asymptotic statistics, and may involve some computational challenges. Hannart et al. (2014) further suggested that the use of these asymptotic results in EIV models could lead to confidence intervals that are too permissive, which is not the case here. In addition, uncertainty analysis on attributable trends inferred from TLS or EIV models should in principle involve the computation of confidence regions for \(\widehat{\beta }_i \widehat{X}_i^*\), which is challenging and usually not done.

### 3.5 Hypothesis testing

This subsection describes hypothesis tests that may be used for D&A under our statistical model. Consistent with Sect. 2, we will consider three different tests, corresponding to the three conditions required for attribution: consistency with internal variability only (i.e. detection), consistency with the expected response to all forcings, and consistency with the response to a subset of forcings. We provide a more comprehensive discussion of the tests in “Appendix 8.2”. In particular, all three tests we propose are in a sense “goodness of fit” tests, as is discussed in “Appendix 8.2.1”.

The first test addresses detection, i.e. the consistency of the observations with internal variability only; under the null hypothesis \(Y^*=0\), the observations satisfy \(Y\sim N(0,{\varSigma }_Y)\), so the test may be based on the statistic \(Y' {\varSigma }_Y^{-1} Y \sim \chi ^2_n\). For this test to be applicable, the dimension of *Y* has to be small. This is usually achieved by a pre-processing of the data. Although preliminary to the main statistical analysis, this pre-processing often focuses the power of the test on the expected change, so the signal comes into consideration indirectly and is in effect not ignored. Such a “low dimension” condition is satisfied in the application presented in Sect. 6.

The second test assesses whether the data (*X*, *Y*) are consistent with Model (1)–(3). In other words, we test the goodness of fit to our full statistical model. Denoting \(X=\sum _{i=1}^{{n_f}} X_i\), this test may be implemented based on the minimum log-likelihood given in (14): under Model (1)–(3), \(Y-X\) is Gaussian with mean zero and variance \({\varSigma }_Y + \sum _{i=1}^{{n_f}} {\varSigma }_{X_i}\), so that

\[ (Y-X)' \Big( {\varSigma }_Y + \sum _{i=1}^{{n_f}} {\varSigma }_{X_i} \Big)^{-1} (Y-X) \sim \chi ^2_n . \]

The third test assesses whether a given subset of forcings is sufficient to explain the observed change, i.e. whether \(Y^* = \sum _{i\in I} X^*_i\), where *I* is a subset of \(\llbracket 1,{n_f}\rrbracket\). Very similarly to the previous test, this test could be based on

\[ \Big(Y-\sum _{i\in I} X_i\Big)' \Big( {\varSigma }_Y + \sum _{i\in I} {\varSigma }_{X_i} \Big)^{-1} \Big(Y-\sum _{i\in I} X_i\Big) \sim \chi ^2_n . \]

By accounting for information on the magnitude of a change, this test corrects a few deficiencies of the linear regression approach. Under linear regression, this test is usually performed in the presence of at least two forcings \(F_i\), which means that two or more scaling factors \(\beta _i\) are estimated simultaneously. Forcing \(F_1\) is assessed to not be a sufficient explanation if, within, for example, a 2-forcing analysis, \(\beta _2\) is significantly different from 0. Using this procedure, however, two forcings cannot be differentiated if their responses are collinear. Even worse, if the response to the forcing \(F_1\) is very weak, its response pattern will be very uncertain, even if many \(F_1\)-only simulations are available. This will prevent rejecting the hypothesis that “\(F_1\) alone can explain the change”. This is a paradoxical situation because the smallest forcings may be the most difficult to exclude from a list of possible sufficient causes. In such a case, the main information provided by climate model simulations is actually that the magnitude of the response to \(F_1\) is weak, and we argue that this information has to be taken into account.
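The consistency tests can be sketched numerically: under the model, the residual between *Y* and the sum of the responses in the tested subset is Gaussian with mean zero and variance given by the sum of the corresponding variance matrices, yielding a chi-square statistic. All responses and covariances below are illustrative assumptions, and a fixed perturbation stands in for random noise so the example is deterministic:

```python
import numpy as np
from scipy import stats

def consistency_test(Y, X_subset, Sigma_Y, Sigma_X_subset):
    """Chi-square test of H0: the forcings in the subset explain the change.
    Under H0, Y - sum(X_i) ~ N(0, Sigma_Y + sum(Sigma_Xi))."""
    r = Y - sum(X_subset)
    S = Sigma_Y + sum(Sigma_X_subset)
    stat = float(r @ np.linalg.solve(S, r))
    return stat, stats.chi2.sf(stat, df=len(Y))

n = 10
ant = np.linspace(0.0, 0.6, n)          # hypothetical anthropogenic response
nat = np.full(n, -0.02)                 # hypothetical natural response
Sigma_Y = 0.02 * np.eye(n)
Sigma_ant, Sigma_nat = 0.01 * np.eye(n), 0.005 * np.eye(n)

Y = ant + nat + 0.01                    # fixed perturbation in place of noise

_, p_full = consistency_test(Y, [ant, nat], Sigma_Y, [Sigma_ant, Sigma_nat])
_, p_nat = consistency_test(Y, [nat], Sigma_Y, [Sigma_nat])
```

Here `p_full` is large (all forcings together are consistent with `Y`), while `p_nat` is small (natural forcings alone are rejected as a sufficient explanation), even though the weak natural response could not be excluded via a scaling-factor test.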

## 4 Implementation: estimation of *X* and \({\varSigma }_X\)

In the method presented above, the multi-model response *X* and the variance matrix \({\varSigma }_X\) describing the corresponding uncertainty are assumed to be known (as a single forcing is considered, the index *i* is dropped in this section). However, the way a multi-model ensemble of opportunity like CMIP5 may be translated into *X* and \({\varSigma }_X\) is not straightforward, and several approaches might be considered. Before providing illustrations of our new method, we briefly discuss this issue and present one possible way to estimate those quantities.

It is worth noting that a wide body of literature, and most notably the IPCC Assessment Reports, has addressed the question of quantifying uncertainty in the Earth’s climate sensitivity. However, far fewer studies have provided uncertainty assessments on patterns, i.e. not restricted to one single parameter. Quantifying the uncertainty in the response pattern is necessary to account for climate modelling uncertainty in D&A, whatever the statistical model used, whether EIV or that described in this paper. In particular, previous EIV studies provided only limited discussion on this topic (Huntingford et al. 2006; Gillett et al. 2013; Hannart et al. 2014). Note that other approaches, like that of Huber and Knutti (2012), also usually neglect uncertainty in the forcing pattern or time-series.

### 4.1 Paradigms for climate modelling uncertainty

While several methods of varying complexity could be considered to derive *X* from a multi-model ensemble, this study focuses mainly on the multi-model mean. Empirical evidence from an increasing number of studies suggests that the multi-model mean is a better estimate than the response provided by any individual model (see e.g. Knutti et al. 2010 for a review). The multi-model mean has also been used extensively in the IPCC assessments (IPCC 2007, 2013), as well as in many individual studies, making it a well-described quantity. Nevertheless, other estimates of *X* might be considered, e.g. the median or, more generally, a trimmed mean (i.e. a mean over a subset of climate models, designed to avoid or limit the impact of outlier models on the average). Such techniques, however, may reduce physical consistency across space, as values at different locations may come from different models. Several authors also propose to eliminate some models prior to analysis because of unrealistic behaviour, or to seek subsets of models that satisfy an emergent constraint (e.g. Hall and Qu 2006).

The computation of \({\varSigma }_X\) is more debatable and subject to partly arbitrary choices. Note that we only focus on climate modelling uncertainty and ignore internal variability in this section; the influence of internal variability is discussed in Sect. 4.2. Several paradigms have been used in the literature to compare observations with simulations coming from various models, and more precisely to describe the distance from *X* at which the observations are likely to be found (Annan and Hargreaves 2010; Knutti et al. 2010). Here we only consider the “models are *statistically indistinguishable* from the truth” paradigm. In a Bayesian perspective where the truth is treated as non-deterministic, this paradigm simply assumes that the models and the truth are taken from the same distribution. This paradigm can also be understood in a frequentist perspective, assuming dependence among models, as discussed below. Whatever the point of view, this paradigm assumes that the difference between any given model and the truth has the same distribution as the difference between any pair of models. By considering this paradigm, it is implicitly assumed that the true-world response is somewhere within the distribution of model responses, for the population of models from which the available ensemble of opportunity was drawn. Since this is an assumption about the population of models, it also implies that models not yet observed may lie outside the range of responses seen in the available ensemble of opportunity.

Assuming that each individual model response \(w_j\) is drawn from \(N(\mu ,{\varSigma }_{\text {m}})\), a simple interpretation of the *statistically indistinguishable* paradigm is to assume that the value \(\mu\) differs from the truth \(w^*\) as much as it differs from any independent realization \(w'\). For this reason, we assume that \((\mu -w^*) \sim N\left (0,{\varSigma }_{\text {m}}\right )\). As the multi-model mean \(\overline{w}=\frac{1}{n_m} \sum _{j=1}^{n_m} w_j\) satisfies \((\overline{w}-\mu ) \sim N(0,{\varSigma }_{\text {m}}/n_m)\) and is independent of \((\mu -w^*)\), it follows that \(\overline{w}\sim N\left ( w^*, (1+1/n_m) {\varSigma }_{\text {m}}\right )\), which corresponds to using \({\varSigma }_X = \left ( 1+1/n_m \right ) {\varSigma }_{\text {m}}\). Note that this paradigm is also equivalent to assuming that the truth \(w^*\) is itself drawn from the same distribution \(N(\mu ,{\varSigma }_{\text {m}})\) as the individual model responses.

We use this paradigm for two main reasons.

First, this paradigm assumes large uncertainties around the multimodel mean. We argue that it is more appropriate to assume larger rather than narrower modelling uncertainty to provide a conservative statement. Following previous D&A studies, we are primarily interested in deriving an observational constraint, rather than a strong modelling constraint, on the forced responses \(X^*_i\), since we regard the evidence contained by the observations as being paramount. For robustness, we might wish this observational constraint to hold even with an overestimate of modelling uncertainty. This paradigm is also quite pessimistic because it means that no matter how extensively the space of all plausible models is sampled, the multimodel mean will not converge to the truth.

Second, Annan and Hargreaves (2010) and van Oldenborgh et al. (2013) suggest that the *models are statistically indistinguishable from the truth* paradigm is reasonably well supported by observations, although they partly disagree on whether it tends to be overly conservative. Notably, this paradigm is better supported by observations than alternatives, in particular the *models centered on the truth* paradigm (e.g. Annan and Hargreaves 2010; Fyfe et al. 2013), which is briefly introduced and discussed in “Appendix 8.4”. Note, however, that these results are mainly valid for temperature, and may need to be revisited for other variables such as precipitation.

In conclusion, the “models are statistically indistinguishable from the truth” paradigm provides a useful framework for computing estimates of climate modelling uncertainty. Essentially, this approach assumes that the truth lies somewhere within the model envelope, and a possible drawback is that values outside the model range (e.g. in terms of sensitivity) will not be considered. This is why the estimation of climate modelling uncertainty should account for multiple lines of evidence in order to determine whether or not this paradigm can be considered reliable; such a discussion goes beyond the scope of this paper. If the paradigm is judged unreliable, other estimates might be considered: with an inflated variance if the ensemble is proven to be under-dispersive, with some models discarded if they are proven unrealistic, or with more specific adjustments. The same conclusion applies if some component of the uncertainty, such as forcing uncertainty, is ignored in the ensemble design. The sensitivity of the results to such alternative estimates might also be explored.

Lastly, it may be noted that the EIV approach, via the introduction of unknown scaling factors, is able to cope with additional uncertainty in the response magnitude. However, this approach still ignores part of the physical knowledge of the response magnitude and, more importantly, we see no reason why models would be under-dispersive in the magnitude but not in the patterns. Thus, some assessment of the paradigm used will also be needed regarding uncertainty in the response pattern.
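As a concrete sketch, the quantities implied by this paradigm are straightforward to compute from an ensemble of model responses. The short Python fragment below is an illustration only (the function name and synthetic ensemble are ours, not part of the paper), and ignores internal variability, as in the discussion above:

```python
import numpy as np

def indistinguishable_sigma_x(w):
    """Multi-model mean X and its uncertainty Sigma_X under the
    'models are statistically indistinguishable from the truth'
    paradigm, ignoring internal variability.

    w : (n_m, n) array of model responses (one row per model).
    Returns (X, Sigma_X) with Sigma_X = (1 + 1/n_m) * Sigma_m,
    where Sigma_m is the inter-model covariance matrix.
    """
    n_m = w.shape[0]
    x = w.mean(axis=0)                                # multi-model mean
    sigma_m = np.atleast_2d(np.cov(w, rowvar=False))  # inter-model covariance
    return x, (1.0 + 1.0 / n_m) * sigma_m
```

Note that the factor \(1+1/n_m\) never vanishes: under this paradigm, no matter how many models are sampled, the multi-model mean is not assumed to converge to the truth.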

### 4.2 Estimation of \({\varSigma }_X\) with modelling uncertainty and internal variability

Assume that the response simulated in run *k* from model *j* can be decomposed as

\[ X_{jk} = X^* + m_j + \epsilon _{jk}, \]

where \(m_j\) denotes the modelling error specific to model *j*, and \(\epsilon _{jk}\) denotes the particular realization of internal variability contained in simulation *k* from model *j*. We set \(\epsilon _{jk} \sim N(0,{\varSigma }_{\text {v}})\), assuming the variance matrix \({\varSigma }_{\text {v}}\) to be the same for all models (i.e. it does not depend on *j*), consistent with some previous D&A studies, and \(m_j \sim N(0,{\varSigma }_{\text {m}})\). Within such a framework, uncertainty is related to both internal variability and climate modelling uncertainty, consistent with Huntingford et al. (2006) and Hannart et al. (2014). We further assume these two random terms to be independent, which leads to

\[ {\varSigma }_X = \left ( 1+\frac{1}{n_m} \right ) {\varSigma }_{\text {m}} + \frac{1}{n_m n_r} {\varSigma }_{\text {v}}. \quad (28) \]

While (28) has been obtained in the balanced case where each model performs the same number of simulations \(n_r\), modelling centers usually provide ensembles of various sizes. One possible way to obtain a balanced ensemble would be to consider only a fixed number of simulations from each model. This usually requires a compromise between \(n_m\) and \(n_r\), and tends to exclude some of the available simulations (i.e. not consider all available information). In order to avoid this, we show in “Appendix 8.3” how \({\varSigma }_X\) and \({\varSigma }_{\text {m}}\) may be estimated in the more general unbalanced case (i.e. ensembles of various sizes).
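In the balanced case, one simple way to obtain these quantities is an ANOVA-type decomposition of the ensemble: the within-model spread estimates \({\varSigma }_{\text {v}}\), and the between-model spread of ensemble means, after removing its internal-variability part, estimates \({\varSigma }_{\text {m}}\). The sketch below is illustrative only (function name and synthetic data are ours) and omits the truncation of negative estimates discussed in “Appendix 8.3”:

```python
import numpy as np

def balanced_sigma_x(runs):
    """Estimate X and Sigma_X from a balanced multi-model ensemble.

    runs : (n_m, n_r, n) array; runs[j, k] is the diagnostic from run k
    of model j. Under X_jk = X* + m_j + eps_jk, with m_j ~ N(0, Sigma_m)
    and eps_jk ~ N(0, Sigma_v), the multi-model mean satisfies
    Sigma_X = (1 + 1/n_m) Sigma_m + Sigma_v / (n_m * n_r).
    """
    n_m, n_r, n = runs.shape
    model_means = runs.mean(axis=1)              # (n_m, n) ensemble means
    x = model_means.mean(axis=0)                 # multi-model mean

    # Within-model deviations -> internal variability Sigma_v
    within = runs - model_means[:, None, :]
    sigma_v = np.einsum('jki,jkl->il', within, within) / (n_m * (n_r - 1))

    # Covariance of model means = Sigma_m + Sigma_v / n_r
    between = np.atleast_2d(np.cov(model_means, rowvar=False))
    sigma_m = between - sigma_v / n_r

    return x, (1 + 1 / n_m) * sigma_m + sigma_v / (n_m * n_r)
```

In practice \(n_m\) is small, so the subtraction above can yield a negative (invalid) estimate of \({\varSigma }_{\text {m}}\); this is why a truncated estimator is used in “Appendix 8.3”.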

## 5 Properties and several illustrations

This section illustrates a few properties of the proposed method. We first discuss how it compares with and relates to previous linear regression approaches. We then describe the results obtained in a few particular cases, based on synthetic data and with a very small dimension \(n=1, 2\), to illustrate how this method works.

### 5.1 Relationship with linear regression

First consider the simple case where the observations *Y* are regressed onto a single response pattern *X* (usually referred to as Ordinary Least Squares in D&A, following Allen and Tett 1999a), i.e.

\[ Y = X \beta + \epsilon , \]

where \(\beta\) is an unknown scaling factor, \(\epsilon\) denotes internal variability, and *X* is a column vector.

If we consider the term \(X\beta\) as the response to a forcing \(F_1\), the basic assumption underlying this framework is that the shape of the response to \(F_1\) (i.e. *X*) is perfectly known, while the magnitude of the response to \(F_1\) (i.e. \(\beta\)) is unknown. This assumption may be thought of as defining a very specific uncertainty structure on the \(F_1\) response in (3). The “unknown magnitude” assumption means that uncertainty is large in the direction of the simulated response *X*, while it vanishes in all other directions.

To translate these two assumptions—perfect knowledge of the shape of the response *X*, but uncertain knowledge of its amplitude—we may write the variance matrix of *X* as

\[ {\varSigma }_X = \lambda X X', \quad (30) \]

where \(\lambda\) quantifies the uncertainty in the magnitude of the response *X*. One natural question is to determine whether these two approaches provide consistent results if based on similar assumptions. The answer is yes: based on (30), and assuming that \(\lambda \rightarrow \infty\), we can show that our method coincides with linear regression.

Two comments are worth making here, however. First, the comparison might have been performed with a more comprehensive framework for linear regression, such as the EIV approach proposed in Hannart et al. (2014). However, comparison with such a model is much more difficult because its maximum likelihood estimates are not explicit. Using OLS as a baseline instead allows a simple comparison. Second, this result helps to illustrate the main difference between the two approaches. While linear regression is equivalent to assuming no error in the shape (\({\varSigma }_X\) is rank-1) and infinite uncertainty in the magnitude (\(\lambda \rightarrow \infty\)), our approach allows a more balanced point of view, assuming a limited (i.e. finite) amount of uncertainty in both the shape and the magnitude of the response *X*.
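This limit is easy to check numerically. In the single-forcing Gaussian model, a standard computation gives the precision-weighted estimate \(\widehat{X}^* = X + {\varSigma }_X \left ({\varSigma }_X+{\varSigma }_Y\right )^{-1}(Y-X)\); plugging in a rank-1 \({\varSigma }_X = \lambda XX'\) with a very large \(\lambda\) should then reproduce the GLS regression estimate \(X\widehat{\beta }\). The sketch below uses synthetic numbers of our own choosing, not data from the paper:

```python
import numpy as np

x = np.array([1.0, 0.5, -0.3, 0.8, -1.2])      # simulated response pattern
sigma_y = np.diag([0.5, 1.0, 1.5, 0.8, 1.2])   # internal variability
eps = np.array([0.3, -0.2, 0.1, 0.0, -0.4])    # a fixed 'noise' draw
y = 1.3 * x + eps                              # pseudo-observations

lam = 1e6
sigma_x = lam * np.outer(x, x)                 # rank-1: unknown magnitude

# Additive-model estimate of the forced response X*
x_hat = x + sigma_x @ np.linalg.solve(sigma_x + sigma_y, y - x)

# GLS regression estimate (OLS in the Sigma_Y metric)
siy = np.linalg.inv(sigma_y)
beta = (x @ siy @ y) / (x @ siy @ x)

# For large lambda the two estimates coincide
assert np.allclose(x_hat, beta * x, rtol=1e-4, atol=1e-6)
```

For any finite \(\lambda\) the additive-model estimate shrinks slightly towards the simulated response, which is precisely the “finite uncertainty in the magnitude” behaviour discussed above.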

### 5.2 Toy examples

Here we consider a few practical cases in order to illustrate particular features of our method, in particular its ability to estimate contributions from individual forcings.

#### 5.2.1 Single scalar analysis \(n=1\)

We first consider the case of a single scalar diagnostic (\(n=1\)), where two external forcings \(F_1\) and \(F_2\) contribute to the observed change *Y* on the climate variable \({\text {v}}\). More specifically, we assume estimation of \(X_1\) to be our main concern, and discuss the accuracy of the estimation as a function of uncertainties in both \(X_2\) and *Y*.

Figure 2 illustrates three typical cases that might be encountered in this configuration. In this figure, we distinguish between two possible ways to derive confidence intervals for the forced contributions \(X_1^*\) and \(X_2^*\). The first, very naive option is to consider only the information provided by climate models (in terms of both the mean and the spread of the multi-model ensemble) using (3). Another option is to consider our full statistical model (1)–(3), and thus the information provided by both models and observations, which leads to our estimates \(\widehat{X}_1^*\) and \(\widehat{X}_2^*\). Figure 2 shows how these two options may provide different results.

In Fig. 2a, the two contributions \(X_1^*\) and \(X_2^*\) are very uncertain based on multimodel information. As one single observation is not sufficient to separately estimate the contributions from two different forcings, the added value from observations is limited and the resulting confidence intervals on \(X_1^*\) and \(X_2^*\) are only slightly reduced after accounting for the observational constraint. Note that observations (black arrow) fall above the overall forced response as simulated by climate models (magenta arrow). As a consequence, best-estimates \(\widehat{X}_1^*\) and \(\widehat{X}_2^*\) are slightly higher than the first guess \(X_1\) and \(X_2\) simulated by the climate models.

In Fig. 2b, the contribution \(X_2^*\) is rather well constrained by model simulations, but observations are dominated by noise, so the resulting uncertainty on \(X_1^*\) is, again, only slightly reduced. As in panel **a**, the best-estimate \(\widehat{X}_1^*\) is found to have a higher value than the multi-model mean \(X_1\) because the observation *Y* falls above the simulated forced response \(X = X_1+X_2\). However, because observations are noise dominated, only part of this departure \((Y-X)\) is transferred to \(\widehat{X}_1^* - X_1\); the remaining part is transferred to \((\widehat{Y}^*-Y)\).

Figure 2c illustrates a more favourable case in which both the observations and the contribution from forcing \(F_2\) are well constrained. In this case, while climate models simulate a relatively uncertain response to forcing \(F_1\) (large pink confidence interval), the use of our statistical model leads to a much better constrained estimate of the response \(X_1^*\) (red confidence interval). This can be understood as a direct consequence of the additivity assumption. Basically, (1) can be re-written as \(X_1^* = Y^* - X_2^*\). Because of limited uncertainties (i.e. relatively small \({\varSigma }_Y\) and \({\varSigma }_{X_2}\)), *Y* and \(X_2\) are close to \(Y^*\) and \(X_2^*\), respectively. So computing \(X_1^*\) from the subtraction above allows uncertainties to be reduced substantially. Furthermore, the departure of *Y* from *X* is almost completely transferred to \(\widehat{X}_1^* - X_1\), as the two other terms are much less uncertain. In this toy example, while \(Y-X\) is not very large, the resulting confidence interval for \(X_1^*\) does not contain \(X_1\) (which could be considered a naive first guess).
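These scalar cases can be reproduced with a few lines of code. In the scalar Gaussian model, the maximum likelihood estimates have the closed form \(\widehat{X}_i^* = X_i + \sigma _{X_i} \, (Y - X_1 - X_2) / (\sigma _Y + \sigma _{X_1} + \sigma _{X_2})\): the departure of *Y* from \(X_1+X_2\) is redistributed in proportion to each term's variance. The numbers below are illustrative choices of ours, mimicking the configuration of Fig. 2c (well-constrained *Y* and \(X_2\), uncertain \(X_1\)); all uncertainty inputs are variances:

```python
import numpy as np

def estimate_contributions(y, x, sig_x, sig_y):
    """MLE of individual contributions in the scalar additive model
    Y ~ N(X1* + X2*, sig_y), X_i ~ N(X_i*, sig_x[i]).

    The innovation Y - (X1 + X2) is redistributed to each forcing in
    proportion to its variance.
    """
    x = np.asarray(x, dtype=float)
    sig_x = np.asarray(sig_x, dtype=float)
    innovation = y - x.sum()
    return x + sig_x * innovation / (sig_y + sig_x.sum())

# Fig. 2c-like setting: Y and X2 well constrained, X1 very uncertain
x_hat = estimate_contributions(y=1.0, x=[0.2, 0.5],
                               sig_x=[0.50, 0.01], sig_y=0.01)
# Most of the departure Y - (X1 + X2) = 0.3 is attributed to X1
```

With these numbers, \(\widehat{X}_1^*\) moves from 0.2 to about 0.49, while \(\widehat{X}_2^*\) barely moves, exactly the behaviour described for Fig. 2c.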

#### 5.2.2 Analysis when *Y* has dimension \(n=2\)

We now consider examples in which two climate variables \({\text {v}}_1\) and \({\text {v}}_2\) are analysed simultaneously (\({\text {v}}_1\) corresponds to the *x*-axis in Fig. 3). Let \(X^*_{1,1}\) and \(X^*_{2,1}\) be the true responses to \(F_1\) and \(F_2\) respectively on \({\text {v}}_1\), and let \(X^*_{1,2}, X^*_{2,2}\) be the true responses to \(F_1\) and \(F_2\) on \({\text {v}}_2\). In these examples, we assume that uncertainty in the observations *Y* is relatively small, e.g. because internal variability has somehow been partly filtered out (see Sect. 6 for a concrete illustration). We then discuss how the accuracy of the final estimate of \(X_{1,1}^*\) depends on the structure of the uncertainty on \(X_1\) and \(X_2\), i.e. \({\varSigma }_{X,1}\) and \({\varSigma }_{X,2}\).

In Fig. 3a, for both \(X_1\) and \(X_2\), the uncertainty in the \({\text {v}}_1\) direction is essentially independent from that in the \({\text {v}}_2\) direction, which means that \({\varSigma }_{X,1}\) and \({\varSigma }_{X,2}\) are close to being diagonal. Therefore, these two dimensions \({\text {v}}_1\) and \({\text {v}}_2\) may be considered separately. First, on \({\text {v}}_1\), both \(X_1\) and \(X_2\) are quite uncertain. Consistent with Fig. 2a, the observational constraint is weak and \(X^*_{1,1}\) is poorly estimated. Second, on \({\text {v}}_2\), the situation is closer to Fig. 2c, and \(X_{1,2}^*\) is more accurately estimated. This, however, has little impact on the estimation of \(X_{1,1}^*\).

The case illustrated in Fig. 3b is quite different: the simulated response \(X_1\) has the same variance as in panel 3a on both \({\text {v}}_1\) and \({\text {v}}_2\), but there is strong dependence between the two components. The more \(X_1\) deviates from \(X_1^*\) on \({\text {v}}_1\), the more it deviates on \({\text {v}}_2\). Thus, based on (3) only, the confidence region for \(X_1^*\) (pink ellipsoid) is stretched in one particular direction. We chose this direction to be the same as \(X_1\), so that the uncertainty is close to that assumed when linear regression is performed (see Sect. 5.1), although some constraint on the magnitude of the change is still available from climate models.

In Fig. 3b, under these assumptions, there is a strong observational constraint on estimates of both \(X_{1,1}^*\) and \(X_{2,1}^*\). This can be understood as follows. First, in the \({\text {v}}_2\) direction, the case is close to that of Fig. 3a, and the contribution \(X_{1,2}^*\) of \(F_1\) is well constrained. Second, the assumed shape of the uncertainty on \(X_1\) ensures that this constraint on \({\text {v}}_2\) projects quite clearly on \({\text {v}}_1\), and thus a relatively small confidence interval is found for \(X_{1,1}^*\). Third, this new constraint also allows accurate estimation of the contribution \(X_{2,1}^*\) of forcing \(F_2\) on \({\text {v}}_1\). Finally, thanks to the particular shape of the modelling uncertainty, the observational constraint is relatively strong on each component. This example is also particularly illustrative because, on \({\text {v}}_1\), the observation *Y* falls above the expected total response (*magenta cross*), but the response to \(F_1\) (\(X_{1,1}^*\)) is finally assessed to be smaller than simulated in \(X_{1,1}\). This is due to two facts: (1) on \({\text {v}}_2\), the observation *Y* falls below the expected total response, and (2) the observational constraint on \({\text {v}}_1\) comes from that on \({\text {v}}_2\).

The example of Fig. 3b also illustrates a sharp contrast with the linear regression approach. Here, the simulated responses \(X_1\) and \(X_2\) are close to collinear. Discriminating between the responses to \(F_1\) and \(F_2\) would not have been possible with the linear regression approach, as a direct consequence of this collinearity. In particular, the EIV inference method proposed by Hannart et al. (2014) provides unbounded confidence intervals if applied to the same data. Instead, the observational constraint found above is a consequence of the strong constraint provided by \(X_2\) on the magnitude of \(X_{2,2}^*\), which is appropriately taken into account here. It may also be noted that while the structure of the uncertainty on \(X_1\) is consistent with a linear regression approach, that of \(X_2\) is not. Our statistical model is able to deal appropriately with both cases.
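The behaviour in Fig. 3b can be sketched with the matrix form of the same estimate, \(\widehat{X}_i^* = X_i + {\varSigma }_{X,i} S^{-1} (Y - X_1 - X_2)\) with \(S = {\varSigma }_Y + {\varSigma }_{X,1} + {\varSigma }_{X,2}\), whose estimation-error covariance for \(X_1^*\) is \({\varSigma }_{X,1} - {\varSigma }_{X,1} S^{-1} {\varSigma }_{X,1}\). The covariance matrices below are illustrative assumptions of ours, chosen to mimic a near-collinear configuration with \({\varSigma }_{X,1}\) stretched along \(X_1\):

```python
import numpy as np

def additive_mle_2forcings(y, x1, x2, s1, s2, sy):
    """MLE of X1*, X2* in Y ~ N(X1*+X2*, sy), Xi ~ N(Xi*, si),
    plus the error covariance of the X1* estimate."""
    S = sy + s1 + s2
    d = np.linalg.solve(S, y - x1 - x2)
    var_x1 = s1 - s1 @ np.linalg.solve(S, s1)
    return x1 + s1 @ d, x2 + s2 @ d, var_x1

x1 = np.array([0.5, 0.5])                      # near-collinear responses
x2 = np.array([0.5, 0.6])
u = x1 / np.linalg.norm(x1)
s1 = 0.5 * np.outer(u, u) + 1e-3 * np.eye(2)   # stretched along x1 (Fig. 3b)
s2 = 0.01 * np.eye(2)                          # X2 well constrained
sy = 0.01 * np.eye(2)                          # Y well constrained
y = np.array([1.1, 0.9])

x1_hat, x2_hat, var_x1 = additive_mle_2forcings(y, x1, x2, s1, s2, sy)
# The constraint on v2 projects onto v1: var_x1[0, 0] << s1[0, 0]
```

Running this sketch, the error variance of \(\widehat{X}_{1,1}^*\) is more than an order of magnitude smaller than the prior modelling uncertainty on that component, even though \(X_1\) and \(X_2\) are nearly collinear.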

## 6 Application to global mean temperature

As a simple illustration of an application of the method to real data, we consider the global mean warming over the period 1951–2010, as estimated with a linear trend. This period is selected for consistency with Fig. 10.5 of the IPCC Fifth Assessment Report (IPCC 2013). Based on this example, we illustrate the capabilities of our method, both in terms of hypothesis testing and in terms of estimating individual forcing contributions. This is done in a 2-forcing analysis (considering both natural and anthropogenic external forcing) of a simple scalar diagnostic, as proposed in Sect. 2. Application of our method to more comprehensive datasets, possibly including time series and/or spatial information, is beyond the scope of this methodological paper. It may also be noted that, given the limited number of climate models available worldwide, estimating climate modelling uncertainty variance matrices necessitates working with a reasonably small dimension *n*.

### 6.1 Data

Observed temperature data are taken from the median realization of the HadCRUT4 merged land/sea temperature data set (Morice et al. 2012). We use outputs from unforced pre-industrial control simulations, historical simulations performed with all external forcings combined (ALL) and historical simulations with natural forcings only (NAT), from all available CMIP5 models (see Table 1). These data are available at: http://cmip-pcmdi.llnl.gov/cmip5/. For models providing both ALL and NAT ensembles, the response to anthropogenic forcings (ANT) is computed as the difference between the ALL and NAT responses. As our analysis is based on the 1951-2010 period, we did not consider models providing historicalNat simulations with end dates earlier than 2010. When required, ALL simulations were extended to 2010, either with historicalExt experiments, if available, or with RCP8.5 simulations.

**Table 1** Ensembles of CMIP5 simulations used

| Climate model | Nb 60-year PICTL seg. | Nb ALL runs | Nb NAT runs |
|---|---|---|---|
| ACCESS1-0 | 9 | 1 | – |
| ACCESS1-3 | – | 1 | – |
| bcc-csm1-1 | 22 | 3 | 1 |
| bcc-csm1-1-m | – | 1 | – |
| BNU-ESM | 24 | 1 | – |
| CanESM2 | 46 | 5 | 5 |
| CCSM4 | 22 | 6 | – |
| CESM1-BGC | – | 1 | – |
| CMCC-CM | – | 1 | – |
| CMCC-CMS | – | 1 | – |
| CMCC-CESM | – | 1 | – |
| CNRM-CM5 | 47 | 10 | 6 |
| CSIRO-Mk3-6-0 | 22 | 10 | 5 |
| EC-EARTH | – | 5 | – |
| FGOALS-g2 | – | 2 | 2 |
| FGOALS-s2 | 22 | 3 | – |
| FIO-ESM | – | 1 | – |
| GFDL-CM3 | 22 | – | – |
| GFDL-ESM2G | 4 | 1 | – |
| GFDL-ESM2M | – | 1 | – |
| GISS-E2-H | 41 | 5 | 5 |
| GISS-E2-R | 70 | 5 | 5 |
| HadGEM2-ES | 9 | 4 | 4 |
| HadGEM2-CC | – | 1 | – |
| inmcm4 | 22 | 1 | – |
| IPSL-CM5A-LR | 47 | 4 | 3 |
| IPSL-CM5A-MR | – | 1 | – |
| IPSL-CM5B-LR | 12 | 1 | – |
| MIROC5 | 7 | 4 | – |
| MIROC-ESM | – | 1 | – |
| MIROC-ESM-CHEM | – | 1 | – |
| MPI-ESM-LR | 6 | 3 | – |
| MPI-ESM-MR | 22 | 1 | – |
| MRI-CGCM3 | 22 | 3 | – |
| NorESM1-M | 22 | 3 | 1 |

### 6.2 Results

Figure 4 illustrates results obtained at different stages of the D&A analysis. Note that 90 % confidence intervals are reported. The analysis presented in this section is performed under the “models are statistically indistinguishable from the truth” paradigm. The equivalent analysis assuming the “models are centered on the truth” paradigm is presented and briefly discussed in “Appendix 8.4”.

Figure 4a illustrates the detection step, where observations are only compared to unforced simulations. Estimated from the linear trend over the whole period, the observed warming is about \(+0.65\)K. By contrast, the linear warming of global mean temperature found in segments taken from unforced simulations has mean zero and a standard deviation of about 0.08K. The observed value is thus well outside the range of values expected as a consequence of internal variability alone (\(\pm 0.13\)K at the 90 % confidence level). Detection, based on (18), is very significant, as the *p* value is numerically indistinguishable from 0.
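Schematically (leaving aside the exact form of test (18)), this detection step amounts to comparing the observed trend with the distribution of trends in unforced control segments. The fragment below uses synthetic numbers of ours that merely mirror the orders of magnitude quoted above:

```python
import math
import numpy as np

obs_trend = 0.65                              # observed 1951-2010 trend (K)
rng = np.random.default_rng(0)
ctl_trends = rng.normal(0.0, 0.08, size=500)  # stand-in control-run trends

sd_iv = ctl_trends.std(ddof=1)                # internal-variability std. dev.
z = obs_trend / sd_iv
p_value = math.erfc(abs(z) / math.sqrt(2))    # two-sided Gaussian p value
# z is about 8, so p_value is numerically indistinguishable from 0
```

The 90 % range quoted in the text, \(\pm 0.13\)K, is simply \(1.645 \times 0.08\)K under the same Gaussian assumption.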

Figure 4b illustrates the comparison of observations to the ALL-forcings simulations. This comparison allows us to test the consistency between the observed and simulated global warming, and to estimate the overall forced response. The range of values obtained from individual climate models is quite large, ranging from \(+0.44\)K to \(+1.16\)K. Consistent with this, the forced warming estimated from (3) only is about \(+0.80\)K, with a 90 % confidence range of \([+0.46\)K\(, +1.15\)K]. The observed value (\(+0.65\)K) is smaller than the center of this distribution. Given the expected range of internal variability, the forced response based on (2) only is within \([+0.52\)K\(, +0.78\)K]. Considering the whole model (1)–(3), the confidence interval for the overall forced response is very similar, \([+0.55\)K\(, +0.79\)K], with a best estimate at \(+0.67\)K. Observations are also found to be very consistent with simulated responses, as the *p* value of the test (19) is 0.51.

Figure 4c illustrates how the contributions from natural and anthropogenic external forcings may be estimated. Note that a smaller ensemble of models has been used here as many CMIP5 models did not run D&A simulations (i.e. historical simulations using specified subsets of forcings). Models simulate a response to natural forcing of \(-0.01\)K (\([-0.03\)K, \(+0.02\)K]). This range of values is particularly narrow, as the NAT response is weak in all models. Surprisingly, this range of values is also much narrower than that reported in Fig. 4a from internal variability only. This is partly because ensemble means are considered, which are less impacted by internal variability than individual runs. Related to this, the estimated modelling uncertainty on this term is exactly 0 (which may happen with the truncated estimate used, see “Appendix 8.3”). This means that the discrepancy between models is consistent with internal variability only. The simulated response to anthropogenic forcing is much more uncertain, with a warming of \(+0.80\)K on average, and a 90 % range of \([+0.39\)K\(, +1.21\)K]. Applying our method leads to reduced uncertainty on the latter term, \([+0.55\)K\(, +0.80\)K] with a best-estimate of \(+0.67\)K, while the natural contribution is left virtually unchanged. Note that the main limitation to a stronger observational constraint comes from noise in the observations, which is dominated by internal variability, and not from uncertainty in the contribution of natural forcings, which is very small here. Based on this 2-forcing analysis, observations are also found to be consistent with models, with a *p* value of 0.60. A final piece of information from this analysis is the attribution test of *other plausible causes*. 
Based on this analysis of linear warming trends only, we find that the anthropogenic influence is unequivocal, as the hypothesis that *natural forcings alone can explain observations* is strongly rejected (the *p* value is, again, numerically indistinguishable from 0). The symmetric hypothesis that *anthropogenic forcings alone can explain observations* is not rejected (*p* value 0.58). The natural influence is thus much more difficult to demonstrate, consistent with the very small response simulated to natural forcings over this period of time.
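A much simplified scalar analogue of these sufficiency tests (a z-test, not the paper's likelihood ratio statistic) illustrates the asymmetry: under the hypothesis that a single forcing \(F_1\) explains the change, \(Y \sim N(X_1, \sigma _Y + \sigma _{X_1})\). The variances below are rough assumptions of ours, chosen only to match the orders of magnitude quoted in the text:

```python
import math

def sufficiency_pvalue(y, x1, var_y, var_x1):
    """Two-sided z-test of H0: 'forcing F1 alone explains Y'."""
    z = (y - x1) / math.sqrt(var_y + var_x1)
    return math.erfc(abs(z) / math.sqrt(2))

# Observed trend +0.65 K; NAT response -0.01 K; ANT best estimate +0.67 K
p_nat = sufficiency_pvalue(0.65, -0.01, var_y=0.08**2, var_x1=0.01**2)
p_ant = sufficiency_pvalue(0.65, 0.67, var_y=0.08**2, var_x1=0.073**2)
# 'NAT alone' is strongly rejected; 'ANT alone' is not
```

The weak simulated natural response, with its very small uncertainty, is precisely what makes the “NAT alone” hypothesis so easy to reject here.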

Our estimates of natural and anthropogenically-induced warming are highly consistent with Figure 10.5 of the last IPCC report, in the sense that the IPCC estimates are included within our intervals. Our ranges, however, are slightly wider. Part of this might be due to the fact that we compute 90 % confidence ranges, while the IPCC ranges were assessed to be only *likely* (i.e. 66 %). Part of this difference is also expected because no temporal or spatial information is accounted for here; our estimates come only from the constraint provided by the linear warming over this period. Applying our procedure to a more comprehensive observed vector *Y*, in order to distinguish more efficiently between internal variability and the expected responses to forcing, and thus to further reduce these uncertainties, would be a natural continuation of this work.

## 7 Conclusion

D&A has, for the most part, used regression-like approaches for the past two decades, where observations are regressed onto expected response patterns. These methods tend to ignore the information provided by climate models on the magnitude of the forced responses. Accounting for climate modelling uncertainty in such regression-based approaches is quite challenging, and thus it has also been common practice to neglect this type of uncertainty.

We have introduced a revised statistical framework for D&A that overcomes these weaknesses. Our approach relies on the additivity of the forcing responses to provide an observational constraint on the contribution of each forcing. This method is able to deal with climate modelling uncertainty. The information provided by an ensemble of climate models is then used both in terms of the response patterns and the response magnitudes, in a very symmetrical way.

This paper describes the statistical inference methods required for D&A within this new framework. The estimation of each forced response is based on maximum likelihood. Closed-form estimators and exact confidence regions are provided, in contrast with regression-based methods such as EIV. Hypothesis tests of interest for formally attributing an observed change to some combination of forcings are also presented and discussed. In particular, we provide likelihood ratio tests and their null distributions.

We provide some guidelines on quantifying climate modelling uncertainty from an ensemble of opportunity. Following previous studies, we consider the *models are statistically indistinguishable from the truth* paradigm, where the truth is assumed to lie somewhere within the model envelope, to obtain a more conservative estimate of this uncertainty. The reliability of such a paradigm, however, might be investigated further. We also discuss in an Appendix how this uncertainty may be estimated when the number of simulations available varies among models. This choice is not expected to strongly affect inferences because the strongest constraint on the estimated responses to forcing comes from observations.

Our additive decomposition method is shown to have good properties based on simple synthetic examples. In particular, we illustrate how the observational constraint on forced contributions is influenced by the structure of the climate modelling uncertainty - i.e. the corresponding variance matrices. We also demonstrate that our approach is equivalent to linear regression in the particular case where models agree on the response pattern but widely disagree on the response magnitude.

Application of this method to the analysis of the contributions of anthropogenic and natural external forcing to the linear 1951-2010 trend in global mean temperature provides results that are very consistent with the recent IPCC AR5. We find that the observed warming over this period (\(+0.65\)K) is mostly related to anthropogenic forcings (\(+0.67\pm 0.12\)K), with a very limited contribution from natural forcings (\(-0.01\pm 0.02\)K). Application of the same method with space-time information might further reduce these ranges.

Assessing the extent to which this new method may improve the observational constraint on other variables or other external forcings would be a natural continuation of this work.

## Acknowledgments

The authors are grateful to the two anonymous referees for their constructive comments, which were of great value in improving the paper. Part of this work has been supported by the Fondation STAE, via the project Chavana, and by the Extremoscope and ANR-DADA projects.