To anchor the discussion in a common application framework, suppose a climate model is available that can generate simulated annual temperatures \(A_{itf}\) in locations (or grid cells) \(i = 1, \ldots, n\) over the time interval \(t = 1, \ldots, T\) under the assumption of no forcings (\(f = 0\)), only natural forcings (\(f = 1\)), or only anthropogenic forcings (\(f = 2\)). Annual observed temperature anomalies \(A_{it}^{o}\) are also available in the same grid cells. The n-length vector y provides a measure of the cell-specific changes (e.g. trends or differences in means) in \(A_{it}^{o}\) over the time period of interest. A matrix X is formed with g columns consisting of “signal vectors”, created by running the model to generate \(A_{itf}\) for \(f = 1, 2\) and then computing changes in the same way as for y. All data are zero-centered so a constant term is not needed.
A linear model regressing the observed spatial pattern of changes against the model-simulated signals is written
$${\varvec{y}} = {\varvec{Xb}} + {\varvec{u}}$$
(1)
where u is the regression error and the coefficient vector b measures the scaling factors necessary for the signals to explain the observed climate patterns. A significant value of an element of b implies the associated signal is “detected”.
Denote the (unobservable) covariance matrix of u as \({{\varvec{\Omega}}}\), thus \(E\left( {{\varvec{uu}}^{T} } \right) = {{\varvec{\Omega}}}\). Note \({{\varvec{\Omega}}}\) is a symmetric nonstochastic matrix with full rank. The GM Theorem states that if \({{\varvec{\Omega}}}\) is conditionally (on X) homoscedastic and has zero off-diagonal elements, meaning it can be represented as a scalar multiple of the \(n \times n\) identity matrix \({\varvec{I}}_{n}\), which we write
$$E\left( {{\varvec{uu}}^{T} |{\varvec{X}}} \right) = {{\varvec{\Omega}}} = \sigma^{2} {\varvec{I}}_{n}$$
(2)
where \(\sigma^{2}\) is the variance of the errors u, and if the expectation of the errors conditional on X is zero:
$$E\left( {{\varvec{u}}{|}{\varvec{X}}} \right) = 0,$$
(3)
then the OLS estimator
$$\hat{\user2{b}}_{OLS} = \left( {{\varvec{X}}^{T} {\varvec{X}}} \right)^{ - 1} {\varvec{X}}^{T} {\varvec{y}}$$
is unbiased and has the minimum variance among all linear (in y) unbiased estimators (Davidson and MacKinnon 2004). In this way, an estimator of b that satisfies the GM conditions (2) and (3) can be said to be BLUE.
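The unbiasedness part of the GM result can be illustrated numerically. The following sketch (Python with NumPy; the signal matrix, true scaling factors, and all dimensions are invented for illustration) draws errors satisfying conditions (2) and (3) and checks that the OLS estimator recovers b on average across replications:

```python
import numpy as np

rng = np.random.default_rng(0)
n, g, reps = 200, 2, 2000

# Hypothetical signal matrix X and true scaling factors b
X = rng.standard_normal((n, g))
b_true = np.array([1.0, 0.5])

est = np.empty((reps, g))
for r in range(reps):
    # Errors satisfying (2) and (3): mean zero, homoscedastic, uncorrelated
    u = rng.standard_normal(n)
    y = X @ b_true + u
    # OLS: b_hat = (X^T X)^{-1} X^T y
    est[r] = np.linalg.solve(X.T @ X, X.T @ y)

b_bar = est.mean(axis=0)  # close to b_true, consistent with unbiasedness
```

Averaging over replications approximates \(E(\hat{b}_{OLS})\); under (2) and (3) this mean sits on top of the true coefficients up to simulation noise.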
If GM condition (2) fails, OLS coefficient estimates will still be unbiased but potentially inefficient, whereas if GM condition (3) fails, OLS coefficients will be biased and inconsistent. If condition (2) fails, the common remedies are feasible GLS (where ‘feasibility’ refers to the use of a full-rank estimator \({\hat{\mathbf{\Omega }}}\) to provide weights for the regression model (1)) or use of White’s (1980) heteroskedasticity-consistent (HC) variance estimator.
In response to concerns that detection of greenhouse gas signals would be thwarted by heteroskedasticity, AT99 proposed the following variant on feasible GLS. They call \({{\varvec{\Omega}}}\) the “climate noise” matrix and denote it \({\varvec{C}}_{{\varvec{N}}}\):
$$E\left( {{\varvec{uu}}^{{\varvec{T}}} } \right) = {\varvec{C}}_{{\varvec{N}}} .$$
(4)
Rather than define it as the outer product of the error terms in (1) they assume it can be computed using the spatial covariances from the preindustrial (unforced) control run of a climate model. Denote the matrix root of the inverse of \({\varvec{C}}_{{\varvec{N}}}\) as P, so \({\varvec{P}}^{T} {\varvec{P}} = {\varvec{C}}_{N}^{ - 1}\). While \({\varvec{C}}_{N}\) is obtainable from a climate model, due to the limited degrees of freedom in climate models it is in practice rank deficient and its inverse does not exist. Suppose the rank of \({\varvec{C}}_{N}\) is \(K < n\). They propose an estimator \(\hat{\user2{C}}_{N}\) defined using a \(K \times n\) matrix \({\varvec{P}}^{{\left( {\varvec{K}} \right)}}\) consisting of the first K eigenvectors of \({\varvec{C}}_{{\varvec{N}}}\) weighted by their inverse singular values, which implies \({\varvec{P}}^{{\left( {\varvec{K}} \right)}} \hat{\user2{C}}_{N} {\varvec{P}}^{{\left( {\varvec{K}} \right)T}} = {\varvec{I}}_{K}\) (the rank-K identity matrix) and \(\left( {{\varvec{P}}^{{\left( {\varvec{K}} \right)T}} {\varvec{P}}^{{\left( {\varvec{K}} \right)}} } \right)^{ + } = \hat{\user2{C}}_{N}\) where + denotes that it is the Moore–Penrose pseudo-inverse. AT99 assert that the regression
$${\varvec{P}}^{{\left( {\varvec{K}} \right)}} {\varvec{y}} = {\varvec{P}}^{{\left( {\varvec{K}} \right)}} {\varvec{Xb}} + {\varvec{P}}^{{\left( {\varvec{K}} \right)}} {\varvec{u}}$$
(5)
yields an unbiased estimator \(\tilde{\user2{b}} = \left( {{\varvec{X}}^{T} {\varvec{P}}^{{\left( {\varvec{K}} \right)T}} {\varvec{P}}^{{\left( {\varvec{K}} \right)}} {\varvec{X}}} \right)^{ - 1} {\varvec{X}}^{T} {\varvec{P}}^{{\left( {\varvec{K}} \right)T}} {\varvec{P}}^{{\left( {\varvec{K}} \right)}} {\varvec{y}}\) which satisfies the GM conditions as long as \({\varvec{P}}^{{\left( {\varvec{K}} \right)}}\) exists, regardless of the value of K, because, by definition,
$$E\left( {{\varvec{Puu}}^{{\varvec{T}}} {\varvec{P}}^{{\varvec{T}}} } \right) = {\varvec{PC}}_{{\varvec{N}}} {\varvec{P}}^{T} = {\varvec{I}}_{{\varvec{n}}} .$$
(6)
Equation (6) corresponds to Eq. (3) in AT99.
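The algebra of the \({\varvec{P}}^{{\left( {\varvec{K}} \right)}}\) construction can be verified directly. The sketch below (NumPy; the dimensions and the control-run data are hypothetical) builds a rank-deficient \({\varvec{C}}_{N}\) from \(m < n\) simulated control segments, forms \({\varvec{P}}^{{\left( {\varvec{K}} \right)}}\) from the leading K eigenvectors, and confirms the two identities stated above:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, K = 30, 10, 8  # grid cells, control-run segments, retained modes

# A rank-deficient "climate noise" matrix: rank(C_N) <= m < n,
# so the ordinary inverse of C_N does not exist
runs = rng.standard_normal((m, n))  # hypothetical control-run changes
C_N = runs.T @ runs / m

# First K eigenvectors, weighted by the inverse square roots of the
# eigenvalues of C_N (equivalently, up to scale, the inverse singular
# values of the control-run matrix)
vals, vecs = np.linalg.eigh(C_N)
idx = np.argsort(vals)[::-1][:K]
lam, V = vals[idx], vecs[:, idx]
P_K = np.diag(lam ** -0.5) @ V.T  # K x n

# Identity 1: P^(K) C_N P^(K)^T = I_K
white = P_K @ C_N @ P_K.T

# Identity 2: (P^(K)^T P^(K))^+ equals the rank-K truncation of C_N
C_hat = np.linalg.pinv(P_K.T @ P_K)
C_trunc = V @ np.diag(lam) @ V.T
```

Both identities hold by construction of the eigendecomposition; note that `white` is the K-dimensional identity \({\varvec{I}}_{K}\), not \({\varvec{I}}_{n}\), which is the point at issue in assumption (ii) below.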
The assertion that (6) implies \(\tilde{\user2{b}}\) is BLUE depends on 5 assumptions, not all of which AT99 state: (i) P is nonstochastic; (ii) it does not matter that \({\varvec{I}}_{K} \ne {\varvec{I}}_{n}\); (iii) there are no necessary conditions other than (6) for the GM Theorem to hold; (iv) the non-existence of \({\varvec{C}}_{N}^{ - 1}\) has no implications for the properties of \(\tilde{\user2{b}}\) and (v) test statistics associated with \(\tilde{\user2{b}}\) do not depend on the assumption that \({\varvec{C}}_{N} = {{\varvec{\Omega}}}\). All five assumptions are incorrect.
(i)
As was observed in AT99, \({\varvec{C}}_{{\varvec{N}}}\) (and hence P) are functions of random natural forcings, so they are random matrices. Ignore for the moment the distinction between P and \({\varvec{P}}^{{\left( {\varvec{K}} \right)}}\). AT99 set the randomness issue aside, but the observation that the climate model generates “noisy” or random vectors is the motivation behind proposed refinements in Allen and Stott (2003) and elsewhere. Since u is random, Eq. (6) cannot hold as written if P is also random. Neither can it be the case that \({\varvec{C}}_{N} = {{\varvec{\Omega}}}\), since one is random and the other is not. Using a standard decomposition, \(\tilde{\user2{b}} = {\varvec{b}} + \left( {{\varvec{X}}^{T} {\varvec{P}}^{T} {\varvec{PX}}} \right)^{ - 1} {\varvec{X}}^{T} {\varvec{P}}^{T} {\varvec{Pu}}\). To prove consistency of the AT99 optimal fingerprinting method with random P it needs to be shown (not merely assumed) that the second term on the right converges in distribution to \(N\left( {0,{\varvec{I}}_{g} } \right)\), which requires proving the necessary conditions
$$\left[ {{\text{N1}}} \right]\quad plim\left( {\left( {{\varvec{X}}^{{\varvec{T}}} {\varvec{P}}^{{\varvec{T}}} {\varvec{PX}}} \right)^{ - 1} {\varvec{X}}^{{\varvec{T}}} {\varvec{P}}^{{\varvec{T}}} {\varvec{Pu}}} \right) = 0_{g}$$
and
$$\left[ {{\text{N2}}} \right]\quad plim\left( {\left( {{\varvec{X}}^{{\varvec{T}}} {\varvec{P}}^{{\varvec{T}}} {\varvec{PX}}} \right)^{ - 1/2} {\varvec{X}}^{{\varvec{T}}} {\varvec{P}}^{{\varvec{T}}} {\varvec{P}}{{\varvec{\Omega}}}{\varvec{P}}^{{\varvec{T}}} {\varvec{PX}}\left( {{\varvec{X}}^{{\varvec{T}}} {\varvec{P}}^{{\varvec{T}}} {\varvec{PX}}} \right)^{ - 1/2} } \right) = {\varvec{I}}_{g}$$
where the right-hand side of [N1] is a vector of g zeroes. Here the probability limit is over \(n \to \infty\) but note that the rank of \({\varvec{P}}^{{\left( {\varvec{K}} \right)}}\) will be constrained to K regardless of sample size, which would make proving [N1] and [N2] particularly challenging. AT99 sidestepped any discussion of these requirements by assuming \({\varvec{P}}^{{\left( {\varvec{K}} \right)}}\) is nonstochastic but in later literature when it was assumed to be random this issue was not revisited, so these conditions still need to be proven. This is true also of methods using regularized variance matrix estimators (see e.g. Hannart 2016) in which invertibility is achieved by using a weighted sum of \({\varvec{C}}_{N}\) and the n-dimensional identity matrix.
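The role of randomness in P can be seen in a small Monte Carlo sketch (NumPy; the covariance structure and dimensions are invented for illustration). When the projector is built from the true covariance it whitens exactly, but when it is built from a finite-sample estimate, \(E\left( {{\varvec{Puu}}^{T} {\varvec{P}}^{T} } \right) = {\varvec{P\Omega P}}^{T}\) is no longer the identity:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, K = 15, 20, 10  # grid cells, control segments, retained modes

# Hypothetical true error covariance (AR(1)-type spatial decay)
Omega = 0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

def projector(C, K):
    """Leading K eigenvectors of C weighted by inverse root eigenvalues."""
    vals, vecs = np.linalg.eigh(C)
    idx = np.argsort(vals)[::-1][:K]
    return np.diag(vals[idx] ** -0.5) @ vecs[:, idx].T

# Nonstochastic case: P built from Omega itself whitens exactly
P_true = projector(Omega, K)
exact = P_true @ Omega @ P_true.T   # equals I_K up to rounding

# Stochastic case: P built from a finite-sample estimate of Omega
L = np.linalg.cholesky(Omega)
runs = (L @ rng.standard_normal((n, m))).T  # m simulated control segments
C_est = runs.T @ runs / m
P_est = projector(C_est, K)
noisy = P_est @ Omega @ P_est.T     # P Omega P^T: visibly not I_K
```

Under these assumptions the deviation of `noisy` from the identity does not vanish for fixed m, which is the sense in which Eq. (6) fails when P is stochastic.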
(ii)
If \({\varvec{P}}^{{\left( {\varvec{K}} \right)}}\) is used in Eq. (6) the resulting identity matrix will have rank less than n so GM condition (2) automatically fails and application of the AT99 method cannot yield a BLUE estimate of b. Even if P is nonstochastic, the pseudo-inverse \(\left( {{\varvec{P}}^{{\left( {\varvec{K}} \right)T}} {\varvec{P}}^{{\left( {\varvec{K}} \right)}} } \right)^{ + }\) cannot equal \({{\varvec{\Omega}}}\) regardless of sample size so \(\hat{\user2{C}}_{N}\) is a biased and inconsistent estimator of \({{\varvec{\Omega}}}\). Using an inconsistent estimator of \({{\varvec{\Omega}}}\) does not necessarily prevent obtaining consistent coefficient covariances. White (1980) famously showed that under certain assumptions, if the estimator \({\hat{\mathbf{\Omega }}}\) consists of a diagonal matrix with the squared OLS residuals along the main diagonal it is inconsistent but \(n^{ - 1} {\varvec{X}}^{{\mathbf{T}}} {\hat{\mathbf{\Omega }}}{\varvec{X}}\) is nonetheless a consistent estimator of \(n^{ - 1} {\varvec{X}}^{{\mathbf{T}}} {{\varvec{\Omega}}}{\varvec{X}}\) (Davidson and MacKinnon 2004 p. 198). This is the so-called HC method and is a popular alternative to feasible GLS in statistics and econometrics. To obtain a comparable result, AT99 needed to show
$$\left[ {{\text{N3}}} \right]\quad plim\left( {n^{ - 1} {\varvec{X}}^{T} \hat{\user2{C}}_{{\varvec{N}}} \user2{ X}} \right) = \mathop {\lim }\limits_{n \to \infty } n^{ - 1} {\varvec{X}}^{{\mathbf{T}}} E\left( {{\varvec{uu}}^{{\varvec{T}}} } \right){\varvec{X}}.$$
But even if this could be proven, and if its convergence properties could be shown to rival that of White’s method or the variants that have been proposed since then, all it would show is that the AT99 optimal fingerprinting method has the same consistency property as White’s method, which is much easier to use and does not introduce other forms of bias.
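White's HC approach referred to above can be sketched as follows (NumPy; the heteroskedasticity pattern and all dimensions are invented for illustration). The sandwich formula uses squared OLS residuals as a deliberately inconsistent estimate of \({{\varvec{\Omega}}}\), yet consistently estimates the coefficient covariance:

```python
import numpy as np

rng = np.random.default_rng(3)
n, g = 500, 2

# Heteroskedastic errors: variance depends on X, so GM condition (2) fails
X = rng.standard_normal((n, g))
b_true = np.array([1.0, 0.5])
sigma = np.exp(0.5 * X[:, 0])  # invented variance pattern
y = X @ b_true + sigma * rng.standard_normal(n)

# OLS remains unbiased; only the usual variance formula is invalid
XtX_inv = np.linalg.inv(X.T @ X)
b_hat = XtX_inv @ (X.T @ y)
e = y - X @ b_hat  # OLS residuals

# White's HC0 sandwich: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}
meat = X.T @ (e[:, None] ** 2 * X)
V_hc = XtX_inv @ meat @ XtX_inv
se_hc = np.sqrt(np.diag(V_hc))  # heteroskedasticity-consistent std errors
```

The key design point is that nothing here requires inverting, or even consistently estimating, the full \(n \times n\) error covariance matrix.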
(iii)
AT99 failed to state GM condition (3) or to test whether it holds. Failure of conditional invariance yields biased and inconsistent slope coefficients, so it is critically important to detect and remedy potential violations. Premultiplying the model by P does not mean that conditional invariance of u and \(\sigma^{2}\) to X is no longer needed; instead it adds the further requirement of invariance to P. Hence AT99 should have listed the analogue to Eq. (3), which in the nonstochastic P case means they needed to show
$$\left[ {{\text{N4}}} \right]\quad E\left( {{\varvec{Pu}}{|}{\varvec{X}},{\varvec{P}}} \right) = 0$$
(7)
Also Eq. (6) (Eq. (3) in AT99) as written was incorrect and should have been stated as
$$E\left( {{\varvec{Puu}}^{{\varvec{T}}} {\varvec{P}}^{{\varvec{T}}} |{\varvec{X}},{\varvec{P}}} \right) = {\varvec{I}}_{{\varvec{K}}} .$$
(8)
AT99 and all who have followed in the fingerprinting literature omitted condition (3) (or (7)) in their discussion of the GM theorem. Equation (7) does not follow from Eq. (8) and needs to be tested separately. Violations of (7) lead to biased and inconsistent coefficient estimates and are typically more difficult to remedy than violations of (8). There is a voluminous econometrics and statistics literature pertaining to testing Eq. (2) (e.g. Davidson and MacKinnon 2004; Wooldridge 2019 etc.) which could be extended to the case of Eq. (7). Since this has never been done, there is no assurance that any applications of the AT99 method have yielded unbiased coefficient estimates.
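The consequence of violating condition (3)/(7) is easy to demonstrate by simulation (NumPy; the form of dependence of u on the regressor is invented for illustration): when \(E\left( {{\varvec{u}}|{\varvec{X}}} \right) \ne 0\) the OLS slope is biased by an amount that does not shrink as the sample grows.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 300, 2000
b_true = 1.0

est = np.empty(reps)
for r in range(reps):
    x = rng.standard_normal(n)
    # Violation of condition (3): E(u|x) = 0.4x, not zero
    u = 0.4 * x + rng.standard_normal(n)
    y = b_true * x + u
    est[r] = (x @ y) / (x @ x)  # simple OLS slope

bias = est.mean() - b_true  # settles near 0.4; more data does not help
```

Unlike heteroskedasticity, which only distorts standard errors, this failure shifts the coefficient itself, which is why condition (3) needs to be tested separately.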
(iv)
AT99 stated (p. 423) “we do not actually require \({\varvec{C}}_{{\varvec{N}}}^{{ - 1}}\) for [\(\tilde{\user2{b}}\)] to be BLUE” but this is inaccurate. Note that \({\varvec{C}}_{{\varvec{N}}}\) is the estimator of the unobservable \({{\varvec{\Omega}}}\), and since it is singular it must be approximated using \(\hat{\user2{C}}_{N} = \left( {{\varvec{P}}^{{\left( {\varvec{K}} \right)T}} {\varvec{P}}^{{\left( {\varvec{K}} \right)}} } \right)^{ + }\) or through application of a regularization method. In feasible GLS the existence of the inverse of the error covariance matrix estimator is one of the sufficient conditions for proving \(E\left( {\tilde{\user2{b}}} \right) = {\varvec{b}}\) (Kmenta 1986 p. 615), so in the AT99 framework, whenever recourse must be made to \(\hat{\user2{C}}_{N}\), it perforce cannot be shown that the coefficient estimates are unbiased. Estimating \({{\varvec{\Omega}}}\) using \(\hat{\user2{C}}_{N}\) solves a computational problem but creates a theoretical one. Even if \({\varvec{C}}_{{\varvec{N}}}\) were invertible, however, that would only be the start of the matter, since consistency of \(\tilde{\user2{b}}\) depends on specific assumptions about the structure of \({\hat{\mathbf{\Omega }}}\) and that of the other variables in the regression model. Amemiya (1973) derived sufficient conditions for consistency and efficiency of a GLS estimator when the errors are known to follow a mixed autoregressive moving average process, but the proof depends on 5 assumptions about the numerical properties of X and u that might not hold in all applications. In the case of AT99 the reader is not told what assumptions about X and \(\hat{\user2{C}}_{N}\) are necessary for \(\tilde{\user2{b}}\) to converge in distribution to \(N\left( {{\varvec{b}},\left( {{\varvec{X}}^{T} \hat{\user2{C}}_{N}^{ - 1} {\varvec{X}}} \right)^{ - 1} } \right)\); the authors merely assert (p. 422) that it does.
Especially in light of how small K is in many applications, the non-existence of \({\varvec{C}}_{{\varvec{N}}}^{{ - 1}}\) and the absence of proof of consistency means claims about the unbiasedness of optimal fingerprinting regression coefficients based on the AT99 method are conjectural. What is needed is a proof of the statement
$$\left[ {{\text{N5}}} \right]\quad \sqrt n \left( {\tilde{\user2{b}} - E\left( {\tilde{\user2{b}}} \right)} \right)\mathop \to \limits^{d} N\left( {0,V} \right)$$
where \(V = plim\left( {n^{ - 1} {\varvec{X}}^{T} \hat{\user2{C}}_{{\varvec{N}}}^{ - 1} {\varvec{X}}} \right)^{ - 1}\). If u is assumed to be Normal it would suffice to prove [N1] and [N2], otherwise a central limit theorem is required.
(v)
The distribution of a test statistic under the null cannot be conditional on the null being false. Reliance on the assumption that \(\hat{\user2{C}}_{{\varvec{N}}} \cong {\varvec{C}}_{{\varvec{N}}} = {{\varvec{\Omega}}}\) creates a risk of spurious inference in the AT99 framework since it assumes that the climate model is a perfect representation of the real climate. The climate model embeds the assumption that greenhouse gases have a significant effect on the climate along with other assumptions about the magnitude and effects of natural forcings. In a typical optimal fingerprinting application, the researcher seeks to compute the distribution of a test statistic under the null hypothesis that greenhouse gases have no effect on the climate. No statistic can be constructed in the AT99 framework that maintains this assumption. Use of preindustrial control runs to generate \({\varvec{C}}_{{\varvec{N}}}\), or combining data from different climate models, does not remedy this issue since all such models, even in their preindustrial era simulations, embed the assumption that elevated greenhouse gases (if present) would have a large effect relative to those of natural forcings.