Abstract
Krämer (Sankhyā 42:130–131, 1980) posed the following problem: “Which are the \(\mathbf{y}\), given \(\mathbf{X}\) and \(\mathbf{V}\), such that OLS and Gauss–Markov are equal?” In other words, the problem aimed at identifying those vectors \(\mathbf{y}\) for which the ordinary least squares (OLS) and Gauss–Markov estimates of the parameter vector \(\varvec{\beta }\) coincide under the general Gauss–Markov model \(\mathbf{y} = \mathbf{X} \varvec{\beta } + \mathbf{u}\). The problem was later called a “twist” to Kruskal’s Theorem, which provides conditions necessary and sufficient for the OLS and Gauss–Markov estimates of \(\varvec{\beta }\) to be equal. The present paper focuses on a similar problem to the one posed by Krämer in the aforementioned paper. However, instead of the estimation of \(\varvec{\beta }\), we consider the estimation of the systematic part \(\mathbf{X} \varvec{\beta }\), which is a natural consequence of relaxing the assumption that \(\mathbf{X}\) and \(\mathbf{V}\) are of full (column) rank made by Krämer. Further results, dealing with the Euclidean distance between the best linear unbiased estimator (BLUE) and the ordinary least squares estimator (OLSE) of \(\mathbf{X} \varvec{\beta }\), as well as with an equality between BLUE and OLSE are also provided. The calculations are mostly based on a joint partitioned representation of a pair of orthogonal projectors.
1 Introduction
Let us consider the general Gauss–Markov model

\(\mathbf{y} = \mathbf{X} \varvec{\beta } + \mathbf{u}, \qquad (1)\)
where \(\mathbf{y}\) is an \(n \times 1\) observable random vector, \(\mathbf{X}\) is a known \(n \times p\) model matrix, \(\varvec{\beta }\) is a \(p \times 1\) vector of unknown parameters, and \(\mathbf{u}\) is an \(n \times 1\) random error vector. The expectation vector and the covariance matrix of \(\mathbf{u}\) are \(\mathsf E (\mathbf{u}) = \mathbf{0}\) and \(\mathsf Cov (\mathbf{u}) = \sigma ^2 \mathbf{V}\), respectively, where \(\sigma ^2 > 0\) is an unknown constant and \(\mathbf{V}\) is a known \(n \times n\) nonnegative definite matrix. Both \(\mathbf{X}\) and \(\mathbf{V}\) may be rank deficient. It is assumed beforehand that the model (1) is consistent, i.e., \(\mathbf{y} \in {\fancyscript{R}} ( \mathbf{X} : \mathbf{V} )\), where \({\fancyscript{R}}(\mathbf{.})\) stands for the column space of a matrix argument and \((\mathbf{X} : \mathbf{V})\) denotes the \(n \times (p + n)\) columnwise partitioned matrix obtained by juxtaposing matrices \(\mathbf{X}\) and \(\mathbf{V}\); cf. Rao (1973, p. 297) or Puntanen et al. (2011, pp. 43, 125).
In his paper, Krämer (1980, p. 130) posed the following problem: “Which are the \(\mathbf{y}\), given \(\mathbf{X}\) and \(\mathbf{V}\), such that ordinary least squares (OLS) and Gauss–Markov are equal?” In other words, the problem aimed at identifying those vectors \(\mathbf{y}\) for which the OLS and Gauss–Markov estimates of the parameter vector \(\varvec{\beta }\) coincide. Referring to this problem, in a follow-up paper Krämer et al. (1996) called it a “twist” to Kruskal’s Theorem (Kruskal 1968), which provides conditions necessary and sufficient for the OLS and Gauss–Markov estimates of \(\varvec{\beta }\) to be equal. In Krämer et al. (1996) “another twist” to Kruskal’s Theorem is dealt with: rather than asking when the OLS estimator equals the Gauss–Markov estimator of the full regression vector \(\varvec{\beta }\), a condition for the equality of the OLS and Gauss–Markov estimators of a subparameter of \(\varvec{\beta }\) is provided. A more general “final twist” is considered in Jaeger and Krämer (1998), where the individual vectors \(\mathbf{y}\) are characterized that yield identical OLS and Gauss–Markov estimators for such a subparameter.
Inspired by Jaeger and Krämer (1998), Krämer (1980), and Krämer et al. (1996), in what follows we “do the twist again”. However, unlike in the three papers, we do not assume that \(\mathbf{X}\) and \(\mathbf{V}\) are of full (column) rank, which means that the vector \(\varvec{\beta }\) is not necessarily unbiasedly estimable. For this reason, instead of the estimation of \(\varvec{\beta }\), we consider the estimation of the systematic part \(\mathsf E (\mathbf{y}) = \mathbf{X} \varvec{\beta }\). Note that this parameter function always has a linear unbiased estimator, namely \(\mathbf{y}\) itself.
An important role in the subsequent considerations will be played by the notion of a projector. It is known that any \(n \times n\) idempotent matrix, say \(\mathbf{F} \in \mathbb{R }^{n \times n}\), is an oblique projector onto its column space \({\fancyscript{R}}(\mathbf{F})\) along its null space \({\fancyscript{N}}(\mathbf{F})\), where \({\fancyscript{R}}(\mathbf{F}) \oplus {\fancyscript{N}}(\mathbf{F}) = \mathbb{R }^{n,1}\). Among many conditions characterizing idempotent matrices one finds for instance: \(\mathbf{F}^2 = \mathbf{F} \Leftrightarrow {\fancyscript{R}}(\mathbf{F}) = {\fancyscript{N}}(\overline{\mathbf{F}}) \Leftrightarrow {\fancyscript{R}}(\overline{\mathbf{F}}) = {\fancyscript{N}}(\mathbf{F})\), where \(\overline{\mathbf{F}} = \mathbf{I}_n - \mathbf{F}\). When idempotent \(\mathbf{F}\) projects onto \({\fancyscript{R}}(\mathbf{F})\) along the orthogonal complement of \({\fancyscript{R}}(\mathbf{F})\), then it is called an orthogonal projector. It can be verified that \(\mathbf{F}\) is an orthogonal projector if and only if it is both idempotent and symmetric, i.e., \(\mathbf{F}^2 = \mathbf{F} = \mathbf{F}^\prime \). Projectors are widely used in Statistics and Econometrics as a basic tool for estimation and test procedures.
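These characterizations of projectors are easy to verify numerically. The following sketch (plain NumPy; the matrices \(\mathbf{Z}\) and \(\mathbf{F}\) are illustrative choices of ours, not taken from the text) checks that \(\mathbf{Z}\mathbf{Z}^\dagger \) is symmetric idempotent, hence an orthogonal projector, while a merely idempotent matrix projects obliquely.

```python
import numpy as np

# Orthogonal projector onto the column space of a matrix Z: P = Z Z^+.
Z = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
P = Z @ np.linalg.pinv(Z)

assert np.allclose(P @ P, P)        # idempotent
assert np.allclose(P, P.T)          # symmetric, hence an orthogonal projector

# An oblique projector: idempotent but not symmetric.
F = np.array([[1.0, 1.0], [0.0, 0.0]])
assert np.allclose(F @ F, F)        # idempotent
assert not np.allclose(F, F.T)      # not symmetric, hence oblique

# F projects onto R(F) along N(F); the complement I - F annihilates R(F)-images.
Fbar = np.eye(2) - F
assert np.allclose(F @ Fbar, np.zeros((2, 2)))
```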
Let \(\mathbf{G}\) be an \(n \times n\) matrix. An estimator \(\mathbf{G}\mathbf{y}\) of \(\mathbf{X} \varvec{\beta }\) which is unbiased and has minimal covariance matrix in the Löwner sense fulfills the conditions

\(\mathbf{G} ( \mathbf{X} : \mathbf{V}\mathbf{M} ) = ( \mathbf{X} : \mathbf{0} ), \qquad (2)\)
where \(\mathbf{H} = \mathbf{X}\mathbf{X}^\dagger \) and \(\mathbf{M} = \mathbf{I}_n - \mathbf{H}\) are the orthogonal projectors onto, respectively, \({\fancyscript{R}}(\mathbf{X})\), the column space of \(\mathbf{X}\), and the orthogonal complement of \({\fancyscript{R}}(\mathbf{X})\), which coincides with \({\fancyscript{N}}(\mathbf{X}^\prime )\), the null space of \(\mathbf{X}^\prime \). The symbol \(\mathbf{X}^\dagger \) denotes the Moore–Penrose inverse of \(\mathbf{X}\). The conditions (2) can be rewritten as

\(\mathbf{G} ( \mathbf{H} : \mathbf{P}_{\mathbf{V}\mathbf{M}} ) = ( \mathbf{H} : \mathbf{0} ), \qquad (3)\)
where \(\mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{V}\mathbf{M}(\mathbf{V}\mathbf{M})^\dagger \) is the orthogonal projector onto \({\fancyscript{R}}(\mathbf{V}\mathbf{M})\). It was pointed out in Baksalary and Trenkler (2012, Remark 3.1) that Eq. (3) always has a solution \(\mathbf{G}\) and that each \(\mathbf{G}\) satisfying (3) yields a representation of the best linear unbiased estimator \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta })\) of \(\mathbf{X} \varvec{\beta }\). All these representations coincide; see Groß (2004, Corollary 3). In the Appendix given below it is demonstrated, however, that there may exist quite useless versions of \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta })\). To avoid this discrepancy we subsequently strengthen the consistency condition \(\mathbf{y} \in {\fancyscript{R}} ( \mathbf{X} : \mathbf{V} )\) to

\({\fancyscript{R}} ( \mathbf{X} : \mathbf{V} ) = \mathbb R ^{n,1}. \qquad (4)\)
Then, according to Groß (2004, Corollary 4), \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta })\) is unique.
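As an illustration of these objects, the sketch below builds the projectors \(\mathbf{H}\), \(\mathbf{M}\) and \(\mathbf{P}_{\mathbf{V}\mathbf{M}}\) for a rank-deficient \(\mathbf{X}\) and a singular nonnegative definite \(\mathbf{V}\) (both random matrices of our own choosing, used purely for demonstration), and checks that the full column space condition \({\fancyscript{R}}(\mathbf{X} : \mathbf{V}) = \mathbb R ^{n,1}\) holds for this instance.

```python
import numpy as np

rng = np.random.default_rng(0)

# A rank-deficient model matrix X (third column = sum of the first two) and a
# singular nonnegative definite V, as allowed by the general model.
X = rng.standard_normal((6, 2))
X = np.hstack([X, X[:, :1] + X[:, 1:2]])           # rank(X) = 2
A = rng.standard_normal((6, 4))
V = A @ A.T                                        # rank(V) = 4, singular

H = X @ np.linalg.pinv(X)                          # projector onto R(X)
M = np.eye(6) - H                                  # projector onto N(X')
PVM = (V @ M) @ np.linalg.pinv(V @ M)              # projector onto R(VM)

# All three are symmetric idempotent, i.e., orthogonal projectors.
for P in (H, M, PVM):
    assert np.allclose(P @ P, P) and np.allclose(P, P.T)

# The strengthened consistency condition R(X : V) = R^n holds here:
assert np.linalg.matrix_rank(np.hstack([X, V])) == 6
```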
In the next section some representations of the best linear unbiased estimator (BLUE) and the ordinary least squares estimator (OLSE) are provided, whereas Sect. 3 deals with “another twist” to Kruskal’s Theorem, which was briefly mentioned above. Section 4 is concerned with bounds for the Euclidean distance between BLUE and OLSE of \(\mathbf{X} \varvec{\beta }\), and the last section of the paper revisits the problem of when BLUE equals OLSE.
2 Representations of BLUE and OLSE
Let \(\mathbf{P}\) be an orthogonal projector in \(\mathbb R ^{n,1}\), i.e., an \(n \times n\) real symmetric idempotent matrix, and let \(r\) denote the rank of \(\mathbf{P}\). It is known that there exists an orthogonal matrix \(\mathbf{U}\) such that

\(\mathbf{P} = \mathbf{U} \begin{pmatrix} \mathbf{I}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{pmatrix} \mathbf{U}^\prime ; \qquad (5)\)
see Trenkler (1994, Theorem 13). Any other orthogonal projector of the same size, say \(\mathbf{Q} \in \mathbb R ^{n \times n}\), can be represented as
with symmetric matrices \(\mathbf{A}\) and \(\mathbf{D}\) of orders \(r\) and \(n-r\), respectively.
In what follows we assume that \(\mathbf{H} = \mathbf{X}\mathbf{X}^\dagger \) is represented by \(\mathbf{P}\) of the form (5) and \(\mathbf{P}_{\mathbf{V}\mathbf{M}}\) is represented by \(\mathbf{Q}\) defined in (6), i.e., \(\mathbf{P} = \mathbf{H} = \mathbf{X}\mathbf{X}^\dagger \) and \(\mathbf{Q} = \mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{V}\mathbf{M}(\mathbf{V}\mathbf{M})^\dagger \). It can be verified that \(\mathbf{T} = (\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H})^\dagger \), where \(\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}} = \mathbf{I}_n - \mathbf{P}_{\mathbf{V}\mathbf{M}}\), is an idempotent matrix; see Greville (1974, p. 830). From (5) and (6) we obtain
where \(\mathbf{P}_{\overline{\mathbf{A}}}\) is the orthogonal projector onto the column space of \(\overline{\mathbf{A}} = \mathbf{I}_r - \mathbf{A}\). It follows that \(\mathbf{T}\) is the oblique projector onto \({\fancyscript{R}}(\mathbf{H}) \cap [{\fancyscript{N}}(\mathbf{H}) + {\fancyscript{N}}(\mathbf{P}_{\mathbf{V}\mathbf{M}})]\) along \({\fancyscript{R}}(\mathbf{P}_{\mathbf{V}\mathbf{M}}) \stackrel{\perp }{\oplus } [{\fancyscript{N}}(\mathbf{H}) \cap {\fancyscript{N}}(\mathbf{P}_{\mathbf{V}\mathbf{M}})]\), where \(\stackrel{\perp }{\oplus }\) indicates that the two subspaces involved in the direct sum are orthogonal; see Baksalary and Trenkler (2010, Theorem 2). From

\({\fancyscript{R}}(\mathbf{H}) \cap {\fancyscript{R}}(\mathbf{P}_{\mathbf{V}\mathbf{M}}) = \{\mathbf{0}\} \qquad (7)\)

(see Baksalary and Trenkler 2009, Theorem 1), we arrive at \({\fancyscript{N}}(\mathbf{H}) + {\fancyscript{N}}(\mathbf{P}_{\mathbf{V}\mathbf{M}}) = \mathbb R ^{n,1}\), which leads to the conclusion that \(\mathbf{T}\) is the oblique projector onto \({\fancyscript{R}}(\mathbf{H})\) along \({\fancyscript{R}}(\mathbf{P}_{\mathbf{V}\mathbf{M}}) \stackrel{\perp }{\oplus } [{\fancyscript{N}}(\mathbf{H}) \cap {\fancyscript{N}}(\mathbf{P}_{\mathbf{V}\mathbf{M}})]\). Furthermore, it follows that \(\mathbf{T}\) takes the form

\(\mathbf{T} = \mathbf{U} \begin{pmatrix} \mathbf{I}_r & -\mathbf{B}\mathbf{D}^\dagger \\ \mathbf{0} & \mathbf{0} \end{pmatrix} \mathbf{U}^\prime ; \qquad (8)\)
see Baksalary and Trenkler (2010, Sect. 2).
It is well known that (7) ensures that Eq. (3) is solvable. One of the solutions, namely \(\mathbf{T} = (\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H})^\dagger \), gives a representation of the BLUE of \(\mathbf{X} \varvec{\beta }\), i.e., \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta }) = \mathbf{T}\mathbf{y}\). There are a number of further expressions for the BLUE (see Baksalary and Trenkler 2009, Sect. 4), but under the assumption (4) they all coincide. Observe that the OLSE of \(\mathbf{X} \varvec{\beta }\) is \(\mathsf{OLSE }(\mathbf{X} \varvec{\beta }) = \mathbf{H}\mathbf{y}\).
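A minimal numerical sketch of these representations (NumPy; \(\mathbf{X}\) and \(\mathbf{V}\) are random illustrative matrices of our own): it forms \(\mathbf{T} = (\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H})^\dagger \) via the Moore–Penrose inverse and confirms that \(\mathbf{T}\) is idempotent and that \(\mathbf{T}\mathbf{y}\) fulfills the unbiasedness and minimal-covariance conditions of Sect. 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
X = rng.standard_normal((n, 3))
A = rng.standard_normal((n, 4))
V = A @ A.T                                        # nonnegative definite, singular

H = X @ np.linalg.pinv(X)
M = np.eye(n) - H
PVM = (V @ M) @ np.linalg.pinv(V @ M)
T = np.linalg.pinv((np.eye(n) - PVM) @ H)          # T = (Pbar_VM H)^+

assert np.allclose(T @ T, T)                       # T is an (oblique) projector
assert np.allclose(T @ X, X)                       # T y is unbiased for X beta
assert np.allclose(T @ (V @ M), np.zeros((n, n)))  # minimal-covariance condition
assert np.allclose(T @ H, H)                       # T projects onto R(H) = R(X)
```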
3 Another twist
As in Krämer (1980), we consider the problem of identifying those observation vectors \(\mathbf{y }\) which yield the same value of \(\mathsf{OLSE }(\mathbf{X} \varvec{\beta })\) and \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta })\). This amounts to an analysis of the subspace \({\fancyscript{L}}\) of \(\mathbb R ^{n,1}\) which is the null space of \(\mathbf{H} - \mathbf{T}\), i.e., \({\fancyscript{L}} = {\fancyscript{N}}(\mathbf{H} - \mathbf{T})\). For this purpose the following result is useful.
Lemma 1
Let \(\mathbf{R}\) and \(\mathbf{S}\) be idempotent matrices of the same size. Then:
-
(i)
\({\fancyscript{N}}(\mathbf{S} - \mathbf{R}) = {\fancyscript{N}}(\mathbf{S}\overline{\mathbf{R}}) \cap {\fancyscript{N}}(\overline{\mathbf{S}}\mathbf{R})\),
-
(ii)
\({\fancyscript{N}}(\mathbf{R}\mathbf{S}) = {\fancyscript{N}}(\mathbf{S}) \oplus [{\fancyscript{N}}(\mathbf{R}) \cap {\fancyscript{R}}(\mathbf{S})]\).
Proof
For a proof see Baksalary and Trenkler (2013, Theorems 1 and 9). \(\square \)
Lemma 1 leads to the following result.
Theorem 1
Under the model (1), let \(\mathbf{H} = \mathbf{X}\mathbf{X}^\dagger \), \(\mathbf{M} = \mathbf{I}_n - \mathbf{H}\), and \(\mathbf{T} = (\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H})^\dagger \). Then

\({\fancyscript{L}} = {\fancyscript{N}}(\mathbf{H} - \mathbf{T}) = {\fancyscript{N}}(\mathbf{T}\mathbf{M}).\)
Proof
Lemma 1 yields

\({\fancyscript{N}}(\mathbf{H} - \mathbf{T}) = {\fancyscript{N}}(\mathbf{T}\overline{\mathbf{H}}) \cap {\fancyscript{N}}(\overline{\mathbf{T}}\mathbf{H}) = {\fancyscript{N}}(\mathbf{T}\mathbf{M}) \cap {\fancyscript{N}}(\overline{\mathbf{T}}\mathbf{H}).\)
Another relevant fact is that with \(\mathbf{H}\) of the form (5) and \(\mathbf{T}\) given in (8) we directly get \(\overline{\mathbf{T}}\mathbf{H} = \mathbf{0}\). \(\square \)
The vectors belonging to the subspace \({\fancyscript{L}} = {\fancyscript{N}}(\mathbf{H} - \mathbf{T})\) can thus be characterized as the solutions \(\mathbf{z}\) to the equation \(\mathbf{T}\mathbf{M}\mathbf{z} = \mathbf{0}\). Observe also that \({\fancyscript{N}}(\mathbf{T}\mathbf{M}) \supseteq {\fancyscript{N}}(\mathbf{M}) = {\fancyscript{R}}(\mathbf{H})\). This means that, inter alia, all vectors belonging to \({\fancyscript{R}}(\mathbf{H}) = {\fancyscript{R}}(\mathbf{X})\) result in estimates such that \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta }) = \mathsf{OLSE }(\mathbf{X} \varvec{\beta })\), for example \(\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}\). This does not come as a surprise, since by (5) and (8) it follows that \(\mathbf{T}\mathbf{H} = \mathbf{H}\).
Further characterization of the subspace \({\fancyscript{L}} = {\fancyscript{N}}(\mathbf{H} - \mathbf{T})\) is established in the theorem below.
Theorem 2
Under the model (1), let \(\mathbf{H} = \mathbf{X}\mathbf{X}^\dagger \), \(\mathbf{M} = \mathbf{I}_n - \mathbf{H}\), and \(\mathbf{T} = (\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H})^\dagger \). Then

\({\fancyscript{L}} = {\fancyscript{R}}(\mathbf{H}) \oplus [{\fancyscript{N}}(\mathbf{T}) \cap {\fancyscript{N}}(\mathbf{H})], \quad \text{where} \quad {\fancyscript{N}}(\mathbf{T}) = {\fancyscript{R}}(\mathbf{P}_{\mathbf{V}\mathbf{M}}) \stackrel{\perp }{\oplus } [{\fancyscript{N}}(\mathbf{H}) \cap {\fancyscript{N}}(\mathbf{P}_{\mathbf{V}\mathbf{M}})].\)
Proof
By Theorem 1 we have \({\fancyscript{L}} = {\fancyscript{N}}(\mathbf{T}\mathbf{M})\). Hence, Lemma 1 implies

\({\fancyscript{L}} = {\fancyscript{N}}(\mathbf{M}) \oplus [{\fancyscript{N}}(\mathbf{T}) \cap {\fancyscript{R}}(\mathbf{M})] = {\fancyscript{R}}(\mathbf{H}) \oplus [{\fancyscript{N}}(\mathbf{T}) \cap {\fancyscript{N}}(\mathbf{H})].\)
Now, by the characterization of \(\mathbf{T}\) given in Sect. 2,

\({\fancyscript{N}}(\mathbf{T}) = {\fancyscript{R}}(\mathbf{P}_{\mathbf{V}\mathbf{M}}) \stackrel{\perp }{\oplus } [{\fancyscript{N}}(\mathbf{H}) \cap {\fancyscript{N}}(\mathbf{P}_{\mathbf{V}\mathbf{M}})],\)
which completes the proof. \(\square \)
The result of Theorem 2 looks somewhat different from that of Groß et al. (2001, Theorem 9), for setting \(\mathbf{C} = \mathbf{I}_n\) there gives

\({\fancyscript{L}} = {\fancyscript{R}}(\mathbf{X}) \oplus [{\fancyscript{R}}(\mathbf{V}\mathbf{M}) \cap {\fancyscript{N}}(\mathbf{X}^\prime )]. \qquad (9)\)
This discrepancy can be explained on account of the identity

\({\fancyscript{N}}(\mathbf{H}\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}) = {\fancyscript{R}}(\mathbf{P}_{\mathbf{V}\mathbf{M}}) \oplus [{\fancyscript{N}}(\mathbf{H}) \cap {\fancyscript{N}}(\mathbf{P}_{\mathbf{V}\mathbf{M}})] \qquad (10)\)
following from Lemma 1. The subspace of Theorem 2 coincides with (9) when \({\fancyscript{N}}(\mathbf{H}) \cap {\fancyscript{N}}(\mathbf{P}_{\mathbf{V}\mathbf{M}}) = \{\mathbf{0}\}\), which is equivalent to \({\fancyscript{R}}(\mathbf{H}) + {\fancyscript{R}}(\mathbf{P}_{\mathbf{V}\mathbf{M}}) = \mathbb R ^{n,1}\) or \({\fancyscript{R}}(\mathbf{H}:\mathbf{V}\mathbf{M}) = {\fancyscript{R}}(\mathbf{X}:\mathbf{V}) = \mathbb R ^{n,1}\); see Puntanen et al. (2011, Proposition 5.1). However, the latter condition, given above as (4), was assumed to be valid in the whole paper. Thus, we may state what follows.
Corollary 1
Under the model (1), let \(\mathbf{H} = \mathbf{X}\mathbf{X}^\dagger \) and \(\mathbf{M} = \mathbf{I}_n - \mathbf{H}\). Then

\({\fancyscript{L}} = {\fancyscript{R}}(\mathbf{X}) \oplus [{\fancyscript{R}}(\mathbf{V}\mathbf{M}) \cap {\fancyscript{N}}(\mathbf{X}^\prime )].\)
Corollary 1 corresponds to the Theorem in Krämer (1980), where the identity \(\mathsf{BLUE }(\varvec{\beta }) = \mathsf{OLSE }(\varvec{\beta })\) is explored under the assumption that \(\mathbf{X}\) and \(\mathbf{V}\) are of full (column) rank.
Recall that the projector \(\mathbf{P}\) introduced in (5) was determined by the model matrix \(\mathbf{X}\), for \(\mathbf{P} = \mathbf{H} = \mathbf{X}\mathbf{X}^\dagger \). In consequence, the rank of \(\mathbf{P}\) coincides with the ranks of \(\mathbf{H}\) and \(\mathbf{X}\), i.e., \(r = {\mathrm{rank}}(\mathbf{H}) = {\mathrm{rank}}(\mathbf{X})\). Consider now an oblique projector of rank \(r\) having the form

\(\mathbf{L} = \mathbf{U} \begin{pmatrix} \mathbf{I}_r & \mathbf{K} \\ \mathbf{0} & \mathbf{0} \end{pmatrix} \mathbf{U}^\prime , \qquad (11)\)
with \(\mathbf{K} \in \mathbb R ^{r \times (n-r)}\). It was shown by Baksalary and Trenkler (2011, Sect. 3) that when \(\mathbf{K} = -\mathbf{W}_{12} (\mathbf{D} \mathbf{W}_{22}\mathbf{D})^\dagger \), where \(\mathbf{D} \in \mathbb R ^{(n-r) \times (n-r)}\) is a symmetric idempotent matrix and \(\mathbf{W}_{12} \in \mathbb R ^{r \times (n-r)}\) and \(\mathbf{W}_{22} \in \mathbb R ^{(n-r) \times (n-r)}\) originate from the representation of \(\mathbf{V}\) given by

\(\mathbf{V} = \mathbf{U} \begin{pmatrix} \mathbf{W}_{11} & \mathbf{W}_{12} \\ \mathbf{W}_{12}^\prime & \mathbf{W}_{22} \end{pmatrix} \mathbf{U}^\prime , \qquad (12)\)
then \(\mathbf{L}\mathbf{y}\) is an unbiased estimator of \(\mathbf{X} \varvec{\beta }\) whose efficiency lies between that of \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta })\) and \(\mathsf{OLSE }(\mathbf{X} \varvec{\beta })\). In what follows we identify those observation vectors \(\mathbf{y}\) for which \(\mathbf{L}\mathbf{y}\) coincides with \(\mathsf{OLSE }(\mathbf{X} \varvec{\beta })\) and with \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta })\), respectively. The resulting formulas give an impression of how close the three estimators can be.
Theorem 3
Under the model (1), let \(\mathbf{H} = \mathbf{X}\mathbf{X}^\dagger \), \(\mathbf{M} = \mathbf{I}_n - \mathbf{H}\), and \(\mathbf{T} = (\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H})^\dagger \). Moreover, let \(\mathbf{L}\) be of the form (11). Then:
-
(i)
\({\fancyscript{N}}(\mathbf{H} - \mathbf{L}) = {\fancyscript{N}}(\mathbf{L}\mathbf{M}) = {\fancyscript{R}}(\mathbf{L}) \oplus [{\fancyscript{N}}(\mathbf{H}) \cap {\fancyscript{N}}(\mathbf{L})]\),
-
(ii)
\({\fancyscript{N}}(\mathbf{T} - \mathbf{L}) = {\fancyscript{N}}(\mathbf{T}\overline{\mathbf{L}}) = {\fancyscript{R}}(\mathbf{T}) \oplus [{\fancyscript{N}}(\mathbf{L}) \cap {\fancyscript{N}}(\mathbf{T})]\).
Proof
From Lemma 1 we have \({\fancyscript{N}}(\mathbf{H} - \mathbf{L}) = {\fancyscript{N}}(\mathbf{H}\overline{\mathbf{L}}) \cap {\fancyscript{N}}(\overline{\mathbf{H}}\mathbf{L})\). Representations (5) and (11) yield \(\overline{\mathbf{H}}\mathbf{L} = \mathbf{0}\), whence, again by Lemma 1,

\({\fancyscript{N}}(\mathbf{H} - \mathbf{L}) = {\fancyscript{N}}(\mathbf{H}\overline{\mathbf{L}}) = {\fancyscript{R}}(\mathbf{L}) \oplus [{\fancyscript{N}}(\mathbf{H}) \cap {\fancyscript{N}}(\mathbf{L})].\)
Since Theorem 1 ensures that \({\fancyscript{N}}(\mathbf{H} - \mathbf{L}) = {\fancyscript{N}}(\mathbf{L}\overline{\mathbf{H}}) = {\fancyscript{N}}(\mathbf{L}\mathbf{M})\), point (i) of the theorem is established.
To derive the second part of the theorem, note that Lemma 1 entails \({\fancyscript{N}}(\mathbf{T} - \mathbf{L}) = {\fancyscript{N}}(\mathbf{L}\overline{\mathbf{T}}) \cap {\fancyscript{N}}(\overline{\mathbf{L}}\mathbf{T})\). Similarly as in the proof of point (i), we obtain \(\overline{\mathbf{L}}\mathbf{T} = \mathbf{0}\), which implies \({\fancyscript{N}}(\mathbf{T} - \mathbf{L}) = {\fancyscript{N}}(\mathbf{L}\overline{\mathbf{T}})\), and thus, by Lemma 1,

\({\fancyscript{N}}(\mathbf{T} - \mathbf{L}) = {\fancyscript{N}}(\mathbf{L}\overline{\mathbf{T}}) = {\fancyscript{R}}(\mathbf{T}) \oplus [{\fancyscript{N}}(\mathbf{L}) \cap {\fancyscript{N}}(\mathbf{T})].\)
On the other hand, from Theorem 1 we have \({\fancyscript{N}}(\mathbf{T} - \mathbf{L}) = {\fancyscript{N}}(\mathbf{T}\overline{\mathbf{L}})\), which completes the proof. \(\square \)
4 Bounds for the Euclidean distance
Baksalary and Kala (1980) derived a bound for \(|| \varvec{\mu }^*- \hat{\varvec{\mu }} ||\), where \(\varvec{\mu }^*= \mathsf{OLSE }(\mathbf{X} \varvec{\beta })\) and \(\hat{\varvec{\mu }} = \mathsf{BLUE }(\mathbf{X} \varvec{\beta })\), and \(|| \cdot ||\) denotes the Euclidean norm. To be precise, Baksalary and Kala (1980, Theorem) reads

\(|| \varvec{\mu }^*- \hat{\varvec{\mu }} || \leqslant \gamma ^{1/2} \delta ^{-1} \, || \mathbf{M}\mathbf{y} ||,\)
where \(\gamma \) is the largest eigenvalue of \(\mathbf{H}\mathbf{V}\mathbf{M}\mathbf{V}\mathbf{H}\) and \(\delta \) is the smallest nonzero eigenvalue of \(\mathbf{M}\mathbf{V}\mathbf{M}\). Subsequently, we provide an alternative bound, derived from the representation of \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta })\), using the oblique projector \(\mathbf{T} = (\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H})^\dagger \) given in (8).
Theorem 4
Let \(\hat{\varvec{\mu }} = (\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H})^\dagger \mathbf{y}\) with \((\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H})^\dagger \) of the form (8) be one of the representations of \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta })\). If \(\varvec{\mu }^*= \mathbf{H}\mathbf{y} = \mathsf{OLSE }(\mathbf{X} \varvec{\beta })\), then

\(|| \varvec{\mu }^*- \hat{\varvec{\mu }} || \leqslant \tau _1( \mathbf{B}\mathbf{D}^\dagger ) \, || \mathbf{y} ||,\)
where \(\tau _1( \mathbf{B}\mathbf{D}^\dagger )\) is the largest singular value of \(\mathbf{B}\mathbf{D}^\dagger \).
Proof
From

\(\mathbf{H} - \mathbf{T} = \mathbf{U} \begin{pmatrix} \mathbf{0} & \mathbf{B}\mathbf{D}^\dagger \\ \mathbf{0} & \mathbf{0} \end{pmatrix} \mathbf{U}^\prime ,\)

we obtain

\(\varvec{\mu }^*- \hat{\varvec{\mu }} = (\mathbf{H} - \mathbf{T})\mathbf{y} = \mathbf{U} \begin{pmatrix} \mathbf{0} & \mathbf{B}\mathbf{D}^\dagger \\ \mathbf{0} & \mathbf{0} \end{pmatrix} \mathbf{U}^\prime \mathbf{y}.\)

Hence,

\(|| \varvec{\mu }^*- \hat{\varvec{\mu }} ||^2 = \mathbf{y}^\prime \mathbf{U} \begin{pmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{D}^\dagger \mathbf{B}^\prime \mathbf{B} \mathbf{D}^\dagger \end{pmatrix} \mathbf{U}^\prime \mathbf{y}.\)
In consequence, \(|| \varvec{\mu }^*- \hat{\varvec{\mu }} ||^2 \leqslant \lambda _1(\mathbf{D}^\dagger \mathbf{B}^\prime \mathbf{B} \mathbf{D}^\dagger ) ||\mathbf{y}||^2\), where \(\lambda _1(\mathbf{D}^\dagger \mathbf{B}^\prime \mathbf{B} \mathbf{D}^\dagger )\) is the largest eigenvalue of \(\mathbf{D}^\dagger \mathbf{B}^\prime \mathbf{B} \mathbf{D}^\dagger \). The assertion follows by taking square roots. \(\square \)
It is seen from Theorem 4 that \(\varvec{\mu }^*= \hat{\varvec{\mu }}\) for all \(\mathbf{y}\) if and only if \(\mathbf{B} \mathbf{D}^\dagger = \mathbf{0}\), or, equivalently, \(\mathbf{B} = \mathbf{0}\), which means that \(\mathbf{H}\) and \(\mathbf{P}_{\mathbf{V}\mathbf{M}}\) commute.
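The bound of Theorem 4 can be checked numerically. The sketch below (random illustrative data of our own) constructs an orthogonal \(\mathbf{U}\) whose first \(r\) columns span \({\fancyscript{R}}(\mathbf{H})\), reads off the blocks \(\mathbf{B}\) and \(\mathbf{D}\) of \(\mathbf{P}_{\mathbf{V}\mathbf{M}}\) as in (6), and verifies \(|| \varvec{\mu }^*- \hat{\varvec{\mu }} || \leqslant \tau _1(\mathbf{B}\mathbf{D}^\dagger )\, ||\mathbf{y}||\).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
X = rng.standard_normal((n, 3))
A_ = rng.standard_normal((n, 4))
V = A_ @ A_.T

H = X @ np.linalg.pinv(X)
M = np.eye(n) - H
PVM = (V @ M) @ np.linalg.pinv(V @ M)
T = np.linalg.pinv((np.eye(n) - PVM) @ H)
assert np.allclose(T @ T, T)

# Build an orthogonal U whose first r columns span R(H), as in (5);
# the eigenvalues of the projector H are exactly 0 and 1.
r = np.linalg.matrix_rank(H)
w, vecs = np.linalg.eigh(H)
U = np.hstack([vecs[:, w > 0.5], vecs[:, w <= 0.5]])

Q = U.T @ PVM @ U                                  # partitioned form (6)
B, D = Q[:r, r:], Q[r:, r:]

# Bound of Theorem 4: ||OLSE - BLUE|| <= tau_1(B D^+) ||y||.
y = X @ rng.standard_normal(3) + V @ rng.standard_normal(n)  # consistent y
tau1 = np.linalg.norm(B @ np.linalg.pinv(D), 2)    # largest singular value
assert np.linalg.norm(H @ y - T @ y) <= tau1 * np.linalg.norm(y) + 1e-6
```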
5 Equality of BLUE and OLSE
The commutativity \(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{P}_{\mathbf{V}\mathbf{M}} \mathbf{H}\), just mentioned in the preceding section, is not contained in the standard catalogue of conditions necessary and sufficient for the equality \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta }) = \mathsf{OLSE }(\mathbf{X} \varvec{\beta })\). Among the most important conditions equivalent to the equality are:
-
(i)
\(\mathbf{H}\mathbf{V} = \mathbf{V}\mathbf{H}\),
-
(ii)
\(\mathbf{H}\mathbf{V} = \mathbf{H}\mathbf{V}\mathbf{H}\),
-
(iii)
\({\fancyscript{R}}(\mathbf{V}\mathbf{X}) = {\fancyscript{R}}(\mathbf{X}) \cap {\fancyscript{R}}(\mathbf{V})\),
-
(iv)
\(\mathbf{H}\mathbf{V}\mathbf{M} = \mathbf{0}\),
-
(v)
\({\fancyscript{R}}(\mathbf{V}\mathbf{X}) \subseteq {\fancyscript{R}}(\mathbf{X})\).
Note that the condition (v) can be rewritten as \({\fancyscript{R}}(\mathbf{V}\mathbf{X}) = {\fancyscript{R}}(\mathbf{X})\) when \(\mathbf{V}\) is nonsingular; see Krämer (1980) for a discussion related to Kruskal’s theorem and Puntanen et al. (2011, Proposition 10.1). Motivated by Theorem 4 we obtain the following result.
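The conditions above can be checked in a small numerical experiment. In the sketch below (illustrative matrices of our own), \(\mathbf{V}\) is deliberately built from \(\mathbf{H}\) and \(\mathbf{M}\) so that it commutes with \(\mathbf{H}\); the listed conditions then all hold simultaneously.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
X = rng.standard_normal((n, 2))
H = X @ np.linalg.pinv(X)
M = np.eye(n) - H

# A covariance matrix built to commute with H: V = 2 H + 3 M.
V = 2.0 * H + 3.0 * M

assert np.allclose(H @ V, V @ H)                  # (i)
assert np.allclose(H @ V, H @ V @ H)              # (ii)
assert np.allclose(H @ V @ M, np.zeros((n, n)))   # (iv)
# (v): R(VX) is contained in R(X), i.e. projecting VX onto R(X) changes nothing.
assert np.allclose(H @ (V @ X), V @ X)
```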
Theorem 5
Under the model (1), let \(\mathbf{H} = \mathbf{X}\mathbf{X}^\dagger \) and \(\mathbf{M} = \mathbf{I}_n - \mathbf{H}\). Then, the following conditions are equivalent:
-
(i)
\(\mathsf{BLUE }(\mathbf{X} \varvec{\beta }) = \mathsf{OLSE }(\mathbf{X} \varvec{\beta })\),
-
(ii)
\(\mathbf{H}\mathbf{V}\mathbf{M} = \mathbf{0}\),
-
(iii)
\(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{0}\),
-
(iv)
\(\mathbf{H} + \mathbf{P}_{\mathbf{V}\mathbf{M}}\) is an orthogonal projector,
-
(v)
\(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{P}_{\mathbf{V}\mathbf{M}}\mathbf{H}\),
-
(vi)
\(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}}\) is an orthogonal projector.
Proof
For the proof of the equivalence (i) \(\Leftrightarrow \) (ii) see Puntanen et al. (2011, Proposition 10.1).
To show that (ii) implies (iii) postmultiply \(\mathbf{H}\mathbf{V}\mathbf{M} = \mathbf{0}\) by \((\mathbf{V}\mathbf{M})^\dagger \) and refer to \(\mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{V}\mathbf{M}(\mathbf{V}\mathbf{M})^\dagger \). To establish the reverse implication, note that the condition \(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{0}\) entails \(\mathbf{H}\mathbf{V}\mathbf{M}(\mathbf{V}\mathbf{M})^\dagger = \mathbf{0}\). Postmultiplying this equality by \(\mathbf{V}\mathbf{M}\) leads to \(\mathbf{H}\mathbf{V}\mathbf{M} = \mathbf{0}\).
The equivalence (iii) \(\Leftrightarrow \) (iv) is well known; see e.g., Rao and Mitra (1971, Theorem 5.1.2).
The fact that (iii) \(\Rightarrow \) (v) is immediately seen by taking the transpose of \(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{0}\). For the proof of the reverse implication, recall that \({\fancyscript{R}}(\mathbf{H}) \cap {\fancyscript{R}}(\mathbf{P}_{\mathbf{V}\mathbf{M}}) = \{\mathbf{0}\}\). By condition (v) we have \(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}}\mathbf{x} = \mathbf{P}_{\mathbf{V}\mathbf{M}}\mathbf{H}\mathbf{x}\) for any vector \(\mathbf{x} \in \mathbb R ^{n,1}\). Thus, \(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}}\mathbf{x} \in {\fancyscript{R}}(\mathbf{H})\cap {\fancyscript{R}}(\mathbf{P}_{\mathbf{V}\mathbf{M}})\), whence \(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}}\mathbf{x} = \mathbf{0}\) for any \(\mathbf{x}\), i.e., \(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{0}\).
The part (v) \(\Leftrightarrow \) (vi) is also known in the literature; see e.g., Baksalary et al. (2002, Theorem). \(\square \)
Krämer (1980) showed how his theorem characterizing the vectors \(\mathbf{y}\) ensuring the coincidence of \(\mathsf{BLUE }(\varvec{\beta })\) and \(\mathsf{OLSE }(\varvec{\beta })\) can be used to prove Kruskal’s Theorem. This is done in a similar fashion in the present set-up.
Theorem 6
Under the model (1), let \(\mathbf{H} = \mathbf{X}\mathbf{X}^\dagger \) and \(\mathbf{M} = \mathbf{I}_n - \mathbf{H}\). Then, the following conditions are equivalent:
-
(i)
\(\mathsf{BLUE }(\mathbf{X} \varvec{\beta }) = \mathsf{OLSE }(\mathbf{X} \varvec{\beta })\),
-
(ii)
\({\fancyscript{N}}(\mathbf{H}) = {\fancyscript{N}}(\mathbf{H}\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}) \cap {\fancyscript{N}}(\mathbf{H})\).
Proof
First we show that (ii) implies (i). The condition (ii) ensures that \({\fancyscript{N}}(\mathbf{H}) \subseteq {\fancyscript{N}}(\mathbf{H}\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}})\), which yields \({\fancyscript{R}}(\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H}) \subseteq {\fancyscript{R}}(\mathbf{H})\). Thus, \(\mathbf{H}\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H} = \overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H}\). In consequence, \(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}}\mathbf{H} = \mathbf{P}_{\mathbf{V}\mathbf{M}}\mathbf{H}\), and taking the transpose leads to \(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{P}_{\mathbf{V}\mathbf{M}}\mathbf{H}\). The implication now follows on account of point (v) of Theorem 5.
The part (i) \(\Rightarrow \) (ii) is established in a similar fashion by reversing the preceding chain. \(\square \)
From the discussion preceding Corollary 1 it follows that \({\fancyscript{N}}(\mathbf{H}\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}})\), specified in (10), coincides with \({\fancyscript{R}}(\mathbf{P}_{\mathbf{V}\mathbf{M}})\) when (4) holds. In such a case, the condition (ii) of Theorem 6 reduces to \({\fancyscript{N}}(\mathbf{H}) = {\fancyscript{R}}(\mathbf{P}_{\mathbf{V}\mathbf{M}}) \cap {\fancyscript{N}}(\mathbf{H})\), or, equivalently, to \({\fancyscript{R}}(\mathbf{P}_{\mathbf{V}\mathbf{M}}) \subseteq {\fancyscript{N}}(\mathbf{H})\), i.e., \({\fancyscript{R}}(\mathbf{V}\mathbf{M}) \subseteq {\fancyscript{R}}(\mathbf{M})\). When \(\mathbf{V}\) is nonsingular, we get \({\fancyscript{R}}(\mathbf{V}\mathbf{M}) = {\fancyscript{R}}(\mathbf{M})\), which is the final condition of Kruskal’s Theorem in Krämer (1980).
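The Kruskal-type inclusion \({\fancyscript{R}}(\mathbf{V}\mathbf{M}) \subseteq {\fancyscript{R}}(\mathbf{M})\) can also be verified numerically; the sketch below (a nonsingular \(\mathbf{V}\) of our own construction, commuting with \(\mathbf{H}\)) checks the inclusion and, since \(\mathbf{V}\) is nonsingular, the resulting equality \({\fancyscript{R}}(\mathbf{V}\mathbf{M}) = {\fancyscript{R}}(\mathbf{M})\).

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
X = rng.standard_normal((n, 2))
H = X @ np.linalg.pinv(X)
M = np.eye(n) - H
V = 1.5 * H + 4.0 * M          # nonsingular, commutes with H, so BLUE = OLSE

# Kruskal-type condition: R(VM) is contained in R(M) = N(X'),
# i.e. projecting VM onto R(M) changes nothing.
assert np.allclose(M @ (V @ M), V @ M)

# V nonsingular here, so in fact R(VM) = R(M): the two projectors agree.
PVM = (V @ M) @ np.linalg.pinv(V @ M)
assert np.allclose(PVM, M)
```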
Another observation is that the conditions of Theorems 5 and 6, unlike the customary conditions listed at the top of the present section, predominantly deal with orthogonal projectors. Thus, the equality of \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta })\) and \(\mathsf{OLSE }(\mathbf{X} \varvec{\beta })\) is characterized in a more symmetric way.
When \(\mathbf{P}_{\mathbf{V}\mathbf{M}}\) has the representation (6), then we get the following equivalences among the statements of Theorem 5:
\(\mathbf{H} + \mathbf{P}_{\mathbf{V}\mathbf{M}}\) is an orthogonal projector if and only if \(\mathbf{A} = \mathbf{0}\),
\(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{P}_{\mathbf{V}\mathbf{M}}\mathbf{H}\) if and only if \(\mathbf{B} = \mathbf{0}\).
Note that the condition \(\mathbf{A} = \mathbf{0}\) is in general stronger than \(\mathbf{B} = \mathbf{0}\), but in the present set-up the two are equivalent. A large number of further conditions characterizing condition (v) of Theorem 5 can be found in Baksalary and Trenkler (2008).
References
Baksalary JK, Kala R (1980) A new bound for the Euclidean norm of the difference between the least squares and the best linear unbiased estimators. Ann Stat 8:679–681
Baksalary JK, Baksalary OM, Szulc T (2002) A property of orthogonal projectors. Linear Algebra Appl 354:35–39
Baksalary OM, Trenkler G (2008) An alternative approach to characterize the commutativity of orthogonal projectors. Discuss Math Probab Stat 28:113–137
Baksalary OM, Trenkler G (2009) A projector oriented approach to the best linear unbiased estimator. Stat Pap 50:721–733
Baksalary OM, Trenkler G (2010) Functions of orthogonal projectors involving the Moore–Penrose inverse. Comput Math Appl 59:764–778
Baksalary OM, Trenkler G (2011) Between OLSE and BLUE. Aust N Z J Stat 53:289–303
Baksalary OM, Trenkler G (2012) On projectors and some of their applications in statistics. In: Bapat RB, Kirkland S, Prasad KM, Puntanen S (eds) Lectures on matrix and graph methods. Manipal University Press, Manipal, pp 113–127
Baksalary OM, Trenkler G (2013) On column and null spaces of functions of a pair of oblique projectors. Linear Multilinear Algebra. doi:10.1080/03081087.2012.731055
Greville TNE (1974) Solutions of the matrix equation \(XAX = X\), and relations between oblique and orthogonal projectors. SIAM J Appl Math 26:828–832
Groß J (2004) The general Gauss–Markov model with possibly singular dispersion matrix. Stat Pap 45:311–336
Groß J, Trenkler G, Werner HJ (2001) The equality of linear transforms of the ordinary least squares estimator and the best linear unbiased estimator. Sankhyā 63:118–127
Jaeger A, Krämer W (1998) A final twist on the equality of OLS and GLS. Stat Pap 39:321–324
Krämer W (1980) A note on the equality of ordinary least squares and Gauss–Markov estimates in the general linear model. Sankhyā 42:130–131
Krämer W, Bartels R, Fiebig DG (1996) Another twist on the equality of OLS and GLS. Stat Pap 37:277–281
Kruskal W (1968) When are Gauss–Markov and least squares estimators identical? A coordinate-free approach. Ann Math Stat 39:70–75
Puntanen S, Styan GPH, Isotalo J (2011) Matrix tricks for linear statistical models: our personal top twenty. Springer, Heidelberg
Rao CR (1973) Linear statistical inference and its applications, 2nd edn. Wiley, New York
Rao CR, Mitra SK (1971) Generalized inverse of matrices and its applications. Wiley, New York
Trenkler G (1994) Characterizations of oblique and orthogonal projectors. In: Caliński T, Kala R (eds) Proceedings of the international conference on linear statistical inference LINSTAT’93. Kluwer, Dordrecht, pp 255–270
Acknowledgments
The authors are very grateful to two anonymous referees whose remarks considerably improved the paper. In particular, we are thankful for the hints concerning uniqueness of the BLUE from which, besides a number of improvement proposals, we got the idea for the example given in the Appendix.
Appendix
As an example to describe the situation when \({\fancyscript{R}}(\mathbf{X} : \mathbf{V})\) is a proper subset of \(\mathbb R ^{n,1}\), consider the linear model (1), where \(\mathbf{X} = (1, 0, 0)^\prime \in \mathbb R ^{3, 1}\) and \(\mathbf{V} = {\mathrm{diag}}(0, 1, 0) \in \mathbb R ^{3 \times 3}\). Then \({\fancyscript{R}}(\mathbf{X} : \mathbf{V})\) is the span of the vectors \((1, 0, 0)^\prime \) and \((0, 1, 0)^\prime \), and does not fill out the whole space \(\mathbb R ^{3, 1}\). It follows that \(\mathbf{H} = \mathbf{X}\mathbf{X}^\dagger = {\mathrm{diag}}(1, 0, 0)\), \(\mathbf{M} = \mathbf{I}_3 - \mathbf{H} = {\mathrm{diag}}(0, 1, 1)\), \(\mathbf{V}\mathbf{M} = \mathbf{V}\), \(\mathbf{H}\mathbf{V} = \mathbf{V}\mathbf{H} = \mathbf{0}\), \(\mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{V}\), \(\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}} = {\mathrm{diag}}(1, 0, 1)\), \(\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H} = \mathbf{H}\), and \(\mathbf{T} = (\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H})^\dagger = \mathbf{H}\). Hence, \(\mathbf{H}\mathbf{y} = \mathbf{T}\mathbf{y}\) for all \(\mathbf{y} \in \mathbb R ^{3,1}\), i.e., \(\mathsf{OLSE }(\mathbf{X} \varvec{\beta }) = \mathsf{BLUE }(\mathbf{X} \varvec{\beta })\) everywhere.
Let us now have a look at the estimator \(\mathbf{G}\mathbf{y}\), where \(\mathbf{G} = {\mathrm{diag}}(1, 0, g)\), with arbitrary \(g \in \mathbb R \). The matrix \(\mathbf{G}\) satisfies Eq. (3), which means that with varying g the statistic \(\mathbf{G}\mathbf{y}\) gives an infinite number of alternative representations of \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta })\). Observe that when \(\mathbf{y} = (y_1, y_2, y_3)^\prime \), then we get \((y_1, 0, gy_3)^\prime \) as a best unbiased, but somewhat ridiculous estimator of \(\mathbf{X} \varvec{\beta }\). It follows that \(\mathsf Cov (\mathbf{H}\mathbf{y}) = \mathsf Cov (\mathbf{T}\mathbf{y}) = \mathsf Cov (\mathbf{G}\mathbf{y}) = \mathbf{0}\). Furthermore, when \(g \ne 0\), then we have \({\fancyscript{N}}(\mathbf{H} - \mathbf{G}) = {\mathrm{span}} \{(1, 0, 0)^\prime , (0, 1, 0)^\prime \}\), in contrast to \({\fancyscript{N}}(\mathbf{H} - \mathbf{T}) = \mathbb R ^{3, 1}\). Note however that in this case the vector \(\mathbf{y}\) does not satisfy the consistency condition of Sect. 1.
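The computations of this appendix can be replayed directly; the sketch below codes the \(3 \times 3\) example above and checks the stated identities, including the disagreement between \(\mathbf{G}\mathbf{y}\) and \(\mathbf{H}\mathbf{y}\) for an inconsistent \(\mathbf{y}\) (the particular values \(g = 5\) and \(\mathbf{y} = (1, 2, 3)^\prime \) are arbitrary choices of ours).

```python
import numpy as np

X = np.array([[1.0], [0.0], [0.0]])
V = np.diag([0.0, 1.0, 0.0])

H = X @ np.linalg.pinv(X)                          # diag(1, 0, 0)
M = np.eye(3) - H                                  # diag(0, 1, 1)
PVM = (V @ M) @ np.linalg.pinv(V @ M)              # equals V here
T = np.linalg.pinv((np.eye(3) - PVM) @ H)          # equals H here

assert np.allclose(H, np.diag([1.0, 0.0, 0.0]))
assert np.allclose(PVM, V)
assert np.allclose(T, H)

# G = diag(1, 0, g) also solves the BLUE equations, but G y differs from
# H y = T y whenever y has a nonzero third coordinate -- such y, however,
# violate the consistency condition y in R(X : V).
g = 5.0
G = np.diag([1.0, 0.0, g])
y = np.array([1.0, 2.0, 3.0])                      # y not in R(X : V)
assert not np.allclose(G @ y, H @ y)
```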
Baksalary, O.M., Trenkler, G. & Liski, E. Let us do the twist again. Stat Papers 54, 1109–1119 (2013). https://doi.org/10.1007/s00362-013-0512-3
Keywords
- Gauss–Markov model
- Kruskal’s Theorem
- Best linear unbiased estimator
- Ordinary least squares estimator
- Orthogonal projector