1 Introduction

Let us consider the general Gauss–Markov model

$$\begin{aligned} \mathbf{y } = \mathbf{X } \varvec{\beta } + \mathbf{u }, \end{aligned}$$
(1)

where \(\mathbf{y}\) is an \(n \times 1\) observable random vector, \(\mathbf{X}\) is a known \(n \times p\) model matrix, \(\varvec{\beta }\) is a \(p \times 1\) vector of unknown parameters, and \(\mathbf{u}\) is an \(n \times 1\) random error vector. The expectation vector and the covariance matrix of \(\mathbf{u}\) are \(\mathsf E (\mathbf{u}) = \mathbf{0}\) and \(\mathsf Cov (\mathbf{u}) = \sigma ^2 \mathbf{V}\), respectively, where \(\sigma ^2 > 0\) is an unknown constant and \(\mathbf{V}\) is a known \(n \times n\) nonnegative definite matrix. Both \(\mathbf{X}\) and \(\mathbf{V}\) may be rank deficient. Throughout, it is assumed that the model (1) is consistent, i.e., \(\mathbf{y} \in {\fancyscript{R}} ( \mathbf{X} : \mathbf{V} )\), where \({\fancyscript{R}}(\cdot )\) stands for the column space of its matrix argument and \((\mathbf{X} : \mathbf{V})\) denotes the \(n \times (p + n)\) columnwise partitioned matrix obtained by juxtaposing the matrices \(\mathbf{X}\) and \(\mathbf{V}\); cf. Rao (1973, p. 297) or Puntanen et al. (2011, pp. 43, 125).
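For concreteness, the following minimal NumPy sketch sets up a hypothetical toy model of this kind (a rank-deficient \(\mathbf{X}\) and a singular nonnegative definite \(\mathbf{V}\), chosen only for illustration) and checks the consistency condition numerically.

```python
import numpy as np

# Hypothetical toy model: n = 4, p = 3, with rank(X) = 2 < p and V singular.
X = np.array([[1., 0., 1.],
              [1., 0., 1.],
              [0., 1., 1.],
              [0., 1., 1.]])            # third column = sum of the first two
V = np.diag([1., 1., 2., 0.])            # nonnegative definite, rank 3

# Here R(X : V) is all of R^4, so every y is consistent and, moreover,
# the strengthened condition (4) introduced below is satisfied.
XV = np.hstack([X, V])
print(np.linalg.matrix_rank(XV) == 4)    # True

# For a general pair (X, V), consistency of a given y can be checked by
# projecting y onto R(X : V) and comparing with y itself.
P_XV = XV @ np.linalg.pinv(XV)           # orthogonal projector onto R(X : V)
rng = np.random.default_rng(0)
y = XV @ rng.standard_normal(XV.shape[1])  # consistent by construction
print(np.allclose(P_XV @ y, y))          # True
```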

In his paper, Krämer (1980, p. 130) posed the following problem: “Which are the \(\mathbf{y}\), given \(\mathbf{X}\) and \(\mathbf{V}\), such that ordinary least squares (OLS) and Gauss–Markov are equal?” In other words, the problem is to identify those vectors \(\mathbf{y}\) for which the OLS and Gauss–Markov estimates of the parameter vector \(\varvec{\beta }\) coincide. In a follow-up paper, Krämer et al. (1996) called this problem a “twist” to Kruskal’s Theorem (Kruskal 1968), which provides necessary and sufficient conditions for the OLS and Gauss–Markov estimates of \(\varvec{\beta }\) to be equal. The same paper deals with “another twist” to Kruskal’s Theorem: rather than asking when the OLS estimator equals the Gauss–Markov estimator of the full regression vector \(\varvec{\beta }\), it provides a condition for the equality of the OLS and Gauss–Markov estimators of a subparameter of \(\varvec{\beta }\). A more general “final twist” is considered in Jaeger and Krämer (1998), which characterizes the individual vectors \(\mathbf{y}\) that yield identical OLS and Gauss–Markov estimates of such a subparameter.

Inspired by Krämer (1980), Krämer et al. (1996), and Jaeger and Krämer (1998), in what follows we “do the twist again”. Unlike those three papers, however, we do not assume that \(\mathbf{X}\) and \(\mathbf{V}\) are of full (column) rank, which means that the vector \(\varvec{\beta }\) is not necessarily unbiasedly estimable. For this reason, instead of estimating \(\varvec{\beta }\), we consider the estimation of the systematic part \(\mathsf E (\mathbf{y}) = \mathbf{X} \varvec{\beta }\). Note that this parametric function always has a linear unbiased estimator, namely \(\mathbf{y}\) itself.

An important role in the subsequent considerations will be played by the notion of a projector. It is known that any \(n \times n\) idempotent matrix, say \(\mathbf{F} \in \mathbb{R }^{n \times n}\), is an oblique projector onto its column space \({\fancyscript{R}}(\mathbf{F})\) along its null space \({\fancyscript{N}}(\mathbf{F})\), where \({\fancyscript{R}}(\mathbf{F}) \oplus {\fancyscript{N}}(\mathbf{F}) = \mathbb{R }^{n,1}\). Among the many conditions characterizing idempotent matrices one finds, for instance, \(\mathbf{F}^2 = \mathbf{F} \Leftrightarrow {\fancyscript{R}}(\mathbf{F}) = {\fancyscript{N}}(\overline{\mathbf{F}}) \Leftrightarrow {\fancyscript{R}}(\overline{\mathbf{F}}) = {\fancyscript{N}}(\mathbf{F})\), where \(\overline{\mathbf{F}} = \mathbf{I}_n - \mathbf{F}\). When an idempotent \(\mathbf{F}\) projects onto \({\fancyscript{R}}(\mathbf{F})\) along the orthogonal complement of \({\fancyscript{R}}(\mathbf{F})\), it is called an orthogonal projector. It can be verified that \(\mathbf{F}\) is an orthogonal projector if and only if it is both idempotent and symmetric, i.e., \(\mathbf{F}^2 = \mathbf{F} = \mathbf{F}^\prime \). Projectors are widely used in statistics and econometrics as a basic tool for estimation and test procedures.
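The distinction between oblique and orthogonal projectors can be exercised numerically; the sketch below uses two small hypothetical \(2 \times 2\) matrices chosen only for illustration.

```python
import numpy as np

# An idempotent but nonsymmetric matrix: an oblique projector.
F = np.array([[1., 1.],
              [0., 0.]])
# An idempotent and symmetric matrix: an orthogonal projector.
G = np.array([[1., 0.],
              [0., 0.]])

print(np.allclose(F @ F, F), np.allclose(F, F.T))   # True False  (oblique)
print(np.allclose(G @ G, G), np.allclose(G, G.T))   # True True   (orthogonal)

# F^2 = F  <=>  R(F) = N(I - F)  <=>  R(I - F) = N(F):
I2 = np.eye(2)
print(np.allclose((I2 - F) @ F, 0))   # columns of F lie in N(I - F)
print(np.allclose(F @ (I2 - F), 0))   # columns of I - F lie in N(F)
```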

Let \(\mathbf{G}\) be an \(n \times n\) matrix. An estimator \(\mathbf{G}\mathbf{y}\) of \(\mathbf{X} \varvec{\beta }\) which is unbiased and has minimal covariance matrix in the Löwner sense fulfills the conditions

$$\begin{aligned} \mathbf{G }\mathbf{H } = \mathbf{H } \quad {\mathrm{and}} \quad \mathbf{G }\mathbf{V }\mathbf{M } = \mathbf{0 }, \end{aligned}$$
(2)

where \(\mathbf{H} = \mathbf{X}\mathbf{X}^\dagger \) and \(\mathbf{M} = \mathbf{I}_n - \mathbf{H}\) are the orthogonal projectors onto \({\fancyscript{R}}(\mathbf{X})\), the column space of \(\mathbf{X}\), and onto its orthogonal complement, which coincides with \({\fancyscript{N}}(\mathbf{X}^\prime )\), the null space of \(\mathbf{X}^\prime \), respectively. The symbol \(\mathbf{X}^\dagger \) denotes the Moore–Penrose inverse of \(\mathbf{X}\). The conditions (2) can be rewritten as

$$\begin{aligned} \mathbf{G }(\mathbf{H } : \mathbf{P }_{\mathbf{V }\mathbf{M }}) = (\mathbf{H } : \mathbf{0 }), \end{aligned}$$
(3)

where \(\mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{V}\mathbf{M}(\mathbf{V}\mathbf{M})^\dagger \) is the orthogonal projector onto \({\fancyscript{R}}(\mathbf{V}\mathbf{M})\). It was pointed out in Baksalary and Trenkler (2012, Remark 3.1) that Eq. (3) always has a solution \(\mathbf{G}\) and that each \(\mathbf{G}\) satisfying (3) yields a representation of the best linear unbiased estimator \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta })\) of \(\mathbf{X} \varvec{\beta }\). All these representations coincide; see Groß (2004, Corollary 3). In the Appendix given below it is demonstrated, however, that there may exist quite useless versions of \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta })\). To avoid this discrepancy, we subsequently strengthen the consistency condition \(\mathbf{y} \in {\fancyscript{R}} ( \mathbf{X} : \mathbf{V} )\) to

$$\begin{aligned} {\fancyscript{R}} ( \mathbf{X } : \mathbf{V } ) = \mathbb R ^{n,1}. \end{aligned}$$
(4)

Then, according to Groß (2004, Corollary 4), \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta })\) is unique.
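As an illustration on the hypothetical toy \(\mathbf{X}\) and \(\mathbf{V}\) introduced above, one solution \(\mathbf{G}\) of Eq. (3) can be computed with the Moore–Penrose inverse and checked against conditions (2). This is only a numerical sketch, not part of the formal development.

```python
import numpy as np

X = np.array([[1., 0., 1.], [1., 0., 1.], [0., 1., 1.], [0., 1., 1.]])
V = np.diag([1., 1., 2., 0.])
n = X.shape[0]

H = X @ np.linalg.pinv(X)            # orthogonal projector onto R(X)
M = np.eye(n) - H                     # orthogonal projector onto N(X')
VM = V @ M
P_VM = VM @ np.linalg.pinv(VM)        # orthogonal projector onto R(VM)

# Since (3) is solvable, G = (H : 0)(H : P_VM)^+ is one of its solutions.
A = np.hstack([H, P_VM])
B = np.hstack([H, np.zeros((n, n))])
G = B @ np.linalg.pinv(A)

print(np.allclose(G @ H, H))          # unbiasedness:            G H = H
print(np.allclose(G @ V @ M, 0))      # minimal covariance part: G V M = 0
```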

In the next section some representations of the best linear unbiased estimator (BLUE) and the ordinary least squares estimator (OLSE) are provided, whereas Sect. 3 deals with “another twist” to Kruskal’s Theorem, which was briefly mentioned above. Section 4 is concerned with bounds for the Euclidean distance between BLUE and OLSE of \(\mathbf{X} \varvec{\beta }\), and the last section of the paper revisits the problem of when BLUE equals OLSE.

2 Representations of BLUE and OLSE

Let \(\mathbf{P}\) be an orthogonal projector in \(\mathbb{R }^{n,1}\), i.e., an \(n \times n\) real symmetric idempotent matrix. Assume that the rank of \(\mathbf{P}\) is r. It is known that there exists an orthogonal matrix \(\mathbf{U}\) such that

$$\begin{aligned} \mathbf{P} = \mathbf{U}\left( \begin{array}{cc} \mathbf{I}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{array}\right) \mathbf{U}^\prime ; \end{aligned}$$
(5)

see Trenkler (1994, Theorem 13). Any other orthogonal projector of the same size, say \(\mathbf{Q} \in \mathbb R ^{n \times n}\), can be represented as

$$\begin{aligned} \mathbf{Q} = \mathbf{U}\left( \begin{array}{cc} \mathbf{A} & \mathbf{B} \\ \mathbf{B}^\prime & \mathbf{D} \end{array}\right) \mathbf{U}^\prime , \end{aligned}$$
(6)

with symmetric matrices \(\mathbf{A}\) and \(\mathbf{D}\) of orders \(r\) and \(n-r\), respectively.

In what follows we assume that \(\mathbf{H} = \mathbf{X}\mathbf{X}^\dagger \) is represented by \(\mathbf{P}\) of the form (5) and \(\mathbf{P}_{\mathbf{V}\mathbf{M}}\) is represented by \(\mathbf{Q}\) defined in (6), i.e., \(\mathbf{P} = \mathbf{H} = \mathbf{X}\mathbf{X}^\dagger \) and \(\mathbf{Q} = \mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{V}\mathbf{M}(\mathbf{V}\mathbf{M})^\dagger \). It can be verified that \(\mathbf{T} = (\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H})^\dagger \), where \(\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}} = \mathbf{I}_n - \mathbf{P}_{\mathbf{V}\mathbf{M}}\), is an idempotent matrix; see Greville (1974, p. 830). From (5) and (6) we obtain

$$\begin{aligned} \mathbf{T} = \mathbf{U}\left( \begin{array}{cc} \mathbf{P}_{\overline{\mathbf{A}}} & -\mathbf{B}\mathbf{D}^\dagger \\ \mathbf{0} & \mathbf{0} \end{array}\right) \mathbf{U}^\prime , \end{aligned}$$

where \(\mathbf{P}_{\overline{\mathbf{A}}}\) is the orthogonal projector onto the column space of \(\overline{\mathbf{A}} = \mathbf{I}_r - \mathbf{A}\). It follows that \(\mathbf{T}\) is the oblique projector onto \({\fancyscript{R}}(\mathbf{H}) \cap [{\fancyscript{N}}(\mathbf{H}) + {\fancyscript{N}}(\mathbf{P}_{\mathbf{V}\mathbf{M}})]\) along \({\fancyscript{R}}(\mathbf{P}_{\mathbf{V}\mathbf{M}}) \stackrel{\perp }{\oplus } [{\fancyscript{N}}(\mathbf{H}) \cap {\fancyscript{N}}(\mathbf{P}_{\mathbf{V}\mathbf{M}})]\), where \(\stackrel{\perp }{\oplus }\) indicates that the two subspaces involved in the direct sum are orthogonal; see Baksalary and Trenkler (2010, Theorem 2). From

$$\begin{aligned} {\fancyscript{R}}(\mathbf{H }) \cap {\fancyscript{R}}(\mathbf{V }\mathbf{M }) = {\fancyscript{R}}(\mathbf{H }) \cap {\fancyscript{R}}(\mathbf{P }_{\mathbf{V }\mathbf{M }}) = \{\mathbf{0 }\} \end{aligned}$$
(7)

(see Baksalary and Trenkler 2009, Theorem 1), we arrive, by taking orthogonal complements, at \({\fancyscript{N}}(\mathbf{H}) + {\fancyscript{N}}(\mathbf{P}_{\mathbf{V}\mathbf{M}}) = \mathbb R ^{n,1}\), which leads to the conclusion that \(\mathbf{T}\) is the oblique projector onto \({\fancyscript{R}}(\mathbf{H})\) along \({\fancyscript{R}}(\mathbf{P}_{\mathbf{V}\mathbf{M}}) \stackrel{\perp }{\oplus } [{\fancyscript{N}}(\mathbf{H}) \cap {\fancyscript{N}}(\mathbf{P}_{\mathbf{V}\mathbf{M}})]\). Furthermore, it follows that \(\mathbf{T}\) takes the form

$$\begin{aligned} \mathbf{T} = \mathbf{U}\left( \begin{array}{cc} \mathbf{I}_r & -\mathbf{B}\mathbf{D}^\dagger \\ \mathbf{0} & \mathbf{0} \end{array}\right) \mathbf{U}^\prime ; \end{aligned}$$
(8)

see Baksalary and Trenkler (2010, Sect. 2).

It is well known that (7) ensures that Eq. (3) is solvable. One of the solutions, namely \(\mathbf{T} = (\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H})^\dagger \), gives a representation of the BLUE of \(\mathbf{X} \varvec{\beta }\), i.e., \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta }) = \mathbf{T}\mathbf{y}\). There are a number of further expressions for the BLUE (see Baksalary and Trenkler 2009, Sect. 4), but under assumption (4) they all coincide. Observe that the OLSE of \(\mathbf{X} \varvec{\beta }\) is \(\mathsf{OLSE }(\mathbf{X} \varvec{\beta }) = \mathbf{H}\mathbf{y}\).
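To make the two representations concrete, the following sketch (toy \(\mathbf{X}\) and \(\mathbf{V}\) as before; purely illustrative) computes \(\mathbf{T} = (\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H})^\dagger \), verifies that it solves (3), and compares \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta }) = \mathbf{T}\mathbf{y}\) with \(\mathsf{OLSE }(\mathbf{X} \varvec{\beta }) = \mathbf{H}\mathbf{y}\).

```python
import numpy as np

X = np.array([[1., 0., 1.], [1., 0., 1.], [0., 1., 1.], [0., 1., 1.]])
V = np.diag([1., 1., 2., 0.])
n = X.shape[0]

H = X @ np.linalg.pinv(X)
M = np.eye(n) - H
VM = V @ M
P_VM = VM @ np.linalg.pinv(VM)
T = np.linalg.pinv((np.eye(n) - P_VM) @ H)      # the oblique projector (8)

print(np.allclose(T @ T, T))                    # T is idempotent
print(np.allclose(T @ H, H))                    # T H = H, so T y is unbiased
print(np.allclose(T @ V @ M, 0))                # T V M = 0, so T solves (3)

rng = np.random.default_rng(1)
y = rng.standard_normal(n)                      # a generic observation vector
print(np.allclose(T @ y, H @ y))                # typically False: BLUE != OLSE here
```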

3 Another twist

As in Krämer (1980), we consider the problem of identifying those observation vectors \(\mathbf{y}\) for which \(\mathsf{OLSE }(\mathbf{X} \varvec{\beta })\) and \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta })\) take the same value. This amounts to an analysis of the subspace \({\fancyscript{L}}\) of \(\mathbb R ^{n,1}\) given by the null space of \(\mathbf{H} - \mathbf{T}\), i.e., \({\fancyscript{L}} = {\fancyscript{N}}(\mathbf{H} - \mathbf{T})\). For this purpose the following result is useful.

Lemma 1

Let \(\mathbf{R}\) and \(\mathbf{S}\) be idempotent matrices of the same size. Then:

  1. (i)

    \({\fancyscript{N}}(\mathbf{S} - \mathbf{R}) = {\fancyscript{N}}(\mathbf{S}\overline{\mathbf{R}}) \cap {\fancyscript{N}}(\overline{\mathbf{S}}\mathbf{R})\),

  2. (ii)

    \({\fancyscript{N}}(\mathbf{R}\mathbf{S}) = {\fancyscript{N}}(\mathbf{S}) \oplus [{\fancyscript{N}}(\mathbf{R}) \cap {\fancyscript{R}}(\mathbf{S})]\).

Proof

For a proof see Baksalary and Trenkler (2013, Theorems 1 and 9). \(\square \)

Lemma 1 leads to the following result.

Theorem 1

Under the model (1), let \(\mathbf{H} = \mathbf{X}\mathbf{X}^\dagger \), \(\mathbf{M} = \mathbf{I}_n - \mathbf{H}\), and \(\mathbf{T} = (\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H})^\dagger \). Then

$$\begin{aligned} {\fancyscript{L}} = {\fancyscript{N}}(\mathbf{H } - \mathbf{T }) = {\fancyscript{N}}(\mathbf{T }\mathbf{M }). \end{aligned}$$

Proof

Lemma 1 yields

$$\begin{aligned} {\fancyscript{N}}(\mathbf{H } - \mathbf{T }) = {\fancyscript{N}}(\mathbf{T }\overline{\mathbf{H }}) \cap {\fancyscript{N}}(\overline{\mathbf{T }}\mathbf{H }) = {\fancyscript{N}}(\mathbf{T }\mathbf{M }) \cap {\fancyscript{N}}(\overline{\mathbf{T }}\mathbf{H }). \end{aligned}$$

Another relevant fact is that, with \(\mathbf{H}\) of the form (5) and \(\mathbf{T}\) given in (8), we directly get \(\overline{\mathbf{T}}\mathbf{H} = \mathbf{0}\), so that \({\fancyscript{N}}(\overline{\mathbf{T}}\mathbf{H}) = \mathbb R ^{n,1}\) and the assertion follows. \(\square \)

The vectors belonging to the subspace \({\fancyscript{L}} = {\fancyscript{N}}(\mathbf{H} - \mathbf{T})\) can be explicitly written as

$$\begin{aligned} {\fancyscript{L}} = \{ [\mathbf{I }_n - (\mathbf{T }\mathbf{M })^\dagger \mathbf{T }\mathbf{M }] \mathbf{z }:\mathbf{z } \in \mathbb R ^{n,1}\}, \end{aligned}$$

that is, \({\fancyscript{L}}\) consists precisely of the solutions of the homogeneous equation \(\mathbf{T}\mathbf{M}\mathbf{y} = \mathbf{0}\). Observe also that \({\fancyscript{N}}(\mathbf{T}\mathbf{M}) \supseteq {\fancyscript{N}}(\mathbf{M}) = {\fancyscript{R}}(\mathbf{H})\). This means that, inter alia, all vectors belonging to \({\fancyscript{R}}(\mathbf{H}) = {\fancyscript{R}}(\mathbf{X})\), for example \(\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}\), yield \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta }) = \mathsf OLSE (\mathbf{X} \varvec{\beta })\). This does not come as a surprise, since by (5) and (8) it follows that \(\mathbf{T}\mathbf{H} = \mathbf{H}\).
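The explicit description of \({\fancyscript{L}}\) can be exercised numerically: the sketch below (hypothetical toy model as before) draws a vector of \({\fancyscript{L}}\) via the projector \(\mathbf{I}_n - (\mathbf{T}\mathbf{M})^\dagger \mathbf{T}\mathbf{M}\) and confirms that OLSE and BLUE coincide exactly for such vectors.

```python
import numpy as np

X = np.array([[1., 0., 1.], [1., 0., 1.], [0., 1., 1.], [0., 1., 1.]])
V = np.diag([1., 1., 2., 0.])
n = X.shape[0]

H = X @ np.linalg.pinv(X)
M = np.eye(n) - H
VM = V @ M
P_VM = VM @ np.linalg.pinv(VM)
T = np.linalg.pinv((np.eye(n) - P_VM) @ H)

TM = T @ M
P_L = np.eye(n) - np.linalg.pinv(TM) @ TM   # orthogonal projector onto N(TM) = L

rng = np.random.default_rng(2)
z = rng.standard_normal(n)
y_L = P_L @ z                                # an observation vector lying in L
print(np.allclose(H @ y_L, T @ y_L))         # True: OLSE and BLUE coincide on L
print(np.allclose(H @ z, T @ z))             # typically False for a generic z
```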

Further characterization of the subspace \({\fancyscript{L}} = {\fancyscript{N}}(\mathbf{H} - \mathbf{T})\) is established in the theorem below.

Theorem 2

Under the model (1), let \(\mathbf{H} = \mathbf{X}\mathbf{X}^\dagger \), \(\mathbf{M} = \mathbf{I}_n - \mathbf{H}\), and \(\mathbf{T} = (\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H})^\dagger \). Then

$$\begin{aligned} {\fancyscript{L}} = {\fancyscript{R}}(\mathbf{H })\stackrel{\perp }{\oplus } [{\fancyscript{N}}(\mathbf{H }\overline{\mathbf{P }}_{\mathbf{V }\mathbf{M }}) \cap {\fancyscript{N}}(\mathbf{H })]. \end{aligned}$$

Proof

By Theorem 1 we have \({\fancyscript{L}} = {\fancyscript{N}}(\mathbf{T}\mathbf{M})\). Hence, Lemma 1 implies

$$\begin{aligned} {\fancyscript{N}}(\mathbf{T }\mathbf{M }) = {\fancyscript{N}}(\mathbf{M }) \oplus [{\fancyscript{N}}(\mathbf{T }) \cap {\fancyscript{R}}(\mathbf{M })] = {\fancyscript{R}}(\mathbf{H }) \oplus [{\fancyscript{N}}(\mathbf{T }) \cap {\fancyscript{N}}(\mathbf{H })]. \end{aligned}$$

Now

$$\begin{aligned} {\fancyscript{N}}(\mathbf{T }) = {\fancyscript{N}}[(\overline{\mathbf{P}}_{\mathbf{V }\mathbf{M }}\mathbf{H })^\dagger ] = {\fancyscript{N}}[(\overline{\mathbf{P }}_{\mathbf{V }\mathbf{M }}\mathbf{H })^\prime ] = {\fancyscript{N}}(\mathbf{H }\overline{\mathbf{P }}_{\mathbf{V }\mathbf{M }}), \end{aligned}$$

which completes the proof. \(\square \)

The result of Theorem 2 looks somewhat different from that of Groß et al. (2001, Theorem 9), for setting \(\mathbf{C} = \mathbf{I}_n\) there gives

$$\begin{aligned} {\fancyscript{L}} = {\fancyscript{R}}(\mathbf{H }) \oplus [{\fancyscript{R}}(\mathbf{V }\mathbf{M }) \cap {\fancyscript{N}}(\mathbf{H })]. \end{aligned}$$
(9)

This discrepancy can be explained on account of the identity

$$\begin{aligned} {\fancyscript{N}}(\mathbf{H }\overline{\mathbf{P }}_{\mathbf{V }\mathbf{M }}) = {\fancyscript{R}}(\mathbf{P }_{\mathbf{V }\mathbf{M }}) \stackrel{\perp }{\oplus } [{\fancyscript{N}}(\mathbf{H }) \cap {\fancyscript{N}}(\mathbf{P }_{\mathbf{V }\mathbf{M }})], \end{aligned}$$
(10)

following from Lemma 1. The subspace of Theorem 2 coincides with (9) when \({\fancyscript{N}}(\mathbf{H}) \cap {\fancyscript{N}}(\mathbf{P}_{\mathbf{V}\mathbf{M}}) = \{\mathbf{0}\}\), which is equivalent to \({\fancyscript{R}}(\mathbf{H}) + {\fancyscript{R}}(\mathbf{P}_{\mathbf{V}\mathbf{M}}) = \mathbb R ^{n,1}\), i.e., to \({\fancyscript{R}}(\mathbf{H}:\mathbf{V}\mathbf{M}) = {\fancyscript{R}}(\mathbf{X}:\mathbf{V}) = \mathbb R ^{n,1}\); see Puntanen et al. (2011, Proposition 5.1). The latter condition, however, was given above as (4) and is assumed to hold throughout the paper. Thus, we may state the following.

Corollary 1

Under the model (1), let \(\mathbf{H} = \mathbf{X}\mathbf{X}^\dagger \) and \(\mathbf{M} = \mathbf{I}_n - \mathbf{H}\). Then

$$\begin{aligned} {\fancyscript{L}} = {\fancyscript{R}}(\mathbf{H }) \stackrel{\perp }{\oplus } [{\fancyscript{R}}(\mathbf{V }\mathbf{M })\cap {\fancyscript{N}}(\mathbf{H })]. \end{aligned}$$

Corollary 1 corresponds to the Theorem of Krämer (1980), where the identity \(\mathsf{BLUE }(\varvec{\beta }) = \mathsf{OLSE }(\varvec{\beta })\) is explored under the assumption that \(\mathbf{X}\) and \(\mathbf{V}\) are of full (column) rank.
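Corollary 1 can be checked on the hypothetical toy model by comparing dimensions: \(\dim {\fancyscript{L}}\) must equal \({\mathrm{rank}}(\mathbf{H})\) plus \(\dim [{\fancyscript{R}}(\mathbf{V}\mathbf{M}) \cap {\fancyscript{N}}(\mathbf{H})]\). The sketch below does this with rank computations only.

```python
import numpy as np

X = np.array([[1., 0., 1.], [1., 0., 1.], [0., 1., 1.], [0., 1., 1.]])
V = np.diag([1., 1., 2., 0.])
n = X.shape[0]

H = X @ np.linalg.pinv(X)
M = np.eye(n) - H
VM = V @ M
P_VM = VM @ np.linalg.pinv(VM)
T = np.linalg.pinv((np.eye(n) - P_VM) @ H)

dim_L = n - np.linalg.matrix_rank(T @ M)     # L = N(TM), so dim L = n - rank(TM)
# dim of R(VM) ∩ N(H) = R(VM) ∩ R(M), via dim(A ∩ B) = rk(A) + rk(B) - rk(A : B)
dim_cap = (np.linalg.matrix_rank(VM) + np.linalg.matrix_rank(M)
           - np.linalg.matrix_rank(np.hstack([VM, M])))
print(dim_L == np.linalg.matrix_rank(H) + dim_cap)   # True
```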

Recall that the projector \(\mathbf{P}\) introduced in (5) was determined by the model matrix \(\mathbf{X}\), for \(\mathbf{P} = \mathbf{H} = \mathbf{X}\mathbf{X}^\dagger \). In consequence, the rank of \(\mathbf{P}\) coincides with the ranks of \(\mathbf{H}\) and \(\mathbf{X}\), i.e., \(r = {\mathrm{rank}}(\mathbf{H}) = {\mathrm{rank}}(\mathbf{X})\). Consider now an oblique projector of rank \(r\) having the form

$$\begin{aligned} \mathbf{L} = \mathbf{U}\left( \begin{array}{cc} \mathbf{I}_r & \mathbf{K} \\ \mathbf{0} & \mathbf{0} \end{array}\right) \mathbf{U}^\prime , \end{aligned}$$
(11)

with \(\mathbf{K} \in \mathbb R ^{r \times (n-r)}\). It was shown by Baksalary and Trenkler (2011, Sect. 3) that when \(\mathbf{K} = -\mathbf{W}_{12} (\mathbf{D} \mathbf{W}_{22}\mathbf{D})^\dagger \), where \(\mathbf{D} \in \mathbb R ^{(n-r) \times (n-r)}\) is a symmetric idempotent matrix and \(\mathbf{W}_{12} \in \mathbb R ^{r \times (n-r)}\) and \(\mathbf{W}_{22} \in \mathbb R ^{(n-r) \times (n-r)}\) originate from the representation of \(\mathbf{V}\) given by

$$\begin{aligned} \mathbf{V} = \mathbf{U} \left( \begin{array}{cc} \mathbf{W}_{11} & \mathbf{W}_{12} \\ \mathbf{W}_{12}^\prime & \mathbf{W}_{22} \end{array}\right) \mathbf{U}^\prime , \end{aligned}$$

then \(\mathbf{L}\mathbf{y}\) is an unbiased estimator of \(\mathbf{X} \varvec{\beta }\) whose efficiency lies between that of \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta })\) and that of \(\mathsf{OLSE }(\mathbf{X} \varvec{\beta })\). In what follows we identify those observation vectors \(\mathbf{y}\) for which \(\mathbf{L}\mathbf{y}\) coincides with \(\mathsf{OLSE }(\mathbf{X} \varvec{\beta })\) and with \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta })\), respectively. The resulting formulas give an impression of how close the three estimators can be.

Theorem 3

Under the model (1), let \(\mathbf{H} = \mathbf{X}\mathbf{X}^\dagger \), \(\mathbf{M} = \mathbf{I}_n - \mathbf{H}\), and \(\mathbf{T} = (\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H})^\dagger \). Moreover, let \(\mathbf{L}\) be of the form (11). Then:

  1. (i)

    \({\fancyscript{N}}(\mathbf{H} - \mathbf{L}) = {\fancyscript{N}}(\mathbf{L}\mathbf{M}) = {\fancyscript{R}}(\mathbf{L}) \oplus [{\fancyscript{N}}(\mathbf{H}) \cap {\fancyscript{N}}(\mathbf{L})]\),

  2. (ii)

    \({\fancyscript{N}}(\mathbf{T} - \mathbf{L}) = {\fancyscript{N}}(\mathbf{T}\overline{\mathbf{L}}) = {\fancyscript{R}}(\mathbf{T}) \oplus [{\fancyscript{N}}(\mathbf{L}) \cap {\fancyscript{N}}(\mathbf{T})]\).

Proof

From Lemma 1 we have \({\fancyscript{N}}(\mathbf{H} - \mathbf{L}) = {\fancyscript{N}}(\mathbf{H}\overline{\mathbf{L}}) \cap {\fancyscript{N}}(\overline{\mathbf{H}}\mathbf{L})\). Representations (5) and (11) yield \(\overline{\mathbf{H}}\mathbf{L} = \mathbf{0}\), whence, again by Lemma 1,

$$\begin{aligned} {\fancyscript{N}}(\mathbf{H } - \mathbf{L }) = {\fancyscript{N}}(\mathbf{H }\overline{\mathbf{L }}) = {\fancyscript{N}}(\overline{\mathbf{L }}) \oplus [{\fancyscript{N}}(\mathbf{H }) \cap {\fancyscript{R}}(\overline{\mathbf{L }})] = {\fancyscript{R}}(\mathbf{L }) \oplus [{\fancyscript{N}}(\mathbf{H }) \cap {\fancyscript{N}}(\mathbf{L })]. \end{aligned}$$

Since, by the same argument as in the proof of Theorem 1 (now using \(\overline{\mathbf{L}}\mathbf{H} = \mathbf{0}\), which follows from (5) and (11)), we also have \({\fancyscript{N}}(\mathbf{H} - \mathbf{L}) = {\fancyscript{N}}(\mathbf{L}\overline{\mathbf{H}}) = {\fancyscript{N}}(\mathbf{L}\mathbf{M})\), point (i) of the theorem is established.

To derive the second part of the theorem, note that Lemma 1 entails \({\fancyscript{N}}(\mathbf{T} - \mathbf{L}) = {\fancyscript{N}}(\mathbf{L}\overline{\mathbf{T}}) \cap {\fancyscript{N}}(\overline{\mathbf{L}}\mathbf{T})\). Similarly as in the proof of point (i), we obtain \(\overline{\mathbf{L}}\mathbf{T} = \mathbf{0}\) which implies \({\fancyscript{N}}(\mathbf{T} - \mathbf{L}) = {\fancyscript{N}}(\mathbf{L}\overline{\mathbf{T}})\), and thus, by Lemma 1,

$$\begin{aligned} {\fancyscript{N}}(\mathbf{T } - \mathbf{L }) = {\fancyscript{N}}(\mathbf{L }\overline{\mathbf{T }}) = {\fancyscript{N}}(\overline{\mathbf{T }}) \oplus [{\fancyscript{N}}(\mathbf{L }) \cap {\fancyscript{R}}(\overline{\mathbf{T }})] = {\fancyscript{R}}(\mathbf{T }) \oplus [{\fancyscript{N}}(\mathbf{L }) \cap {\fancyscript{N}}(\mathbf{T })]. \end{aligned}$$

On the other hand, applying Lemma 1 with the roles of \(\mathbf{T}\) and \(\mathbf{L}\) interchanged and noting that \(\overline{\mathbf{T}}\mathbf{L} = \mathbf{0}\) by (8) and (11), we also have \({\fancyscript{N}}(\mathbf{T} - \mathbf{L}) = {\fancyscript{N}}(\mathbf{T}\overline{\mathbf{L}})\), which completes the proof. \(\square \)
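Point (i) of Theorem 3 can be illustrated numerically by generating an arbitrary \(\mathbf{L}\) of the form (11); the specific \(\mathbf{K}\) of Baksalary and Trenkler (2011) is not needed for this identity. A sketch, using the hypothetical toy model again, compares the relevant subspace dimensions.

```python
import numpy as np

X = np.array([[1., 0., 1.], [1., 0., 1.], [0., 1., 1.], [0., 1., 1.]])
n = X.shape[0]
H = X @ np.linalg.pinv(X)
M = np.eye(n) - H
r = np.linalg.matrix_rank(H)

# Build U = (basis of R(H) : basis of N(H)) from the eigendecomposition of H.
w, Uh = np.linalg.eigh(H)
U = np.hstack([Uh[:, w > 0.5], Uh[:, w <= 0.5]])

# An arbitrary K gives an oblique projector L of the form (11).
rng = np.random.default_rng(3)
K = rng.standard_normal((r, n - r))
core = np.zeros((n, n))
core[:r, :r] = np.eye(r)
core[:r, r:] = K
L = U @ core @ U.T

lhs = n - np.linalg.matrix_rank(H - L)                      # dim N(H - L)
print(lhs == n - np.linalg.matrix_rank(L @ M))              # equals dim N(LM)
dim_cap = n - np.linalg.matrix_rank(np.vstack([H, L]))      # dim [N(H) ∩ N(L)]
print(lhs == np.linalg.matrix_rank(L) + dim_cap)            # equals rank(L) + dim[...]
```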

4 Bounds for the Euclidean distance

Baksalary and Kala (1980) derived a bound for \(|| \varvec{\mu }^* - \hat{\varvec{\mu }} ||\), where \(\varvec{\mu }^* = \mathsf{OLSE }(\mathbf{X} \varvec{\beta })\), \(\hat{\varvec{\mu }} = \mathsf{BLUE }(\mathbf{X} \varvec{\beta })\), and \(|| \cdot ||\) denotes the Euclidean norm. To be precise, their bound (Baksalary and Kala 1980, Theorem) reads

$$\begin{aligned} || \varvec{\mu }^*- \hat{\varvec{\mu }} || \leqslant (\gamma ^{1/2}/\delta ) || \mathbf{y } - \varvec{\mu }^*||, \end{aligned}$$

where \(\gamma \) is the largest eigenvalue of \(\mathbf{H}\mathbf{V}\mathbf{M}\mathbf{V}\mathbf{H}\) and \(\delta \) is the smallest nonzero eigenvalue of \(\mathbf{M}\mathbf{V}\mathbf{M}\). Subsequently, we provide an alternative bound, derived from the representation of \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta })\), using the oblique projector \(\mathbf{T} = (\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H})^\dagger \) given in (8).

Theorem 4

Let \(\hat{\varvec{\mu }} = (\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H})^\dagger \mathbf{y}\) with \((\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H})^\dagger \) of the form (8) be one of the representations of \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta })\). If \(\varvec{\mu }^*= \mathbf{H}\mathbf{y} = \mathsf{OLSE }(\mathbf{X} \varvec{\beta })\), then

$$\begin{aligned} || \varvec{\mu }^*- \hat{\varvec{\mu }} || \leqslant \tau _1( \mathbf{B}\mathbf{D }^\dagger ) || \mathbf{y } ||, \end{aligned}$$

where \(\tau _1( \mathbf{B}\mathbf{D}^\dagger )\) is the largest singular value of \(\mathbf{B}\mathbf{D}^\dagger \).

Proof

From

$$\begin{aligned} \varvec{\mu }^* = \mathbf{U}\left( \begin{array}{cc} \mathbf{I}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{array}\right) \mathbf{U}^\prime \mathbf{y} \quad {\mathrm{and}} \quad \hat{\varvec{\mu }} = \mathbf{U}\left( \begin{array}{cc} \mathbf{I}_r & -\mathbf{B}\mathbf{D}^\dagger \\ \mathbf{0} & \mathbf{0} \end{array}\right) \mathbf{U}^\prime \mathbf{y} \end{aligned}$$

we obtain

$$\begin{aligned} \varvec{\mu }^* - \hat{\varvec{\mu }} = \mathbf{U}\left( \begin{array}{cc} \mathbf{0} & \mathbf{B}\mathbf{D}^\dagger \\ \mathbf{0} & \mathbf{0} \end{array}\right) \mathbf{U}^\prime \mathbf{y}. \end{aligned}$$

Hence,

$$\begin{aligned} || \varvec{\mu }^* - \hat{\varvec{\mu }} ||^2 = \mathbf{y}^\prime \mathbf{U} \left( \begin{array}{cc} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{D}^\dagger \mathbf{B}^\prime \mathbf{B} \mathbf{D}^\dagger \end{array}\right) \mathbf{U}^\prime \mathbf{y}. \end{aligned}$$

In consequence, \(|| \varvec{\mu }^* - \hat{\varvec{\mu }} ||^2 \leqslant \lambda _1(\mathbf{D}^\dagger \mathbf{B}^\prime \mathbf{B} \mathbf{D}^\dagger ) ||\mathbf{y}||^2\), where \(\lambda _1(\mathbf{D}^\dagger \mathbf{B}^\prime \mathbf{B} \mathbf{D}^\dagger )\) is the largest eigenvalue of \(\mathbf{D}^\dagger \mathbf{B}^\prime \mathbf{B} \mathbf{D}^\dagger \). Since \(\mathbf{D}^\dagger \) is symmetric, this eigenvalue equals \(\tau _1^2(\mathbf{B}\mathbf{D}^\dagger )\), and the assertion follows by taking square roots. \(\square \)

It is seen from Theorem 4 that \(\varvec{\mu }^*= \hat{\varvec{\mu }}\) for all \(\mathbf{y}\) if and only if \(\mathbf{B} \mathbf{D}^\dagger = \mathbf{0}\), or, equivalently, \(\mathbf{B} = \mathbf{0}\), which means that \(\mathbf{H}\) and \(\mathbf{P}_{\mathbf{V}\mathbf{M}}\) commute.
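Theorem 4 can be checked numerically by recovering the blocks \(\mathbf{B}\) and \(\mathbf{D}\) of (6) from an explicit orthogonal \(\mathbf{U}\); the following sketch does so on the hypothetical toy model used throughout.

```python
import numpy as np

X = np.array([[1., 0., 1.], [1., 0., 1.], [0., 1., 1.], [0., 1., 1.]])
V = np.diag([1., 1., 2., 0.])
n = X.shape[0]

H = X @ np.linalg.pinv(X)
M = np.eye(n) - H
VM = V @ M
P_VM = VM @ np.linalg.pinv(VM)
T = np.linalg.pinv((np.eye(n) - P_VM) @ H)
r = np.linalg.matrix_rank(H)

# U = (basis of R(H) : basis of N(H)), so that U' H U has the form (5).
w, Uh = np.linalg.eigh(H)
U = np.hstack([Uh[:, w > 0.5], Uh[:, w <= 0.5]])

Q = U.T @ P_VM @ U                     # the blocks of (6)
B, D = Q[:r, r:], Q[r:, r:]

rng = np.random.default_rng(4)
y = rng.standard_normal(n)
dist = np.linalg.norm(H @ y - T @ y)                     # || mu* - mu_hat ||
bound = np.linalg.norm(B @ np.linalg.pinv(D), 2) * np.linalg.norm(y)
print(dist <= bound + 1e-12)           # True: the bound of Theorem 4 holds
print(np.allclose(H @ P_VM, P_VM @ H)) # False here, consistent with B != 0
```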

5 Equality of BLUE and OLSE

The commutativity \(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{P}_{\mathbf{V}\mathbf{M}} \mathbf{H}\), just mentioned in the preceding section, is not contained in the standard catalogue of conditions necessary and sufficient for the equality \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta }) = \mathsf{OLSE }(\mathbf{X} \varvec{\beta })\). Among the most important conditions equivalent to the equality are:

  1. (i)

    \(\mathbf{H}\mathbf{V} = \mathbf{V}\mathbf{H}\),

  2. (ii)

    \(\mathbf{H}\mathbf{V} = \mathbf{H}\mathbf{V}\mathbf{H}\),

  3. (iii)

    \({\fancyscript{R}}(\mathbf{V}\mathbf{X}) = {\fancyscript{R}}(\mathbf{X}) \cap {\fancyscript{R}}(\mathbf{V})\),

  4. (iv)

    \(\mathbf{H}\mathbf{V}\mathbf{M} = \mathbf{0}\),

  5. (v)

    \({\fancyscript{R}}(\mathbf{V}\mathbf{X}) \subseteq {\fancyscript{R}}(\mathbf{X})\).

Note that the condition (v) can be rewritten as \({\fancyscript{R}}(\mathbf{V}\mathbf{X}) = {\fancyscript{R}}(\mathbf{X})\) when \(\mathbf{V}\) is nonsingular; see Krämer (1980) for a discussion related to Kruskal’s Theorem, and Puntanen et al. (2011, Proposition 10.1). Motivated by Theorem 4, we obtain the following result.

Theorem 5

Under the model (1), let \(\mathbf{H} = \mathbf{X}\mathbf{X}^\dagger \) and \(\mathbf{M} = \mathbf{I}_n - \mathbf{H}\). Then, the following conditions are equivalent:

  1. (i)

    \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta }) = \mathsf{OLSE }(\mathbf{X} \varvec{\beta })\),

  2. (ii)

    \(\mathbf{H}\mathbf{V}\mathbf{M} = \mathbf{0}\),

  3. (iii)

    \(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{0}\),

  4. (iv)

    \(\mathbf{H} + \mathbf{P}_{\mathbf{V}\mathbf{M}}\) is an orthogonal projector,

  5. (v)

    \(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{P}_{\mathbf{V}\mathbf{M}}\mathbf{H}\),

  6. (vi)

    \(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}}\) is an orthogonal projector.

Proof

For the proof of the equivalence (i) \(\Leftrightarrow \) (ii) see Puntanen et al. (2011, Proposition 10.1).

To show that (ii) implies (iii) postmultiply \(\mathbf{H}\mathbf{V}\mathbf{M} = \mathbf{0}\) by \((\mathbf{V}\mathbf{M})^\dagger \) and refer to \(\mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{V}\mathbf{M}(\mathbf{V}\mathbf{M})^\dagger \). To establish the reverse implication, note that the condition \(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{0}\) entails \(\mathbf{H}\mathbf{V}\mathbf{M}(\mathbf{V}\mathbf{M})^\dagger = \mathbf{0}\). Postmultiplying this equality by \(\mathbf{V}\mathbf{M}\) leads to \(\mathbf{H}\mathbf{V}\mathbf{M} = \mathbf{0}\).

The equivalence (iii) \(\Leftrightarrow \) (iv) is well known; see e.g., Rao and Mitra (1971, Theorem 5.1.2).

The fact that (iii) \(\Rightarrow \) (v) is immediately seen by taking the transpose of \(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{0}\). For the proof of the reverse implication, recall from (7) that \({\fancyscript{R}}(\mathbf{H})\cap {\fancyscript{R}}(\mathbf{P}_{\mathbf{V}\mathbf{M}}) = \{\mathbf{0}\}\). By condition (v) we have \(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}}\mathbf{x} = \mathbf{P}_{\mathbf{V}\mathbf{M}}\mathbf{H}\mathbf{x}\) for every vector \(\mathbf{x} \in \mathbb R ^{n,1}\). Since the left-hand side belongs to \({\fancyscript{R}}(\mathbf{H})\) and the right-hand side to \({\fancyscript{R}}(\mathbf{P}_{\mathbf{V}\mathbf{M}})\), this common vector lies in \({\fancyscript{R}}(\mathbf{H})\cap {\fancyscript{R}}(\mathbf{P}_{\mathbf{V}\mathbf{M}}) = \{\mathbf{0}\}\), whence \(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}}\mathbf{x} = \mathbf{0}\) for every \(\mathbf{x}\), i.e., \(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{0}\).

The part (v) \(\Leftrightarrow \) (vi) is also known in the literature; see e.g., Baksalary et al. (2002, Theorem). \(\square \)
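The conditions of Theorem 5 can be contrasted numerically on two hypothetical examples: the toy \(\mathbf{V}\) used so far, which violates them, and \(\mathbf{V}_2 = \mathbf{X}\mathbf{X}^\prime + \mathbf{M}\), a positive definite choice that commutes with \(\mathbf{H}\) and hence satisfies all of them. This is only an illustrative sketch.

```python
import numpy as np

X = np.array([[1., 0., 1.], [1., 0., 1.], [0., 1., 1.], [0., 1., 1.]])
n = X.shape[0]
H = X @ np.linalg.pinv(X)
M = np.eye(n) - H

def check(V):
    VM = V @ M
    P = VM @ np.linalg.pinv(VM)                              # P_VM
    S, HP = H + P, H @ P
    return (np.allclose(H @ V @ M, 0),                       # (ii)
            np.allclose(HP, 0),                              # (iii)
            np.allclose(S @ S, S) and np.allclose(S, S.T),   # (iv)
            np.allclose(HP, P @ H),                          # (v)
            np.allclose(HP @ HP, HP) and np.allclose(HP, HP.T))  # (vi)

print(check(np.diag([1., 1., 2., 0.])))   # all False: BLUE(Xb) != OLSE(Xb)
print(check(X @ X.T + M))                 # all True:  BLUE(Xb) == OLSE(Xb)
```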

Krämer (1980) showed how his theorem characterizing the vectors \(\mathbf{y}\) that ensure the coincidence of \(\mathsf{BLUE }(\varvec{\beta })\) and \(\mathsf{OLSE }(\varvec{\beta })\) can be used to prove Kruskal’s Theorem. This can be done in a similar fashion in the present set-up.

Theorem 6

Under the model (1), let \(\mathbf{H} = \mathbf{X}\mathbf{X}^\dagger \) and \(\mathbf{M} = \mathbf{I}_n - \mathbf{H}\). Then, the following conditions are equivalent:

  1. (i)

    \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta }) = \mathsf{OLSE }(\mathbf{X} \varvec{\beta })\),

  2. (ii)

    \({\fancyscript{N}}(\mathbf{H}) = {\fancyscript{N}}(\mathbf{H}\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}) \cap {\fancyscript{N}}(\mathbf{H})\).

Proof

First we show that (ii) implies (i). Condition (ii) ensures that \({\fancyscript{N}}(\mathbf{H}) \subseteq {\fancyscript{N}}(\mathbf{H}\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}})\), which yields \({\fancyscript{R}}(\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H}) \subseteq {\fancyscript{R}}(\mathbf{H})\). Thus, \(\mathbf{H}\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H} = \overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}}\mathbf{H}\). In consequence, \(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}}\mathbf{H} = \mathbf{P}_{\mathbf{V}\mathbf{M}}\mathbf{H}\), and taking the transpose gives \(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}}\mathbf{H} = \mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}}\), whence \(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{P}_{\mathbf{V}\mathbf{M}}\mathbf{H}\). The implication now follows on account of point (v) of Theorem 5.

The part (i) \(\Rightarrow \) (ii) is established in a similar fashion by reversing the preceding chain. \(\square \)

From the discussion preceding Corollary 1 it follows that \({\fancyscript{N}}(\mathbf{H}\overline{\mathbf{P}}_{\mathbf{V}\mathbf{M}})\), specified in (10), coincides with \({\fancyscript{R}}(\mathbf{P}_{\mathbf{V}\mathbf{M}})\) when (4) holds. In such a case, the condition (ii) of Theorem 6 reduces to \({\fancyscript{N}}(\mathbf{H}) = {\fancyscript{R}}(\mathbf{P}_{\mathbf{V}\mathbf{M}}) \cap {\fancyscript{N}}(\mathbf{H})\), or, equivalently, to \({\fancyscript{R}}(\mathbf{P}_{\mathbf{V}\mathbf{M}}) \subseteq {\fancyscript{N}}(\mathbf{H})\), i.e., \({\fancyscript{R}}(\mathbf{V}\mathbf{M}) \subseteq {\fancyscript{R}}(\mathbf{M})\); note that under (4) we have \({\mathrm{rank}}(\mathbf{V}\mathbf{M}) = {\mathrm{rank}}(\mathbf{M})\), so that either inclusion between \({\fancyscript{R}}(\mathbf{V}\mathbf{M})\) and \({\fancyscript{R}}(\mathbf{M})\) amounts to their equality. When \(\mathbf{V}\) is nonsingular, we get \({\fancyscript{R}}(\mathbf{V}\mathbf{M}) = {\fancyscript{R}}(\mathbf{M})\), which is the final condition of Kruskal’s Theorem in Krämer (1980).

Another observation is that the conditions of Theorems 5 and 6, unlike the customary conditions listed at the beginning of the present section, predominantly deal with orthogonal projectors. Thus, the equality of \(\mathsf{BLUE }(\mathbf{X} \varvec{\beta })\) and \(\mathsf{OLSE }(\mathbf{X} \varvec{\beta })\) is characterized in a more symmetric way.

When \(\mathbf{P}_{\mathbf{V}\mathbf{M}}\) has the representation (6), the following equivalences hold among the statements of Theorem 5:

\(\mathbf{H } + \mathbf{P }_{\mathbf{V }\mathbf{M }}\) is an orthogonal projector if and only if \(\mathbf{A } = \mathbf{0 }\),

\(\mathbf{H }\mathbf{P }_{\mathbf{V }\mathbf{M }} = \mathbf{P }_{\mathbf{V }\mathbf{M}}\mathbf{H}\) if and only if \(\mathbf{B } = \mathbf{0 }\).

Note that the condition \(\mathbf{A} = \mathbf{0}\) is in general stronger than \(\mathbf{B} = \mathbf{0}\), but in the present set-up the two are equivalent. There exist a large number of further conditions equivalent to condition (v) of Theorem 5, for instance:

$$\begin{aligned}&\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}} = \mathbf{P}_{{\fancyscript{R}}(\mathbf{H}) \cap {\fancyscript{R}}(\mathbf{V}\mathbf{M})},\\&{\fancyscript{R}}(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}}) = {\fancyscript{R}}(\mathbf{H}) \cap {\fancyscript{R}}(\mathbf{V}\mathbf{M}),\\&{\mathrm{rank}}(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}}) = \dim [{\fancyscript{R}}(\mathbf{H}) \cap {\fancyscript{R}}(\mathbf{V}\mathbf{M})],\\&{\mathrm{rank}}(\mathbf{H} - \mathbf{P}_{\mathbf{V}\mathbf{M}}) = {\mathrm{rank}}(\mathbf{H} + \mathbf{P}_{\mathbf{V}\mathbf{M}}) - {\mathrm{rank}}(\mathbf{H}\mathbf{P}_{\mathbf{V}\mathbf{M}}); \end{aligned}$$

see Baksalary and Trenkler (2008).
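To close, two of the rank conditions above can be verified directly in the commuting case; a last sketch, again using the hypothetical commuting example \(\mathbf{V}_2 = \mathbf{X}\mathbf{X}^\prime + \mathbf{M}\) introduced earlier.

```python
import numpy as np

X = np.array([[1., 0., 1.], [1., 0., 1.], [0., 1., 1.], [0., 1., 1.]])
n = X.shape[0]
H = X @ np.linalg.pinv(X)
M = np.eye(n) - H
V2 = X @ X.T + M            # H V2 = V2 H, so BLUE = OLSE and (v) of Theorem 5 holds

VM = V2 @ M
P_VM = VM @ np.linalg.pinv(VM)
rk = np.linalg.matrix_rank

dim_cap = rk(H) + rk(VM) - rk(np.hstack([H, VM]))      # dim[R(H) ∩ R(VM)]
print(rk(H @ P_VM) == dim_cap)                         # rank(H P_VM) = dim[...]
print(rk(H - P_VM) == rk(H + P_VM) - rk(H @ P_VM))     # the last rank identity
```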