1 Introduction

The problem of fitting a curve on the basis of a finite number of observations arises in almost all areas where mathematics is applied, and one of the most powerful tools used for this purpose is the least squares method. Over the years, a rich body of results has appeared in the literature, providing indisputable evidence that the concepts and techniques of matrix analysis offer handy means of applying the method. The present paper contributes further to this stream of considerations by demonstrating how an expression for the Moore–Penrose inverse of a columnwise partitioned matrix, derived in Baksalary and Baksalary (2007, Theorem 1), may be advantageously utilized to deal with problems originating from linear regression. Among the benefits of the proposed approach one may mention a simplification of derivations, a novel insight into the notions involved in the regression model, and a reduction of the computational cost necessary to obtain the sought estimators. Furthermore, by simplifying the unavoidable mathematical operations, the proposed approach offers an attractive alternative for researchers who are not keen on exploiting more advanced matrix methods than necessary, or on utilizing software packages which do not provide comprehensive control over the processed data, as it enables calculations to be performed almost “by hand”, preserving insight into every step of linear regression.

The aforementioned representation of the Moore–Penrose inverse established in Baksalary and Baksalary (2007, Theorem 1) is recalled in the following lemma.

Lemma 1.1

Let \({\mathbf{A}}\) be an \(n \times m\), \(m\geqslant 2\), real matrix columnwise partitioned as \({\mathbf{A}} = ({\mathbf{A}}_1 : {\mathbf{A}}_2)\), with \({\mathbf{A}}_i\) denoting \(n \times m_i\), \(i = 1, 2\), matrices such that \(m_1 + m_2 = m\). Furthermore, let the ranges of \({\mathbf{A}}_1\) and \({\mathbf{A}}_2\) be disjoint, i.e., let them have only the zero vector in common. Then the Moore–Penrose inverse of \({\mathbf{A}}\) is of the form

$$\begin{aligned} {\mathbf{A}}^\dagger = \begin{pmatrix} ({\mathbf{Q}}_2 {\mathbf{A}}_1)^\dagger \\ ({\mathbf{Q}}_1 {\mathbf{A}}_2)^\dagger \end{pmatrix}, \end{aligned}$$
(1)

where \({\mathbf{Q}}_i\), \(i = 1, 2\), is the orthogonal projector onto the null space of the transpose of \({\mathbf{A}}_i\) and \(({\mathbf{Q}}_i \mathbf{A}_j)^\dagger \), \(i = 1, 2\), \(i \ne j\), is the Moore–Penrose inverse of \({\mathbf{Q}}_i {\mathbf{A}}_j\).
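To make the statement of Lemma 1.1 easy to verify numerically, the following sketch (in Python with NumPy, used here purely as an illustration and not as part of the original development) builds a randomly generated columnwise partitioned matrix, whose blocks are range disjoint with probability one, and compares the stacked representation (1) with a directly computed Moore–Penrose inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m1, m2 = 8, 3, 2                        # illustrative dimensions, m = m1 + m2
A1 = rng.standard_normal((n, m1))
A2 = rng.standard_normal((n, m2))
A = np.hstack((A1, A2))                    # A = (A1 : A2); blocks range disjoint a.s.

Q1 = np.eye(n) - A1 @ np.linalg.pinv(A1)   # orthogonal projector onto N(A1')
Q2 = np.eye(n) - A2 @ np.linalg.pinv(A2)   # orthogonal projector onto N(A2')

# Stacked representation of A^+ from Lemma 1.1, formula (1)
A_pinv_blocks = np.vstack((np.linalg.pinv(Q2 @ A1),
                           np.linalg.pinv(Q1 @ A2)))

print(np.allclose(A_pinv_blocks, np.linalg.pinv(A)))   # expected: True
```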

In the next section we briefly discuss particular linear regression models, highlighting issues in which the Moore–Penrose inverse of a columnwise partitioned matrix naturally emerges. These considerations are followed by Sect. 3, which contains an example demonstrating the applicability of the present approach. The data used in the example originate from observations of the position of Polaris made on December 12, 1983, by S.G. Brewer, which were afterwards used by Pedler (1993) to solve what the author calls an “astronomical problem”, namely that of fitting a circle to a set of points. Section 4 provides a number of remarks concerning the proposed approach.

2 Particular linear regression models

All matrices occurring in what follows have real entries and the superscript \(^\prime \) stands for the matrix transpose. Let us consider the linear regression model

$$\begin{aligned} {\mathbf{y}} = {\mathbf{X}} \varvec{\beta } + {\mathbf{u}}, \end{aligned}$$
(2)

where \({\mathbf{y}}\) is an \(n \times 1\) random vector of observations, \({\mathbf{X}}\) is an \(n \times p\) known model (design) matrix of constants, \(\varvec{\beta }\) is a \(p \times 1\) vector of unknown parameters, and \({\mathbf{u}}\) is an \(n \times 1\) vector of unknown errors. The entries of the vector \({\mathbf{u}} = (u_1, u_2,\ldots , u_n)^\prime \) are assumed to have a mean of zero and (unknown) variance \(\sigma ^2\), and each pair \(u_i\), \(u_j\), \(i \ne j\), is assumed to be uncorrelated, i.e., the expectation vector and the covariance matrix of \({\mathbf{u}}\) are \({\mathsf {E}}({\mathbf{u}}) = {\mathbf{0}}\) and \(\mathsf {Cov}({\mathbf{u}}) = \sigma ^2 {\mathbf{I}}_n\), respectively. Customarily, the symbol \({\mathbf{I}}_n\) stands for the identity matrix of order n. We also assume that the matrix \({\mathbf{X}}\) is of full column rank. Then, the least squares estimator (LSE) of \(\varvec{\beta }\) is given by

$$\begin{aligned} \hat{\varvec{\beta }} = ({\mathbf{X}}^\prime {\mathbf{X}})^{-1} \mathbf{X}^\prime {\mathbf{y}}. \end{aligned}$$

It is worth emphasizing that the assumption that \({\mathbf{X}}\) is of full column rank plays a crucial role, and is most often made to assure uniqueness of the estimator of \(\varvec{\beta }\); see Puntanen et al. (2011, p. 34). It turns out that \(({\mathbf{X}}^\prime {\mathbf{X}})^{-1} {\mathbf{X}}^\prime = {\mathbf{X}}^\dagger \), i.e., the Moore–Penrose inverse of \({\mathbf{X}}\); see Appendix A.
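As a quick numerical check of the identity \(({\mathbf{X}}^\prime {\mathbf{X}})^{-1} {\mathbf{X}}^\prime = {\mathbf{X}}^\dagger \), one may compare the two expressions on randomly generated data; the sketch below (NumPy, illustrative data only) is one way to do so.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 3))                    # full column rank with probability one
y = rng.standard_normal(10)

beta_normal_eq = np.linalg.inv(X.T @ X) @ X.T @ y   # (X'X)^{-1} X'y
beta_pinv = np.linalg.pinv(X) @ y                   # X^+ y
print(np.allclose(beta_normal_eq, beta_pinv))       # expected: True
```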

To calculate \({\mathbf{X}}^\dagger \), instead of dealing with the inverse of \({\mathbf{X}}^\prime {\mathbf{X}}\), we may write the model matrix \({\mathbf{X}}\) in the columnwise partitioned form

$$\begin{aligned} {\mathbf{X}} = ({\mathbf{X}}_1 : {\mathbf{X}}_2), \end{aligned}$$
(3)

where \({\mathbf{X}}_i\), \(i = 1, 2\), denote \(n \times p_i\) matrices such that \(p_1 + p_2 = p\). Since \({\mathbf{X}}\) is of full column rank, it follows that

$$\begin{aligned} {{\mathcal {R}}}({\mathbf{X}}_1) \cap {{\mathcal {R}}}({\mathbf{X}}_2) = \{ {\mathbf{0}} \}, \end{aligned}$$
(4)

where \({{\mathcal {R}}}({\mathbf{.}})\) stands for the column space (range) of a matrix argument. According to Lemma 1.1, the Moore–Penrose inverse of a matrix of the form (3), such that (4) is satisfied, can be expressed as

$$\begin{aligned} {\mathbf{X}}^\dagger = \begin{pmatrix} ({\mathbf{Q}}_2 {\mathbf{X}}_1)^\dagger \\ ({\mathbf{Q}}_1 {\mathbf{X}}_2)^\dagger \end{pmatrix}, \end{aligned}$$
(5)

where \({\mathbf{Q}}_i = {\mathbf{I}}_n - {\mathbf{X}}_i {\mathbf{X}}_i^\dagger \) is the orthogonal projector onto \({{\mathcal {N}}}({\mathbf{X}}_i^\prime )\), the null space of \({\mathbf{X}}_i^\prime \), \(i = 1, 2\); see Appendix A. Note that the condition (4), under which the representation of the Moore–Penrose inverse (5) is valid, is weaker than the requirement that \({\mathbf{X}}\) specified in (3) is of full column rank (the assumption which will be extensively exploited in what follows).

Let us consider a simple linear regression model

$$\begin{aligned} {\mathbf{y}} = {\beta }_0 {\mathbf {1}} + {\beta }_1 {\mathbf{x}} + {\mathbf{u}}, \end{aligned}$$

where \({\beta }_0, {\beta }_1 \in {\mathbb {R}}\), \({\mathbf {1}} = (1, 1,\ldots , 1)^\prime \) is the vector of n ones, and \({\mathbf{x}} = (x_1, x_2,\ldots , x_n)^\prime \) is the vector of observations on one regressor variable. The vector \({\mathbf{u}} = (u_1, u_2,\ldots , u_n)^\prime \) consists of the unknown errors. To obtain the LSE of the parameter vector \(\varvec{\beta } = ({\beta }_0, {\beta }_1)^\prime \), we may partition the \(n \times 2\) matrix \({\mathbf{X}}\) as

$$\begin{aligned} {\mathbf{X}} = ({\mathbf {1}} : {\mathbf{x}}), \end{aligned}$$
(6)

where \({{\mathcal {R}}}({\mathbf{1}}) \cap {{\mathcal {R}}}({\mathbf{x}}) = \{ {\mathbf{0}} \}\) since we assume that \({\mathbf{X}}\) is of full column rank. By (5) it follows that

$$\begin{aligned} {\mathbf{X}}^\dagger = ({\mathbf {1}} : {\mathbf{x}})^\dagger = \begin{pmatrix} ({\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}})^\dagger \\ ({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{x}})^\dagger \end{pmatrix}, \end{aligned}$$
(7)

where \({\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}} = (\mathbf{I }_n - {\mathbf{x}}\mathbf{x}^\dagger ) {\mathbf{1}}\) and \({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{x}} = ({\mathbf{I}}_n - {\mathbf{1}}{\mathbf{1}}^\dagger ) {\mathbf{x}}\) are both column vectors. Thus, by the identity (A1) given in Appendix A,

$$\begin{aligned} ({\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}})^\dagger = \frac{{\mathbf{1}}^\prime {\mathbf{Q}}_{\mathbf{x}}}{{\mathbf{1}}^\prime {\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}}} \quad \mathrm {and} \quad ({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{x}})^\dagger = \frac{{\mathbf{x}}^\prime {\mathbf{Q}}_{{\mathbf{1}}}}{{\mathbf{x}}^\prime {\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{x}}}. \end{aligned}$$
(8)

Consequently, the LSE of \(\varvec{\beta } = ({\beta }_0, {\beta }_1)^\prime \) is

$$\begin{aligned} \hat{\varvec{\beta }} = {\mathbf{X}}^\dagger {\mathbf{y}} = \begin{pmatrix} ({\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}})^\dagger {\mathbf{y}} \\ ({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{x}})^\dagger {\mathbf{y}} \end{pmatrix}. \end{aligned}$$
(9)

From (8) we obtain

$$\begin{aligned} ({\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}})^\dagger {\mathbf{x}} = 0, \ (\mathbf{Q}_{{\mathbf{1}}} {\mathbf{x}})^\dagger {\mathbf{1}} = 0, \ ({\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}})^\dagger {\mathbf{1}} = 1,\ (\mathbf{Q}_{{\mathbf{1}}} {\mathbf{x}})^\dagger {\mathbf{x}} = 1, \end{aligned}$$

whence

$$\begin{aligned} {\mathsf {E}}(\hat{\beta }_0) = ({\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}})^\dagger {\mathsf {E}}({\mathbf{y}}) = {\beta }_0 \quad \mathrm {and} \quad {\mathsf {E}}(\hat{\beta }_1) = ({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{x}})^\dagger {\mathsf {E}}({\mathbf{y}}) = {\beta }_1, \end{aligned}$$

i.e., the estimators in (9) are unbiased.

Let us use the symbol \({\mathbf{H}}\) to denote the so-called hat-matrix, which represents the orthogonal projector onto \({{\mathcal {R}}}(\mathbf{X})\), i.e., \({\mathbf{H}} = {\mathbf{X}}{\mathbf{X}}^\dagger \). Then, the identities (6)–(8) entail

$$\begin{aligned} {\mathbf{H}} = {\mathbf{1}}({\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}})^\dagger + {\mathbf{x}} ({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{x}})^\dagger = \frac{{\mathbf{1}}{\mathbf{1}}^\prime {\mathbf{Q}}_{\mathbf{x}}}{{\mathbf{1}}^\prime {\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}}} + \frac{{\mathbf{x}}{\mathbf{x}}^\prime \mathbf{Q}_{{\mathbf{1}}}}{{\mathbf{x}}^\prime {\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{x}}}, \end{aligned}$$
(10)

which means that the hat-matrix is a sum of two matrices of rank one. Furthermore, each of the summands involved in (10) is idempotent. This observation leads to the conclusion (see e.g., Rao and Mitra (1971, Theorem 5.1.2)) that the matrices necessarily commute and their product is equal to the zero matrix, i.e.,

$$\begin{aligned} {\mathbf{1}}({\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}})^\dagger {\mathbf{x}} (\mathbf{Q}_{{\mathbf{1}}} {\mathbf{x}})^\dagger = {\mathbf{0}} = {\mathbf{x}} ({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{x}})^\dagger {\mathbf{1}}({\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}})^\dagger . \end{aligned}$$
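The quantities appearing in (7)–(10) are straightforward to compute explicitly. The sketch below (NumPy, with made-up data; the variable names are ours) obtains \(\hat{\beta }_0\) and \(\hat{\beta }_1\) from the row vectors \(({\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}})^\dagger \) and \(({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{x}})^\dagger \), and checks that the two summands of the hat-matrix in (10) are idempotent with zero mutual products.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 12
x = rng.standard_normal(n)
y = 1.5 + 0.8 * x + 0.1 * rng.standard_normal(n)   # made-up data
one = np.ones(n)
X = np.column_stack((one, x))                      # X = (1 : x), cf. (6)

Qx = np.eye(n) - np.outer(x, x) / (x @ x)          # Q_x = I - x x^+
Q1 = np.eye(n) - np.outer(one, one) / n            # Q_1 = I - 1 1^+

p0 = (one @ Qx) / (one @ Qx @ one)                 # (Q_x 1)^+ as a row vector, cf. (8)
p1 = (x @ Q1) / (x @ Q1 @ x)                       # (Q_1 x)^+ as a row vector, cf. (8)

beta0_hat, beta1_hat = p0 @ y, p1 @ y              # the LSE of beta_0 and beta_1, cf. (9)
print(np.allclose([beta0_hat, beta1_hat], np.linalg.pinv(X) @ y))

H1 = np.outer(one, p0)                             # 1 (Q_x 1)^+
H2 = np.outer(x, p1)                               # x (Q_1 x)^+
print(np.allclose(H1 + H2, X @ np.linalg.pinv(X)))          # the hat-matrix, cf. (10)
print(np.allclose(H1 @ H1, H1), np.allclose(H2 @ H2, H2))   # idempotent summands
print(np.allclose(H1 @ H2, 0), np.allclose(H2 @ H1, 0))     # zero mutual products
```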

Since \({\mathbf{1}} \in {{\mathcal {R}}}({\mathbf{X}})\), it follows that \({\mathbf{H}}{\mathbf{1}} = {\mathbf{1}}\). Hence, by denoting \(\hat{\mathbf{y}} = {\mathbf{H}}{\mathbf{y}}\), we see that \({\mathbf{1}}^\prime \hat{\mathbf{y}} = {\mathbf{1}}^\prime {\mathbf{H}}{\mathbf{y}} = {\mathbf{1}}^\prime {\mathbf{y}} \), which gives \(\sum \nolimits _{i=1}^{n} \hat{y_i} = \sum \nolimits _{i=1}^{n} y_i\). Furthermore, by putting \(\hat{\mathbf{u}} = ({\mathbf{I}}_n - {\mathbf{H}}){\mathbf{y}}\), we obtain \({\mathbf{1}}^\prime \hat{\mathbf{u}} = {\mathbf{1}}^\prime ({\mathbf{I}}_n - {\mathbf{H}}){\mathbf{y}} = {\mathbf{0}}\), so \( \sum \nolimits _{i=1}^{n} \hat{u_i} = 0\). Another consequence of \({\mathbf{1}} \in {{\mathcal {R}}}({\mathbf{X}})\) is the identity

$$\begin{aligned} \Vert ({\mathbf{I}}_n - {\mathbf{J}}){\mathbf{y}} \Vert ^2 = \Vert ({\mathbf{H}} - \mathbf{J}){\mathbf{y}} \Vert ^2 + \Vert ({\mathbf{I}}_n - {\mathbf{H}}){\mathbf{y}} \Vert ^2, \end{aligned}$$
(11)

with \({\mathbf{J}} = {\mathbf{1}}{\mathbf{1}}^\dagger \); see Puntanen et al. (2011, Proposition 8.5). Alternatively, the equality (11) can be expressed as \(SST = SSR + SSE\), where SST stands for the total sum of squares, SSR for the regression sum of squares, and SSE for the residual sum of squares. The coefficient of determination defined as

$$\begin{aligned} R^2 = \frac{SSR}{SST} \end{aligned}$$

turns out to be

$$\begin{aligned} R^2 = \frac{\Vert ({\mathbf{H}} - {\mathbf{J}}){\mathbf{y}} \Vert ^2}{\Vert (\mathbf{I}_n - {\mathbf{J}}){\mathbf{y}} \Vert ^2} = \frac{{\mathbf{y}}^\prime ({\mathbf{H}} - {\mathbf{J}}){\mathbf{y}}}{ {\mathbf{y}}^\prime ({\mathbf{I}}_n - {\mathbf{J}}){\mathbf{y}}}. \end{aligned}$$
(12)

Clearly, \(R^2 \geqslant 0\). Another observation is that \({\mathbf{I}}_n - {\mathbf{J}} - ({\mathbf{H}} - {\mathbf{J}}) = {\mathbf{I}}_n - {\mathbf{H}}\) is nonnegative definite (as \({\mathbf{I}}_n - {\mathbf{H}}\) is the orthogonal projector onto \({{\mathcal {N}}}({\mathbf{X}}^\prime )\)). Hence, \({\mathbf{H}} - {\mathbf{J}} {\mathop {\leqslant }\limits ^{{\mathsf {L}}}} {\mathbf{I}}_n - {\mathbf{J}}\), where the symbol \({\mathop {\leqslant }\limits ^{{\mathsf {L}}}}\) denotes the Löwner partial ordering, from which we conclude that \(0 \leqslant R^2 \leqslant 1\). The fact that the values of \(R^2\) are restricted to the interval [0, 1] is known in the literature (see e.g., Davidson and MacKinnon (1993, p. 14)), but it is usually demonstrated in a rather more involved way than in the present paper.
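The decomposition (11) and the bounds on \(R^2\) in (12) can likewise be verified directly; a minimal sketch, again on illustrative data, follows.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 12
x = rng.standard_normal(n)
y = 1.5 + 0.8 * x + 0.1 * rng.standard_normal(n)   # made-up data
X = np.column_stack((np.ones(n), x))

H = X @ np.linalg.pinv(X)                          # hat-matrix
J = np.full((n, n), 1.0 / n)                       # J = 1 1^+

SST = y @ (np.eye(n) - J) @ y                      # total sum of squares, cf. (11)
SSR = y @ (H - J) @ y                              # regression sum of squares
SSE = y @ (np.eye(n) - H) @ y                      # residual sum of squares
R2 = SSR / SST                                     # coefficient of determination, cf. (12)
print(np.isclose(SST, SSR + SSE), 0.0 <= R2 <= 1.0, R2)
```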

Consider now the general linear model with intercept

$$\begin{aligned} {\mathbf{y}} = {\beta }_0 {\mathbf {1}} + {\mathbf{X}} \varvec{\beta } + {\mathbf{u}}, \end{aligned}$$

where \({\mathbf{X}}\) and \(\varvec{\beta }\) are of dimensions \(n \times (p -1)\) and \((p - 1) \times 1\), respectively, and \(({\mathbf {1}} : {\mathbf{X}})\) is assumed to be of full column rank. Then, by analogy to (9), the LSE of \(({\beta }_0, \varvec{\beta }^\prime )^\prime \) is

$$\begin{aligned} \begin{pmatrix} \hat{\beta }_0 \\ \hat{\varvec{\beta }} \end{pmatrix} = \begin{pmatrix} ({\mathbf{Q}}_{\mathbf{X}} {\mathbf{1}})^\dagger {\mathbf{y}} \\ ({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{X}})^\dagger {\mathbf{y}} \end{pmatrix}. \end{aligned}$$
(13)

On account of (A1), we obtain \(({\mathbf{Q}}_{\mathbf{X}} {\mathbf{1}})^\dagger = ({\mathbf{1}}^\prime {\mathbf{Q}}_{\mathbf{X}} {\mathbf{1}})^{-1}{\mathbf{1}}^\prime {\mathbf{Q}}_{\mathbf{X}}\). Hence, \(({\mathbf{Q}}_{\mathbf{X}} {\mathbf{1}})^\dagger = ({\mathbf{Q}}_{\mathbf{X}} {\mathbf{1}})^\dagger {\mathbf{Q}}_{\mathbf{X}}\). Similarly, we arrive at \(({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{X}})^\dagger = ({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{X}})^\dagger {\mathbf{Q}}_{{\mathbf{1}}}\). In consequence, since \({\mathsf {E}}({\mathbf{y}}) = {\beta }_0 {\mathbf{1}} + {\mathbf{X}} \varvec{\beta }\), and since both \({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{X}}\) and \({\mathbf{Q}}_{\mathbf{X}} {\mathbf{1}}\) are of full column rank, we have

$$\begin{aligned} {\mathsf {E}}(\hat{\beta }_0) = ({\mathbf{Q}}_{\mathbf{X}} {\mathbf{1}})^\dagger {\mathsf {E}}({\mathbf{y}}) = {\beta }_0 \quad \mathrm {and} \quad {\mathsf {E}}(\hat{\varvec{\beta }}) = ({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{X}})^\dagger {\mathsf {E}}({\mathbf{y}}) = \varvec{\beta }, \end{aligned}$$

i.e., the LSE specified in (13) is unbiased.

Note that the hat-matrix turns out to be

$$\begin{aligned} {\mathbf{H}} = {\mathbf{1}}({\mathbf{Q}}_{\mathbf{X}} {\mathbf{1}})^\dagger + {\mathbf{X}} ({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{X}})^\dagger , \end{aligned}$$
(14)

a sum of two matrices of which the former is of rank one and the latter is of the same rank as the matrix \({\mathbf{X}}\), i.e., \(p - 1\). As above, both summands determining the hat-matrix specified in (14) are commuting idempotents whose product equals the zero matrix.

The formula (14) (as well as its particular case (10)) can be viewed as an alternative to the representation of \({\mathbf{H}}\) as a sum of two orthogonal projectors, which reads \({\mathbf{H}} = {\mathbf{J}} + {\mathbf{P}}_{{\mathbf{C}}{\mathbf{X}}}\), where \({\mathbf{P}}_\mathbf{CX} = {\mathbf{CX}} ({\mathbf{CX}})^\dagger \), with \({\mathbf{C}}\) denoting the so-called centering matrix defined as \({\mathbf{C}} = {\mathbf{I}}_n - \mathbf{J}\); see Puntanen et al. (2011, formula (8.108)). It is clear that \({\mathbf{J}}{\mathbf{P}}_{\mathbf{C\mathbf{X}}} = {\mathbf{0}}\).
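The two decompositions of the hat-matrix, (14) and \({\mathbf{H}} = {\mathbf{J}} + {\mathbf{P}}_{{\mathbf{C}}{\mathbf{X}}}\), can be compared numerically as follows (a NumPy sketch on randomly generated regressors, not part of the original exposition).

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 15, 4
X = rng.standard_normal((n, p - 1))        # regressors without the intercept column
one = np.ones((n, 1))

QX = np.eye(n) - X @ np.linalg.pinv(X)     # projector onto N(X')
Q1 = np.eye(n) - one @ one.T / n           # projector onto N(1')

H14 = one @ np.linalg.pinv(QX @ one) + X @ np.linalg.pinv(Q1 @ X)   # formula (14)

J = one @ one.T / n
CX = (np.eye(n) - J) @ X                   # centered regressors C X
H_alt = J + CX @ np.linalg.pinv(CX)        # H = J + P_{CX}

H = np.column_stack((one, X)) @ np.linalg.pinv(np.column_stack((one, X)))
print(np.allclose(H14, H), np.allclose(H_alt, H))   # expected: True True
```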

3 Applications

As in Pedler (1993, Sect. 6), we now consider the linear regression model with four regressors

(15)

Let \({\mathbf{x}} = (x_1, x_2,\ldots , x_n)^\prime \), \({\mathbf{y}} = (y_1, y_2, \ldots , y_n)^\prime \), \({\mathbf{a}} = (a_1, a_2,\ldots , a_n)^\prime \), \({\mathbf{b}} = (b_1, b_2,\ldots , b_n)^\prime \), \({\mathbf{c}} = (c_1, c_2,\ldots , c_n)^\prime \), \({\mathbf{d}} = (d_1, d_2,\ldots , d_n)^\prime \), and \(\varvec{\beta } = (p, q, u, v)^\prime \). Furthermore, let \({\mathbf{X}}\) be a \(2n \times 4\) matrix of the form

(16)

Then (15) can be written as

As the matrix \({\mathbf{X}}\) is of full column rank, we can determine the LSE of \(\varvec{\beta }\) by applying the representation derived in Baksalary and Trenkler (2021, Example 1) as a consequence of Baksalary and Baksalary (2007, Theorem 1). In order to take advantage of this result, let

$$\begin{aligned} A= & {} \frac{\sum \nolimits _{i=1}^n a_i}{\sum \nolimits _{i=1}^n\left( a_i^2 + b_i^2\right) }, \ B = \frac{\sum \nolimits _{i=1}^n b_i}{\sum \nolimits _{i=1}^n\left( a_i^2 + b_i^2\right) }, \end{aligned}$$
(17)
$$\begin{aligned} N&= n - A \sum \limits _{i=1}^n a_i - B \sum \limits _{i=1}^n b_i, \ M = \sum \limits _{i=1}^n\left( a_i^2 + b_i^2\right) - \frac{1}{n} \left[ \left( \sum \limits _{i=1}^n a_i\right) ^2 + \left( \sum \limits _{i=1}^n b_i\right) ^2\right] , \end{aligned}$$
(18)
(19)

where \({\overline{a}} = \frac{1}{n} \sum \nolimits _{i=1}^n a_i\) and \({\overline{b}} = \frac{1}{n} \sum \nolimits _{i=1}^n b_i\). Then, on account of Baksalary and Trenkler (2021, formula (8)), we obtain a very handy representation of the Moore–Penrose inverse of the model matrix (16), namely

$$\begin{aligned} {\mathbf{X}}^\dagger = \begin{pmatrix} \begin{pmatrix} N^{-1} &{}\quad 0 \\ 0 &{}\quad N^{-1} \end{pmatrix} \begin{pmatrix} {\mathbf{e}}^\prime \\ {\mathbf{f}}^\prime \ \end{pmatrix} \\ \\ \begin{pmatrix} M^{-1} &{}\quad 0 \\ 0 &{}\quad M^{-1} \end{pmatrix} \begin{pmatrix} {\mathbf{g}}^\prime \\ {\mathbf{h}}^\prime \end{pmatrix} \end{pmatrix}. \end{aligned}$$
(20)

Hence, analogously to (13), the LSE of the parameter vector \(\varvec{\beta }\) is given by

$$\begin{aligned} \hat{\varvec{\beta }} = {\mathbf{X}}^\dagger {\mathbf{z}} = \begin{pmatrix} N^{-1} {\mathbf{e}}^\prime {\mathbf{z}} \\ N^{-1} {\mathbf{f}}^\prime {\mathbf{z}} \\ M^{-1} {\mathbf{g}}^\prime {\mathbf{z}} \\ M^{-1} {\mathbf{h}}^\prime {\mathbf{z}} \end{pmatrix}. \end{aligned}$$
(21)
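For readers who wish to reproduce such computations, the following sketch (NumPy, with made-up observation times rather than the data of Table 1) illustrates how the scalars A, B, N and M of (17)–(18) are obtained; the vectors \({\mathbf{e}}\), \({\mathbf{f}}\), \({\mathbf{g}}\), \({\mathbf{h}}\) of (19) and the matrix (16) are not reproduced here, so the sketch stops at the scalar ingredients of (20).

```python
import numpy as np

# Hypothetical observation times; these are NOT Brewer's data from Table 1.
T = 23 * 3600 + 56 * 60 + 4              # sidereal day T = 23 h 56 m 04 s, in seconds
t = np.linspace(0.0, 6 * 3600.0, 13)     # made-up times t_i, in seconds
a = np.cos(2 * np.pi * t / T)            # a_i = cos(2*pi*t_i/T)
b = np.sin(2 * np.pi * t / T)            # b_i = sin(2*pi*t_i/T)
n = t.size

s = np.sum(a**2 + b**2)
A = a.sum() / s                          # formula (17)
B = b.sum() / s                          # formula (17)
N = n - A * a.sum() - B * b.sum()        # formula (18)
M = s - (a.sum()**2 + b.sum()**2) / n    # formula (18)
print(A, B, N, M)
```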

Let us now demonstrate the usefulness of the expressions (20) and (21) by applying them to a set of real data.

Example 3.1

As mentioned in the Introduction, Pedler (1993) considered the “astronomical problem” of fitting a circle to a set of points and solved it from first principles. The considerations in Pedler (1993) also contain an example which exploits the data collected by S.G. Brewer from observations of the position of Polaris made on December 12, 1983. The data are given in Table 1.

Table 1 At the ith observation at time \(t_i\), \((x_i, y_i)\) denotes the (observed) Cartesian coordinates of the Polaris position, whereas \(a_i = \cos (2\pi t_i/T)\) and \(b_i = \sin (2\pi t_i /T)\), where \(T = 23\mathrm {h}\ 56 \mathrm {m}\ 04 \mathrm {s}\) is the length of the sidereal day. The values are recalled from Pedler (1993, Table 1)

The data provided in Table 1 make it possible to calculate the scalars A, B, M, N as well as the vectors \({\mathbf{e}}\), \({\mathbf{f}}\), \(\mathbf{g}\), \({\mathbf{h}}\) defined in (17)–(19). Hence, we obtain the inner products of these four vectors with the vector \({\mathbf{z}}\). It should be emphasized that these straightforward calculations involve only scalars and vectors and require relatively low computational cost; for details on the advantages of the algorithm for calculating the Moore–Penrose inverse which takes into account columnwise partitioning into range disjoint matrices, see Baksalary and Trenkler (2021). The outcomes of these computations are provided in Table 2.

Table 2 Values of the scalars calculated on account of the data provided in Table 1

In the light of (21), we arrive at the components of the estimator \(\hat{\varvec{\beta }}\), which are given in Table 3. As expected, the values coincide with the ones given in Pedler (1993, Table 2).

Table 3 Least squares estimators obtained on account of (21)

The values of the total sum of squares SST, regression sum of squares SSR, residual sum of squares SSE, and the coefficient of determination \(R^2\) are provided in Table 4.

Table 4 Values of SST, SSR, SSE, and \(R^2\) obtained on account of (11) and (12)

4 Supplementary remarks

In the linear regression models considered in Sect. 2 it was assumed that the model matrices are of full column rank and that the vector \({{\mathbf{1}}}\) is one of the columns. Such assumptions are well justified, as they correspond to several common situations. However, the present approach makes it possible to generalize the considerations by weakening the assumption that the model matrix is of full column rank to the requirement that it can be columnwise partitioned into two range disjoint matrices, and by relaxing the assumption that one of the columns is the vector \({{\mathbf{1}}}\). To demonstrate this fact, let us assume that the model matrix \({\mathbf{X}}\) in (2) is partitioned in accordance with (3), i.e.,

$$\begin{aligned} {\mathbf{y}} = {\mathbf{X}}_1 \varvec{\beta }_1 + {\mathbf{X}}_2 \varvec{\beta }_2 + {\mathbf{u}}, \end{aligned}$$
(22)

where the vectors \(\varvec{\beta }_i\) are of orders \(p_i \times 1\), \(i = 1, 2\). Provided that the condition (4) holds, by (5) we conclude that the LSE of \((\varvec{\beta }_1^\prime , \varvec{\beta }_2^\prime )^\prime \) is given by

$$\begin{aligned} \begin{pmatrix} \hat{\varvec{\beta }}_1 \\ \hat{\varvec{\beta }}_2 \end{pmatrix} = \begin{pmatrix} ({\mathbf{Q}}_2 {\mathbf{X}}_1)^\dagger {\mathbf{y}} \\ ({\mathbf{Q}}_1 {\mathbf{X}}_2)^\dagger {\mathbf{y}} \end{pmatrix}. \end{aligned}$$
(23)

Clearly, the expression (13) is obtained from (23) by taking \({\mathbf{X}}_1 = {{\mathbf{1}}}\) and \({\mathbf{X}}_2 = {{\mathbf{X}}}\). Furthermore, from (3) and (5) we obtain

$$\begin{aligned} {\mathbf{H}} = {\mathbf{X}}_1 ({\mathbf{X}}_1^\prime {\mathbf{Q}}_2 {\mathbf{X}}_1)^\dagger {\mathbf{X}}_1^\prime {\mathbf{Q}}_2 + {\mathbf{X}}_2 ({\mathbf{X}}_2^\prime {\mathbf{Q}}_1 {\mathbf{X}}_2)^\dagger {\mathbf{X}}_2^\prime {\mathbf{Q}}_1, \end{aligned}$$
(24)

which under \({\mathbf{X}}_1 = {{\mathbf{1}}}\) and \({\mathbf{X}}_2 = {{\mathbf{X}}}\) leads to the formula (14). Note that the expression (24) is given in Puntanen et al. (2011, Proposition 16.1) along with its equivalent counterparts, one of which is (4).
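A numerical illustration of (23) and (24) on randomly generated range disjoint blocks (NumPy; an independent check rather than the authors' code) may look as follows.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p1, p2 = 20, 2, 3
X1 = rng.standard_normal((n, p1))
X2 = rng.standard_normal((n, p2))
X = np.hstack((X1, X2))                              # range disjoint blocks (generically)
y = rng.standard_normal(n)

Q1 = np.eye(n) - X1 @ np.linalg.pinv(X1)             # projector onto N(X1')
Q2 = np.eye(n) - X2 @ np.linalg.pinv(X2)             # projector onto N(X2')

beta1_hat = np.linalg.pinv(Q2 @ X1) @ y              # first block of (23)
beta2_hat = np.linalg.pinv(Q1 @ X2) @ y              # second block of (23)
print(np.allclose(np.concatenate((beta1_hat, beta2_hat)), np.linalg.pinv(X) @ y))

H24 = (X1 @ np.linalg.pinv(X1.T @ Q2 @ X1) @ X1.T @ Q2
       + X2 @ np.linalg.pinv(X2.T @ Q1 @ X2) @ X2.T @ Q1)   # formula (24)
print(np.allclose(H24, X @ np.linalg.pinv(X)))       # expected: True
```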

Further evidence of the applicability of Lemma 1.1 in statistical estimation theory was provided in Baksalary and Trenkler (2021), where an original representation was derived for the best linear unbiased estimator (BLUE) of \({\mathbf{X}} \varvec{\beta }\) under the generalized version of the (consistent) linear model (2) with \(\mathsf {Cov}({\mathbf{u}}) = \sigma ^2 {\mathbf{V}}\), where \({\mathbf{V}}\) denotes a known \(n \times n\) positive semidefinite matrix. It was shown in Baksalary and Trenkler (2021, Example 4) that \({\mathbf{G}}{\mathbf{y}}\) with

$$\begin{aligned} {\mathbf{G}} = {\mathbf{X}} ({\mathbf{Q}}_ {{\mathbf{V}}{\mathbf{Q}}_{\mathbf{X}}}{\mathbf{X}})^\dagger = {\mathbf{X}} ({\mathbf{X}}^\prime {\mathbf{Q}}_ {{\mathbf{V}}{\mathbf{Q}}_{\mathbf{X}}}\mathbf{X})^\dagger {\mathbf{X}}^\prime {\mathbf{Q}}_ {{\mathbf{V}}{\mathbf{Q}}_{\mathbf{X}}} \end{aligned}$$
(25)

is BLUE of \({\mathbf{X}} \varvec{\beta }\).
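The representation (25) can be checked numerically as well. The sketch below (NumPy) uses an illustrative positive definite \({\mathbf{V}}\), verifies the defining conditions \({\mathbf{G}}{\mathbf{X}} = {\mathbf{X}}\) and \({\mathbf{G}}{\mathbf{V}}{\mathbf{Q}}_{\mathbf{X}} = {\mathbf{0}}\), and, since \({\mathbf{V}}\) is taken nonsingular, compares \({\mathbf{G}}{\mathbf{y}}\) with the classical Aitken (generalized least squares) fitted values; the choice of the data and of \({\mathbf{V}}\) is ours.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 15, 3
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
L = rng.standard_normal((n, n))
V = L @ L.T + np.eye(n)                              # an illustrative positive definite V

QX = np.eye(n) - X @ np.linalg.pinv(X)               # projector onto N(X')
W = V @ QX
QW = np.eye(n) - W @ np.linalg.pinv(W)               # Q_{VQ_X}

G = X @ np.linalg.pinv(X.T @ QW @ X) @ X.T @ QW      # formula (25)

# G satisfies the defining conditions G X = X and G V Q_X = 0 ...
print(np.allclose(G @ X, X), np.allclose(G @ V @ QX, 0))
# ... and, for nonsingular V, G y coincides with the Aitken (GLS) fitted values.
Vinv = np.linalg.inv(V)
gls = X @ np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
print(np.allclose(G @ y, gls))                       # expected: True
```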

Analogously, we can derive representations for the BLUEs of both \({\mathbf{X}}_1 \varvec{\beta }_1\) and \({\mathbf{X}}_2 \varvec{\beta }_2\) under the model (22) when \(\mathsf {Cov}({\mathbf{u}}) = \sigma ^2 {\mathbf{V}}\). From Puntanen et al. (2011, formula (10.5)) it follows that \({\mathbf{G}}_1{\mathbf{y}}\) is the BLUE of an (estimable) parametric function \({\mathbf{X}}_1 \varvec{\beta }_1\) if \({\mathbf{G}}_1\) satisfies the equation

$$\begin{aligned} {\mathbf{G}}_1({\mathbf{X}}_1 : {\mathbf{X}}_2 : {\mathbf{V}}{\mathbf{Q}}_{\mathbf{X}}) = (\mathbf{X}_1 : {\mathbf{0}} : {\mathbf{0}}). \end{aligned}$$
(26)

On account of (4), which is a necessary and sufficient condition for \({\mathbf{X}}_1 \varvec{\beta }_1\) to be estimable, we arrive at \({{\mathcal {R}}} ({\mathbf{X}}_1) \cap {{\mathcal {R}}}(\mathbf{X}_2 : {\mathbf{V}}{\mathbf{Q}}_{\mathbf{X}}) = \{ {\mathbf{0}} \}\). In consequence, we can utilize the representation of the Moore–Penrose inverse provided in Lemma 1.1, which leads to the conclusion that one of the solutions of (26) is of the form

$$\begin{aligned} {\mathbf{G}}_1 = {\mathbf{X}}_1 ({\mathbf{X}}_1^\prime {\mathbf{Q}}_{({\mathbf{X}}_2 : \mathbf{V}{\mathbf{Q}}_{\mathbf{X}})} {\mathbf{X}}_1)^\dagger {\mathbf{X}}_1^\prime \mathbf{Q}_{({\mathbf{X}}_2 : {\mathbf{V}}{\mathbf{Q}}_{\mathbf{X}})}. \end{aligned}$$

Hence,

$$\begin{aligned} \mathrm {BLUE}({\mathbf{X}}_1 \varvec{\beta }_1) = {\mathbf{X}}_1 (\mathbf{X}_1^\prime {\mathbf{Q}}_{({\mathbf{X}}_2 : {\mathbf{V}}{\mathbf{Q}}_{\mathbf{X}})} \mathbf{X}_1)^\dagger {\mathbf{X}}_1^\prime {\mathbf{Q}}_{({\mathbf{X}}_2 : {\mathbf{V}}\mathbf{Q}_{\mathbf{X}})}{\mathbf{y}}. \end{aligned}$$

Similarly, by interchanging the subscripts “1” and “2” in (26), we obtain

$$\begin{aligned} \mathrm {BLUE}({\mathbf{X}}_2 \varvec{\beta }_2) = {\mathbf{X}}_2 \left( \mathbf{X}_2^\prime {\mathbf{Q}}_{({\mathbf{X}}_1 : {\mathbf{V}}{\mathbf{Q}}_{\mathbf{X}})} \mathbf{X}_2\right) ^\dagger {\mathbf{X}}_2^\prime {\mathbf{Q}}_{({\mathbf{X}}_1 : {\mathbf{V}}\mathbf{Q}_{\mathbf{X}})}{\mathbf{y}}. \end{aligned}$$

The paper is concluded with some remarks on the advantages, from the computational point of view, of utilizing the representation of the Moore–Penrose inverse provided in Lemma 1.1. In comparison to the methods of determining the inverse based on the singular value decomposition (SVD), which are exploited in several popular software packages (e.g., Matlab, Mathematica, or R), an algorithm based on the representation (1) seems to have three main advantages, each leading to a reduction of computational cost. The first one is that it reduces the sizes of the matrices to be Moore–Penrose inverted: instead of the inverse of an \(n \times m\) matrix \({\mathbf{A}}\), we need to compute the two inverses of the matrices \({\mathbf{Q}}_2 {\mathbf{A}}_1\) and \({\mathbf{Q}}_1 {\mathbf{A}}_2\) of orders \(n \times m_1\) and \(n \times m_2\), respectively, where \(m_1 + m_2 = m\); as several software tools impose limits on the sizes of matrices which can be stored, one can encounter a situation in which the inverse of \({\mathbf{A}}\) exceeds the limit, while the inverses of \({\mathbf{Q}}_2 {\mathbf{A}}_1\) and \({\mathbf{Q}}_1 {\mathbf{A}}_2\) are still manageable. The second benefit is that the algorithm allows both block entries occurring in the inverse to be computed (almost) simultaneously: in the light of Baksalary and Trenkler (2021, formula (6)), the two entries involved in the representation (1) are linked by the identity

$$\begin{aligned} \left( {\mathbf{Q}}_i {\mathbf{A}}_j\right) ^\dagger = {\mathbf{A}}_j^\dagger \left[ {\mathbf{I}}_n - \mathbf{A}_i \left( {\mathbf{Q}}_j {\mathbf{A}}_i\right) ^\dagger \right] ,\ i, j = 1, 2, i \ne j, \end{aligned}$$

which means that one of the Moore–Penrose inverses involved in the representation can be derived from the knowledge of the other. The third advantage of the algorithm is that it can be executed iteratively, being applied in each subsequent step to matrices of smaller order: from Baksalary and Trenkler (2021, Lemma 1) it follows that when \({\mathbf{A}}\) is of full column rank, then \({\mathbf{Q}}_2 {\mathbf{A}}_1\) and \({\mathbf{Q}}_1 {\mathbf{A}}_2\) are of full column rank as well, which means that the inverses \(({\mathbf{Q}}_2 {\mathbf{A}}_1)^\dagger \) and \(({\mathbf{Q}}_1 {\mathbf{A}}_2)^\dagger \) can be computed by applying the same algorithm; the procedure can be carried out iteratively until the matrices to be inverted are all reduced to single columns, whose Moore–Penrose inverses (row vectors) are given directly by (A1).
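To illustrate this iterative use of the representation (1), the following sketch implements a simple recursive procedure (our own illustrative implementation, not the authors') which splits a full-column-rank matrix into two column blocks, applies (1), and bottoms out at single columns handled by the vector formula (A1); it also checks the linking identity recalled above.

```python
import numpy as np

def pinv_partitioned(A: np.ndarray) -> np.ndarray:
    """Moore-Penrose inverse of a full-column-rank A via columnwise splitting.

    Single columns are inverted by a^+ = a'/(a'a); larger blocks use the
    stacked representation (1) recursively.  Illustrative sketch only.
    """
    n, m = A.shape
    if m == 1:
        a = A[:, 0]
        return (a / (a @ a)).reshape(1, n)            # vector formula (A1)
    m1 = m // 2
    A1, A2 = A[:, :m1], A[:, m1:]
    Q1 = np.eye(n) - A1 @ pinv_partitioned(A1)        # projector onto N(A1')
    Q2 = np.eye(n) - A2 @ pinv_partitioned(A2)        # projector onto N(A2')
    return np.vstack((pinv_partitioned(Q2 @ A1),      # (Q2 A1)^+
                      pinv_partitioned(Q1 @ A2)))     # (Q1 A2)^+

rng = np.random.default_rng(7)
A = rng.standard_normal((9, 5))                       # full column rank a.s.
print(np.allclose(pinv_partitioned(A), np.linalg.pinv(A)))   # expected: True

# The linking identity with i = 1, j = 2: (Q1 A2)^+ = A2^+ [I - A1 (Q2 A1)^+]
n = A.shape[0]
A1, A2 = A[:, :2], A[:, 2:]
Q1 = np.eye(n) - A1 @ np.linalg.pinv(A1)
Q2 = np.eye(n) - A2 @ np.linalg.pinv(A2)
lhs = np.linalg.pinv(Q1 @ A2)
rhs = np.linalg.pinv(A2) @ (np.eye(n) - A1 @ np.linalg.pinv(Q2 @ A1))
print(np.allclose(lhs, rhs))                          # expected: True
```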