1 Introduction

The problem of fitting a curve on the basis of a finite number of observations arises in almost all areas where mathematics is applied, and one of the most powerful tools used for this purpose is the least squares method. Over the years, a rich body of results has appeared in the literature, providing indisputable evidence that the concepts and techniques of matrix analysis offer handy means of applying the method. The present paper contributes further to this stream of considerations by demonstrating how an expression for the Moore–Penrose inverse of a columnwise partitioned matrix, derived in Baksalary and Baksalary (2007, Theorem 1), may be advantageously utilized to deal with problems originating from linear regression. Among the benefits of the proposed approach one may mention a simplification of derivations, a novel insight into the notions involved in the regression model, and a reduction of the computational cost necessary to obtain the sought estimators. Furthermore, by simplifying the unavoidable mathematical operations, the proposed approach offers an attractive alternative for researchers who are not keen on exploiting more advanced matrix methods than necessary, or on utilizing software packages which do not provide comprehensive control over the processed data, as it enables calculations to be performed almost “by hand”, preserving insight into every step of linear regression.

The aforementioned representation of the Moore–Penrose inverse established in Baksalary and Baksalary (2007, Theorem 1) is recalled in the following lemma.

Lemma 1.1

Let \({\mathbf{A}}\) be an \(n \times m\), \(m\geqslant 2\), real matrix columnwise partitioned as \({\mathbf{A}} = ({\mathbf{A}}_1 : {\mathbf{A}}_2)\), with \({\mathbf{A}}_i\) denoting \(n \times m_i\), \(i = 1, 2\), matrices such that \(m_1 + m_2 = m\). Furthermore, let the ranges of \({\mathbf{A}}_1\) and \({\mathbf{A}}_2\) be disjoint, i.e., let them have only the zero vector in common. Then the Moore–Penrose inverse of \({\mathbf{A}}\) is of the form

$$\begin{aligned} {\mathbf{A}}^\dagger = \begin{pmatrix} ({\mathbf{Q}}_2 {\mathbf{A}}_1)^\dagger \\ ({\mathbf{Q}}_1 {\mathbf{A}}_2)^\dagger \end{pmatrix}, \end{aligned}$$
(1)

where \({\mathbf{Q}}_i\), \(i = 1, 2\), is the orthogonal projector onto the null space of the transpose of \({\mathbf{A}}_i\) and \(({\mathbf{Q}}_i \mathbf{A}_j)^\dagger \), \(i = 1, 2\), \(i \ne j\), is the Moore–Penrose inverse of \({\mathbf{Q}}_i {\mathbf{A}}_j\).
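To make the statement of Lemma 1.1 easy to verify numerically, the following sketch (in Python with NumPy, used here purely as an illustration and not as part of the original development) builds a randomly generated columnwise partitioned matrix, whose blocks are range disjoint with probability one, and compares the stacked representation (1) with a directly computed Moore–Penrose inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m1, m2 = 8, 3, 2                        # illustrative dimensions, m = m1 + m2
A1 = rng.standard_normal((n, m1))
A2 = rng.standard_normal((n, m2))
A = np.hstack((A1, A2))                    # A = (A1 : A2); blocks range disjoint a.s.

Q1 = np.eye(n) - A1 @ np.linalg.pinv(A1)   # orthogonal projector onto N(A1')
Q2 = np.eye(n) - A2 @ np.linalg.pinv(A2)   # orthogonal projector onto N(A2')

# Stacked representation of A^+ from Lemma 1.1, formula (1)
A_pinv_blocks = np.vstack((np.linalg.pinv(Q2 @ A1),
                           np.linalg.pinv(Q1 @ A2)))

print(np.allclose(A_pinv_blocks, np.linalg.pinv(A)))   # expected: True
```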

In the next section we briefly discuss particular linear regression models, highlighting issues in which the Moore–Penrose inverse of a columnwise partitioned matrix naturally emerges. These considerations are followed by Sect. 3, which contains an example demonstrating the applicability of the present approach. The data used in the example originate from observations of the position of Polaris made on December 12, 1983, by S.G. Brewer, which were afterwards used by Pedler (1993) to solve what the author calls an “astronomical problem”, namely that of fitting a circle to a set of points. Section 4 provides a number of remarks concerning the proposed approach.

2 Particular linear regression models

All matrices occurring in what follows have real entries and the superscript \(^\prime \) stands for the matrix transpose. Let us consider the linear regression model

$$\begin{aligned} {\mathbf{y}} = {\mathbf{X}} \varvec{\beta } + {\mathbf{u}}, \end{aligned}$$
(2)

where \({\mathbf{y}}\) is an \(n \times 1\) random vector of observations, \({\mathbf{X}}\) is an \(n \times p\) known model (design) matrix of constants, \(\varvec{\beta }\) is a \(p \times 1\) vector of unknown parameters, and \({\mathbf{u}}\) is an \(n \times 1\) vector of unknown errors. The entries of the vector \({\mathbf{u}} = (u_1, u_2,\ldots , u_n)^\prime \) are assumed to have a mean of zero and (unknown) variance \(\sigma ^2\), and each pair \(u_i\), \(u_j\), \(i \ne j\), is assumed to be uncorrelated, i.e., the expectation vector and the covariance matrix of \({\mathbf{u}}\) are \({\mathsf {E}}({\mathbf{u}}) = {\mathbf{0}}\) and \(\mathsf {Cov}({\mathbf{u}}) = \sigma ^2 {\mathbf{I}}_n\), respectively. Customarily, the symbol \({\mathbf{I}}_n\) stands for the identity matrix of order n. We also assume that the matrix \({\mathbf{X}}\) is of full column rank. Then, the least squares estimator (LSE) of \(\varvec{\beta }\) is given by

$$\begin{aligned} \hat{\varvec{\beta }} = ({\mathbf{X}}^\prime {\mathbf{X}})^{-1} \mathbf{X}^\prime {\mathbf{y}}. \end{aligned}$$

It is worth emphasizing that the assumption that \({\mathbf{X}}\) is of full column rank plays a crucial role, and is most often made to assure uniqueness of the estimator of \(\varvec{\beta }\); see Puntanen et al. (2011, p. 34). It turns out that \(({\mathbf{X}}^\prime {\mathbf{X}})^{-1} {\mathbf{X}}^\prime = {\mathbf{X}}^\dagger \), i.e., the Moore–Penrose inverse of \({\mathbf{X}}\); see Appendix A.
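As a quick numerical check of the identity \(({\mathbf{X}}^\prime {\mathbf{X}})^{-1} {\mathbf{X}}^\prime = {\mathbf{X}}^\dagger \), one may compare the two expressions on randomly generated data; the sketch below (NumPy, illustrative data only) is one way to do so.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 3))                    # full column rank with probability one
y = rng.standard_normal(10)

beta_normal_eq = np.linalg.inv(X.T @ X) @ X.T @ y   # (X'X)^{-1} X'y
beta_pinv = np.linalg.pinv(X) @ y                   # X^+ y
print(np.allclose(beta_normal_eq, beta_pinv))       # expected: True
```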

To calculate \({\mathbf{X}}^\dagger \), instead of dealing with the inverse of \({\mathbf{X}}^\prime {\mathbf{X}}\), we may write the model matrix \({\mathbf{X}}\) in the columnwise partitioned form

$$\begin{aligned} {\mathbf{X}} = ({\mathbf{X}}_1 : {\mathbf{X}}_2), \end{aligned}$$
(3)

where \({\mathbf{X}}_i\), \(i = 1, 2\), denote \(n \times p_i\) matrices such that \(p_1 + p_2 = p\). Since \({\mathbf{X}}\) is of full column rank, it follows that

$$\begin{aligned} {{\mathcal {R}}}({\mathbf{X}}_1) \cap {{\mathcal {R}}}({\mathbf{X}}_2) = \{ {\mathbf{0}} \}, \end{aligned}$$
(4)

where \({{\mathcal {R}}}({\mathbf{.}})\) stands for the column space (range) of a matrix argument. According to Lemma 1.1, the Moore–Penrose inverse of a matrix of the form (3), such that (4) is satisfied, can be expressed as

$$\begin{aligned} {\mathbf{X}}^\dagger = \begin{pmatrix} ({\mathbf{Q}}_2 {\mathbf{X}}_1)^\dagger \\ ({\mathbf{Q}}_1 {\mathbf{X}}_2)^\dagger \end{pmatrix}, \end{aligned}$$
(5)

where \({\mathbf{Q}}_i = {\mathbf{I}}_n - {\mathbf{X}}_i {\mathbf{X}}_i^\dagger \) is the orthogonal projector onto \({{\mathcal {N}}}({\mathbf{X}}_i^\prime )\), the null space of \({\mathbf{X}}_i^\prime \), \(i = 1, 2\); see Appendix A. Note that the condition (4), under which the representation of the Moore–Penrose inverse (5) is valid, is weaker than the requirement that \({\mathbf{X}}\) specified in (3) is of full column rank (the assumption which will be extensively exploited in what follows).

Let us consider a simple linear regression model

$$\begin{aligned} {\mathbf{y}} = {\beta }_0 {\mathbf {1}} + {\beta }_1 {\mathbf{x}} + {\mathbf{u}}, \end{aligned}$$

where \({\beta }_0, {\beta }_1 \in {\mathbb {R}}\), \({\mathbf {1}} = (1, 1,\ldots , 1)^\prime \) is the vector of n ones, and \({\mathbf{x}} = (x_1, x_2,\ldots , x_n)^\prime \) is the vector of observations on one regressor variable. The vector \({\mathbf{u}} = (u_1, u_2,\ldots , u_n)^\prime \) consists of the unknown errors. To obtain the LSE of the parameter vector \(\varvec{\beta } = ({\beta }_0, {\beta }_1)^\prime \), we may partition the \(n \times 2\) matrix \({\mathbf{X}}\) as

$$\begin{aligned} {\mathbf{X}} = ({\mathbf {1}} : {\mathbf{x}}), \end{aligned}$$
(6)

where \({{\mathcal {R}}}({\mathbf{1}}) \cap {{\mathcal {R}}}({\mathbf{x}}) = \{ {\mathbf{0}} \}\) since we assume that \({\mathbf{X}}\) is of full column rank. By (5) it follows that

$$\begin{aligned} {\mathbf{X}}^\dagger = ({\mathbf {1}} : {\mathbf{x}})^\dagger = \begin{pmatrix} ({\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}})^\dagger \\ ({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{x}})^\dagger \end{pmatrix}, \end{aligned}$$
(7)

where \({\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}} = (\mathbf{I }_n - {\mathbf{x}}\mathbf{x}^\dagger ) {\mathbf{1}}\) and \({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{x}} = ({\mathbf{I}}_n - {\mathbf{1}}{\mathbf{1}}^\dagger ) {\mathbf{x}}\) are both column vectors. Thus, by the identity (A1) given in Appendix A,

$$\begin{aligned} ({\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}})^\dagger = \frac{{\mathbf{1}}^\prime {\mathbf{Q}}_{\mathbf{x}}}{{\mathbf{1}}^\prime {\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}}} \quad \mathrm {and} \quad ({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{x}})^\dagger = \frac{{\mathbf{x}}^\prime {\mathbf{Q}}_{{\mathbf{1}}}}{{\mathbf{x}}^\prime {\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{x}}}. \end{aligned}$$
(8)

Consequently, the LSE of \(\varvec{\beta } = ({\beta }_0, {\beta }_1)^\prime \) is

$$\begin{aligned} \hat{\varvec{\beta }} = {\mathbf{X}}^\dagger {\mathbf{y}} = \begin{pmatrix} ({\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}})^\dagger {\mathbf{y}} \\ ({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{x}})^\dagger {\mathbf{y}} \end{pmatrix}. \end{aligned}$$
(9)

From (8) we obtain

$$\begin{aligned} ({\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}})^\dagger {\mathbf{x}} = 0, \ (\mathbf{Q}_{{\mathbf{1}}} {\mathbf{x}})^\dagger {\mathbf{1}} = 0, \ ({\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}})^\dagger {\mathbf{1}} = 1,\ (\mathbf{Q}_{{\mathbf{1}}} {\mathbf{x}})^\dagger {\mathbf{x}} = 1, \end{aligned}$$

whence

$$\begin{aligned} {\mathsf {E}}(\hat{\beta }_0) = ({\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}})^\dagger {\mathsf {E}}({\mathbf{y}}) = {\beta }_0 \quad \mathrm {and} \quad {\mathsf {E}}(\hat{\beta }_1) = ({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{x}})^\dagger {\mathsf {E}}({\mathbf{y}}) = {\beta }_1, \end{aligned}$$

i.e., the estimators in (9) are unbiased.

Let us use the symbol \({\mathbf{H}}\) to denote the so-called hat-matrix, which represents the orthogonal projector onto \({{\mathcal {R}}}(\mathbf{X})\), i.e., \({\mathbf{H}} = {\mathbf{X}}{\mathbf{X}}^\dagger \). Then, the identities (6)–(8) entail

$$\begin{aligned} {\mathbf{H}} = {\mathbf{1}}({\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}})^\dagger + {\mathbf{x}} ({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{x}})^\dagger = \frac{{\mathbf{1}}{\mathbf{1}}^\prime {\mathbf{Q}}_{\mathbf{x}}}{{\mathbf{1}}^\prime {\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}}} + \frac{{\mathbf{x}}{\mathbf{x}}^\prime \mathbf{Q}_{{\mathbf{1}}}}{{\mathbf{x}}^\prime {\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{x}}}, \end{aligned}$$
(10)

which means that the hat-matrix is a sum of two matrices of rank one. Furthermore, each of the summands involved in (10) is idempotent. This observation leads to the conclusion (see e.g., Rao and Mitra (1971, Theorem 5.1.2)) that the matrices necessarily commute and their product is equal to the zero matrix, i.e.,

$$\begin{aligned} {\mathbf{1}}({\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}})^\dagger {\mathbf{x}} (\mathbf{Q}_{{\mathbf{1}}} {\mathbf{x}})^\dagger = {\mathbf{0}} = {\mathbf{x}} ({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{x}})^\dagger {\mathbf{1}}({\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}})^\dagger . \end{aligned}$$
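The quantities appearing in (7)–(10) are straightforward to compute explicitly. The sketch below (NumPy, with made-up data; the variable names are ours) obtains \(\hat{\beta }_0\) and \(\hat{\beta }_1\) from the row vectors \(({\mathbf{Q}}_{\mathbf{x}} {\mathbf{1}})^\dagger \) and \(({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{x}})^\dagger \), and checks that the two summands of the hat-matrix in (10) are idempotent with zero mutual products.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 12
x = rng.standard_normal(n)
y = 1.5 + 0.8 * x + 0.1 * rng.standard_normal(n)   # made-up data
one = np.ones(n)
X = np.column_stack((one, x))                      # X = (1 : x), cf. (6)

Qx = np.eye(n) - np.outer(x, x) / (x @ x)          # Q_x = I - x x^+
Q1 = np.eye(n) - np.outer(one, one) / n            # Q_1 = I - 1 1^+

p0 = (one @ Qx) / (one @ Qx @ one)                 # (Q_x 1)^+ as a row vector, cf. (8)
p1 = (x @ Q1) / (x @ Q1 @ x)                       # (Q_1 x)^+ as a row vector, cf. (8)

beta0_hat, beta1_hat = p0 @ y, p1 @ y              # the LSE of beta_0 and beta_1, cf. (9)
print(np.allclose([beta0_hat, beta1_hat], np.linalg.pinv(X) @ y))

H1 = np.outer(one, p0)                             # 1 (Q_x 1)^+
H2 = np.outer(x, p1)                               # x (Q_1 x)^+
print(np.allclose(H1 + H2, X @ np.linalg.pinv(X)))          # the hat-matrix, cf. (10)
print(np.allclose(H1 @ H1, H1), np.allclose(H2 @ H2, H2))   # idempotent summands
print(np.allclose(H1 @ H2, 0), np.allclose(H2 @ H1, 0))     # zero mutual products
```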

Since \({\mathbf{1}} \in {{\mathcal {R}}}({\mathbf{X}})\), it follows that \({\mathbf{H}}{\mathbf{1}} = {\mathbf{1}}\). Hence, by denoting \(\hat{\mathbf{y}} = {\mathbf{H}}{\mathbf{y}}\), we see that \({\mathbf{1}}^\prime \hat{\mathbf{y}} = {\mathbf{1}}^\prime {\mathbf{H}}{\mathbf{y}} = {\mathbf{1}}^\prime {\mathbf{y}} \), which gives \(\sum \nolimits _{i=1}^{n} \hat{y_i} = \sum \nolimits _{i=1}^{n} y_i\). Furthermore, by putting \(\hat{\mathbf{u}} = ({\mathbf{I}}_n - {\mathbf{H}}){\mathbf{y}}\), we obtain \({\mathbf{1}}^\prime \hat{\mathbf{u}} = {\mathbf{1}}^\prime ({\mathbf{I}}_n - {\mathbf{H}}){\mathbf{y}} = {\mathbf{0}}\), so \( \sum \nolimits _{i=1}^{n} \hat{u_i} = 0\). Another consequence of \({\mathbf{1}} \in {{\mathcal {R}}}({\mathbf{X}})\) is the identity

$$\begin{aligned} \Vert ({\mathbf{I}}_n - {\mathbf{J}}){\mathbf{y}} \Vert ^2 = \Vert ({\mathbf{H}} - \mathbf{J}){\mathbf{y}} \Vert ^2 + \Vert ({\mathbf{I}}_n - {\mathbf{H}}){\mathbf{y}} \Vert ^2, \end{aligned}$$
(11)

with \({\mathbf{J}} = {\mathbf{1}}{\mathbf{1}}^\dagger \); see Puntanen et al. (2011, Proposition 8.5). Alternatively, the equality (11) can be expressed as \(SST = SSR + SSE\), where SST stands for the total sum of squares, SSR for the regression sum of squares, and SSE for the residual sum of squares. The coefficient of determination defined as

$$\begin{aligned} R^2 = \frac{SSR}{SST} \end{aligned}$$

turns out to be

$$\begin{aligned} R^2 = \frac{\Vert ({\mathbf{H}} - {\mathbf{J}}){\mathbf{y}} \Vert ^2}{\Vert (\mathbf{I}_n - {\mathbf{J}}){\mathbf{y}} \Vert ^2} = \frac{{\mathbf{y}}^\prime ({\mathbf{H}} - {\mathbf{J}}){\mathbf{y}}}{ {\mathbf{y}}^\prime ({\mathbf{I}}_n - {\mathbf{J}}){\mathbf{y}}}. \end{aligned}$$
(12)

Clearly, \(R^2 \geqslant 0\). Another observation is that \({\mathbf{I}}_n - {\mathbf{J}} - ({\mathbf{H}} - {\mathbf{J}}) = {\mathbf{I}}_n - {\mathbf{H}}\) is nonnegative definite (as \({\mathbf{I}}_n - {\mathbf{H}}\) is the orthogonal projector onto \({{\mathcal {N}}}({\mathbf{X}}^\prime )\)). Hence, \({\mathbf{H}} - {\mathbf{J}} {\mathop {\leqslant }\limits ^{{\mathsf {L}}}} {\mathbf{I}}_n - {\mathbf{J}}\), where the symbol \({\mathop {\leqslant }\limits ^{{\mathsf {L}}}}\) denotes the Löwner partial ordering, from which we conclude that \(0 \leqslant R^2 \leqslant 1\). The fact that the values of \(R^2\) are restricted to the interval [0, 1] is known in the literature (see e.g., Davidson and MacKinnon (1993, p. 14)), but it is usually demonstrated in a rather more involved way than in the present paper.
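The decomposition (11) and the bounds on \(R^2\) in (12) can likewise be verified directly; a minimal sketch, again on illustrative data, follows.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 12
x = rng.standard_normal(n)
y = 1.5 + 0.8 * x + 0.1 * rng.standard_normal(n)   # made-up data
X = np.column_stack((np.ones(n), x))

H = X @ np.linalg.pinv(X)                          # hat-matrix
J = np.full((n, n), 1.0 / n)                       # J = 1 1^+

SST = y @ (np.eye(n) - J) @ y                      # total sum of squares, cf. (11)
SSR = y @ (H - J) @ y                              # regression sum of squares
SSE = y @ (np.eye(n) - H) @ y                      # residual sum of squares
R2 = SSR / SST                                     # coefficient of determination, cf. (12)
print(np.isclose(SST, SSR + SSE), 0.0 <= R2 <= 1.0, R2)
```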

Consider now the general linear model with intercept

$$\begin{aligned} {\mathbf{y}} = {\beta }_0 {\mathbf {1}} + {\mathbf{X}} \varvec{\beta } + {\mathbf{u}}, \end{aligned}$$

where \({\mathbf{X}}\) and \(\varvec{\beta }\) are of dimensions \(n \times (p -1)\) and \((p - 1) \times 1\), respectively, and \(({\mathbf {1}} : {\mathbf{X}})\) is assumed to be of full column rank. Then, by analogy to (9), the LSE of \(({\beta }_0, \varvec{\beta }^\prime )^\prime \) is

$$\begin{aligned} \begin{pmatrix} \hat{\beta }_0 \\ \hat{\varvec{\beta }} \end{pmatrix} = \begin{pmatrix} ({\mathbf{Q}}_{\mathbf{X}} {\mathbf{1}})^\dagger {\mathbf{y}} \\ ({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{X}})^\dagger {\mathbf{y}} \end{pmatrix}. \end{aligned}$$
(13)

On account of (A1), we obtain \(({\mathbf{Q}}_{\mathbf{X}} {\mathbf{1}})^\dagger = ({\mathbf{1}}^\prime {\mathbf{Q}}_{\mathbf{X}} {\mathbf{1}})^{-1}{\mathbf{1}}^\prime {\mathbf{Q}}_{\mathbf{X}}\). Hence, \(({\mathbf{Q}}_{\mathbf{X}} {\mathbf{1}})^\dagger = ({\mathbf{Q}}_{\mathbf{X}} {\mathbf{1}})^\dagger {\mathbf{Q}}_{\mathbf{X}}\). Similarly, we arrive at \(({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{X}})^\dagger = ({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{X}})^\dagger {\mathbf{Q}}_{{\mathbf{1}}}\). In consequence, since \({\mathsf {E}}({\mathbf{y}}) = {\beta }_0 {\mathbf{1}} + {\mathbf{X}} \varvec{\beta }\), and since both \({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{X}}\) and \({\mathbf{Q}}_{\mathbf{X}} {\mathbf{1}}\) are of full column rank, we have

$$\begin{aligned} {\mathsf {E}}(\hat{\beta }_0) = ({\mathbf{Q}}_{\mathbf{X}} {\mathbf{1}})^\dagger {\mathsf {E}}({\mathbf{y}}) = {\beta }_0 \quad \mathrm {and} \quad {\mathsf {E}}(\hat{\varvec{\beta }}) = ({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{X}})^\dagger {\mathsf {E}}({\mathbf{y}}) = \varvec{\beta }, \end{aligned}$$

i.e., the LSE specified in (13) is unbiased.

Note that the hat-matrix turns out to be

$$\begin{aligned} {\mathbf{H}} = {\mathbf{1}}({\mathbf{Q}}_{\mathbf{X}} {\mathbf{1}})^\dagger + {\mathbf{X}} ({\mathbf{Q}}_{{\mathbf{1}}} {\mathbf{X}})^\dagger , \end{aligned}$$
(14)

a sum of two matrices of which the former is of rank one and the latter is of the same rank as the matrix \({\mathbf{X}}\), i.e., \(p - 1\). As above, both summands determining the hat-matrix specified in (14) are commuting idempotents whose product equals the zero matrix.

The formula (14) (as well as its particular case (10)) can be viewed as an alternative to the representation of \({\mathbf{H}}\) as a sum of two orthogonal projectors, which reads \({\mathbf{H}} = {\mathbf{J}} + {\mathbf{P}}_{{\mathbf{C}}{\mathbf{X}}}\), where \({\mathbf{P}}_\mathbf{CX} = {\mathbf{CX}} ({\mathbf{CX}})^\dagger \), with \({\mathbf{C}}\) denoting the so-called centering matrix defined as \({\mathbf{C}} = {\mathbf{I}}_n - \mathbf{J}\); see Puntanen et al. (2011, formula (8.108)). It is clear that \({\mathbf{J}}{\mathbf{P}}_{\mathbf{C\mathbf{X}}} = {\mathbf{0}}\).
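The two decompositions of the hat-matrix, (14) and \({\mathbf{H}} = {\mathbf{J}} + {\mathbf{P}}_{{\mathbf{C}}{\mathbf{X}}}\), can be compared numerically as follows (a NumPy sketch on randomly generated regressors, not part of the original exposition).

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 15, 4
X = rng.standard_normal((n, p - 1))        # regressors without the intercept column
one = np.ones((n, 1))

QX = np.eye(n) - X @ np.linalg.pinv(X)     # projector onto N(X')
Q1 = np.eye(n) - one @ one.T / n           # projector onto N(1')

H14 = one @ np.linalg.pinv(QX @ one) + X @ np.linalg.pinv(Q1 @ X)   # formula (14)

J = one @ one.T / n
CX = (np.eye(n) - J) @ X                   # centered regressors C X
H_alt = J + CX @ np.linalg.pinv(CX)        # H = J + P_{CX}

H = np.column_stack((one, X)) @ np.linalg.pinv(np.column_stack((one, X)))
print(np.allclose(H14, H), np.allclose(H_alt, H))   # expected: True True
```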

3 Applications

As in Pedler (1993, Sect. 6), we now consider the linear regression model with four regressors

(15)

Let \({\mathbf{x}} = (x_1, x_2,\ldots , x_n)^\prime \), \({\mathbf{y}} = (y_1, y_2, \ldots , y_n)^\prime \), \({\mathbf{a}} = (a_1, a_2,\ldots , a_n)^\prime \), \({\mathbf{b}} = (b_1, b_2,\ldots , b_n)^\prime \), \({\mathbf{c}} = (c_1, c_2,\ldots , c_n)^\prime \), \({\mathbf{d}} = (d_1, d_2,\ldots , d_n)^\prime \), and \(\varvec{\beta } = (p, q, u, v)^\prime \). Furthermore, let \({\mathbf{X}}\) be a \(2n \times 4\) matrix of the form

(16)

Then (15) can be written as

As the matrix \({\mathbf{X}}\) is of full column rank, we can determine the LSE of \(\varvec{\beta }\) by applying the representation derived in Baksalary and Trenkler (2021, Example 1) as a consequence of Baksalary and Baksalary (2007, Theorem 1). In order to take advantage of this result, let

$$\begin{aligned} A= & {} \frac{\sum \nolimits _{i=1}^n a_i}{\sum \nolimits _{i=1}^n\left( a_i^2 + b_i^2\right) }, \ B = \frac{\sum \nolimits _{i=1}^n b_i}{\sum \nolimits _{i=1}^n\left( a_i^2 + b_i^2\right) }, \end{aligned}$$
(17)
$$\begin{aligned} N&= n - A \sum \limits _{i=1}^n a_i - B \sum \limits _{i=1}^n b_i, \ M = \sum \limits _{i=1}^n\left( a_i^2 + b_i^2\right) - \frac{1}{n} \left[ \left( \sum \limits _{i=1}^n a_i\right) ^2 + \left( \sum \limits _{i=1}^n b_i\right) ^2\right] , \end{aligned}$$
(18)
(19)

where \({\overline{a}} = \frac{1}{n} \sum \nolimits _{i=1}^n a_i\) and \({\overline{b}} = \frac{1}{n} \sum \nolimits _{i=1}^n b_i\). Then, on account of Baksalary and Trenkler (2021, formula (8)), we obtain a very handy representation of the Moore–Penrose inverse of the model matrix (16), namely

$$\begin{aligned} {\mathbf{X}}^\dagger = \begin{pmatrix} \begin{pmatrix} N^{-1} &{}\quad 0 \\ 0 &{}\quad N^{-1} \end{pmatrix} \begin{pmatrix} {\mathbf{e}}^\prime \\ {\mathbf{f}}^\prime \ \end{pmatrix} \\ \\ \begin{pmatrix} M^{-1} &{}\quad 0 \\ 0 &{}\quad M^{-1} \end{pmatrix} \begin{pmatrix} {\mathbf{g}}^\prime \\ {\mathbf{h}}^\prime \end{pmatrix} \end{pmatrix}. \end{aligned}$$
(20)

Hence, analogously to (13), the LSE of the parameter vector \(\varvec{\beta }\) is given by

$$\begin{aligned} \hat{\varvec{\beta }} = {\mathbf{X}}^\dagger {\mathbf{z}} = \begin{pmatrix} N^{-1} {\mathbf{e}}^\prime {\mathbf{z}} \\ N^{-1} {\mathbf{f}}^\prime {\mathbf{z}} \\ M^{-1} {\mathbf{g}}^\prime {\mathbf{z}} \\ M^{-1} {\mathbf{h}}^\prime {\mathbf{z}} \end{pmatrix}. \end{aligned}$$
(21)
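For readers who wish to reproduce such computations, the following sketch (NumPy, with made-up observation times rather than the data of Table 1) illustrates how the scalars A, B, N and M of (17)–(18) are obtained; the vectors \({\mathbf{e}}\), \({\mathbf{f}}\), \({\mathbf{g}}\), \({\mathbf{h}}\) of (19) and the matrix (16) are not reproduced here, so the sketch stops at the scalar ingredients of (20).

```python
import numpy as np

# Hypothetical observation times; these are NOT Brewer's data from Table 1.
T = 23 * 3600 + 56 * 60 + 4              # sidereal day T = 23 h 56 m 04 s, in seconds
t = np.linspace(0.0, 6 * 3600.0, 13)     # made-up times t_i, in seconds
a = np.cos(2 * np.pi * t / T)            # a_i = cos(2*pi*t_i/T)
b = np.sin(2 * np.pi * t / T)            # b_i = sin(2*pi*t_i/T)
n = t.size

s = np.sum(a**2 + b**2)
A = a.sum() / s                          # formula (17)
B = b.sum() / s                          # formula (17)
N = n - A * a.sum() - B * b.sum()        # formula (18)
M = s - (a.sum()**2 + b.sum()**2) / n    # formula (18)
print(A, B, N, M)
```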

Let us now demonstrate the usefulness of the expressions (20) and (21) by applying them to a set of real data.

Example 3.1

As mentioned in the Introduction, Pedler (1993) considered the “astronomical problem” of fitting a circle to a set of points and solved it from first principles. The considerations in Pedler (1993) also contain an example which exploits the data collected by S.G. Brewer from observations of the position of Polaris made on December 12, 1983. The data are given in Table 1.

Table 1 At the ith observation at time \(t_i\), \((x_i, y_i)\) denotes the (observed) Cartesian coordinates of the Polaris position, whereas \(a_i = \cos (2\pi t_i/T)\) and \(b_i = \sin (2\pi t_i /T)\), where \(T = 23\mathrm {h}\ 56 \mathrm {m}\ 04 \mathrm {s}\) is the length of the sidereal day. The values are recalled from Pedler (1993, Table 1)

The data provided in Table 1 make it possible to calculate the scalars A, B, M, N as well as the vectors \({\mathbf{e}}\), \({\mathbf{f}}\), \(\mathbf{g}\), \({\mathbf{h}}\) defined in (17)–(19). Hence, we obtain the inner products of these four vectors with the vector \({\mathbf{z}}\). It should be emphasized that these straightforward calculations involve only scalars and vectors and require relatively low computational cost; for details on the advantages of the algorithm for calculating the Moore–Penrose inverse which takes into account columnwise partitioning into range disjoint matrices, see Baksalary and Trenkler (2021). The outcomes of these computations are provided in Table 2.

Table 2 Values of the scalars calculated on account of the data provided in Table 1

In the light of (21), we arrive at the components of the estimator \(\hat{\varvec{\beta }}\), which are given in Table 3. As expected, the values coincide with the ones given in Pedler (1993, Table 2).

Table 3 Least squares estimators obtained on account of (21)

The values of the total sum of squares SST, regression sum of squares SSR, residual sum of squares SSE, and the coefficient of determination \(R^2\) are provided in Table 4.

Table 4 Values of SST, SSR, SSE, and \(R^2\) obtained on account of (11) and (12)

4 Supplementary remarks

In the linear regression models considered in Sect. 2 it was assumed that the model matrices are of full column rank and that the vector \({{\mathbf{1}}}\) is one of the columns. Such assumptions are well justified, as they correspond to several common situations. However, the present approach makes it possible to generalize the considerations by weakening the assumption that the model matrix is of full column rank to the requirement that it can be columnwise partitioned into two range disjoint matrices, and by relaxing the assumption that one of the columns is the vector \({{\mathbf{1}}}\). To demonstrate this fact, let us assume that the model matrix \({\mathbf{X}}\) in (2) is partitioned in accordance with (3), i.e.,

$$\begin{aligned} {\mathbf{y}} = {\mathbf{X}}_1 \varvec{\beta }_1 + {\mathbf{X}}_2 \varvec{\beta }_2 + {\mathbf{u}}, \end{aligned}$$
(22)

where the vectors \(\varvec{\beta }_i\) are of orders \(p_i \times 1\), \(i = 1, 2\). Provided that the condition (4) holds, by (5) we conclude that the LSE of \((\varvec{\beta }_1^\prime , \varvec{\beta }_2^\prime )^\prime \) is given by

$$\begin{aligned} \begin{pmatrix} \hat{\varvec{\beta }}_1 \\ \hat{\varvec{\beta }}_2 \end{pmatrix} = \begin{pmatrix} ({\mathbf{Q}}_2 {\mathbf{X}}_1)^\dagger {\mathbf{y}} \\ ({\mathbf{Q}}_1 {\mathbf{X}}_2)^\dagger {\mathbf{y}} \end{pmatrix}. \end{aligned}$$
(23)

Clearly, the expression (13) is obtained from (23) by taking \({\mathbf{X}}_1 = {{\mathbf{1}}}\) and \({\mathbf{X}}_2 = {{\mathbf{X}}}\). Furthermore, from (3) and (5) we obtain

$$\begin{aligned} {\mathbf{H}} = {\mathbf{X}}_1 ({\mathbf{X}}_1^\prime {\mathbf{Q}}_2 {\mathbf{X}}_1)^\dagger {\mathbf{X}}_1^\prime {\mathbf{Q}}_2 + {\mathbf{X}}_2 ({\mathbf{X}}_2^\prime {\mathbf{Q}}_1 {\mathbf{X}}_2)^\dagger {\mathbf{X}}_2^\prime {\mathbf{Q}}_1, \end{aligned}$$
(24)

which under \({\mathbf{X}}_1 = {{\mathbf{1}}}\) and \({\mathbf{X}}_2 = {{\mathbf{X}}}\) leads to the formula (14). Note that the expression (24) is given in Puntanen et al. (2011, Proposition 16.1) along with its equivalent counterparts, one of which is (4).
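A numerical illustration of (23) and (24) on randomly generated range disjoint blocks (NumPy; an independent check rather than the authors' code) may look as follows.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p1, p2 = 20, 2, 3
X1 = rng.standard_normal((n, p1))
X2 = rng.standard_normal((n, p2))
X = np.hstack((X1, X2))                              # range disjoint blocks (generically)
y = rng.standard_normal(n)

Q1 = np.eye(n) - X1 @ np.linalg.pinv(X1)             # projector onto N(X1')
Q2 = np.eye(n) - X2 @ np.linalg.pinv(X2)             # projector onto N(X2')

beta1_hat = np.linalg.pinv(Q2 @ X1) @ y              # first block of (23)
beta2_hat = np.linalg.pinv(Q1 @ X2) @ y              # second block of (23)
print(np.allclose(np.concatenate((beta1_hat, beta2_hat)), np.linalg.pinv(X) @ y))

H24 = (X1 @ np.linalg.pinv(X1.T @ Q2 @ X1) @ X1.T @ Q2
       + X2 @ np.linalg.pinv(X2.T @ Q1 @ X2) @ X2.T @ Q1)   # formula (24)
print(np.allclose(H24, X @ np.linalg.pinv(X)))       # expected: True
```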

Further evidence of the applicability of Lemma 1.1 in statistical estimation theory was provided in Baksalary and Trenkler (2021), where an original representation was derived for the best linear unbiased estimator (BLUE) of \({\mathbf{X}} \varvec{\beta }\) under the generalized version of the (consistent) linear model (2) with \(\mathsf {Cov}({\mathbf{u}}) = \sigma ^2 {\mathbf{V}}\), where \({\mathbf{V}}\) denotes a known \(n \times n\) positive semidefinite matrix. It was shown in Baksalary and Trenkler (2021, Example 4) that \({\mathbf{G}}{\mathbf{y}}\) with

$$\begin{aligned} {\mathbf{G}} = {\mathbf{X}} ({\mathbf{Q}}_ {{\mathbf{V}}{\mathbf{Q}}_{\mathbf{X}}}{\mathbf{X}})^\dagger = {\mathbf{X}} ({\mathbf{X}}^\prime {\mathbf{Q}}_ {{\mathbf{V}}{\mathbf{Q}}_{\mathbf{X}}}\mathbf{X})^\dagger {\mathbf{X}}^\prime {\mathbf{Q}}_ {{\mathbf{V}}{\mathbf{Q}}_{\mathbf{X}}} \end{aligned}$$
(25)

is BLUE of \({\mathbf{X}} \varvec{\beta }\).
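The representation (25) can be checked numerically as well. The sketch below (NumPy) uses an illustrative positive definite \({\mathbf{V}}\), verifies the defining conditions \({\mathbf{G}}{\mathbf{X}} = {\mathbf{X}}\) and \({\mathbf{G}}{\mathbf{V}}{\mathbf{Q}}_{\mathbf{X}} = {\mathbf{0}}\), and, since \({\mathbf{V}}\) is taken nonsingular, compares \({\mathbf{G}}{\mathbf{y}}\) with the classical Aitken (generalized least squares) fitted values; the choice of the data and of \({\mathbf{V}}\) is ours.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 15, 3
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
L = rng.standard_normal((n, n))
V = L @ L.T + np.eye(n)                              # an illustrative positive definite V

QX = np.eye(n) - X @ np.linalg.pinv(X)               # projector onto N(X')
W = V @ QX
QW = np.eye(n) - W @ np.linalg.pinv(W)               # Q_{VQ_X}

G = X @ np.linalg.pinv(X.T @ QW @ X) @ X.T @ QW      # formula (25)

# G satisfies the defining conditions G X = X and G V Q_X = 0 ...
print(np.allclose(G @ X, X), np.allclose(G @ V @ QX, 0))
# ... and, for nonsingular V, G y coincides with the Aitken (GLS) fitted values.
Vinv = np.linalg.inv(V)
gls = X @ np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
print(np.allclose(G @ y, gls))                       # expected: True
```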

Analogously, we can derive representations for the BLUEs of both \({\mathbf{X}}_1 \varvec{\beta }_1\) and \({\mathbf{X}}_2 \varvec{\beta }_2\) under the model (22) when \(\mathsf {Cov}({\mathbf{u}}) = \sigma ^2 {\mathbf{V}}\). From Puntanen et al. (2011, formula (10.5)) it follows that \({\mathbf{G}}_1{\mathbf{y}}\) is the BLUE of an (estimable) parametric function \({\mathbf{X}}_1 \varvec{\beta }_1\) if \({\mathbf{G}}_1\) satisfies the equation

$$\begin{aligned} {\mathbf{G}}_1({\mathbf{X}}_1 : {\mathbf{X}}_2 : {\mathbf{V}}{\mathbf{Q}}_{\mathbf{X}}) = (\mathbf{X}_1 : {\mathbf{0}} : {\mathbf{0}}). \end{aligned}$$
(26)

On account of (4), which is a necessary and sufficient condition for \({\mathbf{X}}_1 \varvec{\beta }_1\) to be estimable, we arrive at \({{\mathcal {R}}} ({\mathbf{X}}_1) \cap {{\mathcal {R}}}(\mathbf{X}_2 : {\mathbf{V}}{\mathbf{Q}}_{\mathbf{X}}) = \{ {\mathbf{0}} \}\). In consequence, we can utilize the representation of the Moore–Penrose inverse provided in Lemma 1.1, which leads to the conclusion that one of the solutions of (26) is of the form

$$\begin{aligned} {\mathbf{G}}_1 = {\mathbf{X}}_1 ({\mathbf{X}}_1^\prime {\mathbf{Q}}_{({\mathbf{X}}_2 : \mathbf{V}{\mathbf{Q}}_{\mathbf{X}})} {\mathbf{X}}_1)^\dagger {\mathbf{X}}_1^\prime \mathbf{Q}_{({\mathbf{X}}_2 : {\mathbf{V}}{\mathbf{Q}}_{\mathbf{X}})}. \end{aligned}$$

Hence,

$$\begin{aligned} \mathrm {BLUE}({\mathbf{X}}_1 \varvec{\beta }_1) = {\mathbf{X}}_1 (\mathbf{X}_1^\prime {\mathbf{Q}}_{({\mathbf{X}}_2 : {\mathbf{V}}{\mathbf{Q}}_{\mathbf{X}})} \mathbf{X}_1)^\dagger {\mathbf{X}}_1^\prime {\mathbf{Q}}_{({\mathbf{X}}_2 : {\mathbf{V}}\mathbf{Q}_{\mathbf{X}})}{\mathbf{y}}. \end{aligned}$$

Similarly, by interchanging the subscripts “1” and “2” in (26), we obtain

$$\begin{aligned} \mathrm {BLUE}({\mathbf{X}}_2 \varvec{\beta }_2) = {\mathbf{X}}_2 \left( \mathbf{X}_2^\prime {\mathbf{Q}}_{({\mathbf{X}}_1 : {\mathbf{V}}{\mathbf{Q}}_{\mathbf{X}})} \mathbf{X}_2\right) ^\dagger {\mathbf{X}}_2^\prime {\mathbf{Q}}_{({\mathbf{X}}_1 : {\mathbf{V}}\mathbf{Q}_{\mathbf{X}})}{\mathbf{y}}. \end{aligned}$$

The paper is concluded with some remarks on the advantages, from the computational point of view, of utilizing the representation of the Moore–Penrose inverse provided in Lemma 1.1. In comparison to the methods of determining the inverse based on the singular value decomposition (SVD), which are exploited in several popular software packages (e.g., Matlab, Mathematica, or R), an algorithm based on the representation (1) seems to have three main advantages, each leading to a reduction of computational cost. The first one is that it reduces the sizes of the matrices to be Moore–Penrose inverted: instead of the inverse of an \(n \times m\) matrix \({\mathbf{A}}\), we need to compute the two inverses of the matrices \({\mathbf{Q}}_2 {\mathbf{A}}_1\) and \({\mathbf{Q}}_1 {\mathbf{A}}_2\) of orders \(n \times m_1\) and \(n \times m_2\), respectively, where \(m_1 + m_2 = m\); as several software tools impose limits on the sizes of matrices which can be stored, one can encounter a situation in which the inverse of \({\mathbf{A}}\) exceeds the limit, while the inverses of \({\mathbf{Q}}_2 {\mathbf{A}}_1\) and \({\mathbf{Q}}_1 {\mathbf{A}}_2\) are still manageable. The second benefit is that the algorithm allows both block entries occurring in the inverse to be computed (almost) simultaneously: in the light of Baksalary and Trenkler (2021, formula (6)), the two entries involved in the representation (1) are linked by the identity

$$\begin{aligned} \left( {\mathbf{Q}}_i {\mathbf{A}}_j\right) ^\dagger = {\mathbf{A}}_j^\dagger \left[ {\mathbf{I}}_n - \mathbf{A}_i \left( {\mathbf{Q}}_j {\mathbf{A}}_i\right) ^\dagger \right] ,\ i, j = 1, 2, i \ne j, \end{aligned}$$

which means that one of the Moore–Penrose inverses involved in the representation can be derived from the knowledge of the other. The third advantage of the algorithm is that it can be executed iteratively, being applied in each subsequent step to matrices of smaller order: from Baksalary and Trenkler (2021, Lemma 1) it follows that when \({\mathbf{A}}\) is of full column rank, then \({\mathbf{Q}}_2 {\mathbf{A}}_1\) and \({\mathbf{Q}}_1 {\mathbf{A}}_2\) are of full column rank as well, which means that the inverses \(({\mathbf{Q}}_2 {\mathbf{A}}_1)^\dagger \) and \(({\mathbf{Q}}_1 {\mathbf{A}}_2)^\dagger \) can be computed by applying the same algorithm; the procedure can be carried out iteratively until the matrices to be inverted are all reduced to single columns, whose Moore–Penrose inverses (row vectors) are given directly by (A1).
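To illustrate this iterative use of the representation (1), the following sketch implements a simple recursive procedure (our own illustrative implementation, not the authors') which splits a full-column-rank matrix into two column blocks, applies (1), and bottoms out at single columns handled by the vector formula (A1); it also checks the linking identity recalled above.

```python
import numpy as np

def pinv_partitioned(A: np.ndarray) -> np.ndarray:
    """Moore-Penrose inverse of a full-column-rank A via columnwise splitting.

    Single columns are inverted by a^+ = a'/(a'a); larger blocks use the
    stacked representation (1) recursively.  Illustrative sketch only.
    """
    n, m = A.shape
    if m == 1:
        a = A[:, 0]
        return (a / (a @ a)).reshape(1, n)            # vector formula (A1)
    m1 = m // 2
    A1, A2 = A[:, :m1], A[:, m1:]
    Q1 = np.eye(n) - A1 @ pinv_partitioned(A1)        # projector onto N(A1')
    Q2 = np.eye(n) - A2 @ pinv_partitioned(A2)        # projector onto N(A2')
    return np.vstack((pinv_partitioned(Q2 @ A1),      # (Q2 A1)^+
                      pinv_partitioned(Q1 @ A2)))     # (Q1 A2)^+

rng = np.random.default_rng(7)
A = rng.standard_normal((9, 5))                       # full column rank a.s.
print(np.allclose(pinv_partitioned(A), np.linalg.pinv(A)))   # expected: True

# The linking identity with i = 1, j = 2: (Q1 A2)^+ = A2^+ [I - A1 (Q2 A1)^+]
n = A.shape[0]
A1, A2 = A[:, :2], A[:, 2:]
Q1 = np.eye(n) - A1 @ np.linalg.pinv(A1)
Q2 = np.eye(n) - A2 @ np.linalg.pinv(A2)
lhs = np.linalg.pinv(Q1 @ A2)
rhs = np.linalg.pinv(A2) @ (np.eye(n) - A1 @ np.linalg.pinv(Q2 @ A1))
print(np.allclose(lhs, rhs))                          # expected: True
```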