The Multiple Regression Model

Principles of Econometrics

Part of the book series: Classroom Companion: Economics (CCE)

Abstract

Regression analysis consists in studying the dependence of a variable (the explained variable) on one or more other variables (the explanatory variables). This chapter presents the multiple regression model, a linear model comprising a single equation that links an explained variable to several explanatory variables. Since the parameters of this model are unknown, they must be estimated to quantify the relationship between the dependent variable and the explanatory variables. The chapter presents the most frequently used estimation method, namely the ordinary least squares (OLS) method. It also establishes the properties of the OLS estimators, describes the various tests on the regression coefficients, and presents key indicators such as the (adjusted) coefficient of determination. All the concepts are illustrated through several empirical applications.

Notes

  1.

    This chapter calls upon various notions of matrix algebra. In Appendix 3.1, readers will find the elements of matrix algebra necessary to understand the various developments here.

  2.

    Matrices and vectors are written in bold characters. This notation convention will be used throughout the book.

  3.

    In the sense that the matrix of explanatory variables is assumed to be unchanged whatever the sample of observations.

  4.

    We will see later that such an assumption implies that there is no collinearity between the explanatory variables.

  5.

    We consider the \(\left ( i+1\right )\)th element and not the ith since the first element of the matrix \(\left ( \boldsymbol {X}^{\prime }\boldsymbol {X}\right )^{-1}\) relates to the constant term. Obviously, if the variables are centered, it is appropriate to choose the ith element.

  6.

    Recall that the first element of the matrix corresponds to the constant term, which explains why we consider the \(\left ( i+1\right )\)th element and not the ith.

  7.

    The first element of the matrix corresponds to the constant term \(\alpha\).

  8.

    Strictly speaking, this is known as Kullback-Leibler information (see Kullback and Leibler, 1951).

  9.

    See also Akaike (1969, 1974).

  10.

    In this case, the OLS and maximum likelihood estimators are equivalent.

References

  • Akaike, H. (1969), “Fitting Autoregressive Models for Prediction”, Annals of the Institute of Statistical Mathematics, 21, pp. 243–247.

  • Akaike, H. (1973), “Information theory and an extension of maximum likelihood principle”, Second International Symposium on Information Theory, pp. 261–281.

  • Akaike, H. (1974), “A new look at the statistical model identification”, IEEE Transactions on Automatic Control, 19(6), pp. 716–723.

  • Bénassy-Quéré, A. and V. Salins (2005), “Impact de l’ouverture financière sur les inégalités internes dans les pays émergents”, Working Paper CEPII, 2005–11.

  • Davidson, R. and J.G. MacKinnon (1993), Estimation and Inference in Econometrics, Oxford University Press.

  • Farvaque, E., Jean, N. and B. Zuindeau (2007), “Inégalités écologiques et comportement électoral : le cas des élections municipales françaises de 2001”, Développement Durable et Territoires, Dossier 9.

  • Gallant, A.R. (1987), Nonlinear Statistical Models, John Wiley & Sons.

  • Greene, W. (2020), Econometric Analysis, 8th edition, Pearson.

  • Hannan, E.J. and B.G. Quinn (1979), “The Determination of the Order of an Autoregression”, Journal of the Royal Statistical Society, Series B, 41, pp. 190–195.

  • Hurvich, C.M. and C.-L. Tsai (1989), “Regression and time series model selection in small samples”, Biometrika, 76, pp. 297–307.

  • Johnston, J. and J. DiNardo (1996), Econometric Methods, 4th edition, McGraw Hill.

  • Judge, G.G., Griffiths, W.E., Hill, R.C., Lutkepohl, H. and T.C. Lee (1985), The Theory and Practice of Econometrics, 2nd edition, John Wiley & Sons.

  • Judge, G.G., Griffiths, W.E., Hill, R.C., Lutkepohl, H. and T.C. Lee (1988), Introduction to the Theory and Practice of Econometrics, John Wiley & Sons.

  • Kaufmann, D., Kraay, A. and M. Mastruzzi (2006), “Governance Matters V: Aggregate and Individual Governance Indicators for 1996–2005”, http://web.worldbank.org.

  • Kullback, S. and R.A. Leibler (1951), “On information and sufficiency”, Annals of Mathematical Statistics, 22, pp. 79–86.

  • Puech, F. (2005), Analyse des déterminants de la criminalité dans les pays en développement, Thèse pour le doctorat de Sciences Économiques, Université d’Auvergne-Clermont I.

  • Schwarz, G. (1978), “Estimating the Dimension of a Model”, The Annals of Statistics, 6, pp. 461–464.

  • Thuilliez, J. (2007), “Malaria and Primary Education: A Cross-Country Analysis on Primary Repetition and Completion Rates”, Working Paper Centre d’Économie de la Sorbonne, 2007–13.

Appendices

Appendix 3.1: Elements of Matrix Algebra

This appendix presents the main matrix algebra concepts used in this chapter.

3.1.1 General

A matrix is an ordered array of elements (or entries):

$$\displaystyle \begin{aligned} \boldsymbol{A}=\left( a_{ij}\right) = \begin{pmatrix} a_{11}&a_{12}&\cdots&a_{1p}\\ a_{21}&a_{22}&\cdots&a_{2p}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}&a_{n2}&\cdots&a_{np} \end{pmatrix} \end{aligned} $$
(3.185)

\(a_{ij}\) is the element corresponding to the ith row and the jth column of the matrix \(\boldsymbol {A}\). The matrix \(\boldsymbol {A}\) has n rows and p columns. The size (or the dimension) of the matrix is said to be \(n\times p\) (which is also noted as \(\left ( n,p\right )\)).

A row vector is a matrix containing only one row. A column vector is a matrix with only one column. A matrix can therefore be thought of as a set of row vectors or column vectors.

When the number of rows is equal to the number of columns, i.e., \(n=p\), we say that \(\boldsymbol {A}\) is a square matrix. Frequently used square matrices include:

  • Symmetric matrix: it is such that \(\left ( a_{ij}\right ) =\left ( a_{ji}\right )\) for all i and j.

  • Diagonal matrix: this is a matrix whose elements off the diagonal are zero:

    $$\displaystyle \begin{aligned} \boldsymbol{A}= \begin{pmatrix} \alpha_{1}&0&0&\cdots&0\\ 0&\alpha_{2}&0&\cdots&0\\ \vdots&&\ddots&&\vdots\\ \vdots&&&\ddots&0\\ 0&0&\cdots&0&\alpha_{p} \end{pmatrix} \end{aligned} $$
    (3.186)
  • Scalar matrix: this is a diagonal matrix whose elements on the diagonal are all identical:

    $$\displaystyle \begin{aligned} \boldsymbol{A}= \begin{pmatrix} \alpha & 0 & 0 & \cdots & 0\\ 0 & \alpha & 0 & \cdots & 0\\ \vdots & & \ddots & & \vdots\\ \vdots & & & \ddots & 0\\ 0 & 0 & \cdots & 0 & \alpha\end{pmatrix} \end{aligned} $$
    (3.187)
  • Identity matrix: this is a scalar matrix, noted \(\boldsymbol {I}\), whose elements on the diagonal are all equal to 1:

    $$\displaystyle \begin{aligned} \boldsymbol{I}= \begin{pmatrix} 1 & 0 & 0 & \cdots & 0\\ 0 & 1 & 0 & \cdots & 0\\ \vdots & & \ddots & & \vdots\\ \vdots & & & \ddots & 0\\ 0 & 0 & \cdots & 0 & 1\end{pmatrix} \end{aligned} $$
    (3.188)
  • Triangular matrix (lower or upper): this is a matrix whose elements are all zero above the diagonal (lower triangular) or below the diagonal (upper triangular).
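
As an informal illustration (not part of the original chapter), the following NumPy sketch constructs instances of these special square matrices; the matrix \(\boldsymbol{A}\) and all numerical values are arbitrary.

```python
import numpy as np

# Arbitrary 3x3 example matrices illustrating the definitions above.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 4.0],
              [0.0, 4.0, 5.0]])

print(np.allclose(A, A.T))      # True: A is symmetric (a_ij = a_ji)
D = np.diag([1.0, 2.0, 3.0])    # diagonal matrix
S = 4.0 * np.eye(3)             # scalar matrix (identical diagonal elements)
I = np.eye(3)                   # identity matrix
L = np.tril(A)                  # lower triangular part of A (zeros above the diagonal)
```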

3.1.2 Main Matrix Operations

Let \(\boldsymbol {B}\) be a matrix such that:

$$\displaystyle \begin{aligned} \boldsymbol{B}=\left( b_{ij}\right) = \begin{pmatrix} b_{11}&b_{12}&\cdots&b_{1p}\\ b_{21}&b_{22}&\cdots&b_{2p}\\ \vdots&\vdots&\ddots&\vdots\\ b_{n1}&b_{n2}&\cdots&b_{np} \end{pmatrix} \end{aligned} $$
(3.189)

3.1.2.1 Equality

The matrices \(\boldsymbol {A}\) and \(\boldsymbol {B}\) are equal if they are of the same size and if \(a_{ij}=b_{ij}\) for all i and j.

3.1.2.2 Transposition

The transpose \(\boldsymbol {A}^{\prime }\) of the matrix \(\boldsymbol {A}\) is the matrix whose jth row corresponds to the jth column of the matrix \(\boldsymbol {A}\). Since the size of matrix \(\boldsymbol {A}\) is \(n\times p\), the size of matrix \(\boldsymbol {A}^{\prime }\) is \(p\times n\). Thus, we have:

$$\displaystyle \begin{aligned} \boldsymbol{A}^{\prime}= \begin{pmatrix} a_{11}&a_{21}&\cdots&a_{n1}\\ a_{12}&a_{22}&\cdots&a_{n2}\\ \vdots&\vdots&\ddots&\vdots\\ a_{1p}&a_{2p}&\cdots&a_{np} \end{pmatrix} \end{aligned} $$
(3.190)

A symmetric matrix is therefore, by definition, a matrix such that:

$$\displaystyle \begin{aligned} \boldsymbol{A}=\boldsymbol{A}^{\prime} \end{aligned} $$
(3.191)

The transpose of the transpose is equal to the original matrix, i.e.:

$$\displaystyle \begin{aligned} \left( \boldsymbol{A}^{\prime}\right)^{\prime}=\boldsymbol{A} \end{aligned} $$
(3.192)

3.1.2.3 Addition and Subtraction

Two matrices \(\boldsymbol {A}\) and \(\boldsymbol {B}\) can be added only if they are of the same dimensions, the matrix \(\boldsymbol {C}\) resulting from this sum also having this dimension:

$$\displaystyle \begin{aligned} \boldsymbol{C}=\boldsymbol{A}+\boldsymbol{B}=\left( a_{ij}+b_{ij}\right)\end{aligned} $$
(3.193)

Similarly, we have:

$$\displaystyle \begin{aligned} \boldsymbol{D}=\boldsymbol{A}-\boldsymbol{B}=\left( a_{ij}-b_{ij}\right)\end{aligned} $$
(3.194)

Note that:

$$\displaystyle \begin{aligned} \left( \boldsymbol{A}+\boldsymbol{B}\right)^{\prime}=\boldsymbol{A}^{\prime} +\boldsymbol{B}^{\prime} \end{aligned} $$
(3.195)

i.e., the transpose of a sum is equal to the sum of the transposes.

3.1.2.4 Matrix Multiplication and Scalar Product

The scalar product of two column vectors \(\boldsymbol {a}\) and \(\boldsymbol {b}\), each with n elements, is a scalar:

$$\displaystyle \begin{aligned} \boldsymbol{a}^{\prime}\boldsymbol{b}=a_{1}b_{1}+a_{2}b_{2}+\ldots+a_{n}b_{n} \end{aligned} $$
(3.196)

that is:

$$\displaystyle \begin{aligned} \boldsymbol{a}^{\prime}\boldsymbol{b}=\boldsymbol{b}^{\prime}\boldsymbol{a}=\sum \limits_{i=1}^{n}a_{i}b_{i} \end{aligned} $$
(3.197)

Now consider two matrices \(\boldsymbol {A}\) and \(\boldsymbol {B}\) and assume that \(\boldsymbol {A}\) is of size \(n\times p\) and \(\boldsymbol {B}\) is of size \(p\times q\). The matrix \(\boldsymbol {C}\) resulting from the product of these two matrices is a matrix of size \(n\times q\), that is:

$$\displaystyle \begin{aligned} \underset{\left( n\times q\right) }{\boldsymbol{C}}=\underset{\left( n\times p\right) }{\boldsymbol{A}}\underset{\left( p\times q\right) }{\boldsymbol{B}} \end{aligned} $$
(3.198)

By noting \(\boldsymbol {a}_{i}\) \((i=1,2,\ldots ,n)\) the rows of \(\boldsymbol {A}\) and \(\boldsymbol {b}_{j}\) \((j=1,2,\ldots ,q)\) the columns of \(\boldsymbol {B}\), each element of \(\boldsymbol {C}\) is the scalar product of a row vector of \(\boldsymbol {A}\) and a column vector of \(\boldsymbol {B}\). Noting \(c_{ij}\) the ijth element of the matrix \(\boldsymbol {C}\), we thus have:

$$\displaystyle \begin{aligned} c_{ij}=\boldsymbol{a}_{i}^{\prime}\boldsymbol{b}_{j} \end{aligned} $$
(3.199)

Matrix multiplication is only possible if the number of columns in the first matrix (matrix \(\boldsymbol {A}\)) is equal to the number of rows in the second matrix (matrix \(\boldsymbol {B}\)). In this case, the matrices are said to be conformable for multiplication.

The scalar multiplication of a matrix is the multiplication of each element of that matrix by a given scalar. Thus, for a scalar c and a matrix \(\boldsymbol {A}\), we have:

$$\displaystyle \begin{aligned} c\boldsymbol{A}=\left( ca_{ij}\right)\end{aligned} $$
(3.200)

We also note the following results:

  • Multiplication by the identity matrix:

    $$\displaystyle \begin{aligned} \boldsymbol{AI}=\boldsymbol{IA}=\boldsymbol{A} \end{aligned} $$
    (3.201)
  • Transpose of a product of two matrices:

    $$\displaystyle \begin{aligned} \left( \boldsymbol{AB}\right)^{\prime}=\boldsymbol{B}^{\prime}\boldsymbol{A}^{\prime} \end{aligned} $$
    (3.202)
  • Transpose of a product of more than two matrices:

    $$\displaystyle \begin{aligned} \left( \boldsymbol{ABC}\right)^{\prime}=\boldsymbol{C}^{\prime}\boldsymbol{B}^{\prime }\boldsymbol{A}^{\prime} \end{aligned} $$
    (3.203)
  • Multiplication of matrices is associative:

    $$\displaystyle \begin{aligned} \left( \boldsymbol{AB}\right) \boldsymbol{C}=\boldsymbol{A}\left( \boldsymbol{BC}\right)\end{aligned} $$
    (3.204)
  • The sum and multiplication of matrices are distributive:

    $$\displaystyle \begin{aligned} \boldsymbol{A}\left( \boldsymbol{B+C}\right) =\boldsymbol{AB+AC} \end{aligned} $$
    (3.205)
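
These properties are easy to verify numerically. The following sketch (illustrative only; the matrix sizes and the random seed are arbitrary) checks conformability, the transpose of a product, associativity, and distributivity with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))    # size n x p
B = rng.normal(size=(3, 2))    # size p x q: conformable with A for multiplication
B2 = rng.normal(size=(3, 2))
C = rng.normal(size=(2, 2))

print((A @ B).shape)                              # (4, 2), i.e., n x q
print(np.allclose((A @ B).T, B.T @ A.T))          # (AB)' = B'A'
print(np.allclose((A @ B) @ C, A @ (B @ C)))      # associativity
print(np.allclose(A @ (B + B2), A @ B + A @ B2))  # distributivity
```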

3.1.2.5 Idempotent Matrix

An idempotent matrix \(\boldsymbol {A}\) is a matrix verifying: \(\boldsymbol {AA=A.}\) In other words, an idempotent matrix is equal to its square. Furthermore, if \(\boldsymbol {A}\) is a symmetric idempotent matrix, then \(\boldsymbol {A}^{\prime }\boldsymbol {A=A.}\)
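
As an illustration (a sketch, not taken from the chapter), the matrix \(\boldsymbol{X}\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1}\boldsymbol{X}^{\prime}\), which plays a central role later in this appendix, is symmetric and idempotent; here \(\boldsymbol{X}\) is an arbitrary random matrix assumed to be of full column rank:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))          # arbitrary full-column-rank matrix
H = X @ np.linalg.inv(X.T @ X) @ X.T  # symmetric idempotent matrix

print(np.allclose(H @ H, H))    # HH = H (idempotent)
print(np.allclose(H.T @ H, H))  # H'H = H (symmetric idempotent)
```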

3.1.3 Rank, Trace, Determinant, and Inverse Matrix

3.1.3.1 Rank of a Matrix

Consider a matrix \(\boldsymbol {A}\) of size \(n\times p\). The rows of \(\boldsymbol {A}\) constitute n vectors, while the columns of \(\boldsymbol {A}\) represent p vectors. Let r denote the maximum number of linearly independent rows and s the maximum number of linearly independent columns. We can demonstrate that, for any matrix \(\boldsymbol {A}\) of size \(n\times p\), we have:

$$\displaystyle \begin{aligned} r=s\end{aligned} $$
(3.206)

This maximum number of linearly independent rows or columns is called the rank of the matrix \(\boldsymbol {A}\). The maximum number of linearly independent rows is called the row rank and the maximum number of linearly independent columns is called the column rank. The row rank and the column rank of a matrix are therefore equal.

A matrix whose rank is equal to the number of its columns is called a full rank matrix.

The rank of a matrix is therefore necessarily less than or equal to the number of its rows or columns, i.e.:

$$\displaystyle \begin{aligned} Rank\left( \boldsymbol{A}\right) \leq\min\left( n,p\right)\end{aligned} $$
(3.207)

Furthermore, we have the following properties:

$$\displaystyle \begin{aligned} Rank\left( \boldsymbol{A}\right) =Rank\left( \boldsymbol{A}^{\prime}\right)\end{aligned} $$
(3.208)
$$\displaystyle \begin{aligned} Rank\left( \boldsymbol{A}\right) =Rank\left( \boldsymbol{A}^{\prime}\boldsymbol{A} \right) =Rank\left( \boldsymbol{AA}^{\prime}\right)\end{aligned} $$
(3.209)
$$\displaystyle \begin{aligned} Rank\left( \boldsymbol{AB}\right) \leq\min\left( Rank\left( \boldsymbol{A}\right) ,Rank(\boldsymbol{B})\right)\end{aligned} $$
(3.210)

We can deduce from this last equation that if \(\boldsymbol {A}\) is a matrix of size \(n\times p\) and \(\boldsymbol {B}\) a square matrix of size \(p\times p\) of full rank, then:

$$\displaystyle \begin{aligned} Rank\left( \boldsymbol{AB}\right) =Rank(\boldsymbol{A})\end{aligned} $$
(3.211)

If \(\boldsymbol {B}\) is a square matrix of size n and rank n, it is said to be nonsingular. In this case, it admits a unique inverse matrix (see below) noted \(\boldsymbol {B} ^{-1}\) such that:

$$\displaystyle \begin{aligned} \boldsymbol{BB}^{-1}=\boldsymbol{B}^{-1}\boldsymbol{B}=\boldsymbol{I} \end{aligned} $$
(3.212)

When the rank of the matrix \(\boldsymbol {B}\) is less than n, the matrix \(\boldsymbol {B}\) is said to be singular and has no inverse.
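
These notions can be checked numerically, as in the following sketch (the matrices are arbitrary illustrative examples):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # second row = twice the first: rank 1, singular
B = np.array([[1.0, 2.0],
              [0.0, 3.0]])   # rank 2: nonsingular

print(np.linalg.matrix_rank(A))            # 1
print(np.linalg.matrix_rank(B))            # 2
B_inv = np.linalg.inv(B)
print(np.allclose(B @ B_inv, np.eye(2)))   # BB^{-1} = I
# np.linalg.inv(A) would raise LinAlgError because A is singular.
```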

3.1.3.2 Trace of a Matrix

The trace of a square matrix \(\boldsymbol {A}\) of size \(n\times n\), denoted \(Tr(\boldsymbol {A})\), is the sum of its diagonal elements:

$$\displaystyle \begin{aligned} Tr(\boldsymbol{A})=\sum_{i=1}^{n}a_{ii} \end{aligned} $$
(3.213)

Furthermore:

$$\displaystyle \begin{aligned} Tr(\boldsymbol{A})=Tr(\boldsymbol{A}^{\prime})\end{aligned} $$
(3.214)
$$\displaystyle \begin{aligned} Tr(\boldsymbol{A}+\boldsymbol{B})=Tr(\boldsymbol{A})+Tr(\boldsymbol{B})\end{aligned} $$
(3.215)

and:

$$\displaystyle \begin{aligned} Tr(\boldsymbol{AB})=Tr(\boldsymbol{BA})\end{aligned} $$
(3.216)
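
A short numerical check of these trace properties (arbitrary random matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))

print(np.isclose(np.trace(A), np.trace(A.T)))                  # Tr(A) = Tr(A')
print(np.isclose(np.trace(A + B), np.trace(A) + np.trace(B)))  # additivity
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))            # Tr(AB) = Tr(BA)
```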

3.1.3.3 Determinant of a Matrix

The determinant of a matrix is defined for square matrices only.

As an introductory example, consider a matrix \(\boldsymbol {A}\) of size \(2\times 2\):

$$\displaystyle \begin{aligned} \boldsymbol{A}= \begin{pmatrix} a & c\\ b & d\end{pmatrix} \end{aligned} $$
(3.217)

The determinant of the matrix \(\boldsymbol {A}\), denoted \(det(\boldsymbol {A})\) or \(\left \vert \boldsymbol {A}\right \vert \), is given by:

$$\displaystyle \begin{aligned} det(\boldsymbol{A})=\left\vert \boldsymbol{A}\right\vert = \begin{vmatrix} a & c\\ b & d\end{vmatrix} =ad-bc\end{aligned} $$
(3.218)

More generally, for matrices of size \(n\times n\), we use the cofactor expansion along a given row i:

$$\displaystyle \begin{aligned} det(\boldsymbol{A})=\left\vert \boldsymbol{A}\right\vert =\sum_{j=1}^{n} a_{ij}\left( -1\right)^{i+j}\left\vert \boldsymbol{A}_{ij}\right\vert\end{aligned} $$
(3.219)

where \(\boldsymbol {A}_{ij}\) is the matrix obtained from matrix \(\boldsymbol {A}\) by deleting row i and column j. \(\left \vert \boldsymbol {A}_{ij}\right \vert \) is called a minor and the term:

$$\displaystyle \begin{aligned} C_{ij}=\left( -1\right)^{i+j}\left\vert \boldsymbol{A}_{ij}\right\vert\end{aligned} $$
(3.220)

is called a cofactor.

We have the following property:

$$\displaystyle \begin{aligned} \left\vert \boldsymbol{A}\right\vert =\left\vert \boldsymbol{A}^{\prime}\right\vert\end{aligned} $$
(3.221)

If \(\boldsymbol {A}\) and \(\boldsymbol {B}\) are two square matrices, we have:

$$\displaystyle \begin{aligned} \left\vert \boldsymbol{AB}\right\vert =\left\vert \boldsymbol{A}\right\vert \times\left\vert \boldsymbol{B}\right\vert\end{aligned} $$
(3.222)

Moreover, the determinant of a matrix is nonzero if and only if that matrix is of full rank. This last property thus provides a way to determine whether or not a matrix is of full rank (this is only operational if the matrix is not too large).
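
The 2×2 formula and the link between the determinant and full rank can be illustrated as follows (a sketch with arbitrary matrices):

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [2.0, 4.0]])                  # a = 1, c = 3, b = 2, d = 4
print(np.isclose(np.linalg.det(A), 1.0 * 4.0 - 2.0 * 3.0))   # ad - bc = -2

S = np.array([[1.0, 2.0],
              [2.0, 4.0]])                  # rank 1 < 2: not full rank
print(np.isclose(np.linalg.det(S), 0.0))    # True: singular matrix, zero determinant
```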

3.1.3.4 Inverse Matrix

A matrix is invertible if and only if it is nonsingular.

Consider a matrix \(\boldsymbol {A}\) of size \(n\times n\). We can write the determinant of this matrix as a function of the cofactors as follows:

$$\displaystyle \begin{aligned} \left\vert \boldsymbol{A}\right\vert =a_{i1}C_{i1}+a_{i2}C_{i2}+\ldots+a_{in}C_{in} \end{aligned} $$
(3.223)

or, equivalently:

$$\displaystyle \begin{aligned} \left\vert \boldsymbol{A}\right\vert =a_{1j}C_{1j}+a_{2j}C_{2j}+\ldots+a_{nj}C_{nj} \end{aligned} $$
(3.224)

for \(i,j=1,\ldots ,n\).

The inverse of the matrix \(\boldsymbol {A}\), denoted \(\boldsymbol {A}^{-1}\), is defined by:

$$\displaystyle \begin{aligned} \boldsymbol{A}^{-1}=\frac{1}{\left\vert \boldsymbol{A}\right\vert } \begin{pmatrix} C_{11}&C_{21}&\cdots&C_{n1}\\ C_{12}&C_{22}&\cdots&C_{n2}\\ \vdots&\vdots&\ddots&\vdots\\ C_{1n}&C_{2n}&\cdots&C_{nn} \end{pmatrix} \end{aligned} $$
(3.225)

Let us mention the following properties of inverse matrices:

$$\displaystyle \begin{aligned} \left\vert \boldsymbol{A}^{-1}\right\vert =\frac{1}{\left\vert \boldsymbol{A} \right\vert } \end{aligned} $$
(3.226)
$$\displaystyle \begin{aligned} \left( \boldsymbol{A}^{-1}\right)^{-1}=\boldsymbol{A} \end{aligned} $$
(3.227)
$$\displaystyle \begin{aligned} \left( \boldsymbol{A}^{-1}\right)^{\prime}=\left( \boldsymbol{A}^{\prime}\right)^{-1} \end{aligned} $$
(3.228)
$$\displaystyle \begin{aligned} \left( \boldsymbol{AB}\right)^{-1}=\boldsymbol{B}^{-1}\boldsymbol{A}^{-1} \end{aligned} $$
(3.229)
$$\displaystyle \begin{aligned} \left( \boldsymbol{ABC}\right)^{-1}=\boldsymbol{C}^{-1}\boldsymbol{B}^{-1} \boldsymbol{A}^{-1} \end{aligned} $$
(3.230)
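
These properties can be verified numerically, as in the sketch below (random matrices, which are almost surely nonsingular):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))   # almost surely nonsingular
B = rng.normal(size=(3, 3))

A_inv = np.linalg.inv(A)
print(np.isclose(np.linalg.det(A_inv), 1.0 / np.linalg.det(A)))     # |A^{-1}| = 1/|A|
print(np.allclose(np.linalg.inv(A_inv), A))                         # (A^{-1})^{-1} = A
print(np.allclose(A_inv.T, np.linalg.inv(A.T)))                     # (A^{-1})' = (A')^{-1}
print(np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ A_inv))  # (AB)^{-1} = B^{-1}A^{-1}
```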

Appendix 3.2: Demonstrations

3.1.1 Appendix 3.2.1: Demonstration of the Minimum Variance Property of OLS Estimators

In order to show that \(\hat {\boldsymbol {\beta }}\) is a minimum variance estimator, suppose there exists another linear estimator \(\boldsymbol {\breve {\beta }}\) of \(\boldsymbol {\beta }\):

$$\displaystyle \begin{aligned} \underset{(k+1,1)}{\boldsymbol{\breve{\beta}}}=\underset{(k+1,T)}{\boldsymbol{M} }\underset{(T,1)}{\boldsymbol{Y}} \end{aligned} $$
(3.231)

We can then write:

$$\displaystyle \begin{aligned} \boldsymbol{\breve{\beta}}=\boldsymbol{M}\left( \boldsymbol{X\beta}+\boldsymbol{\varepsilon }\right) =\boldsymbol{MX\beta}+\boldsymbol{M\varepsilon}{} \end{aligned} $$
(3.232)

Let us decompose \(\boldsymbol {M}\) as:

$$\displaystyle \begin{aligned} \boldsymbol{M}=\left( \boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1}\boldsymbol{X} ^{\prime}+\boldsymbol{N} \end{aligned} $$
(3.233)

where \(\boldsymbol {N}\) is of dimension \((k+1,T)\). By virtue of (3.29), \(\boldsymbol {\breve {\beta }}\) coincides with the OLS estimator if and only if \(\boldsymbol {N}=\boldsymbol {0}\). Let us show that \(\boldsymbol {N}=\boldsymbol {0}\).

Knowing that \(\boldsymbol {\breve {\beta }}\) must be an unbiased estimator of \(\boldsymbol {\beta }\), we have:

$$\displaystyle \begin{aligned} E\left( \boldsymbol{\breve{\beta}}\right) =\boldsymbol{\beta} \end{aligned} $$
(3.234)

Furthermore:

$$\displaystyle \begin{aligned} E\left( \boldsymbol{\breve{\beta}}\right) =E\left( \boldsymbol{MX\beta }+\boldsymbol{M\varepsilon}\right) =\boldsymbol{MX\beta} \end{aligned} $$
(3.235)

because \(E\left ( \boldsymbol {\varepsilon }\right ) =0\). We deduce that \(E\left ( \boldsymbol {\breve {\beta }}\right ) =\boldsymbol {\beta }\) if:

$$\displaystyle \begin{aligned} \boldsymbol{MX}=\boldsymbol{I} \end{aligned} $$
(3.236)

Replacing \(\boldsymbol {M}\) with \(\left ( \boldsymbol {X}^{\prime }\boldsymbol {X} \right )^{-1}\boldsymbol {X}^{\prime }+\boldsymbol {N}\), we have:

$$\displaystyle \begin{aligned} \left[ \left( \boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1}\boldsymbol{X}^{\prime }+\boldsymbol{N}\right] \boldsymbol{X}=\boldsymbol{I} \end{aligned} $$
(3.237)

Knowing that \(\left ( \boldsymbol {X}^{\prime }\boldsymbol {X}\right )^{-1} \boldsymbol {X}^{\prime }\boldsymbol {X}=\boldsymbol {I}\), we get:

$$\displaystyle \begin{aligned} \boldsymbol{I}+\boldsymbol{NX}=\boldsymbol{I} \end{aligned} $$
(3.238)

Hence:

$$\displaystyle \begin{aligned} \boldsymbol{NX}=0{} \end{aligned} $$
(3.239)

Replacing \(\boldsymbol {MX}\) with \(\boldsymbol {I}\) in (3.232), we have:

$$\displaystyle \begin{aligned} \boldsymbol{\breve{\beta}}=\boldsymbol{\beta}+\boldsymbol{M\varepsilon}=E\left( \boldsymbol{\breve{\beta}}\right) +\boldsymbol{M\varepsilon} \end{aligned} $$
(3.240)

Hence:

$$\displaystyle \begin{aligned} \boldsymbol{\breve{\beta}}-\boldsymbol{\beta}=\boldsymbol{\breve{\beta}}-E\left( \boldsymbol{\breve{\beta}}\right) =\boldsymbol{M\varepsilon} \end{aligned} $$
(3.241)

Let us now determine the variance-covariance matrix \(\boldsymbol {\Omega }_{\boldsymbol {\breve {\beta }}}\) of \(\boldsymbol {\breve {\beta }}\):

$$\displaystyle \begin{aligned} \boldsymbol{\Omega}_{\boldsymbol{\breve{\beta}}}&=E\left[ \left( \boldsymbol{\breve {\beta}}-\boldsymbol{\beta}\right) \left( \boldsymbol{\breve{\beta}}-\boldsymbol{\beta }\right)^{\prime}\right] \\&=E\left[ \left( \boldsymbol{M\varepsilon}\right) \left( \boldsymbol{M\varepsilon }\right)^{\prime}\right] \\&=E\left[ \boldsymbol{M\varepsilon\boldsymbol{\varepsilon}^{\prime}M}^{\prime }\right] \end{aligned} $$
(3.242)

Since \(E\left( \boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^{\prime}\right) =\sigma_{\varepsilon}^{2}\boldsymbol{I}\), we get:

$$\displaystyle \begin{aligned} \boldsymbol{\Omega}_{\boldsymbol{\breve{\beta}}}=\sigma_{\varepsilon}^{2} \boldsymbol{MM}^{\prime} \end{aligned} $$
(3.243)

Let us determine the matrix product \(\boldsymbol {MM}^{\prime }\):

$$\displaystyle \begin{aligned} \boldsymbol{MM}^{\prime}&=\left[ \left( \boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1}\boldsymbol{X}^{\prime}+\boldsymbol{N}\right] \left[ \left( \boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1}\boldsymbol{X}^{\prime}+\boldsymbol{N}\right]^{\prime}\\ &=\left[ \left( \boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1}\boldsymbol{X}^{\prime}+\boldsymbol{N}\right] \left[ \boldsymbol{X}\left( \boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1}+\boldsymbol{N}^{\prime}\right]\\ &=\left( \boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1}\boldsymbol{X}^{\prime}\boldsymbol{X}\left( \boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1}+\left( \boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1}\boldsymbol{X}^{\prime}\boldsymbol{N}^{\prime}+\boldsymbol{NX}\left( \boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1}+\boldsymbol{NN}^{\prime} \end{aligned} $$
(3.244)

According to (3.239), \(\boldsymbol {NX=}\left ( \boldsymbol {X}^{\prime }\boldsymbol {N}^{\prime }\right )^{\prime }=0\), hence:

$$\displaystyle \begin{aligned} \boldsymbol{MM}^{\prime}=\left( \boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1}+\boldsymbol{NN}^{\prime} \end{aligned} $$
(3.245)

So we have:

$$\displaystyle \begin{aligned} \boldsymbol{\Omega}_{\boldsymbol{\breve{\beta}}}=\sigma_{\varepsilon}^{2}\left( \left( \boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1}+\boldsymbol{NN}^{\prime }\right)\end{aligned} $$
(3.246)

For \(\boldsymbol {\breve {\beta }}\) to have minimal variance and knowing that the variances lie on the diagonal of \(\boldsymbol {\Omega }_{\boldsymbol {\breve {\beta }}}\), we need to minimize the diagonal elements of \(\boldsymbol {\Omega }_{\boldsymbol {\breve {\beta }}}\). Since the diagonal elements of \(\left ( \boldsymbol {X}^{\prime }\boldsymbol {X}\right )^{-1}\) are constants, the diagonal elements of the matrix \(\boldsymbol {NN}^{\prime }\) must be minimized. If we denote \(n_{ij}\) the elements of the matrix \(\boldsymbol {N}\), where i stands for the row and j for the column, the diagonal elements of the matrix \(\boldsymbol {NN}^{\prime }\) are given by: \(\sum \limits _{j}n_{ij}^{2}\). These elements are minimal if \(\sum \limits _{j}n_{ij}^{2}=0\), or \(n_{ij}=0\) \(\forall i,\forall j\). We deduce:

$$\displaystyle \begin{aligned} \boldsymbol{N}=0\end{aligned} $$
(3.247)

Therefore:

$$\displaystyle \begin{aligned} \boldsymbol{\breve{\beta}}=\hat{\boldsymbol{\beta}} \end{aligned} $$
(3.248)

It follows that the OLS estimator \(\hat {\boldsymbol {\beta }}\) is of minimum variance.
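
The result can also be illustrated by simulation. The sketch below (not from the chapter; the design matrix, the matrix D used to construct N, the sample size, and the number of replications are arbitrary choices) builds an alternative linear unbiased estimator with \(\boldsymbol{M}=\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1}\boldsymbol{X}^{\prime}+\boldsymbol{N}\) and \(\boldsymbol{NX}=\boldsymbol{0}\), and compares its empirical variances with those of OLS:

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 100, 2
X = np.column_stack([np.ones(T), rng.normal(size=(T, k))])  # constant + k regressors
beta = np.array([1.0, 0.5, -2.0])
XtX_inv = np.linalg.inv(X.T @ X)

# Alternative linear unbiased estimator: M = (X'X)^{-1}X' + N with NX = 0.
D = 0.05 * rng.normal(size=(k + 1, T))                      # arbitrary matrix
N = D @ (np.eye(T) - X @ XtX_inv @ X.T)                     # ensures NX = 0
M = XtX_inv @ X.T + N

ols, alt = [], []
for _ in range(5000):
    y = X @ beta + rng.normal(size=T)                       # sigma_eps = 1
    ols.append(XtX_inv @ X.T @ y)                           # OLS estimate
    alt.append(M @ y)                                       # alternative estimate

print(np.array(ols).var(axis=0))   # empirical variances of the OLS coefficients
print(np.array(alt).var(axis=0))   # larger for every coefficient (extra NN' term)
```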

3.1.2 Appendix 3.2.2: Calculation of the Error Variance

In order to estimate the variance \(\sigma _{\varepsilon }^{2}\) of the errors, we need to use the residuals \(\boldsymbol {e}\):

$$\displaystyle \begin{aligned} \boldsymbol{e}=\boldsymbol{Y}-\boldsymbol{X\hat{\beta}} \end{aligned} $$
(3.249)

We have:

$$\displaystyle \begin{aligned} \boldsymbol{e}&=\boldsymbol{X\beta}+\boldsymbol{\varepsilon}-\boldsymbol{X}\left( \boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1}\boldsymbol{X}^{\prime}\boldsymbol{Y}\\&=\boldsymbol{X\beta}+\boldsymbol{\varepsilon}-\boldsymbol{X}\left( \boldsymbol{X}^{\prime }\boldsymbol{X}\right)^{-1}\boldsymbol{X}^{\prime}\left( \boldsymbol{X\beta }+\boldsymbol{\varepsilon}\right) \end{aligned} $$
(3.250)

Hence:

$$\displaystyle \begin{aligned} \boldsymbol{e}=\boldsymbol{\varepsilon}-\boldsymbol{X}\left( \boldsymbol{X}^{\prime }\boldsymbol{X}\right)^{-1}\boldsymbol{X}^{\prime}\boldsymbol{\varepsilon} \end{aligned} $$
(3.251)

Noting \(\boldsymbol {P}=\boldsymbol {I}-\boldsymbol {X}\left ( \boldsymbol {X}^{\prime }\boldsymbol {X}\right )^{-1}\boldsymbol {X}^{\prime }\), we can write:

$$\displaystyle \begin{aligned} \boldsymbol{e}=\boldsymbol{P\varepsilon} \end{aligned} $$
(3.252)

Let us study the properties of the matrix \(\boldsymbol {P}\).

  • \(\boldsymbol {P}^{\prime }=\left [ \boldsymbol {I}-\boldsymbol {X}\left ( \boldsymbol {X} ^{\prime }\boldsymbol {X}\right )^{-1}\boldsymbol {X}^{\prime }\right ]^{\prime }=\boldsymbol {I}-\boldsymbol {X}\left ( \boldsymbol {X}^{\prime }\boldsymbol {X}\right )^{-1}\boldsymbol {X}^{\prime }=\boldsymbol {P}\). \(\boldsymbol {P}\) is therefore a symmetric matrix.

  • \(\boldsymbol {P}^{2}=\left [ \boldsymbol {I}-\boldsymbol {X}\left ( \boldsymbol {X}^{\prime }\boldsymbol {X}\right )^{-1}\boldsymbol {X}^{\prime }\right ] \left [ \boldsymbol {I} -\boldsymbol {X}\left ( \boldsymbol {X}^{\prime }\boldsymbol {X}\right )^{-1}\boldsymbol {X} ^{\prime }\right ] =\boldsymbol {P}\). \(\boldsymbol {P}\) is therefore an idempotent matrix.

We have:

$$\displaystyle \begin{aligned} \boldsymbol{e}^{\prime}\boldsymbol{e}=\boldsymbol{\varepsilon}^{\prime}\boldsymbol{P}^{\prime }\boldsymbol{P\varepsilon=\varepsilon}^{\prime}\boldsymbol{P\varepsilon} \end{aligned} $$
(3.253)

by virtue of the idempotency and symmetry properties of the matrix \(\boldsymbol {P}\).

Let us now determine the mathematical expectation of \(\boldsymbol {e}^{\prime }\boldsymbol {e}\):

$$\displaystyle \begin{aligned} E\left( \boldsymbol{e}^{\prime}\boldsymbol{e}\right) =E\left( \boldsymbol{\varepsilon }^{\prime}\boldsymbol{P\varepsilon}\right)\end{aligned} $$
(3.254)

Since \(\boldsymbol {\varepsilon }^{\prime }\boldsymbol {P\varepsilon }\) is a scalar and noting Tr the trace, we have:

$$\displaystyle \begin{aligned} E\left( \boldsymbol{e}^{\prime}\boldsymbol{e}\right) =E\left[ Tr\left( \boldsymbol{\varepsilon}^{\prime}\boldsymbol{P\varepsilon}\right) \right]\end{aligned} $$
(3.255)

or:

$$\displaystyle \begin{aligned} E\left( \boldsymbol{e}^{\prime}\boldsymbol{e}\right) =E\left[ Tr\left( \boldsymbol{P\varepsilon\boldsymbol{\varepsilon}^{\prime}}\right) \right]\end{aligned} $$
(3.256)

using the fact that \(Tr(\boldsymbol {AB})=Tr(\boldsymbol {BA})\) with \(\boldsymbol {A} =\boldsymbol {\varepsilon }^{\prime }\) and \(\boldsymbol {B}=\boldsymbol {P\varepsilon }\). Interchanging the expectation and trace operators and using \(E\left( \boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^{\prime}\right) =\sigma_{\varepsilon}^{2}\boldsymbol{I}\), we get:

$$\displaystyle \begin{aligned} E\left( \boldsymbol{e}^{\prime}\boldsymbol{e}\right) =\sigma_{\varepsilon} ^{2}Tr\left( \boldsymbol{P}\right)\end{aligned} $$
(3.257)

It remains for us to determine the trace of the matrix \(\boldsymbol {P}\):

$$\displaystyle \begin{aligned} Tr\left( \boldsymbol{P}\right)&=Tr\left[ \boldsymbol{I}-\boldsymbol{X}\left( \boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1}\boldsymbol{X}^{\prime}\right] \\&=Tr\boldsymbol{I}-Tr\left[ \boldsymbol{X}\left( \boldsymbol{X}^{\prime}\boldsymbol{X} \right)^{-1}\boldsymbol{X}^{\prime}\right] \\&=Tr\boldsymbol{I}-Tr\left[ \left( \boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1}\boldsymbol{X}^{\prime}\boldsymbol{X}\right] \end{aligned} $$
(3.258)

\(Tr\boldsymbol {I}=T\) and \(Tr\left [ \left ( \boldsymbol {X}^{\prime }\boldsymbol {X} \right )^{-1}\boldsymbol {X}^{\prime }\boldsymbol {X}\right ] =k+1\) since the matrix \(\boldsymbol {X}^{\prime }\boldsymbol {X}\) is of size \(\left ( k+1,k+1\right )\). We deduce:

$$\displaystyle \begin{aligned} Tr\left( \boldsymbol{P}\right) =T-k-1\end{aligned} $$
(3.259)

Finally:

$$\displaystyle \begin{aligned} E\left( \boldsymbol{e}^{\prime}\boldsymbol{e}\right) =\sigma_{\varepsilon} ^{2}\left( T-k-1\right)\end{aligned} $$
(3.260)

It follows that the estimator \(\hat {\sigma }_{\varepsilon }^{2}\) of the error variance is written as:

$$\displaystyle \begin{aligned} \hat{\sigma}_{\varepsilon}^{2}=\frac{\boldsymbol{e}^{\prime}\boldsymbol{e}} {T-k-1}\equiv\frac{1}{T-k-1}\sum_{t=1}^{T}e_{t}^{2}{} \end{aligned} $$
(3.261)

This is an unbiased estimator of \(\sigma _{\varepsilon }^{2}\).
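
A Monte Carlo check of this unbiasedness property (a sketch; the values of T, k, \(\sigma_{\varepsilon}\), the coefficients, and the number of replications are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
T, k, sigma = 50, 3, 2.0
X = np.column_stack([np.ones(T), rng.normal(size=(T, k))])
beta = rng.normal(size=k + 1)
XtX_inv = np.linalg.inv(X.T @ X)

estimates = []
for _ in range(10000):
    y = X @ beta + sigma * rng.normal(size=T)
    e = y - X @ (XtX_inv @ X.T @ y)            # OLS residuals
    estimates.append(e @ e / (T - k - 1))      # unbiased variance estimator

print(np.mean(estimates))   # close to sigma^2 = 4
```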

3.1.3 Appendix 3.2.3: Significance Tests of Several Coefficients

In order to derive the various significance tests, we need to determine the distribution followed by \(\boldsymbol {R\beta }\). \(\boldsymbol {\beta }\) being unknown, let us replace it by its estimator:

$$\displaystyle \begin{aligned} \boldsymbol{R\hat{\beta}}=\boldsymbol{r} \end{aligned} $$
(3.262)

and determine the distribution followed by \(\boldsymbol {R\hat {\beta }}\). Knowing that \(\hat {\boldsymbol {\beta }}\) is an unbiased estimator of \(\boldsymbol {\beta }\), we can write:

$$\displaystyle \begin{aligned} E\left( \boldsymbol{R\hat{\beta}}\right) =\boldsymbol{R\beta} \end{aligned} $$
(3.263)

Furthermore:

$$\displaystyle \begin{aligned} V\left( \boldsymbol{R\hat{\beta}}\right) =E\left[ \boldsymbol{R}\left( \hat{\boldsymbol{\beta}}-\boldsymbol{\beta}\right) \left( \hat{\boldsymbol{\beta} }-\boldsymbol{\beta}\right)^{\prime}\boldsymbol{R}^{\prime}\right]\end{aligned} $$
(3.264)

Hence:

$$\displaystyle \begin{aligned} V\left( \boldsymbol{R\hat{\beta}}\right) =\sigma_{\varepsilon}^{2} \boldsymbol{R}\left( \boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1}\boldsymbol{R} ^{\prime}{} \end{aligned} $$
(3.265)

We know that \(\hat {\boldsymbol {\beta }}\) follows a normal distribution with \(\left ( k+1\right )\) dimensions, therefore:

$$\displaystyle \begin{aligned} \boldsymbol{R\hat{\beta}}\sim N\left( \boldsymbol{R\beta},\sigma_{\varepsilon }^{2}\boldsymbol{R}\left( \boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1} \boldsymbol{R}^{\prime}\right)\end{aligned} $$
(3.266)

and:

$$\displaystyle \begin{aligned} \boldsymbol{R}\left( \hat{\boldsymbol{\beta}}-\boldsymbol{\beta}\right) \sim N\left( 0,\sigma_{\varepsilon}^{2}\boldsymbol{R}\left( \boldsymbol{X}^{\prime}\boldsymbol{X} \right)^{-1}\boldsymbol{R}^{\prime}\right)\end{aligned} $$
(3.267)

Under the null hypothesis \(\boldsymbol {R\beta }=\boldsymbol {r}\), we therefore have:

$$\displaystyle \begin{aligned} \boldsymbol{R\hat{\beta}}-\boldsymbol{r}\sim N\left( 0,\sigma_{\varepsilon} ^{2}\boldsymbol{R}\left( \boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1} \boldsymbol{R}^{\prime}\right)\end{aligned} $$
(3.268)

Using the result that if \(\boldsymbol {w}\sim N\left ( 0,\boldsymbol {\Sigma }\right )\), where \(\boldsymbol {\Sigma }\) is a nonsingular matrix of size \((K,K)\), then \(\boldsymbol {w}^{\prime }\boldsymbol {\Sigma }^{-1}\boldsymbol {w}\sim \chi _{K}^{2}\), we obtain:

$$\displaystyle \begin{aligned} \left( \boldsymbol{R\hat{\beta}}-\boldsymbol{r}\right)^{\prime}\left[ \sigma_{\varepsilon}^{2}\boldsymbol{R}\left( \boldsymbol{X}^{\prime}\boldsymbol{X} \right)^{-1}\boldsymbol{R}^{\prime}\right]^{-1}\left( \boldsymbol{R\hat{\beta} }-\boldsymbol{r}\right) \sim\chi_{q}^{2} \end{aligned} $$
(3.269)

Knowing that:

$$\displaystyle \begin{aligned} \frac{\boldsymbol{e}^{\prime}\boldsymbol{e}}{\sigma_{\varepsilon}^{2}}\sim\chi_{T-k-1}^{2} \end{aligned} $$
(3.270)

and using the result (see Box 2.2 in Chap. 2) that if \(w\sim \chi _{s}^{2}\) and \(v\sim \chi _{r}^{2}\), the statistic \(F=\frac {w/s}{v/r}\) follows a Fisher distribution with \(\left ( s,r\right )\) degrees of freedom, we deduce:

$$\displaystyle \begin{aligned} F=\frac{\left( \boldsymbol{R\hat{\beta}}-\boldsymbol{r}\right)^{\prime}\left[ \boldsymbol{R}\left( \boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1}\boldsymbol{R} ^{\prime}\right]^{-1}\left( \boldsymbol{R\hat{\beta}}-\boldsymbol{r}\right) /q}{\boldsymbol{e}^{\prime}\boldsymbol{e/}\left( T-k-1\right) }\sim F\left( q,T-k-1\right) {} \end{aligned} $$
(3.271)

We then have the following decision rule:

  • If \(F\leq F\left ( q,T-k-1\right )\), the null hypothesis is not rejected, i.e.: \(\boldsymbol {R\beta }=\boldsymbol {r}\).

  • If \(F>F\left ( q,T-k-1\right )\), the null hypothesis is rejected.
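
The statistic (3.271) is straightforward to compute. The sketch below is purely illustrative: the data are simulated and the restrictions \(\boldsymbol{R\beta}=\boldsymbol{r}\) (two exclusion restrictions, q = 2) are chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 80, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k))])
y = X @ np.array([1.0, 0.0, 0.0, 2.0]) + rng.normal(size=T)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat

# H0: beta_1 = 0 and beta_2 = 0, i.e., q = 2 restrictions R beta = r.
R = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
r = np.zeros(2)
q = R.shape[0]

diff = R @ beta_hat - r
F = (diff @ np.linalg.inv(R @ XtX_inv @ R.T) @ diff / q) / (e @ e / (T - k - 1))
print(F)   # to be compared with the critical value of F(q, T-k-1)
```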

Let us now return to the three special cases studied—test on a single coefficient, test on all coefficients, and test on a subset of coefficients—to specify the expression of the test in each of these cases.

  • Test on a particular regression coefficient \(\beta _{i}\). This case corresponds to the null hypothesis \(\beta _{i}=0\), i.e.:

    $$\displaystyle \begin{aligned} \boldsymbol{R}=[0\cdots010\cdots0]\text{ and }\boldsymbol{r}=0\end{aligned} $$
    (3.272)

    We then have \(\boldsymbol{R\hat{\beta}}-\boldsymbol{r}=\hat{\beta}_{i}\) and the quadratic form \(\boldsymbol{R}\left( \boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1}\boldsymbol{R}^{\prime}\) is equal to the \(\left( i+1\right)\)th element of the diagonal of the matrix \(\left( \boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1}\), i.e., \(a_{i+1,i+1}\).

    The test statistic given in (3.271) becomes:

    $$\displaystyle \begin{aligned} F=\frac{\hat{\beta}_{i}^{2}\left[ a_{i+1,i+1}\right]^{-1}/1}{\hat{\sigma }_{\varepsilon}^{2}}\sim F\left( 1,T-k-1\right)\end{aligned} $$
    (3.273)

    that is finally:

    $$\displaystyle \begin{aligned} F=\frac{\hat{\beta}_{i}^{2}}{\hat{\sigma}_{\varepsilon}^{2}a_{i+1,i+1}}\sim F\left( 1,T-k-1\right)\end{aligned} $$
    (3.274)

    This test is equivalent to a Student’s significance test on a single coefficient since \(\left [ t\left ( T-k-1\right ) \right ]^{2}=F\left ( 1,T-k-1\right )\). The same reasoning applies to the test of the null hypothesis \(\beta _{i}=\beta _{0}\).

  • Test of significance of all coefficients. This case corresponds to the null hypothesis \(\beta _{1}=\beta _{2}=\cdots =\beta _{k}=0\), i.e.:

    $$\displaystyle \begin{aligned} \boldsymbol{R}= \begin{pmatrix} 0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & \cdots & \cdots & 0 & 1\end{pmatrix} \text{ and }\boldsymbol{r}= \begin{pmatrix} 0\\ 0\\ \vdots\\ 0\end{pmatrix} \end{aligned} $$
    (3.275)

    We then have \(\boldsymbol {R\hat {\beta }}-\boldsymbol {r}=\boldsymbol {\bar {\beta }}\) with \(\boldsymbol {\bar {\beta }}= \begin {pmatrix} \hat {\beta }_{1}&\hat {\beta }_{2}&\cdots &\hat {\beta }_{k} \end {pmatrix} ^{\prime }\), i.e., \(\boldsymbol {\bar {\beta }}\) is the vector of OLS coefficients without the constant term. Furthermore, the matrix \(\boldsymbol {R}\left ( \boldsymbol {X}^{\prime }\boldsymbol {X}\right )^{-1}\boldsymbol {R} ^{\prime }\) involved in calculating the test statistic (Eq. (3.271)) is equal to the submatrix of size \((k,k)\) obtained by deleting the first row and column of the matrix \(\left ( \boldsymbol {X}^{\prime }\boldsymbol {X}\right )^{-1}\). To clarify the expression of this submatrix, let us decompose the matrix \(\boldsymbol {X}\) into two blocks:

    $$\displaystyle \begin{aligned} \boldsymbol{X}= \begin{pmatrix} \boldsymbol{\bar{x}} & \boldsymbol{\bar{X}} \end{pmatrix} \end{aligned} $$
    (3.276)

    where \(\boldsymbol {\bar {x}}\) denotes a column vector of ones and \(\boldsymbol {\bar {X}}\) is the matrix of size \((T,k)\) comprising the values of the k explanatory variables. We then have:

    $$\displaystyle \begin{aligned} \boldsymbol{X}^{\prime}\boldsymbol{X}= \begin{pmatrix} T&\boldsymbol{\bar{x}}^{\prime}\boldsymbol{\bar{X}}\\ \boldsymbol{\bar{X}}^{\prime}\boldsymbol{\bar{x}}&\boldsymbol{\bar{X}}^{\prime }\boldsymbol{\bar{X}} \end{pmatrix} \end{aligned} $$
    (3.277)

    The calculation of \(\left ( \boldsymbol {X}^{\prime }\boldsymbol {X}\right )^{-1}\) shows us that the submatrix of size \((k,k)\) that interests us here is written as:

    $$\displaystyle \begin{aligned} \left( \boldsymbol{\bar{X}}^{\prime}\boldsymbol{\bar{X}-\bar{X}}^{\prime} \boldsymbol{\bar{x}}T^{-1}\boldsymbol{\bar{x}}^{\prime}\boldsymbol{\bar{X}}\right)^{-1}=\left( \boldsymbol{\bar{X}}^{\prime}\boldsymbol{Z\bar{X}}\right)^{-1} \end{aligned} $$
    (3.278)

    where \(\boldsymbol {Z}\) is the transformation matrix given by:

    $$\displaystyle \begin{aligned} \boldsymbol{Z}=\boldsymbol{I}-T^{-1}\boldsymbol{\bar{x}\bar{x}}^{\prime} \end{aligned} $$
    (3.279)

    Relationship (3.271) then becomes:

    $$\displaystyle \begin{aligned} F=\frac{\boldsymbol{\bar{\beta}}^{\prime}\left( \boldsymbol{\bar{X}}^{\prime }\boldsymbol{Z\bar{X}}\right) \boldsymbol{\bar{\beta}}/q}{\boldsymbol{e}^{\prime }\boldsymbol{e/}\left( T-k-1\right) } \end{aligned} $$
    (3.280)

    or, knowing that \(q=k\):

    $$\displaystyle \begin{aligned} F=\frac{\boldsymbol{\bar{\beta}}^{\prime}\left( \boldsymbol{\bar{X}}^{\prime }\boldsymbol{Z\bar{X}}\right) \boldsymbol{\bar{\beta}}/k}{\boldsymbol{e}^{\prime }\boldsymbol{e/}\left( T-k-1\right) }{} \end{aligned} $$
    (3.281)

    The decision rule is given by:

    • If \(F\leq F\left ( q,T-k-1\right )\), the null hypothesis that none of the explanatory variables is significant is not rejected.

    • If \(F>F\left ( q,T-k-1\right )\), the null hypothesis is rejected.

    As we have seen in the chapter, this test can also be approached through the analysis-of-variance equation.

  • Test of significance of a subset of coefficients. This case corresponds to the null hypothesis: \(\beta _{k-s+1}=\beta _{k-s+2}=\cdots =\beta _{k}=0\), i.e.:

    $$\displaystyle \begin{aligned} \boldsymbol{R}=[ \begin{array} [c]{cc} \boldsymbol{0} & \boldsymbol{I}_{s} \end{array} ]\text{ and }\boldsymbol{r}=\boldsymbol{0} \end{aligned} $$
    (3.282)

    Let us decompose the matrix \(\boldsymbol {X}\) and the vector \(\boldsymbol {\beta }\) into blocks so that:

    $$\displaystyle \begin{aligned} \boldsymbol{Y}&= \begin{pmatrix} \boldsymbol{X}_{r}&\boldsymbol{X}_{s} \end{pmatrix} \begin{pmatrix} \hat{\boldsymbol{\beta}}_{r}\\ \hat{\boldsymbol{\beta}}_{s} \end{pmatrix} +\boldsymbol{e}{}\\&=\boldsymbol{X}_{r}\hat{\boldsymbol{\beta}}_{r}+\boldsymbol{X}_{s}\hat{\boldsymbol{\beta} }_{s}+\boldsymbol{e}\end{aligned} $$
    (3.283)

    where the matrix \(\boldsymbol {X}_{r}\) is formed by the \(\left ( k+1-s\right )\) first columns of \(\boldsymbol {X}\) and \(\boldsymbol {X}_{\boldsymbol {s}}\) is formed by the s remaining columns of the matrix \(\boldsymbol {X}\). We then have: \(\boldsymbol {R\hat {\beta }}-\boldsymbol {r}=\hat {\boldsymbol {\beta }}_{s}\). Furthermore, the matrix \(\boldsymbol {R}\left ( \boldsymbol {X}^{\prime }\boldsymbol {X}\right )^{-1}\boldsymbol {R}^{\prime }\) involved in calculating the test statistic (Eq. (3.271)) is equal to the submatrix of order s obtained by deleting the \(\left ( k+1-s\right )\) first rows and columns of the matrix \(\left ( \boldsymbol {X}^{\prime }\boldsymbol {X}\right )^{-1}\). Let us explain the form of this submatrix. We have:

    $$\displaystyle \begin{aligned} \boldsymbol{X}^{\prime}\boldsymbol{X}= \begin{pmatrix} \boldsymbol{X}_{\boldsymbol{r}}^{\prime}\boldsymbol{X}_{\boldsymbol{r}}&\boldsymbol{X} _{\boldsymbol{r}}^{\prime}\boldsymbol{X}_{\boldsymbol{s}}\\ \boldsymbol{X}_{\boldsymbol{s}}^{\prime}\boldsymbol{X}_{\boldsymbol{r}}&\boldsymbol{X} _{\boldsymbol{s}}^{\prime}\boldsymbol{X}_{\boldsymbol{s}} \end{pmatrix} \end{aligned} $$
    (3.284)

    The calculation of \(\left ( \boldsymbol {X}^{\prime }\boldsymbol {X}\right )^{-1}\) shows us that the submatrix we are interested in here is written as:

    $$\displaystyle \begin{aligned} \left( \boldsymbol{X}_{\boldsymbol{s}}^{\prime}\boldsymbol{X}_{\boldsymbol{s}}-\boldsymbol{X} _{\boldsymbol{s}}^{\prime}\boldsymbol{X}_{\boldsymbol{r}}\left( \boldsymbol{X}_{\boldsymbol{r} }^{\prime}\boldsymbol{X}_{\boldsymbol{r}}\right)^{-1}\boldsymbol{X}_{\boldsymbol{r}} ^{\prime}\boldsymbol{X}_{\boldsymbol{s}}\right)^{-1}&=\left[ \boldsymbol{X} _{\boldsymbol{s}}^{\prime}\left( \boldsymbol{I}-\boldsymbol{X}_{\boldsymbol{r}}\left( \boldsymbol{X}_{\boldsymbol{r}}^{\prime}\boldsymbol{X}_{\boldsymbol{r}}\right)^{-1} \boldsymbol{X}_{\boldsymbol{r}}^{\prime}\right) \boldsymbol{X}_{\boldsymbol{s}}\right]^{-1}\\&=\left[ \boldsymbol{X}_{\boldsymbol{s}}^{\prime}\boldsymbol{Z}_{r}\boldsymbol{X} _{\boldsymbol{s}}\right]^{-1} \end{aligned} $$
    (3.285)

    where \(\boldsymbol {Z}_{\boldsymbol {r}}\) is the transformation matrix given by:

    $$\displaystyle \begin{aligned} \boldsymbol{Z}_{\boldsymbol{r}}=\boldsymbol{I}-\boldsymbol{X}_{\boldsymbol{r}}\left( \boldsymbol{X}_{\boldsymbol{r}}^{\prime}\boldsymbol{X}_{\boldsymbol{r}}\right)^{-1} \boldsymbol{X}_{\boldsymbol{r}}^{\prime} \end{aligned} $$
    (3.286)

    Relationship (3.271) then becomes:

    $$\displaystyle \begin{aligned} F=\frac{\hat{\boldsymbol{\beta}}_{s}^{\prime}\left( \boldsymbol{X}_{\boldsymbol{s} }^{\prime}\boldsymbol{Z}_{r}\boldsymbol{X}_{\boldsymbol{s}}\right) \hat{\boldsymbol{\beta} }_{s}/s}{\boldsymbol{e}^{\prime}\boldsymbol{e/}\left( T-k-1\right) } \end{aligned} $$
    (3.287)

    Let us now explain the expression of the numerator. To do this, consider the regression of \(\boldsymbol {Y}\) on the explanatory variables listed in \(\boldsymbol {X}_{\boldsymbol {r}}\) and note \(\boldsymbol {e} _{r}\) the residuals resulting from this regression:

    $$\displaystyle \begin{aligned} \boldsymbol{e}_{r}&=\boldsymbol{Y}-\boldsymbol{X}_{r}\hat{\boldsymbol{\beta}}_{r}\\&=\boldsymbol{Y}-\boldsymbol{X}_{r}\left( \left( \boldsymbol{X}_{\boldsymbol{r}}^{\prime }\boldsymbol{X}_{\boldsymbol{r}}\right)^{-1}\boldsymbol{X}_{\boldsymbol{r}}^{\prime }\boldsymbol{Y}\right) \\&=\boldsymbol{Z}_{\boldsymbol{r}}\boldsymbol{Y}\end{aligned} $$
    (3.288)

    Let us multiply each member of (3.283) by the matrix \(\boldsymbol {Z} _{\boldsymbol {r}}\):

    $$\displaystyle \begin{aligned} \boldsymbol{Z}_{\boldsymbol{r}}\boldsymbol{Y}=\boldsymbol{Z}_{\boldsymbol{r}}\boldsymbol{X} _{r}\hat{\boldsymbol{\beta}}_{r}+\boldsymbol{Z}_{\boldsymbol{r}}\boldsymbol{X}_{s} \hat{\boldsymbol{\beta}}_{s}+\boldsymbol{Z}_{\boldsymbol{r}}\boldsymbol{e} \end{aligned} $$
    (3.289)

    We have:

    • \(\boldsymbol {Z}_{\boldsymbol {r}}\boldsymbol {X}_{r}=\boldsymbol {X}_{r}-\boldsymbol {X} _{\boldsymbol {r}}\left ( \boldsymbol {X}_{\boldsymbol {r}}^{\prime }\boldsymbol {X}_{\boldsymbol {r} }\right )^{-1}\boldsymbol {X}_{\boldsymbol {r}}^{\prime }\boldsymbol {X}_{r}=\boldsymbol {0}\)

    • \(\boldsymbol {Z}_{\boldsymbol {r}}=\boldsymbol {Z}_{\boldsymbol {r}}^{2}=\boldsymbol {Z} _{\boldsymbol {r}}^{\prime }\) (idempotent and symmetric matrix)

    • \(\boldsymbol {Z}_{\boldsymbol {r}}\boldsymbol {e}=\boldsymbol {e}\) because:

      $$\displaystyle \begin{aligned} \boldsymbol{Z}_{\boldsymbol{r}}\boldsymbol{e}&=\left( \boldsymbol{I}-\boldsymbol{X} _{\boldsymbol{r}}\left( \boldsymbol{X}_{\boldsymbol{r}}^{\prime}\boldsymbol{X}_{\boldsymbol{r} }\right)^{-1}\boldsymbol{X}_{\boldsymbol{r}}^{\prime}\right) \left( \boldsymbol{Y} -\boldsymbol{X\hat{\beta}}\right) \\&=\boldsymbol{Y}-\boldsymbol{X\hat{\beta}}-\boldsymbol{X}_{\boldsymbol{r}}\left( \boldsymbol{X}_{\boldsymbol{r}}^{\prime}\boldsymbol{X}_{\boldsymbol{r}}\right)^{-1} \boldsymbol{X}_{\boldsymbol{r}}^{\prime}\boldsymbol{Y+X}_{\boldsymbol{r}}\left( \boldsymbol{X}_{\boldsymbol{r}}^{\prime}\boldsymbol{X}_{\boldsymbol{r}}\right)^{-1} \boldsymbol{X}_{\boldsymbol{r}}^{\prime}\boldsymbol{X\hat{\beta}}\\&=\boldsymbol{Y}-\boldsymbol{X\hat{\beta}}-\boldsymbol{X}_{\boldsymbol{r}}\hat{\boldsymbol{\beta }}_{r}\boldsymbol{+X}_{\boldsymbol{r}}\left( \boldsymbol{X}_{\boldsymbol{r}}^{\prime }\boldsymbol{X}_{\boldsymbol{r}}\right)^{-1}\boldsymbol{X}_{\boldsymbol{r}}^{\prime}\left( \boldsymbol{Y}-\boldsymbol{e}\right) \end{aligned} $$
      (3.290)

      Let us find the value of \(\boldsymbol {X}_{\boldsymbol {r}}^{\prime }\boldsymbol {e}\). We know that:

      $$\displaystyle \begin{aligned} \left( \boldsymbol{X}^{\prime}\boldsymbol{X}\right) \hat{\boldsymbol{\beta}}&=\boldsymbol{X}^{\prime}\boldsymbol{Y}\\&=\boldsymbol{X}^{\prime}\left( \boldsymbol{X\hat{\beta}+e}\right) \end{aligned} $$
      (3.291)

      We therefore deduce that:

      $$\displaystyle \begin{aligned} \boldsymbol{X}^{\prime}\boldsymbol{e=} \begin{pmatrix} \boldsymbol{X}_{\boldsymbol{r}}^{\prime}\boldsymbol{e}\\ \boldsymbol{X}_{\boldsymbol{s}}^{\prime}\boldsymbol{e} \end{pmatrix} =\boldsymbol{0} \end{aligned} $$
      (3.292)

      Finally, we have:

      $$\displaystyle \begin{aligned} \boldsymbol{Z}_{\boldsymbol{r}}\boldsymbol{e}=\boldsymbol{Y}-\boldsymbol{X\hat{\beta}} -\boldsymbol{X}_{\boldsymbol{r}}\hat{\boldsymbol{\beta}}_{r}+\boldsymbol{X}_{\boldsymbol{r} }\hat{\boldsymbol{\beta}}_{r}=\boldsymbol{e} \end{aligned}$$

      Hence:

    $$\displaystyle \begin{aligned} \boldsymbol{Z}_{\boldsymbol{r}}\boldsymbol{Y}=\boldsymbol{Z}_{\boldsymbol{r}}\boldsymbol{X} _{s}\hat{\boldsymbol{\beta}}_{s}+\boldsymbol{e} \end{aligned} $$
    (3.293)

    Let us multiply each member of this relation by its transpose; the cross-product terms vanish since \(\boldsymbol{Z}_{\boldsymbol{r}}\boldsymbol{e}=\boldsymbol{e}\) and \(\boldsymbol{X}_{\boldsymbol{s}}^{\prime}\boldsymbol{e}=\boldsymbol{0}\):

    $$\displaystyle \begin{aligned} \boldsymbol{Y}^{\prime}\boldsymbol{Z}_{\boldsymbol{r}}^{\prime}\boldsymbol{Z}_{\boldsymbol{r} }\boldsymbol{Y}=\hat{\boldsymbol{\beta}}_{s}^{\prime}\boldsymbol{X}_{s}^{\prime} \boldsymbol{Z}_{\boldsymbol{r}}^{\prime}\boldsymbol{Z}_{\boldsymbol{r}}\boldsymbol{X} _{s}\hat{\boldsymbol{\beta}}_{s}+\boldsymbol{e}^{\prime}\boldsymbol{e} \end{aligned} $$
    (3.294)

    Knowing that \(\boldsymbol {Z}_{\boldsymbol {r}}=\boldsymbol {Z}_{\boldsymbol {r}}^{\prime }\), we get:

    $$\displaystyle \begin{aligned} \boldsymbol{Y}^{\prime}\boldsymbol{Z}_{\boldsymbol{r}}\boldsymbol{Y}=\hat{\boldsymbol{\beta}} _{s}^{\prime}\boldsymbol{X}_{s}^{\prime}\boldsymbol{Z}_{\boldsymbol{r}}\boldsymbol{X} _{s}\hat{\boldsymbol{\beta}}_{s}+\boldsymbol{e}^{\prime}\boldsymbol{e} \end{aligned} $$
    (3.295)

    Furthermore, since \(\boldsymbol {e}_{\boldsymbol {r}}=\boldsymbol {Z}_{\boldsymbol {r} }\boldsymbol {Y}\), we have: \(\boldsymbol {e}_{\boldsymbol {r}}^{\prime }\boldsymbol {e}_{\boldsymbol {r} }=\boldsymbol {Y}^{\prime }\boldsymbol {Z}_{\boldsymbol {r}}^{\prime }\boldsymbol {Z}_{\boldsymbol {r} }\boldsymbol {Y=Y}^{\prime }\boldsymbol {Z}_{\boldsymbol {r}}\boldsymbol {Y}\). We then obtain the following result:

    $$\displaystyle \begin{aligned} \hat{\boldsymbol{\beta}}_{s}^{\prime}\boldsymbol{X}_{s}^{\prime}\boldsymbol{Z} _{\boldsymbol{r}}\boldsymbol{X}_{s}\hat{\boldsymbol{\beta}}_{s}=\boldsymbol{e}_{\boldsymbol{r} }^{\prime}\boldsymbol{e}_{\boldsymbol{r}}-\boldsymbol{e}^{\prime}\boldsymbol{e} \end{aligned} $$
    (3.296)

    Relationship (3.271) is finally written as:

    $$\displaystyle \begin{aligned} F=\frac{\left( \boldsymbol{e}_{\boldsymbol{r}}^{\prime}\boldsymbol{e}_{\boldsymbol{r} }-\boldsymbol{e}^{\prime}\boldsymbol{e}\right) /s}{\boldsymbol{e}^{\prime}\boldsymbol{e/} \left( T-k-1\right) }\sim F\left( s,T-k-1\right) {} \end{aligned} $$
    (3.297)

    This test, which is very frequently employed, can be used to test the significance of a subset of explanatory variables \(\boldsymbol {X}_{s}\). In practice, it consists in running two regressions (see the sketch after this list):

    • A regression of \(\boldsymbol {Y}\) on the set of explanatory variables, \(\boldsymbol {e}^{\prime }\boldsymbol {e}\) being the corresponding sum of squared residuals

    • A regression of \(\boldsymbol {Y}\) on the subset of explanatory variables \(\boldsymbol {X}_{\boldsymbol {r}}\) (i.e., on variables other than \(\boldsymbol {X}_{s}\)), \(\boldsymbol {e}_{\boldsymbol {r}}^{\prime }\boldsymbol {e} _{\boldsymbol {r}}\) being the corresponding sum of squared residuals

    The decision rule is as follows:

    • If \(F\leq F\left ( s,T-k-1\right )\), the null hypothesis that the variables \(\boldsymbol {X}_{s}\) are not significant is not rejected.

    • If \(F>F\left ( s,T-k-1\right )\), the null hypothesis is rejected.
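
As a computational illustration of the two-regression procedure just described (a sketch with simulated data; the sizes T, k, s and the true coefficients are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
T, k, s = 120, 4, 2
X = np.column_stack([np.ones(T), rng.normal(size=(T, k))])
y = X @ np.array([1.0, 0.5, -1.0, 0.0, 0.0]) + rng.normal(size=T)

def ssr(Z, y):
    """Sum of squared OLS residuals from regressing y on Z."""
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return resid @ resid

ssr_full = ssr(X, y)                    # e'e    (all k+1 columns)
ssr_restr = ssr(X[:, :k + 1 - s], y)    # e_r'e_r (drop the last s columns X_s)

F = ((ssr_restr - ssr_full) / s) / (ssr_full / (T - k - 1))
print(F)   # to be compared with the critical value of F(s, T-k-1)
```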

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this chapter

Mignon, V. (2024). The Multiple Regression Model. In: Principles of Econometrics. Classroom Companion: Economics. Springer, Cham. https://doi.org/10.1007/978-3-031-52535-3_3
