## Abstract

In this chapter, we examine the General Linear Model (GLM), an important topic for econometrics and statistics, as well as other disciplines. The term *general* refers to the fact that there are no restrictions on the number of explanatory variables we may consider; the term *linear* refers to the manner in which **the parameters enter the model.** It **does not refer to the form of the variables.** In the literature this is often termed the **regression model**, and the analysis of empirical results obtained from such models **regression analysis.**

## Keywords

Covariance Matrix · General Linear Model · Positive Semidefinite · Largest Root · Generalized Least Squares

## 10.1 Introduction

In this chapter, we examine the General Linear Model (GLM), an important topic for econometrics and statistics, as well as other disciplines. The term *general* refers to the fact that there are no restrictions on the number of explanatory variables we may consider; the term *linear* refers to the manner in which **the parameters enter the model.** It **does not refer to the form of the variables.** In the literature this is often termed the **regression model**, and the analysis of empirical results obtained from such models **regression analysis.**

This is perhaps the most commonly used research approach among empirically oriented economists. We examine both its foundations and the manner in which the mathematical tools developed in earlier chapters are utilized in determining the properties of the parameter estimators^{1} that it produces.

The model is stated as

$$\displaystyle{ y_{t} =\sum _{i=0}^{n}\beta _{i}x_{ti} + u_{t},\qquad t = 1,2,\ldots,T, }$$(10.1)

where *y* _{ t } is an observation at “time” *t* on the phenomenon to be “explained” by the analysis; the *x* _{ ti }, *i* = 0,1,2, *…*, *n*, are observations on variables that the investigator asserts are important in explaining the behavior of the phenomenon in question; the β_{ i }, *i* = 0,1,2, *…*, *n*, are parameters, i.e. they are fixed but **unknown** constants that modify the influence of the *x*’s on *y*. In the language of econometrics, *y* is termed the **dependent** variable, while the *x*’s are termed the **independent or explanatory** variables; in the language of statistics, they are often referred to, respectively, as the **regressand** and the **regressors**. The *u*’s simply acknowledge that the enumerated variables do not provide an exhaustive explanation for the behavior of the dependent variable; in the language of econometrics, they are typically referred to as the error term or the **structural errors**. The model stated in Eq. (10.1) is thus the data generating function for the data to be analysed. Contrary to the time series approach to data analysis, econometrics nearly always deals within a reactive context in which the behavior of the dependent variable is conditioned by what occurs in the economic environment beyond itself and its past history.

The sample consists of observations over *T* periods, and the problem is to obtain estimators and carry out inference procedures (such as tests of significance, construction of confidence intervals, and the like) relative to the unknown parameters. Such procedures operate in a certain environment. Before we set forth the assumptions defining this environment, we need to establish some notation. Thus, we collect the explanatory variable observations in the *T* × (*n* + 1) matrix *X* = (*x* _{ ti }), the dependent variable observations in the *T*-element column vector *y*, and the errors in the *T*-element column vector *u*, so that the model may be written compactly as \(y = X\beta + u.\)

## 10.2 The GLM and the Least Squares Estimator

The assumptions defining this environment are:

- i.The elements of the matrix *X* are **nonstochastic** and its columns are **linearly independent**. Moreover,$$\displaystyle{\lim _{T\rightarrow \infty }\frac{{X}^{^{\prime}}X} {T} = M_{xx} > 0\ \ \mbox{ i.e. it is a positive definite matrix}.}$$Often, the explanatory variables may be considered random, **but independent of the structural errors of the model**. In such cases the regression analysis is carried out **conditionally on the x’s.**
- ii.The errors, *u* _{ t }, are independent, identically distributed (iid) random variables with mean zero and variance 0 < σ^{2} < *∞*.
- iii.In order to obtain a distribution theory, one often adds the assumption that the errors have the normal joint distribution with mean vector zero and covariance matrix σ^{2} *I* _{ T }, or more succinctly one writes$$\displaystyle{ u \sim N(0,{\sigma }^{2}I_{ T}); }$$(10.5)with increases in the size of the samples (data) available to econometricians over time, this assumption is not frequently employed in current applications, relying instead on central limit theorems (CLT) to provide the distribution theory required for inference.

### Proposition 10.1.

If *X* obeys condition i, *X* ^{′} *X* is positive definite, and thus invertible.

### Proof:

Since the columns of *X* are linearly independent, the only vector α such that *X*α = 0 is the zero vector—see Proposition 2.61. Thus, for α ≠ 0, consider \({\alpha }^{^{\prime}}{X}^{^{\prime}}X\alpha = {(X\alpha )}^{^{\prime}}(X\alpha ) > 0.\) Hence *X* ^{′} *X* is positive definite and thus invertible—see Proposition 2.62.

q.e.d.

Minimizing \({(y - X\beta )}^{^{\prime}}(y - X\beta )\) with respect to β yields the OLS estimator

$$\displaystyle{ \hat{\beta } = {({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}y. }$$(10.6)

This leaves one parameter, the error variance σ^{2}. Although the least squares procedure does not provide a particular way in which such a parameter is to be estimated, it seems intuitively reasonable that we should do so through the residual vector

$$\displaystyle{ \hat{u} = y - X\hat{\beta } = Ny = Nu,\qquad N = I_{T} - X{({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}. }$$(10.8)

*N* is a **symmetric idempotent** matrix—for a definition of symmetric matrices, see Definition 2.4; for a definition of idempotent matrices, see Definition 2.8. It appears intuitively reasonable to think of \(\hat{u}\) of Eq. (10.8) as an “estimator” of the unobservable error vector *u*; thus it is also natural that we should define an estimator of σ^{2} based on the sum of squares \(\hat{{u}}^{^{\prime}}\hat{u}.\) We return to this topic in the next section.
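As a minimal numerical sketch (NumPy; data simulated, all dimensions and parameter values hypothetical), the least squares estimator \({({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}y\) and the residual vector can be computed as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 200, 2                        # sample size and number of bona fide regressors

# Design matrix: a constant term plus two linearly independent columns
X = np.column_stack([np.ones(T), rng.normal(size=(T, n))])
beta = np.array([1.0, 2.0, -0.5])    # fixed but (in practice) unknown parameters
u = rng.normal(scale=0.3, size=T)    # iid errors with mean zero
y = X @ beta + u                     # the data generating function

# OLS estimator (X'X)^{-1} X'y; solve() avoids forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat             # residual vector; satisfies X'u_hat = 0
print(beta_hat)                      # close to the true beta
```

The orthogonality *X* ^{′}û = 0 is just the sample normal-equations condition, and the residuals supply the raw material for estimating σ^{2}.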

## 10.3 Properties of the Estimators

The properties of the least squares estimator, termed in econometrics the OLS (ordinary least squares) estimator, are given below.

### Proposition 10.2 (Gauss-Markov Theorem).

The OLS estimator \(\hat{\beta } = {({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}y\) is:

- i.
Unbiased;

- ii.
Efficient **within the class of linear unbiased estimators**.

### Proof:

Unbiasedness is immediate, since \(\hat{\beta } = \beta + {({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}u\) and *E*(*u*) = 0. As for efficiency, let \(\tilde{\beta }\) be **any other** linear (in *y*) unbiased estimator. In view of linearity, we may write \(\tilde{\beta } = [{({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}} + C]y,\) for some nonstochastic matrix *C*. For \(\tilde{\beta }\) to be an **unbiased** estimator of β, we must have *CX* = 0, whence $$\displaystyle{\mathrm{Cov}(\tilde{\beta }) -\mathrm{Cov}(\hat{\beta }) = {\sigma }^{2}C{C}^{^{\prime}} \geq 0.}$$ Indeed, if \(C\neq 0,\) the matrix *C* is of rank **greater** than zero. Let this rank be *r*. Thus, there exists at least one (non-null) vector α such that *C* ^{ ′ }α ≠ 0. Consequently, α^{′} *CC* ^{′}α > 0, which demonstrates the validity of the claim.

q.e.d.

### Corollary 10.1.

The OLS estimator \(\hat{\beta }\) is a consistent estimator of β; in fact, it converges to β in mean square, in the sense that \(E{\Vert \hat{\beta } - \beta \Vert }^{2} \rightarrow 0\) as *T* → *∞*.

### Proof:

Apply the criterion for convergence in mean square with normalizing sequence *b* _{ n } = *T* and *p* = 2. This entails showing unbiasedness, already shown, and asymptotic vanishing of the estimator’s variance. Thus, by Proposition 10.2 and condition ii, $$\displaystyle{\mathrm{Cov}(\hat{\beta }) = {\sigma }^{2}{({X}^{^{\prime}}X)}^{-1} = \frac{{\sigma }^{2}} {T}{\left (\frac{{X}^{^{\prime}}X} {T} \right )}^{-1} \rightarrow 0 \cdot M_{xx}^{-1} = 0.}$$

q.e.d.

We now examine the properties of the estimator of σ^{2} alluded to at the end of the preceding section.

### Proposition 10.3.

The statistic \(\hat{{u}}^{^{\prime}}\hat{u}\) obeys \(E(\hat{{u}}^{^{\prime}}\hat{u}) = {\sigma }^{2}(T - n - 1).\)

### Proof:

Since \(\hat{u} = Nu\) and *N* is symmetric idempotent, \(\hat{{u}}^{^{\prime}}\hat{u} = {u}^{^{\prime}}Nu,\) and $$\displaystyle{E({u}^{^{\prime}}Nu) = E\,\mathrm{tr}({u}^{^{\prime}}Nu) = E\,\mathrm{tr}(Nu{u}^{^{\prime}}) = \mathrm{tr}\,N\,E(u{u}^{^{\prime}}) = {\sigma }^{2}\mathrm{tr}\,N.}$$ The first equality follows because *u* ^{′} *Nu* is a scalar; the second follows since for all suitable matrices tr *AB* = tr *BA*—see Proposition 2.16; the third equality follows from the fact that *X*, and hence *N*, is a nonstochastic matrix; the last equality follows from condition ii that defines the context of this discussion. Thus, we need only find the trace of *N*. Since \(\mathrm{tr}(A + B) = \mathrm{tr}A + \mathrm{tr}B\)—see Proposition 2.16—we conclude that $$\displaystyle{\mathrm{tr}\,N = \mathrm{tr}\,I_{T} -\mathrm{tr}\,X{({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}} = T -\mathrm{tr}\,{({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}X = T - (n + 1).}$$

q.e.d.

### Corollary 10.2.

An unbiased estimator of σ^{2} is given by $$\displaystyle{\hat{{\sigma }}^{2} = \frac{\hat{{u}}^{^{\prime}}\hat{u}} {T - n - 1}.}$$

### Proof:

Evident from Proposition 10.3.
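A small numerical check (simulated data, hypothetical dimensions) of the facts just used: *N* is symmetric idempotent with tr *N* = *T* − *n* − 1, and \(\hat{{u}}^{^{\prime}}\hat{u}/(T - n - 1)\) is unbiased for σ^{2}:

```python
import numpy as np

rng = np.random.default_rng(2)
T, cols = 40, 4                          # cols = n + 1 regressors, including the constant
X = np.column_stack([np.ones(T), rng.normal(size=(T, cols - 1))])
N = np.eye(T) - X @ np.linalg.solve(X.T @ X, X.T)   # residual maker

assert np.allclose(N, N.T)               # symmetric
assert np.allclose(N @ N, N)             # idempotent
assert np.isclose(np.trace(N), T - cols) # tr N = T - (n + 1)

# Monte Carlo check that E[u_hat' u_hat] = sigma^2 (T - n - 1)
sigma = 0.7
U = sigma * rng.normal(size=(20000, T))  # 20000 independent error vectors
U_hat = U @ N                            # residuals N u (N is symmetric)
s2 = np.mean(np.sum(U_hat**2, axis=1)) / (T - cols)
print(s2)                                # close to sigma^2 = 0.49
```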

## 10.4 Distribution of OLS Estimators

### 10.4.1 A Digression on the Multivariate Normal

The statement that a random vector *x* has the multivariate normal distribution with mean (vector) μ and covariance matrix Σ is written \(x \sim N(\mu,\Sigma ),\) meaning that *x* has the multivariate normal distribution with mean vector μ and covariance matrix Σ > 0. At times one must deal with linear transformations of *x* which result in the singularity of the covariance matrix. We handle singular covariance matrix situations through the following convention.

### Convention 10.1.

Let the *k*-element vector ξ obey ξ ∼ *N*(μ,Σ), such that Σ > 0, and suppose that *y* is an *n*-element vector (*k* ≤ *n*) which has the representation \(y = A\xi,\) where *A* is an *n* × *k* matrix of rank *k*. Then we say that *y* has the distribution \(y \sim N(A\mu,\ A\Sigma {A}^{^{\prime}}),\) even though the covariance matrix *A*Σ*A* ^{′} **is singular**; properties of *y* can nonetheless be inferred from those of ξ, which has a proper multivariate normal distribution. Certain properties of the (multivariate) normal that are easily derivable from its definition are:

- i.Let *x* ∼ *N*(μ,Σ), partition \(x = \left (\begin{array}{c} {x}^{(1)} \\ {x}^{(2)}\end{array} \right ),\) such that *x* ^{(1)} has *s* elements and *x* ^{(2)} has *k* − *s* elements. Partition μ and Σ conformably so that$$\displaystyle{ \mu = \left (\begin{array}{c} {\mu }^{(1)} \\ {\mu }^{(2)}\end{array} \right ),\ \ \ \Sigma = \left [\begin{array}{cc} \Sigma _{11} & \Sigma _{12} \\ \Sigma _{21} & \Sigma _{22}\end{array} \right ]. }$$(10.25)Then, the **marginal distribution** of *x* ^{(i)}, *i* = 1,2, obeys$$\displaystyle{ {x}^{(i)} \sim N({\mu }^{(i)},\Sigma _{ ii}),\ \ \ i = 1,2. }$$(10.26)The **conditional distribution** of *x* ^{(1)} given *x* ^{(2)} is given by$$\displaystyle{ {x}^{(1)}\vert {x}^{(2)} \sim N({\mu }^{(1)} + \Sigma _{ 12}\Sigma _{22}^{-1}({x}^{(2)} - {\mu }^{(2)}),\ \Sigma _{ 11} - \Sigma _{12}\Sigma _{22}^{-1}\Sigma _{ 21}). }$$(10.27)
*k*-element random vector*x*obey*x*∼*N*(μ,Σ), Σ > 0, and define \(y = Bx + c,\) where*B*is any conformable matrix; then$$\displaystyle{ y \sim N(B\mu + c,\ B\Sigma {B}^{^{\prime}}). }$$(10.28) - iii.Let
*x*∼*N*(μ,Σ) and partition as in part i;*x*^{(1)}and*x*^{(2)}are**mutually independent**if and only if$$\displaystyle{ \Sigma _{12} = \Sigma _{21}^{^{\prime}} = 0. }$$(10.29) - ivWe also have the sort of converse of ii, i.e. if
*x*is as in ii, there exists a matrix*C*such thatThe proof of this is quite simple; by Proposition 2.15 there exist a nonsingular matrix$$\displaystyle{ y = {C}^{-1}(x - \mu ) \sim N(0,I_{ k}). }$$(10.30)*C*such that Σ =*CC*^{ ′ }; by ii, \(y \sim N(0,{C}^{-1}\Sigma {C}^{{\prime}-1} = I_{k})\). - vAn implication of iv is thatbecause the$$\displaystyle{ {(x - \mu )}^{{\prime}}{\Sigma }^{-1}(x - \mu ) = {y}^{{\prime}}y =\sum _{ i=1}^{k}y_{ i}^{2} \sim \chi _{ r}^{2}, }$$(10.31)
*y*_{ i }are iid*N*(0,1) whose squares have the χ^{2}distribution. More about this distribution will be found immediately below.

In item iii, note that if joint normality **is not assumed** and we partition \(x = \left (\begin{array}{c} {x}^{(1)} \\ {x}^{(2)} \end{array} \right ),\)such that *x* ^{(1)} has *s* elements and *x* ^{(2)} has *k* − *s* elements, as above, then under the condition in iii, *x* ^{(1)} and *x* ^{(2)} **are still uncorrelated**, but **they are not necessarily independent.** Under normality uncorrelatedness implies independence; under any distribution independence always implies uncorrelatedness.
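The marginal and conditional formulas in item i are easy to exercise numerically; the mean vector, covariance matrix, and conditioning value below are hypothetical:

```python
import numpy as np

mu = np.array([1.0, 2.0, 0.0])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
s = 1                                   # x^(1) has s elements, x^(2) the remaining k - s

S11, S12 = Sigma[:s, :s], Sigma[:s, s:]
S21, S22 = Sigma[s:, :s], Sigma[s:, s:]

x2 = np.array([2.5, -0.4])              # observed value of x^(2)
cond_mean = mu[:s] + S12 @ np.linalg.solve(S22, x2 - mu[s:])
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
print(cond_mean, cond_cov)

# Conditioning can only reduce (or leave unchanged) the variance of x^(1)
assert cond_cov[0, 0] <= S11[0, 0]
```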

Other distributions important in the GLM context are the chi-square, the *t*- (sometimes also termed the Student *t*-) and the *F*-distributions. The chi-square distribution with *r* degrees of freedom, denoted by χ_{ r }^{2}, may be thought of as the distribution of **the sum of squares of r mutually independent normal variables with mean zero and variance one**; the *t*-distribution with *r* degrees of freedom, denoted by *t* _{ r }, is defined as the distribution of the ratio \(t = \frac{z} {\sqrt{\chi _{r }^{2 }/r}},\) where *z* ∼ *N*(0,1) and the numerator and denominator are **mutually independent.** The *F*-distribution with *m* and *n* degrees of freedom, denoted by *F* _{ m,n }, is defined as the distribution of the ratio \(F = \frac{\chi _{m}^{2}/m} {\chi _{n}^{2}/n},\) where the two chi-square variables are **mutually independent**.

Note that *F* _{ m,n } ≠ *F* _{ n,m }. The precise relation between the two is given by \(F_{m,n} = 1/F_{n,m}.\)
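The reciprocal relation can be checked by simulation, using the definition of *F* _{ m,n } as a ratio of independent chi-squares divided by their degrees of freedom (all sizes below hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, R = 5, 10, 200_000

# F_{m,n}: ratio of independent chi-squares, each divided by its degrees of freedom
F_mn = (rng.chisquare(m, R) / m) / (rng.chisquare(n, R) / n)

# If F ~ F_{m,n} then 1/F ~ F_{n,m}; hence the p-quantile of F_{m,n} is the
# reciprocal of the (1-p)-quantile of F_{n,m}.
q95_mn = np.quantile(F_mn, 0.95)
q05_nm = np.quantile(1.0 / F_mn, 0.05)  # 1/F_mn is a draw from F_{n,m}
print(q95_mn, 1.0 / q05_nm)             # the two agree up to simulation error
```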

### 10.4.2 Application to OLS Estimators

We now present a very important result.

### Proposition 10.4.

The OLS estimators \(\hat{\beta }\) and \(\hat{{\sigma }}^{2}\) are mutually independent.

### Proof:

Note that \(\hat{\beta } - \beta = {({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}u\) and \(\hat{u} = Nu\); since *NX* = 0, the two are **uncorrelated** and hence, by the properties of the multivariate normal, they are **mutually independent**. Since \(\hat{{\sigma }}^{2}\) depends only on \(\hat{u}\) **and thus not on** \(\hat{\beta },\) the conclusion of the proposition is evident.

q.e.d.

### Corollary 10.3.

Denote the coefficients of the *bona fide* variables by β_{∗} and the coefficient of the “fictitious variable” one (*x* _{ t0}) by β_{0} (the constant term), so that we have \(y =\beta _{0}e + X_{1}\beta _{{\ast}} + u,\) where *e* is a *T*-element column vector all of whose elements are unities. The following statements are true:

- i.
\(\hat{u} \sim N(0,{\sigma }^{2}N);\)

- ii.
\(\hat{\beta } \sim N(\beta,{\sigma }^{2}{({X}^{^{\prime}}X)}^{-1});\)

- iii.
\(\hat{\beta }_{{\ast}}\sim N(\beta _{{\ast}},{\sigma }^{2}{(X_{1}^{{\ast}^{\prime}}X_{1}^{{\ast}})}^{-1}),\) \(X_{1}^{{\ast}} = (I_{T} - e{e}^{^{\prime}}/T)X_{1}.\)

### Proof:

The first two statements follow immediately from Eq. (10.5) and property ii of the multivariate normal. The statement in iii also follows immediately from property i of the multivariate normal and the properties of the inverse of partitioned matrices; however, we also give an alternative proof because we will need a certain result in later discussion.

q.e.d.

### Remark 10.1.

Test statistics cannot be based directly on the preceding distributional results because the error variance, σ^{2}, is not known. As an example, consider the coefficient of correlation of multiple regression, *R* ^{2}, and the associated ratio \(({R}^{2}/n)/[(1 - {R}^{2})/(T - n - 1)];\) its numerator and denominator involve, respectively, the quadratic forms \(\hat{\beta }_{{\ast}}^{^{\prime}}X_{1}^{{\ast}^{\prime}}X_{1}^{{\ast}}\hat{\beta }_{{\ast}}\) and \(\hat{{u}}^{^{\prime}}\hat{u},\) which are **mutually independent** by Proposition 10.4. From Proposition 2.15, we have that every positive definite matrix has a nonsingular decomposition, say \(X_{1}^{{\ast}^{\prime}}X_{1}^{{\ast}} = A{A}^{^{\prime}}.\) Let \(\xi = \frac{1} {\sigma }{A}^{^{\prime}}(\hat{\beta }_{{\ast}}-\beta _{{\ast}});\) then **all** of the elements of the vector ξ are scalars, mutually independent (normal) random variables with mean zero and variance one. Hence, by the preceding discussion, \(\frac{1} {{\sigma }^{2}}{(\hat{\beta }_{{\ast}}-\beta _{{\ast}})}^{^{\prime}}X_{1}^{{\ast}^{\prime}}X_{1}^{{\ast}}(\hat{\beta }_{{\ast}}-\beta _{{\ast}}) ={\xi }^{^{\prime}}\xi \sim \chi _{n}^{2}.\) As for the denominator, recall that *N* is a symmetric idempotent matrix of rank \(T - n - 1.\) As a symmetric and idempotent matrix, it has the representation \(N = Q\Lambda {Q}^{^{\prime}},\) where *Q* is an orthogonal matrix and Λ is the diagonal matrix of its characteristic roots, each of which is either one or zero. To verify that, see Propositions 2.53 and 2.55. Partition *Q* = (*Q* _{1}, *Q* _{2}), so that *Q* _{1} corresponds to the nonzero (unit) roots, and note that \(\hat{{u}}^{^{\prime}}\hat{u}/{\sigma }^{2} = {u}^{^{\prime}}Q_{1}Q_{1}^{^{\prime}}u/{\sigma }^{2} \sim \chi _{T-n-1}^{2}.\) Consequently, **under the null hypothesis**,

*H* _{0}: β_{∗} = 0

as against the alternative,

*H* _{1}: β_{∗} ≠ 0,

the numerator of the fraction \(({R}^{2}/n)/[(1 - {R}^{2})/(T - n - 1)]\) is chi-square distributed with *n* degrees of freedom. Hence, Eq. (10.41) may be used as a test statistic for the test of the hypothesis stated above, and its distribution is \(F_{n,T-n-1}.\)
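A numerical sketch of the test just described (simulated data; dimensions and coefficients hypothetical), computing *R* ^{2} and the statistic \(({R}^{2}/n)/[(1 - {R}^{2})/(T - n - 1)]\):

```python
import numpy as np

rng = np.random.default_rng(5)
T, n = 120, 3
X1 = rng.normal(size=(T, n))                  # bona fide variables
X = np.column_stack([np.ones(T), X1])         # add the constant term
y = X @ np.array([0.5, 1.0, 0.0, -1.0]) + rng.normal(size=T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
R2 = 1.0 - (u_hat @ u_hat) / np.sum((y - y.mean())**2)

# Test statistic for H0: beta_* = 0; distributed F_{n, T-n-1} under the null
F = (R2 / n) / ((1.0 - R2) / (T - n - 1))
print(R2, F)
```

Since the simulated slopes are not all zero, the statistic lands far in the right tail of the *F* _{ n,T−n−1 } distribution and the null is rejected.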

### Remark 10.2.

Let *S* _{ r } be an *n* × *r*, *r* ≤ *n*, **selection** matrix; this means that the columns of *S* _{ r } are **mutually orthogonal** and in each column all elements are zero **except one which is unity**. This makes *S* _{ r } of rank *r*. Note also that it is orthogonal only in the sense that *S* _{ r } ^{′} *S* _{ r } = *I* _{ r }; on the other hand \(S_{r}S_{r}^{^{\prime}}\neq I_{n}.\) It is clear that if we are interested in testing the hypothesis, say, that \(\beta _{2} = \beta _{3} = \beta _{7} = \beta _{12} = 0,\) we may define the selection matrix *S* _{4} such that \(S_{4}^{^{\prime}}\beta _{{\ast}} = {(\beta _{2},\beta _{3},\beta _{7},\beta _{12})}^{^{\prime}},\) and, proceeding as above, construct a test statistic for the hypothesis

*H* _{0}: *S* _{4} ^{′}β_{∗} = 0,

as against the alternative,

\(H_{1}:\ S_{4}^{^{\prime}}\beta _{{\ast}}\neq 0,\)

and its distribution is \(F_{4,T-n-1}.\)

In the special case *r* = 1—i.e. for the problem of testing a hypothesis on a **single** coefficient—we note that the preceding discussion implies that the appropriate test statistic and its distribution are given by

$$\displaystyle{ \frac{\hat{\beta }_{i}^{2}} {\hat{{\sigma }}^{2}q_{ii}} \sim F_{1,T-n-1}, }$$(10.52)

where *q* _{ ii } is the *i*th diagonal element of \({(X_{1}^{{\ast}^{\prime}}X_{1}^{{\ast}})}^{-1}.\) Making the substitution \(\hat{{\sigma }}^{2} =\hat{ {u}}^{^{\prime}}\hat{u}/(T - n - 1)\) and taking the square root in Eq. (10.52) we find \(\frac{\hat{\beta }_{i}} {\hat{\sigma }\sqrt{q_{ii}}} \sim t_{T-n-1},\) the usual *t*-ratio of regression analysis. Finally, also observe that the **square root** of a variable that has the *F* _{1,n } distribution is precisely the *t* _{ n } distribution.
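The *t*-ratio for a single coefficient can be sketched numerically as follows (simulated data; all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(6)
T, n = 80, 2
X = np.column_stack([np.ones(T), rng.normal(size=(T, n))])
y = X @ np.array([1.0, 0.8, 0.0]) + rng.normal(size=T)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
u_hat = y - X @ beta_hat
sigma2_hat = (u_hat @ u_hat) / (T - n - 1)    # unbiased variance estimator

i = 1                                         # test H0: beta_1 = 0
q_ii = XtX_inv[i, i]                          # i-th diagonal element
t_ratio = beta_hat[i] / np.sqrt(sigma2_hat * q_ii)
print(t_ratio)                                # compared against t_{T-n-1}
```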

## 10.5 Nonstandard Errors

In this section, we examine the case where the errors do not obey *u* ∼ *N*(0,σ^{2} *I* _{ T }), but instead have the more general normal distribution \(u \sim N(0,\Sigma ),\ \ \Sigma > 0.\) Since Σ is *T* × *T*, we cannot in practice obtain efficient estimators **unless** Σ is known. If it is, obtain the nonsingular decomposition and the transformed model, respectively, \(\Sigma = C{C}^{^{\prime}},\qquad {C}^{-1}y = {C}^{-1}X\beta + {C}^{-1}u,\) whose error vector \({C}^{-1}u \sim N(0,I_{T})\) is standard; OLS applied to the transformed model yields

$$\displaystyle{ \hat{\beta } = {({X}^{^{\prime}}{\Sigma }^{-1}X)}^{-1}{X}^{^{\prime}}{\Sigma }^{-1}y. }$$(10.57)

If Σ is **not known** the estimator in Eq. (10.57), termed in econometrics the **Aitken** estimator, is not available. However, (as *T* →*∞*), if Σ has only a **fixed finite number of distinct elements** which can be estimated consistently, say by \(\tilde{\Sigma }\), the estimator in Eq. (10.57) **with Σ replaced by** \(\tilde{\Sigma }\) **, is feasible and is termed the Generalized Least Squares (GLS) estimator.**

It is easily verified that the Aitken estimator is **unbiased**. Moreover, provided \(\lim _{T\rightarrow \infty }({X}^{^{\prime}}{\Sigma }^{-1}X/T)\) exists and is positive definite, it converges to β **in mean square** and thus also in probability.

In comparing the covariance matrices of the two estimators, three cases may arise:

- i.
\(\Sigma _{\tilde{\beta }} - \Sigma _{\hat{\beta }} \geq 0,\) the Aitken estimator is efficient relative to the OLS estimator;

- ii.
\(\Sigma _{\tilde{\beta }} - \Sigma _{\hat{\beta }} \leq 0,\) the OLS estimator is efficient relative to the Aitken estimator;

- iii.
Finally if \(\Sigma _{\tilde{\beta }} - \Sigma _{\hat{\beta }}\) is an

**indefinite matrix**i.e. it is neither positive nor negative (semi)definite, the two estimators cannot be ranked.

To tackle this issue directly, we consider the simultaneous decomposition of two positive definite matrices; see Proposition 2.64.

Consider the characteristic roots of \(X{({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}\) **in the metric** of Σ, i.e. consider the characteristic equation $$\displaystyle{\left \vert \lambda \Sigma - X{({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}\right \vert = 0.}$$ In the metric of the identity this matrix, being symmetric idempotent of rank *n* + 1, has *n* + 1 unit roots and \(T - n - 1\) zero roots. By the simultaneous decomposition theorem, see Proposition 2.64, there exists a nonsingular matrix *A* such that \(\Sigma = A{A}^{^{\prime}}\) and \(X{({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}} = A\Theta {A}^{^{\prime}},\) where Θ is the diagonal matrix of the roots of the characteristic equation above. Premultiplying and postmultiplying by (*X* ^{′} *X*)^{−1} *X* ^{′} and its transpose, respectively, we find the comparison between the covariance matrices of the two estimators reduced to a question about the definiteness of a matrix of the form dealt with next.

To that end, let *B* be a *T* × *T* positive semidefinite matrix of rank *r*, and let *C* be *T* × *m* of rank *m*, *m* ≤ *T*; then, *C* ^{′} *BC* ≥ 0. For a proof, we show that either *C* ^{′} *BC* = 0, or there exists at least one vector α ≠ 0, such that \({\alpha }^{^{\prime}}{C}^{^{\prime}}BC\alpha > 0\) and no vector η such that \({\eta }^{^{\prime}}{C}^{^{\prime}}BC\eta < 0.\) Since *C* is of rank *m* > 0, its column space is of dimension *m*; if the column space of *C* is **contained in the null space of** *B*, then *C* ^{′} *BC* = 0; if not, then there exists at least one vector γ ≠ 0 in the column space of *C* such that γ^{′} *B*γ > 0, **because B is positive semidefinite**. Let α be such that γ = *C*α; the claim of the last inequality in Eq. (10.68) is thus valid. Moreover, no vector η can exist such that η^{′} *C* ^{′} *BC*η < 0. This is so because *C*η is **in the column space** of *C* **and** *B* is positive semidefinite.

### Remark 10.3.

We should also point out that there is an indirect proof of the relative inefficiency of the OLS estimator of β in the model above. We argued earlier that the Aitken estimator in a model with nonstandard errors is simply the OLS estimator in an appropriately transformed model and thus obeys the Gauss-Markov theorem. It follows, therefore, that the OLS estimator in the **untransformed model** cannot possibly obey the Gauss-Markov theorem and is thus **not efficient**.

We now take up another question of practical significance. If, in the face of a general covariance matrix, Σ > 0, for the errors of a GLM, we estimate parameters and their covariance matrix **as if the model had a scalar covariance matrix**, do the resulting test statistics have a tendency (on the average) to reject too frequently, or not frequently enough, relative to the situation when the correct covariance matrix is estimated?

When the errors have covariance matrix Σ but we proceed as if it were σ^{2} *I* _{ T }, the covariance matrix of the OLS estimator of β is estimated as \(\tilde{{\sigma }}^{2}{({X}^{^{\prime}}X)}^{-1},\) whereas the true covariance matrix is \({({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}\Sigma X{({X}^{^{\prime}}X)}^{-1}.\) Denote by *W* the expected difference between the former and the latter,

$$\displaystyle{ W = k_{T}{({X}^{^{\prime}}X)}^{-1} - {({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}\Sigma X{({X}^{^{\prime}}X)}^{-1},\qquad k_{T} = E(\tilde{{\sigma }}^{2}) = \frac{\mathrm{tr}(N\Sigma )} {T - n - 1}. }$$(10.71)

If *W* is positive semidefinite, the test statistics (*t*-ratios) will be understated and hence the hypotheses in question will tend to be **accepted** too frequently; if it is negative semidefinite, the test statistics will tend to be overstated, and hence the hypotheses in question will tend to be **rejected** too frequently. If it is indefinite, no such statements can be made.

To determine the nature of *W* we need to put more structure in place. Because Σ is a positive definite symmetric matrix, we can write \(\Sigma = Q\Lambda {Q}^{^{\prime}},\) where *Q* is the **orthogonal matrix** of the characteristic vectors and Λ is the **diagonal matrix** of the (positive) characteristic roots arranged in **decreasing order**, i.e. λ_{1} is the largest root and λ_{ T } is the smallest.

What we shall do is to show that there exist data matrices (*X*) such that *W* is positive semidefinite, and data matrices such that *W* is negative semidefinite. To determine the nature of *W* we must obtain a result that holds for **arbitrary** data matrix *X*. Establishing the validity of the preceding claim is equivalent to establishing that *W* is an **indefinite** matrix.

The columns of *Q* can serve as a basis for the Euclidean space *R* ^{ T }; the columns of the matrix *X* lie in an (*n* + 1)-dimensional subspace of *R* ^{ T }. Partition *Q* = (*Q* _{1}, *Q* _{∗}) such that *Q* _{1} corresponds to the *n* + 1 largest roots and **suppose** we may represent *X* = *Q* _{1} *A*, where *A* is **nonsingular**. This merely states that *X* lies in the subspace of *R* ^{ T } **spanned by the columns of** *Q* _{1}. In this context, we have a simpler expression for *k* _{ T } of Eq. (10.71). In particular, we have \(k_{T} = \frac{1} {T - n - 1}\sum _{i=n+2}^{T}\lambda _{ i},\) i.e. *k* _{ T } is the **average** of the **smallest** \(T - n - 1\) roots. Since *X* ^{′} *X* = *A* ^{′} *A*, we obtain $$\displaystyle{W = {({X}^{^{\prime}}X)}^{-1}{A}^{^{\prime}}(k_{T}I_{n+1} - \Lambda _{1})A{({X}^{^{\prime}}X)}^{-1},}$$ where Λ_{1} is the diagonal matrix of the *n* + 1 largest roots. Since *k* _{ T } is the average of the \(T - n - 1\) smallest roots whereas Λ_{1} contains the *n* + 1 largest roots, we conclude that \(k_{T}I_{n+1} - \Lambda _{1} < 0,\) and consequently *W* < 0; the test statistics are overstated, and we tend to reject hypotheses **more frequently than appropriate**.

Suppose, on the other hand, that *X* lies in the (*n* + 1)-dimensional subspace of *R* ^{ T } spanned by the columns of *Q* _{2}, where *Q* = (*Q* _{∗}, *Q* _{2}), so that *Q* _{2} corresponds to the *n* + 1 **smallest roots**. Repeating the same construction as above we find that in this instance *k* _{ T } is the **average** of the **largest** \(T - n - 1\) roots. Therefore, in this case, we have \(k_{T}I_{n+1} - \Lambda _{2} > 0,\) where *k* _{ T } is the **average of the** \(T - n - 1\) **largest roots** and Λ_{2} contains along its diagonal the *n* + 1 **smallest roots** of Σ, so that *W* > 0.

Evidently, in this case, the test statistics are smaller than appropriate and, consequently, we tend to accept hypotheses **more frequently** than appropriate. The preceding shows that the matrix *W* is **indefinite**.

Finally, the argument given above is admissible because no restrictions are put on the matrix *X*; since it is arbitrary it can, in principle, lie in an (*n* + 1)-dimensional subspace of *R* ^{ T } **spanned by any set of n + 1 of the characteristic vectors** of Σ. Therefore, we must classify the matrix *W* as **indefinite** when *X* is viewed as arbitrary. It need not be so for any **particular** matrix *X*. But this means that no statement can be made with confidence on the subject of whether using \(\tilde{{\sigma }}^{2}{({X}^{^{\prime}}X)}^{-1}\) as the covariance matrix of the OLS estimator leads to any systematic bias in accepting or rejecting hypotheses. Thus, nothing can be concluded beyond the fact that using the OLS estimator, and OLS-based estimators of its covariance matrix, when the covariance matrix of the structural errors is in fact non-scalar, is inappropriate, unreliable, and should be avoided for purposes of hypothesis testing.

## 10.6 Inference with Asymptotics

In this section we dispose of the somewhat unrealistic assertion that the structural errors are jointly normally distributed. Recall that we had made this assertion only in order to develop a distribution theory to be used in inferences regarding the parameters of the model.

We now develop inference procedures **based on their asymptotic or limiting distribution.** To this end return to the OLS estimator as exhibited in Eq. (10.6). Nothing much will change relative to the results we obtained above, but the results **will not hold for every sample size T, but only for “large” T**, although it is not entirely clear what large is. Strictly speaking, **limiting or asymptotic results hold precisely only at the limit, i.e. as T →∞; but if the sample is large enough the distribution of the entity in question could be well approximated by the asymptotic or limiting distribution.** We may, if we wish, continue with the context of OLS estimation embodied in condition i above, or we may take the position that the explanatory variables are random and all analysis is carried out **conditionally** on the observations in the data matrix *X*; in this case we replace the condition therein by \(\mathop{\mathrm{plim}}_{T\rightarrow \infty }({X}^{^{\prime}}X/T) = M_{xx} > 0\). We shall generally operate under the last condition. At any rate, developing the expression in Eq. (10.6) we find $$\displaystyle{\sqrt{T}(\hat{\beta } - \beta ) = {\left (\frac{{X}^{^{\prime}}X} {T} \right )}^{-1}\frac{{X}^{^{\prime}}u} {\sqrt{T}}.}$$

Asymptotically, then, \(\sqrt{T}(\hat{\beta } - \beta ) \sim M_{xx}^{-1}({X}^{^{\prime}}u/\sqrt{T}),\)^{3} and \({X}^{^{\prime}}u/\sqrt{T}\) is a sum of independent **vectors** \(x_{t\cdot }^{^{\prime}}u_{t}/\sqrt{T}\) with mean zero and covariance matrix \({\sigma }^{2}(x_{t\cdot }^{^{\prime}}x_{t\cdot }/T)\), to which a CLT may be applied. Since in all our discussion of CLT we have used only **scalar random variables**, ostensibly none of these results can be employed in the current (vector) context. On the other hand, **using the observation in Remark 9.4**, let λ be an arbitrary conformable vector and consider \({\lambda }^{^{\prime}}({X}^{^{\prime}}u/\sqrt{T}) =\sum _{ t=1}^{T}(x_{t\cdot }\lambda )u_{t}/\sqrt{T},\) a sum of independent scalar random variables whose variance converges to \({\sigma }^{2}{\lambda }^{^{\prime}}M_{xx}\lambda \) by the convergence of *X* ^{′} *X*∕ *T*. Consequently by Proposition 9.7 we conclude $$\displaystyle{\frac{{X}^{^{\prime}}u} {\sqrt{T}} \stackrel{d}{\rightarrow }N(0,{\sigma }^{2}M_{ xx}),\qquad \sqrt{T}(\hat{\beta } - \beta )\stackrel{d}{\rightarrow }N(0,{\sigma }^{2}M_{ xx}^{-1}).}$$
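The limiting normality can be illustrated by simulation with deliberately non-normal errors (uniform, rescaled to unit variance); all dimensions below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(9)
T, R = 500, 4000
beta = np.array([1.0, -0.5])

x = rng.normal(size=T)                       # regressor draw, held fixed across replications
X = np.column_stack([np.ones(T), x])
XtX_inv = np.linalg.inv(X.T @ X)

stats = np.empty(R)
for r in range(R):
    u = rng.uniform(-1.0, 1.0, size=T) * np.sqrt(3.0)   # non-normal, mean 0, variance 1
    b = XtX_inv @ X.T @ (X @ beta + u)
    stats[r] = np.sqrt(T) * (b[1] - beta[1])

# sqrt(T)(beta_hat - beta) should be centered at zero with variance
# approximately sigma^2 [(X'X/T)^{-1}]_{11} (sigma^2 = 1 here)
target_var = T * XtX_inv[1, 1]
print(stats.mean(), stats.var(), target_var)
```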

For finite but large *T*, practitioners often use the approximation \(\hat{\beta }\ \approx \ N(\beta,\hat{{\sigma }}^{2}{({X}^{^{\prime}}X)}^{-1}).\) In adapting the test procedures from the finite sample case (where normality holds for every sample size *T*) to the asymptotic case (where normality holds only for large *T*) we shall not follow step by step what we had done earlier. Instead we shall introduce the so-called **general linear hypothesis**,

*H* _{0}: *A*β = *a* _{0}

as against the alternative

*H* _{1}: *A*β ≠ *a* _{0},

where *A* is \(k \times (n + 1),\quad k \leq n + 1\), rank(*A*) = *k*, and *A*, *a* _{0} are respectively a matrix and vector **with known elements.**

### Remark 10.4.

Note that the formulation above encompasses all the types of inference tests considered in the earlier context. For example, if we wish to test the hypothesis that \(\beta _{i} = a_{(0),i} = 0\), simply take *a* _{0} = 0 and *A* as consisting of a single row, all of whose elements are zero save the one corresponding to β_{ i }, which is unity. If we wish to test the hypothesis that \(\beta _{2} = \beta _{3} = \beta _{7} = \beta _{10} = 0\), simply take *a* _{0} = 0, and *A* consisting of four rows, all of whose elements are zero except, respectively, those corresponding to β_{ i }, *i* = 2,3,7,10. If we wish to duplicate the test based on the ratio \({R}^{2}/(1 - {R}^{2})\), which tests the hypothesis that the coefficients of all *bona fide* variables are zero, i.e. β^{∗} = 0,^{4} take *A* = (0,*I* _{ n }), which is an *n* × (*n* + 1) matrix. Thus, all tests involving **linear** restrictions on the parameters in β **are encompassed in the general linear hypothesis Aβ = a** _{0} **.**
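The construction of *A* described in the remark can be sketched directly (dimensions hypothetical; here *n* + 1 = 5 coefficients):

```python
import numpy as np

cols = 5                              # n + 1 coefficients: beta_0, ..., beta_4

# H0: beta_2 = beta_3 = 0 -> one row of A per restriction, with a single
# unit element in the position of the restricted coefficient; a_0 = 0
A = np.zeros((2, cols))
A[0, 2] = 1.0
A[1, 3] = 1.0
a0 = np.zeros(2)

beta = np.array([1.0, 2.0, 0.0, 0.0, -1.0])   # a parameter vector satisfying H0
assert np.allclose(A @ beta, a0)

# H0: all bona fide coefficients zero (beta_* = 0): A = (0, I_n)
A_all = np.column_stack([np.zeros(cols - 1), np.eye(cols - 1)])
assert A_all.shape == (cols - 1, cols)
assert np.allclose(A_all @ beta, beta[1:])
```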

The candidate quantity suggested by the limiting distribution above, $$\displaystyle{{\tau }^{{\ast}} = T{(A\hat{\beta } - a_{0})}^{^{\prime}}{\left [{\sigma }^{2}AM_{xx}^{-1}{A}^{^{\prime}}\right ]}^{-1}(A\hat{\beta } - a_{0}),}$$ is **not a statistic** since it contains the unknown parameters σ^{2} and *M* _{ xx }, not specified by the null. From Proposition 9.6, however, we know that if we replace σ^{2}, *M* _{ xx } by their consistent estimators, viz. \(\hat{{u}}^{{\prime}}\hat{u}/T\), *X* ^{ ′ } *X*∕ *T*, respectively, the limiting distribution will be the same; thus, consider instead

$$\displaystyle{ \tau = {(A\hat{\beta } - a_{0})}^{^{\prime}}{\left [(\hat{{u}}^{{\prime}}\hat{u}/T)\,A{({X}^{^{\prime}}X)}^{-1}{A}^{^{\prime}}\right ]}^{-1}(A\hat{\beta } - a_{0}), }$$(10.82)

which **is a statistic** in that it does **not** contain unknown parameters not specified by the null hypothesis. Under *H* _{0}, τ has the limiting χ_{ k }^{2} distribution.

### Remark 10.5.

A careful examination will disclose that, when applied to the test statistics obtained earlier when estimators were normally distributed for every sample size *T*, the statistic τ of Eq. (10.82) duplicates them precisely, except for the denominator in \(\hat{{u}}^{{\prime}}\hat{u}/T\), which is immaterial. This means that if one does what is usually done in evaluating regression results (when it is assumed that normality of estimators prevails for all sample sizes *T*), the test procedures will continue to be valid **when the sample size is large and one employs the limiting distribution of the estimators.** The only difference is that what was a *t*-test in the earlier case is now a *z*-test (i.e. based on *N*(0,1)) and what was an *F*-test is now a chi-square test.

## 10.7 Orthogonal Regressors

Suppose that the explanatory variables are split into two groups that are **mutually orthogonal**, meaning that in the representation \(y = X_{1}\beta _{1} + X_{2}\beta _{2} + u\) the two coefficient vectors may be estimated *seriatim*, i.e. \(\hat{\beta }_{1} = {(X_{1}^{^{\prime}}X_{1})}^{-1}X_{1}^{^{\prime}}y,\qquad \hat{\beta }_{2} = {(X_{2}^{^{\prime}}X_{2})}^{-1}X_{2}^{^{\prime}}y,\) where *X* _{1} is *T* × (*m* + 1) and *X* _{2} is *T* × *k*, \(k = n - m\), such that \(X_{1}^{^{\prime}}X_{2} = 0.\)

When *X* _{1} and *X* _{2} are **not** mutually orthogonal, we can use the preceding discussion to good advantage. For example, suppose we are not particularly interested in the coefficients of the variables in *X* _{1} but wish to carry out tests on the coefficients of the variables in *X* _{2}. To do so, we need estimators of the coefficient vector β_{2} as well as its covariance matrix. Oddly enough, we may accomplish this with a simple regression as follows. Rewrite the model as a regression of *y* on *X* _{1} and the orthogonalized regressors \(N_{1}X_{2},\) where \(N_{1} = I_{T} - X_{1}{(X_{1}^{^{\prime}}X_{1})}^{-1}X_{1}^{^{\prime}},\) so that the two groups of regressors are now mutually orthogonal (Eq. (10.86)). Estimating the coefficients *seriatim* (by regression of *y*) from Eq. (10.86), we can express the estimator as

$$\displaystyle{ \tilde{\beta }_{2} = {(X_{2}^{^{\prime}}N_{1}X_{2})}^{-1}X_{2}^{^{\prime}}N_{1}y. }$$(10.87)

The estimator for β_{2} is **unbiased**, but the estimator for β_{1} is **not**. Because we are not interested in β_{1}, this does not present a problem. Next, compute the residuals from this regression, namely \(\tilde{u} = y - X_{1}\tilde{\beta }_{1} - N_{1}X_{2}\tilde{\beta }_{2}.\)

To complete this facet of our discussion we must show that the estimator of β_{2} as obtained in Eq. (10.87) and as obtained in Eq. (10.6) or Eq. (10.9) are identical; in addition, we must show that the residuals obtained using the estimator in Eq. (10.6) and those obtained from the estimator in Eq. (10.87) are identical.
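Both identities claimed here can be verified numerically on simulated data (all dimensions hypothetical):

```python
import numpy as np

rng = np.random.default_rng(8)
T, m1, k = 60, 2, 2
X1 = np.column_stack([np.ones(T), rng.normal(size=(T, m1 - 1))])
X2 = rng.normal(size=(T, k)) + 0.5 * X1[:, [1]]   # deliberately correlated with X1
y = X1 @ np.array([1.0, -1.0]) + X2 @ np.array([2.0, 0.5]) + rng.normal(size=T)

def ols(X, y):
    b = np.linalg.solve(X.T @ X, X.T @ y)
    return b, y - X @ b

# Regression of y on (X1, X2)
b_full, res_full = ols(np.column_stack([X1, X2]), y)

# Regression of y on (X1, N1 X2), with N1 = I - X1 (X1'X1)^{-1} X1'
N1 = np.eye(T) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
b_alt, res_alt = ols(np.column_stack([X1, N1 @ X2]), y)

assert np.allclose(b_full[m1:], b_alt[m1:])   # identical estimators of beta_2
assert np.allclose(res_full, res_alt)         # identical residuals
```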

This also settles the question of the distribution of the estimator of β_{2}, since, evidently, \(\tilde{\beta }_{2}\) of Eq. (10.90) has precisely the same distribution.

Regressing *y* on *X* = (*X* _{1}, *X* _{2}), as in the previous discussion, and comparing the two sets of results, we find that the residuals of the regression of *y* on *X* _{1} and *N* _{1} *X* _{2} are precisely the same (numerically) as the residuals of the regression of *y* on *X* _{1} and *X* _{2}.

### Remark 10.6.

If we had proved in this volume the projection theorem, the preceding argument would have been quite unnecessary. This is so because the OLS procedure involves the projection of the vector *y* on the subspace spanned by the columns of the matrix (*X* _{1},*X* _{2}), which are, by assumption, linearly independent. Similarly, the regression of *y* on (*X* _{1},*N* _{1} *X* _{2}) involves a **projection** of the vector *y* on the subspace spanned by the columns of (*X* _{1},*N* _{1} *X* _{2}). But the latter is obtained by a Gram-Schmidt orthogonalization procedure on the columns of the matrix (*X* _{1},*X* _{2}). Thus, the two matrices span **precisely** the same subspace. The projection theorem also states that any vector in a *T*-dimensional Euclidean space can be written **uniquely** as the sum of two vectors, one from the subspace spanned by the columns of the matrix in question and one from the **orthogonal complement** of that subspace. Because the subspaces in question are identical, **so are their orthogonal complements**. The component that lies in the orthogonal complement is simply **the vector of residuals** from the corresponding regression.

### Remark 10.7.

The results exhibited in Eqs. (10.6) and (10.87) imply the following computational equivalence. If we are not interested in β_{1}, and we merely wish to obtain the OLS estimator of β_{2} in such a way that we can construct confidence intervals and test hypotheses regarding the parameters therein, we can operate exclusively with the model

$$\displaystyle{ N_{1}y = N_{1}X_{2}\beta _{2} + N_{1}u. }$$(10.97)

The OLS estimator of β_{2} in Eq. (10.97) is numerically identical to the one obtained from the regression of *y* on (*X* _{1}, *N* _{1} *X* _{2}), or on (*X* _{1}, *X* _{2}).

## 10.8 Multiple GLM

In this section we take up the case where one has to deal with a number of GLMs that are somehow related, but not in any obvious way. The term multiple GLM is not standard; in the literature of econometrics the prevailing term is **Seemingly Unrelated Regressions (SUR).** In some sense it is the intellectual precursor to Panel Data Models, a subject we shall take up in the next chapter.

Consider a **fixed** number of firms, say *m*. The explanatory variables in *x* _{ t(i)} need not have anything in common with those in *x* _{ t(j)}, *i* ≠ *j*, although they may; the vectors of coefficients need not be the same for all firms and may indeed have little in common. However, by the nature of the economic environment **the error terms may be correlated across firms**, since they all operate in the same (macro) economic environment. We may write the observations on the *i*th firm as

$$\displaystyle{ y_{\cdot i} = {X}^{i}\beta _{\cdot i} + u_{\cdot i},\qquad i = 1,2,\ldots,m, }$$(10.101)

where *y* _{⋅i } is a *T*-element column vector,^{5} *X* ^{ i } is a *T* × *k* _{ i } matrix of observations on the explanatory variables, β_{⋅i } is the *k* _{ i }-element column vector of the regression parameters and *u* _{⋅i } is the *T*-element column vector of the errors. Giving effect to the observation that all firms operate in the same economic environment, we are prepared to assume that

$$\displaystyle{ E(u_{ti}u_{tj}) =\sigma _{ij},\ \ \mbox{ for all}\ t. }$$(10.102)

All other standard conditions of the GLM continue in force, for each firm.

*m*GLM

*seriatim*, as we discussed above, obtain estimators and make inferences. If we did so, however, we would be ignoring the information, or condition, exhibited in Eq. (10.102), and this raises the question of whether what we are doing is optimal. To address this issue, write the system in Eq. (10.101) as

**and note that** *y* is an *mT*-element column vector, as is *u*; *X*^{∗} is an *mT* × *k* matrix, \(k =\sum _{ i=1}^{m}k_{i}\); and β^{∗} is a *k*-element column vector.
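To fix ideas, the stacking can be sketched as follows (synthetic data; the sizes *T*, *m*, and *k*_{i} are arbitrary choices for illustration). The matrix *X*^{∗} is block diagonal, with *X*^{ i} in the *i*th diagonal block:

```python
import numpy as np

rng = np.random.default_rng(1)
T, m = 5, 3                       # T observations per firm, m firms (hypothetical)
ks = [2, 3, 1]                    # k_i regressors for firm i; k = sum of the k_i
Xs = [rng.normal(size=(T, k)) for k in ks]

# Build the mT x k block-diagonal matrix X* with X^i in the ith diagonal block.
k = sum(ks)
X_star = np.zeros((m * T, k))
col = 0
for i, Xi in enumerate(Xs):
    X_star[i * T:(i + 1) * T, col:col + ks[i]] = Xi
    col += ks[i]

# Stacking the m vectors y_{.i} (and u_{.i}) one under the other gives the
# mT-element vectors y and u, so the whole system reads y = X* beta* + u.
print(X_star.shape)  # (15, 6)
```

Stacking the β_{⋅i} one under the other gives the *k*-element vector β^{∗} conformably with the columns of *X*^{∗}.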

Because the covariance matrix of *u*, though an *mT* × *mT* matrix, contains only a fixed number of parameters, viz. the elements of the *m* × *m* symmetric matrix Σ, this (GLS) estimator is **feasible**. Indeed, we can estimate *seriatim* each of the *m* GLM by least squares (OLS) and obtain the residuals

where \(\lim _{T\rightarrow \infty }Q_{T} = 0\). It also follows from the standard assumptions of the GLM that \(u_{\cdot i}^{{\prime}}u_{\cdot j}/T\) obeys the SLLN; see Sect. 9.3.1 on the applicability of the latter to sequences of iid rvs with finite mean. Consequently, the residual-based estimators of the elements σ_{ ij} of Σ are consistent. By **exactly the same arguments as above** we can establish the limiting (asymptotic) distribution of the GLS estimator, and it may be shown, just as in the fixed *T* case, that the GLS estimator is efficient relative to the OLS estimator.
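The two-step (feasible GLS) procedure just described can be sketched on synthetic data as follows; the data-generating choices (Σ, the β_{⋅i}, the sample sizes) are hypothetical, chosen only to make the sketch run:

```python
import numpy as np

rng = np.random.default_rng(2)
T, m = 500, 2
ks = [2, 3]

# Synthetic SUR data: errors are correlated across equations at each t.
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])
U = rng.multivariate_normal(np.zeros(m), Sigma, size=T)   # T x m error matrix
Xs = [rng.normal(size=(T, k)) for k in ks]
betas = [np.array([1.0, -1.0]), np.array([0.5, 2.0, -0.5])]
ys = [Xs[i] @ betas[i] + U[:, i] for i in range(m)]

# Step 1: equation-by-equation OLS, keeping the residuals.
resid = np.column_stack([
    ys[i] - Xs[i] @ np.linalg.lstsq(Xs[i], ys[i], rcond=None)[0]
    for i in range(m)
])

# Step 2: estimate sigma_ij by resid_i' resid_j / T.
Sigma_hat = resid.T @ resid / T

# Step 3: feasible GLS on the stacked system, using Phi^{-1} = Sigma^{-1} (x) I_T.
k = sum(ks)
X_star = np.zeros((m * T, k))
col = 0
for i in range(m):
    X_star[i * T:(i + 1) * T, col:col + ks[i]] = Xs[i]
    col += ks[i]
y = np.concatenate(ys)
Phi_inv = np.kron(np.linalg.inv(Sigma_hat), np.eye(T))
beta_fgls = np.linalg.solve(X_star.T @ Phi_inv @ X_star,
                            X_star.T @ Phi_inv @ y)
```

With *T* = 500 the feasible GLS estimate lands close to the stacked true coefficient vector; for large *m* or *T* one would exploit the Kronecker structure rather than form the *mT* × *mT* matrix explicitly.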

### Remark 10.8.

Perhaps the development of the argument in this section will help explain the designation of such models as **Seemingly Unrelated Regressions (SUR)** and justify their special treatment.

## Footnotes

- 1. The term estimator recurs very frequently in econometrics; just to fix its meaning in this chapter and others, we define it as follows: an estimator is a function of the data only (the *x*'s and *y*'s), say *h*(*y*, *X*), that **does not include unknown parameters**.
- 2. The term consistent generally means that as the sample size, *T*, tends to infinity the estimator converges to the parameter it seeks to estimate. Since the early development of econometrics it has meant almost exclusively convergence in probability. This is the meaning we shall use in this and other chapters, i.e. an estimator is consistent for the parameter it seeks to estimate if it converges to it in any fashion that implies convergence in probability.
- 3. In the context of Eq. (10.79) the notation ∼ is to be read "behaves like".
- 4. Occasionally this test is referred to as a test of significance of *R*^{2}.
- 5. The sample size is assumed to be the same for all firms.
