The General Linear Model

Phoebus J. Dhrymes

Abstract

In this chapter, we examine the General Linear Model (GLM), an important topic for econometrics and statistics, as well as other disciplines. The term general refers to the fact that there are no restrictions on the number of explanatory variables we may consider; the term linear refers to the manner in which the parameters enter the model, not to the form of the variables. In the literature this is often termed the regression model, and the analysis of empirical results obtained from such models regression analysis.


10.1 Introduction

In this chapter, we examine the General Linear Model (GLM), an important topic for econometrics and statistics, as well as other disciplines. The term general refers to the fact that there are no restrictions on the number of explanatory variables we may consider; the term linear refers to the manner in which the parameters enter the model, not to the form of the variables. In the literature this is often termed the regression model, and the analysis of empirical results obtained from such models regression analysis.

This is perhaps the research approach most commonly used by empirically oriented economists. We examine not only its foundations but also how the mathematical tools developed in earlier chapters are utilized in determining the properties of the parameter estimators1 that it produces.

The basic model is
$$\displaystyle{ y_{t} =\sum _{ i=0}^{n}x_{ ti}\beta _{i} + u_{t}, }$$
(10.1)
where y t is an observation at “time” t on the phenomenon to be “explained” by the analysis; the x ti , i = 0, 1, 2, …, n, are observations on variables that the investigator asserts are important in explaining the behavior of the phenomenon in question; the β i , i = 0, 1, 2, …, n, are parameters, i.e. fixed but unknown constants that modify the influence of the x’s on y. In the language of econometrics, y is termed the dependent variable, while the x’s are termed the independent or explanatory variables; in the language of statistics, they are often referred to, respectively, as the regressand and the regressors. The u’s simply acknowledge that the enumerated variables do not provide an exhaustive explanation for the behavior of the dependent variable; in the language of econometrics, they are typically referred to as the error term or the structural errors. The model stated in Eq. (10.1) is thus the data generating function for the data to be analysed. Contrary to the time series approach to data analysis, econometrics nearly always deals within a reactive context in which the behavior of the dependent variable is conditioned by what occurs in the economic environment beyond itself and its past history.
One generally has a set of observations (sample) over T periods, and the problem is to obtain estimators and carry out inference procedures (such as tests of significance, construction of confidence intervals, and the like) relative to the unknown parameters. Such procedures operate in a certain environment. Before we set forth the assumptions defining this environment, we need to establish some notation. Thus, collecting the explanatory variable observations in the T × (n + 1) matrix
$$\displaystyle{ X = (x_{ti}),\ \ i = 0,1,2,\ldots,n,\ \ \ t = 1,2,3,\ldots,T, }$$
(10.2)
and further defining
$$\displaystyle{ \beta = {(\beta _{0},\beta _{1},\ldots,\beta _{n})}^{^{\prime}},\ \ y = {(y_{1},y_{2},\ldots,y_{T})}^{^{\prime}},\ \ u = {(u_{1},u_{2},\ldots,u_{T})}^{^{\prime}}, }$$
(10.3)
the observations on the model may be written in the compact form
$$\displaystyle{ y = X\beta + u. }$$
(10.4)

10.2 The GLM and the Least Squares Estimator

The assumptions that define the context of this discussion are the following:
  1. i.
    The elements of the matrix X are nonstochastic and its columns are linearly independent. Moreover,
    $$\displaystyle{\lim _{T\rightarrow \infty }\frac{{X}^{^{\prime}}X} {T} = M_{xx} > 0\ \ \mbox{ i.e. it is a positive definite matrix}.}$$
    Often, the explanatory variables may be considered random, but independent of the structural errors of the model. In such cases the regression analysis is carried out conditionally on the x’s.
     
  2. ii.

    The errors, u t , are independent, identically distributed (iid) random variables with mean zero and variance 0 < σ2 < ∞.

     
  3. iii.
    In order to obtain a distribution theory, one often adds the assumption that the errors have the normal joint distribution with mean vector zero and covariance matrix σ2 I T , or more succinctly one writes
    $$\displaystyle{ u \sim N(0,{\sigma }^{2}I_{ T}); }$$
    (10.5)
    with increases in the size of the samples (data) available to econometricians over time, this assumption is not frequently employed in current applications; instead, one relies on central limit theorems (CLT) to provide the distribution theory required for inference.
     
The least squares estimator is obtained by the operation
$$\displaystyle{\min _{\beta }{(y - X\beta )}^{^{\prime}}(y - X\beta ),}$$
which yields
$$\displaystyle{ \hat{\beta } = {({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}y. }$$
(10.6)
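As a concrete illustration of Eq. (10.6), here is a minimal sketch in Python/NumPy (the simulated data and all variable names are ours, chosen purely for illustration) that forms the OLS estimator by solving the normal equations rather than explicitly inverting X′X.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: T observations, a constant plus n bona fide regressors.
T, n = 100, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, n))])  # T x (n + 1)
beta_true = np.array([1.0, 0.5, -2.0, 0.3])
u = rng.normal(scale=1.5, size=T)
y = X @ beta_true + u

# OLS: solve the normal equations (X'X) beta_hat = X'y, cf. Eq. (10.6).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```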
The first question that arises is: how do we know that the inverse exists? Thus, how do we know that the estimator of β is uniquely defined? We answer that in the following proposition.

Proposition 10.1.

If X obeys condition i, X′X is positive definite, and thus invertible.

Proof:

Since the columns of X are linearly independent, the only vector α such that Xα = 0 is the zero vector—see Proposition 2.61. Thus, for α ≠ 0, consider
$$\displaystyle{ {\alpha }^{^{\prime}}{X}^{^{\prime}}X\alpha = {\gamma }^{^{\prime}}\gamma =\sum _{ j=1}^{T}\gamma _{ j}^{2} > 0,\ \ \ \ \ \ \ \ \ \gamma = X\alpha \neq 0. }$$
(10.7)
Hence, X′X is positive definite and thus invertible—see Proposition 2.62.
The model also contains another unknown parameter, namely the common variance of the errors σ2. Although the least squares procedure does not provide a particular way in which such a parameter is to be estimated, it seems intuitively reasonable that we should do so through the residual vector
$$\displaystyle{ \hat{u} = y - X\hat{\beta } = [I_{T} - X{({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}]u = Nu. }$$
(10.8)
First, we note that N is a symmetric idempotent matrix—for a definition of symmetric matrices, see Definition 2.4; for a definition of idempotent matrices, see Definition 2.8. It appears intuitively reasonable to think of \(\hat{u}\) of Eq. (10.8) as an “estimator” of the unobservable error vector u; thus it is also natural that we should define an estimator of σ2, based on the sum of squares \(\hat{{u}}^{^{\prime}}\hat{u}.\) We return to this topic in the next section.
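A short numerical check, under the same kind of illustrative setup as the sketch above (names and data ours), that the matrix N of Eq. (10.8) is symmetric, idempotent, and annihilates the columns of X:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 100, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, n))])

# Residual maker N = I - X (X'X)^{-1} X', cf. Eq. (10.8).
N = np.eye(T) - X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(N, N.T))      # symmetric
print(np.allclose(N @ N, N))    # idempotent
print(np.allclose(N @ X, 0.0))  # N annihilates the columns of X
```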

10.3 Properties of the Estimators

The properties of the least squares estimator, termed in econometrics the OLS (ordinary least squares) estimator, are given below.

Proposition 10.2 (Gauss-Markov Theorem).

In the context set up by the three conditions noted above, the OLS estimator of Eq. (10.6) is
  1. i.

    Unbiased,

     
  2. ii.

    Efficient within the class of linear unbiased estimators.

     

Proof:

Substituting from Eq. (10.4), we find
$$\displaystyle{ \hat{\beta } = \beta + {({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}u, }$$
(10.9)
and taking expectations, we establish, in view of conditions i and ii,
$$\displaystyle{ E\hat{\beta } = \beta + E{({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}u = \beta. }$$
(10.10)
To prove efficiency, let \(\tilde{\beta }\) be any other linear (in y) unbiased estimator. In view of linearity, we may write
$$\displaystyle{ \tilde{\beta } = Hy,\ \ \mbox{ where $H$ depends on $X$ only and not on $y.$ } }$$
(10.11)
Without loss of generality, we may write
$$\displaystyle{ H = {({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}} + C. }$$
(10.12)
Note further that since \(\tilde{\beta } = Hy\) is an unbiased estimator of β, we have
$$\displaystyle{ HX\beta = \beta,\ \ \mbox{ which implies}\ \ HX = I_{n+1},\ \ \mbox{ or, equivalently},\ \ CX = 0. }$$
(10.13)
It follows then immediately that
$$\displaystyle{ \mathrm{Cov}(\tilde{\beta }) = {\sigma }^{2}{({X}^{^{\prime}}X)}^{-1} + {\sigma }^{2}C{C}^{^{\prime}}\ \ \ \ \mbox{ or, equivalently,} }$$
(10.14)
$$\displaystyle{ \mathrm{Cov}(\tilde{\beta }) -\mathrm{Cov}(\hat{\beta }) = {\sigma }^{2}C{C}^{^{\prime}} \geq 0. }$$
(10.15)
That the rightmost member of the equation is valid may be shown as follows: If \(\tilde{\beta }\) is not trivially identical to the OLS estimator \(\hat{\beta },\) the matrix C is of rank greater than zero. Let this rank be r. Thus, there exists at least one (non-null) vector α such that C′α ≠ 0. Consequently, α′CC′α > 0, which demonstrates the validity of the claim.

q.e.d.

Corollary 10.1.

The OLS estimator of β is also consistent2 in the sense that
$$\displaystyle{\hat{\beta }_{T}\stackrel{\mathrm{a.c.}}{\longrightarrow }\beta,}$$
provided \(\vert x_{t\cdot }x_{t\cdot }^{^{\prime}}\vert < B\) uniformly.

Proof:

From Sect.  9.3.1 use Kolmogorov’s criterion with b n = T. Then
$$\displaystyle{ \left \vert \sum _{t=1}^{T}\frac{\mathrm{var}(x_{t\cdot }^{{\prime}}u_{ t})} {{t}^{2}} \right \vert \leq B{\sigma }^{2}\sum _{ t=1}^{T} \frac{1} {{t}^{2}}, }$$
which evidently converges. Hence \(\hat{\beta }\stackrel{\mathrm{a.c.}}{\longrightarrow }\beta \), as claimed. If one does not wish to impose the uniform boundedness condition, there are similar weaker conditions that allow a proof of convergence with probability 1. Alternatively, one may not add any further conditions on the explanatory variables and instead prove convergence in quadratic mean, see Definition 9.4 with p = 2. This entails showing unbiasedness, already shown, and the asymptotic vanishing of the estimator’s variance. Thus, by Proposition 10.2 and condition ii,
$$\displaystyle{ \lim _{T\rightarrow \infty }E(\hat{\beta } - \beta ){(\hat{\beta } - \beta )}^{^{\prime}} = {\sigma }^{2}\lim _{ T\rightarrow \infty }\frac{1} {T}{\left (\frac{{X}^{^{\prime}}X} {T} \right )}^{-1} = 0. }$$
(10.16)
q.e.d.

We now examine the properties of the estimator for σ2 alluded to at the end of the preceding section.

Proposition 10.3.

Consider the sum of squares \(\hat{{u}}^{^{\prime}}\hat{u};\) its expectation is given by
$$\displaystyle{ E\hat{{u}}^{^{\prime}}\hat{u} = (T - n - 1){\sigma }^{2}. }$$
(10.17)
Proof: Expanding the representation of the sum of squared residuals we find \(\hat{{u}}^{^{\prime}}\hat{u} = {u}^{^{\prime}}Nu.\) Hence
$$\displaystyle{ E{u}^{^{\prime}}Nu = E\mathrm{tr}{u}^{^{\prime}}Nu = E\mathrm{tr}Nu{u}^{^{\prime}} = \mathrm{tr}NEu{u}^{^{\prime}} = {\sigma }^{2}\mathrm{tr}N. }$$
(10.18)
The first equality follows since u′Nu is a scalar; the second follows since for all suitable matrices trAB = trBA—see Proposition 2.16; the third equality follows from the fact that X, and hence N, is a nonstochastic matrix; the last equality follows from condition ii that defines the context of this discussion. Thus, we need only find the trace of N. Since \(\mathrm{tr}(A + B) = \mathrm{tr}A + \mathrm{tr}B\)—see Proposition 2.16—we conclude that
$$\displaystyle\begin{array}{rcl} \mathrm{tr}N& =& \mathrm{tr}I_{T} -\mathrm{tr}X{({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}} = T -\mathrm{tr}{({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}X \\ & =& T -\mathrm{tr}I_{n+1} = T - n - 1. {}\end{array}$$
(10.19)

q.e.d.

Corollary 10.2.

The unbiased OLS estimator for σ2 is given by
$$\displaystyle{ \hat{{\sigma }}^{2} = \frac{\hat{{u}}^{^{\prime}}\hat{u}} {T - n - 1}. }$$
(10.20)

Proof:

Evident from Proposition 10.3.
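A minimal sketch (Python/NumPy, with simulated data of our own choosing) of the unbiased estimator of Eq. (10.20); note the divisor T − n − 1 rather than T.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n, sigma = 200, 3, 1.5
X = np.column_stack([np.ones(T), rng.normal(size=(T, n))])
beta = np.array([1.0, 0.5, -2.0, 0.3])
y = X @ beta + rng.normal(scale=sigma, size=T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat

# Unbiased estimator of sigma^2, cf. Eq. (10.20): divide by T - n - 1, not T.
sigma2_hat = u_hat @ u_hat / (T - n - 1)
print(sigma2_hat)  # close to sigma**2 = 2.25 on average over repeated samples
```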

10.4 Distribution of OLS Estimators

10.4.1 A Digression on the Multivariate Normal

We begin this section by stating a few facts regarding the multivariate normal distribution. A random variable (vector) x having the multivariate normal distribution with mean (vector) μ and covariance matrix Σ is denoted by
$$\displaystyle{ x \sim N(\mu,\Sigma ), }$$
(10.21)
to be read x has the multivariate normal distribution with mean vector μ and covariance matrix Σ > 0.
Its moment generating function is given by
$$\displaystyle{ M_{x}(t) = E{e}^{{t}^{^{\prime}}x} = {e}^{{t}^{^{\prime}}\mu +\frac{1} {2} {t}^{^{\prime}}\Sigma t }. }$$
(10.22)
Generally, we deal with situations where Σ > 0, so that there are no linear dependencies among the elements of the vector x, which would result in a singular covariance matrix. We handle situations involving a singular covariance matrix through the following convention.

Convention 10.1.

Let the k-element vector ξ obey ξ ∼ N(μ,Σ), such that Σ > 0, and suppose that y is an n-element vector (k ≤ n) which has the representation
$$\displaystyle{ y = A\xi + b,\ \ \mathrm{rank}(A) = k. }$$
(10.23)
Then, we say that y has the distribution
$$\displaystyle{ y \sim N(\nu,\Psi ),\ \ \ \nu = A\mu + b,\ \ \ \Psi = A\Sigma {A}^{^{\prime}}. }$$
(10.24)
Note that in Eq. (10.24), Ψ is singular, but properties of y can be inferred from those of ξ which has a proper multivariate normal distribution. Certain properties of the (multivariate) normal that are easily derivable from its definition are:
  1. i.
    Let x ∼ N(μ,Σ), partition \(x = \left (\begin{array}{c} {x}^{(1)} \\ {x}^{(2)}\end{array} \right ),\) such that x (1) has s elements and x (2) has k − s elements. Partition μ and Σ conformably so that
    $$\displaystyle{ \mu = \left (\begin{array}{c} {\mu }^{(1)} \\ {\mu }^{(2)}\end{array} \right ),\ \ \ \Sigma = \left [\begin{array}{cc} \Sigma _{11} & \Sigma _{12} \\ \Sigma _{21} & \Sigma _{22}\end{array} \right ]. }$$
    (10.25)
    Then, the marginal distribution of x (i), i = 1,2, obeys
    $$\displaystyle{ {x}^{(i)} \sim N({\mu }^{(i)},\Sigma _{ ii}),\ \ \ i = 1,2. }$$
    (10.26)
    The conditional distribution of x (1) given x (2) is given by
    $$\displaystyle{ {x}^{(1)}\vert {x}^{(2)} \sim N({\mu }^{(1)} + \Sigma _{ 12}\Sigma _{22}^{-1}({x}^{(2)} - {\mu }^{(2)}),\ \Sigma _{ 11} - \Sigma _{12}\Sigma _{22}^{-1}\Sigma _{ 21}). }$$
    (10.27)
     
  2. ii.
    Let the k-element random vector x obey x ∼ N(μ,Σ), Σ > 0, and define \(y = Bx + c,\) where B is any conformable matrix; then
    $$\displaystyle{ y \sim N(B\mu + c,\ B\Sigma {B}^{^{\prime}}). }$$
    (10.28)
     
  3. iii.
    Let x ∼ N(μ,Σ) and partition as in part i; x (1) and x (2) are mutually independent if and only if
    $$\displaystyle{ \Sigma _{12} = \Sigma _{21}^{^{\prime}} = 0. }$$
    (10.29)
     
  4. iv
    We also have a sort of converse of ii, i.e. if x is as in ii, there exists a matrix C such that
    $$\displaystyle{ y = {C}^{-1}(x - \mu ) \sim N(0,I_{ k}). }$$
    (10.30)
    The proof of this is quite simple; by Proposition 2.15 there exists a nonsingular matrix C such that Σ = CC′; by ii, \(y \sim N(0,{C}^{-1}\Sigma {C}^{{\prime}-1}) = N(0,I_{k})\).
     
  5. v
    An implication of iv is that
    $$\displaystyle{ {(x - \mu )}^{{\prime}}{\Sigma }^{-1}(x - \mu ) = {y}^{{\prime}}y =\sum _{ i=1}^{k}y_{ i}^{2} \sim \chi _{ k}^{2}, }$$
    (10.31)
    because the y i are iid N(0,1) whose squares have the χ2 distribution. More about this distribution will be found immediately below.
     

In item iii, note that if joint normality is not assumed and we partition \(x = \left (\begin{array}{c} {x}^{(1)} \\ {x}^{(2)} \end{array} \right ),\) such that x (1) has s elements and x (2) has k − s elements, as above, then under the condition in iii, x (1) and x (2) are still uncorrelated, but they are not necessarily independent. Under normality, uncorrelatedness implies independence; under any distribution, independence always implies uncorrelatedness.
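The marginal and conditional formulas in Eqs. (10.26) and (10.27) are easy to evaluate numerically; the following sketch, with an arbitrarily chosen three-variate example of ours (s = 1), computes the conditional mean and covariance of x (1) given x (2).

```python
import numpy as np

# A 3-variate normal, partitioned with s = 1: x = (x1 | x2), x2 two-dimensional.
mu = np.array([1.0, 0.0, 2.0])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
mu1, mu2 = mu[:1], mu[1:]
S11, S12 = Sigma[:1, :1], Sigma[:1, 1:]
S21, S22 = Sigma[1:, :1], Sigma[1:, 1:]

x2 = np.array([0.5, 1.0])  # conditioning value

# Conditional mean and covariance of x1 given x2, cf. Eq. (10.27).
cond_mean = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
print(cond_mean, cond_cov)
```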

Other distributions important in the GLM context are the chi-square, the t- (sometimes also termed the Student t-) and the F-distributions.

The chi-square distribution with r degrees of freedom, denoted by χ r 2, may be thought of as the distribution of the sum of squares of r mutually independent normal variables with mean zero and variance one; the t-distribution with r degrees of freedom, denoted by t r , is defined as the distribution of the ratio
$$\displaystyle{ t_{r} = \frac{\xi } {\sqrt{\zeta /r}},\ \ \ \xi \sim N(0,1),\ \ \ \zeta \sim \chi _{r}^{2}, }$$
(10.32)
with ξ and ζ mutually independent.
The F-distribution with m and n degrees of freedom, denoted by F m,n , is defined as the distribution of the ratio
$$\displaystyle{ F_{m,n} = \frac{\xi /m} {\zeta /n},\ \ \ \xi \sim \chi _{m}^{2},\ \ \ \zeta \sim \chi _{ n}^{2}, }$$
(10.33)
with ξ and ζ mutually independent.

Note that F m,n ≠ F n,m . The precise relation between the two is that if a variable has the F m,n distribution, then its reciprocal has the F n,m distribution.

10.4.2 Application to OLS Estimators

We now present a very important result.

Proposition 10.4.

The OLS estimators \(\hat{\beta }\) and \(\hat{{\sigma }}^{2}\) are mutually independent.

Proof:

Consider the \(T + n + 1\)-element vector, say \(\phi = {(\hat{{\beta }}^{^{\prime}},\hat{{u}}^{^{\prime}})}^{^{\prime}}.\) From the preceding discussion, we have the representation
$$\displaystyle{ \phi = Au+\left (\begin{array}{c} \beta \\ 0 \end{array} \right ),\ \ \mathrm{where}\ \ A = \left [\begin{array}{c} {({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}} \\ N \end{array} \right ]. }$$
(10.34)
From our discussion of the multivariate normal, we conclude
$$\displaystyle{ \phi \sim N(\nu,\Psi ),\ \ \mathrm{where}\ \ \nu = \left (\begin{array}{c} \beta \\ 0 \end{array} \right ),\ \ \Psi = {\sigma }^{2}A{A}^{^{\prime}}. }$$
(10.35)
But
$$\displaystyle{ A{A}^{^{\prime}} = \left [\begin{array}{cc} {({X}^{^{\prime}}X)}^{-1} & 0 \\ 0 &N \end{array} \right ], }$$
(10.36)
which shows that \(\hat{\beta }\) and \(\hat{u}\) are uncorrelated and hence, by the properties of the multivariate normal, they are mutually independent. Since \(\hat{{\sigma }}^{2}\) depends only on \(\hat{u}\) and thus not on \(\hat{\beta },\) the conclusion of the proposition is evident.

q.e.d.

Corollary 10.3.

Denote the vector of coefficients of the bona fide variables by β ∗ and the coefficient of the “fictitious variable” one (x t0) by β0 (the constant term), so that we have
$$\displaystyle{ \beta = {(\beta _{0},\beta _{{\ast}}^{^{\prime}})}^{^{\prime}},\ \ \ \beta _{ {\ast}} = {(\beta _{1},\beta _{2},\ldots,\beta _{n})}^{^{\prime}},\ \ \ X = (e,X_{ 1}), }$$
(10.37)
where e is a T-element column vector all of whose elements are unities. The following statements are true:
  1. i.

    \(\hat{u} \sim N(0,{\sigma }^{2}N);\)

     
  2. ii.

    \(\hat{\beta } \sim N(\beta,{\sigma }^{2}{({X}^{^{\prime}}X)}^{-1});\)

     
  3. iii.

    \(\hat{\beta }_{{\ast}}\sim N(\beta _{{\ast}},{\sigma }^{2}{(X_{1}^{{\ast}^{\prime}}X_{1}^{{\ast}})}^{-1}),\)    \(X_{1}^{{\ast}} = (I_{T} - e{e}^{^{\prime}}/T)X_{1}.\)

     

Proof:

The first two statements follow immediately from the proof of Proposition 10.4 and property i of the multivariate normal. The statement in iii also follows immediately from property i of the multivariate normal and the properties of the inverse of partitioned matrices; however, we also give an alternative proof because we will need a certain result in later discussion.

The first order conditions of the OLS estimator read
$$\displaystyle{T\beta _{0} + {e}^{^{\prime}}X_{1}\beta _{{\ast}} = {e}^{^{\prime}}y,\ \ \ X_{1}^{^{\prime}}e\beta _{ 0} + X_{1}^{^{\prime}}X_{ 1}\beta _{{\ast}} = X_{1}^{^{\prime}}y.}$$
Solving by substitution, we obtain, from the first equation,
$$\displaystyle{ \hat{\beta }_{0} =\bar{ y} -\bar{ x}_{1}\hat{\beta }_{{\ast}},\ \ \ \ \bar{y} = \frac{{e}^{^{\prime}}y} {T},\ \ \ \ \bar{x}_{1} = \frac{X_{1}^{^{\prime}}e} {T}, }$$
(10.38)
and from the second equation
$$\displaystyle{ \hat{\beta }_{{\ast}} ={ \left [X_{1}^{^{\prime}}\left (I_{ T} -\frac{e{e}^{^{\prime}}} {T} \right )X_{1}\right ]}^{-1}\left [X_{ 1}^{^{\prime}}\left (I_{ T} -\frac{e{e}^{^{\prime}}} {T} \right )y\right ]; }$$
(10.39)
substituting for \(y = e\beta _{0} + X_{1}\beta _{{\ast}} + u,\) and noting that \((I_{T} - e{e}^{^{\prime}}/T)e = 0,\) we find equivalently
$$\displaystyle{ \hat{\beta }_{{\ast}} = \beta _{{\ast}} + {(X_{1}^{{\ast}^{\prime}}X_{ 1}^{{\ast}})}^{-1}X_{ 1}^{{\ast}^{\prime}}u. }$$
(10.40)
The validity of statement iii then follows immediately from Property ii of the multivariate normal.

q.e.d.
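The following sketch (Python/NumPy, simulated data of ours) verifies numerically the content of Eqs. (10.38) and (10.39): regressing on the demeaned bona fide regressors reproduces the slope coefficients of the regression that includes a constant term.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n = 150, 3
X1 = rng.normal(size=(T, n))
y = 1.0 + X1 @ np.array([0.5, -2.0, 0.3]) + rng.normal(size=T)

# Full regression with a constant: X = (e, X1).
X = np.column_stack([np.ones(T), X1])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Demeaned ("starred") regressors, X1* = (I - ee'/T) X1, cf. Eq. (10.39).
X1s = X1 - X1.mean(axis=0)
ys = y - y.mean()
beta_star_hat = np.linalg.solve(X1s.T @ X1s, X1s.T @ ys)

print(np.allclose(beta_hat[1:], beta_star_hat))                # slopes agree
print(np.isclose(beta_hat[0],
                 y.mean() - X1.mean(axis=0) @ beta_star_hat))  # Eq. (10.38)
```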

Remark 10.1.

The results given above ensure that tests of significance or other inference procedures may be carried out, even when the variance parameter, σ2, is not known. As an example, consider the coefficient of determination of the multiple regression
$$\displaystyle{{R}^{2} = 1 - \frac{\hat{{u}}^{^{\prime}}\hat{u}} {{(y - e\bar{y})}^{^{\prime}}(y - e\bar{y})} = \frac{{(y - e\bar{y})}^{^{\prime}}(y - e\bar{y}) -\hat{ {u}}^{^{\prime}}\hat{u}} {{(y - e\bar{y})}^{^{\prime}}(y - e\bar{y})}.}$$
To clarify the role of matrix algebra in easily establishing the desired result, use the first order condition and the proof of Corollary 10.3 to establish
$$\displaystyle\begin{array}{rcl} y - e\bar{y}& =& \hat{u} + \left (I_{T} -\frac{e{e}^{^{\prime}}} {T} \right )X_{1}\hat{\beta }_{{\ast}} {}\\ {(y - e\bar{y})}^{^{\prime}}(y - e\bar{y}) -\hat{ {u}}^{^{\prime}}\hat{u}& =& \hat{\beta }_{{\ast}}^{^{\prime}}X_{ 1}^{^{\prime}}\left (I_{ T} -\frac{e{e}^{^{\prime}}} {T} \right )X_{1}\hat{\beta }_{{\ast}},\ \ \mbox{ and} {}\\ \frac{{R}^{2}} {1 - {R}^{2}}& =& \frac{\hat{\beta }_{{\ast}}^{^{\prime}}X_{1}^{^{\prime}}\left (I_{T} -\frac{e{e}^{^{\prime}}} {T} \right )X_{1}\hat{\beta }_{{\ast}}} {\hat{{u}}^{^{\prime}}\hat{u}}. {}\\ \end{array}$$
It follows therefore that
$$\displaystyle{ \left ( \frac{{R}^{2}} {1 - {R}^{2}}\right )\frac{T - n - 1} {n} = \left (\frac{\hat{\beta }_{{\ast}}^{^{\prime}}X_{1}^{{\ast}^{\prime}}X_{1}^{{\ast}}\hat{\beta }_{{\ast}}/{\sigma }^{2}} {\hat{{u}}^{^{\prime}}\hat{u}/{\sigma }^{2}} \right )\frac{T - n - 1} {n}. }$$
(10.41)
The relevance of this result in carrying out “significance tests” is as follows: first, note that since \(\hat{\beta }_{{\ast}}\sim N(\beta _{{\ast}},{\sigma }^{2}{(X_{1}^{{\ast}^{\prime}}X_{1}^{{\ast}})}^{-1})\) the numerator and denominator of the fraction are mutually independent by Proposition 10.4. From Proposition 2.15, we have that every positive definite matrix has a nonsingular decomposition, say AA′. Let
$$\displaystyle{ {(X_{1}^{{\ast}^{\prime}}X_{ 1}^{{\ast}})}^{-1} = A{A}^{^{\prime}}. }$$
(10.42)
It follows from property ii of the multivariate normal that
$$\displaystyle{ \xi = \frac{{A}^{-1}(\hat{\beta }_{{\ast}}- \beta _{{\ast}})} {\sqrt{{\sigma }^{2}}} \sim N(0,I_{n}). }$$
(10.43)
This implies that the elements of the vector ξ are mutually independent (normal) scalar random variables with mean zero and variance one. Hence, by the preceding discussion,
$$\displaystyle{ {\xi }^{^{\prime}}\xi = \frac{{(\hat{\beta }_{{\ast}}- \beta _{{\ast}})}^{^{\prime}}X_{1}^{{\ast}^{\prime}}X_{1}^{{\ast}}(\hat{\beta }_{{\ast}}- \beta _{{\ast}})} {{\sigma }^{2}} \sim \chi _{n}^{2}. }$$
(10.44)
Similarly, \(\hat{{u}}^{^{\prime}}\hat{u} = {u}^{^{\prime}}Nu\) and N is a symmetric idempotent matrix of rank \(T - n - 1.\) As a symmetric and idempotent matrix, it has the representation
$$\displaystyle{ N = Q\Lambda {Q}^{^{\prime}}\ \ \ \ \ \Lambda = \left [\begin{array}{cc} I_{T-n-1} & 0 \\ 0 &0_{n+1} \end{array} \right ], }$$
(10.45)
where Q is an orthogonal matrix. To verify that, see Propositions 2.53 and 2.55.
Partition the matrix of characteristic vectors Q = (Q 1,Q 2), so that Q 1 corresponds to the nonzero (unit) roots and note that
$$\displaystyle{ \frac{{u}^{^{\prime}}Nu} {{\sigma }^{2}} = \frac{{u}^{^{\prime}}Q_{1}Q_{1}^{^{\prime}}u} {{\sigma }^{2}}. }$$
(10.46)
Put \(\zeta = Q_{1}^{^{\prime}}u/\sqrt{{\sigma }^{2}}\) and note that by property ii of the multivariate normal
$$\displaystyle{ \zeta \sim N(0,Q_{1}^{^{\prime}}Q_{ 1}) = N(0,I_{T-n-1}). }$$
(10.47)
It follows, therefore, by the definition of the chi-square distribution
$$\displaystyle{ \frac{{u}^{^{\prime}}Nu} {{\sigma }^{2}} \sim \chi _{T-n-1}^{2}. }$$
(10.48)
Now, under the null hypothesis,

H 0:  β ∗ = 0

as against the alternative,

H 1:  β ∗ ≠ 0,

the numerator of the fraction \(({R}^{2}/n)/[(1 - {R}^{2})/(T - n - 1)]\) is chi-square distributed with n degrees of freedom. Hence, Eq. (10.41) may be used as a test statistic for the test of the hypothesis stated above, and its distribution is \(F_{n,T-n-1}.\)
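A minimal sketch of the test just described, with simulated data of our own choosing; it computes R² and the statistic of Eq. (10.41) and obtains a p-value from the F n,T−n−1 distribution (SciPy is assumed available for the distribution functions).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
T, n = 120, 4
X1 = rng.normal(size=(T, n))
y = 0.5 + X1 @ np.array([1.0, 0.0, -0.7, 0.0]) + rng.normal(size=T)

X = np.column_stack([np.ones(T), X1])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat

tss = np.sum((y - y.mean()) ** 2)       # (y - e ybar)'(y - e ybar)
rss = u_hat @ u_hat                     # u_hat' u_hat
R2 = 1.0 - rss / tss

# Test statistic of Eq. (10.41): F_{n, T-n-1} under H0: beta_* = 0.
F = (R2 / (1.0 - R2)) * (T - n - 1) / n
p_value = stats.f.sf(F, n, T - n - 1)
print(F, p_value)
```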

Remark 10.2.

The preceding remark enunciates a result that is much broader than appears at first sight. Let S r be an n × r, r ≤ n, selection matrix; this means that the columns of S r are mutually orthogonal and in each column all elements are zero except one which is unity. This makes S r of rank r. Note also that it is orthogonal only in the sense that S r ′S r = I r ; on the other hand \(S_{r}S_{r}^{^{\prime}}\neq I_{n}.\) It is clear that if we are interested in testing the hypothesis, say, that \(\beta _{2} = \beta _{3} = \beta _{7} = \beta _{12} = 0,\) we may define the selection matrix S 4 such that
$$\displaystyle{ S_{4}^{^{\prime}}\beta _{ {\ast}} = {(\beta _{2},\beta _{3},\beta _{7},\beta _{12})}^{^{\prime}}. }$$
(10.49)
Since \(S_{4}^{^{\prime}}\hat{\beta }_{{\ast}}\sim N(S_{4}^{^{\prime}}\beta _{{\ast}},{\sigma }^{2}\Psi _{4}),\) where
$$\displaystyle{ \Psi _{4} = S_{4}^{^{\prime}}{(X_{ 1}^{{\ast}^{\prime}}X_{ 1}^{{\ast}})}^{-1}S_{ 4}, }$$
(10.50)
it follows, from the discussion of Remark 10.1 and property i of the multivariate normal distribution, that
$$\displaystyle{ \left (\frac{\hat{\beta }_{{\ast}}^{^{\prime}}S_{4}\Psi _{4}^{-1}S_{4}^{^{\prime}}\hat{\beta }_{{\ast}}} {\hat{{u}}^{^{\prime}}\hat{u}} \right )\left (\frac{T - n - 1} {4} \right ) \sim F_{4,T-n-1} }$$
(10.51)
is a suitable test statistic for testing the null hypothesis

H 0:  S 4 ′β ∗ = 0,

as against the alternative,

\(H_{1}:\ S_{4}^{^{\prime}}\beta _{{\ast}}\neq 0,\)

and its distribution is \(F_{4,T-n-1}.\)

Finally, for r = 1—i.e. for the problem of testing a hypothesis on a single coefficient—we note that the preceding discussion implies that the appropriate test statistic and its distribution are given by
$$\displaystyle{ {\tau }^{2} = \frac{\hat{\beta }_{i}^{2}/\mathrm{Var}(\hat{\beta })_{ i}} {\hat{{u}}^{^{\prime}}\hat{u}/{\sigma }^{2}(T - n - 1)} \sim F_{1,T-n-1}, }$$
(10.52)
where var(\(\hat{\beta }_{i}\)) = σ2 q ii , and q ii is the ith diagonal element of \({(X_{1}^{{\ast}^{\prime}}X_{1}^{{\ast}})}^{-1}.\) Making the substitution \(\hat{{\sigma }}^{2} =\hat{ {u}}^{^{\prime}}\hat{u}/(T - n - 1)\) and taking the square root in Eq. (10.52) we find
$$\displaystyle{ \tau = \frac{\hat{\beta }_{i}} {\sqrt{\hat{{\sigma }}^{2 } q_{ii}}} \sim \sqrt{F_{1,T-n-1}}, }$$
(10.53)
A close inspection indicates that τ of Eq. (10.53) is simply the usual t-ratio of regression analysis. Finally, also observe that
$$\displaystyle{ t_{T-n-1} = \sqrt{F_{1,T-n-1}}, }$$
(10.54)
or more generally the distribution of the square root of a variable that has the F 1,n distribution is precisely the distribution of the absolute value of a t n variable; equivalently, the square of a t n variable has precisely the F 1,n distribution.
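The following sketch computes the t-ratio of Eq. (10.53) for a single slope coefficient, using the corresponding diagonal element of (X′X)⁻¹ (which, by Eq. (10.91), coincides with the relevant element of (X₁∗′X₁∗)⁻¹), and checks numerically that its square yields the same p-value as F 1,T−n−1 ; the data are simulated and SciPy is assumed available.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
T, n = 120, 3
X1 = rng.normal(size=(T, n))
y = 0.5 + X1 @ np.array([1.0, 0.0, -0.7]) + rng.normal(size=T)

X = np.column_stack([np.ones(T), X1])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / (T - n - 1)
Q = np.linalg.inv(X.T @ X)              # q_ii below are its diagonal elements

i = 2                                   # test H0: beta_i = 0 for the second slope
tau = beta_hat[i] / np.sqrt(sigma2_hat * Q[i, i])   # t-ratio, cf. Eq. (10.53)

# Two-sided p-values from t_{T-n-1} and from F_{1,T-n-1} agree (tau^2 ~ F).
print(2 * stats.t.sf(abs(tau), T - n - 1))
print(stats.f.sf(tau ** 2, 1, T - n - 1))
```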

10.5 Nonstandard Errors

In this section, we take up issues that arise when the error terms do not obey the standard requirement u ∼ N(0,σ2 I T ), but instead have the more general normal distribution
$$\displaystyle{ u \sim N(0,\Sigma ),\ \ \ \ \Sigma > 0. }$$
(10.55)
Since Σ is T × T, we cannot in practice obtain efficient estimators unless Σ is known. If it is, obtain the nonsingular decomposition and the transformed model, respectively,
$$\displaystyle{ \Sigma = A{A}^{^{\prime}},\ \ \ w = Z\beta + v,\ \ \ w = {A}^{-1}y,\ \ Z = {A}^{-1}X,\ \ v = {A}^{-1}u. }$$
(10.56)
It may be verified that the transformed model obeys the standard conditions, and hence the OLS estimator
$$\displaystyle{ \hat{\beta } = {({Z}^{^{\prime}}Z)}^{-1}{Z}^{^{\prime}}w,\ \ \ \mathrm{with}\ \ \ \hat{\beta } \sim N(\beta,\ {({Z}^{^{\prime}}Z)}^{-1}) = N(\beta,{({X}^{{\prime}}{\Sigma }^{-1}X)}^{-1}) }$$
(10.57)
obeys the Gauss-Markov theorem and is thus efficient, within the class of linear unbiased estimators of β.

If Σ is not known the estimator in Eq. (10.57), termed in econometrics the Aitken estimator, is not available. However (as T → ∞), if Σ has only a fixed finite number of distinct elements which can be estimated consistently, say by \(\tilde{\Sigma }\), the estimator in Eq. (10.57), with Σ replaced by \(\tilde{\Sigma }\), is feasible and is termed the Generalized Least Squares (GLS) estimator.
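A minimal sketch of the transformation in Eq. (10.56), taking A to be the Cholesky factor of Σ; the particular Σ used (an AR(1)-type correlation matrix) and all names are ours, chosen only for illustration. The result is checked against the direct formula of Eq. (10.62).

```python
import numpy as np

rng = np.random.default_rng(5)
T, n = 80, 2
X = np.column_stack([np.ones(T), rng.normal(size=(T, n))])
beta = np.array([1.0, 0.5, -2.0])

# A known non-scalar covariance: AR(1)-type correlation, purely for illustration.
rho = 0.6
Sigma = rho ** np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
u = np.linalg.cholesky(Sigma) @ rng.normal(size=T)
y = X @ beta + u

# Aitken/GLS via the transformed model of Eq. (10.56), with A the Cholesky factor.
A = np.linalg.cholesky(Sigma)                 # Sigma = A A'
w = np.linalg.solve(A, y)                     # w = A^{-1} y
Z = np.linalg.solve(A, X)                     # Z = A^{-1} X
beta_gls = np.linalg.solve(Z.T @ Z, Z.T @ w)

# Direct formula of Eq. (10.62) for comparison.
Si_X = np.linalg.solve(Sigma, X)              # Sigma^{-1} X
beta_direct = np.linalg.solve(X.T @ Si_X, Si_X.T @ y)
print(np.allclose(beta_gls, beta_direct))
```

A feasible GLS version would replace Σ by a consistent estimate, as discussed above.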

If we estimate β by OLS methods, what are the properties of that estimator and how does it compare with the Aitken estimator? The OLS estimator evidently obeys
$$\displaystyle{ \tilde{\beta } = {({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}y = \beta + {({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}u. }$$
(10.58)
From property ii of the multivariate normal, we easily obtain
$$\displaystyle{ \tilde{\beta } \sim N(\beta,\Psi ), }$$
(10.59)
where \(\Psi = {({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}\Sigma X{({X}^{^{\prime}}X)}^{-1}.\) It is evident that the estimator is unbiased. Moreover, provided
$$\displaystyle{ \lim _{T\rightarrow \infty }\frac{{X}^{^{\prime}}\Sigma X} {{T}^{2}} = 0, }$$
(10.60)
we have
$$\displaystyle{ \lim _{T\rightarrow \infty }\Psi =\lim _{T\rightarrow \infty }\frac{1} {T}{\left (\frac{{X}^{^{\prime}}X} {T} \right )}^{-1}\left (\frac{{X}^{^{\prime}}\Sigma X} {T} \right ){\left (\frac{{X}^{^{\prime}}X} {T} \right )}^{-1} = 0, }$$
(10.61)
which shows that the estimator is consistent in the mean square sense i.e. it converges to β in mean square and thus also in probability.
How does it compare with the Aitken estimator? To make the comparison, first express the Aitken estimator in the original notation. Thus,
$$\displaystyle{ \hat{\beta } = {({X}^{^{\prime}}{\Sigma }^{-1}X)}^{-1}{X}^{^{\prime}}{\Sigma }^{-1}y,\ \ \ \ \hat{\beta } \sim N(\beta,{({X}^{^{\prime}}{\Sigma }^{-1}X)}^{-1}). }$$
(10.62)
Because both estimators are normal with the same mean, the question of efficiency reduces to whether the difference between the two covariance matrices is positive or negative semi-definite or indefinite. For simplicity of notation let \(\Sigma _{\tilde{\beta }},\ \Sigma _{\hat{\beta }}\) be the covariance matrices of the OLS and Aitken estimators, respectively. If
  1. i.

    \(\Sigma _{\tilde{\beta }} - \Sigma _{\hat{\beta }} \geq 0,\) the Aitken estimator is efficient relative to the OLS estimator;

     
  2. ii.

    \(\Sigma _{\tilde{\beta }} - \Sigma _{\hat{\beta }} \leq 0,\) the OLS estimator is efficient relative to the Aitken estimator;

     
  3. iii.

    Finally if \(\Sigma _{\tilde{\beta }} - \Sigma _{\hat{\beta }}\) is an indefinite matrix i.e. it is neither positive nor negative (semi)definite, the two estimators cannot be ranked.

     

To tackle this issue directly, we consider the simultaneous decomposition of two positive definite matrices; see Proposition 2.64.

Consider the characteristic roots of \(X{({X}^{^{\prime}}{\Sigma }^{-1}X)}^{-1}{X}^{^{\prime}}\) in the metric of Σ i.e. consider the characteristic equation
$$\displaystyle{ \vert \lambda \Sigma - X{({X}^{^{\prime}}{\Sigma }^{-1}X)}^{-1}{X}^{^{\prime}}\vert = 0. }$$
(10.63)
The (characteristic) roots of the (polynomial) equation above are exactly those of
$$\displaystyle{ \vert \lambda I_{T} - X{({X}^{^{\prime}}{\Sigma }^{-1}X)}^{-1}{X}^{^{\prime}}{\Sigma }^{-1}\vert = 0, }$$
(10.64)
as the reader may easily verify by factoring out on the right Σ, and noting that |Σ| ≠ 0. From Proposition 2.43, we have that the nonzero characteristic roots of Eq. (10.64) are exactly those of
$$\displaystyle{ \vert \mu I_{n+1} - {({X}^{^{\prime}}{\Sigma }^{-1}X)}^{-1}{X}^{^{\prime}}{\Sigma }^{-1}X\vert = \vert \mu I_{ n+1} - I_{n+1}\vert = 0. }$$
(10.65)
We conclude, therefore, that Eq. (10.63) has n + 1 unit roots and \(T - n - 1\) zero roots. By the simultaneous decomposition theorem, see Proposition 2.64, there exists a nonsingular matrix A such that
$$\displaystyle{ \Sigma = A{A}^{^{\prime}},\ \ X{({X}^{^{\prime}}{\Sigma }^{-1}X)}^{-1}{X}^{^{\prime}} = A\left [\begin{array}{cc} 0_{T-n-1} & 0 \\ 0 &I_{n+1} \end{array} \right ]{A}^{^{\prime}}. }$$
(10.66)
It follows, therefore, that
$$\displaystyle{ \Sigma -X{({X}^{^{\prime}}{\Sigma }^{-1}X)}^{-1}{X}^{^{\prime}} = A\left [\begin{array}{cc} I_{T-n-1} & 0 \\ 0 &0_{n+1} \end{array} \right ]{A}^{^{\prime}} \geq 0. }$$
(10.67)
Pre- and post-multiplying by (X′X)−1 X′ and its transpose, respectively, we find
$$\displaystyle{ {({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}[\Sigma - X{({X}^{^{\prime}}{\Sigma }^{-1}X)}^{-1}{X}^{^{\prime}}]X{({X}^{^{\prime}}X)}^{-1} = \Sigma _{\tilde{ \beta }} - \Sigma _{\hat{\beta }} \geq 0, }$$
(10.68)
which shows that the Aitken estimator is efficient relative to the OLS estimator. The validity of the last inequality is established by the following argument: let B be a T × T positive semidefinite matrix of rank r, and let C be T × m of rank m, m ≤ T; then, C′BC ≥ 0. For a proof, we show that either C′BC = 0, or there exists at least one vector α ≠ 0, such that \({\alpha }^{^{\prime}}{C}^{^{\prime}}BC\alpha > 0\) and no vector η such that \({\eta }^{^{\prime}}{C}^{^{\prime}}BC\eta < 0.\) Since C is of rank m > 0, its column space is of dimension m; if the column space of C is contained in the null space of B, C′BC = 0; if not, then there exists at least one vector γ ≠ 0 in the column space of C such that γ′Bγ > 0, because B is positive semidefinite. Let α be such that γ = Cα; the claim of the last inequality in Eq. (10.68) is thus valid. Moreover, no vector η can exist such that η′C′BCη < 0. This is so because Cη is in the column space of C and B is positive semidefinite.

Remark 10.3.

We should also point out that there is an indirect proof of the relative inefficiency of the OLS estimator of β in the model above. We argued earlier that the Aitken estimator in a model with nonstandard errors is simply the OLS estimator in an appropriately transformed model and thus obeys the Gauss-Markov theorem. It follows, therefore, that the OLS estimator in the untransformed model cannot possibly obey the Gauss-Markov theorem and is thus not efficient.

We now take up another question of practical significance. If, in the face of a general covariance matrix, Σ > 0, for the errors of a GLM, we estimate the parameters and their covariance matrix as if the model had a scalar covariance matrix, do the resulting test statistics have a tendency (on the average) to reject too frequently, or not frequently enough, relative to the situation when the correct covariance matrix is estimated?

If we pretend that Σ = σ2 I T , the covariance matrix of the OLS estimator of β is estimated as
$$\displaystyle{ \tilde{{\sigma }}^{2}{({X}^{^{\prime}}X)}^{-1},\ \ \ \mathrm{where}\ \ \ \tilde{{\sigma }}^{2} = \frac{1} {T - n - 1}\tilde{{u}}^{^{\prime}}\tilde{u}. }$$
(10.69)
The question posed essentially asks whether the matrix
$$\displaystyle{ W = E\tilde{{\sigma }}^{2}{({X}^{^{\prime}}X)}^{-1} - {({X}^{^{\prime}}X)}^{-1}({X}^{^{\prime}}\Sigma X){({X}^{^{\prime}}X)}^{-1} }$$
(10.70)
is positive semidefinite, negative semidefinite or indefinite. If it is positive semidefinite, the test statistics (t-ratios) will be understated and hence the hypotheses in question will tend to be accepted too frequently; if it is negative semi-definite, the test statistics will tend to be overstated, and hence the hypotheses in question will tend to be rejected too frequently. If it is indefinite, no such statements can be made.
First we obtain
$$\displaystyle{ E\tilde{{\sigma }}^{2} = \frac{1} {T - n - 1}E\mathrm{tr}u{u}^{^{\prime}}N = \frac{1} {T - n - 1}\mathrm{tr}\Sigma N = k_{T}. }$$
(10.71)
To determine the nature of W we need to put more structure in place. Because Σ is a positive definite symmetric matrix, we can write
$$\displaystyle{ \Sigma = Q\Lambda {Q}^{^{\prime}}, }$$
(10.72)
where Q is the orthogonal matrix of the characteristic vectors and Λ is the diagonal matrix of the (positive) characteristic roots arranged in decreasing order, i.e. λ1 is the largest root and λ T is the smallest.

What we shall do is to show that there exist data matrices (X) such that W is positive semidefinite, and data matrices such that W is negative semidefinite. To determine the nature of W we must obtain a result that holds for arbitrary data matrix X. Establishing the validity of the preceding claim is equivalent to establishing that W is an indefinite matrix.

Evidently, the columns of Q can serve as a basis for the Euclidean space R T ; the columns of the matrix X lie in an (n + 1)-dimensional subspace of R T . Partition Q = (Q 1 ,Q ∗ ) such that Q 1 corresponds to the n + 1 largest roots and suppose we may represent X = Q 1 A, where A is nonsingular. This merely states that X lies in the subspace of R T spanned by the columns of Q 1. In this context, we have a simpler expression for k T of Eq. (10.71). In particular, we have
$$\displaystyle\begin{array}{rcl} (T - n - 1)k_{T}& =& \mathrm{tr}Q\Lambda {Q}^{^{\prime}}[I_{T} - Q_{1}A{({A}^{^{\prime}}A)}^{-1}{A}^{^{\prime}}Q_{ 1}^{^{\prime}}] \\ & =& \mathrm{tr}\Lambda [I_{T} - {(I_{n+1},0)}^{^{\prime}}(I_{n+1},0)] =\sum _{ j=n+2}^{T}\lambda _{ j},{}\end{array}$$
(10.73)
so that k T is the average of the smallest \(T - n - 1\) roots. Since X′X = A′A, we obtain
$$\displaystyle{ W = k_{T}{({A}^{^{\prime}}A)}^{-1} - {A}^{-1}Q_{ 1}^{^{\prime}}\Sigma Q_{ 1}{A}^{^{\prime}-1} = {A}^{-1}[k_{ T}I_{n+1} - \Lambda _{1}]{A}^{^{\prime}-1}. }$$
(10.74)
Because k T is the average of the \(T - n - 1\) smallest roots whereas Λ1 contains the n + 1 largest roots, we conclude that \(k_{T}I_{n+1} - \Lambda _{1} < 0,\) and consequently
$$\displaystyle{ W < 0. }$$
(10.75)
But this means that the test statistics have a tendency to be larger relative to the case where we employ the correct covariance matrix; thus, hypotheses (that the underlying parameter is zero) would tend to be rejected more frequently than appropriate.
Next, suppose that X lies in the (n + 1)-dimensional subspace of R T spanned by the columns of Q 2, where Q = (Q ∗ ,Q 2 ), so that Q 2 corresponds to the n + 1 smallest roots. Repeating the same construction as above we find that in this instance
$$\displaystyle\begin{array}{rcl} (T - n - 1)k_{T}& =& \mathrm{tr}Q\Lambda {Q}^{^{\prime}}[I_{T} - Q_{2}A{({A}^{^{\prime}}A)}^{-1}{A}^{^{\prime}}Q_{ 2}^{^{\prime}}] \\ & =& \mathrm{tr}\Lambda [I_{T} - {(0,I_{n+1})}^{^{\prime}}(0,I_{n+1})] =\sum _{ j=1}^{T-n-1}\lambda _{ j},{}\end{array}$$
(10.76)
so that k T is the average of the largest \(T - n - 1\) roots. Therefore, in this case, we have
$$\displaystyle{ W = k_{T}{({A}^{^{\prime}}A)}^{-1} - {A}^{-1}Q_{ 2}^{^{\prime}}\Sigma Q_{ 2}{A}^{^{\prime}-1} = {A}^{-1}[k_{ T}I_{n+1} - \Lambda _{2}]{A}^{^{\prime}-1} > 0, }$$
(10.77)
since k T is the average of the \(T - n - 1\) largest roots and Λ2 contains along its diagonal the n + 1 smallest roots of Σ.

Evidently, in this case, the test statistics are smaller than appropriate and, consequently, we tend to accept hypotheses more frequently than appropriate. The preceding shows that the matrix W is indefinite.

Finally, the argument given above is admissible because no restrictions are put on the matrix X; since it is arbitrary it can, in principle, lie in an (n + 1)-dimensional subspace of R T spanned by any set of n + 1 of the characteristic vectors of Σ. Therefore, we must classify the matrix W as indefinite, when X is viewed as arbitrary. It need not be so for any particular matrix X. But this means that no statement can be made with confidence on the subject of whether using \(\tilde{{\sigma }}^{2}{({X}^{^{\prime}}X)}^{-1}\) as the covariance matrix of the OLS estimator leads to any systematic bias in accepting or rejecting hypotheses. Thus, nothing can be concluded beyond the fact that using the OLS estimator, and OLS-based estimators of its covariance matrix, when the covariance matrix of the structural errors is actually non-scalar, is inappropriate, unreliable, and should be avoided for purposes of hypothesis testing.
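The construction just described is easy to reproduce numerically. The sketch below (all names and the particular Σ are ours) builds X from the characteristic vectors of Σ associated with the n + 1 largest and then the n + 1 smallest roots, and confirms that the resulting W of Eq. (10.70) is negative definite in the first case and positive definite in the second.

```python
import numpy as np

rng = np.random.default_rng(6)
T, n = 30, 2

# A positive definite Sigma and its spectral decomposition, roots in decreasing order.
B = rng.normal(size=(T, T))
Sigma = B @ B.T + T * np.eye(T)
lam, Q = np.linalg.eigh(Sigma)          # eigh returns ascending order
lam, Q = lam[::-1], Q[:, ::-1]          # reorder so that lambda_1 is the largest

A = rng.normal(size=(n + 1, n + 1))     # an arbitrary nonsingular (n+1) x (n+1) matrix

def W_matrix(X):
    # W = E(sigma_tilde^2) (X'X)^{-1} - (X'X)^{-1} X' Sigma X (X'X)^{-1}, cf. Eq. (10.70),
    # with E(sigma_tilde^2) = tr(Sigma N)/(T - n - 1) = k_T, cf. Eq. (10.71).
    XtX_inv = np.linalg.inv(X.T @ X)
    N = np.eye(T) - X @ XtX_inv @ X.T
    k_T = np.trace(Sigma @ N) / (T - n - 1)
    return k_T * XtX_inv - XtX_inv @ X.T @ Sigma @ X @ XtX_inv

# X spanned by the eigenvectors of the n+1 largest roots: W negative definite.
W1 = W_matrix(Q[:, : n + 1] @ A)
# X spanned by the eigenvectors of the n+1 smallest roots: W positive definite.
W2 = W_matrix(Q[:, -(n + 1):] @ A)
print(np.linalg.eigvalsh(W1).max() < 0, np.linalg.eigvalsh(W2).min() > 0)
```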

10.6 Inference with Asymptotics

In this section we dispose of the somewhat unrealistic assertion that the structural errors are jointly normally distributed. Recall that we had made this assertion only in order to develop a distribution theory to be used in inferences regarding the parameters of the model.

Here we take advantage of the material developed in Chap. 9 to develop a distribution theory for the OLS estimators based on their asymptotic or limiting distribution. To this end return to the OLS estimator as exhibited in Eq. (10.6). Nothing much will change relative to the results we obtained above, but the results will not hold for every sample size T, only for “large” T, although it is not entirely clear what large is. Strictly speaking, limiting or asymptotic results hold precisely only at the limit, i.e. as T →∞; but if the sample is large enough the distribution of the entity in question could be well approximated by the asymptotic or limiting distribution. We may, if we wish, continue with the context of OLS estimation embodied in condition i above, or we may take the position that the explanatory variables are random and all analysis is carried out conditionally on the observations in the data matrix X; in this case we replace the condition therein by \(\mathop{\mathrm{plim}}_{T\rightarrow \infty }({X}^{^{\prime}}X/T) = M_{xx} > 0\). We shall generally operate under the last condition. At any rate, developing the expression in Eq. (10.6) we find
$$\displaystyle{ \hat{\beta }={({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}y=\beta +{(X^{\prime}X)}^{-1}X^{\prime}u,\quad \mbox{ or}\quad \sqrt{T}(\hat{\beta }-\beta )={\left (\frac{X^{\prime}X} {T} \right )}^{-1}\left (\frac{X^{\prime}u} {\sqrt{T}}\right ). }$$
(10.78)
Applying Proposition 9.6, we find3
$$\displaystyle{ \sqrt{T}(\hat{\beta } - \beta ) \sim M_{xx}^{-1} \frac{1} {\sqrt{T}}\sum _{t=1}^{T}x_{ t\cdot }^{{\prime}}u_{ t}. }$$
(10.79)
The sum in the right member above is the sum of independent non-identically distributed random vectors with mean zero and covariance matrix \({\sigma }^{2}(x_{t\cdot }^{^{\prime}}x_{t\cdot }/T)\), to which a CLT may be applied. Since in all our discussion of CLT we have used only scalar random variables, ostensibly none of these results can be employed in the current (vector) context. On the other hand, using the observation in Remark 9.4, let λ be an arbitrary conformable vector and consider
$$\displaystyle{ \zeta _{T} = \frac{1} {\sqrt{T}}\sum _{t=1}^{T}{\lambda }^{{\prime}}x_{ t\cdot }^{{\prime}}u_{ t}. }$$
(10.80)
The rvs \({\lambda }^{{\prime}}x_{t\cdot }^{{\prime}}u_{t}\) are independent non-identically distributed with mean zero and variance \({\sigma }^{2}{\lambda }^{{\prime}}(x_{t\cdot }^{^{\prime}}x_{t\cdot }/T)\lambda \), which also obey the Lindeberg condition, because of the conditions put on X′X/T. Consequently by Proposition 9.7 we conclude
$$\displaystyle{\zeta _{T}\stackrel{\mathrm{d}}{\longrightarrow }N(0,{\sigma }^{2}{\lambda }^{{\prime}}M_{ xx}\lambda ),\ \mbox{ and by Remark 9.4}\ \frac{1} {\sqrt{T}}\sum _{t=1}^{T}x_{ t\cdot }^{{\prime}}u_{ t}\stackrel{\mathrm{d}}{\longrightarrow }N(0,{\sigma }^{2}M_{ xx}),}$$
thus establishing
$$\displaystyle{ \sqrt{T}(\hat{\beta } - \beta )\stackrel{\mathrm{d}}{\longrightarrow }N(0,{\sigma }^{2}M_{ xx}^{-1}). }$$
(10.81)
For large T practitioners often use the approximation
$$\displaystyle{\sqrt{T}(\hat{\beta } - \beta ) \approx N(0,\tilde{{\sigma }}^{2}\tilde{M}_{ xx}^{-1}),\quad \tilde{M}_{ xx}^{-1} ={ \left (\frac{{X}^{^{\prime}}X} {T} \right )}^{-1},\quad \tilde{{\sigma }}^{2} =\hat{ {u}}^{{\prime}}\hat{u}/T.}$$
To translate the procedures for inference tests developed earlier (where we had normal distributions for every sample size, T) to the asymptotic case (where normality holds only for large T), we shall not follow step by step what we had done earlier. Instead we shall introduce the so-called general linear hypothesis,

H 0: Aβ = a 0

as against the alternative

H 1: Aβ ≠ a 0,

where A is \(k \times (n + 1),\quad k \leq n + 1\), rank(A) = k, and A, a 0 are respectively a matrix and a vector with known elements.

Remark 10.4.

Note that the formulation above encompasses all the types of inference tests considered in the earlier context. For example, if we wish to test the hypothesis that \(\beta _{i} = a_{(0),i} = 0\) simply take a 0 = 0 and A as consisting of a single row, all of whose elements are zero save the one corresponding to β i , which is unity. If we wish to test the hypothesis that \(\beta _{2} = \beta _{3} = \beta _{7} = \beta _{10} = 0\), simply take a 0 = 0, and A consisting of four rows, all of whose elements are zero except, respectively, those corresponding to β i , i = 2,3,7,10. If we wish to duplicate the test based on the ratio \({R}^{2}/(1 - {R}^{2})\), which tests the hypothesis that the coefficients of all bona fide variables are zero, i.e. β ∗ = 0,4 take A = (0,I n ), which is an n × (n + 1) matrix. Thus, all tests involving linear restrictions on the parameters in β are encompassed in the general linear hypothesis Aβ = a 0 .

To apply the asymptotic distribution we need to design tests based exclusively on that distribution. To this end consider
$$\displaystyle{{\tau }^{{\ast}} = \sqrt{T}{(A\hat{\beta } - a_{ 0})}^{{\prime}}{[{\sigma }^{2}AM_{ xx}^{-1}{A}^{{\prime}}]}^{-1}\sqrt{T}(A\hat{\beta } - a_{ 0}) \sim \chi _{k}^{2},}$$
because of item v (Eq. (10.31)) given in connection with the multivariate normal above. Unfortunately, however, τ ∗ is not a statistic since it contains the unknown parameters σ2 and M xx , not specified by the null. From Proposition 9.6, however, we know that if we replace σ2 and M xx by their consistent estimators, viz. \(\hat{{u}}^{{\prime}}\hat{u}/T\), X′X/T respectively, the limiting distribution will be the same; thus, consider instead
$$\displaystyle\begin{array}{rcl} \tau & =& \sqrt{T}{(A\hat{\beta } - a_{0})}^{{\prime}}{[\tilde{{\sigma }}^{2}A{\tilde{M}_{ xx}}^{-1}{A}^{{\prime}}]}^{-1}\sqrt{T}(A\hat{\beta } - a_{ 0}) {}\\ & =& \frac{T{(A\hat{\beta } - a_{0})}^{{\prime}}{[A{({X}^{{\prime}}X/T)}^{-1}{A}^{{\prime}}]}^{-1}(A\hat{\beta } - a_{0})} {\hat{{u}}^{{\prime}}\hat{u}/T} \stackrel{\mathrm{d}}{\rightarrow }\chi _{k}^{2}. {}\\ \end{array}$$
Finally, clearing of redundancies, we may write
$$\displaystyle{ \tau = \frac{{(A\hat{\beta } - a_{0})}^{{\prime}}{[A{({X}^{{\prime}}X)}^{-1}{A}^{{\prime}}]}^{-1}(A\hat{\beta } - a_{0})} {\hat{{u}}^{{\prime}}\hat{u}/T} \stackrel{\mathrm{d}}{\rightarrow }\chi _{k}^{2}, }$$
(10.82)
which is a statistic in that it does not contain unknown parameters not specified by the null hypothesis.
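A minimal sketch of the statistic in Eq. (10.82) for a hypothetical restriction Aβ = a 0 of our own choosing; the errors are deliberately non-normal, so only the limiting chi-square distribution is invoked (SciPy is assumed available for the tail probability).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
T, n = 500, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, n))])
beta = np.array([1.0, 0.5, -2.0, 0.0])
y = X @ beta + rng.standard_t(df=5, size=T)   # non-normal errors; only the CLT is invoked

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_tilde = u_hat @ u_hat / T              # note: divided by T, as in the text

# H0: A beta = a0; here (purely for illustration) beta_1 + beta_2 = -1.5 and beta_3 = 0.
A = np.array([[0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
a0 = np.array([-1.5, 0.0])
k = A.shape[0]

d = A @ beta_hat - a0
V = sigma2_tilde * A @ np.linalg.inv(X.T @ X) @ A.T
tau = d @ np.linalg.solve(V, d)               # cf. Eq. (10.82)
print(tau, stats.chi2.sf(tau, k))             # asymptotically chi-square with k d.f.
```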

Remark 10.5.

A careful examination will disclose that, when compared with the test statistics obtained earlier (when the estimators were normally distributed for every sample size T), the statistic τ of Eq. (10.82) duplicates them precisely, except for the denominator \(\hat{{u}}^{{\prime}}\hat{u}/T\), which is immaterial. This means that if one does what is usually done in evaluating regression results (when it is assumed that normality of estimators prevails for all sample sizes T) the test procedures will continue to be valid when the sample size is large and one employs the limiting distribution of the estimators. The only difference is that what was a t-test in the earlier case is now a z-test (i.e. based on N(0,1)) and what was an F-test is now a chi-square test.

10.7 Orthogonal Regressors

Suppose that the regressors of the GLM are mutually orthogonal, meaning that in the representation
$$\displaystyle{y = X\beta + u,\ \ \ \mbox{ we have}\ \ {X}^{^{\prime}}X = D = \mathrm{diag}(d_{1},d_{2},\ldots,d_{n+1}).}$$
In this case, the elements of β, the regression coefficients, can be estimated seriatim, i.e.
$$\displaystyle{\hat{\beta }_{i} = {(x_{\cdot i}^{^{\prime}}x_{ \cdot i})}^{-1}x_{ \cdot i}^{^{\prime}}y,\ \ i = 0,1,2,\ldots,n.}$$
This may be verified directly by computing the elements of
$$\displaystyle{\hat{\beta } = {({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}y = {D}^{-1}{X}^{^{\prime}}y.}$$
Although this is a rare occurrence in actual practice, nonetheless it points out an important feature of least squares. Suppose, for example, the model is written as
$$\displaystyle{ y = X_{1}\beta _{1} + X_{2}\beta _{2} + u, }$$
(10.83)
where X 1 is T × (m + 1) and X 2 is T × k, \(k = n - m\), such that \(X_{1}^{^{\prime}}X_{2} = 0.\)
By the argument given above, we can estimate
$$\displaystyle{ \hat{\beta }_{1} = {(X_{1}^{^{\prime}}X_{ 1})}^{-1}X_{ 1}^{^{\prime}}y,\quad \hat{\beta }_{ 2} = {(X_{2}^{^{\prime}}X_{ 2})}^{-1}X_{ 2}^{^{\prime}}y. }$$
(10.84)
The least squares residuals are thus given by
$$\displaystyle{ \hat{u} = N_{1}y + (N_{2} - I_{T})y,\ \ \ N_{i} = I_{T} - X_{i}{(X_{i}^{^{\prime}}X_{ i})}^{-1}X_{ i}^{\prime},\ \ i = 1,2. }$$
(10.85)
Even if the two sets of variables in X 1 and X 2 are not mutually orthogonal, we can use the preceding discussion to good advantage. For example, suppose we are not particularly interested in the coefficients of the variables in X 1 but wish to carry out tests on the coefficients of the variables in X 2. To do so, we need estimators of the coefficient vector β2 as well as its covariance matrix. Oddly enough, we may accomplish this with a simple regression as follows. Rewrite the model as
$$\displaystyle\begin{array}{rcl} y& =& X\beta + u = X_{1}\beta _{1} + N_{1}X_{2}\beta _{2} + {u}^{{\ast}}, \\ {u}^{{\ast}}& =& (I_{ T} - N_{1})X_{2}\beta _{2} + u. {}\end{array}$$
(10.86)
Carrying out an OLS regression, we find
$$\displaystyle{\hat{\beta } = {[{(X_{1},X_{2}^{{\ast}})}^{^{\prime}}(X_{ 1},X_{2}^{{\ast}})]}^{-1}{(X_{ 1},X_{2}^{{\ast}})}^{^{\prime}}y,\ \ \ X_{ 2}^{{\ast}} = N_{ 1}X_{2}.}$$
Making a substitution (for y) from Eq. (10.86), we can express the estimator as
$$\displaystyle{ \hat{\beta } = \left (\begin{array}{c} \beta _{1}\\ \ \\ \beta _{2} \end{array} \right )+\left (\begin{array}{c} {(X_{1}^{^{\prime}}X_{1})}^{-1}X_{1}^{^{\prime}}X_{ 2}\beta _{2}\\ \ \\ 0 \end{array} \right )+\left (\begin{array}{c} {(X_{1}^{^{\prime}}X_{1})}^{-1}X_{1}^{^{\prime}}\\ \ \\ {(X_{2}^{{\ast}^{\prime}}X_{2}^{{\ast}})}^{-1}X_{2}^{{\ast}^{\prime}} \end{array} \right )u. }$$
(10.87)
Note that the estimator for β2 is unbiased but the estimator for β1 is not. Because we are not interested in β1, this does not present a problem. Next, compute the residuals from this regression, namely
$$\displaystyle\begin{array}{rcl} \hat{u}& =& y - X_{1}\hat{\beta }_{1} - X_{2}^{{\ast}}\hat{\beta }_{ 2} = y - X_{1}\beta _{1} - N_{1}X_{2}\beta _{2} \\ & & \ \ \ -(I_{T} - N_{1})X_{2}\beta _{2} - [(I_{T} - N_{1}) + X_{2}^{{\ast}}{(X_{ 2}^{{\ast}^{\prime}}X_{ 2}^{{\ast}})}^{-1}X_{ 2}^{{\ast}^{\prime}}]u \\ & =& y - X_{1}\beta _{1} - X_{2}\beta _{2} - [(I_{T} - N_{1}) + X_{2}^{{\ast}}{(X_{ 2}^{{\ast}^{\prime}}X_{ 2}^{{\ast}})}^{-1}X_{ 2}^{{\ast}^{\prime}}]u \\ & =& [N_{1} - N_{1}X_{2}{(X_{ 2}^{^{\prime}}N_{ 1}X_{2})}^{-1}X_{ 2}^{^{\prime}}N_{ 1}]u \\ & =& [I_{T} - X_{2}^{{\ast}}{(X_{ 2}^{{\ast}^{\prime}}X_{ 2}^{{\ast}})}^{-1}X_{ 2}^{{\ast}^{\prime}}]N_{ 1}u. {}\end{array}$$
(10.88)

To complete this facet of our discussion we must show that the estimator of β2 as obtained in Eq. (10.87) is identical to that obtained in Eq. (10.6) or Eq. (10.9); in addition, we must show that the residuals obtained using the estimator in Eq. (10.6) and those obtained from the estimator in Eq. (10.87) are identical.

To show the validity of the first claim, denote the OLS estimator as originally obtained in Eq. (10.6) by \(\tilde{\beta }\) to distinguish it from the estimator examined in the current discussion. By Corollary 10.3, its distribution is given by
$$\displaystyle{ \tilde{\beta } \sim N(\beta,{\sigma }^{2}B),\ \ B = \left [\begin{array}{cc} B_{11} & B_{12} \\ B_{21} & B_{22} \end{array} \right ] = {({X}^{^{\prime}}X)}^{-1}. }$$
(10.89)
By the property of the multivariate normal given in Eq. (10.26), the marginal distribution of \(\tilde{\beta }_{2}\) is given by
$$\displaystyle{ \tilde{\beta }_{2} \sim N(\beta _{2},{\sigma }^{2}B_{ 22}). }$$
(10.90)
From Proposition 2.31, pertaining to the inverse of a partitioned matrix, we find that
$$\displaystyle{ B_{22} = {[X_{2}^{^{\prime}}(I_{ T} - X_{1}{(X_{1}^{^{\prime}}X_{ 1})}^{-1}X_{ 1}^{^{\prime}})X_{ 2}]}^{-1} = {(X_{ 2}^{{\ast}^{\prime}}X_{ 2}^{{\ast}})}^{-1}, }$$
(10.91)
thus proving that \(\hat{\beta }_{2}\) as exhibited in Eq. (10.87) of the preceding discussion is indeed the OLS estimator of the parameter β2, since, evidently, \(\tilde{\beta }_{2}\) of Eq. (10.90) has precisely the same distribution.
To show the validity of the second claim, we must show that
$$\displaystyle{ I_{T} - X{({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}} = N_{ 1} - X_{2}^{{\ast}}{(X_{ 2}^{{\ast}^{\prime}}X_{ 2}^{{\ast}})}^{-1}X_{ 2}^{{\ast}^{\prime}}, }$$
(10.92)
thus demonstrating that the residuals as obtained in Eq. (10.88) and as obtained in Eq. (10.8) are, in fact, identical.
The OLS residuals obtained from the estimator in Eq. (10.6) are given by
$$\displaystyle{ \tilde{u} = y - X\tilde{\beta } = [I_{T} - X{({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}}]u. }$$
(10.93)
Using the notation X = (X 1,X 2), as in the previous discussion, we find
$$\displaystyle{ I_{T} -X{({X}^{^{\prime}}X)}^{-1}{X}^{^{\prime}} = I_{ T} - [X_{1}B_{11}X_{1}^{^{\prime}} + X_{ 1}B_{12}X_{2}^{^{\prime}} + X_{ 2}B_{21}X_{1}^{^{\prime}} + X_{ 2}B_{22}X_{2}^{^{\prime}}]. }$$
(10.94)
From Proposition 2.3 and Corollary 5.5,
$$\displaystyle\begin{array}{rcl} B_{11}& =& {[X_{1}^{^{\prime}}X_{ 1} - X_{1}^{^{\prime}}X_{ 2}{(X_{2}^{^{\prime}}X_{ 2})}^{-1}X_{ 2}^{^{\prime}}X_{ 1}]}^{-1} = {(X_{ 1}^{^{\prime}}X_{ 1})}^{-1} \\ & & \ \ \ +{(X_{1}^{^{\prime}}X_{ 1})}^{-1}X_{ 1}^{^{\prime}}X_{ 2}{(X_{2}^{^{\prime}}N_{ 1}X_{2})}^{-1}X_{ 2}^{^{\prime}}X_{ 1}{(X_{1}^{^{\prime}}X_{ 1})}^{-1} \\ B_{12}& =& -{(X_{1}^{{\prime}}X_{ 1})}^{-1}X_{ 1}^{^{\prime}}X_{ 2}{(X_{2}^{^{\prime}}N_{ 1}X_{2})}^{-1},\ \ \ B_{ 21} = B_{12}^{^{\prime}}, \\ B_{22}& =& {(X_{2}^{^{\prime}}N_{ 1}X_{2})}^{-1} = {(X_{ 2}^{{\ast}^{\prime}}X_{ 2}^{{\ast}})}^{-1}. {}\end{array}$$
(10.95)
Substituting the preceding expressions into the right member of Eq. (10.94), we can render the standard OLS residuals of Eq. (10.93) as
$$\displaystyle{ \tilde{u} = [N_{1}-N_{1}X_{2}{(X_{2}^{^{\prime}}N_{ 1}X_{2})}^{-1}X_{ 2}^{^{\prime}}N_{ 1}]u = [I_{T}-N_{1}X_{2}{(X_{2}^{^{\prime}}N_{ 1}X_{2})}^{-1}X_{ 2}^{^{\prime}}N_{ 1}]N_{1}u. }$$
(10.96)
This provides a constructive proof that the residuals from the regression of y on X 1 and N 1 X 2 are precisely the same (numerically) as the residuals of the regression of y on X 1 and X 2.

Remark 10.6.

If we had proved in this volume the projection theorem, the preceding argument would have been quite unnecessary. This is so because the OLS procedure involves the projection of the vector y on the subspace spanned by the columns of the matrix (X 1,X 2), which are, by assumption, linearly independent. Similarly, the regression of y on (X 1,N 1 X 2) involves a projection of the vector y on the subspace spanned by the columns of (X 1,N 1 X 2). But the latter is obtained by a Gram-Schmidt orthogonalization procedure on the columns of the matrix (X 1,X 2). Thus, the two matrices span precisely the same subspace. The projection theorem also states that any vector in a T-dimensional Euclidean space can be written uniquely as the sum of two vectors, one from the subspace spanned by the columns of the matrix in question and one from the orthogonal complement of that subspace. Because the subspaces in question are identical, so are their orthogonal complements. The component that lies in the orthogonal complement is simply the vector of residuals from the corresponding regression.

Remark 10.7.

The results exhibited in Eqs. (10.6) and (10.87) imply the following computational equivalence. If we are not interested in β1, and we merely wish to obtain the OLS estimator of β2 in such a way that we can construct confidence intervals and test hypotheses regarding the parameters therein, we can operate exclusively with the model

$$\displaystyle{ N_{1}y = N_{1}X_{2}\beta _{2} + N_{1}u,\ \ \mbox{ noting that}\ \ N_{1}X_{1} = 0. }$$
(10.97)
From the standard theory of the GLM, the OLS estimator of β2 in Eq. (10.97) is
$$\displaystyle{ \hat{\beta }_{2} = {(X_{2}^{{\ast}^{\prime}}X_{ 2}^{{\ast}})}^{-1}X_{ 2}^{{\ast}^{\prime}}N_{ 1}y, }$$
(10.98)
and the vector of residuals is given by
$$\displaystyle{ \hat{u} = [I_{T} - X_{2}^{{\ast}}{(X_{ 2}^{{\ast}^{\prime}}X_{ 2}^{{\ast}})}^{-1}X_{ 2}^{{\ast}^{\prime}}]N_{ 1}u, }$$
(10.99)
both of which are identical to the results obtained from the regression of y on (X 1,N 1 X 2), or on (X 1,X 2).
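The equivalence just established is easy to confirm numerically. The sketch below (simulated data, names ours) regresses N 1 y on N 1 X 2 as in Eq. (10.97) and checks that both the estimator of β2 and the residuals coincide with those of the full regression of y on (X 1,X 2).

```python
import numpy as np

rng = np.random.default_rng(8)
T = 100
X1 = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])   # coefficients not of interest
X2 = rng.normal(size=(T, 2))                                  # coefficients of interest
y = X1 @ np.array([1.0, 0.5, -0.5]) + X2 @ np.array([2.0, -1.0]) + rng.normal(size=T)

# Full regression of y on (X1, X2).
X = np.column_stack([X1, X2])
beta_full = np.linalg.solve(X.T @ X, X.T @ y)
resid_full = y - X @ beta_full

# Partialled-out regression, cf. Eq. (10.97): N1 y on N1 X2.
N1 = np.eye(T) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
X2s, ys = N1 @ X2, N1 @ y
beta2_hat = np.linalg.solve(X2s.T @ X2s, X2s.T @ ys)
resid_short = ys - X2s @ beta2_hat

print(np.allclose(beta_full[-2:], beta2_hat))    # same estimator of beta_2
print(np.allclose(resid_full, resid_short))      # same residuals
```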

10.8 Multiple GLM

In this section we take up the case where one has to deal with a number of GLMs that are somehow related, but not in any obvious way. The term multiple GLM is not standard; in the literature of econometrics the prevailing term is Seemingly Unrelated Regressions (SUR). In some sense it is the intellectual precursor to Panel Data Models, a subject we shall take up in the next chapter.

This topic arose in early empirical research that dealt with disaggregated investment functions at the level of the firm. Thus, suppose a GLM is an appropriate formulation of the investment activity for a given firm, i, say
$$\displaystyle{ y_{t(i)} = x_{t\cdot }^{i}{\beta }^{(i)} + u_{ t(i)}. }$$
(10.100)
Suppose further the investigator wishes to deal with a small but fixed number of firms, say m. The explanatory variables in x t(i) need not have anything in common with those in x t(j), i ≠ j, although they may; the vectors of coefficients need not be the same for all firms and may indeed have little in common. However, by the nature of the economic environment the error terms may be correlated across firms, since they all operate in the same (macro) economic environment. We may write the observations on the ith firm as
$$\displaystyle{ y_{\cdot i} = {X}^{i}\beta _{ \cdot i} + u_{\cdot i},\quad i = 1,2,\ldots m, }$$
(10.101)
where y ·i is a T-element column vector,5 X i is a T × k i matrix of observations on the explanatory variables, β ·i is the k i -element column vector of the regression parameters and u ·i is the T-element column vector of the errors. Giving effect to the observation that all firms operate in the same economic environment, we are prepared to assume that
$$\displaystyle{ \mathrm{Cov}(u_{t(i)}u_{t(j)}) = \sigma _{ij}\neq 0, }$$
(10.102)
for all t. All other standard conditions of the GLM continue in force, for each firm.
We could estimate all m GLM seriatim, as we discussed above, obtain estimators and make inferences. If we did so, however, we would be ignoring the information, or condition, exhibited in Eq. (10.102), and this raises the question of whether what we are doing is optimal. To address this issue, write the system in Eq. (10.101) as
$$\displaystyle\begin{array}{rcl} y = {X}^{{\ast}}{\beta }^{{\ast}} + u,& & y = {(y_{ \cdot 1}^{{\prime}},y_{ \cdot 2}^{{\prime}},\ldots,y_{ \cdot m}^{{\prime}})}^{{\prime}},\quad u = {(u_{ \cdot 1}^{{\prime}},u_{ \cdot 2}^{{\prime}},\ldots,u_{ \cdot m}^{{\prime}})}^{{\prime}},\quad \mbox{ where} \\ & & {X}^{{\ast}} = \mathrm{diag}({X}^{1},{X}^{2},\ldots,{X}^{m}),\quad {\beta }^{{\ast}} = {(\beta _{ \cdot 1}^{{\prime}},\beta _{ \cdot 2}^{{\prime}},\ldots,\beta _{ \cdot m}^{{\prime}})}^{{\prime}},{}\end{array}$$
(10.103)
and note that
$$\displaystyle{ \mathrm{Cov}(u{u}^{{\prime}}) = (Eu_{ \cdot i}u_{\cdot j}^{{\prime}}) = (\sigma _{ ij}I_{T}) = \Sigma \otimes I_{T} = \Psi. }$$
(10.104)
Also y is an mT-element column vector, as is u; X ∗ is an mT × k matrix, where \(k =\sum _{ i=1}^{m}k_{i}\), and β ∗ is a k-element column vector.
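To fix ideas, the following sketch (simulated data; the number of firms, the regressor counts and the matrix Σ are all hypothetical choices, not taken from the text) constructs the stacked system of Eq. (10.103) and the covariance matrix Ψ = Σ ⊗ I T of Eq. (10.104), using scipy.linalg.block_diag for the block-diagonal X ∗.

```python
# Construct the stacked SUR system y = X* beta* + u of Eq. (10.103), with
# Cov(uu') = Sigma kron I_T as in Eq. (10.104). All numbers are illustrative.
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(3)
T, m = 100, 3                                # T observations, m "firms"
ks = [2, 3, 2]                               # k_i regressors in equation i
Sigma = np.array([[1.0, 0.4, 0.2],
                  [0.4, 1.5, 0.3],
                  [0.2, 0.3, 0.8]])          # cross-equation error covariance

Xs = [rng.standard_normal((T, k)) for k in ks]
betas = [rng.standard_normal(k) for k in ks]
U = rng.multivariate_normal(np.zeros(m), Sigma, size=T)   # row t holds (u_t(1),...,u_t(m))
ys = [Xs[i] @ betas[i] + U[:, i] for i in range(m)]

y_stacked = np.concatenate(ys)               # (y_.1', ..., y_.m')', an mT-vector
X_star = block_diag(*Xs)                     # mT x k,  k = sum_i k_i
Psi = np.kron(Sigma, np.eye(T))              # Sigma kron I_T
print(y_stacked.shape, X_star.shape, Psi.shape)
```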
In view of the fact that the system in Eq. (10.103) is formally a GLM, the efficient estimator of its parameters is the Aitken estimator when Ψ is known and, when it is not, the generalized least squares (GLS) estimator. The latter is given by
$$\displaystyle{ \hat{{\beta }}^{{\ast}} = {({X}^{{\ast}{\prime}}\tilde{{\Psi }}^{-1}{X}^{{\ast}})}^{-1}{X}^{{\ast}{\prime}}\tilde{{\Psi }}^{-1}y = {\beta }^{{\ast}} + {({X}^{{\ast}{\prime}}\tilde{{\Psi }}^{-1}{X}^{{\ast}})}^{-1}{X}^{{\ast}{\prime}}\tilde{{\Psi }}^{-1}u, }$$
(10.105)
where \(\tilde{\Psi }\) is a consistent estimator of Ψ. Since the latter, an mT × mT matrix, contains only a fixed number of parameters, viz. the elements of the m × m symmetric matrix Σ, this estimator is feasible. Indeed, we can estimate each of the m GLM seriatim by ordinary least squares (OLS), obtain the residuals
$$\displaystyle{\tilde{u}_{\cdot i} = y_{\cdot i} - {X}^{i}\tilde{\beta }_{ \cdot i}}$$
and thus obtain the consistent estimator
$$\displaystyle{ \tilde{\Psi } =\tilde{ \Sigma } \otimes I_{T},\quad \tilde{\Sigma } = (\tilde{\sigma }_{ij}),\quad \mbox{ where}\quad \tilde{\sigma }_{ij} = \frac{\tilde{u}_{\cdot i}^{{\prime}}\tilde{u}_{\cdot j}} {T},\quad i,j = 1,2,\ldots,m. }$$
(10.106)
Since
$$\displaystyle\begin{array}{rcl} \tilde{\sigma }_{ij}& =& \frac{\tilde{u}_{\cdot i}^{{\prime}}\tilde{u}_{\cdot j}} {T} = \frac{u_{\cdot i}^{{\prime}}u_{\cdot j}} {T} + Q_{T},\quad Q_{T} = \frac{u_{\cdot i}^{{\prime}}({N}^{i}{N}^{j} - {N}^{i} - {N}^{j})u_{\cdot j}} {T}, {}\\ {N}^{i}& =& {X}^{i}{({X}^{i{\prime}}{X}^{i})}^{-1}{X}^{i{\prime}},\quad i = 1,2,\ldots,m, {}\\ \end{array}$$
it follows from the standard assumptions of the GLM that \(\mathop{\mathrm{plim}}_{T\rightarrow \infty }Q_{T} = 0\), and that \(u_{\cdot i}^{{\prime}}u_{\cdot j}/T\) obeys the SLLN; see Sect.  9.3.1 on the applicability of the latter to sequences of iid rvs with finite mean. Consequently,
$$\displaystyle{ \tilde{\sigma }_{ij}\stackrel{\mathrm{P}}{\rightarrow }\sigma _{ij},\quad \mbox{ and thus}\quad \tilde{\Psi }\stackrel{\mathrm{P}}{\rightarrow }\Psi. }$$
(10.107)
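The feasible procedure just described is simple to implement. Continuing the simulated example above (the objects ys, Xs, y_stacked and X_star come from that sketch and are, again, purely illustrative), the steps of Eqs. (10.105) and (10.106) read as follows.

```python
# Feasible GLS for the SUR system: seriatim OLS residuals give Sigma_tilde,
# Eq. (10.106); then beta*_hat = (X*' Psi~^{-1} X*)^{-1} X*' Psi~^{-1} y,
# Eq. (10.105). Continues the simulated objects from the previous sketch.
import numpy as np

# Step 1: equation-by-equation OLS residuals u~_.i = y_.i - X^i beta~_.i
resid = []
for y_i, X_i in zip(ys, Xs):
    b_i, *_ = np.linalg.lstsq(X_i, y_i, rcond=None)
    resid.append(y_i - X_i @ b_i)
R = np.column_stack(resid)                   # T x m matrix of residuals

# Step 2: Sigma_tilde with sigma~_ij = u~_.i' u~_.j / T
Sigma_tilde = (R.T @ R) / T
Psi_tilde_inv = np.kron(np.linalg.inv(Sigma_tilde), np.eye(T))   # (Sigma kron I)^{-1}

# Step 3: feasible GLS estimator of beta*
A = X_star.T @ Psi_tilde_inv @ X_star
beta_star_hat = np.linalg.solve(A, X_star.T @ Psi_tilde_inv @ y_stacked)
```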
Using exactly the same arguments as above we can establish the limiting (asymptotic) distribution of the GLS estimator as
$$\displaystyle{ \sqrt{T}(\hat{{\beta }}^{{\ast}}- {\beta }^{{\ast}})\stackrel{\mathrm{d}}{\rightarrow }N(0,\Phi ),\quad \Phi =\mathop{ \mathrm{plim}}_{ T\rightarrow \infty }{\left (\frac{{X}^{{\ast}{\prime}}{\Psi }^{-1}{X}^{{\ast}}} {T} \right )}^{-1}. }$$
(10.108)
In the discussion above, see Eq. (10.68), we have already shown, in the finite T case, that the GLS estimator is efficient relative to the OLS estimator.
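In practice one typically estimates the covariance matrix of \(\hat{{\beta }}^{{\ast}}\) by \({({X}^{{\ast}{\prime}}\tilde{{\Psi }}^{-1}{X}^{{\ast}})}^{-1}\), in line with Eq. (10.108); its diagonal yields the standard errors used for tests of significance and confidence intervals. A brief continuation of the sketch above (illustrative only):

```python
# Standard errors for beta*_hat from the estimated covariance matrix
# (X*' Psi~^{-1} X*)^{-1}; uses the matrix A from the previous sketch.
import numpy as np

cov_beta_star = np.linalg.inv(A)              # estimate of Cov(beta*_hat)
std_errors = np.sqrt(np.diag(cov_beta_star))  # element-wise standard errors
```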

Remark 10.8.

Perhaps the development of the argument in this section will help explain the designation of such models as Seemingly Unrelated Regressions (SUR) and justify their special treatment.

Footnotes

  1. The term estimator recurs very frequently in econometrics; just to fix its meaning in this chapter and others, we define it as follows: an estimator is a function of the data only (the x's and y's), say h(y, X), that does not involve unknown parameters.

  2. The term consistent generally means that, as the sample size T tends to infinity, the estimator converges to the parameter it seeks to estimate. Since the early development of econometrics it has meant almost exclusively convergence in probability. This is the meaning we shall use in this and other chapters, i.e. an estimator is consistent for the parameter it seeks to estimate if it converges to it in any fashion that implies convergence in probability.

  3. In the context of Eq. (10.79) the notation ∼ is to be read “behaves like”.

  4. Occasionally this test is referred to as a test of significance of R 2.

  5. The sample size is assumed to be the same for all firms.

Copyright information

© the Author 2013
