1 Introduction

This paper describes and discusses an estimator for a linear time series model with time-varying coefficients. Such a model, the varying coefficients model, or “VC model” for short, generalizes the standard linear model. The standard model assumes that the coefficients giving the influence of the independent variables on the dependent variable remain constant. In the VC model, these coefficients are permitted to change over time.

The VC model poses the statistical problem of determining the smoothing parameters, variances, or splines that are needed to model the movements of the coefficients over time. Schlicht (1989) and Schlicht and Ludsteck (2006) have proposed an estimation method—the VC method—that is specifically tailored to the case that the time-varying coefficients are generated by a random walk with normal disturbances. The present paper generalizes this approach to the non-Gaussian case.

The paper is organized as follows. In Sect. 2, the model is described and some motivation is provided by drawing on a discussion in economics. The “criteria” or “penalty” approach is explained that makes it possible to estimate the time-paths of the coefficients in a purely descriptive way. A stochastic formulation is given in Sect. 3 which presupposes knowledge of the variances of the disturbances. It is shown that the descriptive approach outlined in Sect. 2 can be justified statistically by estimating the time averages of the coefficients in the corresponding linear GLS model with constant coefficients, again for given variances of the disturbances. Section 4 turns to the estimation of these variances. Two closely related estimators are explained: a moments estimator and a likelihood estimator. The likelihood estimator presupposes Gaussian disturbances while the moments estimator does not require this assumption. The relationship between these estimators is discussed. Section 5 provides some illustration of the way the model works and ponders some methodological issues. A conclusion follows.

2 The varying coefficients model in descriptive mode

This section describes the varying coefficients model and discusses the motivation for using it, as it emerged in economics (Sect. 2.1). Some features of the proposed method will be previewed in Sect. 2.2. The notation is introduced in Sect. 2.3 and the “criteria” or “penalty” approach is described that makes it possible to estimate the development of the relations between the independent variables and the dependent variable over time in a purely descriptive way (Sect. 2.4).

2.1 The linear theoretical model and its empirical application

Consider a theory stating that the dependent variable y is a linear function of some independent variables \(x_{1},x_{2},\ldots ,x_{n}\):

$$\begin{aligned} y=a_{1}x_{1}+a_{2}x_{2}+\cdots +a_{n}x_{n}. \end{aligned}$$
(1.1)

The coefficients \(a_{1},a_{2},\ldots ,a_{n}\) give the influence of the independent variables.

If we have T observations \(y_{t}\), \(x_{1,t}\), \(x_{2,t}\),...\(,x_{n,t}\) with \(t=1,2,\ldots ,T\) denoting the time of an observation, we can try to estimate the theoretical coefficients \(a_{1},a_{2},\ldots ,a_{n}\) by a standard linear regression. In order to do that, we have to add an error term \(u_{t}\) that captures discrepancies between the empirical data and the theoretical regularity due to measurement errors and the like, and obtain

$$\begin{aligned} y_{t}=a_{1}x_{1,t}+a_{2}x_{2,t}+\cdots +a_{n}x_{n,t}+u_{t},\,\,\,t=1,2,\ldots ,T. \end{aligned}$$
(1.2)

In many cases it appears improbable, however, that outside influences not captured in the theoretical model affect only the disturbance term, and not the coefficients themselves. In the case of economics, we may think of changes in technology, preferences, market structure, and the composition of aggregates. All change over time and may affect the coefficients themselves.

In economics, the problem of possibly time-varying coefficients was the subject of the famous Keynes–Tinbergen controversy around 1940.Footnote 2 While Tinbergen (1940, 153) defended the use of regression analysis with the argument that in “many cases only small changes in structure will occur in the near future”, Keynes (1973, 294) objected that “the method requires not too short a series whereas it is only in a short series, in most cases, that there is a reasonable expectation that the coefficients will be fairly constant.”

It appears that both arguments are correct. The VC model takes care of both by assuming that the coefficients change slowly over time: They are highly auto-correlated. This is formalized by a random walk (Athans 1974; Cooley and Prescott 1973; Schlicht 1973). If \(a_{i,t}\) denotes the state of coefficient \(a_{i}\) at time t, it is assumed that

$$\begin{aligned} a_{i,t+1}= & {} a_{i,t}+v_{i,t} \end{aligned}$$
(1.3)

with the disturbance term \(v_{i,t}\) of expectation zero and with variance \(\sigma _{i}^{2}\). The assumption of expectation zero formalizes the idea that “the coefficients will be fairly constant” in the short run, while the variance \(\sigma _{i}^{2}\) is a measure of the stability of coefficient i and is to be estimated. For \(\sigma _{i}^{2}=0\) for some i, the case of a constant (time-invariant) coefficient is covered as well. As a consequence, the standard linear model is replaced by

$$\begin{aligned} y_{t}= & {} a_{1,t}x_{1,t}+a_{2,t}x_{2,t}+\cdots +a_{n,t}x_{n,t}+u_{t},\quad E\left\{ u_{t}\right\} =0,\;\; E\{ u_{t}^{2}\} =\sigma ^{2}, \end{aligned}$$
(1.4)
$$\begin{aligned} a_{i,t+1}= & {} a_{i,t}+v_{i,t},\;\; E\{ v_{i,t}\} =0,\;\; E\{ v_{i,t}^{2}\} =\sigma _{i}^{2},\;\; i=1,2,\ldots ,n,\;\; t=1,2,\ldots ,T-1. \end{aligned}$$
(1.5)

This is the VC model that is presupposed in the following.

2.2 Properties of the VC method

The VC method that will be developed in this paper estimates the expected time-paths of the coefficients \(a_{i,t}\) for given observations \(x_{i,t}\) and \(y_{t}\) with \(i=1,2,\ldots ,n\) and \(t=1,2,\ldots ,T\). It can be viewed as a straightforward generalization of the method of least squares.

  • While the method of ordinary least squares selects estimates that minimize the sum of squared disturbances \(\sum _{t=1}^{T}u_{t}^{2}\) in the equation, VC selects estimates that minimize the sum of squared disturbances in the equation and a weighted sum of squared disturbances in the coefficients, i.e. \(\sum _{t=1}^{T}u_{t}^{2}+\gamma _{1}\sum _{t=2}^{T}v_{1,t}^{2}+\gamma _{2}\sum _{t=2}^{T}v_{2,t}^{2}+\cdots +\gamma _{n}\sum _{t=2}^{T}v_{n,t}^{2}\), where the weights for the changes in the coefficients \(\gamma _{1},\gamma _{2},\ldots ,\gamma _{n}\) are determined by the inverse variance ratios, i.e. \(\gamma _{i}=\sigma ^{2}/\sigma _{i}^{2}\). In other words, it balances the desiderata of a good fit and parameter stability over time.

  • Estimation can proceed by focusing on some selected coefficients and keeping the remaining coefficients constant over time. This is done by keeping the corresponding variances \(\sigma _{i}^{2}\) close to zero, rather than estimating them. (If all coefficients are frozen in this manner, the OLS result is obtained.)

  • The time-averages of the regression coefficients \(\frac{1}{T}\sum _{t}a_{t}\) are GLS estimates of the corresponding regression with fixed coefficients.

  • The VC method does not require initial values for the states and the variances. Rather, all states and variances are estimated in an integrated, unified procedure. This is an advantage over Kalman filtering, which is typically quite sensitive to the choice of initial values, especially when dealing with shorter time series.

  • The VC method links the purely descriptive approach of fitting non-parametric splines by penalized least squares with an explicit statistical model with random-walk coefficients. This offers the possibility of model-based estimation.

  • All estimates are moments estimates. It is not necessary to presuppose Gaussian disturbances.

  • For increasing sample sizes T and under the assumption that all disturbances are normally distributed, the moments estimates approach the maximum likelihood estimates.

2.3 Notation and basic assumptions

All vectors are conceived as column vectors, and their transposes are indicated by an apostrophe. The observations at time t are \(x'_{t}=\) \(\left( x_{1,t},x_{2,t},\ldots ,x_{n,t}\right)\) and \(y_{t}\) for \(t=1,2,\ldots ,T\). We write

$$\begin{aligned}&\begin{array}{ccccc} y\,=\,\left( \begin{array}{c} y_{1}\\ y_{2}\\ .\\ .\\ y_{T} \end{array}\right) , &{} &{} x\,=\,\left( \begin{array}{c} x'_{1}\\ x'_{2}\\ .\\ .\\ x'_{T} \end{array}\right) , &{} &{} X\,=\,\left( \begin{array}{ccccc} x'_{1} &{} &{} &{} &{} 0\\ &{} x'_{2}\\ &{} &{} .\\ &{} &{} &{} .\\ 0 &{} &{} &{} &{} x'_{T} \end{array}\right) ,\\ \\ {\mathrm {order}}\quad T &{} &{} \quad T\times n &{} &{} \quad T\times Tn \end{array}\\&\begin{array}{cccccc} a_{t}\,=\,\left( \begin{array}{c} a_{1,t}\\ a_{2,t}\\ .\\ .\\ a_{n,t} \end{array}\right) , &{} &{} a\,=\,\left( \begin{array}{c} a_{1}\\ a_{2}\\ .\\ .\\ a_{T} \end{array}\right) , &{} &{} v_{t}\,=\,\left( \begin{array}{c} v_{1,t}\\ v_{2,t}\\ .\\ .\\ v_{n,t} \end{array}\right) , &{} v\,=\,\left( \begin{array}{c} v_{1}\\ v_{2}\\ v_{3}\\ .\\ .\\ v_{T-1} \end{array}\right) ,\\ \\ {\mathrm {order}}\,\,\,\,\,n\,\,\, &{} &{} \,\,\,\,\,\,Tn &{} &{} \,\,\,\,\,\,\,\,\,\,\,n\,\,\, &{} \,\,\,\,\,\left( T-1\right) n \end{array} \end{aligned}$$

We write further

$$\begin{aligned} \begin{array}{ccc} \varSigma \,=\,{\mathrm {diag}}\left( \begin{array}{c} \sigma _{1}^{2}\\ \sigma _{2}^{2}\\ .\\ .\\ \sigma _{n}^{2} \end{array}\right) &{} = &{} \left( \begin{array}{cccccc} \sigma _{1}^{2} &{} 0 &{} &{} &{} &{} 0\\ 0 &{} \sigma _{2}^{2}\\ &{} &{} .\\ &{} &{} &{} &{} . &{} 0\\ 0 &{} &{} &{} &{} 0 &{} \sigma _{n}^{2} \end{array}\right) \\ \\ {\mathrm {order}}\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, &{} &{} n\times n \end{array} \end{aligned}$$

and define

$$\begin{aligned} \begin{array}{cc} p\,=\left( \begin{array}{cccccc} -1 &{} 1 &{} 0 &{} &{} &{} 0\\ 0 &{} -1 &{} 1 &{} 0\\ &{} &{} . &{} .\\ &{} &{} &{} . &{} . &{} 0\\ 0 &{} &{} &{} 0 &{} -1 &{} 1 \end{array}\right) , &{} P\,=\,p\otimes I_{n}\,=\,\left( \begin{array}{cccccc} -I_{n} &{} I_{n} &{} &{} &{} &{} 0\\ &{} -I_{n} &{} I_{n}\\ &{} &{} . &{} .\\ &{} &{} &{} . &{} .\\ 0 &{} &{} &{} &{} -I_{n} &{} I_{n} \end{array}\right) \\ \\ {\mathrm {order}}\,\,\,\,\,\,\left( T-1\right) \times T\,\,\,\, &{} \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\left( T-1\right) n\times Tn \end{array} \end{aligned}$$
(1.6)

with \(I_{n}\) denoting the identity matrix of order n and \(\otimes\) indicating the Kronecker product operator. Note that p and P are of full rank.
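To make the stacked notation concrete, the following sketch (Python with numpy; the toy dimensions, the random data, and all identifiers are illustrative placeholders of mine, not part of the published VC software) constructs y, X, p, and P as defined above and checks their orders:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 6, 2                                   # toy sample size and number of coefficients

x = rng.normal(size=(T, n))                   # row t holds the observation x_t'
y = rng.normal(size=T)

# X: T x Tn block-diagonal matrix with x_t' in block row t
X = np.zeros((T, T * n))
for t in range(T):
    X[t, t * n:(t + 1) * n] = x[t]

# p: (T-1) x T first-difference matrix; P = p (Kronecker product) I_n, Eq. (1.6)
p = np.eye(T - 1, T, k=1) - np.eye(T - 1, T)
P = np.kron(p, np.eye(n))

assert X.shape == (T, T * n) and P.shape == ((T - 1) * n, T * n)
assert np.linalg.matrix_rank(P) == (T - 1) * n    # P has full row rank
```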

The model is obtained by writing Eqs. (1.4) and (1.5) in matrix form:


$$\begin{aligned} y= & {} Xa+u,\quad E\left\{ u\right\} =0,\quad E\left\{ uu'\right\} =\sigma ^{2}I_{T}, \end{aligned}$$
(1.7)
$$\begin{aligned} Pa= & {} v,\quad E\left\{ v\right\} =0,\quad E\left\{ vv'\right\} =V=I_{T-1}\otimes \varSigma . \end{aligned}$$
(1.8)

Note that the explanatory variables X are taken as predetermined, rather than stochastic.

Regarding the observations X and y we assume that a perfect fit of the model to the data is not possible:

Assumption 1

\(Pa=0\,\,{\mathrm {implies}}\,\,y\ne Xa\).

This assumption rules out the (trivial) case that the standard linear model (1.2) fits the empirical data perfectly, a case that cannot reasonably be expected to occur in practical applications. Further, the assumption implies that the number of observations exceeds the number of coefficients to be estimated:

$$\begin{aligned} T > n. \end{aligned}$$
(1.9)

2.4 Least squares

In a descriptive spirit, the time-paths of the coefficients can be determined by following the penalized least squares approach, where criteria are employed that formalize descriptive desiderata. In the case at hand, the desiderata are that the model fits the data well and that the coefficients change only slowly over time—u and v ought to be as small as possible. The sum of the squared errors \(u'u\) is taken as a criterion for the goodness of fit of Eq. (1.7), and the sums of the squared changes of the coefficients \(v_{i}'v_{i}\) over time give criteria for the stability of the coefficients over time. The combination of all these criteria gives an overall criterion that combines the desiderata of a good fit and stability of coefficients over time. The weights \(\left( \gamma _{1},\gamma _{2},\ldots ,\gamma _{n}\right)\) give the relative importance of the stability of the coefficients over time, where weight \(\gamma _{i}\) relates to coefficient \(a_{i}\). For the time being, these weights are taken as given but will later be estimated, too.

Write

$$\begin{aligned} \varGamma= & {} {\mathrm {diag}}\left( \gamma _{1},\gamma _{2},\ldots ,\gamma _{n}\right) \end{aligned}$$
(1.10)

and

$$\begin{aligned} G= & {} I_{T-1}\otimes \varGamma . \end{aligned}$$
(1.11)

Adding the sum of squares \(u'u\) and the weighted sum of squares \(v'Gv\) gives the overall criterion

$$\begin{aligned} Q= & {} u'u+v'Gv. \end{aligned}$$
(1.12)

This expression is to be minimized under the constraints given by the model (1.7), (1.8) with the observations X and y:

$$\begin{aligned} u= & {} y-Xa, \end{aligned}$$
(1.13)
$$\begin{aligned} v= & {} Pa. \end{aligned}$$
(1.14)

The minimization determines the time-paths of the coefficients a that optimize this criterion. Substituting (1.13) and (1.14) into (1.12), we can write

$$\begin{aligned} Q= & {} \left( y-Xa\right) '\left( y-Xa\right) +a'P'GPa. \end{aligned}$$
(1.15)

The weighted sum of squares Q is the sum of two positive semi-definite quadratic forms. Assumption 1 rules out the case that Q can attain zero. Hence Q is positive definite in a. The first-order condition for a minimizing a is

$$\begin{aligned} \frac{\partial Q}{\partial a}= & {} -2X'y+2\left( X'X+P'GP\right) a=0 \end{aligned}$$
(1.16)

and the second-order condition is that the Hessian

$$\begin{aligned} \frac{\partial ^{2}Q}{\partial a\partial a'}= & {} 2\left( X'X+P'GP\right) \end{aligned}$$
(1.17)

is positive definite, which is the case. Solving (1.16) for a and plugging this into (1.13) and (1.14) gives the estimates

$$\begin{aligned} a_{LS}= & {} \left( X'X+P'GP\right) ^{-1}X'y \end{aligned}$$
(1.18)
$$\begin{aligned} u_{LS}= & {} \left( I_{T}-X\left( X'X+P'GP\right) ^{-1}X'\right) y\end{aligned}$$
(1.19)
$$\begin{aligned} v_{LS}= & {} P\left( X'X+P'GP\right) ^{-1}X'y \end{aligned}$$
(1.20)

where the subscript LS stands for “least squares”.
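As a minimal sketch (numpy; the toy data and the weight values are arbitrary placeholders of mine, not estimates), the formulas (1.18)–(1.20) can be computed directly:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 6, 2
xobs = rng.normal(size=(T, n)); y = rng.normal(size=T)
X = np.zeros((T, T * n))
for t in range(T):
    X[t, t * n:(t + 1) * n] = xobs[t]
P = np.kron(np.eye(T - 1, T, 1) - np.eye(T - 1, T), np.eye(n))

gamma = np.array([5.0, 20.0])                 # illustrative weights, taken as given here
G = np.kron(np.eye(T - 1), np.diag(gamma))    # G = I_{T-1} (x) Gamma, Eq. (1.11)

K = X.T @ X + P.T @ G @ P
a_ls = np.linalg.solve(K, X.T @ y)            # Eq. (1.18)
u_ls = y - X @ a_ls                           # Eq. (1.19)
v_ls = P @ a_ls                               # Eq. (1.20)
a_paths = a_ls.reshape(T, n)                  # row t holds the estimated a_t'
```

Solving the linear system rather than inverting \(X'X+P'GP\) is numerically preferable; for long series the band structure of this matrix (block-diagonal \(X'X\) plus block-tridiagonal \(P'GP\)) can be exploited.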

3 The varying coefficients model in stochastic mode

This section considers the statistical treatment of the VC model under the assumption that the variances of the disturbances are known. With the parametrization outlined in Sect. 3.1, the VC model gives rise to a GLS (Aitken) model that makes it possible to estimate the time-averages of the coefficients. With these estimates, the conditional expectations of the coefficients \(a_{i,t}\) for given observations X and y can be determined (Sect. 3.2). If the weights chosen for the descriptive estimation outlined in Sect. 2.4 are equal to the inverse variance ratios, the descriptive estimates and the conditional expectations coincide (Sect. 3.3).

3.1 Orthogonal parametrization

For purposes of estimation we need a model that explains the observation y as a function of the observations X and the random variables u and v. This would permit calculating the probability distribution of the observations y contingent on the parameters of the distributions of u and v, viz \(\sigma ^{2}\) and \(\varSigma\). The true model does not permit such an inference, though, because the matrix P is of rank \(\left( T-1\right) n\) rather than of rank Tn and cannot be inverted. Hence v does not determine a unique a but rather the set of solutions

$$\begin{aligned} A{:}{=}\left\{ \left. a=\tilde{P}v+Z\beta \right| \beta \in \mathbb {R}^{n}\right\} \end{aligned}$$
(2.1)

with \(\beta\) as a shift parameter,

$$\begin{aligned} \tilde{P}&{:}{=}&P'\left( PP'\right) ^{-1} \end{aligned}$$
(2.2)

as the right-hand pseudo-inverse of P given in (1.6) of order \(Tn\times \left( T-1\right) n\), and the matrix

$$\begin{aligned} Z{:}{=}\left( \begin{array}{c} I_{n}\\ I_{n}\\ .\\ I_{n} \end{array}\right) \end{aligned}$$
(2.3)

of order \(Tn\times n\). It is orthogonal to P:

$$\begin{aligned} PZ= & {} 0 \end{aligned}$$

with the square matrix \(\left( P',Z\right)\) of full rank. For any v we have \(a\in A\Leftrightarrow Pa=v\). Hence Eq. (1.8) and the set (2.1) give equivalent descriptions of the relationship between a and v.

Note that

$$\begin{aligned} Z'Z=T\cdot I_{n},\;\,\,\,\,P'\left( PP'\right) ^{-1}P+\frac{1}{T}ZZ'=I_{Tn}. \end{aligned}$$
(2.4)

Regarding the matrices P, \(\tilde{P}\), and Z we have

$$\begin{aligned} \begin{array}{ccccc} P\tilde{P} &{} = &{} \tilde{P'}P' &{} = &{} I_{\left( T-1\right) n}\\ \tilde{P}P &{} = &{} P'\tilde{P}' &{} = &{} \,\,I_{Tn}-\frac{1}{T}ZZ'\\ Z'\tilde{P} &{} = &{} \tilde{P'}Z &{} = &{} 0. \end{array} \end{aligned}$$
(2.5)

In view of (2.1), any solution a to \(Pa=v\) can be written as

$$\begin{aligned} a=\tilde{P}v+Z\,\beta \end{aligned}$$
(2.6)

for some \(\beta \in \mathbb {R}^{n}\). Equation (1.7) can be re-written as

$$\begin{aligned} y=u+X\tilde{P}v+XZ\beta . \end{aligned}$$
(2.7)

The model (2.6), (2.7) will be referred to as the equivalent orthogonally parameterized model. It implies the true model (1.7), (1.8). It implies, in particular, that \(a_{t}\) is a random walk even though \(a_{t}\) depends, according to (2.6), on past and future realizations of \(v_{t}\).

The formal parameter \(\beta\) has a straightforward interpretation. Pre-multiplying (2.6) by \(Z'\) gives

$$\begin{aligned} Z'a= & {} Z'Z\beta =T\beta \end{aligned}$$

and therefore

$$\begin{aligned} \beta= & {} \frac{1}{T}{\displaystyle \sum _{t=1}^{T}a_{t}.} \end{aligned}$$
(2.8)

Hence \(\beta\) gives the averages of the coefficients \(a_{i,t}\) over time.
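The identities (2.4), (2.5), and (2.8) are easy to verify numerically; the sketch below (numpy; toy dimensions and identifiers of mine) does so:

```python
import numpy as np

T, n = 6, 2
P = np.kron(np.eye(T - 1, T, 1) - np.eye(T - 1, T), np.eye(n))
Pt = P.T @ np.linalg.inv(P @ P.T)             # right pseudo-inverse, Eq. (2.2)
Z = np.tile(np.eye(n), (T, 1))                # stacked identity blocks, Eq. (2.3)

assert np.allclose(P @ Z, 0)                                  # PZ = 0
assert np.allclose(Z.T @ Z, T * np.eye(n))                    # Z'Z = T I_n, Eq. (2.4)
assert np.allclose(P @ Pt, np.eye((T - 1) * n))               # Eq. (2.5), first row
assert np.allclose(Pt @ P, np.eye(T * n) - Z @ Z.T / T)       # Eq. (2.5), second row
assert np.allclose(Z.T @ Pt, 0)                               # Eq. (2.5), third row

# beta recovers the time averages of the coefficients, Eq. (2.8)
rng = np.random.default_rng(1)
a = rng.normal(size=T * n)
assert np.allclose(Z.T @ a / T, a.reshape(T, n).mean(axis=0))
```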

Equation (2.7) permits calculating the density of y dependent upon the parameters of the distributions of u and v and the formal parameter \(\beta\). In a second step, all these parameters—\(\sigma ^{2}\), \(\varSigma\), and \(\beta\)—can be determined by moments estimators that will be derived in Sect. 4.1.

The orthogonal parametrization (proposed in Schlicht 1985, Sec. 4.3.3 in another context) entails some advantages with respect to symmetry and mathematical transparency, as compared to more usual procedures, such as parametrization by initial values. It makes it possible to derive our moments estimator, which does not require normally distributed disturbances, and to write down an explicit likelihood function for the case of normally distributed disturbances that permits estimation of all relevant parameters in a unified one-shot procedure.

The formal parameter vector \(\beta\) relates directly to the coefficient estimates of a standard generalized least squares (GLS, Aitken) regression. Equation (2.7) can be interpreted as a standard regression for this parameter vector with the matrix \(x=XZ\) giving the explanatory variables:

$$\begin{aligned} y=x\beta +w \end{aligned}$$
(2.9)

and the disturbance

$$\begin{aligned} w= & {} X\tilde{P}v+u. \end{aligned}$$
(2.10)

It has expectation zero

$$\begin{aligned} E\left\{ w\right\}= & {} 0 \end{aligned}$$
(2.11)

and covariance

$$\begin{aligned} W{:}{=}E\left\{ ww'\right\} =X\tilde{P}V\tilde{P'}X'+\sigma ^{2}I_{T}. \end{aligned}$$
(2.12)

The Aitken estimate \(\beta _{A}\) satisfies

$$\begin{aligned} x'W^{-1}\left( y-x\beta _{A}\right) =0 \end{aligned}$$
(2.13)

or

$$\begin{aligned} \beta _{A}=\left( x'W^{-1}x\right) ^{-1}x'W^{-1}y. \end{aligned}$$
(2.14)

where the subscript A stands for “Aitken”. As \(x=XZ\) and \(W=X\tilde{P}V\tilde{P'}X'+\sigma ^{2}I_{T}\), Eqs. (2.13) and (2.14) can be written as

$$\begin{aligned} Z'X'\left( X\tilde{P}V\tilde{P'}X'+\sigma ^{2}I_{T}\right) ^{-1}\left( y-XZ\beta _{A}\right)= & {} 0 \end{aligned}$$
(2.15)

and Eq. (2.14) gives rise to

$$\begin{aligned} \beta _{A} =\left( Z'X'\left( X\tilde{P}V\tilde{P'}X'+\sigma ^{2}I_{T}\right) ^{-1}XZ\right) ^{-1}Z'X'\left( X\tilde{P}V\tilde{P'}X'+\sigma ^{2}I_{T}\right) ^{-1}y. \end{aligned}$$
(2.16)

3.2 The filter

This section derives the VC filter which gives the expectation of the coefficients a for given observations X and y, a given shift parameter \(\beta\), and given variances \(\sigma ^{2}\) and \(\varSigma\).

For given \(\beta\) and X, the vectors y and a can be viewed as realizations of random variables determined jointly by the system (2.6), (2.7) as brought about by the disturbances u and v:

$$\begin{aligned} \left( \begin{array}{c} a\\ y \end{array}\right) =\left( \begin{array}{c} Z\\ XZ \end{array}\right) \beta +\left( \begin{array}{cc} \tilde{P} &{} 0\\ X\tilde{P} &{} I_{T} \end{array}\right) \left( \begin{array}{c} v\\ u \end{array}\right) . \end{aligned}$$

The covariance matrix about this mean is

$$\begin{aligned} E\left\{ \left( \begin{array}{c} a-Z\beta \\ y-XZ\beta \end{array}\right) \left( \begin{array}{cc} \left( a-Z\beta \right) '&\left( y-XZ\beta \right) '\end{array}\right) \right\}= & {} \left( \begin{array}{cc} \tilde{P} &{} 0\\ X\tilde{P} &{} I_{T} \end{array}\right) \left( \begin{array}{cc} V &{} 0\\ 0 &{} \sigma ^{2}I_{T} \end{array}\right) \left( \begin{array}{cc} \tilde{P'} &{} \tilde{P'}X'\\ 0 &{} I_{T} \end{array}\right) \\= & {} \left( \begin{array}{cc} \tilde{P}V\tilde{P'} &{} \tilde{P}V\tilde{P'}X'\\ X\tilde{P}V\tilde{P'} &{} X\tilde{P}V\tilde{P'}X'+\sigma ^{2}I_{T} \end{array}\right) . \end{aligned}$$

The marginal distribution of y is as given by (2.9) and (2.12). On this basis, we take our estimate of a as

$$\begin{aligned} a_{A}=Z\beta _{A}+\tilde{P}V\tilde{P'}X'\left( X\tilde{P}V\tilde{P'}X'+\sigma ^{2}I_{T}\right) ^{-1}\left( y-XZ\beta _{A}\right) \end{aligned}$$
(2.17)

which is the expectation of a for the case that u and v are Gaussian and y, \(\beta\), \(\sigma ^{2}\), and \(\varSigma\) are given. (It will turn out later on that \(a_{A}\) is the expectation of a for non-Gaussian disturbances as well, see Eq. (2.27) below.)

Note that the variance-covariance matrix of w, as given in Eq. (2.12), tends to \(\sigma ^{2}I_{T}\) if the variances \(\sigma _{i}^{2}\) go to zero, and Eq. (2.7) approaches the standard unweighted linear regression. In this sense, the OLS regression model is covered as a special limiting case by the model discussed here.
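A sketch of the filter (2.16)–(2.17) under assumed known variances follows; the data, the variance values, and all identifiers are placeholders of mine, not the published implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 6, 2
xobs = rng.normal(size=(T, n)); y = rng.normal(size=T)
X = np.zeros((T, T * n))
for t in range(T):
    X[t, t * n:(t + 1) * n] = xobs[t]
P = np.kron(np.eye(T - 1, T, 1) - np.eye(T - 1, T), np.eye(n))
Pt = P.T @ np.linalg.inv(P @ P.T)
Z = np.tile(np.eye(n), (T, 1))

sigma2 = 1.0                                  # assumed known, Eq. (1.7)
Sigma = np.diag([0.01, 0.001])                # assumed known, Eq. (1.8)
V = np.kron(np.eye(T - 1), Sigma)

W = X @ Pt @ V @ Pt.T @ X.T + sigma2 * np.eye(T)            # Eq. (2.12)
Wi = np.linalg.inv(W)
xz = X @ Z                                                  # x = XZ
beta_A = np.linalg.solve(xz.T @ Wi @ xz, xz.T @ Wi @ y)     # Eq. (2.16)

# Eq. (2.17): expected coefficient paths given the observations
a_A = Z @ beta_A + Pt @ V @ Pt.T @ X.T @ Wi @ (y - xz @ beta_A)
a_paths = a_A.reshape(T, n)                                 # row t holds a_t'
```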

3.3 Least squares and Aitken

The following theorem states that the least squares estimator \(a_{LS}\) and the Aitken estimator \(a_{A}\) coincide if the weights are given by the inverse variance ratios.

Claim 1

\(G=\sigma ^{2}V^{-1}\) implies \(a_{LS}=a_{A}\).

Proof

Consider first the necessary conditions for a minimum of (1.12). The first-order condition (1.16) defines \(a_{LS}\) with weights \(G=\sigma ^{2}V^{-1}\) uniquely and can be written as

$$\begin{aligned} \left( X'X+\sigma ^{2}P'V^{-1}P\right) a_{LS}=X'y. \end{aligned}$$
(2.18)

It will be shown that (2.17) implies

$$\begin{aligned} \left( X'X+\sigma ^{2}P'V^{-1}P\right) a_{A}=X'y, \end{aligned}$$
(2.19)

which will establish the proposition.

Pre-multiplication of (2.17) by \(\left( X'X+\sigma ^{2}P'V^{-1}P\right)\) gives

$$\begin{aligned} \left( X'X+\sigma ^{2}P'V^{-1}P\right) a_{A}= & {} \left( X'X+\sigma ^{2}P'V^{-1}P\right) Z\beta _{A}\\&+\left( X'X+\sigma ^{2}P'V^{-1}P\right) \tilde{P}V\tilde{P'}X'\\&\times \left( X\tilde{P}V\tilde{P'}X'+\sigma ^{2}I_{T}\right) ^{-1}\left( y-XZ\beta _{A}\right) . \end{aligned}$$

Because of \(PZ=0\) this can be written as

$$\begin{aligned} \left( X'X+\sigma ^{2}P'V^{-1}P\right) a_{A}= & {} X'XZ\beta _{A}\\&+X'X\tilde{P}V\tilde{P'}X'\left( X\tilde{P}V\tilde{P'}X'+\sigma ^{2}I_{T}\right) ^{-1}\left( y-XZ\beta _{A}\right) \\&+\sigma ^{2}P'\tilde{P'}X'\left( X\tilde{P}V\tilde{P'}X'+\sigma ^{2}I_{T}\right) ^{-1}\left( y-XZ\beta _{A}\right) . \end{aligned}$$

Adding and subtracting \(\sigma ^{2}X'\left( X\tilde{P}V\tilde{P'}X'+\sigma ^{2}I_{T}\right) ^{-1}\left( y-XZ\beta _{A}\right)\) and using \(P'\tilde{P'}=\left( I_{Tn}-\frac{1}{T}ZZ'\right)\) results in

$$\begin{aligned}&\left( X'X+\sigma ^{2}P'V^{-1}P\right) a_{A} = X'XZ\beta _{A}\\&\quad +X'\left( X\tilde{P}V\tilde{P'}X'+\sigma ^{2}I_{T}\right) \left( X\tilde{P}V\tilde{P'}X'+\sigma ^{2}I_{T}\right) ^{-1}\left( y-XZ\beta _{A}\right) \\&\quad -\sigma ^{2}X'\left( X\tilde{P}V\tilde{P'}X'+\sigma ^{2}I_{T}\right) ^{-1}\left( y-XZ\beta _{A}\right) \\&\quad +\sigma ^{2}\left( I_{Tn}-\frac{1}{T}ZZ'\right) X'\left( X\tilde{P}V\tilde{P'}X'+\sigma ^{2}I_{T}\right) ^{-1}\left( y-XZ\beta _{A}\right) \end{aligned}$$

which reduces to

$$\begin{aligned} \left( X'X+\sigma ^{2}P'V^{-1}P\right) a_{A}= & {} X'XZ\beta _{A} +X'\left( y-XZ\beta _{A}\right) \\&-\sigma ^{2}X'\left( X\tilde{P}V\tilde{P'}X'+\sigma ^{2}I_{T}\right) ^{-1}\left( y-XZ\beta _{A}\right) \\&+\sigma ^{2}X'\left( X\tilde{P}V\tilde{P'}X'+\sigma ^{2}I_{T}\right) ^{-1}\left( y-XZ\beta _{A}\right) \\&-\frac{\sigma ^{2}}{T}ZZ'X'\left( X\tilde{P}V\tilde{P'}X'+\sigma ^{2}I_{T}\right) ^{-1}\left( y-XZ\beta _{A}\right) . \end{aligned}$$

According to (2.15), the last term is zero and we obtain

$$\begin{aligned} \left( X'X+\sigma ^{2}P'V^{-1}P\right) a_{A}= & {} X'y. \end{aligned}$$

This shows that the least squares estimator \(a_{LS}\) and the Aitken estimator \(a_{A}\) coincide. \(\square\)

As a consequence of Claim 1, the least-squares estimates for u, v, and w and their Aitken counterparts coincide for \(G=\sigma ^{2}V^{-1}\). We need not distinguish them and denote all our estimates by circumflex:

$$\begin{aligned} a_{A}= & {} a_{LS}=\hat{a}=Z\hat{\beta }+\tilde{P}V\tilde{P'}X'\left( X\tilde{P}V\tilde{P'}X'+\sigma ^{2}I_{T}\right) ^{-1}\left( y-XZ\hat{\beta }\right) \end{aligned}$$
(2.20)
$$\begin{aligned} u_{A}= & {} u_{LS}=\hat{u}=\left( I_{T}-X\left( X'X+\sigma ^{2}P'V^{-1}P\right) ^{-1}X'\right) y\end{aligned}$$
(2.21)
$$\begin{aligned} v_{A}= & {} v_{LS}=\hat{v}=P\left( X'X+\sigma ^{2}P'V^{-1}P\right) ^{-1}X'y\end{aligned}$$
(2.22)
$$\begin{aligned} w_{A}= & {} w_{LS}=\hat{w}=X\tilde{P}\hat{v}+\hat{u}. \end{aligned}$$
(2.23)
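Claim 1 and the equalities (2.20)–(2.22) can be checked numerically. The following sketch (toy data; all identifiers are mine) computes \(\hat{a}\) once by the least-squares route and once by the Aitken route:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 6, 2
xobs = rng.normal(size=(T, n)); y = rng.normal(size=T)
X = np.zeros((T, T * n))
for t in range(T):
    X[t, t * n:(t + 1) * n] = xobs[t]
P = np.kron(np.eye(T - 1, T, 1) - np.eye(T - 1, T), np.eye(n))
Pt = P.T @ np.linalg.inv(P @ P.T)
Z = np.tile(np.eye(n), (T, 1))

sigma2, Sigma = 1.0, np.diag([0.01, 0.001])   # assumed known variances
V = np.kron(np.eye(T - 1), Sigma)

# least-squares route: Eq. (1.18) with G = sigma^2 V^{-1}
M = X.T @ X + sigma2 * P.T @ np.linalg.inv(V) @ P
a_ls = np.linalg.solve(M, X.T @ y)

# Aitken route: Eqs. (2.16) and (2.17)
W = X @ Pt @ V @ Pt.T @ X.T + sigma2 * np.eye(T)
Wi, xz = np.linalg.inv(W), X @ Z
beta_A = np.linalg.solve(xz.T @ Wi @ xz, xz.T @ Wi @ y)
a_A = Z @ beta_A + Pt @ V @ Pt.T @ X.T @ Wi @ (y - xz @ beta_A)

assert np.allclose(a_ls, a_A)                 # Claim 1
```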

For the sake of completeness and later use, the following observation is added:

Claim 2

\(G=\sigma ^{2}V^{-1}\) implies \(\hat{Q} =\sigma ^{2}\hat{w}'W^{-1}\hat{w}.\) In other words: the sum of squared deviations weighted by the variance ratios \(\frac{\sigma ^{2}}{\sigma _{1}^{2}},\frac{\sigma ^{2}}{\sigma _{2}^{2}},\ldots ,\frac{\sigma ^{2}}{\sigma _{n}^{2}}\) equals \(\sigma ^{2}\) times the weighted sum of squares (the squared Mahalanobis distance) in the Aitken regression.

Proof

As \(\hat{w}=X\tilde{P}\hat{v}+\hat{u}\), we have

$$\begin{aligned} \hat{Q}= & {} \hat{u}'\hat{u}+\sigma ^{2}\hat{v}'V^{-1}\hat{v}\\= & {} \hat{u}'\left( \hat{w}-X\tilde{P}\hat{v}\right) +\sigma ^{2}\hat{v}'V^{-1}\hat{v}\\= & {} \hat{u}'\hat{w}-\hat{u}'X\tilde{P}\hat{v}+\sigma ^{2}\hat{v}'V^{-1}\hat{v}\\= & {} \hat{u}'\hat{w}-\left( \hat{u}'X\tilde{P}-\sigma ^{2}\hat{v}'V^{-1}\right) \hat{v}\\= & {} \hat{u}'\hat{w}-\left( \hat{u}'X\tilde{P}-\sigma ^{2}\hat{v}'V^{-1}\right) P\hat{a}. \end{aligned}$$

With (2.5), (2.9), (2.12), and (2.20) this gives

$$\begin{aligned} \hat{Q}= & {} \hat{u}'\hat{w}-\left( \hat{u}'X\tilde{P}-\sigma ^{2}\hat{v}'V^{-1}\right) P\left( Z\hat{\beta }+\tilde{P}V\tilde{P'}X'W^{-1}\hat{w}\right) \\= & {} \hat{u}'\hat{w}-\left( \hat{u}'X\tilde{P}-\sigma ^{2}\hat{v}'V^{-1}\right) P\tilde{P}V\tilde{P'}X'W^{-1}\hat{w}\\= & {} \hat{u}'\hat{w}-\left( \hat{u}'X\tilde{P}-\sigma ^{2}\hat{v}'V^{-1}\right) V\tilde{P'}X'W^{-1}\hat{w}\\= & {} \hat{u}'\hat{w}-\left( \hat{u}'X\tilde{P}V\tilde{P'}X'-\sigma ^{2}\hat{v}'\tilde{P'}X'\right) W^{-1}\hat{w}\\= & {} \hat{u}'\hat{w}-\hat{u}'X\tilde{P}V\tilde{P'}X'W^{-1}\hat{w}+\sigma ^{2}\hat{v}'\tilde{P'}X'W^{-1}\hat{w}\\= & {} \hat{u}'\hat{w}-\hat{u}'\left( X\tilde{P}V\tilde{P'}X'+\sigma ^{2}I_{T}-\sigma ^{2}I_{T}\right) W^{-1}\hat{w}+\sigma ^{2}\hat{v}'\tilde{P'}X'W^{-1}\hat{w}\\= & {} \hat{u}'\hat{w}-\hat{u}'\left( X\tilde{P}V\tilde{P'}X'+\sigma ^{2}I_{T}\right) W^{-1}\hat{w}+\sigma ^{2}\hat{u}'W^{-1}\hat{w}+\sigma ^{2}\hat{v}'\tilde{P'}X'W^{-1}\hat{w}\\= & {} \hat{u}'\hat{w}-\hat{u}'\hat{w}+\sigma ^{2}\hat{u}'W^{-1}\hat{w}+\sigma ^{2}\hat{v}'\tilde{P'}X'W^{-1}\hat{w}\\= & {} \sigma ^{2}\left( \hat{u}'+\hat{v}'\tilde{P'}X'\right) W^{-1}\hat{w} \end{aligned}$$

and finally

$$\begin{aligned} \hat{Q}= & {} \sigma ^{2}\hat{w}'W^{-1}\hat{w}. \end{aligned}$$

Hence the weighted sum of squares \(\hat{Q}\) equals \(\sigma ^{2}\) times the squared Mahalanobis distance. \(\square\)

Consider now the distribution of \(\hat{a}\). The matrix \(\left( X'X+\sigma ^{2}P'V^{-1}P\right)\), henceforth referred to as the “system matrix”, will be denoted by M:

$$\begin{aligned} M= & {} \left( X'X+\sigma ^{2}P'V^{-1}P\right) . \end{aligned}$$
(2.24)

With this, the normal equation (2.19), which defines the solution for the vector of the coefficients \(\hat{a}\), can be written as

$$\begin{aligned} M\,\hat{a}= & {} X'y. \end{aligned}$$
(2.25)

With (1.7) and (2.24) we obtain

$$\begin{aligned} \hat{a}= & {} M^{-1}X'\left( Xa+u\right) \nonumber \\= & {} M^{-1}\left( X'Xa+X'u+\sigma ^{2}P'V^{-1}Pa-\sigma ^{2}P'V^{-1}Pa\right) \nonumber \\= & {} a+M^{-1}\left( X'u-\sigma ^{2}P'V^{-1}v\right) . \end{aligned}$$
(2.26)

Given a realization of the time-path of the coefficients a, the estimator \(\hat{a}\) is distributed with mean

$$\begin{aligned} E\left\{ \hat{a}\left| a\right. \right\}= & {} a \end{aligned}$$
(2.27)

and covariance

$$\begin{aligned} E\left\{ \left( a-\hat{a}\right) \left( a-\hat{a}\right) '\right\} =M^{-1}\left( \begin{array}{cc} X'&-\sigma ^{2}P'V^{-1}\end{array}\right) \left( \begin{array}{cc} \sigma ^{2}I_{T} &{} 0\\ 0 &{} V \end{array}\right) \left( \begin{array}{c} X\\ -\sigma ^{2}V^{-1}P \end{array}\right) M^{-1}, \end{aligned}$$

which reduces to

$$\begin{aligned} E\left\{ \left( a-\hat{a}\right) \left( a-\hat{a}\right) '\right\} =M^{-1}\left( \sigma ^{2}X'X+\sigma ^{4}P'V^{-1}P\right) M^{-1} \end{aligned}$$

and finally to

$$\begin{aligned} E\left\{ \left( a-\hat{a}\right) \left( a-\hat{a}\right) '\right\} =\sigma ^{2}M^{-1}. \end{aligned}$$
(2.28)

The system matrix (2.24) is determined by the observations X, the variance \(\sigma ^{2}\), and the variances \(\varSigma\). Equation (2.28) gives the precision of our estimate, which is directly related to the system matrix M. The next step is to determine the variance \(\sigma ^{2}\) and the variances \(\varSigma\).
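In practice, (2.28) can be used to attach pointwise standard errors and rough confidence bands to the estimated coefficient paths. A sketch (toy data and assumed variances, as in the earlier sketches):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 6, 2
xobs = rng.normal(size=(T, n))
X = np.zeros((T, T * n))
for t in range(T):
    X[t, t * n:(t + 1) * n] = xobs[t]
P = np.kron(np.eye(T - 1, T, 1) - np.eye(T - 1, T), np.eye(n))

sigma2, Sigma = 1.0, np.diag([0.01, 0.001])   # assumed known variances
V = np.kron(np.eye(T - 1), Sigma)

M = X.T @ X + sigma2 * P.T @ np.linalg.inv(V) @ P      # system matrix, Eq. (2.24)
cov_a = sigma2 * np.linalg.inv(M)                      # Eq. (2.28)
se = np.sqrt(np.diag(cov_a)).reshape(T, n)             # standard error of each a_{i,t}
bands = 2.0 * se                                       # rough two-sigma bands around a_hat
```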

4 Variance estimation

This section turns to estimating the variances. In Sect. 4.1 the proposed moments estimators will be derived, and in Sect. 4.2 a maximum likelihood criterion \({\mathcal {C}}_{L}\) will be given that is based on the parameterized model described in Sect. 3. In Sect. 4.3 a moments criterion \({\mathcal {C}}_{M}\) will be given that generates, upon minimization, the moments estimators, and in Sect. 4.4 it will be argued that, for large T, both criteria approach each other. As a consequence, the theoretical appeal of the likelihood estimator for large samples carries over to the moments estimator in the Gaussian case.

4.1 Moments estimation

The moments estimator that will be developed in this section has, for any sample size, a straightforward interpretation: It is defined by the property that the sums of squared disturbances in the estimated coefficients equal their expectations. It is thus meaningful even in shorter time series and does not presuppose that the perturbations u and v are normally distributed. It will be shown later that the moments estimators approach the respective maximum likelihood estimators in large samples if the disturbances are normally distributed.

In the following we denote the estimated coefficients by \(\hat{a}\) and the estimated perturbations by \(\hat{u}\) and \(\hat{v}\). For some variances \(\sigma ^{2}\) and \(\varSigma ={\mathrm {diag}}\left( \sigma _{1}^{2},\sigma _{2}^{2},\ldots ,\sigma _{n}^{2}\right)\), the estimated coefficients \(\hat{a}\) along with the estimated disturbances \(\hat{u}\) and \(\hat{v}\) are random variables brought about by realizations of the random variables u and v. Consider \(\hat{u}=y-X\hat{a}=X\left( a-\hat{a}\right) +u\) first. With (2.26) we obtain

$$\begin{aligned} \hat{u}= & {} -X\left( M^{-1}\left( X'u-\sigma ^{2}P'V^{-1}v\right) \right) +u\\= & {} \left( I_{T}-XM^{-1}X'\right) u+\sigma ^{2}XM^{-1}P'V^{-1}v. \end{aligned}$$

Regarding \(\hat{v}\), consider the vectors \(\hat{v}_{i}'=\left( \hat{v}_{i,1},\hat{v}_{i,2},\ldots ,\hat{v}_{i,T-1}\right)\) for \(i=1,2,\ldots ,n\), that is, the disturbances in the coefficients \(\hat{a}_{i}\) for each coefficient separately. These are obtained as follows.

Denote by \(e_{i}\in \mathbb {R}^{n}\) the i-th column of an \(n\times n\) identity matrix and define the \(\left( T-1\right) \times \left( T-1\right) n\)-matrix

$$\begin{aligned} E_{i}{:}{=}I_{T-1}\otimes e_{i}' \end{aligned}$$
(3.1)

that picks the time-path of the i-th disturbance \(v_{i}=\left( v_{i,1},v_{i,2},\ldots ,v_{i,T-1}\right) '\) from the disturbance vector v:

$$\begin{aligned} v_{i}{:}{=}E_{i}v. \end{aligned}$$

Note that, from (1.8),

$$\begin{aligned} \sum _{i=1}^{n}\sigma _{i}^{2}E_{i}'E_{i}= & {} V. \end{aligned}$$
(3.2)

Pre-multiplying (2.26) by \(E_{i}P\) yields

$$\begin{aligned} \hat{v}_{i}=E_{i}\left( I_{\left( T-1\right) n}-\sigma ^{2}PM^{-1}P'V^{-1}\right) v+E_{i}PM^{-1}X'u. \end{aligned}$$

Thus \(\hat{u}\) and \(\hat{v_{i}}\) are linear functions of the random variables u and v, and their expected squared errors can be calculated.

Claim 3

For given observations X and y and given variances \(\sigma ^{2}\)and \(\varSigma\), the expected squared deviations of \(\hat{u}\) and \(\hat{v}_{i}\), \(i=1,2,\ldots ,n\) are

$$\begin{aligned} E\left\{ \hat{u}'\hat{u}\right\}= & {} \sigma ^{2}\left( T-{\text {tr}}\left( XM^{-1}X'\right) \right) . \end{aligned}$$
(3.3)
$$\begin{aligned} E\left\{ \hat{v}_{i}'\hat{v}_{i}\right\}= & {} \left( T-1\right) \sigma _{i}^{2}-\sigma ^{2}{\mathrm {tr}}\left( E_{i}PM^{-1}P'E_{i}'\right) . \end{aligned}$$
(3.4)

This implies that the expected sum of squares is

$$\begin{aligned} E\left\{ \hat{Q}\right\}= & {} \sigma ^{2}\left( T-n\right) . \end{aligned}$$
(3.5)

Proof

The expectation of the squared estimated error \(\hat{u}\) is

$$\begin{aligned} E\left\{ \hat{u}'\hat{u}\right\}= & {} E\left\{ \left( u'\left( I_{T}-XM^{-1}X'\right) +\sigma ^{2}v'V^{-1}PM^{-1}X'\right) \right. \\&\cdot \left. \left( \left( I_{T}-XM^{-1}X'\right) u+\sigma ^{2}XM^{-1}P'V^{-1}v\right) \right\} \\= & {} E\left\{ u'\left( I_{T}-XM^{-1}X'\right) \left( I_{T}-XM^{-1}X'\right) u\right\} \\&+\sigma ^{4}E\left\{ v'V^{-1}PM^{-1}X'XM^{-1}P'V^{-1}v\right\} \\= & {} {\text {tr}}\left( E\left\{ u'\left( I_{T}-XM^{-1}X'\right) \left( I_{T}-XM^{-1}X'\right) u\right\} \right) \\&+\sigma ^{4}{\text {tr}}\left( E\left\{ v'V^{-1}PM^{-1}X'XM^{-1}P'V^{-1}v\right\} \right) \\= & {} {\text {tr}}\left( E\left\{ \left( I_{T}-XM^{-1}X'\right) uu'\left( I_{T}-XM^{-1}X'\right) \right\} \right) \\&+\sigma ^{4}{\text {tr}}\left( E\left\{ XM^{-1}P'V^{-1}vv'V^{-1}PM^{-1}X'\right\} \right) \\= & {} {\text {tr}}\left( \sigma ^{2}\left( I_{T}-XM^{-1}X'\right) \left( I_{T}-XM^{-1}X'\right) \right) \\&+{\text {tr}}\left( \sigma ^{4}XM^{-1}P'V^{-1}PM^{-1}X'\right) \\= & {} \sigma ^{2}{\text {tr}}\left( \left( I_{T}-XM^{-1}X'\right) \left( I_{T}-XM^{-1}X'\right) +\sigma ^{2}XM^{-1}P'V^{-1}PM^{-1}X'\right) \\= & {} \sigma ^{2}{\text {tr}}\left( I_{T}-2XM^{-1}X'+XM^{-1}X'XM^{-1}X'+\sigma ^{2}XM^{-1}P'V^{-1}PM^{-1}X'\right) \\= & {} \sigma ^{2}{\text {tr}}\left( I_{T}-2XM^{-1}X'+XM^{-1}\left( X'X+\sigma ^{2}P'V^{-1}P\right) M^{-1}X'\right) \\= & {} \sigma ^{2}{\text {tr}}\left( I_{T}-XM^{-1}X'\right) = \sigma ^{2}\left( T-{\text {tr}}XM^{-1}X'\right) . \end{aligned}$$

and hence

$$\begin{aligned} E\left\{ \hat{u}'\hat{u}\right\} =\sigma ^{2}\left( T-{\text {tr}}\left( XM^{-1}X'\right) \right) . \end{aligned}$$
(3.6)

In a similar way, the expectation of the squared estimated disturbance in the i-th coefficient \(\hat{v}_{i}\) is evaluated as

$$\begin{aligned} E\left\{ \hat{v}_{i}'\hat{v}_{i}\right\}= & {} E\left\{ \left( u'XM^{-1}P'E_{i}'+v'\left( I_{\left( T-1\right) n}-\sigma ^{2}V^{-1}PM^{-1}P'\right) E_{i}'\right) \right. \\&\cdot \left. \left( E_{i}PM^{-1}X'u+E_{i}\left( I_{\left( T-1\right) n}-\sigma ^{2}PM^{-1}P'V^{-1}\right) v\right) \right\} \\= & {} E\left\{ u'XM^{-1}P'E_{i}'E_{i}PM^{-1}X'u+v'\left( I_{\left( T-1\right) n}-\sigma ^{2}V^{-1}PM^{-1}P'\right) E_{i}'E_{i}\right. \\&\times \left. \left( I_{\left( T-1\right) n}-\sigma ^{2}PM^{-1}P'V^{-1}\right) v\right\} = E\left\{ {\mathrm {tr}}\left( u'XM^{-1}P'E_{i}'E_{i}PM^{-1}X'u\right. \right. \\&+\left. \left. v'\left( I_{\left( T-1\right) n}-\sigma ^{2}V^{-1}PM^{-1}P'\right) E_{i}'E_{i}\left( I_{\left( T-1\right) n}-\sigma ^{2}PM^{-1}P'V^{-1}\right) v\right) \right\} \\= & {} E\left\{ {\mathrm {tr}}\left( E_{i}PM^{-1}X'uu'XM^{-1}P'E_{i}' + E_{i}\left( I_{\left( T-1\right) n}-\sigma ^{2}PM^{-1}P'V^{-1}\right) vv'\right. \right. \\&\left. \left. \times \left( I_{\left( T-1\right) n}-\sigma ^{2}V^{-1}PM^{-1}P'\right) E_{i}'\right) \right\} = {\mathrm {tr}}\left( \sigma ^{2}E_{i}PM^{-1}X'XM^{-1}P'E_{i}'\right) \\&+{\mathrm {tr}}\left( E_{i}\left( I_{\left( T-1\right) n}-\sigma ^{2}PM^{-1}P'V^{-1}\right) V\left( I_{\left( T-1\right) n}-\sigma ^{2}V^{-1}PM^{-1}P'\right) E_{i}'\right) \\= & {} {\mathrm {tr}}\left( \sigma ^{2}E_{i}PM^{-1}X'XM^{-1}P'E_{i}'\right) +{\mathrm {tr}}\left( E_{i}\left( V-\sigma ^{2}PM^{-1}P'\right) \right. \\&\times \left. \left( I_{\left( T-1\right) n}-\sigma ^{2}V^{-1}PM^{-1}P'\right) E_{i}'\right) ={\mathrm {tr}}\left( \sigma ^{2}E_{i}PM^{-1}X'XM^{-1}P'E_{i}'\right) \\&+{\mathrm {tr}}\left( E_{i}\left( V-\sigma ^{2}PM^{-1}P'\right) E_{i}'-\sigma ^{2}E_{i} \left( V-\sigma ^{2}PM^{-1}P'\right) V^{-1}PM^{-1}P'E_{i}'\right) \\= & {} {\mathrm {tr}}\left( \sigma ^{2}E_{i}PM^{-1}X'XM^{-1}P'E_{i}'\right) +{\mathrm {tr}}\left( E_{i}\left( V-\sigma ^{2}PM^{-1}P'-\sigma ^{2}PM^{-1}P'\right. \right. \\&\left. \left. +\sigma ^{4}PM^{-1}P'V^{-1}PM^{-1}P'\right) E_{i}'\right) = {\mathrm {tr}}\left( \sigma ^{2}E_{i}PM^{-1}X'XM^{-1}P'E_{i}'\right. \\&+\left. E_{i}\left( V-\sigma ^{2}PM^{-1}P'-\sigma ^{2}PM^{-1}P'+\sigma ^{4}PM^{-1}P'V^{-1}PM^{-1}P'\right) E_{i}'\right) \\= & {} {\mathrm {tr}}\left( E_{i}\left( \left( \sigma ^{2}PM^{-1}\left( X'X+\sigma ^{2}P'V^{-1}P\right) M^{-1}P'\right) +V-2\sigma ^{2}PM^{-1}P'\right) E_{i}'\right) \\= & {} {\mathrm {tr}}\left( E_{i}\left( V-\sigma ^{2}PM^{-1}P'\right) E_{i}'\right) = {\mathrm {tr}}\left( E_{i}VE_{i}'-\sigma ^{2}E_{i}PM^{-1}P'E_{i}'\right) \\= & {} {\mathrm {tr}}\left( \left( I_{T-1}\otimes e_{i}'\right) \left( I_{T-1}\otimes \varSigma \right) \left( I_{T-1}\otimes e_{i}\right) -\sigma ^{2}E_{i}PM^{-1}P'E_{i}'\right) \\= & {} {\mathrm {tr}}\left( I_{T-1}\otimes e_{i}'\varSigma e_{i}\right) -\sigma ^{2}{\mathrm {tr}}\left( E_{i}PM^{-1}P'E_{i}'\right) \end{aligned}$$

and hence

$$\begin{aligned} E\left\{ \hat{v}_{i}'\hat{v}_{i}\right\} =\left( T-1\right) \sigma _{i}^{2}-\sigma ^{2}{\mathrm {tr}}\left( E_{i}PM^{-1}P'E_{i}'\right) . \end{aligned}$$
(3.7)

Regarding \(\hat{Q}\) we note that

$$\begin{aligned} X'X+\sigma ^{2}P'V^{-1}P= & {} X'X+\sigma ^{2}\sum _{i=1}^{n}\frac{1}{\sigma _{i}^{2}}P'E_{i}'E_{i}P=M \end{aligned}$$

and obtain

$$\begin{aligned} E\left\{ \hat{Q}\right\}= & {} \sigma ^{2}\left( T-{\text {tr}}\left( XM^{-1}X'\right) \right) +\sum _{i=1}^{n}\frac{\sigma ^{2}}{\sigma _{i}^{2}}\left( \left( T-1\right) \sigma _{i}^{2}-\sigma ^{2}{\mathrm {tr}}\left( E_{i}PM^{-1}P'E_{i}'\right) \right) \\= & {} \sigma ^{2}\left( T-{\text {tr}}\left( XM^{-1}X'\right) +\sum _{i=1}^{n}\left( T-1\right) -\sum _{i=1}^{n}\frac{\sigma ^{2}}{\sigma _{i}^{2}}{\mathrm {tr}}\left( E_{i}PM^{-1}P'E_{i}'\right) \right) \\= & {} \sigma ^{2}\left( T+n\left( T-1\right) -{\text {tr}}\left( M^{-1}X'X\right) -{\mathrm {tr}}\left( M^{-1}\sum _{i=1}^{n}\frac{\sigma ^{2}}{\sigma _{i}^{2}}P'E_{i}'E_{i}P\right) \right) \\= & {} \sigma ^{2}\left( Tn+T-n-{\text {tr}}\left( M^{-1}X'X\right) -{\mathrm {tr}}\left( \sigma ^{2}M^{-1}P'V^{-1}P\right) \right) \\= & {} \sigma ^{2}\left( Tn+T-n-{\text {tr}}\left( M^{-1}\left( X'X+\sigma ^{2}P'V^{-1}P\right) \right) \right) \\= & {} \sigma ^{2}\left( Tn+T-n-{\text {tr}}\left( I_{Tn}\right) \right) \end{aligned}$$

and hence

$$\begin{aligned} E\left\{ \hat{Q}\right\} =\sigma ^{2}\left( T-n\right) . \end{aligned}$$

\(\square\)

The moments estimators are obtained by selecting variances \(\sigma ^{2}\) and \(\sigma _{i}^{2},i=1,2,\ldots ,n\) such that the expected moments \(E\left\{ \hat{u}'\hat{u}\right\}\) and \(E\left\{ \hat{v}_{i}'\hat{v}_{i}\right\} ,i=1,2,\ldots ,n\) given in (3.6) and (3.7) are equated to the realized moments \(\hat{u}'\hat{u}\) and \(\hat{v}_{i}'\hat{v}_{i},\,i=1,2,\ldots ,n\). As both the expected moments and the realized moments are functions of the variances, the moments estimators, denoted by \(\hat{\sigma }^{2}\) and \(\hat{\sigma }_{i}^{2},\,i=1,2,\ldots ,n\), respectively, are defined as a fixed point of the system

$$\begin{aligned} \hat{u}'\hat{u}= & {} \sigma ^{2}\left( T-{\text {tr}}\left( XM^{-1}X'\right) \right) \\ \hat{v}_{i}'\hat{v}_{i}= & {} \left( T-1\right) \sigma _{i}^{2}-\sigma ^{2}{\mathrm {tr}}\left( E_{i}PM^{-1}P'E_{i}'\right) . \end{aligned}$$

The moments estimators can equivalently be defined as a fixed point of the system:

$$\begin{aligned} \hat{v}_{i}'\hat{v}_{i}= & {} \left( T-1\right) \sigma _{i}^{2}-\sigma ^{2}{\mathrm {tr}}\left( E_{i}PM^{-1}P'E_{i}'\right) \\ \hat{Q}= & {} \sigma ^{2}\left( T-n\right) . \end{aligned}$$

The implementations by Schlicht (2005b, 2021) use the latter alternative and employ a gradient process to find the solution of the equation system

$$\begin{aligned} \hat{v}_{i}'\hat{v}_{i}= & {} \left( T-1\right) \hat{\sigma }_{i}^{2}-\hat{\sigma }^{2}{\mathrm {tr}}\left( E_{i}P\hat{M}^{-1}P'E_{i}'\right) \\ \hat{Q}= & {} \hat{\sigma }^{2}\left( T-n\right) . \end{aligned}$$

This can be written as

$$\begin{aligned} \frac{\hat{\sigma }_{i}^{2}}{\hat{\sigma }^{2}}= & {} \left( \frac{\hat{v}_{i}'\hat{v}_{i}}{\hat{Q}}\left( T-n\right) +{\mathrm {tr}}\left( E_{i}P\hat{M}^{-1}P'E_{i}'\right) \right) \frac{1}{T-1} \end{aligned}$$
(3.8)
$$\begin{aligned} \hat{\sigma }^{2}= & {} \frac{1}{T-n}\hat{Q}. \end{aligned}$$
(3.9)

Iteration starts with some variance ratios \(\gamma _{i}=\frac{\sigma ^{2}}{\sigma _{i}^{2}}\). These determine the right-hand sides of Eqs. (3.8) and (3.9). The variance ratios at the left-hand side of (3.8) and the variance at the left-hand side of (3.9) are used for a new iteration, and this continues until convergence is reached, delivering the fixed-point values \(\hat{\gamma }_{i}=\frac{\hat{\sigma }^{2}}{\hat{\sigma }_{i}^{2}}\) and \(\hat{\sigma }^{2}\) and the corresponding variances \(\hat{\sigma }_{i}^{2}=\frac{\hat{\sigma }^{2}}{\hat{\gamma }_{i}}\). (If this process does not converge, another solution procedure is available that will be discussed in Sect. 4.3 below.)
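The following sketch implements the fixed-point system (3.8)–(3.9) as a naive direct iteration (numpy; toy data, no convergence safeguards, and all identifiers such as vc_moments are mine—the published implementations use a gradient process and are more careful numerically):

```python
import numpy as np

def vc_moments(X, P, y, T, n, n_iter=200):
    """Naive fixed-point iteration for Eqs. (3.8) and (3.9)."""
    Ei = [np.kron(np.eye(T - 1), np.eye(n)[i][None, :]) for i in range(n)]  # Eq. (3.1)
    ratio = np.ones(n)                                # start values for sigma_i^2 / sigma^2
    for _ in range(n_iter):
        G = np.kron(np.eye(T - 1), np.diag(1.0 / ratio))   # gamma_i = sigma^2 / sigma_i^2
        M = X.T @ X + P.T @ G @ P
        Minv = np.linalg.inv(M)
        a = Minv @ (X.T @ y)
        u, v = y - X @ a, P @ a
        Q = u @ u + v @ (G @ v)                       # Eq. (1.12) with G = sigma^2 V^{-1}
        ratio = np.array([(v @ Ei[i].T @ Ei[i] @ v / Q * (T - n)
                           + np.trace(Ei[i] @ P @ Minv @ P.T @ Ei[i].T)) / (T - 1)
                          for i in range(n)])         # Eq. (3.8)
    sigma2 = Q / (T - n)                              # Eq. (3.9)
    return a, sigma2, sigma2 * ratio

# usage with toy data, as in the earlier sketches:
rng = np.random.default_rng(0)
T, n = 20, 2
xobs = rng.normal(size=(T, n)); y = rng.normal(size=T)
X = np.zeros((T, T * n))
for t in range(T):
    X[t, t * n:(t + 1) * n] = xobs[t]
P = np.kron(np.eye(T - 1, T, 1) - np.eye(T - 1, T), np.eye(n))
a_hat, s2_hat, s2_i_hat = vc_moments(X, P, y, T, n)
```

Note that both terms on the right-hand side of (3.8) are nonnegative, so the iterated variance ratios remain positive.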

4.2 Likelihood estimation

This section derives a maximum-likelihood estimator for the variances under the additional assumption that the disturbances u and v are normally distributed.

Using Eqs. (1.8) and (2.10)–(2.14) together with the identity \(x=XZ\), the concentrated log-likelihood function for the Aitken regression (2.9) can be written as

$$\begin{aligned} {\mathscr {L}}\left( \begin{array}{c} \sigma ^{2},\varSigma \end{array}\right) =-\frac{1}{2}\left( T\left( \log 2+\log \pi \right) +\log \det W\right) -\frac{1}{2}\left( y-XZ\beta \right) 'W^{-1}\left( y-XZ\beta \right) \end{aligned}$$
(3.10)

with

$$\begin{aligned} W= & {} X\tilde{P}\left( I_{T-1}\otimes \varSigma \right) \tilde{P'}X'+\sigma ^{2}I_{T}. \end{aligned}$$

By maximizing (3.10) with respect to \(\beta ,\) \(\sigma ^{2}\) and \(\varSigma\), the maximum likelihood estimates for the variances are obtained and the corresponding expectation for the parameter a is given in analogy to (2.20) as

$$\begin{aligned} \check{a}= & {} Z\check{\beta }+\tilde{P}\check{V}\tilde{P'}X'\left( X\tilde{P}\check{V}\tilde{P'}X'+\check{\sigma }^{2}I_{T}\right) ^{-1}\left( y-XZ\check{\beta }\right) \end{aligned}$$

with a caron denoting the maximum likelihood estimates and \(\check{V}=\left( I_{T-1}\otimes \check{\varSigma }\right)\).
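As a sketch, the likelihood (3.10) can be evaluated for given variances and handed to a generic numerical optimizer; the function below (numpy; all identifiers are mine) inserts the GLS value (2.14) for \(\beta\), so that only the variances remain as arguments:

```python
import numpy as np

def vc_loglik(sigma2, Sigma, X, Pt, Z, y, T):
    """Log-likelihood (3.10), with beta set to its GLS value, Eq. (2.14)."""
    V = np.kron(np.eye(T - 1), Sigma)
    W = X @ Pt @ V @ Pt.T @ X.T + sigma2 * np.eye(T)   # Eq. (2.12)
    Wi = np.linalg.inv(W)
    xz = X @ Z
    beta = np.linalg.solve(xz.T @ Wi @ xz, xz.T @ Wi @ y)
    r = y - xz @ beta
    _, logdetW = np.linalg.slogdet(W)                  # stable log det W
    return -0.5 * (T * np.log(2.0 * np.pi) + logdetW) - 0.5 * r @ Wi @ r
```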

The maximum likelihood estimator can be characterized in another way. To explain this, the following lemma is needed.

Claim 4

$$\begin{aligned} \log \det W= & {} \log \det \left( PMP'\right) +\left( T-1\right) \sum _{i=1}^{n}\log \sigma _{i}^{2}\nonumber \\&-\left( \left( T-1\right) n-T\right) \log \sigma ^{2}-2\log \det \left( PP'\right) . \end{aligned}$$
(3.11)

Proof

$$\begin{aligned} \det W= & {} \det \left( X\tilde{P}V\tilde{P'}X'+\sigma ^{2}I_{T}\right) \\= & {} \left( \sigma ^{2}\right) ^{T}\det \left( \frac{1}{\sigma ^{2}}X\tilde{P}V^{\frac{1}{2}}V^{\frac{1}{2}}\tilde{P'}X'+I_{T}\right) \\= & {} \left( \sigma ^{2}\right) ^{T}\det \left( \frac{1}{\sigma ^{2}}V^{\frac{1}{2}}\tilde{P'}X'X\tilde{P}V^{\frac{1}{2}}+I_{\left( T-1\right) n}\right) \\= & {} \left( \sigma ^{2}\right) ^{T}\det \left( V^{\frac{1}{2}}\left( \frac{1}{\sigma ^{2}}\tilde{P'}X'X\tilde{P}+V^{-1}\right) V^{\frac{1}{2}}\right) \\= & {} \left( \sigma ^{2}\right) ^{T}\det \left( V\left( \frac{1}{\sigma ^{2}}\left( PP'\right) ^{-1}PX'XP'\left( PP'\right) ^{-1}+V^{-1}\right) \right) \\= & {} \left( \sigma ^{2}\right) ^{T}\det \left( \frac{1}{\sigma ^{2}}V\left( PP'\right) ^{-1}P\left( X'X+\sigma ^{2}P'V^{-1}P\right) P'\left( PP'\right) ^{-1}\right) \\= & {} \left( \sigma ^{2}\right) ^{T}\det \left( \frac{1}{\sigma ^{2}}V\right) \det \left( PP'\right) ^{-1}\det \left( PMP'\right) \det \left( PP'\right) ^{-1}\\= & {} \left( \sigma ^{2}\right) ^{T}\left( \prod _{i=1}^{n}\frac{\sigma _{i}^{2}}{\sigma ^{2}}\right) ^{(T-1)}\det \left( PP'\right) ^{-2}\det \left( PMP'\right) . \end{aligned}$$

Hence the result

$$\begin{aligned} \log \det W= & {} \log \det \left( PMP'\right) +\left( T-1\right) \sum _{i=1}^{n}\log \sigma _{i}^{2}\\&-\left( \left( T-1\right) n-T\right) \log \sigma ^{2}-2\log \det \left( PP'\right) \end{aligned}$$

is obtained. \(\square\)

Claim 5

Minimizing the criterion

$$\begin{aligned} {\mathcal {C}}_{L}&=\log \det \left( PMP'\right) +\left( T-1\right) \sum _{i=1}^{n}\log \sigma _{i}^{2}-\left( \left( T-1\right) n-T\right) \log \sigma ^{2}\nonumber \\&\quad +\frac{1}{\sigma ^{2}}u'u+v'V^{-1}v \end{aligned}$$
(3.12)

is equivalent to maximizing the likelihood function (3.10).

Proof

With (3.11) we have

$$\begin{aligned} {\mathcal {C}}_{L}+2{\mathscr {L}}\left( \begin{array}{c} \sigma ^{2},\varSigma \end{array}\right)= & {} \frac{1}{\sigma ^{2}}u'u+v'V^{-1}v-w'W^{-1}w+2\log \det \left( PP'\right) \\&-T\left( \log 2+\log \pi \right) . \end{aligned}$$

As, according to Claim 2, \(w'W^{-1}w=\left( y-XZ\beta \right) 'W^{-1}\left( y-XZ\beta \right)\) equals \(\frac{1}{\sigma ^{2}}u'u+v'V^{-1}v\) and \(\log \det \left( PP'\right)\) and \(T\left( \log 2+\log \pi \right)\) are independent of the variances, we can write

$$\begin{aligned} {\mathcal {C}}_{L}= & {} -2{\mathscr {L}}\left( \begin{array}{c} \sigma ^{2},\varSigma \end{array}\right) +{\mathrm {constant}} \end{aligned}$$

where “\({\mathrm {constant}}\)” is independent of the variances and maximization of \({\mathscr {L}}\) with regard to the variances is equivalent to minimization of \({\mathcal {C}}_{L}.\) \(\square\)

4.3 Another representation of the moments estimators

The relationship between the likelihood estimator and the moments estimator can be elucidated with the aid of a criterion that is very similar to the likelihood criterion (3.12). This criterion function is

$$\begin{aligned} {\mathcal {C}}_{M}\left( \begin{array}{cc} \sigma ^{2},&\varSigma \end{array}\right)= & {} \log \det M+\log \det PP'+\left( T-1\right) \sum _{i=1}^{n}\log \sigma _{i}^{2}-T\left( n-1\right) \log \sigma ^{2}\nonumber \\&+\frac{1}{\sigma ^{2}}u'u+v'V^{-1}v. \end{aligned}$$
(3.13)

Claim 6

Minimization of the criterion function (3.13) with respect to the disturbances u and v and the variances \(\sigma ^{2}\) and \(\varSigma\) yields the moments estimators as defined in (3.3) and (3.4).

Proof

Note that the envelope theorem together with (3.2) implies

$$\begin{aligned} \frac{\partial }{\partial \sigma ^{2}}\left( \frac{1}{\sigma ^{2}}\hat{u}'\hat{u}+\hat{v}'V^{-1}\hat{v}\right)= & {} -\frac{1}{\sigma ^{4}}\hat{u}'\hat{u} \end{aligned}$$
(3.14)
$$\begin{aligned} \frac{\partial }{\partial \sigma _{i}^{2}}\left( \frac{1}{\sigma ^{2}}\hat{u}'\hat{u}+\hat{v}'V^{-1}\hat{v}\right)= & {} -\frac{1}{\sigma _{i}^{4}}\hat{v_{i}}'\hat{v_{i}}. \end{aligned}$$
(3.15)

In view of (3.2) we obtain further

$$\begin{aligned} \frac{\partial \log \det M}{\partial \sigma ^{2}}= & {} {\text {tr}}\left( M^{-1}P'V^{-1}P\right) . \end{aligned}$$
(3.16)

By definition (2.24) we have

$$\begin{aligned} M^{-1}\left( X'X+\sigma ^{2}P'V^{-1}P\right)= & {} I \end{aligned}$$

and hence

$$\begin{aligned} M^{-1}P'V^{-1}P= & {} \frac{1}{\sigma ^{2}}\left( I-M^{-1}X'X\right) . \end{aligned}$$

With this, Eq. (3.16) can be written as

$$\begin{aligned} \frac{\partial \log \det M}{\partial \sigma ^{2}}= & {} {\text {tr}}\left( \frac{1}{\sigma ^{2}}\left( I_{Tn}-M^{-1}X'X\right) \right) \\= & {} \frac{1}{\sigma ^{2}}\left( {\text {tr}}\left( I_{Tn}\right) -{\mathrm {tr}}\left( M^{-1}X'X\right) \right) \\= & {} \frac{Tn}{\sigma ^{2}}-\frac{1}{\sigma ^{2}}{\mathrm {tr}}\left( XM^{-1}X'\right) .\\ \frac{\partial \log \det M}{\partial \sigma _{i}^{2}}= & {} -\frac{\sigma ^{2}}{\sigma _{i}^{4}}{\text {tr}}\left( M^{-1}P'E_{i}'E_{i}P\right) \end{aligned}$$

and we find

$$\begin{aligned} \frac{\partial {\mathcal {C}}_{M}}{\partial \sigma ^{2}}= & {} \frac{Tn}{\sigma ^{2}}-\frac{1}{\sigma ^{2}}{\mathrm {tr}}\left( XM^{-1}X'\right) -\frac{T\left( n-1\right) }{\sigma ^{2}}-\frac{1}{\sigma ^{4}}\hat{u}'\hat{u}=0 \end{aligned}$$
(3.17)
$$\begin{aligned} \frac{\partial {\mathcal {C}}_{M}}{\partial \sigma _{i}^{2}}= & {} -\frac{\sigma ^{2}}{\sigma _{i}^{4}}{\text {tr}}\left( P'E_{i}'E_{i}PM^{-1}\right) +\left( T-1\right) \frac{1}{\sigma _{i}^{2}}-\frac{1}{\sigma _{i}^{4}}\hat{v_{i}}'\hat{v_{i}}=0 \end{aligned}$$
(3.18)

which gives

$$\begin{aligned} \hat{u}'\hat{u}= & {} \sigma ^{2}\left( T-{\mathrm {tr}}\left( XM^{-1}X'\right) \right) \end{aligned}$$
(3.19)
$$\begin{aligned} \hat{v_{i}}'\hat{v_{i}}= & {} \left( T-1\right) \sigma _{i}^{2}-\sigma ^{2}{\text {tr}}\left( P'E_{i}'E_{i}PM^{-1}\right) . \end{aligned}$$
(3.20)

These first-order conditions are equivalent to Eqs. (3.3), (3.4) that define the moments estimator. \(\square\)

Johannes Ludsteck’s (2004, 2018) Mathematica packages for VC proceed by minimizing the criterion function (3.13). This permits very clean and transparent programming. As Claim 6 is confined to moments and does not require any assumption about the normality of the disturbances, Ludsteck’s estimators are moments estimators as well.
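A sketch of this route in Python follows (scipy's generic Nelder–Mead in log-variance space on toy data; this mimics the idea only and is not Ludsteck's Mathematica code—all identifiers are mine):

```python
import numpy as np
from scipy.optimize import minimize

def c_m(theta, X, P, y, T, n):
    """Moments criterion (3.13) with u and v concentrated out via the filter."""
    sigma2, sig_i2 = np.exp(theta[0]), np.exp(theta[1:])
    Vinv = np.kron(np.eye(T - 1), np.diag(1.0 / sig_i2))
    M = X.T @ X + sigma2 * P.T @ Vinv @ P
    a = np.linalg.solve(M, X.T @ y)            # optimal u, v for these variances
    u, v = y - X @ a, P @ a
    _, logdetM = np.linalg.slogdet(M)
    _, logdetPP = np.linalg.slogdet(P @ P.T)
    return (logdetM + logdetPP + (T - 1) * np.log(sig_i2).sum()
            - T * (n - 1) * np.log(sigma2) + u @ u / sigma2 + v @ Vinv @ v)

# usage with toy data:
rng = np.random.default_rng(0)
T, n = 20, 2
xobs = rng.normal(size=(T, n)); y = rng.normal(size=T)
X = np.zeros((T, T * n))
for t in range(T):
    X[t, t * n:(t + 1) * n] = xobs[t]
P = np.kron(np.eye(T - 1, T, 1) - np.eye(T - 1, T), np.eye(n))

res = minimize(c_m, x0=np.zeros(1 + n), args=(X, P, y, T, n), method="Nelder-Mead")
sigma2_hat, sig_i2_hat = np.exp(res.x[0]), np.exp(res.x[1:])
```

Optimizing over log-variances keeps the variances positive without explicit constraints, which is one reason this formulation permits such clean programming.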

4.4 The relationship between the likelihood and the moments estimator

The likelihood estimates minimize, according to Claim 5, the criterion \({\mathcal {C}}_{L}\) and the moments estimates minimize, according to Claim 6, the criterion \({\mathcal {C}}_{M}\). It is claimed in the following that, for increasing T and bounded X, both estimates tend to coincide. To show that, the following lemma is needed.

Claim 7

For sufficiently large T and bounded explanatory variables X, the following holds true approximately:

$$\begin{aligned} \det PMP'\approx & {} \det M\det \left( PP'\right) . \end{aligned}$$

Proof

Define the \(Tn\times Tn\) matrix

$$\begin{aligned} \mathbb {P}= & {} \left( \begin{array}{c} P\\ T^{-\frac{1}{2}}Z' \end{array}\right) \end{aligned}$$

and consider the matrix \(\mathbb {P}M\mathbb {P}'.\) One way to calculate it is as follows:

$$\begin{aligned} \mathbb {P}M\mathbb {P}'= & {} \left( \begin{array}{c} P\\ T^{-\frac{1}{2}}Z' \end{array}\right) M\left( \begin{array}{cc} P'&T^{-\frac{1}{2}}Z\end{array}\right) \\= & {} \left( \begin{array}{cc} PMP' &{} T^{-\frac{1}{2}}PMZ\\ T^{-\frac{1}{2}}Z'MP' &{} T^{-1}Z'Z \end{array}\right) \\= & {} \left( \begin{array}{cc} PMP' &{} T^{-\frac{1}{2}}PX'XZ\\ T^{-\frac{1}{2}}Z'X'XP' &{} I_{n} \end{array}\right) . \end{aligned}$$

This implies

$$\begin{aligned} \det \mathbb {P}M\mathbb {P}'= & {} \det I_{n}\det \left( PMP'-\frac{1}{T}PX'XZZ'X'XP'\right) \\= & {} \det \left( PMP'-\frac{1}{T}PX'xx'XP'\right) \\= & {} \det \left( P\left( M-\frac{1}{T}X'xx'X\right) P'\right) \\= & {} \det \left( P\left( X'\left( I_{T}-\frac{1}{T}xx'\right) X+\sigma ^{2}P'V^{-1}P\right) P'\right) . \end{aligned}$$

For increasing T and bounded x, \(\frac{1}{T}xx'\) tends to zero and \(\left( I_{T}-\frac{1}{T}xx'\right)\) tends to \(I_{T}\). Hence \(\det \mathbb {P}M\mathbb {P}'\) tends to \(\det PMP'\) and we can write

$$\begin{aligned} \det \mathbb {P}M\mathbb {P}'\approx & {} \det PMP' \end{aligned}$$
(3.21)

for large T. Another way to evaluate \(\det \left( \mathbb {P}M\mathbb {P}'\right)\) is the following:

$$\begin{aligned} \det \mathbb {P}M\mathbb {P}'= & {} \det \left( M\mathbb {P}'\mathbb {P}\right) \\= & {} \det M\det \left( \mathbb {P}'\mathbb {P}\right) \\= & {} \det M\det \left( \mathbb {P}\mathbb {P}'\right) \end{aligned}$$

As

$$\begin{aligned} \det \left( \mathbb {P}\mathbb {P}'\right)= & {} \det \left( \begin{array}{cc} PP' &{} 0\\ 0 &{} I_{n} \end{array}\right) \,=\,\det \left( PP'\right) ,\nonumber \\ \det \mathbb {P}M\mathbb {P}'= & {} \det M\det \left( PP'\right) \end{aligned}$$
(3.22)

is obtained. Combining (3.21) and (3.22) gives the result. \(\square\)

Claim 8

For increasing T and with bounded explanatory variables X, the moments criterion and the likelihood criterion coincide.

Proof

For increasing T and in view of Claim 7, \({\mathcal {C}}_{M}\) tends to \({\mathcal {C}}_{L}\); the two criteria then differ only by a term \(n\log \sigma ^{2}\), which does not grow with T and is negligible relative to the terms that do. \(\square\)

Hence the minimization of both criteria with respect to the variances will generate in the limit the same result. In consequence, the descriptive appeal of the moments estimator carries over to the likelihood estimator, and the theoretical appeal of the likelihood estimator carries over to the moments estimator.

5 Miscellaneous notes

The following offers remarks on computation (Sect. 5.1) and comments on some applications of the VC method in economics that illustrate aspects of the method of potential interest in other fields (Sect. 5.2). Some illustration provided by simulation studies is given (Sect. 5.3). Section 5.4 discusses the problem of artifacts. Some methodological concerns are raised in Sect. 5.5.

5.1 Notes on computation

The VC method has been embodied in some freely available software packages (Ludsteck 2004, 2018; Schlicht 2005b, 2021). Although these have been developed under the assumption that all disturbances are Gaussian, the numerical routines, briefly sketched at the end of Sects. 4.1 and 4.3, remain appropriate for the non-Gaussian case.

Schlicht and Ludsteck (2006, Sec. 11) have compared the performance of the moments estimator with that of the Kalman filter in the EViews (2005) implementation for the Gaussian case and conclude that “both estimators perform very similar—with the caveat that the Eviews estimates have been calculated by using the theoretical values as starting values. ...The distributions of the estimates for the weights are practically indistinguishable.” Given that true variances would be unavailable in practical applications and that the Kalman results appear to be quite sensitive to the choice of initial values, this speaks for the VC method in the case that the coefficients follow a random walk. Further, the VC method dispenses with the necessity to specify initial values and offers additional descriptive features, as indicated by Claim 1 and Eq. (2.8).

5.2 Notes on applications

In spite of its so far insufficient documentation, VC has found quite a number of applications in various settings, mainly dealing with structural change. As any of the authors of these studies will be a better judge regarding the practical performance of the VC method than this author (who is neither an applied economist, nor an econometrician, nor a statistician), any comments in this regard from my side appear unwarranted. Yet it may be appropriate to illustrate possible uses of the VC method by means of some examples taken from my field, economics.

In the wake of the financial crisis of 2008, it has been observed that “monetary policy rules change gradually, pointing to the importance of applying a time-varying estimation framework” (Baxa et al. 2014) and that, “by applying the time-varying coefficients method ...it was clear that the past financial crisis caused the central bank to be more expansionary in its policy than usual towards financial stress” (Madsen 2012). Further, analyses of inflation targeting (IT) with “a time-varying coefficients methodology ...show a clear picture of credibility gains from the adoption of IT” (Nogueira 2009). Another application dealt with the recent decoupling of greenhouse gas emissions from gross domestic product in the context of global warming, where it has been found that “the evidence for decoupling among the richer countries gets weaker” (Cohen et al. 2017). Regarding the relationship between unemployment and economic growth, known in economics as “Okun’s Law”, it has been contested that the relationship is static over time (Jalles 2018), and it has been found that “deregulation in labor and product markets and recessions have strengthened the response of unemployment to the business cycle” (Furceri et al. 2019).

Such applications suggest to me that the VC method may offer a useful additional way of dealing with linear models whose coefficients follow a random walk, and I hope that similar applications will be found in other fields.

5.3 Some illustration

To illustrate the practical workings of VC, assume a model with an intercept term \(a_{t}\) and a single explanatory variable \(x_{t}\) with coefficient \(b_{t}\)Footnote 5:

$$\begin{aligned} y_{t}= & {} a_{t}+b_{t}x_{t}+u_{t} \end{aligned}$$

Using the simulation tool from Ludsteck (2004, 2018), a time series for the explanatory variable was generated with \(x_{t}\sim {\mathcal {N}}\left( 0,100\right)\), \(t=1,2,\ldots ,50\). Further it was assumed that \(u_{t}\sim {\mathcal {N}}\left( 0,0.1\right)\), \(\left( a_{t}-a_{t-1}\right) \sim {\mathcal {N}}\left( 0,0.01\right)\), and \(\left( b_{t}-b_{t-1}\right) \sim {\mathcal {N}}\left( 0,0.001\right)\). Typically the optimally computed expectations of the time paths (calculated using the true variances) and the VC estimates lie very close together. Figure 1 illustrates a somewhat atypical run in which the estimated smoothing weights deviate from the true smoothing weights by a factor of the order of five. The optimally estimated time-paths of the coefficients (based on the true variances) and the estimated time-paths (based on the estimated weights) nevertheless move together. This illustrates the general impression that the filtering results, especially the qualitative time-patterns, are not overly sensitive to the weights used for filtering.
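The setup just described is easy to reproduce. The following Python sketch generates one run of the simulated data under the stated distributional assumptions; it is a minimal illustration (all names are my own, and the initial levels of the random walks are set to zero for simplicity), not the simulation tool of Ludsteck (2004, 2018).

```python
import numpy as np

# Minimal sketch of the data-generating process described above.
rng = np.random.default_rng(seed=42)
T = 50

sigma_u2, sigma_a2, sigma_b2 = 0.1, 0.01, 0.001   # true disturbance variances
gamma_a = sigma_u2 / sigma_a2                      # true weight for the intercept: 10
gamma_b = sigma_u2 / sigma_b2                      # true weight for the slope: 100

x = rng.normal(0.0, np.sqrt(100.0), size=T)        # explanatory variable

# The coefficients follow independent random walks (initial levels zero).
a = np.cumsum(rng.normal(0.0, np.sqrt(sigma_a2), size=T))
b = np.cumsum(rng.normal(0.0, np.sqrt(sigma_b2), size=T))

u = rng.normal(0.0, np.sqrt(sigma_u2), size=T)     # observation disturbance
y = a + b * x + u                                  # observed series
```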

Fig. 1

Optimally calculated expectations (thin lines) and VC estimates (thick lines) for intercept (left) and slope (right), together with the realizations of the coefficients (x) and the VC confidence bands. The example has been selected to visually exhibit differences between the true expectations and the VC estimates; usually the weights are estimated better and the curves lie quite close together. As the estimated smoothing weights are considerably smaller than the true weights, the time-paths of the VC estimates are less smooth than the true expectations (True weights are \(\gamma _{a}=10\) and \(\gamma _{b}=100\), while the estimated weights are \(\hat{\gamma }_{a}=1.60\) and \(\hat{\gamma }_{b}=14.76\) here. The true variances are \(\sigma _{u}^{2}=0.1\), \(\sigma _{a}^{2}=0.01\), and \(\sigma _{b}^{2}=0.001\); the estimated variances are \(\hat{\sigma }_{u}^{2}=0.040\), \(\hat{\sigma }_{a}^{2}=0.025\), and \(\hat{\sigma }_{b}^{2}=0.0029\))

It is, obviously, never possible to extract the movement of the true coefficients from the data, irrespective of how long the time series is. (Only the estimation of the weights improves with the length of the time series.) The best that can be done is to estimate the expectations of the coefficients. Given the variances, the VC estimate (which is the mean of a random vector) is optimal and cannot be improved upon, and the standard of comparison must be the estimates obtained with the optimal weights, as in Fig. 1.
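To make this concrete: for given weights, the VC time paths of the two-coefficient model above minimize the penalized least-squares criterion of the descriptive approach, which amounts to solving a single linear system. The following Python sketch is a deliberately plain, dense implementation of that idea for illustration only; it is not the algorithm used in the cited packages, which exploit the sparse band structure of the system.

```python
import numpy as np

def vc_filter(y, x, gamma_a, gamma_b):
    """Time paths minimizing  sum_t (y_t - a_t - b_t x_t)^2
       + gamma_a * sum_t (a_t - a_{t-1})^2 + gamma_b * sum_t (b_t - b_{t-1})^2
    for *given* weights gamma_a and gamma_b (illustrative dense version)."""
    T = len(y)
    D = np.diff(np.eye(T), axis=0)            # (T-1) x T first-difference matrix
    P = D.T @ D                               # penalizes squared increments
    X = np.hstack([np.eye(T), np.diag(x)])    # maps (a_1..a_T, b_1..b_T) to fitted y
    Omega = np.block([[gamma_a * P, np.zeros((T, T))],
                      [np.zeros((T, T)), gamma_b * P]])
    # Normal equations of the penalized least-squares problem.
    c = np.linalg.solve(X.T @ X + Omega, X.T @ np.asarray(y))
    return c[:T], c[T:]                       # estimated paths of a_t and b_t

# Filtering the simulated data from the sketch above with the true weights:
# a_hat, b_hat = vc_filter(y, x, gamma_a, gamma_b)
```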

The distribution of the estimated weights in the above setting is illustrated in Fig. 2. The time series for x, u, and v have been generated as described above and the VC moments estimation has been applied 5000 times. The histogram in Fig. 2 illustrates that the estimates cluster around their theoretical values.

Fig. 2

Histogram of estimates for the \(\log _{10}\) weights. The theoretical values are \(\log _{10}\gamma _{a}=1\) and \(\log _{10}\gamma _{b}=2\). The distribution of the estimates clusters around these values. (\(T=50\), 5000 trials)

5.4 Artifacts

Suppose that the data of a particular problem have been generated by the standard linear model (1.2). In this case, the VC model is misspecified: a correct estimation would require the variances \(\sigma _{1}^{2},\sigma _{2}^{2},\ldots ,\sigma _{n}^{2}\) of the coefficients to be zero and the weights \(\gamma _{1},\gamma _{2},\ldots ,\gamma _{n}\)—the inverse variance ratios—to be infinite, whereas VC implicitly assumes that the weights are finite. As VC estimates with sufficiently large weights \(\gamma _{i}\) are indistinguishable from the OLS estimates, the VC estimation would nevertheless be approximately correct if the estimated weights are sufficiently large.Footnote 6

As VC estimates involve nearly twice as many parameters as OLS, there is more room for artifacts in VC. From this point of view, VC ought to be used with caution, especially if all parameters are permitted to vary over time rather than just a selected few. To illustrate, consider a linear model \(y_{t}=a+bx_{t}+u_{t}\) with \(a=1\), \(b=2\), \(x_{t}\) drawn from a normal distribution with mean zero and variance 5, and \(u_{t}\) normally distributed with mean zero and variance \(\sigma _{u}^{2}=0.1\). The histogram of the lowest estimated weights is given in Fig. 3. In 99% of the cases, the minimum weight is above 7.97, and in 95% of the cases, the minimum weight is above 34.6. The corresponding VC estimates are given in Fig. 4. In the 1% case, the estimates of the time paths involve severe artifacts. In the 5% case, artifacts are still present, but in the majority of cases, VC estimates conform to OLS estimates. Further, VC does not reject the hypothesis of time-invariant parameters in 99% of the cases. This observation suggests that VC may be used to check the linear specification of a time-series model.
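The claim that VC estimates with very large weights are indistinguishable from OLS estimates can be checked directly with the vc_filter sketch from Sect. 5.3. The snippet below, again a hedged illustration rather than the procedure behind Figs. 3 and 4, uses the constant-coefficient design just described:

```python
import numpy as np

# Constant-coefficient data as described above: a = 1, b = 2.
rng = np.random.default_rng(seed=7)
T = 50
x = rng.normal(0.0, np.sqrt(5.0), size=T)
y = 1.0 + 2.0 * x + rng.normal(0.0, np.sqrt(0.1), size=T)

# With very large weights the penalized paths are forced to be (nearly)
# constant, and the common level reproduces the OLS fit.
a_path, b_path = vc_filter(y, x, 1e8, 1e8)
beta_ols, *_ = np.linalg.lstsq(np.column_stack([np.ones(T), x]), y, rcond=None)

print(a_path.max() - a_path.min(), b_path.max() - b_path.min())  # ~ 0: flat paths
print(a_path[0] - beta_ols[0], b_path[0] - beta_ols[1])          # ~ 0: equal to OLS
```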

Fig. 3
figure 3

Histogram of lowest weights \(\gamma _{\mathrm {min}}=\min \left\{ \gamma _{1},\gamma _{2}\right\}\) of VC estimates for a linear model with time-invariant coefficients. (\(T=50\), 1000 trials)

Fig. 4

VC estimates at the 1% quantile (a) and the 5% quantile (b) of Fig. 3. The dashed lines give the confidence band of ± two standard deviations. The red lines indicate the OLS estimates of the coefficients. The true coefficients are 1 and 2. As the OLS estimates fit within the confidence bands, VC does not reject the case of constant coefficients

With higher noise, the problem of artifacts becomes more severe; with lower noise, less so.Footnote 7 Still, the problem has to be kept in mind when interpreting VC results.

5.5 Aggregate data, Pyrrho’s lemma, and the VC philosophy

Almost all economic models deal with aggregate data. Employment comprises women and men, different age groups and various occupations in sundry industries scattered over many regions. The wage level summarizes the earnings of all these people. Similarly, production comprises a multitude of goods and services, and the price level is just an index of thousands of the attached prices. The structures of these aggregates are not rigid but change over time in response to changing technologies, shifting tastes, and volatile business conditions. To assume that time-invariant laws govern the interaction of time series of such aggregates seems preposterous to me. Some researchers tried to cope with the problem by using weighted regression, giving higher weights to more recent observations (Gilchrist 1967; Rouhiainen 1978). This seems to me an inferior alternative to VC.

The reason for developing VC was my desire to show that a Marshallian view of economics, one that involves time-varying structures, does not render quantitative economics impossible. Estimation can be done by Kalman filtering, by the VC method described in this paper, or perhaps by other methods. I advocated estimating time-varying structures with Kalman filtering in Schlicht (1977, Appendix B), but without any resonance. This puzzled me. Was it really such a bad idea?

Maybe it wasn’t, but the puzzle remains. What were the reasons for the decades-long resistance to dealing with time-varying coefficients? And why has this somewhat changed over the past fifteen years?

One reason may have been that structures changing over time cannot represent the ‘true model’ economists were chasing during the heyday of ‘dynamic stochastic general equilibrium’ macroeconomics. The existence of such a ‘true model’ was simply postulated (Lucas 1976, 24). I think that this is, in the context of aggregate models dealing with long-run time series, a red herring that distracts from considering seriously what aggregate models represent.Footnote 8

Another reason, I submit, was the reductionist bent of economists. If a structure changes over time, this warrants explanation. Hence there was a tendency to add explanatory variables as ‘controls’ in order to explain the change. While this may be sensible in certain cases, it is unnecessary and even obfuscating if the changes brought about by such outside forces are slow and independent of the relationships under study.Footnote 9 Further, the introduction of such controls seems, statistically speaking, problematic because of the following theorem provided by Theo Dijkstra (1995, 122).

Pyrrho’s Lemma: For every collection of vectors, consisting of observations on a regressand and regressors, it is possible to get any set of coefficients as well as any set of predictions with variances as small as one desires, just by adding one additional vector from a continuum of vectors.

In other words: there exists a time series \(x_{n+1}\) that, if added to the explanatory variables \(x_{1},x_{2},\ldots ,x_{n}\) in the standard linear model (1.2), delivers arbitrarily predetermined coefficients and variances as estimates. This should make us reluctant to seek to explain too much by inserting additional controls which, taken together, span an entire set of such additional time series. Further, the procedure can generate the mirage of a ‘true model’ in cases where such a model does not actually exist. Using VC reduces the need for adding further controls and therefore mitigates Pyrrho’s problem.
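The lemma admits a simple constructive demonstration (a sketch of the mechanism, not Dijkstra’s construction verbatim): given target coefficients \(\beta ^{*}\) and a scalar \(c\ne 0\), the added regressor \(z=\left( y-X\beta ^{*}\right) /c\) makes \(y=X\beta ^{*}+cz\) hold exactly, so that OLS on the augmented regressor set returns \(\left( \beta ^{*},c\right)\) with zero residuals whenever the augmented matrix has full column rank.

```python
import numpy as np

# Pyrrho's Lemma, constructively: one added regressor yields any target coefficients.
rng = np.random.default_rng(seed=3)
T, n = 50, 3
X = rng.normal(size=(T, n))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=T)   # some "true" relationship

beta_target = np.array([-7.0, 0.0, 42.0])   # arbitrary desired estimates
c = 1.0
z = (y - X @ beta_target) / c               # the one additional "control"

X_aug = np.column_stack([X, z])
beta_hat, residuals, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print(beta_hat)   # -> [-7.  0. 42.  1.], with residuals numerically zero
```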

Let me add another remark. The VC model (1.4), (1.5) can easily be generalized in many ways. One possibility would be, for instance, to replace \(a_{i,t+1}=a_{i,t}+v_{i,t}\) by \(a_{i,t+1}=\bar{a}_{i}+\theta _{i}\left( a_{i,t}-\bar{a}_{i}\right) +v_{i,t}\), which nests the random walk for \(\theta _{i}=1\). Such generalizations (and many more) can be handled by Kalman filtering. So why not allow for more general specifications?

My objection would be that such generalizations would impinge on the descriptive transparency of the VC method which is, to me, a major concern—trumping more technical statistical considerations.

An estimation method such as VC can be viewed as a filter that seeks to identify certain patterns in clouds of data. In doing so, such a filter gives preference to certain patterns over others. The patterns preferred by the VC method conform to the desiderata underlying the descriptive account (Sect. 1.4): that the coefficients remain as time-invariant as possible and that a good fit is obtained. This makes sure that all estimated variability over time is driven by the data, rather than by another preference of the model, as would be the case in autoregressive specifications.

Unfortunately, the determination of the weights used in VC is descriptively less transparent than the desiderata of stable coefficients and a good fit, but it nevertheless carries some descriptive meaning; in this regard, at least, there is room for improvement.

6 Conclusion

The VC method outlined in this paper addresses linear models with coefficients that are generated by a random walk. This is a rather special case, as it does not cover models whose coefficients follow more general stochastic processes. Yet it is a case that has received special attention in the literature, at least in economics. Focusing on this somewhat narrow class of models offers some benefits, however. Non-Gaussian disturbances can be admitted. The researcher is not required to postulate initial values that are typically unknown. The time-averages of the estimated coefficients can be linked to the coefficients of an associated linear GLS (Aitken) model with constant coefficients. Moreover, the VC method permits an easy treatment of the case that only a subset of the coefficients varies over time while the rest remain constant. In all this, it allows for a more satisfactory treatment than would be possible within more general approaches, such as Kalman filtering.

Regarding further developments, and being confined to the rather narrow perspective of an economist, I would find it appropriate to devise an appropriate way of dealing with missing observations.Footnote 10 Further, moments estimators for simultaneous equations, again with a clear descriptive interpretation, appear desirable. Yet any attempt in this direction definitely exceeds my capabilities.