VC: a method for estimating time-varying coefficients in linear models

This paper describes a moments estimator for a standard state-space model with coefficients generated by a random walk. The method calculates the conditional expectations of the coefficients, given the observations. A penalized least squares estimation is linked to the GLS (Aitken) estimates of the corresponding linear model with time-invariant parameters. The estimates are moments estimates. They do not require the disturbances to be Gaussian, but if they are, the estimates are asymptotically equivalent to maximum likelihood estimates. In contrast to Kalman filtering, no specification of an initial state or an initial covariance matrix is required. While the Kalman filter is one sided, the filter proposed here is two sided and therefore uses more of the available information for estimating intermediate states. Further, the proposed filter has a clear descriptive interpretation.

a standard linear regression. In order to do that, we have to add an error term u_t to capture discrepancies between the empirical and the theoretical regularity due to measurement errors etc. and obtain

(1.2) y_t = a_1 x_{1,t} + a_2 x_{2,t} + ⋯ + a_n x_{n,t} + u_t, t = 1, 2, …, T.

In many cases it appears improbable, however, that outside influences not captured in the theoretical model affect only the disturbance term, and not the coefficients themselves. In the case of economics, we may think of changes in technology, preferences, market structure, and the composition of aggregates. All change over time and may affect the coefficients themselves.
In economics, the problem of possibly time-varying coefficients was the subject of the famous Keynes-Tinbergen controversy around 1940. While Tinbergen (1940, 153) defended the use of regression analysis with the argument that in "many cases only small changes in structure will occur in the near future", Keynes (1973, 294) objected that "the method requires not too short a series whereas it is only in a short series, in most cases, that there is a reasonable expectation that the coefficients will be fairly constant." It appears that both arguments are correct. The VC model takes care of both by assuming that the coefficients change slowly over time: they are highly auto-correlated. This is formalized by a random walk (Athans 1974; Cooley and Prescott 1973; Schlicht 1973). If a_{i,t} denotes the state of coefficient a_i at time t, it is assumed that

(1.3) a_{i,t+1} = a_{i,t} + v_{i,t}

with the disturbance term v_{i,t} of expectation zero and with variance σ_i². The assumption of expectation zero formalizes the idea that "the coefficients will be fairly constant" in the short run, while the variance σ_i² is a measure of the stability of coefficient i and is to be estimated. For σ_i² = 0 for some i, the case of a constant (time-invariant) coefficient is covered as well. As a consequence, the standard linear model is replaced by

(1.4) y_t = a_{1,t} x_{1,t} + a_{2,t} x_{2,t} + ⋯ + a_{n,t} x_{n,t} + u_t, t = 1, 2, …, T
(1.5) a_{i,t+1} = a_{i,t} + v_{i,t}, i = 1, 2, …, n, t = 1, 2, …, T − 1.

This is the VC model that is presupposed in the following.
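The data-generating process (1.4), (1.5) can be sketched in a few lines. The following minimal simulation (the variance values, starting coefficients, and variable names are assumptions for illustration, not taken from the paper) generates random-walk coefficients and the corresponding observations:

```python
# Simulate the VC model: y_t = sum_i a_{i,t} x_{i,t} + u_t,
# with random-walk coefficients a_{i,t+1} = a_{i,t} + v_{i,t}.
import numpy as np

rng = np.random.default_rng(0)
T, n = 100, 2
sigma_u = 0.3                     # assumed std. dev. of the equation disturbance u_t
sigma_v = np.array([0.05, 0.02])  # assumed std. dev. of the coefficient disturbances

x = rng.normal(size=(T, n))       # predetermined explanatory variables x_{i,t}
x[:, 0] = 1.0                     # let the first regressor be an intercept

# coefficient paths: start at (1, 2) and accumulate the disturbances v_{i,t}
a = np.cumsum(np.vstack([np.array([1.0, 2.0]),
                         sigma_v * rng.normal(size=(T - 1, n))]), axis=0)

y = np.sum(x * a, axis=1) + sigma_u * rng.normal(size=T)
```

For σ_i = 0 the paths stay constant and the simulation collapses to the standard linear model (1.2).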

Properties of the VC method
The VC method that will be developed in this paper estimates the expected time-paths of the coefficients a_{i,t} for given observations x_{i,t} and y_t, with i = 1, 2, …, n and t = 1, 2, …, T. It can be viewed as a straightforward generalization of the method of least squares.
• While the method of ordinary least squares selects estimates that minimize the sum of squared disturbances Σ_{t=1}^T u_t² in the equation, VC selects estimates that minimize the sum of squared disturbances in the equation and a weighted sum of squared disturbances in the coefficients, i.e.

Σ_{t=1}^T u_t² + Σ_{i=1}^n θ_i Σ_{t=1}^{T−1} v_{i,t}²,

where the weights θ_1, θ_2, …, θ_n for the changes in the coefficients are determined by the inverse variance ratios, i.e. θ_i = σ²/σ_i². In other words, it balances the desiderata of a good fit and parameter stability over time.
• Estimation can proceed by focusing on some selected coefficients and keeping the remaining coefficients constant over time. This is done by keeping the corresponding variances σ_i² close to zero, rather than estimating them. (If all coefficients are frozen in this manner, the OLS result is obtained.)
• The time-averages of the regression coefficients (1/T) Σ_t a_t are GLS estimates of the corresponding regression with fixed coefficients.
• The VC method does not require initial values for the initial state and the initial variances. Rather, all states and variances are estimated in an integrated, unified procedure. This is an advantage over Kalman filtering, which is typically quite sensitive to the choice of initial values, especially when dealing with shorter time series.
• The VC method links the purely descriptive method of employing non-parametric splines through penalized least squares with an explicit statistical model with random-walk coefficients. This offers the possibility of model-based estimation.
• All estimates are moments estimates. It is not necessary to presuppose Gaussian disturbances.
• For increasing sample sizes T and under the assumption that all disturbances are normally distributed, the moments estimates approach the maximum likelihood estimates.

Notation and basic assumptions
All vectors are conceived as column vectors, and their transposes are indicated by an apostrophe. The observations at time t are x_t′ = (x_{1,t}, x_{2,t}, …, x_{n,t}) and y_t for t = 1, 2, …, T. We write

y = (y_1, y_2, …, y_T)′, u = (u_1, u_2, …, u_T)′, a = (a_1′, a_2′, …, a_T′)′, v = (v_1′, v_2′, …, v_{T−1}′)′

with a_t = (a_{1,t}, …, a_{n,t})′ and v_t = (v_{1,t}, …, v_{n,t})′. We write further X for the T × Tn matrix that carries x_t′ in row t, columns (t − 1)n + 1 to tn, and zeros elsewhere, and define

(1.6) P = p ⊗ I_n with p = [ −1 1 0 ⋯ 0 ; 0 −1 1 ⋯ 0 ; ⋯ ; 0 ⋯ 0 −1 1 ] of order (T − 1) × T,

with I_n denoting the identity matrix of order n and ⊗ indicating the Kronecker product operator. Note that p and P are of full (row) rank.
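The stacked notation can be made concrete with a small numerical sketch (shapes as defined above, variable names assumed): P = p ⊗ I_n applied to the stacked coefficient vector returns the stacked coefficient changes.

```python
import numpy as np

T, n = 5, 2
p = -np.eye(T - 1, T) + np.eye(T - 1, T, k=1)  # (T-1) x T difference operator
P = np.kron(p, np.eye(n))                      # (T-1)n x Tn, Kronecker product

rng = np.random.default_rng(1)
a_paths = rng.normal(size=(T, n))              # coefficient vectors a_t as rows
a = a_paths.reshape(-1)                        # stacked a = (a_1', ..., a_T')'

v = (P @ a).reshape(T - 1, n)
assert np.allclose(v, a_paths[1:] - a_paths[:-1])   # Pa stacks a_{t+1} - a_t
assert np.linalg.matrix_rank(P) == (T - 1) * n      # P has full row rank
```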
The model is obtained by writing Eqs. (1.4) and (1.5) in matrix form:

(1.7) y = Xa + u
(1.8) Pa = v.

Note that the explanatory variables X are taken as predetermined, rather than stochastic.
Regarding the observations X and y we assume that a perfect fit of the model to the data is not possible:

Assumption 1 There is no vector a that satisfies y = Xa and Pa = 0 simultaneously.

This assumption rules out the (trivial) case that the standard linear model (1.2) fits the empirical data perfectly, a case that cannot reasonably be expected to occur in practical applications. Further, the assumption implies that the number of observations exceeds the number of coefficients to be estimated:

(1.9) T > n.

Least squares
In a descriptive spirit, the time-paths of the coefficients can be determined by following the penalized least squares approach, where some criteria are employed that formalize descriptive desiderata. In the case at hand, the desiderata are that the model fits the data well and that the coefficients change only slowly over time: u and v ought to be as small as possible. The sum of the squared errors u′u is taken as a criterion for the goodness of fit of Eq. (1.7), and the weighted sums of the squared changes of the coefficients v_i′v_i over time give criteria for the stability of the coefficients. The combination of all these criteria gives an overall criterion that combines the desiderata of a good fit and stability of the coefficients over time. The weights θ_1, θ_2, …, θ_n give the relative importance of the stability of the coefficients over time, where weight θ_i relates to coefficient a_i. For the time being, these weights are taken as given but will later be estimated, too. Write

Θ = diag(θ_1, θ_2, …, θ_n)

and

G = I_{T−1} ⊗ Θ.

Adding the sum of squares u′u and the weighted sum of squares v′Gv gives the overall criterion

(1.12) Q = u′u + v′Gv.

This expression is to be minimized under the constraints given by the model (1.7), (1.8) with the observations X and y.

This determines the time-paths of the coefficients a that optimize this criterion. Hence we can write

(1.13) u = y − Xa
(1.14) v = Pa
(1.15) Q = (y − Xa)′(y − Xa) + a′P′GPa.

The weighted sum of squares Q is the sum of two positive semi-definite quadratic forms. Assumption 1 rules out the case that Q can be zero. Hence the matrix X′X + P′GP is positive definite and of full rank. The first-order condition for a minimizing a is

(1.16) (X′X + P′GP) a = X′y

and the second-order condition is that the Hessian 2(X′X + P′GP) is positive definite, which is the case. Solving (1.16) for a and plugging this into (1.13) and (1.14) gives the estimates

(1.17) a_LS = (X′X + P′GP)⁻¹X′y, u_LS = y − Xa_LS, v_LS = Pa_LS,

where the subscript LS stands for "least squares".
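A minimal numerical sketch of (1.17) follows (the data setup, the weight values, and the variable names are assumptions): build X, P, and G, solve the normal equation, and recover the coefficient paths.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 60, 2
theta = np.array([50.0, 500.0])   # assumed weights for the coefficient changes

xs = rng.normal(size=(T, n)); xs[:, 0] = 1.0
a_true = np.cumsum(np.vstack([[1.0, 2.0],
                              0.05 * rng.normal(size=(T - 1, n))]), axis=0)
y = np.sum(xs * a_true, axis=1) + 0.3 * rng.normal(size=T)

X = np.zeros((T, T * n))          # T x Tn, row t carries x_t' in block t
for t in range(T):
    X[t, t * n:(t + 1) * n] = xs[t]
p = -np.eye(T - 1, T) + np.eye(T - 1, T, k=1)
P = np.kron(p, np.eye(n))
G = np.kron(np.eye(T - 1), np.diag(theta))

M = X.T @ X + P.T @ G @ P                        # matrix of the normal equation (1.16)
a_ls = np.linalg.solve(M, X.T @ y).reshape(T, n) # estimated coefficient paths (1.17)
```

Letting all θ_i grow large freezes the coefficient paths, and a_LS approaches the OLS estimate, in line with the remark in Sect. 1.2.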

The varying coefficients model in stochastic mode
This section considers the statistical treatment of the VC model under the assumption that the variances of the disturbances are known. With the parametrization outlined in Sect. 2.1, the VC model gives rise to a GLS (Aitken) model that permits estimation of the time-averages of the coefficients. With these estimates, the conditional expectations for the coefficients a_{i,t} for given observations X and y can be determined (Sect. 2.2). If the weights chosen for the descriptive estimation outlined in Sect. 1.4 are equal to the inverse variance ratios, the descriptive estimation and the conditional expectation coincide (Sect. 2.3).

Orthogonal parametrization
For purposes of estimation we need a model that explains the observation y as a function of the observations X and the random variables u and v. This would permit calculating the probability distribution of the observations y contingent on the parameters of the distributions of u and v, viz. σ² and Σ. The true model does not permit such an inference, though, because the matrix P is of rank (T − 1)n rather than of rank Tn and cannot be inverted. Hence v does not determine a unique a but rather the set of solutions

(2.1) A = { P̄v + Zα ∣ α ∈ ℝⁿ }

with α ∈ ℝⁿ as a shift parameter,

(2.2) P̄ = P′(PP′)⁻¹

as the right-hand pseudo-inverse of P given in (1.6), of order Tn × (T − 1)n, and the matrix

(2.3) Z = ι_T ⊗ I_n

of order Tn × n, where ι_T ∈ ℝ^T denotes the vector of ones. It is orthogonal to P:

PZ = 0,

with the square matrix (P′, Z) of full rank. For any v we have a ∈ A ⇔ Pa = v. Hence Eq. (1.8) and the set (2.1) give equivalent descriptions of the relationship between a and v. Note that PP̄ = I_{(T−1)n}. Regarding the matrices P, P̄, and Z we have further

(2.4) Z′P̄ = 0
(2.5) P̄P = I_{Tn} − Z(Z′Z)⁻¹Z′.

In view of (2.1), any solution a to Pa = v can be written as

(2.6) a = P̄v + Zα

for some α ∈ ℝⁿ. Equation (1.7) can then be re-written as

(2.7) y = u + XP̄v + XZα.

The model (2.6), (2.7) will be referred to as the equivalent orthogonally parameterized model. It implies the true model (1.7), (1.8). It implies, in particular, that a t is a random walk even though a t depends, according to (2.6), on past and future realizations of v t .
The formal parameter α has a straightforward interpretation. Pre-multiplying (2.6) by Z′ gives Z′a = Z′Zα = Tα and therefore

(2.8) α = (1/T) Z′a = (1/T) Σ_{t=1}^T a_t.

Hence α gives the averages of the coefficients a_{i,t} over time.
Equation (2.7) permits calculating the density of y dependent upon the parameters of the distributions of u and v and the formal parameter α. In a second step, all these parameters (σ², Σ, and α) can be determined by moments estimators that will be derived in Sect. 3.1.
The orthogonal parametrization (proposed in Schlicht 1985, Sec. 4.3.3 in another context) entails some advantages with respect to symmetry and mathematical transparency, as compared to more usual procedures, such as parametrization by initial values. It permits deriving our moments estimator, which does not require normally distributed disturbances, and writing down an explicit likelihood function for the case of normally distributed disturbances that permits estimation of all relevant parameters in a unified one-shot procedure.
The formal parameter vector α relates directly to the coefficient estimates of a standard generalized least squares (GLS, Aitken) regression. Equation (2.7) can be interpreted as a standard regression for this parameter vector with the matrix x = XZ giving the explanatory variables:

(2.9) y = xα + w

and the disturbance

(2.10) w = u + XP̄v.

It has expectation zero and covariance

(2.11) E(w) = 0
(2.12) W = E(ww′) = XP̄VP̄′X′ + σ²I_T

with V = E(vv′) = I_{T−1} ⊗ Σ. The Aitken estimate α_A satisfies

(2.13) x′W⁻¹(y − xα_A) = 0

or

(2.14) α_A = (x′W⁻¹x)⁻¹x′W⁻¹y,

where the subscript A stands for "Aitken". As x = XZ and W = XP̄VP̄′X′ + σ²I_T, Eqs. (2.13) and (2.14) can be written as

(2.15) Z′X′(XP̄VP̄′X′ + σ²I_T)⁻¹(y − XZα_A) = 0

and

(2.16) α_A = (Z′X′W⁻¹XZ)⁻¹Z′X′W⁻¹y.

Further, Eq. (2.14) gives rise to the covariance of the Aitken estimate, E((α_A − α)(α_A − α)′) = (x′W⁻¹x)⁻¹.

The filter
This section derives the VC filter which gives the expectation of the coefficients a for given observations X and y, a given shift parameter α, and given variances σ² and Σ. For given α and X, the vectors y and a can be viewed as realizations of random variables determined jointly by the system (2.6), (2.9) as brought about by the disturbances u and v, with expectations E(a) = Zα and E(y) = xα. The covariance is

cov(a, y) = E((a − Zα)(y − xα)′) = P̄VP̄′X′.

The marginal distribution of y is as given by (2.9) and (2.12). On this basis, we take our estimate of a as

a_A = Zα_A + P̄VP̄′X′W⁻¹(y − xα_A),

which is the expectation of a for the case that u and v are Gaussian and y, α, σ², and Σ are given, with α replaced by its Aitken estimate α_A. (It will turn out later on that a_A is the expectation of a for non-Gaussian disturbances as well, see Eq. (2.27) below.) Note that the variance-covariance matrix of w, as given in Eq. (2.12), tends to σ²I_T if the variances σ_i² go to zero, and Eq. (2.7) approaches the standard unweighted linear regression. In this sense, the OLS regression model is covered as a special limiting case by the model discussed here.

Least squares and Aitken
The following theorem states that the least squares estimator a_LS and the Aitken estimator a_A coincide if the weights are given by the inverse variance ratios.

Claim 1 For G = σ²V⁻¹, i.e. θ_i = σ²/σ_i², the least squares estimator and the Aitken estimator coincide: a_LS = a_A.
Proof Consider first the necessary conditions for a minimum of (1.12). The first-order condition (1.16) defines a_LS with weights G = σ²V⁻¹ uniquely and can be written as

(2.17) (X′X + σ²P′V⁻¹P) a_LS = X′y.

It will be shown that (2.17) implies

(2.18) a_LS = Zα_A + P̄VP̄′X′W⁻¹(y − XZα_A) = a_A,

which will establish the proposition. Pre-multiplication of a_A by X′X + σ²P′V⁻¹P gives, because of PZ = 0 and PP̄ = I_{(T−1)n},

(X′X + σ²P′V⁻¹P) a_A = X′XZα_A + (X′XP̄VP̄′X′ + σ²P′P̄′X′) W⁻¹(y − XZα_A).

Using P′P̄′ = I_{Tn} − Z(Z′Z)⁻¹Z′ from (2.5), the term in brackets becomes X′W − σ²Z(Z′Z)⁻¹Z′X′, which results in

(X′X + σ²P′V⁻¹P) a_A = X′XZα_A + X′(y − XZα_A) − σ²Z(Z′Z)⁻¹Z′X′W⁻¹(y − XZα_A)

and reduces to

(X′X + σ²P′V⁻¹P) a_A = X′y − σ²Z(Z′Z)⁻¹ · Z′X′W⁻¹(y − XZα_A).

According to (2.15), the last term is zero and we obtain

(X′X + σ²P′V⁻¹P) a_A = X′y = (X′X + σ²P′V⁻¹P) a_LS.

This shows that the least squares estimator a_LS and the Aitken estimator a_A coincide. ◻

As a consequence of Claim 1, the least-squares estimates for u, v, and w and their Aitken counterparts coincide for G = σ²V⁻¹. We need not distinguish them and denote all our estimates by a circumflex:

(2.20) â = a_LS = a_A, û = y − Xâ, v̂ = Pâ, ŵ = û + XP̄v̂.

For the sake of completeness and later use, the following observation is added.

Claim 2 The minimized criterion satisfies Q̂ = û′û + σ²v̂′V⁻¹v̂ = σ²ŵ′W⁻¹ŵ. In other words: the sum of squared deviations weighted by the variance ratios equals σ² times the squared Mahalanobis distance of ŵ.

Proof As ŵ = XP̄v̂ + û, we have ŵ = y − XZα̂. With (2.5), (2.9), (2.12), and (2.20) this gives Q̂ = σ²ŵ′W⁻¹ŵ. Hence the weighted sum of squares Q̂ equals, up to the factor σ², the squared Mahalanobis distance. ◻

Consider now the distribution of â. The matrix X′X + σ²P′V⁻¹P, henceforth referred to as the "system matrix", will be denoted by M:

(2.24) M = X′X + σ²P′V⁻¹P.

With this, the normal equation (1.16), which defines the solution for the vector of the coefficients â, can be written as

(2.25) M â = X′y.

With (1.7) and (2.25) we obtain

(2.26) â = a + M⁻¹(X′u − σ²P′V⁻¹v).

Over realizations of the disturbances u and v, the estimator â is thus distributed around the time-path of the coefficients a with mean

(2.27) E(â − a) = 0

and covariance

E((â − a)(â − a)′) = M⁻¹(σ²X′X + σ⁴P′V⁻¹P)M⁻¹,

which reduces to σ²M⁻¹MM⁻¹ and finally to

(2.28) E((â − a)(â − a)′) = σ²M⁻¹.

The system matrix (2.24) is determined by the observations X, the variance σ², and the variances Σ. Equation (2.28) gives the precision of our estimate, which is directly related to the system matrix M.
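Claim 1 and Eq. (2.8) lend themselves to a direct numerical check. The sketch below (data setup and variable names are assumptions) computes a_LS from the penalized normal equation with G = σ²V⁻¹ and a_A from the Aitken regression, and confirms that the two coincide and that α_A equals the time-average of the estimated coefficients:

```python
import numpy as np

rng = np.random.default_rng(3)
T, n = 30, 2
sigma2 = 0.09                        # assumed equation variance σ²
Sigma = np.diag([0.01, 0.001])       # assumed coefficient variances Σ

xs = rng.normal(size=(T, n)); xs[:, 0] = 1.0
y = rng.normal(size=T)               # the identity holds for arbitrary observations

X = np.zeros((T, T * n))
for t in range(T):
    X[t, t * n:(t + 1) * n] = xs[t]
p = -np.eye(T - 1, T) + np.eye(T - 1, T, k=1)
P = np.kron(p, np.eye(n))
V = np.kron(np.eye(T - 1), Sigma)
Z = np.kron(np.ones((T, 1)), np.eye(n))   # Z = ι_T ⊗ I_n, so PZ = 0

M = X.T @ X + sigma2 * P.T @ np.linalg.inv(V) @ P
a_ls = np.linalg.solve(M, X.T @ y)        # penalized least squares

Pbar = P.T @ np.linalg.inv(P @ P.T)       # right-hand pseudo-inverse of P
W = X @ Pbar @ V @ Pbar.T @ X.T + sigma2 * np.eye(T)
x_ = X @ Z                                # regressors of the Aitken regression (2.9)
Wi = np.linalg.inv(W)
alpha = np.linalg.solve(x_.T @ Wi @ x_, x_.T @ Wi @ y)        # Aitken estimate
a_A = Z @ alpha + Pbar @ V @ Pbar.T @ X.T @ Wi @ (y - x_ @ alpha)

assert np.allclose(a_ls, a_A, atol=1e-6)               # Claim 1
assert np.allclose(Z.T @ a_ls / T, alpha, atol=1e-6)   # α is the time-average
```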

Variance estimation
This section turns to estimating the variances. In Sect. 3.1 the proposed moments estimators will be derived, and in Sect. 3.2 a maximum likelihood criterion C_L will be given that is based on the parameterized model described in Sect. 2. In Sect. 3.3 a moments criterion C_M will be given that generates, upon minimization, the moments estimators, and it will be argued in Sect. 3.4 that, for large T, both criteria approach each other. As a consequence, the theoretical appeal of the likelihood estimator for large samples carries over to the moments estimator in the Gaussian case.

Moments estimation
The moments estimator that will be developed in this section has, for any sample size, a straightforward interpretation: it is defined by the property that the variances of the disturbances in the estimated coefficients equal their expectations. It has, thus, a straightforward connotation even in shorter time series and does not presuppose that the perturbations u and v are normally distributed. It will be shown later that the moments estimators approach the respective maximum likelihood estimators in large samples if the disturbances are normally distributed.

In the following we denote the estimated coefficients by â and the estimated perturbations by û and v̂. For some variances σ² and Σ = diag(σ_1², σ_2², …, σ_n²), the estimated coefficients â along with the estimated disturbances û and v̂ are random variables brought about by realizations of the random variables u and v. Consider û = y − Xâ = X(a − â) + u first. With (2.26) we obtain

(3.1) û = (I_T − XM⁻¹X′)u + σ²XM⁻¹P′V⁻¹v.

Regarding v̂, consider the vectors v̂_i′ = (v̂_{i,1}, v̂_{i,2}, …, v̂_{i,T−1}) for i = 1, 2, …, n, that is, the disturbances in the coefficients â_i for each coefficient separately. These are obtained as follows. Denote by e_i ∈ ℝⁿ the i-th column of an n × n identity matrix and define the (T − 1) × (T − 1)n matrix E_i = I_{T−1} ⊗ e_i′, so that v̂_i = E_i v̂. With (2.26) we obtain

(3.2) v̂ = Pâ = PM⁻¹X′u + (I_{(T−1)n} − σ²PM⁻¹P′V⁻¹)v.

The moments estimators are defined by the property that the estimated moments equal their expectations, i.e. as a fix point of the system

(3.3) û′û = E(û′û)
(3.4) v̂_i′v̂_i = E(v̂_i′v̂_i), i = 1, 2, …, n.

Regarding Q̂ = û′û + σ²v̂′V⁻¹v̂ we note that tr(M⁻¹X′X) + σ² tr(M⁻¹P′V⁻¹P) = Tn and obtain from (3.1) and (3.2)

(3.5) E(Q̂) = σ²(T − n).

From (3.1), the expectation of the squared estimated disturbances in the equation is, with E(ûû′) = σ²(I_T − XM⁻¹X′),

(3.6) E(û′û) = σ²(T − tr(M⁻¹X′X)).

In a similar way, the expectation of the squared estimated disturbances in the i-th coefficient is obtained from E(v̂v̂′) = V − σ²PM⁻¹P′:

(3.7) E(v̂_i′v̂_i) = σ_i²(T − 1) − σ² tr(E_i P M⁻¹ P′ E_i′).

As both the expected moments and the estimated moments are functions of the variances, the moments estimators, denoted by σ̂² and σ̂_i², i = 1, 2, …, n, can equivalently be defined as a fix point of the system obtained by solving (3.6) and (3.7) for the variances:

(3.8) σ_i² = (v̂_i′v̂_i + σ² tr(E_i P M⁻¹ P′ E_i′)) / (T − 1), i = 1, 2, …, n
(3.9) σ² = û′û / (T − tr(M⁻¹X′X)).

The implementations by Schlicht (2005b, 2021) use the latter alternative and employ a gradient process to find the solution of this equation system.

Iteration starts with some variance ratios γ_i = σ²/σ_i². This permits determination of the right-hand sides of Eqs. (3.8) and (3.9). The variances at the left-hand side of (3.8) and the variance at the left-hand side of (3.9) are used for a new iteration, and this continues until convergence is reached, delivering the fix-point values σ̂_i² and σ̂² and the estimated weights γ̂_i = σ̂²/σ̂_i². (Another solution procedure, based on minimizing a criterion function, is available and will be discussed in Sect. 3.3 below.)
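The iteration can be sketched as follows (a simplified dense-matrix implementation; the variable names are mine, and the released packages use more refined numerics): each pass solves the filter for the current variances and then updates the variances by (3.8) and (3.9).

```python
import numpy as np

def vc_moments(xs, y, n_iter=60):
    """Moments fix-point iteration; returns (sigma2, sigma2_i, coefficient paths)."""
    T, n = xs.shape
    X = np.zeros((T, T * n))
    for t in range(T):
        X[t, t * n:(t + 1) * n] = xs[t]
    p = -np.eye(T - 1, T) + np.eye(T - 1, T, k=1)
    P = np.kron(p, np.eye(n))
    sigma2, sig2 = 1.0, np.ones(n)                # assumed starting values
    for _ in range(n_iter):
        Vinv = np.kron(np.eye(T - 1), np.diag(1.0 / sig2))
        M = X.T @ X + sigma2 * P.T @ Vinv @ P     # system matrix (2.24)
        Mi = np.linalg.inv(M)
        a = Mi @ X.T @ y                          # filter (2.25)
        u = y - X @ a
        v = (P @ a).reshape(T - 1, n)
        PMiP = P @ Mi @ P.T
        sigma2 = (u @ u) / (T - np.trace(Mi @ X.T @ X))            # update (3.9)
        for i in range(n):                                         # update (3.8)
            tr_i = np.trace(PMiP[i::n, i::n])
            sig2[i] = (v[:, i] @ v[:, i] + sigma2 * tr_i) / (T - 1)
    return sigma2, sig2, a.reshape(T, n)
```

The ratios σ̂²/σ̂_i² of the returned values deliver the estimated weights θ̂_i.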

Likelihood estimation
This section derives a maximum-likelihood estimator for the variances under the additional assumption that the disturbances u and v are normally distributed. Using Eqs. (1.8) and (2.10)-(2.14) together with the identity x = XZ, the concentrated log-likelihood function for the Aitken regression (2.9) can be written as

(3.10) L = −(1/2)(T log 2π + log det W + (y − xα_A)′W⁻¹(y − xα_A))

with

(3.11) W = XP̄VP̄′X′ + σ²I_T.

By maximizing (3.10) with respect to α, σ², and Σ, the maximum likelihood estimates for the variances are obtained, and the corresponding expectation for the parameter a is given in analogy to (2.20) as

ǎ = (X′X + σ̌²P′V̌⁻¹P)⁻¹X′y,

with a caron denoting the maximum likelihood estimates and V̌ = I_{T−1} ⊗ Σ̌.
The maximum likelihood estimator can be characterized in another way. This will be explained in the following. In order to do so, the following lemma is needed.

Claim 5 Maximization of the likelihood (3.10) with respect to the variances is equivalent to minimization of the criterion

(3.12) C_L = log det W + ŵ′W⁻¹ŵ with ŵ = y − xα_A.

Proof As the remaining terms in (3.10), such as T log 2π, are independent of the variances, we can write L = −(1/2)C_L + constant, where "constant" is independent of the variances, and maximization of L with regard to the variances is equivalent to minimization of C_L. ◻

Another representation of the moments estimators
The relationship between the likelihood estimator and the moments estimator can be elucidated with the aid of a criterion that is very similar to the likelihood criterion (3.12). This criterion function is

(3.13) C_M = log det M + log det V + (T − Tn) log σ² + Q/σ²

with Q as defined in (1.12) for G = σ²V⁻¹.

Claim 6 Minimization of the criterion function (3.13) with respect to the disturbances u and v and the variances σ² and Σ yields the moments estimators as defined in (3.3) and (3.4).
Proof Minimizing (3.13) with respect to u and v for given variances amounts to minimizing Q and yields the estimates (2.20). By the envelope theorem together with (3.2), the derivatives of the minimized criterion with respect to σ² and the σ_i² can be taken with û and v̂ held fixed. Setting these derivatives to zero gives

(3.14) û′û = σ²(T − tr(M⁻¹X′X)), v̂_i′v̂_i = σ_i²(T − 1) − σ² tr(E_i P M⁻¹ P′ E_i′), i = 1, 2, …, n,

which are, in view of (3.6) and (3.7), just the moment conditions (3.3) and (3.4). ◻

Ludsteck's (2004, 2018) Mathematica packages for VC proceed by minimizing the criterion function (3.13). This permits very clean and transparent programming. As Claim 6 is confined to moments and does not require any assumption about the normality of the disturbances, Ludsteck's estimators are moments estimators as well.

The relationship between the likelihood and the moments estimator
The likelihood estimates minimize, according to Claim 5, the criterion C L and the moments estimates minimize, according to Claim 6, the criterion C M . It is claimed in the following that, for increasing T and bounded X, both estimates tend to coincide. To show that, the following lemma is needed.

Claim 7 For sufficiently large T and bounded explanatory variables X, the following holds true approximately:

(3.20) det M ≈ (det(PMP′) det(x′x)) / (det(PP′) det(Z′Z)).

Proof Define the Tn × Tn matrix ℙ = (P′, Z)′

and consider the matrix ℙMℙ′. One way to calculate it is as follows:

ℙMℙ′ = ( PMP′  PMZ ; Z′MP′  Z′MZ )

with Z′MZ = Z′X′XZ = x′x, because PZ = 0. For increasing T and bounded x, the influence of the off-diagonal blocks on the determinant becomes negligible, and we can write

(3.21) det(ℙMℙ′) ≈ det(PMP′) det(x′x)

for large T. Another way to evaluate det(ℙMℙ′) is the following: As ℙℙ′ = diag(PP′, Z′Z),

(3.22) det(ℙMℙ′) = det(ℙℙ′) det M = det(PP′) det(Z′Z) det M

is obtained. Combining (3.21) and (3.22) gives the result. ◻

Claim 8 For increasing T and with bounded explanatory variables X, the moments criterion and the likelihood criterion coincide.
Proof For increasing T and in view of Claim 7, C_M tends to C_L. ◻ Hence the minimization of both criteria with respect to the variances will generate in the limit the same result. 4 In consequence, the descriptive appeal of the moments estimator carries over to the likelihood estimator, and the theoretical appeal of the likelihood estimator carries over to the moments estimator.

Miscellaneous notes
The following offers remarks on computation (Sect. 4.1), comments on some applications of the VC method in economics that illustrate aspects of potential interest in other fields (Sect. 4.2), and some illustration provided by simulation studies (Sect. 4.3). Section 4.4 discusses the problem of artifacts. Some methodological concerns are raised in Sect. 4.5.

Notes on computation
The VC method has been embodied in some freely available software packages (Ludsteck 2004, 2018; Schlicht 2005b, 2021). Although these have been developed under the assumption that all disturbances are Gaussian, the numerical routines, briefly sketched at the end of Sects. 3.1 and 3.3, remain appropriate for the non-Gaussian case. Schlicht and Ludsteck (2006, Sec. 11) have compared the performance of the moments estimator with that of the Kalman filter in the EViews (2005) implementation for the Gaussian case and conclude that "both estimators perform very similar - with the caveat that the Eviews estimates have been calculated by using the theoretical values as starting values. ... The distributions of the estimates for the weights are practically indistinguishable." Given that true variances would be unavailable in practical applications and that the Kalman results appear to be quite sensitive to the choice of initial values, this speaks for the VC method in the case that the coefficients follow a random walk. Further, the VC method dispenses with the necessity to specify initial values and offers additional descriptive features, as indicated by Claim 1 and Eq. (2.8).

Notes on applications
In spite of its so far insufficient documentation, VC has found quite a number of applications in various settings, mainly dealing with structural change. As any of the authors of these studies will be a better judge regarding the practical performance of the VC method than this author (who is neither an applied economist, nor an econometrician, nor a statistician), any comments in this regard from my side appear unwarranted. Yet it may be appropriate to illustrate possible uses of the VC method by means of some examples taken from my field, economics.

4 A referee rightly pointed out that, in general, the convergence of functions does not necessarily imply the convergence of their maximizers. In this case, this criticism does not seem to apply, because both C_M and C_L are smoothly differentiable functions. The minima are characterized by the gradient equations (3.17), (3.18).
In the wake of the financial crisis of 2008, it has been observed that "monetary policy rules change gradually, pointing to the importance of applying a time-varying estimation framework" (Baxa et al. 2014) and that, "by applying the time-varying coefficients method ...it was clear that the past financial crisis caused the central bank to be more expansionary in its policy than usual towards financial stress" (Madsen 2012). Further, analyses of inflation targeting (IT) in "a time-varying coefficients methodology ...show a clear picture of credibility gains from the adoption of IT" (Nogueira 2009). Another application dealt with the recent decoupling of greenhouse gas emissions and gross domestic product in the wake of global warming where it has been found that "the evidence for decoupling among the richer countries gets weaker." (Cohen et al. 2017). Regarding the relationship between unemployment and economic growth, known in economics as "Okun's Law", it has been contested that the relationship has been static over time (Jalles 2018) and that, actually, "deregulation in labor and product markets and recessions have strengthened the response of unemployment to the business cycle" (Furceri et al. 2019).
Such applications suggest to me that the VC method may offer an additional useful way for dealing with linear models with coefficients that follow a random walk, and I hope that similar applications will be found in other fields.

Some illustration
To illustrate the practical workings of VC, assume a model with an intercept term a_t and a single explanatory variable x_t with coefficient b_t:

y_t = a_t + b_t x_t + u_t.

Using the simulation tool from Ludsteck (2004, 2018), a time series for the explanatory variable was generated with x_t ∼ N(0, 100), t = 1, 2, …, 50. Further it was assumed that u_t ∼ N(0, 0.1), a_t − a_{t−1} ∼ N(0, 0.01), and b_t − b_{t−1} ∼ N(0, 0.001).
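In outline, this setting can be reproduced as follows (a sketch with an assumed random seed; the figures in this section were produced with Ludsteck's Mathematica tool). The filter is applied with the true weights θ_a = 0.1/0.01 = 10 and θ_b = 0.1/0.001 = 100:

```python
import numpy as np

rng = np.random.default_rng(7)
T = 50
x1 = rng.normal(0.0, 10.0, size=T)                # x_t ~ N(0, 100)
a = 1.0 + np.concatenate([[0.0], rng.normal(0.0, 0.1, T - 1)]).cumsum()
b = 2.0 + np.concatenate([[0.0], rng.normal(0.0, np.sqrt(0.001), T - 1)]).cumsum()
y = a + b * x1 + rng.normal(0.0, np.sqrt(0.1), size=T)

n = 2
xs = np.column_stack([np.ones(T), x1])
X = np.zeros((T, T * n))
for t in range(T):
    X[t, t * n:(t + 1) * n] = xs[t]
p = -np.eye(T - 1, T) + np.eye(T - 1, T, k=1)
P = np.kron(p, np.eye(n))
G = np.kron(np.eye(T - 1), np.diag([10.0, 100.0]))  # true weights
ahat = np.linalg.solve(X.T @ X + P.T @ G @ P, X.T @ y).reshape(T, n)
```

Column 0 of `ahat` holds the estimated intercept path, column 1 the estimated slope path.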
Typically the optimally computed expectations of the time paths (calculated by using the true variances) and the VC estimates lie very close together. Figure 1 illustrates a somewhat atypical run with estimated smoothing weights that deviate from the true smoothing weights by roughly a factor of five. The optimally estimated time-paths of the coefficients (based on the true variances) and the estimated time-paths (based on the estimated variances) move together. This illustrates the general impression that the filtering results, especially the qualitative time-patterns, are not overly sensitive to the weights used for filtering.
It is, obviously, never possible to extract the movement of the true coefficients from the data, irrespective of how long the time series is. (Only the estimation of the weights will improve with the length of the time series.) The best that can be done is to estimate the expectations of the coefficients. Given the variances, the VC estimate (which is the mean of a random vector) is optimal and cannot be improved upon, and the standard of comparison must be the estimates obtained with optimal weights, as in Fig. 1.
The distribution of the weights in the above setting is illustrated in Fig. 2. The time series for x, u, and v have been generated as described above and the VC moments estimation applied 5000 times. The histogram in Fig. 2 illustrates that the estimates cluster around their theoretical values.

Artifacts
Suppose that the data of a particular problem have been generated by the standard linear model (1.2). If this is the case, the VC model is misspecified, because a correct estimation would require that the variances σ_1², σ_2², …, σ_n² of the coefficients are zero and the weights θ_1, θ_2, …, θ_n (the inverse variance ratios) are infinite, whereas VC implicitly assumes that the weights are finite. As VC estimates with sufficiently large weights θ_i are indistinguishable from the OLS estimates, the VC estimation would nevertheless be approximately correct if the estimated weights are sufficiently large. 6 As VC estimates involve nearly twice as many parameters as OLS, there is more room for artifacts in VC. From this point of view, VC ought to be used with caution, especially if all parameters are permitted to vary over time, rather than just a selected few.

Fig. 1 Optimally calculated expectations (thin lines) and VC estimates (thick lines) for intercept (left) and slope (right), together with the realizations of the coefficients (x) and the VC confidence bands. The example has been selected to visually exhibit differences between the true expectations and the VC estimates; usually the weights are estimated better and the curves lie quite close together. As the estimated smoothing weights are considerably smaller than the true weights, the time-paths of the VC estimates are less smooth than the true expectations. (True weights are θ_a = 10 and θ_b = 100, while the estimated weights are θ̂_a = 1.60 and θ̂_b = 14.76 here. The true variances are σ_u² = 0.1, σ_a² = 0.01, and σ_b² = 0.001; the estimated variances are σ̂_u² = 0.040, σ̂_a² = 0.025, and σ̂_b² = 0.0029.)

6 The option "keep selected coefficients constant" in Schlicht (2005b, 2021) is implemented with σ_i² = 10⁻¹⁰ for those coefficients that are kept constant.
To illustrate, consider a linear model y_t = a + bx_t + u_t with a = 1, b = 2, x_t drawn from a normal distribution with mean zero and variance 5, and u_t normally distributed with mean zero and variance σ_u² = 0.1. The histogram of the lowest estimated weights is given in Fig. 3. In 99% of the cases, the minimum weight is above 7.97, and in 95% of the cases, the minimum weight is above 34.6. The corresponding VC estimates are given in Fig. 4. In the 1% case, the estimates of the time paths involve severe artifacts. In the 5% case, artifacts are still there, but in the majority of cases, VC estimates conform to OLS estimates. Further, VC does not reject the hypothesis of time-invariant parameters in 99 per cent of the cases. This observation suggests that VC may be used to check the linear specification of a time-series model.
With higher/lower noise, the problem of artifacts becomes more/less severe. 7 Still the problem has to be kept in mind when interpreting VC results.

Aggregate data, Pyrrho's lemma, and the VC philosophy
Almost all economic models deal with aggregate data. Employment comprises women and men, different age groups and various occupations in sundry industries scattered over many regions. The wage level summarizes the earnings of all these people. Similarly, production comprises a multitude of goods and services, and the price level is just an index of thousands of the attached prices. The structures of these aggregates are not rigid but change over time in response to changing technologies, shifting tastes, and volatile business conditions. To assume that time-invariant laws govern the interaction of time series of such aggregates seems preposterous to me. Some researchers tried to cope with the problem by using weighted regression, giving higher weights to more recent observations (Gilchrist 1967; Rouhiainen 1978). This seems to me to be an inferior alternative to VC. The reason for developing VC was my desire to show that a Marshallian view of economics, one that involves time-varying structures, does not render quantitative economics impossible. Estimation can be done by using Kalman filtering, or the VC method described in this paper, or perhaps other methods.

Fig. 4 The dashed lines give the confidence band for ± two standard deviations. The red lines indicate the OLS estimates of the coefficients. The true coefficients are 1 and 2. As the OLS estimates fit into the confidence bands, VC does not reject the case of constant coefficients.

7 Even in the rather ill-conditioned case of σ_u² = 1, VC does not reject the hypothesis that the coefficients may be time-invariant in 90 per cent of the cases. The interested reader may explore the programming underlying Figs. 3 and 4, as well as further cases, by consulting the Mathematica notebook given in the accompanying material. Other cases of interest may be explored by running the notebook with alternative parameter settings.
I advocated estimating time-varying structures with Kalman filtering in Schlicht (1977, Appendix B), but without any resonance. This puzzled me. Was this really such a bad idea?
Maybe it wasn't, but the puzzle remains. What were the reasons for the decadelong resistance to dealing with time-varying coefficients? And why has this somewhat changed over the past fifteen years?
One reason may have been that structures changing over time cannot represent the 'true model' economists were chasing during the heydays of 'dynamic stochastic general equilibrium' macroeconomics. The existence of such a 'true model' was simply postulated (Lucas 1976, 24). I think that this is, in the context of aggregate models dealing with long-run time series, a red herring, distracting from considering seriously what aggregate models represent. 8 Another reason, I submit, was the reductionist bent of economists. If a structure changes over time, this warrants explanation. Hence there was a tendency to add additional explanatory variables as 'controls' in order to explain the change. While this may be sensible in certain cases, it is unnecessary and even obfuscating if the changes brought about by such outside forces are slow and independent of the relationships under study. 9 Further, the introduction of such controls seems, statistically speaking, problematic because of the following theorem that has been provided by Theo Dijkstra (1995, 122).
Pyrrho's Lemma: For every collection of vectors, consisting of observations on a regressand and regressors, it is possible to get any set of coefficients as well as any set of predictions with variances as small as one desires, just by adding one additional vector from a continuum of vectors.
In other words: there exists a time series x_{n+1} that, if added to the explanatory variables x_1, x_2, …, x_n in the standard linear model (1.2), will deliver arbitrarily predetermined coefficients and variances as estimates. This should make us reluctant to seek to explain too much by inserting additional controls which, taken together, span an entire set of such additional time series. Further, the procedure can generate the mirage of a 'true model' in cases when such a model actually does not exist. Using VC reduces the necessity for adding further controls and mitigates, therefore, Pyrrho's problem.
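A toy computation illustrates the flavor of the lemma (my construction for a special case, not Dijkstra's general argument): adding the single regressor z = y − Xb* to the regressor set makes OLS report any target coefficients b* with a perfect fit, and hence with estimated variances as small as one desires.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 40
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=T)   # "honest" data

b_target = np.array([7.0, -3.0, 5.0])   # any coefficients we would like to "find"
z = y - X @ b_target                    # the one additional regressor
Xz = np.column_stack([X, z])

coef = np.linalg.lstsq(Xz, y, rcond=None)[0]
# OLS now reports b_target on the original regressors (plus 1 on z) and
# fits perfectly, so the usual variance estimates collapse to zero.
```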
Let me add another remark. The VC model (1.4), (1.5) can easily be generalized in many ways. A possibility would be, for instance, to replace a_{i,t+1} = a_{i,t} + v_{i,t} by a mean-reverting process such as a_{i,t+1} = ā_i + φ_i (a_{i,t} − ā_i) + v_{i,t}. Such generalizations (and many more) can be handled by Kalman filtering. So why not allow for more general specifications?
My objection would be that such generalizations would impinge on the descriptive transparency of the VC method which is, to me, a major concern, trumping more technical statistical considerations.

… the VC method. Without the encouragement that this conveyed I would not have been motivated to write this documentation.
Funding Open Access funding enabled and organized by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.