The New Palgrave Dictionary of Economics

2018 Edition
Editors: Macmillan Publishers Ltd

Vector Autoregressions

Reference work entry


Vector autoregressions are a class of dynamic multivariate models introduced by Sims (1980) to macroeconomics. These models have been primarily used to bring empirical regularities out of the time series data, to provide forecasting and policy analysis, and to serve as a benchmark for model comparison. Economic applications often impose more restrictions on vector autoregressions than originally thought necessary. Recent econometric developments have made it feasible to handle vector autoregressions with a wide class of restrictions and have narrowed the gap between these models and dynamic stochastic general equilibrium models.


Bayesian econometrics; Bayesian priors; Cowles Commission; Dynamic multivariate models; Dynamic stochastic general equilibrium models; Gibbs samplers; Identification; Impulse responses; Likelihood function; Marginal data density; Markov chain Monte Carlo method; Markov-switching vector autoregressions; Probability density function; Recursive identification; Structural shocks; Structural vector autoregressions; Vector autoregressions

Vector autoregressions (VARs) are a class of dynamic multivariate models introduced by Sims (1980) to macroeconomics. These models arise mainly as a response to the ‘incredible’ identifying assumptions embedded in traditional large-scale econometric models of the Cowles Commission. The traditional approach uses predetermined or exogenous variables, coupled with many strong exclusion restrictions, to identify each structural equation. VARs, by contrast, explicitly recognize that all economic variables are interdependent and thus should be treated endogenously. The philosophy of VAR modelling begins with a multivariate time series model that has minimal restrictions, and gradually introduces identifying information, with emphasis always placed on the model’s fit to data.

While the traditional econometric approach allows disturbances or shocks to structural equations to be correlated, the VAR methodology insists that structural shocks ought to be independent of one another. The independence assumption plays an essential role in achieving unambiguous economic interpretations of structural shocks such as technology and policy shocks; it can be tested using recently developed econometric tools (Leeper and Zha 2003). The bulk of VAR work has focused on identifying structural shocks as a way to specify the contemporaneous relationships among economic variables. With most dynamic relationships unrestricted, the intent of such an identifying strategy is to construct models that have both economic interpretability and superior fit to data. Dynamic responses to a particular shock, called impulse responses, are often used as economic interpretations of the model. They summarize the properties of all systematic components of the system and have become a major tool in modern economic analysis.

Modelling policy shocks explicitly is important in addressing the practical importance of the Lucas critique. If policy switches regime, such a change may be viewed as a sequence of random shocks from the public’s viewpoint (Sims 1982). If this sequence displays a persistent pattern, the public will adjust its expectations formation accordingly and the Lucas critique may be consequential. For the practice of monetary policy, however, it is an empirical question how significant this adjustment is. Leeper and Zha (2003) construct an econometric measure from the sequence of policy shocks implied by regime switches to gauge whether the public’s behaviour could be well approximated by a linear model. This measure is particularly useful if counterfactual exercises regarding the effects of policy changes are conducted with respect to the Lucas critique.

VARs have also been used for other tasks. Armed with a Bayesian prior, VARs have been known to produce out-of-sample forecasts of economic variables as well as, or even better than, those from commercial forecasting firms (Litterman 1986; Geweke and Whiteman 2006). Because of their ability to forecast, VARs have given researchers a convenient diagnostic tool to assess the feasibility or plausibility of real-time policy projections of other economic models (Sims 1982). VARs have been increasingly used for policy analysis and as a benchmark for comparing different dynamic stochastic general equilibrium (DSGE) models. Restrictions on lagged coefficients have been gradually introduced to give more economic interpretations to individual equations. All these developments are positive and help narrow the gap between statistical and economic models.

This article discusses these and other aspects of VARs, summarizes some key theoretical results for the reader to consult without searching for different sources, and provides a perspective on where future research in this area will be headed.

General Framework

Structural Form

VARs are generally represented in a structural form of which the reduced form is simply a byproduct. The general form is
$$ {\mathbf{y}}_t^{\prime}\mathbf{A}=\sum \limits_{l=1}^p{\mathbf{y}}_{t-l}^{\prime }{\mathbf{A}}_l+{\mathbf{z}}_t^{\prime}\mathbf{D}+{\varepsilon}_t^{\prime }, \tag{1} $$
where \( {\mathbf{y}}_t \) is an n × 1 column vector of endogenous variables, A and \( {\mathbf{A}}_l \) are n × n parameter matrices, \( {\mathbf{z}}_t \) is an h × 1 column vector of exogenous variables, D is an h × n parameter matrix, p is the lag length, and \( {\varepsilon}_t \) is an n × 1 column vector of structural shocks. The parameters of individual equations in (1) correspond to the columns of A, \( {\mathbf{A}}_l \), and D. The structural shocks are assumed to be i.i.d. and independent of one another:
$$ E\left({\varepsilon}_t\mid {\mathbf{y}}_{t-s},s>0\right)=\underset{n\times 1}{\mathbf{0}},\kern0.5em E\left({\varepsilon}_t{\varepsilon}_t^{\prime}\mid {\mathbf{y}}_{t-s},s>0\right)=\underset{n\times n}{\mathbf{I}}, $$
where \( {\mathbf{0}}_{n\times 1} \) is the n × 1 vector of zeros and \( {\mathbf{I}}_{n\times n} \) is the n × n identity matrix. It follows that the reduced form of (1) is
$$ {\mathbf{y}}_t^{\prime }=\sum \limits_{l=1}^p{\mathbf{y}}_{t-l}^{\prime }{\mathbf{B}}_l+{\mathbf{z}}_t^{\prime}\mathbf{C}+{u}_t^{\prime }, \tag{2} $$
where \( {\mathbf{B}}_l={\mathbf{A}}_l{\mathbf{A}}^{-1} \), \( \mathbf{C}=\mathbf{D}{\mathbf{A}}^{-1} \), and \( {u}_t^{\prime }={\varepsilon}_t^{\prime }{\mathbf{A}}^{-1} \). The covariance matrix of \( {u}_t \) is \( \Sigma ={\left(\mathbf{A}{\mathbf{A}}^{\prime}\right)}^{-1} \).
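The mapping from the structural form to the reduced form can be illustrated numerically. The following is a minimal sketch in Python with NumPy; the function name `reduced_form` and the parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def reduced_form(A, A_lags, D):
    """Map structural parameters (A, A_1..A_p, D) to reduced-form (B_1..B_p, C, Sigma)."""
    A_inv = np.linalg.inv(A)
    B_lags = [A_l @ A_inv for A_l in A_lags]   # B_l = A_l A^{-1}
    C = D @ A_inv                              # C = D A^{-1}
    Sigma = np.linalg.inv(A @ A.T)             # Sigma = (A A')^{-1}
    return B_lags, C, Sigma

# Hypothetical system: n = 2 variables, p = 1 lag, constant only (h = 1).
A = np.array([[2.0, 0.5],
              [0.0, 1.0]])
A_lags = [np.array([[1.0, 0.1],
                    [0.2, 0.5]])]
D = np.array([[0.3, -0.1]])

B_lags, C, Sigma = reduced_form(A, A_lags, D)
```

Because \( {u}_t^{\prime }={\varepsilon}_t^{\prime }{\mathbf{A}}^{-1} \) and the shocks have identity covariance, `Sigma` coincides with the product of the transposed inverse of A and the inverse of A.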

In contrast to the traditional econometric approach, the VAR approach puts emphasis almost exclusively on the dynamic properties of endogenous variables \( {\mathbf{y}}_t \) rather than exogenous variables \( {\mathbf{z}}_t \). In most VAR applications, \( {\mathbf{z}}_t \) simply contains the constant terms.


One main objective in the VAR literature is to obtain economically meaningful impulse responses to structural shocks \( {\varepsilon}_t \). To achieve this objective, it is necessary to impose at least n(n − 1)/2 identifying restrictions, often on the contemporaneous coefficients represented by A in the structural system (1). In his original work, Sims (1980) makes the contemporaneous coefficient matrix A triangular for identification. The triangular system, often called the recursive identification, has a ‘Wold chain causal’ interpretation which is based on the timing of how shocks affect variables contemporaneously. It assumes that some shocks may influence only a subset of variables within the current period. This identification is still popular because it is straightforward to use and can yield some results that match widely held views. Christiano et al. (1999) discuss extensively how recursive identification can be used in policy analysis.
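As an illustration of recursive identification, a triangular A can be recovered from a reduced-form covariance matrix by a Cholesky factorization. The sketch below assumes a lower-triangular A with positive diagonal; which triangular orientation and variable ordering is appropriate is a modelling choice not settled here, and the covariance matrix is hypothetical.

```python
import numpy as np

def recursive_A(Sigma):
    """Return a lower-triangular A with positive diagonal such that (A A')^{-1} = Sigma."""
    # The Cholesky factor L of Sigma^{-1} satisfies L L' = Sigma^{-1}, so A = L works.
    return np.linalg.cholesky(np.linalg.inv(Sigma))

# Hypothetical reduced-form covariance matrix for n = 3 variables.
Sigma = np.array([[1.0, 0.3, 0.1],
                  [0.3, 2.0, 0.4],
                  [0.1, 0.4, 1.5]])
A = recursive_A(Sigma)
```

The n(n − 1)/2 = 3 zeros above the diagonal of `A` are exactly the identifying restrictions of the recursive system.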

There are fundamental economic applications that require identification under alternative assumptions rather than the recursive system. One familiar example is the determination of price and quantity as discussed in Sims (1986) and Gordon and Leeper (1994). Both variables are often determined simultaneously by the supply and demand equations in equilibrium; this simultaneity is inconsistent with recursive identification. Bernanke (1986) and Blanchard and Watson (1986) pioneered other applications of non-recursive identified VARs. Estimation of non-recursive VARs presents technical difficulties that are absent in recursive systems. These difficulties help explain the use of recursive VARs even if this maintained assumption is implausible. Recent developments in Bayesian econometrics, however, have made it feasible to estimate non-recursive VARs.

All of these works focus on the contemporaneous coefficient matrix. There are other ways to achieve identification. Blanchard and Quah (1993) and Gali (1992) propose using identifying restrictions directly on short-run and long-run impulse responses, which have been used in quantifying the effects of technology shocks and various nominal shocks, although the unreliable statistical properties of long-run restrictions are documented by Faust and Leeper (1997).

Many VAR applications rely on exact identification: the number of identifying restrictions equals n(n − 1)/2. This counting condition is necessary but not sufficient for identification. To see this point, consider a three-variable VAR with the following restrictions
$$ \mathbf{A}=\left[\begin{array}{ccc} \ast & \ast & 0 \\ 0 & \ast & \ast \\ \ast & 0 & \ast \end{array}\right] $$
where the asterisks indicate unrestricted coefficients and the zeros indicate exclusion restrictions. This VAR is not identified because in general there exist two distinct sets of structural parameters that deliver the same dynamics of \( {\mathbf{y}}_t \). For larger and more complicated systems with both short-run and long-run restrictions, there has been, until recently, no practical guidance as to whether the model is identified. The paper by Rubio-Ramirez et al. (2005) develops a theorem for a necessary and sufficient condition for a VAR to be exactly identified. This theorem applies to a wide range of identified VARs, including those used in the literature. The basic idea is to transform the original structural parameters to the (np + h) × n matrix F (which is a function of A, \( {\mathbf{A}}_1,\dots, {\mathbf{A}}_p \), D) so that linear restrictions can be applied to each column of F. The linear restrictions for the ith column of F can be summarized by the matrix \( {\mathbf{Q}}_i \) of rank \( {q}_i \), where \( {q}_i \) is the number of restrictions. According to their theorem, the VAR model is exactly identified if and only if \( {q}_i=n-i \) for 1 ≤ i ≤ n. This result gives the researcher a practical way to determine whether a VAR model is identified.
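The difference between the counting condition and the rank-count condition can be checked mechanically. The sketch below is a simplification under two stated assumptions: only exclusion (zero) restrictions on A are considered, so the number of restrictions on a column equals its number of zeros, and the columns may be reordered from most to least restricted before the comparison. The function name is hypothetical.

```python
import numpy as np

def satisfies_rank_count(pattern):
    """pattern[i, j] is True where A[i, j] is unrestricted and False where it is zero.

    Returns True when, after ordering the columns from most to least restricted,
    the number of restrictions on the i-th column equals n - i for i = 1..n.
    """
    pattern = np.asarray(pattern, dtype=bool)
    n = pattern.shape[1]
    q = np.sort((~pattern).sum(axis=0))[::-1]   # restrictions per column, descending
    return bool(np.array_equal(q, n - np.arange(1, n + 1)))

# Recursive (triangular) pattern and the three-variable pattern from the text.
triangular = np.tril(np.ones((3, 3), dtype=bool))
cyclic = np.array([[1, 1, 0],
                   [0, 1, 1],
                   [1, 0, 1]], dtype=bool)

print(satisfies_rank_count(triangular))  # True
print(satisfies_rank_count(cyclic))      # False
```

Both patterns contain n(n − 1)/2 = 3 zeros, so both pass the counting condition, but only the triangular pattern has restriction counts (2, 1, 0).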

When the number of identifying restrictions is greater than n(n − 1)/2, a VAR is over-identified. Allowing for over-identification is important since economic theory often implies more than n(n − 1)/2 restrictions. Moreover, many economic applications call for restrictions on the model’s parameters beyond the contemporaneous coefficients (Cushman and Zha 1997). Restrictions on the lag structure, such as block recursions, offer an effective way to handle over-parameterization when the lag length is long (Zha 1999). Classical or Bayesian econometric procedures can be used to test over-identifying restrictions. A review of theoretical results for Bayesian estimation and inference for both exactly identified and over-identified VARs is discussed below.

Impulse Responses

Impulse responses are most commonly used in the VAR literature and are defined as \( \partial {\mathbf{y}}_{t+s}/\partial {\varepsilon}_t^{\prime } \) for s ≥ 0. Let \( {\Phi}_s \) be the n × n impulse response matrix at step s and the ith row of \( {\Phi}_s \) be the responses of the n endogenous variables to the ith one-standard-deviation structural shock. One can show that the impulse responses can be recursively updated as
$$ {\Phi}_s={\Phi}_{s-1}{\mathbf{B}}_1+\dots +{\Phi}_{s-p}{\mathbf{B}}_p \tag{3} $$
with the convention that \( {\Phi}_0={\mathbf{A}}^{-1} \) and \( {\Phi}_{\upsilon }={\mathbf{0}}_{n\times n} \) for υ < 0.
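The recursion can be sketched in a few lines of Python. The function name and the numerical values are hypothetical; the row convention (row i holds the responses to shock i) follows the definition above.

```python
import numpy as np

def impulse_responses(A, B_lags, horizon):
    """Return [Phi_0, ..., Phi_horizon]; row i of Phi_s is the response to shock i."""
    n = A.shape[0]
    p = len(B_lags)
    Phi = [np.linalg.inv(A)]                    # Phi_0 = A^{-1}
    for s in range(1, horizon + 1):
        Phi_s = np.zeros((n, n))
        for l in range(1, p + 1):
            if s - l >= 0:                      # Phi_v = 0 for v < 0
                Phi_s += Phi[s - l] @ B_lags[l - 1]
        Phi.append(Phi_s)
    return Phi

# Hypothetical two-variable VAR with one lag.
A = np.array([[2.0, 0.5],
              [0.0, 1.0]])
B1 = np.array([[0.5, 0.1],
               [0.0, 0.4]])
Phi = impulse_responses(A, [B1], horizon=4)
```

With a single lag the recursion collapses to the inverse of A multiplied by the sth power of the lag matrix, which gives a simple check on the output.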

The concept of impulse response is economically appealing and is used in strands of literature other than VAR work. For example, impulse responses to technology shocks or monetary policy shocks in a DSGE model have often been compared to those in a VAR model. In empirical monetary economics, impulse responses of various macroeconomic variables to policy shocks have been a focal point in the recent debate on the effectiveness of monetary policy. These shocks can be thought of as shifts (deviations) from the systematic part of monetary policy that are hard to predict from the viewpoint of the public.

It is sometimes argued that identified VARs are unreliable because certain conclusions are sensitive to the specific identifying assumptions. This argument is a sophism. All economic models, DSGE models and VARs alike, are founded on ‘controversial’ assumptions, and the results can be sensitive to these assumptions. What researchers should do is select a class of models based on how well they fit the data, analyse how reasonable the underlying assumptions are, and examine whether there are robust conclusions across models.

Christiano et al. (1999) and Rubio-Ramirez et al. (2005) show some important robust results across different VAR models that have reasonable assumptions and fit the data equally well. One prominent example is the robust conclusion that a large fraction of the variation in policy instruments, such as the short-term interest rate, can be attributed to the systematic response of policy to shocks originating from the private economy. Such a conclusion is expected of good monetary policy, but it also explains the subtle and difficult task of identifying monetary policy shocks separately from the other shocks affecting the economy.

Estimation and Inference

Bayesian Prior

When one estimates a VAR model for macroeconomic time series data, there is a trade-off between using short and long lags. A VAR with a short lag length is prone to misspecification, and a VAR with a long lag length is likely to suffer from the over-fitting problem. The Bayesian prior proposed by Sims and Zha (1998) is designed to eliminate the over-fitting problem without reducing the dimension of the model. It applies not only to reduced-form but also to identified VARs.

To describe this prior simply, let \( {\mathbf{z}}_t \) contain only a constant term, so that D is a 1 × n vector of parameters. Rewrite the structural system (1) in the compact form \( {\mathbf{y}}_t^{\prime}\mathbf{A}={\mathbf{x}}_t^{\prime}\mathbf{F}+{\varepsilon}_t^{\prime } \), where
$$ \underset{1\times k}{{\mathbf{x}}_t^{\prime }}=\left[{\mathbf{y}}_{t-1}^{\prime}\kern0.5em \dots \kern0.5em {\mathbf{y}}_{t-p}^{\prime}\kern0.5em {\mathbf{z}}_t^{\prime}\right],\kern0.5em \underset{n\times k}{{\mathbf{F}}^{\prime }}=\left[{\mathbf{A}}_1^{\prime}\kern0.5em \dots \kern0.5em {\mathbf{A}}_p^{\prime}\kern0.5em {\mathbf{D}}^{\prime}\right], $$
and k = np + h. For 1 ≤ j ≤ n, let \( {\mathbf{a}}_j \) be the jth column of A and \( {\mathbf{f}}_j \) be the jth column of F. The first component of the prior is that \( {\mathbf{a}}_j \) and \( {\mathbf{f}}_j \) have Gaussian distributions
$$ {\mathbf{a}}_j\sim N\left(\mathbf{0},\mathbf{S}\right)\kern0.5em \mathrm{and}\kern0.5em {\mathbf{f}}_j\mid {\mathbf{a}}_j\sim N\left(\mathbf{P}{\mathbf{a}}_j,\mathbf{H}\right), \tag{4} $$
where \( \underset{n\times k}{{\mathbf{P}}^{\prime }}=\left[{\mathbf{I}}_{n\times n}\kern0.5em {\mathbf{0}}_{n\times n}\kern0.5em \dots \kern0.5em {\mathbf{0}}_{n\times n}\kern0.5em {\mathbf{0}}_{n\times 1}\right] \), which is consistent with the reduced-form random walk prior of Litterman (1986). The covariance matrices S and H are assumed to be diagonal matrices and are treated as hyperparameters. In principle, one could estimate these hyperparameters or integrate them out in a hierarchical framework. In practice, the values of these hyperparameters are specified before estimation. The ith diagonal element of S is \( {\lambda}_0/{\sigma}_i \). The diagonal element of H that corresponds to the coefficient on lag l of variable i in equation j is \( \left({\lambda}_0{\lambda}_1{\lambda}_2^{\delta \left(i,j\right)}\right)/\left({\sigma}_i{l}^{\lambda_3}\right) \), where δ(i, j) equals 1 if i = j and 0 otherwise. The diagonal element of H corresponding to the constant term is the square of \( {\lambda}_0{\lambda}_4 \). The hyperparameter \( {\lambda}_0 \) controls the overall tightness of belief about the random walk feature, as well as tightness on the prior of A itself; \( {\lambda}_1 \) further controls the tightness of belief on the random walk and the relative tightness on the prior of lagged coefficients; \( {\lambda}_2 \) controls the influence of variable i in equation j; \( {\lambda}_3 \) controls the rate at which the influence of a lag decreases as its length increases; and \( {\lambda}_4 \) controls the relative tightness on the zero value of the constant term. The hyperparameters \( {\sigma}_i \) are scale factors that make the units uniform across variables, and are set to the sample standard deviations of residuals from univariate autoregressive models fitted to the individual time series in the sample (Litterman 1986).
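To make the bookkeeping of the hyperparameters concrete, the sketch below evaluates the diagonal entries of S and H for one equation exactly as the expressions are written above. The function name, the ordering of the entries (lags first, constant last) and the numerical values are assumptions made for illustration; whether a given entry should be read as a variance or as a standard deviation should be checked against Sims and Zha (1998).

```python
import numpy as np

def prior_diagonals(j, sigma, p, lam0, lam1, lam2, lam3, lam4):
    """Diagonals of S and H for equation j (0-based), following the text's expressions.

    The entries of H are ordered as x_t = [y_{t-1}, ..., y_{t-p}, constant].
    """
    sigma = np.asarray(sigma, dtype=float)
    n = sigma.size
    S_diag = lam0 / sigma
    H_diag = np.empty(n * p + 1)
    for l in range(1, p + 1):
        for i in range(n):
            own = 1.0 if i == j else 0.0        # delta(i, j)
            H_diag[(l - 1) * n + i] = lam0 * lam1 * lam2 ** own / (sigma[i] * l ** lam3)
    H_diag[-1] = (lam0 * lam4) ** 2             # constant term
    return S_diag, H_diag

# Hypothetical scale factors and hyperparameter values for n = 2, p = 2.
S_diag, H_diag = prior_diagonals(j=0, sigma=[0.5, 2.0], p=2,
                                 lam0=1.0, lam1=0.2, lam2=0.5, lam3=1.0, lam4=1.0)
```

The entries for the second lag are smaller than those for the first by the factor governed by the lag-decay hyperparameter, which is how the prior damps distant lags.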

A VAR with many variables and a long lag is likely to produce relatively large coefficient estimates on distant lags and thus volatile sampling errors. The prior described here is designed to reduce the influence of distant lags and the unreasonable degree of explosiveness embedded in the system. It is essential for ensuring reasonable small-sample properties of the model, especially when there are relatively few degrees of freedom in a large VAR.

The aforementioned prior, however, does not take into account the features of unit roots and cointegration relationships embedded in many time series. For this reason, Sims and Zha (1998) add another component to their prior. This component uses Litterman’s idea of dummy observations to express beliefs on unit roots and cointegration. Specifically, there are n + 1 dummy observations added to the original system, which can be written as
$$ {\mathbf{Y}}_d\mathbf{A}={\mathbf{X}}_d\mathbf{F}+\mathbf{E}, \tag{5} $$
where E is a matrix of random shocks,
$$ \underset{\left(n+1\right)\times n}{{\mathbf{Y}}_d}=\left[\begin{array}{ccc} {\mu}_5{\overline{y}}_1^0 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & {\mu}_5{\overline{y}}_n^0 \\ {\mu}_6{\overline{y}}_1^0 & \dots & {\mu}_6{\overline{y}}_n^0 \end{array}\right],\kern0.5em \underset{\left(n+1\right)\times 1}{{\mathbf{c}}_d}=\left[\begin{array}{c} 0 \\ \vdots \\ 0 \\ {\mu}_6 \end{array}\right], $$
$$ \underset{\left(n+1\right)\times \left( np+1\right)}{{\mathbf{X}}_d}=\left[{\mathbf{Y}}_d\kern0.5em \dots \kern0.5em {\mathbf{Y}}_d\kern0.5em {\mathbf{c}}_d\right], $$
and \( {\overline{y}}_i^0 \) is the sample average of the p initial conditions for the ith variable of \( {\mathbf{y}}_t \), and \( {\mu}_5 \) and \( {\mu}_6 \) are hyperparameters. The first n + 1 dummy-observation equations in (5) express beliefs that all variables are stationary with means equal to the \( {\overline{y}}_i^0 \)’s or cointegration is present. The larger the values of \( {\mu}_5 \) and \( {\mu}_6 \), the stronger these beliefs.
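The dummy observations are straightforward to assemble. The following minimal sketch builds the two matrices from the p initial observations; the function name and the data are hypothetical.

```python
import numpy as np

def dummy_observations(y_init, p, mu5, mu6):
    """Build (Y_d, X_d) from the p initial observations y_init (shape p x n)."""
    y_bar = np.asarray(y_init, dtype=float).mean(axis=0)    # sample averages of initial conditions
    n = y_bar.size
    Y_d = np.vstack([mu5 * np.diag(y_bar), mu6 * y_bar])    # (n+1) x n
    c_d = np.zeros((n + 1, 1))
    c_d[-1, 0] = mu6                                        # constant enters only the last row
    X_d = np.hstack([Y_d] * p + [c_d])                      # (n+1) x (np+1)
    return Y_d, X_d

# Hypothetical initial conditions with p = 2 lags and n = 2 variables.
y_init = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
Y_d, X_d = dummy_observations(y_init, p=2, mu5=1.0, mu6=1.0)
```

Each lag block of the regressor matrix repeats the left-hand-side dummy matrix, so multiplying the regressor matrix by a matrix that stacks an identity block on top of zeros returns the left-hand-side dummy matrix.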

Since the values of the λ’s and μ’s move in opposite directions to tighten or loosen the prior, the two symbols λ and μ are kept distinct. In applied work, the values of the hyperparameters for quarterly data are typically set to \( {\lambda}_0=1 \), \( {\lambda}_1=0.2 \), and \( {\lambda}_2={\lambda}_3={\lambda}_4={\mu}_5={\mu}_6=1.0 \). For monthly data, \( {\lambda}_0=0.6 \), \( {\lambda}_1=0.1 \), \( {\lambda}_2=1.0 \), \( {\lambda}_4=0.1 \), and \( {\mu}_5={\mu}_6=5.0 \), while the choice of the lag decay weight \( {\lambda}_3 \) is somewhat complicated and is elaborated in Robertson and Tallman (1999).

By taking into account the cointegration relationships among macroeconomic variables, this additional component of the prior helps improve out-of-sample forecasting, reduces the difference in forecasting accuracy between using the vintage and final data, and produces robust impulse responses to monetary policy shocks across VARs with different identification assumptions (Robertson and Tallman 1999, 2001). Furthermore, Leeper et al. (1996) demonstrate that with this prior it is feasible to estimate VAR models with as many as 18 variables, far more than the current DSGE models can handle. Because the prior proposed by Sims and Zha (1998) reflects widely held beliefs about the behaviour of macroeconomic time series, it has often been used as a baseline prior in the Bayesian estimation and inference of VAR models.

Marginal Data Density

If a model is used as a candidate for the ‘true’ data-generating mechanism, it is imperative that the model’s fit to the data is superior to those of alternative models. Recent developments in Bayesian econometrics have made it feasible to compare nested and non-nested models for their fits to the data (Geweke 1999). With a proper Bayesian prior, one can numerically compute the marginal data density (MDD) defined as
$$ {\int}_{\Theta}L\left({\mathbf{Y}}_T\mid \varphi \right)p\left(\varphi \right) d\varphi, \tag{6} $$
where φ is a collection of all the model’s parameters, Θ is the domain of φ, \( {\mathbf{Y}}_T \) is all the data up to T, and \( L\left({\mathbf{Y}}_T\mid \varphi \right) \) is the proper likelihood function. To determine the goodness of fit of a DSGE model, for example, one can compare its MDD with that of a VAR model (Smets and Wouters 2003; Del Negro and Schorfheide 2004).

As a VAR is often used as a benchmark for comparing different models, it is important that one compute its MDD efficiently and accurately. For an unrestricted reduced-form VAR as specified in (2), there is a standard closed-form expression for (6) so that no Markov chain Monte Carlo (MCMC) method is needed to obtain the MDD. For restricted (tightly parameterized) VARs implied by a growing number of economic applications, there is in general no closed-form solution to (6), and a numerical approximation to (6) is needed. Because of a high dimension in the VAR parameter space and possible simultaneity in an identified model, popular MCMC approaches such as importance sampling and modified harmonic mean methods require a long sequence of posterior draws to achieve numerical reliability in approximating (6), and thus are computationally very demanding.

Chib (1995) offers a procedure for accurate evaluations of the MDD that requires the existence of a Gibbs sampler by partitioning φ into a few blocks. One can sample alternately from the conditional posterior distribution of one block of parameters given other blocks. While sampling between blocks entails additional simulations, the Chib algorithm can be far more efficient than other methods because each conditional posterior probability density function (PDF) can be evaluated in closed form. The objects needed to complete this algorithm are the closed-form prior PDF and the conditional posterior PDF for each block.
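The identity underlying the Chib approach, namely that the log MDD equals the log likelihood plus the log prior minus the log posterior density, all evaluated at any fixed parameter value, can be verified on a toy example. The sketch below is not a VAR: it assumes a scalar model with i.i.d. normal data of unit variance and a standard normal prior on the mean, chosen because the posterior density and the MDD are both available in closed form. In a VAR the posterior ordinate would instead be assembled block by block from Gibbs sampler output.

```python
import numpy as np

def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x (elementwise)."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def log_mdd_chib(y, theta_star):
    """log m(Y) = log L(Y | theta*) + log p(theta*) - log p(theta* | Y)."""
    T = y.size
    log_lik = log_normal_pdf(y, theta_star, 1.0).sum()
    log_prior = log_normal_pdf(theta_star, 0.0, 1.0)
    # Conjugate posterior: theta | Y ~ N(sum(y) / (T + 1), 1 / (T + 1)).
    log_post = log_normal_pdf(theta_star, y.sum() / (T + 1), 1.0 / (T + 1))
    return log_lik + log_prior - log_post

def log_mdd_exact(y):
    """Closed form: marginally Y ~ N(0, I + 11')."""
    T = y.size
    quad = (y ** 2).sum() - y.sum() ** 2 / (T + 1)
    return -0.5 * (T * np.log(2 * np.pi) + np.log(T + 1) + quad)

y = np.array([0.3, -1.2, 0.8, 1.5, 0.1])   # hypothetical data
print(log_mdd_chib(y, 0.2), log_mdd_exact(y))
```

The identity holds at every evaluation point; in simulation-based applications a high-density point is preferred because the posterior ordinate is estimated more accurately there.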

Because the prior discussed so far includes the dummy observations component, there is a question as to whether this overall prior has a standard PDF. To answer this question, it can be shown from (4) and (5) that the overall prior PDF is
$$ {\mathbf{a}}_j\sim N\left(\mathbf{0},\overline{\mathbf{S}}\right)\kern0.5em \mathrm{and}\kern0.5em {\mathbf{f}}_j\mid {\mathbf{a}}_j\sim N\left(\mathbf{P}{\mathbf{a}}_j,\overline{\mathbf{H}}\right), \tag{7} $$
where \( \overline{\mathbf{S}}=\mathbf{S} \) and \( \overline{\mathbf{H}}={\left({\mathbf{X}}_d^{\prime }{\mathbf{X}}_d+{\mathbf{H}}^{-1}\right)}^{-1} \). The result (7) follows from the two claims:
$$ {\left({\mathbf{X}}_d^{\prime }{\mathbf{X}}_d+{\mathbf{H}}^{-1}\right)}^{-1}\left({\mathbf{X}}_d^{\prime }{\mathbf{Y}}_d+{\mathbf{H}}^{-1}\mathbf{P}\right)=\mathbf{P}; $$
$$ {\mathbf{Y}}_d^{\prime }{\mathbf{Y}}_d+{\mathbf{P}}^{\prime }{\mathbf{H}}^{-1}\mathbf{P}=\left({\mathbf{Y}}_d^{\prime }{\mathbf{X}}_d+{\mathbf{P}}^{\prime }{\mathbf{H}}^{-1}\right)\mathbf{P}. $$
Given the prior (7), Waggoner and Zha (2003a) develop a Gibbs sampler for identified VARs with the linear restrictions studied in the VAR literature. These restrictions can be summarized as
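$$ \underset{n\times n}{{\mathbf{Q}}_j}{\mathbf{a}}_j=\underset{n\times 1}{\mathbf{0}},\kern0.5em \underset{n\times k}{{\mathbf{R}}_j}{\mathbf{f}}_j=\underset{n\times 1}{\mathbf{0}};\kern0.5em j=1,\dots, n. \tag{8} $$
If there are \( {q}_j \) restrictions on \( {\mathbf{a}}_j \) and \( {r}_j \) restrictions on \( {\mathbf{f}}_j \), the ranks of \( {\mathbf{Q}}_j \) and \( {\mathbf{R}}_j \) are \( {q}_j \) and \( {r}_j \) respectively. Let \( {\mathbf{U}}_j \) (\( {\mathbf{V}}_j \)) be an \( n\times \left(n-{q}_j\right) \) (\( k\times \left(k-{r}_j\right) \)) matrix whose columns form an orthonormal basis for the null space of \( {\mathbf{Q}}_j \) (\( {\mathbf{R}}_j \)). The conditions in (8) are satisfied if and only if there exist an \( \left(n-{q}_j\right)\times 1 \) vector \( {\mathbf{b}}_j \) and a \( \left(k-{r}_j\right)\times 1 \) vector \( {\mathbf{g}}_j \) such that \( {\mathbf{a}}_j={\mathbf{U}}_j{\mathbf{b}}_j \) and \( {\mathbf{f}}_j={\mathbf{V}}_j{\mathbf{g}}_j \). The vectors \( {\mathbf{b}}_j \) and \( {\mathbf{g}}_j \) are the free parameters of \( {\mathbf{a}}_j \) and \( {\mathbf{f}}_j \) dictated by the conditions in (8). It follows from (7) that the prior distribution of \( {\mathbf{b}}_j \) and \( {\mathbf{g}}_j \) is jointly normal.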
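The reparameterization in terms of free parameters can be sketched with a singular value decomposition, which delivers an orthonormal basis for the null space of a restriction matrix; the basis has as many columns as the dimension of the vector minus the rank of the restriction matrix. The function name and the single exclusion restriction below are hypothetical.

```python
import numpy as np

def null_space_basis(Q, tol=1e-10):
    """Orthonormal basis (as columns) for {a : Q a = 0}, computed from the SVD of Q."""
    _, s, Vt = np.linalg.svd(Q)
    rank = int((s > tol).sum())
    return Vt[rank:].T            # right singular vectors with zero singular value

# Hypothetical restriction for n = 3: the third coefficient of a_j is zero.
Q = np.zeros((3, 3))
Q[0, 2] = 1.0

U = null_space_basis(Q)           # 3 x 2: one restriction leaves two free parameters
b = np.array([0.7, -1.3])         # free parameters
a = U @ b                         # satisfies Q a = 0 by construction
```

Sampling the free parameters and mapping them back through the basis guarantees that every draw satisfies the restrictions exactly.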

As for the conditional posterior PDFs, it can be shown that the posterior distribution of \( {\mathbf{g}}_j \) conditional on \( {\mathbf{b}}_j \) is normal and that the posterior distribution of \( {\mathbf{b}}_j \) conditional on the \( {\mathbf{b}}_i \)’s for i ≠ j has a closed-form PDF and can be simulated from exactly. These results enable one to use the efficient method of Chib (1995). The MDD calculated this way is reliable and requires little computing time. For example, it takes less than one minute to obtain a very reliable estimate of the MDD for a large VAR with 13 lags and 10 variables. Such accuracy and speed make it feasible to compare a large number of identified VARs with different degrees of restriction.

Error Bands

Because impulse responses are of central interest in interpreting dynamic multivariate models and helping guide the directions for new economic theory to be developed (Christiano et al. 2005), it is essential that measures of the statistical reliability of estimated impulse responses be presented as part of the process of evaluating models. The Bayesian methods reviewed so far in this essay make it feasible to construct the error bands around impulse responses. The error bands can contain any probability and are typically expressed in both .68 and .90 probability bands to characterize the shapes of the likelihood implied by the model.

The error bands of impulse responses reported in most VAR works are constructed as follows. One begins with the Gibbs sampler to draw \( {\mathbf{b}}_j \) and \( {\mathbf{g}}_j \) for j = 1, …, n. For each posterior draw, the free parameters \( {\mathbf{b}}_j \) and \( {\mathbf{g}}_j \) are transformed to the original structural parameters A, \( {\mathbf{A}}_l \) (l = 1, …, p), and D; then the impulse responses are computed according to (3). The empirical distribution for each element of the impulse responses is formed and the equal-tail .68 and .90 probability intervals around each element are computed. The probability intervals have exact small-sample properties from a Bayesian point of view, and .90 or .95 probability intervals have been used in the empirical literature to approximate classical small-sample confidence intervals when the high dimensional parameter space and a large number of nuisance parameters make it difficult or impossible to obtain exact classical inferences.
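The final step, forming equal-tail intervals element by element, can be sketched as follows. The array of draws is a simulated placeholder standing in for impulse responses computed from Gibbs sampler output, and the function name and array layout are assumptions.

```python
import numpy as np

def equal_tail_bands(draws, prob):
    """draws has shape (num_draws, ...); returns pointwise (lower, upper) bands."""
    tail = (1.0 - prob) / 2.0
    lower = np.quantile(draws, tail, axis=0)
    upper = np.quantile(draws, 1.0 - tail, axis=0)
    return lower, upper

rng = np.random.default_rng(0)
# Placeholder draws with layout (draw, horizon, shock, variable); not real posterior output.
draws = rng.normal(size=(2000, 8, 2, 2))

lo68, hi68 = equal_tail_bands(draws, 0.68)
lo90, hi90 = equal_tail_bands(draws, 0.90)
```

The .68 band uses the .16 and .84 quantiles and the .90 band uses the .05 and .95 quantiles, so the narrower band always lies inside the wider one.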

One issue related to the error bands around impulse responses, whose importance is beginning to be recognized, is normalization. A normalization rule selects the sign of each draw of impulse responses from the posterior distribution. If there is no restriction imposed on the sign of each column of the contemporaneous coefficient matrix A, then the likelihood or the posterior function remains the same when the sign of a column of A is reversed. Without any sign restriction, the error bands for impulse responses would be symmetric around zero and thus the estimated responses would be determined to be imprecise.

The conventional normalization is to keep the diagonal of A always positive, based on the notion that a choice of normalization cannot have substantive effects on the results. But this notion is mistaken. If an identified VAR is non-recursive, normalization can generate ill-determined or unreasonably wide error bands around some impulse responses because some coefficients on the diagonal may be insignificantly different from zero.

Waggoner and Zha (2003b) show that normalized likelihoods can be different across normalization rules and that inappropriate normalization tends to produce a multi-modal likelihood. They propose a normalization rule designed to prevent the normalized likelihood from being spuriously multi-modal and thus avoid unreasonably wide error bands caused by the multi-modal likelihood. The algorithm for their normalization is straightforward to implement: for each posterior draw of \( {\mathbf{a}}_j \), keep \( {\mathbf{a}}_j \) if \( {\mathbf{e}}_j^{\prime }{\mathbf{A}}^{-1}{\widehat{\mathbf{a}}}_j>0 \) and replace \( {\mathbf{a}}_j \) with \( -{\mathbf{a}}_j \) if \( {\mathbf{e}}_j^{\prime }{\mathbf{A}}^{-1}{\widehat{\mathbf{a}}}_j<0 \), where \( {\mathbf{e}}_j \) is the jth column of the n × n identity matrix and \( {\widehat{\mathbf{a}}}_j \) is the jth column of \( \widehat{\mathbf{A}} \), the estimate of A at the peak of the likelihood. This algorithm works not only for short-run but also for long-run restrictions (Evans and Marshall 2002).
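The sign rule can be applied to all columns of a draw at once, because the jth quantity being tested is the jth diagonal element of the inverse of the draw multiplied by the reference matrix. In the sketch below, `A_hat` is a hypothetical reference estimate (assumed here to be the estimate at the likelihood or posterior peak) and the function name is illustrative.

```python
import numpy as np

def normalize_signs(A_draw, A_hat):
    """Flip the sign of column j of A_draw whenever e_j' A_draw^{-1} a_hat_j < 0."""
    A_inv = np.linalg.inv(A_draw)
    signs = np.sign(np.diag(A_inv @ A_hat))   # j-th entry is e_j' A^{-1} a_hat_j
    signs[signs == 0] = 1.0                   # leave a column unchanged in the knife-edge case
    return A_draw * signs                     # broadcasting scales column j by signs[j]

# Hypothetical reference estimate and a draw whose second column has the opposite sign.
A_hat = np.array([[2.0, 0.5],
                  [0.0, 1.0]])
A_draw = A_hat * np.array([1.0, -1.0])

A_norm = normalize_signs(A_draw, A_hat)
```

Flipping the sign of a column of A leaves the likelihood unchanged but flips the sign of the corresponding row of impulse responses, which is why the rule is applied before the responses are tabulated.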

Another important issue related to error bands, not addressed until recently, is the characterization of the uncertainty around estimated impulse responses not only at one particular point but also around the shape of the responses as a whole. Let \( {\Phi}_s\left(i,j\right) \) be the s-step impulse response of the jth variable to the ith structural shock. The associated error band is only pointwise. It is very unlikely in economic applications, however, that uncertainty about \( {\Phi}_s\left(i,j\right) \) is independent across j or s. For example, the response of output to a policy shock is likely to be negatively correlated with the response of unemployment, and the response of inflation this period is likely to be positively correlated with the previous and next responses.

The procedure proposed by Sims and Zha (1999) takes into account these possible correlations across variables and across time. To use this procedure, one can simply stack all the relevant impulse responses into a column vector denoted by \( \tilde{\mathbf{c}} \), where the tilde refers to a posterior draw. From a large number of posterior draws, the mean \( \overline{\mathbf{c}} \) and covariance matrix \( \overline{\Omega} \) of \( \tilde{\mathbf{c}} \) are computed. For each posterior draw \( \tilde{\mathbf{c}} \), the kth component \( {\tilde{\gamma}}_k={\left(\tilde{\mathbf{c}}-\overline{\mathbf{c}}\right)}^{\prime }{\overline{\mathbf{w}}}_k \) is calculated, where \( {\overline{\mathbf{w}}}_k \) is the eigenvector corresponding to the kth largest eigenvalue of \( \overline{\Omega} \). From the empirical distribution of \( {\tilde{\gamma}}_k \), one can tabulate different quantiles such as \( {\gamma}_{k,.16} \) and \( {\gamma}_{k,.84} \). Thus, the .68 probability error bands explained by the kth component of variation in the group of impulse responses can be computed as \( {\mathbf{c}}_{.16}=\overline{\mathbf{c}}+{\gamma}_{k,.16}{\overline{\mathbf{w}}}_k\;\mathrm{and}\ {\mathbf{c}}_{.84}=\overline{\mathbf{c}}+{\gamma}_{k,.84}{\overline{\mathbf{w}}}_k \). For a particular economic application, if it turns out that only one to three eigenvalues dominate the covariance matrix of \( \tilde{\mathbf{c}} \), these kinds of connecting-dots error bands can be useful in understanding the magnitudes and directions of uncertainty among a group of interrelated impulse responses. This method has proven to be particularly useful in economic applications that characterize the uncertainty around the entire paths, not just points one at a time (Cogley and Sargent 2005; Nason and Rogers 2006).
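The steps of this procedure map directly onto an eigendecomposition of the sample covariance matrix of the stacked draws. The sketch below uses simulated placeholder draws with one dominant direction of variation; the function name and the data are hypothetical.

```python
import numpy as np

def component_bands(c_draws, k, probs=(0.16, 0.84)):
    """Bands along the k-th principal component (k = 1 is the largest eigenvalue).

    c_draws has shape (num_draws, m), one stacked impulse-response vector per row.
    """
    c_bar = c_draws.mean(axis=0)
    Omega = np.cov(c_draws, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(Omega)   # eigenvalues in ascending order
    w_k = eigvecs[:, -k]                       # eigenvector of the k-th largest eigenvalue
    gamma = (c_draws - c_bar) @ w_k            # component score for each draw
    g_lo, g_hi = np.quantile(gamma, probs)
    # The sign of w_k is arbitrary; reversing it swaps which band is labelled lower.
    return c_bar + g_lo * w_k, c_bar + g_hi * w_k, eigvals[::-1]

rng = np.random.default_rng(1)
direction = np.array([1.0, 1.0, 1.0, 1.0]) / 2.0
# Placeholder draws: strong common variation along `direction` plus small noise.
c_draws = 3.0 * rng.normal(size=(5000, 1)) * direction + 0.1 * rng.normal(size=(5000, 4))

band_a, band_b, eigvals = component_bands(c_draws, k=1)
```

Inspecting the returned eigenvalues in descending order indicates how many components are needed before the bands capture most of the joint uncertainty.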

Markov-Switching VARs

The class of VARs discussed thus far assumes that the parameters are constant over time. This assumption is made mainly because of technical constraints on estimation and inference, however. Many macroeconomic time series display patterns that seem impossible to capture by constant-parameter VARs. One prominent example is changes in volatility over time. In the VAR framework, volatility changes mean that the reduced-form covariance matrix Σ is not constant. In policy analysis, there is a serious debate on whether the coefficients in the policy rule have changed over time, or whether the variances of shocks in the private sector have changed over time, or both. Time-varying VARs are designed to answer these kinds of questions. Stock and Watson (2003) use the reduced-form VAR framework to show that fluctuations in US business cycles can be largely explained by changes in Σ. Sims and Zha (2006b) identify the behaviour of monetary policy from the rest of the VAR system and show that changes in the coefficients in monetary policy are, at most, modest and the variance changes in shocks originating from the private sector dominate aggregate fluctuations.

There have been a number of studies on time-varying VARs that allow the coefficients or the covariance matrix of residuals or both to change over time. These models typically let all the coefficients drift as a random walk or persistent process. To the extent that this kind of modelling tries to capture possible changes in the model’s parameters, the model tends to over-fit because the dimension of time variation embedded in the data is much lower than the model’s specification. Conceptually, there is a problem of distinguishing shocks to the residuals from shocks to the coefficients. The inability to distinguish among these shocks makes it difficult to interpret the effects of, say, monetary policy shocks.

The Markov-switching VAR introduced by Sims and Zha (2006a) is designed to overcome the over-fitting problems present in the other time-varying VARs and, at the same time, maintain clear interpretation of structural shocks. It builds on the Markov-switching model of Hamilton (1989), but emphasizes ways to restrict the degree of time variation allowed in the VAR. It can approximate parameter drifts arbitrarily well as the number of states grows, while restricting the transition matrix to be concentrated on the diagonal. This feature also allows discontinuous jumps from one state to another, which appear to matter for aggregate fluctuations.

To see how this method works, suppose that the parameter \( z_t \) drifts according to the process \( z_t=\rho {z}_{t-1}+{\nu}_t \), where \( {\nu}_t\sim N\left(0,{\sigma}^2\right) \). By discretizing this autoregressive process, one can let the probability of the transition from state j to i be proportional to
$$ {\displaystyle \begin{array}{ll}\hfill & \Pr \left[{z}_t\in \left(\frac{\tau_i\sigma }{\sqrt{1-{\rho}^2}},\frac{\tau_{i+1}\sigma }{\sqrt{1-{\rho}^2}}\right)|{z}_{t-1}=\frac{\tau_j+{\tau}_{j+1}}{2}\frac{\rho \sigma}{\sqrt{1-{\rho}^2}}\right]\\ {}& =\Psi \left(\frac{\tau_{i+1}}{\sqrt{1-{\rho}^2}}-\frac{\tau_j+{\tau}_{j+1}}{2}\frac{\rho }{\sqrt{1-{\rho}^2}}\right)\hfill \\ {}& -\Psi \left(\frac{\tau_i}{\sqrt{1-{\rho}^2}}-\frac{\tau_j+{\tau}_{j+1}}{2}\frac{\rho }{\sqrt{1-{\rho}^2}}\right),\hfill \end{array}} $$
where \( \Psi \left(\cdot \right) \) is the standard normal cumulative distribution function. The values of τ divide up the interval between −2 and 2 (two standard deviations). For nine states, for example, one has \( {\tau}_1=-2 \), \( {\tau}_2=-1.5 \), \( {\tau}_3=-1 \), …, \( {\tau}_8=1.5 \), and \( {\tau}_9=2 \). Careful restrictions on the degree of time variation, as well as on the constant parameters themselves, will put VARs a step closer to DSGE modelling. Recent work by Davig and Leeper (2005) shows an example of how to use a DSGE model to restrict a VAR on monetary and fiscal policy.
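The discretization above can be illustrated with a short Python sketch that evaluates the right-hand side of the displayed expression on a grid of τ values. It is a sketch under several assumptions rather than a definitive implementation: the function names are hypothetical, the grid points are read as bin edges (so a grid of nine values yields eight interior bins), the conditioning value is taken to be the midpoint of bin j measured in units of the unconditional standard deviation, and each column is rescaled to sum to one because the text states the probabilities only up to proportionality. How the tails beyond ±2 are treated is not specified here.

```python
import math

def std_normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def transition_matrix(tau, rho):
    """Discretized AR(1) transition probabilities on the bins defined by tau.

    tau : increasing list of bin edges, in unconditional-standard-deviation units.
    rho : autoregressive coefficient, with |rho| < 1.
    Returns P as a list of lists, where P[i][j] is the probability of moving
    from state j to state i; each column is normalized to sum to one.
    """
    if not abs(rho) < 1.0:
        raise ValueError("rho must satisfy |rho| < 1")
    scale = math.sqrt(1.0 - rho ** 2)
    n = len(tau) - 1                       # number of bins (states) under this reading
    P = [[0.0] * n for _ in range(n)]
    for j in range(n):
        mid = 0.5 * (tau[j] + tau[j + 1])  # midpoint of the originating bin
        col = [
            std_normal_cdf((tau[i + 1] - rho * mid) / scale)
            - std_normal_cdf((tau[i] - rho * mid) / scale)
            for i in range(n)
        ]
        total = sum(col)                   # mass that stays inside the grid
        for i in range(n):
            P[i][j] = col[i] / total       # rescale: probabilities are given up to proportionality
    return P

# Grid from -2 to 2 in steps of 0.5, as in the nine-value example in the text.
tau_grid = [-2.0 + 0.5 * i for i in range(9)]
P = transition_matrix(tau_grid, rho=0.95)
```

With a persistent process such as `rho=0.95` (a value chosen purely for illustration), most of the probability in each column falls on the diagonal and its immediate neighbours, which is the sense in which the transition matrix is concentrated on the diagonal.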


There is a tension between models that have clear economic interpretations but offer a poor fit to data and models that fit well but have few a priori assumptions and are therefore less interpretable (Ingram and Whiteman 1994; Del Negro and Schorfheide 2004). The original philosophy motivating VARs assumes that the economy is sufficiently complex and that simplified theoretical models, while useful in organizing thought about how the economy works, generally abstract from important aspects of the economy. VAR modelling begins with the minimal restrictions on dynamic time-series models, explores empirical regularities that have been ignored by simple models, and insists on the model’s fit to data. The emphasis on fit has begun to bear fruit, as an increasing array of dynamic stochastic general equilibrium models have been tested and compared with VARs (Christiano et al. 2005; Smets and Wouters 2003). Markov-switching VARs go a step further in bringing VARs even closer to the data and thus provide a new benchmark for model comparison.

At the same time, considerable progress has been made to narrow the gap between VARs and DSGE models. Some results from VARs have provided empirical support to the key assumption made by real business cycle (RBC) models that monetary policy shocks play insignificant roles in generating business fluctuations. Nason and Cogley (1994) and Cogley and Nason (1995) discuss similar results from both VAR and RBC approaches. Fernandez-Villaverde et al. (2005) provide conditions and examples under which there exists the VAR representation of a DSGE model. Sims and Zha (2006a) display a close connection between an identified VAR and a DSGE model, and provide a measure for determining whether the ‘invertibility problem’ is a serious issue.

Undoubtedly there are payoffs in moving beyond the original VAR philosophy by imposing more restrictions on both contemporaneous relationships and lag structure while the restrictions are guided carefully by economic theory. Although moving in this direction is desirable, it is essential to maintain the spirit of VAR analysis as originally proposed by Sims (1980). This requires that heavily restricted VARs be subject to careful evaluation in terms of fit. Recent advances in Bayesian estimation and inference methods of restricted VARs make it feasible to compute the MDD accurately and efficiently and, therefore, to determine whether the restrictions have compromised the fit. These methods, however, still fall short of handling VARs with cross-equation restrictions implied by DSGE models. Thus, the challenge ahead of us is to develop new tools for VARs with possible cross-equation restrictions.

  1. Bernanke, B. 1986. Alternative exploration of the money-income correlation. Carnegie-Rochester Conference Series on Public Policy 25: 49–99.
  2. Blanchard, O., and D. Quah. 1993. The dynamic effects of aggregate demand and supply disturbances. American Economic Review 83: 655–673.
  3. Blanchard, O., and M. Watson. 1986. Are business cycles all alike? In The American business cycle: Continuity and change, ed. R. Gordon. Chicago: University of Chicago Press.
  4. Chib, S. 1995. Marginal likelihood from the Gibbs output. Journal of the American Statistical Association 90: 1313–1321.
  5. Christiano, L., M. Eichenbaum, and C. Evans. 1999. Monetary policy shocks: What have we learned and to what end? In Handbook of Macroeconomics, ed. J. Taylor and M. Woodford, Vol. 1A. Amsterdam: North-Holland.
  6. Christiano, L., M. Eichenbaum, and C. Evans. 2005. Nominal rigidities and the dynamic effects of a shock to monetary policy. Journal of Political Economy 113: 1–45.
  7. Cogley, T., and J. Nason. 1995. Output dynamics in real business cycle models. American Economic Review 85: 492–511.
  8. Cogley, T., and T. Sargent. 2005. Drifts and volatilities: Monetary policies and outcomes in the post WWII U.S. Review of Economic Dynamics 8: 262–302.
  9. Cushman, D., and T. Zha. 1997. Identifying monetary policy in a small open economy under flexible exchange rates. Journal of Monetary Economics 39: 433–448.
  10. Davig, T., and E. Leeper. 2005. Fluctuating macro policies and the fiscal theory. Working Paper No. 11212. Cambridge, MA: NBER.
  11. Del Negro, M., and F. Schorfheide. 2004. Priors from general equilibrium models for VARs. International Economic Review 45: 643–673.
  12. Evans, C., and D. Marshall. 2002. Economic determinants of the nominal treasury yield curve. Working paper. Federal Reserve Bank of Chicago.
  13. Faust, J., and E. Leeper. 1997. When do long-run identifying restrictions give reliable results? Journal of Business and Economic Statistics 15: 345–353.
  14. Fernandez-Villaverde, J., J. Rubio-Ramirez, and T. Sargent. 2005. A, B, C’s (and D’s) for understanding VARs. Working Paper No. 2005–9. Federal Reserve Bank of Atlanta.
  15. Gali, J. 1992. How well does the IS-LM model fit postwar U.S. data? Quarterly Journal of Economics 107: 709–738.
  16. Geweke, J. 1999. Using simulation methods for Bayesian econometric models: Inference, development, and communication. Econometric Reviews 18: 1–73.
  17. Geweke, J., and C. Whiteman. 2006. Bayesian forecasting. In The handbook of economic forecasting, ed. G. Elliott, C. Granger, and A. Timmermann. Amsterdam: North-Holland.
  18. Gordon, D., and E. Leeper. 1994. The dynamic impacts of monetary policy: An exercise in tentative identification. Journal of Political Economy 102: 1228–1247.
  19. Hamilton, J. 1989. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica 57: 357–384.
  20. Ingram, B., and C. Whiteman. 1994. Supplanting the Minnesota prior: Forecasting macroeconomic time series using real business cycle model priors. Journal of Monetary Economics 34: 497–510.
  21. Leeper, E., and T. Zha. 2003. Modest policy interventions. Journal of Monetary Economics 50: 1673–1700.
  22. Leeper, E., C. Sims, and T. Zha. 1996. What does monetary policy do? Brookings Papers on Economic Activity 2: 1–78.
  23. Litterman, R. 1986. Forecasting with Bayesian vector autoregressions – Five years of experience. Journal of Business and Economic Statistics 4: 25–38.
  24. Nason, J., and T. Cogley. 1994. Testing the implications of long-run neutrality for monetary business cycle models. Journal of Applied Econometrics 9: S37–S70.
  25. Nason, J., and J. Rogers. 2006. The present-value model of the current account has been rejected: Round up the usual suspects. Journal of International Economics 68: 159–187.
  26. Robertson, J., and E. Tallman. 1999. Vector autoregressions: Forecasting and reality. Federal Reserve Bank of Atlanta Economic Review 84(1): 4–18.
  27. Robertson, J., and E. Tallman. 2001. Improving federal-funds rate forecasts in VAR models used for policy analysis. Journal of Business and Economic Statistics 19: 324–330.
  28. Rubio-Ramirez, J., D. Waggoner, and T. Zha. 2005. Markov-switching structural vector autoregressions: Theory and applications. Working Paper No. 2005–27. Federal Reserve Bank of Atlanta.
  29. Sims, C. 1980. Macroeconomics and reality. Econometrica 48: 1–47.
  30. Sims, C. 1982. Policy analysis with econometric models. Brookings Papers on Economic Activity 1: 107–152.
  31. Sims, C. 1986. Are forecasting models usable for policy analysis? Federal Reserve Bank of Minneapolis Quarterly Review 10(1): 2–16.
  32. Sims, C., and T. Zha. 1998. Bayesian methods for dynamic multivariate models. International Economic Review 39: 949–968.
  33. Sims, C., and T. Zha. 1999. Error bands for impulse responses. Econometrica 67: 1113–1155.
  34. Sims, C., and T. Zha. 2006a. Does monetary policy generate recessions? Macroeconomic Dynamics 10(2): 231–272.
  35. Sims, C., and T. Zha. 2006b. Were there regime switches in US monetary policy? American Economic Review 96: 54–81.
  36. Smets, F., and R. Wouters. 2003. An estimated dynamic stochastic general equilibrium model of the euro area. Journal of the European Economic Association 1: 1123–1175.
  37. Stock, J., and M. Watson. 2003. Has the business cycle changed? Evidence and explanations. Prepared for the Federal Reserve Bank of Kansas City symposium ‘Monetary policy and uncertainty: Adapting to a changing economy’, Jackson Hole, Wyoming, 28–30 August.
  38. Waggoner, D., and T. Zha. 2003a. Likelihood-preserving normalization in multiple equation models. Journal of Econometrics 114: 329–347.
  39. Waggoner, D., and T. Zha. 2003b. A Gibbs simulator for structural vector autoregressions. Journal of Economic Dynamics & Control 28: 349–366.
  40. Zha, T. 1999. Block recursion and structural vector autoregressions. Journal of Econometrics 90: 291–316.

Copyright information

© Macmillan Publishers Ltd. 2018

Authors and Affiliations

  • Tao Zha