1 Introduction: Recent Empirical Evaluations of Macro Models and the Implications for Macro Policy

Recent decades have seen a major financial crisis and a worldwide pandemic, together with large-scale responses from fiscal and monetary policy. A variety of attempts have been made to model these events and policy responses empirically. In this paper we review these modelling attempts and suggest some policy conclusions. We will argue that a new class of DSGE models, in which there is price-setting but with endogenous duration, can account for the shifts in macro behaviour from pre-crisis times up to the present day; these models also prescribe a key role for fiscal policy in stabilising the economy and preventing its slide into the zero lower bound.

Since the crisis, a number of economists have argued for a more central role for fiscal policy, given the enfeeblement of monetary policy with interest rates at the zero lower bound. Prominent advocates of stronger fiscal stimulus for economies battling low inflation and weak demand have included Romer, Stiglitz, and Solow in Blanchard et al. (2012), as well as Spilimbergo et al. (2008) and Lane (2010), though with opposition from Alesina and Giavazzi (2013). This viewpoint has seemed highly persuasive on broad qualitative grounds. However, credible quantitative assessments of the role and effects of fiscal policy have been harder to find. Providing such an assessment is what we attempt in this paper, drawing on recent DSGE models that can claim to match data behaviour rather accurately.

2 Recent Literature on the Role of Fiscal Policy Since the Crisis

In a recent book based on an MIT conference (Blanchard et al. 2012), Romer, Blanchard and Stiglitz set out support for more aggressive fiscal policies during financial crises. Romer summarises these views pithily as the realisation among macroeconomists that exclusive reliance on monetary policy for short-run stabilisation was wrong, because it much underestimated the damage from the zero lower bound. Romer also attacks the contribution of DSGE modelling, though, as we will show, it can make a useful empirical contribution. Several other contributions at conferences and other meetings convened after the financial crisis cover similar ground and come to broadly similar policy conclusions. Spilimbergo et al. (2008) reviewed IMF thinking on fiscal policy in crisis periods, finding that in five crisis episodes fiscal policy had a positive part to play, with strong fiscal multipliers. Lane (2010) expresses similar views.

Using New Keynesian DSGE models, many studies (Christiano et al. 2011; Woodford 2011; Erceg and Linde 2014) have found that stimulative fiscal policies have big effects on consumption and output when nominal interest rates are low. They show that government spending multipliers can be much larger at the zero lower bound, and that an exogenous increase in government spending can be welfare-improving because it increases expected inflation, which lowers the real interest rate. Coenen et al. (2012) subject seven structural DSGE models to fiscal stimulus shocks using seven different fiscal instruments. One consensus result across the models is that many multipliers are large, particularly for spending and for targeted transfers to financially constrained households. Fiscal policy is found to be most effective if it has moderate persistence and if monetary policy is accommodative.

Eggertsson (2010) considers different taxes and looks for the most desirable in the zero lower bound situation. Tax cuts imply that workers will want to work more, so that firms can produce more cheaply, putting downward pressure on prices. At the zero lower bound, downward price pressure creates deflationary expectations and pushes the real interest rate higher, which depresses spending. He finds that the multiplier of a 1% cut in the labour tax, positive in normal times, turns negative at the zero lower bound, at -1.02; a temporary sales tax reduction, by contrast, is expansionary because it makes today's consumption cheaper relative to the future and so stimulates spending. He argues that expansionary fiscal policy at the zero lower bound should stimulate aggregate demand, rather than aggregate supply. Correia et al. (2013) show how distortionary taxes (an increasing path for consumption taxes, a decreasing path for labour taxes, together with a temporary investment tax credit or a temporary cut in capital income taxes) can replicate the effects of negative interest rates and completely eliminate the zero bound problem. The consensus is that supply-side fiscal policies are ineffective, while demand-side policies are expansionary and effective in stabilising the economy when the nominal interest rate is zero.
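The mechanism running through these results can be summarised with the Fisher relation; the following is a minimal sketch in our own notation, not drawn from any one of the papers above:

$$
r_t = i_t - \mathbb{E}_t \pi_{t+1}, \qquad i_t \ge 0 .
$$

Away from the bound, the central bank can offset a deflationary shock by cutting the nominal rate $i_t$. At $i_t = 0$ the real rate is $r_t = -\mathbb{E}_t \pi_{t+1}$, so any policy that lowers expected inflation (such as a supply-stimulating labour tax cut) raises the real rate and depresses demand, while any policy that raises expected inflation (such as a government spending increase) lowers the real rate and stimulates it.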

There are two points to notice about this literature. First, the assessed effectiveness of fiscal policy seems to depend on the method used to solve the New Keynesian models at the ZLB and on the causes of the liquidity trap. Boneva et al. (2016) show that the nonlinear solution exhibits new types of ZLB equilibria that cannot occur using a loglinearised solution. Their New Keynesian model can exhibit the same properties as in the above studies for a relatively small set of parameters and shocks. In other regions of the parameter space, the nonlinear solution implies that demand-side fiscal multipliers at the ZLB are small and not very different from their values away from the ZLB, while supply-side fiscal stimulus is expansionary at the ZLB. Mertens and Ravn (2014) argue that the output multiplier at the ZLB is small in a New Keynesian model if the ZLB period is caused by a non-fundamental confidence shock: in this case government spending shocks are deflationary and increase real interest rates, lowering consumption and investment, so the output multiplier is lower than outside the ZLB period. The second point to notice is that this strand of literature abstracts from debt sustainability questions to focus only on the stabilisation role of fiscal policy.

By contrast, Alesina and Giavazzi (2013) convened a conference on the crisis at the University of Chicago, the bulk of which favoured restraint on fiscal policy, emphasising the dangers of rising debt/GDP ratios. Government spending can cause debt crises. Evans et al. (2012) use a two-period overlapping generations model calibrated to the US economy and argue that there is a 35% chance that the US would reach its fiscal limit in about thirty years. Easterly (2001) argues that stationary fiscal gaps relative to GDP do not necessarily prevent debt crises; growth slowdowns can also cause them. Leeper and Walker (2012) find that if large deficits are not followed by large surpluses, then deficit spending financed by debt may cause inflation. Because of these consequences, indebted governments implemented fiscal consolidation to reduce government deficits and debt, while monetary policy was faced with the zero lower bound constraint. The concern was that, given higher multipliers during the ZLB period, fiscal consolidation could suppress already low demand further and lead to an even deeper recession, which would increase the government debt/GDP ratio. Blanchard and Leigh (2013) find that in the European economies' recent austerity the multipliers were especially high, so that stronger planned fiscal consolidation was associated with lower growth than expected. Furthermore, DeLong and Summers (2012) argue that austerity policies can be counterproductive even if they reduce the future burden of financing the national debt, since cyclical economic downturns can damage the productive potential of the economy. Warmedinger et al. (2015), however, argue that the above discussion concerns short-term impacts, whereas consolidation also has medium- and long-term effects. They analyse the impact of fiscal consolidation on the debt/GDP ratio for a sample of individual euro area countries and the euro area aggregate, and find that fiscal multipliers must be significantly above 1 to produce a self-defeating scenario after 5 years and must be very large to produce one after 10 years. This means that if the fiscal multiplier is within the range normally considered plausible for a balanced-composition package, fiscal consolidation initially has an adverse effect on the debt ratio, which is reversed after a few years.

Ramey (2019) presents a comprehensive survey of what we have learned in fiscal research since the financial crisis. The paper highlights prominent theoretical analyses, empirical methods and newly constructed data sets. However, we recognise that the existing DSGE literature on fiscal policy lacks thorough empirical analysis of the potential contribution of fiscal policy to macro stability, and we therefore draw on recent empirical work on several economies to fill this lacuna. We assume debt sustainability holds due to the cyclical nature of fiscal action.

3 Macro Models and their Empirical Evaluation

In the past three decades, since the rational expectations revolution and the recognition of how ubiquitous were the implications of Lucas' (1976) critique, economists have rebuilt macro-economic models in the DSGE mould, trying to ensure that they have good micro-foundations. These models assume simplified set-ups in which consumers maximise stylised utility functions and firms maximise stylised profit functions. Most models assume representative agents; more recent models assume heterogeneous agents to deal with such issues as inequality and growth. Much effort has been devoted to making these set-ups as realistic as possible and to calibrating the resulting models with parameters estimated on micro datasets.

Sometimes it has seemed as if the economists creating these models have assumed this 'micro realism' was enough to create a good DSGE macro model, and that we should therefore treat their models as simulating the true behaviour of the economy. However, a moment's reflection reveals such assumptions to be self-deluding. Even the most realistic set-ups require bold simplifications simply to be tractable; they are, after all, models and not the 'real world'. Furthermore, these models are intended to capture aggregate behaviour, and there is a great distance between aggregated behaviour and the micro behaviour of individuals; even heterogeneous agent models do not accurately span the variety of individual types and shock distributions. The reasons for this gap are manifold. One is the fairly obvious one that aggregate actions are the weighted sum of individual actions, yet we cannot be sure of the weights, which may themselves change over time and across different shocks; effectively we choose one constant set of weights, and we need to check its accuracy. Another less obvious but important reason is that there is a host of ancillary market institutions whose function is to improve the effectiveness of individual strategies by sharing information; these include investment funds, banks and a variety of other financial intermediaries, whose activities are not usually modelled separately but whose contribution shows up in the efficiency of those strategies.
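The aggregation-weights point is easy to illustrate with a hypothetical two-type example of our own construction (the MPC values and weights below are purely illustrative, not estimates from any model discussed here): the aggregate response to an income shock hinges entirely on the population weights, which a model fixes but the economy may shift.

```python
import numpy as np

# Illustrative only: aggregate marginal propensity to consume (MPC)
# as a weighted sum of two household types, with hypothetical values.
mpc = np.array([0.95, 0.15])  # hand-to-mouth vs permanent-income households

def aggregate_mpc(weight_h2m):
    """Aggregate MPC for a given population weight on hand-to-mouth households."""
    weights = np.array([weight_h2m, 1.0 - weight_h2m])
    return weights @ mpc

# A model calibrated with a constant 20% hand-to-mouth share...
print(f"{aggregate_mpc(0.20):.2f}")  # 0.31
# ...will mis-predict aggregate consumption if a crisis pushes that share to 40%.
print(f"{aggregate_mpc(0.40):.2f}")  # 0.47
```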

Hence empirical work is needed to check whether these models do capture macroeconomic behaviour. It would be reassuring if well micro-founded models mimicked actual data behaviour; then we would know that the simplification is not excessive and the aggregation problems have been overcome. More broadly, DSGE macro-economic modelling remains highly controversial even among 'mainstream' macroeconomists on empirical grounds: for example, Romer (2016) has argued that DSGE models are useless as a basis for policy advice because they fail to capture key aspects of macro behaviour.

To settle such debates we need a tough empirical testing strategy, with strong power to discriminate between models that fit the data behaviour and those that do not. The merits of different testing methods have been reviewed in Le et al. (2016) and Meenagh et al. (2018, 2023), and we review the available approaches below. In this paper we review what we know about the empirical success of different DSGE models. We restrict ourselves to DSGE models because these are the only causal macro models we have that satisfy Lucas' critique; we can regard them as 'deep structure' models in which the causal processes are derived explicitly from people's decisions, so that we can simulate how changes in government policies will affect the economy through those decisions. Other models may be causal in the sense that identified factors affect behaviour in a causal way, but only under the assumption that the policies and other exogenous processes in effect during the sample period continue in force. So they are causal in quite a restricted way that renders them unusable for general analysis of how economies work in a full variety of potential contexts, and especially of how they would react to changes in policy regimes.

We consider the results of empirical tests for DSGE models of the economy. Inevitably, given its size and influence, our main focus is on models of the US economy. However, we also review results for other large economies, viewed similarly as large and effectively closed. We also review models of various open economies, such as the UK and regions of the Eurozone. What we will see is a general tendency for fiscal policy to make an important stabilising contribution according to these models.

4 The Nature of the Empirical Evidence

In reviewing the evidence we are faced with a variety of ways in which facts are compared with model predictions:

  • Bayesian: here strong priors allow the researcher to estimate a model and assess its probability, but the assessment will depend crucially on the priors. Yet these are precisely what we want to test, as we are unsure whether they are correct, given the controversy surrounding the importance of different policy approaches. With 'flat' priors, which ascribe the same probability to all parameter values, the Bayesian approach amounts to maximum likelihood (see the equations after this list).

  • Maximum likelihood: here test power is quite weak and estimation bias is high in small samples, the usual situation for macro data (Le et al. 2016; Meenagh et al. 2018, 2023). Hence evidence from FIML estimates and associated Likelihood Ratio statistics is not persuasive.

  • Forecasting accuracy tests have rather weak power because they are also Likelihood Ratio tests, weakened further by being out of sample (Minford et al. 2015).

  • Comparing various moments singly with their model-simulated equivalents is not statistically valid because it neglects the covariance matrix of these moments, which determines their joint distribution (Meenagh et al. 2023). Models generally imply substantial covariances between such moments because of the theoretical restrictions they impose; the Wald statistic shown after this list is the valid joint comparison.
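Two of these points can be stated compactly; the notation below is ours, for illustration. With data $Y$, parameters $\theta$ and prior $p(\theta)$, the posterior is

$$
p(\theta \mid Y) \propto L(Y \mid \theta)\, p(\theta),
$$

so with a flat prior ($p(\theta)$ constant) the posterior mode is simply the maximum likelihood estimate. And for a vector of moments $m$ with model-implied mean $\bar m$ and covariance matrix $\Sigma$, a valid joint test uses the Wald statistic

$$
W = (m - \bar m)' \,\Sigma^{-1}\, (m - \bar m),
$$

not a collection of one-moment-at-a-time comparisons.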

Unfortunately the bulk of the empirical literature on DSGE models uses one or other of the above methods. We could go through them all and discuss each; this would be a worthwhile undertaking from which we could well learn much of interest. But the problem is that these methods do not tell us much about the accuracy or usefulness of the complete models of the economy that have been proposed to account for recent macro turmoil. What we would like to know is which models are consistent with the data and which are not. For this we need a method that has enough power to discriminate between the models that succeed and the models that should be discarded.

In what follows we have therefore restricted ourselves to tests under Indirect Inference where, as explained in Le et al. (2016) and Meenagh et al. (2018, 2023) cited above, the power of the test can be made extremely high; but for this very reason the test needs to be run at a suitable level of power, trading power off efficiently against tractability. This trade-off must be found by Monte Carlo experiment on each model. Too much power will mean the rejection of all good models, while weak power gives much too wide bounds on the accuracy of the model, which is what we want to assess.
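In outline, the test works as follows: estimate an auxiliary model (a VAR) on the actual data, re-estimate it on many bootstrap simulations of the DSGE model, and ask whether the data-based VAR coefficients lie within the joint distribution of the simulated ones, using the Wald statistic above. A minimal sketch, assuming a generic `simulate_model` function standing in for a solved DSGE model (the toy version below is just a stable VAR, not any model from the papers cited):

```python
import numpy as np

def var1_coeffs(data):
    """OLS coefficients of a VAR(1) fitted to data (T x k), stacked as a vector."""
    y, x = data[1:], data[:-1]
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    return beta.ravel()

def simulate_model(rng, T=200, k=3):
    """Toy stand-in for a solved DSGE model: a stable VAR(1) with Gaussian shocks."""
    A = 0.8 * np.eye(k)
    data = np.zeros((T, k))
    for t in range(1, T):
        data[t] = data[t - 1] @ A + rng.normal(scale=0.01, size=k)
    return data

def ii_wald_test(actual_data, n_boot=1000, seed=0):
    """Indirect inference Wald test: compare the data-based VAR coefficients
    with their bootstrap distribution under the model."""
    rng = np.random.default_rng(seed)
    a_hat = var1_coeffs(actual_data)
    sims = np.array([var1_coeffs(simulate_model(rng)) for _ in range(n_boot)])
    mean, cov = sims.mean(axis=0), np.cov(sims, rowvar=False)
    diff = a_hat - mean
    wald = diff @ np.linalg.solve(cov, diff)
    # p-value: share of bootstrap Wald statistics at least as extreme
    boot = np.array([(s - mean) @ np.linalg.solve(cov, s - mean) for s in sims])
    return wald, (boot >= wald).mean()

# Example: 'actual' data drawn from the same toy model, so the test should pass.
rng = np.random.default_rng(42)
wald, p = ii_wald_test(simulate_model(rng))
print(f"Wald = {wald:.2f}, bootstrap p-value = {p:.3f}")
```

Power is then controlled by the choice of auxiliary model: a three-variable VAR(1) as here corresponds to the deliberately restrained choice in Le et al. (2011), while a full seven-variable higher-order VAR would reject almost any tractable model.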

4.1 DSGE Models of the Closed Economy

The most widely used DSGE model today is the New Keynesian model of the US constructed by Christiano et al. (2005) and estimated by Bayesian methods by Smets and Wouters (2007). This model and the US data on which it is focused make a good starting point for our model evaluations. In this model the US is treated as a closed continental economy. In essence it is a standard Real Business Cycle model but with the addition of sticky wages and prices, so that there is scope for monetary policy feedback to affect the real economy. Smets and Wouters found that their estimated model passed some forecasting accuracy tests when compared with unrestricted VAR models.

Many central banks are happy to accept the New Keynesian priors of this model, since they believe monetary policy is as powerful as the model implies. However, in parts of the profession the model is rejected. Thus Chari et al. (2009) wrote: 'Some think New Keynesian models are ready to be used for quarter-to-quarter quantitative policy advice. We do not. Focusing on the state-of-the-art version of these models, we argue that some of its shocks and other features are not structural or consistent with microeconomic evidence. Since an accurate structural model is essential to reliably evaluate the effects of policies, we conclude that New Keynesian models are not yet useful for policy analysis.'

So some sort of test is needed to allow economists in general to decide whether nominal rigidity holds or not. As already noted, the forecasting test has little power and so is not useful for this purpose.

Le et al. (2011) applied indirect inference testing to the Smets-Wouters model, first investigating their New Keynesian version and then a New Classical version with no rigidity. They rejected both on the full post-war sample used by Smets and Wouters, with Wald-equivalent t-values of around 2.5, using a three-variable VAR(1) (output, inflation and interest rates). They noted that the power of this test, though considerable, was deliberately lower than that of what they termed a 'full Wald' test in which all 7 variables were used in a higher-order VAR. With such a 'full Wald' the model t-value was very much higher; but they argued that the power of this test was too high, in the sense noted above that it would reject most tractable models. They concluded that this model of the US post-war economy, popular as it was in major policy circles, must be regarded as strongly rejected by the appropriate 3-variable test.

They then found that there were two highly significant break points in the sample, in the mid-1960s and the mid-1980s. They also argued that there are parts of the economy where prices and wages are flexible, so the match to the data should improve if this is recognised in a 'hybrid' model with sectors of differing price rigidity (Dixon and Kara 2011 is similar, with disaggregation). Finally, after estimation by indirect inference they found a version of this hybrid model that matched the data from the mid-1980s until 2004, the period known as 'the great moderation'; no version, hybrid or otherwise, could match the earlier two sub-samples. The later sample showed very low shares for the 'flexible sectors'. However, when it was extended to include the period of financial crisis up to 2012, these shares rose dramatically and became dominant.

One could regard these findings as at least partial support for the critics of nominal rigidity. Micro-data (Zhou and Dixon 2019) show that firms normally set prices for periods of time but change them more frequently when shocks are large; thus pricing period lengths display both time-dependence and shock-dependence. In a variety of economies there is substantial evidence that price rigidity varies with the extent of inflation. The high rigidity of the great moderation period seems to have reflected the lack of large shocks and the low inflation rate of that period; once the shocks of the financial crisis hit, with sharp effects on inflation, this 'rigidity' mostly disappeared. Nevertheless, there is normally some rigidity.

A DSGE model in which rigidity is shock-size-dependent is non-linear; we have the tools to solve such models. Since the financial crisis there has also been the arrival of the zero bound on interest rates and the use of Quantitative Easing (QE, aggressive purchase of bonds for money by the central bank) under the zero bound. Le et al. (2021) estimated such a model, complete with a banking sector and a collateral constraint that made narrow money creation effective by cheapening collateral. They found that this model could finally match the data behaviour over the whole post-war sample; in effect, the shifts in regime due to the interaction of the ZLB with inflation, and so with the extent of price rigidity, mimic the changing data behaviour closely. However, they found that this interaction of the ZLB and price rigidity created considerable inflation variability: the ZLB weakened the stabilising power of monetary policy on prices, and this extra inflation variance in turn reduced price rigidity, further feeding inflation variance. This process is illustrated in Fig. 1, a simulation (no. 15) of the model in which the ZLB is repeatedly hit (the shaded areas), with both inflation and interest rates gyrating sharply, and both output and the share of the relatively rigid-price sector (the NK weight) responding.

Fig. 1 Bootstrap simulation (all shocks) of US model. Source: Le et al. (2021)
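The nonlinearity driving these dynamics can be written compactly. In stylised form (our illustrative notation, not the exact rule estimated in Le et al. 2021), the policy rate follows a Taylor rule truncated at zero:

$$
i_t = \max\left\{0,\; r^* + \pi^* + \phi_\pi(\pi_t - \pi^*) + \phi_y y_t\right\}, \qquad \phi_\pi > 1 .
$$

Whenever the max operator binds, monetary feedback on inflation is switched off; combined with price rigidity that itself falls as inflation becomes more variable, this produces the regime shifts visible in the simulation.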

In predicting soaring inflation variance after the onset of the zero bound, this model has proved eerily correct, as the chart of US inflation in Fig. 2 testifies. After going briefly negative in the aftermath of the financial crisis and then settling at low rates in the 2010s, inflation leapt upwards in 2021-22 in a way reminiscent of the 1970s, in turn forcefully ending the ZLB with the sharp interest rate response currently playing out.

Fig. 2 US inflation for all urban consumers. Source: St. Louis Fed

To cut into this inflation variance feedback loop, Le et al. (2021) found that there were benefits both from new monetary rules and from stronger fiscal feedback rules. Specifically, they found that substituting a Price Level (or Nominal GDP, NOMGDPT) target for an inflation target in the interest-rate-setting rule could greatly increase stability, because a levels target requires much more persistent interest rate changes, which are anticipated by agents, thus giving much more 'forward guidance'. They further found that fiscal policy has an important role to play in keeping the economy away from the ZLB: with a strongly stabilising fiscal policy that acts directly to prevent the ZLB occurring, they found a big increase in both output and inflation stability. Their table of results is shown below as Table 1, contrasting variances and welfare under current rules (a Taylor Rule and no fiscal response) with those under a NOMGDPT target rule for money and a fiscal 'backstop' rule stopping the ZLB from taking hold. These latter rules keep the frequency of 'crisis' (a long, bad recession) down at one per century while reducing both output and inflation variance sharply, and maintaining a high degree of price rigidity.
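To fix ideas, stylised versions of this rule combination might look as follows; this is our own illustrative notation, not the specification actually estimated in Le et al. (2021). With $x_t = p_t + y_t$ denoting log nominal GDP and $x_t^*$ its target path, the monetary rule targets the level,

$$
i_t = \max\left\{0,\; r^* + \phi_x\,(x_t - x_t^*)\right\},
$$

while the fiscal backstop raises spending only when the unconstrained rate $i_t^{u} = r^* + \phi_x(x_t - x_t^*)$ would go below zero:

$$
g_t = \bar g + \gamma \max\{0,\, -i_t^{u}\}, \qquad \gamma > 0 .
$$

Because any shortfall of the level $x_t$ from target must be made up later, the levels target builds in persistent, anticipated interest rate responses; the backstop confines fiscal activism to exactly the states in which monetary policy is disabled.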

Table 1 Welfare results for US under different policy rules. Source: Le et al. (2021)

4.2 Work on Other Economies

Work on the UK found that a similar model fitted UK data behaviour before and after the financial crisis, from 1986 to 2016 (Le et al. 2023b). Like the US model, it implies that fiscal policy can contribute to stability by limiting zero bound episodes. Below in Table 2 we show how different fiscal policies contribute to the overall stability of the economy across a large sample of bootstrapped shocks (taken from the full sample period). It can be seen that the fiscal policy backstop, added to NOMGDPT monetary policies, helps to raise stability; we also see that a straightforward fiscal feedback rule produces a similar result.

Table 2 Welfare results for UK under different policy rules. Source: Le et al. (2023a, b)

For the eurozone, Minford et al. (2022) built a model that divided the zone into two separate regions, North and South, and found that it matched eurozone data well over the first two decades of the euro's existence; they modelled the zero bound indirectly by assuming that the central bank rule targets the commercial credit rate with its repertoire of instruments, including QE. As in the other models just reviewed, fiscal policy can increase stability substantially. We show the key results in Table 3; the results of policy interest are those for the Base case; Regime 5, where each region is free to use its fiscal policy to stabilise its own economy; and Regime 7, which additionally replaces the euro with two regional euro currencies whose independent regional central banks pursue their own interest rate rules. The first panel of Table 3 reveals the sharp falls in key variances due to introducing Regime 5; Regime 7 increases stability further but is not on the political agenda. The second panel of Table 3 also shows the equivalent implied rise (vs the baseline) in permanent household consumption due to this rise in stability. Ignoring Regime 7, we can see that allowing independent fiscal policy greatly raises stability. The Eurozone Stability and Growth Pact (SGP) currently prevents this policy, essentially to protect the North from the threat of a Southern bailout. However, the paper shows that the average debt/GDP ratio in the South rises little under the policy, suggesting that this threat could be contained simply by a solvency-monitoring process replacing the SGP.

Table 3 Welfare results for eurozone under different policy rules. Source: Minford et al. (2022)

Similar results are found for Japan in Le et al. (2023a). Growth in Japan has been notoriously weak, even though monetary policy has been stimulative for several decades. Fiscal policy has been intermittently stimulative between contractionary episodes in which consumption taxes were raised; the simulation results show that a fiscal rule consistently exerting countercyclical pressure would have stabilised output more around a rising trend. Table 4 shows how, in a standard ('No sunspot') model, a strong countercyclical fiscal policy greatly stabilises the economy.

Table 4 Welfare results for Japan under different policy rules. Source: Le et al. (2023a, b)

5 Detailed Aspects of Fiscal Rules

We have seen that fiscal policy can help stabilise the economy and steer it away from the zero bound, allowing monetary policy to pursue effective stabilisation too. We have also seen that this is true for a variety of economies other than the US, including several best modelled as small open economies, like the UK, or as large ones, like the eurozone.

This still leaves some unanswered questions about fiscal policy, raised by Romer and others in the literature reacting to the financial crisis, viz:

  1. Does it matter which fiscal instrument is used? In the work above public spending was the instrument, feeding directly into goods demand. Would it make a difference to use tax-transfers or distortionary income or labour taxes? Both Romer and Solow argue that instruments differ greatly in their effects.

  2. Would a standard fiscal feedback rule be more or less effective for stability than the fiscal backstop rule we investigated, which eliminates the zero bound? The literature looks only at such standard rules, citing their effect on the zero bound as one advantage, whereas our backstop rule exploits that advantage exclusively.

  3. Does 'fiscal space' matter, i.e. the extent to which the debt/GDP ratio exceeds some safe sustainable ratio like 50%? Romer argues ('Lesson 3') that it is an important factor in fiscal policy's stabilising power, which diminishes as space shrinks.

The simulations cited above suggest answers to all these questions. These results for fiscal policy all assume that public spending is used as the fiscal instrument; lump-sum transfers would be ineffective due to Ricardian equivalence (present in all the models), while varying distortionary taxes over time creates unnecessary welfare losses from increased distortions. Furthermore, an aggressive fiscal feedback rule seems to do as well as an explicit fiscal backstop rule preventing the ZLB (Le et al. 2023b, for the UK). Finally, the efficacy of fiscal policy does not appear to vary with the level of debt, or 'fiscal space': our various countries had widely differing debt/GDP ratios, all the way up to about 250% in Japan, but the effects on stability are similarly beneficial across them all.
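The Ricardian equivalence point can be seen in a two-line derivation (standard textbook logic, in our own notation). The household's present-value budget constraint, with taxes-less-transfers $\tau_t$, and the government's, with spending $g_t$ and initial debt $b_0$, are

$$
\sum_{t} \beta^t c_t = \sum_{t} \beta^t (y_t - \tau_t), \qquad
\sum_{t} \beta^t \tau_t = b_0 + \sum_{t} \beta^t g_t .
$$

Substituting the second into the first eliminates $\tau_t$ entirely: consumption depends only on the path of $g_t$, so changing the timing of lump-sum transfers (a debt-financed transfer today, taxed back later) leaves demand unchanged, which is why these models use spending itself as the instrument.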

6 Conclusions

In this review of the recent empirical evidence on macro modelling, we have found that DSGE models based on New Keynesian principles extended to allow for banking, the ZLB and varying price duration can account well for recent macro behaviour across a variety of economies, whether large and approximately closed like the US or small and open like the UK. Related models can also account for macro behaviour in Japan and the eurozone. These models all find that a contribution from active fiscal policy increases macro stability and welfare, essentially by reducing the frequency of hitting the ZLB, and sharing the stabilisation role with monetary policy whose effectiveness under the ZLB is much reduced.