The output effects of tax changes: narrative evidence from Spain

This paper estimates the GDP impact of legislated tax changes in Spain using a newly constructed narrative record for the period 1986–2015. Our baseline estimates suggest that a 1% of GDP increase in exogenous taxes depresses output by around 1.3% after 1 year, this negative effect fading away at more distant horizons. We also find that the effects of changes in indirect taxes are larger and that, following a tax increase, investment reacts more than consumption. Overall, our set of estimates is consistent with negative output effects triggered by tax increases, yet the quantitative effects are subject to non-negligible uncertainty that is reflected in wide confidence bands, in line with the extant literature for other countries.


Introduction
The macroeconomic effects of fiscal policy have long been a matter of great importance for researchers and policymakers, and the financial crisis in general, and the public debt crisis in some euro area countries in particular, have only heightened interest in this topic. However, despite this growing interest, there is still no consensus about the economic consequences of fiscal actions. For example, in Alesina and Giavazzi (2013) the editors argue that "researchers are still deeply divided on some crucial issues such as the size (and sometimes also the sign) of fiscal multipliers." Part of this disagreement stems from the fact that measuring the impact of fiscal shocks is inherently difficult. Fiscal changes very often respond to, or are correlated with, macroeconomic conditions, so causal effects are hard to establish. The recent literature has addressed this identification problem in two main ways. First, Blanchard and Perotti (2002) estimate a structural vector autoregression (SVAR) by modeling the relationship between the reduced-form residuals and the structural shocks using external information on the output elasticity of government purchases and of taxes, and by assuming that policymakers do not react contemporaneously to output shocks. Second, the narrative approach directly identifies the fiscal policy shocks that are uncorrelated with macroeconomic conditions by establishing the motivation behind each legislated tax change. In this paper we adopt the narrative approach to estimate the output effects of tax shocks in Spain. To this end, we have constructed a detailed record of all the relevant legislated tax changes implemented during the period 1986-2015. This paper thus contributes to the literature pioneered by Romer and Romer (2010), who were the first to estimate the GDP effects of tax shocks using a quarterly narrative record, in their case for the US.
This approach has since been applied to the UK (Cloyne 2013), Germany (Hayo and Uhl 2013; Gechert et al. 2016) and Portugal (Pereira and Wemans 2015). Also, Devries et al. (2011) constructed an annual narrative series of taxes and spending for 17 OECD countries during 1978-2009 that was later used by Guajardo et al. (2014) to estimate fiscal multipliers. Building on this series, Alesina et al. (2015) estimate the effects of multi-year fiscal plans, while Alesina et al. (2015) extend the database to analyze the crisis period 2009-2013. Our work closely follows the methodology developed by this literature. First, we identify the tax measures that were more likely to be influenced by other macroeconomic shocks, in order to exclude them from the estimation of the impulse-response functions. To do so, we follow the eightfold classification developed by Cloyne (2013), who distinguishes between four types of "endogenous" tax shocks (motivated by current or prospective macroeconomic conditions) and four types of "exogenous" shocks (whose motivation is not to offset macro developments); see Sect. 2.3. As some actions are difficult to categorize, for example those adopted during the recent period of financial turmoil, we discuss in detail the rationale behind our grouping.
We then aggregate the exogenous tax changes into a quarterly series. In order to assess the independence of this series from economic conditions, we check whether it can be predicted on the basis of past macroeconomic shocks. We show that we can reject predictability in most of the tests. However, the announcement of some measures during the financial crisis appears to be correlated with past macro developments. For this reason, the baseline estimates are based on a series that excludes the tax changes adopted in the period 2008-2013. We also show that the full set of exogenous taxes (including those implemented during the financial crisis) delivers similar impulse-response functions.
Next, we estimate the GDP effects of an exogenous tax change by constructing impulse-response functions derived from simple VARs. The benchmark specification is a three-variable VAR of per capita GDP, per capita government spending, and the short-term interest rate, with the tax series included as an exogenous regressor. We find that a 1% of GDP increase in taxes depresses output by around 1.3% after 1 year, with this effect fading away at more distant horizons. We also find, as already mentioned, that including the measures adopted during the financial crisis does not significantly affect the estimates. Moreover, we show that this is also the case if we focus on tax changes aiming at increasing long-run growth or imposed by foreign institutions, which are less likely to respond to business cycle developments. Also, we find larger fiscal multipliers if we consider changes in indirect taxes and if we focus on the effect of tax shocks on investment, rather than on output or consumption. In all our results, the point estimates are subject to non-negligible uncertainty, reflected in wide confidence bands. We therefore caution against overinterpreting individual quantitative results.
All things considered, our set of estimates provides a coherent picture of negative short-term output effects triggered by tax increases (and vice versa). Overall, the effects we estimate appear smaller than previous findings of the narrative literature, and they contrast sharply with the results found for Spain thus far. In this regard, the related literature typically finds that the short-term GDP response to a positive net tax shock tends to be expansionary, a fact that those studies rationalize by noting that the revenue shock is followed by a parallel increase in government expenditure, which pushes up GDP; see de Castro (2006), de Castro and Hernández de Cos (2008), and de Castro et al. (2014). In addition, this result probably reflects the difficulty of properly identifying a net tax shock within the SVAR approach and limited sample sizes, as reflected in a number of studies with European data; see European Commission (2012). This underlines the value of the narrative record, which is aimed precisely at determining which fiscal shocks are unrelated to macro conditions and is therefore able to provide an unbiased estimate of the output effects of tax changes; see the in-depth discussions of Romer and Romer (2010) and Cloyne (2013).
The rest of the paper is organized as follows. The next section describes our narrative record and discusses the endogenous/exogenous categorization of tax measures. Section 3 shows the main results regarding the GDP effects of tax shocks, and Sect. 4 presents further results. Section 5 concludes. Supplementary material can be found in the online Appendix.

Narrative record of legislated tax changes in Spain (1986-2015)
This section describes the compilation of the legislated tax changes in Spain, the identification of the exogenous tax shocks, and the tests performed to assess their predictability given business cycle developments.

Construction of the dataset
We compiled all the legislated tax measures adopted in Spain during the period 1986-2015. In doing so, we used multiple sources, covering a wide range of reports from different government agencies. We highlight three important ones. The first is the Budget Law, which is typically approved in the last quarter of each year. This law is regarded as the most relevant bill passed by Parliament, and it usually contains the most significant fiscal actions to be implemented in the following year. The second is the annual and monthly bulletins of the Tax Agency, which contain a very detailed account of tax revenues. Moreover, they describe all the recently adopted tax changes and provide an estimate of their quantitative impact, both on an annual basis and, in the last years of our sample, on a monthly basis. The third is internal reports prepared in real time by Banco de España, containing both a description and a quantitative assessment of the revenue effects of tax measures. These reports are prepared in the context of the fiscal surveillance framework of the Eurosystem.
In order to compute the revenue impact of each measure, as is standard in this literature, we quantify the yearly change in revenues prompted by the tax change in the quarter in which it is implemented, normalized by GDP. Implementation corresponds to the first quarter in which the tax action triggers a change in tax liabilities/payments with respect to the previous year. For corporate income taxes we take into account the timing of tax payments as dictated by the extant legislation. A first payment of approximately 25% of tax liabilities (which are a function of the previous year's profits) must be made in April. A second installment is due in October (50%), and a third one in December (25%). Therefore, if a corporate tax action comes into effect in January, the yearly impact is assigned to the second quarter, as the first payment is made in April. If it comes into force after April, the yearly impact is assigned to the fourth quarter. For the personal income tax, as it is withheld at source, the yearly impact is assigned to the first quarter in which tax liabilities change. This is also the case for indirect taxes. Moreover, if the implementation of a tax measure lasts more than 1 year, we identify the set of revenue effects in each quarter in which it is implemented. We also consider the temporary/permanent nature of each tax action: for measures announced to be temporary, we compute the revenue effect and offset it with an effect of the opposite sign when the tax change is reversed. We also regard as tax shocks the failure to update excise (per unit) duties in a context of high inflation, which leads to a fall in revenue. One advantage of our dataset is that we are able to use often estimated

Overview of legislated tax changes in Spain
In this section, we provide a brief overview of the tax changes recorded in our narrative dataset for the period 1986-2015. Figure 1 plots the quarterly time series. A much more detailed account with emphasis on the motivation and the macroeconomic conditions surrounding the main tax changes can be found in Appendix A.
The first 10 years of our dataset (1986-1995) include mainly tax reforms aiming at adapting the Spanish tax code to the European regulations and complying with European Treaties. The government created the value added tax in 1986, fulfilling a requirement for the Spanish accession to the European Economic Community. Later on, it raised it twice (in 1992) in order to comply with the convergence criteria set in  This reform had a negative impact on revenue, due to the introduction of an exempt minimum income. Increases in indirect taxes (on fuel) had a significant impact on revenue in 1990 and 1991.
Following a reform of the corporate income tax in 1996, with a positive impact on revenue, the period 1997-2007 was characterized by tax decreases, stemming chiefly from revisions of the personal income tax (in 1999, 2003 and 2007) and the corporate income tax (in 2007), coupled with changes in social contributions and indirect taxes. Although small countercyclical measures were adopted in 2002 in order to tackle a deceleration of activity, these reforms targeted long-run growth, competitiveness and compliance with European standards.
In 2008, after significant signs of a slowdown in activity, the government adopted a large stimulus package of around 1% of GDP. The tax decreases spanned 2008 and early 2009. After that, increasing concerns about the budget balance led the government to change the policy stance. Tax increases were adopted in subsequent austerity packages, in December 2009, May 2010, August 2011, December 2011 and July 2012. They comprised significant increases in the personal income tax, the corporate income tax and the value added tax, as well as the suppression of a large number of deductions. This contractionary fiscal policy ended around 2013-2014, when some of the measures adopted previously still had an effect. Following a vigorous economic recovery and ahead of elections, the government decreased direct taxes in 2015 by an amount close to 1% of GDP.

Construction of the exogenous tax series
In order to estimate the impact of legislated tax changes on GDP, it is necessary to purge the tax series of tax changes that respond to current or prospective macroeconomic conditions. Failure to do so involves the risk of attributing to tax changes the effect of other shocks affecting output, thereby incurring an omitted-variable bias. The narrative literature distinguishes between "endogenous" and "exogenous" tax measures. This distinction is one of terminology rather than a strictly econometric one. The former are tax measures enacted in order to offset other macroeconomic shocks likely to affect output in the near term. They are therefore not valid for estimating the impact of tax shocks on output. Examples of such measures are a tax decrease adopted because policy makers forecast a recession and a tax increase approved in order to finance a rise in spending. Tax measures deemed exogenous are those whose motivation is not to offset current macroeconomic developments. Examples of such measures include tax cuts implemented to increase potential output and tax changes imposed by external bodies, such as the European institutions. These exogenous measures, to the extent that they are orthogonal to current or prospective macroeconomic conditions, are valid for estimating the effect of tax changes on GDP.
Cloyne (2013) provides a useful eightfold terminology on what can be considered endogenous and exogenous tax changes. We follow his guidance to construct the exogenous tax series of our narrative dataset. Our assessment of the motivation of each tax measure is based on the examination of the introductory comments of each bill, press releases, media news and different reports.
Endogenous tax changes can be classified in four categories. First, a "demand management" change, which attempts to adjust aggregate demand to offset macroeconomic fluctuations, that is, a tax measure pursuing a countercyclical goal. We include in this category two measures adopted in 2002 to counteract a slowdown in activity, the stimulus package of April 2008 and one measure adopted in 2011 to improve the activity of the construction sector. We also include in this category a corporate income tax cut approved in late 2006 but with a large impact in the second quarter of 2008, when several stimulus actions were implemented.
Second, a "supply-side" reform, that attempts to offset a shock through the supply side of the economy. One example of this category is a reduction of social contributions in March 2009, aiming at fostering the labor market.
Third, a "deficit reduction" action, that is, a legislated tax change stemming from concerns over current movements in the deficit. This category is the most difficult to delimit. Romer and Romer (2010) argue that tax measures responding to inherited budget deficits must be regarded as exogenous, as they are the consequence of past rather than current or future shocks. Cloyne (2013) distinguishes between "deficit reduction" measures, deemed endogenous, and "deficit consolidation" measures, deemed exogenous. As stated, the former includes measures triggered by concerns over current movements of the deficit or by a clear consequence of another shock. The latter includes measures adopted in order to deal with a budget deficit independently of the current macroeconomic conditions. Most of the measures taken by Spain in the period 2010-2012 clearly aimed at dealing with a growing budget deficit. Given the institutional setting of the Stability and Growth Pact, it can be argued that, at least partly, tax increases were imposed exogenously on the country, i.e. European policy makers paid less attention to GDP growth when suggesting the reforms, focusing instead on the evolution of the public deficit. On the other hand, some of the measures were taken during episodes of fiscal stress, which could have an independent effect on GDP growth. This reasoning suggests excluding from the exogenous tax series those measures adopted in periods of high financial turmoil, whereas reforms adopted to tackle the budget deficit under milder financial conditions, and especially those with long implementation lags, are in principle valid for inclusion in the exogenous tax series. As we note in the next section, however, we found that excluding all measures adopted during the financial crisis makes the tax series less predictable, which argues for dropping all tax changes of the period 2008-2013, albeit at the cost of losing the valuable information provided by the financial crisis.
Given this trade-off, our empirical strategy is to estimate the baseline impulse-response functions with a tax series that excludes the endogenous measures and the (exogenous) tax changes adopted during the financial crisis, whereas we use the exogenous series with the tax changes adopted during the crisis in order to assess the sensitivity of the results (see Sect. 4.1).
Regarding the categorization of the measures adopted during the crisis, we deemed as endogenous the austerity packages of August 2011 and July 2012. These packages were adopted with urgency under financial turmoil. Indeed, both of them were passed when the risk premium was at historical highs, see Fig. 2. We also deemed as endogenous the fiscal package of December 2009, because it consisted mainly of the removal of a stimulus measure adopted before. By contrast, we classified as exogenous the austerity packages of May 2010, December 2011 and March 2012. The package of May 2010 was adopted under financial turmoil, but it consisted mainly of expenditure measures. Also, since in most specifications we measure the impact of each tax change at implementation rather than at announcement, i.e. we abstract from anticipation effects, we include in the exogenous series two measures that were adopted under financial turmoil but implemented with a significant delay: first, a VAT increase passed in December 2009 and implemented in July of the following year, and second, the suppression of a deduction for births and adoptions, adopted in May 2010 and implemented in January of the following year. To the extent that these measures triggered a change in tax liabilities in the future, it can be considered that they did not respond to current financial conditions at the time they were implemented. The packages of December 2011 and March 2012 fall in the category of tax changes stemming from inherited deficits. The former was adopted by a new cabinet a few days after taking office, with the single goal of consolidating the public finances given an expected deviation of the budget from the target set by the European Union. The latter was approved a few weeks later when that deviation materialized. Moreover, both packages were implemented in relatively milder financial conditions: the risk premium was 150 basis points lower than the historical high reached in November 2011.
Note that, as stated before, the baseline estimates exclude all the measures adopted during the crisis, whereas the exogenous changes implemented during this period are used only in Sect. 4.1. Hence, the baseline estimates are not affected by the endogenous/exogenous categorization of the tax changes adopted during the period of financial turmoil. The fourth category of endogenous tax measures is "spending-driven" changes, aimed at financing a spending action. One prominent example is the introduction of a duty on fuels in 2002 in order to fund health expenditure.
Exogenous tax changes are likewise classified in four categories. First, "long-run" economic reforms, aiming at increasing long-run growth rather than offsetting a shock. One example is the 2003 personal income tax reform, which was motivated on these grounds. Second, "ideological" changes, stemming from philosophical reasons, such as a preference for a lower tax burden. Third, "external" changes, imposed by foreign bodies, such as the European Union. Many fiscal measures in our database correspond to this category. To name a few: the introduction of the value added tax in 1986, adopted as a requirement of the accession to the European Economic Community; two increases of this tax in 1992, passed in the context of the Maastricht Treaty; and the reform of direct taxation implemented in the mid and late 1990s, in order to adapt the tax system to European standards and help the country adopt the euro. And fourth, "deficit consolidation" measures, adopted in order to anchor credibility, independently of the current macroeconomic conditions. As already discussed, we include in this category two fiscal packages adopted in late 2011 and early 2012, whose main motivation was to comply with European rules. Other fiscal packages adopted in the period 2009-2012 were also to some extent imposed by the European institutions. However, they were taken during episodes of financial turmoil, which argues for excluding them from the estimation of the effect of tax changes on GDP, as already discussed.
[Figure caption fragment: It corresponds to the exogenous (not triggered by current or prospective macroeconomic conditions) legislated tax changes excluding also those changes adopted during the financial crisis (2008-2013). Shaded areas correspond to two negative quarters of GDP growth.]
Overall, of the 75 measures adopted in the period 1986-2015, we classify 18 as endogenous. Furthermore, of the remaining exogenous changes, 12 were adopted during the financial crisis (2008-2013). This leaves us with 44 tax shocks, which make up the narrative series that we use in the main simulations. Panel B of Table 1 shows descriptive statistics of this series, and Fig. 3 displays its timeline.
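The filtering and aggregation steps described in this section can be sketched as follows. This is a minimal illustration, not the procedure used to build the actual dataset: the record layout, the function name `quarterly_series` and all numerical values are hypothetical, and the flag marking adoption during 2008-2013 is assumed to be set by hand from the narrative record.

```python
from collections import defaultdict

# Hypothetical records (illustrative values only, not taken from the dataset):
# (implementation quarter, revenue impact in % of GDP,
#  classification, adopted during the 2008-2013 crisis?)
measures = [
    ("1992Q1", 0.30, "exogenous", False),
    ("1992Q3", 0.20, "exogenous", False),
    ("2002Q1", -0.10, "endogenous", False),
    ("2010Q3", 0.50, "exogenous", True),
    ("1992Q1", 0.10, "exogenous", False),
]


def quarterly_series(records, include_crisis=False):
    """Sum the revenue impact of exogenous measures by implementation quarter."""
    series = defaultdict(float)
    for quarter, impact, label, in_crisis in records:
        if label != "exogenous":
            continue  # endogenous measures never enter the series
        if in_crisis and not include_crisis:
            continue  # the baseline drops changes adopted in 2008-2013
        series[quarter] += impact
    return dict(series)


# Baseline series (crisis measures excluded) and the sensitivity variant.
baseline = quarterly_series(measures)
with_crisis = quarterly_series(measures, include_crisis=True)
```

Keeping the crisis flag separate from the implementation quarter reflects the fact that the exclusion is defined by when a measure was adopted, whereas its impact is dated at implementation.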

Predictability of the exogenous tax series
To assess how well our original narrative dataset has been purged of measures adopted with countercyclical motivations, we analyze the predictability of our "exogenous" tax shocks following movements of output, government spending, inflation, and the short-term interest rate. These are standard tests proposed by the narrative literature, although it must be stressed that the contemporaneous independence of each tax change with respect to other aggregate fluctuations affecting GDP cannot be tested. We perform four tests. First, a simple F-test of the joint significance of the macro covariates in a linear regression with our tax series as the dependent variable. Second, a VAR Granger causality test. Third, an ordered probit regression at the announcement date. This involves defining a dependent variable taking value −1 in the quarter a tax cut is announced, 0 if there are no tax announcements, and 1 if a tax increase is revealed, where the sign of the tax change is assessed according to its cumulative impact. Then, the predictability of tax announcements is assessed by means of a likelihood ratio test on models with and without the macro covariates. And fourth, we perform a similar likelihood ratio test but defining the dependent variable at the implementation date, rather than at the announcement date.
[Table note fragment: (2008-2013). The first row shows a linear F-test of the joint significance of the macro variables on their association with the legislated tax shocks. The second row depicts the results of a Granger-causality test. The third and fourth rows show likelihood ratio tests of the macro variables having no predictive power on the timing of legislated tax changes, at announcement and implementation dates, respectively. Macro variables include log change of GDP, government spending, inflation, and the short-term interest rate. All regressions include four lags of the macro variables as well as the tax series.]
In all tests we use four lags of the macro covariates as well as of the dependent variable. Table 2 shows the results. The exogenous tax series passes three of the four tests, see column (2). To be more precise, we find no evidence of Granger-causality between the macro variables and the legislated tax changes. Moreover, in the simplest linear specification (F-test) we cannot reject the null hypothesis that the macro variables are jointly non-significant in their association with the tax shocks, see the first two rows. Our exogenous tax series, though, fails to pass the ordered probit test at the announcement date, see the third row. That is, we find some evidence that macro developments help forecast decisions on tax changes, although not their magnitude. This casts doubt on the degree of independence of the narrative series from economic conditions, although small-sample bias may also play a role. Importantly, however, we find that these macro conditions do not have predictive power when the tax measures are evaluated at the implementation date, see the fourth row. This is somewhat reassuring, as the impact of tax shocks is computed when they trigger an actual change in tax liabilities, rather than when the bill is passed. Hence, in order to avoid confounding effects, it is crucial that tax changes at implementation do not stem from macroeconomic shocks. Moreover, it must be stressed that the announcement date reflects mostly the date when the tax bill becomes law, which can be a poor proxy for the time at which news about tax changes reaches the economy. Still, in order to delve deeper into these results, we analyze whether the predictability is influenced by the tax changes adopted during the turbulent times associated with the financial crisis. We do so by excluding the tax changes adopted during the period 2008-2013, and we find that this series comfortably passes the four tests.
This result is consistent with some tax changes being passed after macroeconomic shocks, but being implemented later on. As mentioned before, given that their impact is accounted for at the time they are implemented, this is less worrisome regarding the endogeneity of the narrative series, at least when abstracting from anticipation effects.
Given these results, and bearing in mind that the tax series enters as an exogenous variable in the VARs, we use the series without the financial crisis to estimate the baseline impulse-response functions, and we assess the sensitivity of the results to including the exogenous measures adopted in the period of high financial turmoil. As it turns out, both series deliver similar impulse-response functions.
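As an illustration of the third and fourth tests, the sketch below codes the three-valued dependent variable of the ordered probit and computes a likelihood ratio statistic from two fitted log-likelihoods. It is only a sketch under stated assumptions: the function names and the input layout are hypothetical, the sign is taken from the sum of the impacts announced in a quarter (so offsetting announcements are coded 0), and the probit estimation itself, which would require a statistics library, is not shown.

```python
def announcement_indicator(quarters, announced_impacts):
    """Code each quarter as -1 (tax cut), 0 (no announcement) or 1 (tax increase).

    announced_impacts maps a quarter label to the list of cumulative revenue
    impacts (% of GDP) of the measures announced in that quarter (hypothetical
    layout). The sign of the summed impact decides the category.
    """
    coded = []
    for quarter in quarters:
        total = sum(announced_impacts.get(quarter, []))
        coded.append((total > 0) - (total < 0))  # sign of the total: -1, 0 or 1
    return coded


def lr_statistic(loglik_restricted, loglik_full):
    """Likelihood ratio statistic: restricted model omits the macro covariates."""
    return 2.0 * (loglik_full - loglik_restricted)


# Four macro variables with four lags each are excluded under the null,
# which gives the degrees of freedom of the chi-squared reference distribution.
N_MACRO_VARS = 4
N_LAGS = 4
DEGREES_OF_FREEDOM = N_MACRO_VARS * N_LAGS
```

The same coding applies at the implementation date by passing implementation quarters instead of announcement quarters.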

Baseline specification
In this section we estimate the effect of a tax shock on GDP. We do so by estimating impulse-response functions derived from VAR models, see Favero and Giavazzi (2012). Our baseline specification is a VAR of three endogenous variables: log real per capita GDP, log real per capita government spending and the short-term interest rate. Controlling for government spending and financial conditions is important as they can play a significant role. For example, interest rates experienced a large degree of volatility during the last years of our sample, possibly affecting the dynamics of output. Regarding public expenditure, some important changes were adopted at the time of legislated tax changes and others, possibly, as a substitute for them. Therefore, these factors are likely to affect the estimated impact of tax shocks on GDP. We add as exogenous variables the narrative tax shocks and a linear trend. In this and the subsequent VARs, we include 3 lags of the tax shock as well as of the endogenous variables, following an optimal lag length analysis. The macro aggregate data are obtained from de Castro et al. (forthcoming).
The baseline VAR takes the following form: $Y_t = A_0 + A_1 t + A_2(L) Y_{t-1} + A_3(L) \tau_t + u_t$, where $Y_t$ includes real GDP, real government spending and the 3-month interest rate; $A_0$ and $A_1$ are the coefficients on the constant and the trend; $t$ is a linear trend; $\tau_t$ is the narrative tax series; $A_2(L)$ is a lag polynomial of 3 lags; $A_3(L)$ is a lag polynomial of 3 lags and the contemporaneous value; and $u_t$ is the vector of residuals. In our impulse-response functions we estimate the output effect, up to 12 quarters, of a 1% of GDP increase in tax liabilities. We compute 68 and 90% error bands by bootstrapping with 1000 replications. Panel A of Fig. 4 shows the baseline results. We find that after an increase in tax liabilities of 1% of GDP, per capita output falls by 1.3% after 1 year and then gradually recovers, reaching an almost zero effect at the end of the projection horizon. The impulse-response function is estimated with a fairly high degree of imprecision, as can be seen from the wide confidence bands, which probably stem from the relatively short time period and the small number of exogenous measures. However, the estimates suggest that tax increases trigger a significant decrease of GDP, at least in the short term. In order to compare these results with the literature, we estimate our baseline VAR for the US and the UK with the narrative series constructed by Romer and Romer (2010) and Cloyne (2013), respectively. Panel B of Fig. 4 shows that the initial fall following a tax increase is very similar to the one in the US and the UK. Nevertheless, the maximum effect in Spain is smaller, as GDP in these two countries bottoms out at lower levels: −2.1% after 7 quarters in the US and −2.6% after 10 quarters in the UK. Moreover, the UK estimates seem to fall outside the 90% confidence bands estimated for Spain. Regarding other countries, Hayo and Uhl (2013) report a GDP fall of 2.4% in Germany after around 8 quarters. Their specification is a five-variable VAR of output, tax revenues, government expenditures, the short-term interest rate and the inflation rate.
The estimates of Pereira and Wemans (2015) for Portugal imply a drop of 2.3% after 3 years when controlling for output and government spending dynamics. Therefore, the effect of taxes in Spain appears somewhat smaller than is generally found in the narrative literature. This result is suggestive, although it must be taken with caution, given that these impulse-response functions are estimated with considerable uncertainty and that the model specifications as well as the sample periods differ, which can substantially affect the comparison.
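To make the mechanics of the impulse-response functions concrete, the sketch below traces the response of the endogenous variables to a one-off unit entry in the tax series, given the coefficients on the lagged endogenous variables and on the current and lagged tax shock. It is a minimal sketch rather than the estimation code behind Fig. 4: the function names are hypothetical, the coefficients in the example are made up, the shock is assumed to be a single unit impulse, and the bootstrap error bands are not shown.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def tax_impulse_response(lag_matrices, tax_coeffs, horizon):
    """Response of Y to a one-off unit tax shock at horizon 0.

    lag_matrices: [A_1, ..., A_p], coefficient matrices on Y_{t-1}, ..., Y_{t-p}
    tax_coeffs:   [b_0, ..., b_q], coefficient vectors on tau_t, ..., tau_{t-q}
    Returns a list of response vectors for horizons 0, ..., horizon.
    """
    n = len(tax_coeffs[0])
    responses = []
    for h in range(horizon + 1):
        # Direct effect of the shock through the tax lag polynomial.
        r = list(tax_coeffs[h]) if h < len(tax_coeffs) else [0.0] * n
        # Propagation through the lags of the endogenous variables.
        for j, a in enumerate(lag_matrices, start=1):
            if h - j >= 0:
                contrib = mat_vec(a, responses[h - j])
                r = [x + c for x, c in zip(r, contrib)]
        responses.append(r)
    return responses


# Made-up one-variable example: one lag of Y, current and one lag of the shock.
A = [[[0.5]]]
b = [[-1.0], [-0.5]]
irf = tax_impulse_response(A, b, horizon=3)
```

In this toy example the response is −1.0 on impact and in the next quarter, and then decays by half each quarter. Error bands would be obtained by re-estimating the coefficients on bootstrap samples and recomputing the responses for each replication.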

Effects of two types of tax changes
We now ask whether the different types of exogenous tax changes cataloged in Sect. 2.3 have different effects on output. We classified tax changes according to three categories: long-run reforms, changes imposed by external bodies and deficit consolidation measures (we did not categorize any tax change as an "ideological" change). It must be noted, though, that the boundaries of such categories are sometimes blurred. For example, some of the convergence criteria established in the different European treaties leading to the single currency involved significant tax changes. Some of those measures were partly the consequence of actions by foreign institutions and partly driven by deficit consolidation concerns (e.g. the Maastricht Treaty). Also, some bills enacted in order to bring the Spanish tax system closer to European standards were adopted not only to fulfill an external requirement, but also with the aim of increasing long-run GDP. Given these concerns, we classified each tax change according to what we think was the main motivation of the bill, acknowledging that some judgments were inevitable. With these caveats in mind, it is worth exploring whether different types of tax changes imply different output responses. For example, Romer and Romer (2010) find negative output effects stemming from tax changes aiming at increasing long-run growth, and zero effects of deficit-driven tax rises. In this vein we compare two categories: external and long-run reforms, and deficit consolidation measures. We combine the first two categories because they quite often respond to the same underlying motivation. Moreover, they are more likely to be independent of the business cycle, so this exercise allows us to test the robustness of the results to excluding the measures most likely to suffer from endogeneity (see the discussion in Sect. 2.3).
Footnote 11 continued: labor market, the size of automatic stabilizers, the exchange rate regime, the debt level, the management of public expenditure and the administration of public revenue. On the conjunctural factors they highlight the state of the business cycle and the degree of monetary accommodation.
[Figure caption, partial: ... Fig. 3). Panel A includes measures adopted as a requirement by external bodies, such as the European institutions, and tax changes aimed at increasing long-run GDP growth. Panel B includes measures whose main goal was to improve the budget balance.]
This is of course at the cost of reducing the number of shocks and therefore leading to more imprecise estimates. To implement this exercise, we add to the baseline VAR both types of tax changes as exogenous regressors. Figure 5 shows the timeline of both categories of tax changes. There are 28 measures motivated by external bodies and long-run growth, and 17 measures motivated by the public deficit. The former are spread over the sample period, whereas the latter are concentrated in the early 1990s (note that the austerity packages adopted in 2010-2012, motivated by the developments of the budget balance, were excluded from the baseline estimates). Figure 6 shows the results. We find that both categories of tax changes render similar impulse-response functions, which to some extent resemble that of the baseline. Specifically, a tax increase imposed by external institutions or motivated by long-run growth generates a fall in output of 1.1% after 1 year, whereas measures adopted to manage the public balance imply a GDP fall of 1.6% after 3 quarters and 1.3% after 1 year, with confidence bands that comfortably encompass the former estimates. It is also worth noting that GDP recovers much faster when tax changes are adopted as a result of movements of the budget balance. This result is consistent with Romer and Romer (2010), who find zero effects of deficit-driven tax rises.
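To make the mechanics of this exercise concrete, the following minimal sketch traces the response of a single endogenous variable to a one-off unit increase in each of two exogenous tax series that share the same autoregressive dynamics. It is a scalar simplification of the VAR used in the paper, and the function name `impulse_response` and all coefficient values are hypothetical, chosen only for illustration; they are not estimates from the paper.

```python
def impulse_response(ar_coefs, shock_coefs, horizon):
    """Response of y to a one-off unit increase in an exogenous series x at h = 0,
    for the scalar model  y_t = sum_j a_j * y_{t-j} + sum_k b_k * x_{t-k} + e_t.

    ar_coefs    = [a_1, a_2, ...]  (coefficients on lags of y)
    shock_coefs = [b_0, b_1, ...]  (coefficients on current and lagged x)
    """
    response = []
    for h in range(horizon + 1):
        # Direct effect of the shock at horizon h (zero beyond the last lag of x).
        direct = shock_coefs[h] if h < len(shock_coefs) else 0.0
        # Propagation of earlier responses through the autoregressive terms.
        feedback = sum(
            a * response[h - j]
            for j, a in enumerate(ar_coefs, start=1)
            if h - j >= 0
        )
        response.append(direct + feedback)
    return response


# Made-up coefficients for illustration only (NOT estimates from the paper).
AR = [0.5]
B_EXTERNAL_LONG_RUN = [-1.0, -0.2]   # hypothetical: external / long-run measures
B_DEFICIT = [-1.5]                   # hypothetical: deficit-driven measures

irf_external = impulse_response(AR, B_EXTERNAL_LONG_RUN, horizon=4)
irf_deficit = impulse_response(AR, B_DEFICIT, horizon=4)
```

Because both tax series enter the same equation, the two responses differ only through their own shock coefficients, while the persistence of the response is governed by the common autoregressive terms.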

Effects of direct versus indirect taxes
In this subsection we analyze to what extent direct and/or indirect taxation drive the (negative) effects of taxes on GDP we have found thus far. We define changes in direct taxes as those pertaining to the personal and corporate income taxes as well as social contributions, whereas changes in indirect taxes include the value added tax and duties on specific products. 12 Of the 45 exogenous tax changes in our dataset, 22 correspond to direct taxes and 19 to indirect taxes. In terms of the quarterly tax series, out of the 30 quarters with tax changes, 18 include changes in direct taxation and 17 in indirect taxation. Regarding the motivation of the tax changes, direct tax changes are more likely to be motivated by external factors and long-run reforms, whereas indirect tax shocks are roughly evenly distributed between external and long-run reforms and deficit-driven tax changes (see Table 3). We split our tax series into changes in direct and indirect taxation, and include both variables in the VAR specification in order to take into account that they are likely to be correlated. We find that an increase in direct taxes has a smaller effect on output than an increase in indirect taxes, which has a large negative effect on GDP (see Fig. 7). As before, we raise a flag of caution against taking these results at face value, given the small sample of measures on which these estimations are performed. Moreover, we found some evidence that macro developments help predict changes in indirect taxation. Having said this, it is worth stressing that the results point towards more costly increases of indirect taxes.
Table 3 (caption): This table shows the distribution of changes in direct and indirect taxes regarding their motivation: (i) tax changes imposed by external bodies and those aiming at increasing long-run GDP, and (ii) tax changes motivated by improving the government budget. Direct taxes include the personal income tax, the corporate income tax, and social contributions. Indirect taxes comprise the value added tax and taxes on specific products.

Further results
In this section we present further results on the effect of tax policy changes. Specifically, we explore the sensitivity of the baseline estimates to including the exogenous measures adopted during the financial crisis and to accounting for anticipation effects. Moreover, we explore the effect of tax shocks on consumption and investment.

Including the exogenous tax changes adopted during the financial crisis
In our baseline estimates we excluded those tax changes adopted during the period 2008-2013, on the grounds that excluding them improved the unpredictability of the tax series, see Sect. 2.4. Nevertheless, the period of financial turmoil that followed the Great Recession provides a useful source of identifying variation and, at least conceptually, some of the tax reforms implemented at this time can be regarded as exogenous, see Sect. 2.3. In this section we explore the sensitivity of the baseline estimates to including such measures. In order to do so, we re-estimate the baseline VAR model with the new tax series. Moreover, we add a financial crisis dummy (2008-2013) in order to capture the macroeconomic turbulence surrounding this period, see Mertens (2015). Figure 8 shows that this has a small effect on the point estimates of the impulse-response function. We find that, following a tax increase, GDP falls by 1.2% after 1 year, which is 0.1 percentage points less than in the baseline. The time profile also mimics that of the benchmark results. Therefore, we conclude that the estimated negative multipliers we found in the benchmark case are robust to including the turbulent events surrounding the financial crisis.

Anticipation effects
By estimating the impact of tax shocks at the implementation date rather than at the announcement date, we assumed that agents do not react to anticipated tax shocks. This is in line with the baseline specifications of Romer and Romer (2010) and Cloyne (2013), who also show a very limited role of anticipation effects. By contrast, Mertens and Ravn (2012a) find that unanticipated tax cuts, defined as measures implemented within 90 days of becoming law, give rise to significant increases in output, consumption and investment, whereas anticipated tax cuts are associated with preimplementation drops in output and investment, and no changes in consumption. Once they are implemented, anticipated tax cuts are associated with increases in output and investment.
In this section we explore the role of anticipation effects in our data. To do so, we classify each tax shock as either surprise or anticipated. Surprise shocks are those implemented in the quarter of their announcement or in the following quarter, whereas anticipated shocks are changes in tax liabilities happening at least two quarters after their announcement. The date of announcement corresponds to the month in which the tax change is signed into law, save for tax changes embedded in the draft budget law, which are normally presented on 30 September and become law in late December. For these tax changes, the month of announcement is considered to be September. Note also that we assign to the next quarter those measures announced in the last month of a quarter. Figure 9 plots the implementation lag distribution of the tax shock series. It shows that most of the tax shocks are classified as surprise, though a non-negligible number can be regarded as anticipated.
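The dating rules above can be sketched in a few lines of code. This is only an illustration of the classification logic as described in the text: the `TaxMeasure` record, its field names and the example measures are hypothetical, and the sketch assumes that the draft budget is announced in September of the same calendar year in which the budget law is passed. Measures with a zero or negative lag are grouped with surprises, which is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class TaxMeasure:
    label: str                      # hypothetical identifier
    law_year: int                   # year the change was signed into law
    law_month: int                  # month (1-12) the change was signed into law
    impl_year: int                  # year in which tax liabilities change
    impl_quarter: int               # quarter (1-4) in which tax liabilities change
    in_draft_budget: bool = False   # True if embedded in the draft budget law


def announcement_quarter(m):
    """Quarter index (4 * year + quarter - 1) to which the announcement is assigned."""
    # Draft budget measures are dated to September rather than to the law's month.
    month = 9 if m.in_draft_budget else m.law_month
    quarter = (month - 1) // 3 + 1
    index = 4 * m.law_year + (quarter - 1)
    # Announcements in the last month of a quarter are moved to the next quarter.
    if month % 3 == 0:
        index += 1
    return index


def implementation_lag(m):
    """Number of quarters between the assigned announcement and implementation."""
    return 4 * m.impl_year + (m.impl_quarter - 1) - announcement_quarter(m)


def classify(m):
    """'anticipated' if implemented at least two quarters after announcement."""
    return "anticipated" if implementation_lag(m) >= 2 else "surprise"


# Hypothetical measures, for illustration only.
budget = TaxMeasure("budget measure", 1991, 12, 1992, 1, in_draft_budget=True)
phased = TaxMeasure("delayed measure", 1995, 6, 1996, 1)
```

Under these rules a budget measure presented in September and applied from the following January has an implementation lag of one quarter and is therefore a surprise shock, while a measure signed in June and applied from the following January is anticipated.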
[Figure caption, partial: ... i.e. the implementation lag. The date of announcement is assigned to the month the tax change becomes law, save the tax changes embedded in the draft budget law, whose announcement is assigned to September. Note also that we assign to the next quarter those measures announced in the last month of a quarter.]
We then estimate the regression model proposed by Mertens and Ravn (2012a), which includes both surprise and anticipated tax changes, as well as anticipation effects (i.e. preimplementation responses). Specifically, the VAR regresses Y_t on its own lags, on the surprise tax series and on the anticipated tax series, where Y includes the same variables as in the baseline, τ^u_t is the surprise tax series and τ^a_{t,i} measures the sum of all anticipated tax liability changes known at date t to be implemented at date t + i. Hence, the coefficients associated with τ^a_{t,i} for i > 0 account for anticipation effects of the anticipated tax changes. As before, we include three lags of the endogenous variables and three lags and the contemporaneous value of the narrative series (both surprise and anticipated). The maximum anticipation horizon K is set to 2 quarters. Figure 10 shows the results. Panel A plots the GDP response to a 1% of GDP increase in surprise taxes, whereas panel B plots the response to an increase in anticipated taxes. Regarding the former, an unexpected tax increase triggers a fall in GDP of 1.3% after 2 quarters, a response that is close to the baseline estimates presented in Fig. 4. By contrast, panel B shows that an increase in anticipated taxes yields no effect on output after being implemented, with the 68% confidence bands comfortably encompassing the zero effect. This panel also suggests that there is a small preimplementation fall in output after an anticipated tax increase, of around 0.5% of GDP. However, we found that this result is not robust to extending the anticipation horizon to 3 or 4 quarters, hence one must exercise extra caution when reading it.
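For reference, a specification consistent with the verbal description above, and with the general form used by Mertens and Ravn (2012a), can be sketched as follows. The coefficient matrices A_j, B_j, C_j and F_i are labels introduced here for illustration, and any constant or other deterministic terms are left out because the text does not describe them; this should be read as a sketch rather than as the exact equation estimated in the paper.

```latex
Y_t = \sum_{j=1}^{3} A_j \, Y_{t-j}
    + \sum_{j=0}^{3} B_j \, \tau^{u}_{t-j}
    + \sum_{j=0}^{3} C_j \, \tau^{a}_{t-j,0}
    + \sum_{i=1}^{K} F_i \, \tau^{a}_{t,i}
    + e_t , \qquad K = 2 .
```

In this notation the terms in B_j and C_j capture the responses to surprise and anticipated tax changes once they are implemented, while the terms in F_i capture the preimplementation (anticipation) responses.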
All in all, this exercise provides some evidence that anticipation effects play a role in determining the response of output to tax increases, with surprise tax changes being those that trigger a GDP movement.
We carried out two additional exercises with regard to anticipation effects. First, some of the tax measures were explicitly legislated to be temporary. These measures, as opposed to permanent tax liability changes, would trigger a milder reaction if agents follow the permanent income hypothesis. We therefore re-estimated our baseline VAR excluding these temporary measures, which implies the suppression of 7 exogenous tax measures. The estimated effect is slightly lower than the baseline. After an increase in taxes, output falls by 1.0% after 4 quarters, which is 0.2 percentage points less than in the baseline, gradually converging toward zero from that quarter on. Nevertheless, inspecting the confidence bands of these and the baseline estimates, we conclude that these differences are not statistically significant. Second, we analyzed the output effects of tax changes at the announcement date, rather than at the implementation date. Specifically, we computed the cumulative yearly revenue effect of each tax change and assigned it to the date of announcement. 13 We then estimated the effect on output of this new tax series. We found lower effects than in the baseline estimates. GDP falls by 0.5% in the first 2 quarters and rapidly converges towards zero and even positive estimates, with the one-standard-error confidence bands encompassing the zero effect at all horizons. Therefore, this exercise suggests that the output effects of tax changes are stronger when they are implemented than when they are announced. It must be noted, though, that the announcement date is subject to measurement error, as agents may well anticipate tax changes before legislation is passed. Therefore, one must be cautious in interpreting this result.
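The construction of the announcement-date series can be illustrated with a short sketch. It reflects one reading of the procedure described above: the revenue effects of each measure are summed and the total is placed in the announcement quarter, so that a temporary measure, whose introduction and expiry cancel out, drops out of the series. The function name, the data layout and the example figures are hypothetical.

```python
from collections import defaultdict


def announcement_series(measures):
    """Build a tax series dated at announcement.

    Each measure is a dict with:
      'announced'       -> (year, quarter) of announcement
      'revenue_effects' -> list of changes in tax liabilities (% of GDP),
                           one entry per implementation step
    """
    series = defaultdict(float)
    for m in measures:
        cumulative = sum(m["revenue_effects"])
        # Temporary measures net out to zero and are therefore excluded.
        if abs(cumulative) < 1e-12:
            continue
        series[m["announced"]] += cumulative
    return dict(series)


# Hypothetical measures, for illustration only (not taken from the dataset).
example = [
    {"announced": (1995, 3), "revenue_effects": [0.25, 0.25]},   # phased-in increase
    {"announced": (1996, 1), "revenue_effects": [0.5, -0.5]},    # temporary measure
]
result = announcement_series(example)
```

In this example only the phased-in measure enters the series, with its full effect of 0.5% of GDP dated at the third quarter of 1995.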

Effects of tax changes on consumption and investment
In this section we analyze the effects of tax shocks on private consumption and investment by adding these two variables to the baseline VAR described in Eq. (1). 14 We find that, following a tax increase, both consumption and investment fall in the short term and then recover to their original levels at longer horizons. Consumption decreases by 0.9% after 1 year, whereas the fall in investment is much sharper: 4% after 1 quarter and 3.5% after 1 year (Fig. 11). Again, there is considerable uncertainty surrounding these estimates but, overall, they suggest that following a tax increase, investment reacts more than consumption. This result is in line with similar findings by Romer and Romer (2010) and Mertens and Ravn (2012a) for the US and Cloyne (2013) for the UK.
13 Note that this procedure implied the exclusion of temporary measures, whose cumulative effect is zero.
14 In this case, the information criteria suggest a lag order of 2. The model is a five-variable VAR of GDP, consumption, investment, government spending, and the short-term interest rate, where the narrative tax series is added as an exogenous variable.
[Figure caption, partial: 68 and 90% bootstrapped error bands are depicted in gray areas.]

Conclusions
This paper makes two contributions. First, it presents a newly constructed narrative dataset of legislated tax changes adopted in Spain during the period 1986-2015. Second, we use the tax measures whose motivation is not to offset macro shocks in order to estimate the GDP impact of tax changes. In this regard, this paper can be framed within an emerging literature that applies the narrative approach to assess the impact of tax changes on output. This literature was started by Romer and Romer (2010) and continued with further applications for the US and a few European countries. The use of narrative methods provides a credible source of identification by overcoming the traditional problem of finding a source of exogenous variation in tax policies. Overall, our estimates point towards negative effects from tax increases in Spain. Our baseline result shows that following a 1% of GDP increase in taxes, output falls by 1.3% after 1 year, with this negative effect fading away over time. Focusing on changes in indirect taxes yields a larger fall in output. Also, following a tax increase the reaction of investment is larger than that of consumption. We note that the estimates are subject to non-negligible uncertainty, reflected in wide confidence bands.
The narrative literature applied to tax policy has experienced significant developments in recent years. For example, important contributions have been made on regime-dependent multipliers (Auerbach and Gorodnichenko 2012) and on reconciling the results obtained from narrative vs. SVAR approaches, see for example Favero and Giavazzi (2012) and Mertens and Ravn (2012b). We think that further research can bring the new narrative dataset for Spain to these frameworks in order to improve the estimation of the impact of tax shocks. Given the protracted euro area public debt crisis and the lingering fiscal consolidation needs in several countries, understanding the effects of fiscal policy on macroeconomic developments remains a crucial issue in order to promote growth and achieve fiscal sustainability.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.