It’s a matter of confidence. Institutions, government stability and economic outcomes

In this paper we analyse the effect of constitutional structures on policy outcomes. In particular, we exploit the heterogeneity in parliamentary systems deriving from the presence and the use of the confidence vote to investigate whether stable and unstable parliamentary systems behave differently in terms of the policies they implement. This finer partition of parliamentary systems allows us to identify effects that are more robust than the ones previously discussed in the literature. We show that the difference between presidential and parliamentary systems documented in previous works is driven by a difference between presidential and stable parliamentary systems. We suggest that possible transmission channels are legislative cohesion and (the absence of) selection.


Introduction
Over the last decade the political economy literature has focused on the impact that political institutions have on national policies (Persson 2002) and the seminal work of Persson and Tabellini (2003) has shown that institutions, namely political regimes, do matter in shaping the size and the composition of government spending.
While their partition of institutional frameworks into parliamentary and presidential is the most intuitive, it does not lead to results that are robust to changes in, for example, the set of countries or the time span, as Blume et al. (2009) have highlighted.
We suggest that policy decisions, and government spending among them, depend on how political actors respond to the incentives put in place by the institutional system in which they operate. In other words, the equilibria that emerge from the political process depend on the interaction of the institutional mechanisms defined by the constitutions with the underlying characteristics of the political environment.
We therefore focus on the presence of the confidence vote, 1 traditionally the main element of distinction between parliamentary and presidential regimes, and exploit its effective use as a way to account for the interaction between institutions and other characteristics that may influence policy implementation. In some cases, the confidence vote does indeed generate frequent changes of government, thus replacing possibly bad politicians and generating a different government composition (selection effect). In some others instead, the confidence vote acts as a credible threat and may induce either the executive to behave better (disciplining effect) or the parliament to accept more frequently the executive's misbehaviour (legislative cohesion). Hence, the performance of parliamentary systems may depend on politicians' characteristics such as, for example, the quality of the information available and/or the alignment of their interests with the citizens.
The de facto implementation of a constitutional feature (the confidence vote) allows us to highlight the greater heterogeneity of parliamentary systems. Given this complexity, we investigate more deeply the characteristics of countries that adopt a parliamentary constitution by considering the stability of the government as a proxy to distinguish different parliamentary systems (Lijphart 2004). We measure stability as inversely related to the frequency of government changes, which is clearly correlated with the effective use of the confidence vote.
Dividing parliamentary systems into stable and unstable ones responds to the suggestion made by some authors (e.g., Acemoglu 2005; Voigt 2011) that the distinction between parliamentary and presidential systems may simply be too coarse to robustly explain the differences in policy outcomes that we observe.
We contribute to the literature that started with Persson and Tabellini (2003). In their paper, the authors compare constitutional systems-presidentialism vs. parliamentarism-and electoral rules-majoritarian vs. proportional-in order to identify the differences, if any, in a number of relevant social and economic indicators. 2 Without doubting their seminal role, a large number of follow-up articles challenged their results. Blume et al. (2009) found that, while the results on the effect of the electoral rules are robust, the effects of the parliamentary vs. presidential constitutional choice are sensitive to an enlargement of the dataset and the updating of the economic indicators used as regressors.
Moreover, Acemoglu (2005) and Voigt (2011), among others, questioned the presidential/parliamentary classification of constitutional structures in a twofold manner. On the one hand, they claim that the constitutional form of government is endogenous, being an equilibrium outcome rather than an exogenous characteristic. On the other hand, they advocate a finer partition of countries, taking into account the heterogeneity within each group.
The aim of this paper is to verify whether our partition, which is based both on formal institutions (presidential vs. parliamentary) and their actual functioning (stable vs. unstable), may be one of such finer partitions missing in the existing literature. As a matter of fact, our partition stems from the equilibria emerging from the interaction between institutional and other political elements and therefore could shed light on potential transmission channels generating them, thus further contributing to the existing literature. Our main result is that our classification of constitutional systems (presidential, stable parliamentary, unstable parliamentary) delivers results that are more robust than those in the literature. More precisely, we find that stable parliamentary systems are significantly different both from presidential and unstable parliamentary ones. Unstable parliamentary systems and presidential systems, instead, behave alike in terms of the policy they implement. This result is robust to changes in the set of countries included in the dataset and in the definition of stability. Moreover, our analysis tries to address the endogeneity issue raised by Acemoglu (2005) and Voigt (2011), by using a novel instrument.
In Sect. 2 we suggest how our findings may be explained by two effects that are consistent with previous theoretical literature: a selection effect, as in Cella et al. (2017), and a legislative cohesion effect, as in Baron (1998), Diermeier and Feddersen (1998a, 1998b) and Diermeier and Vlaicu (2011). Section 3 presents the data and the empirical strategy, Sect. 4 discusses the results and lists robustness checks and Sect. 5 concludes.

Theoretical background
The idea that further partitioning parliamentary systems may provide insights about the link between institutions and economic policy finds its support in several theoretical papers. As a matter of fact, while the interplay of uncertainty and incentives operates unambiguously in presidential systems and generally gives rise to unique equilibria, parliamentary systems are more heterogeneous and can support multiple equilibrium strategies and induce multiple equilibrium outcomes (see among others Cella et al. 2017). For example, outcomes of parliamentary systems may differ depending on whether the confidence vote is mostly used as a threat or whether it effectively replaces executives. This refers to the de facto behaviour of politicians that in turn depends on their (short- or long-term) incentive structure. In this context, the confidence vote may affect government duration since a legislative defeat would lead to new elections that will replace the executive and the legislative body with positive probability. On the contrary, in presidential systems both bodies have fixed terms and government stability is not affected by the working of the policy process. 3 In other words, under presidentialism, politicians face undistorted incentives and vote according to policy preferences.
In particular, the literature identifies three channels through which the confidence vote may affect the policy-making process in parliamentary systems. First of all, if the confidence vote is actively used to replace politicians, it improves their expected quality (selection effect, see Cella et al. 2017; Huber and Gallardo 2008). If instead the confidence vote acts as a threat, it may either reduce the distortions to the executive's behaviour (disciplining effect, see Cella et al. 2017; Huber 1996), or induce voting cohesion in parliament (legislative cohesion, see Baron 1998; Diermeier and Feddersen 1998b; Diermeier and Vlaicu 2011). Cella et al. (2017) highlight this twofold nature of the confidence vote. They model an executive and a legislative body in a parliamentary system where politicians may face early elections if the parliament does not approve the executive's proposed policy. Politicians can be of two types: they either care about implementing the efficient policy or they only care about being in office. The authors show that in such a setting two equilibria may arise, depending on the parameters that describe politicians' quality (type distribution) and information. The confidence vote may act as a threat and induce an office-oriented executive to propose the efficient policy in order to prevent early termination of the legislature. In the presence of this disciplining effect, stable systems are characterised by a low level of inefficiency. Alternatively, the confidence vote may be used in equilibrium to replace possibly bad politicians. As, on average, office-oriented politicians are replaced more often, the expected quality of the executive improves. Hence, the selection effect operates more efficiently in unstable systems, in which we should observe a better alignment of the executive's and voters' preferences.
In a model of parliamentary democracy, where the government controls the legislative agenda, Baron (1998) shows instead that government members may change policies regarding government spending to preserve the government and may also seek support from the minority in parliament. In other words, the legislative cohesion effect contributes to the stability of parliamentary systems through the approval of a larger fraction of policies aimed at keeping current politicians in power.
3 As noted by Diermeier and Vlaicu (2011): "Under presidentialism the policy process is driven by short-term issue-by-issue incentives because there are only short-term consequences of an unsuccessful proposal. In parliamentary systems, on the other hand, the failure of a policy proposal can lead to a change in the composition of the governing coalition. This injects political incentives in the policy process whereby coalition members consider both their short-term policy interests and their long-term political interest".
The co-existence of these effects implies that parliamentary systems are more heterogeneous than presidential ones, given that their response to the policy implementation process and their degree of stability may vary depending on their characteristics even for a given set of constitutional rules. Moreover, stability and policy response are theoretically correlated, so that it is meaningful to use stability as a proxy to refine the classification of parliamentary systems. The focus of our paper is to exploit empirically this de facto heterogeneity.
Starting from these considerations, our paper tries to shed light on the complex mechanisms that link constitutional features to economic outcomes. We compare the effects that constitutional structures have on the policy-making process, adopting a partition of parliamentary systems that takes into account their degree of stability. In particular, the disciplining effect would imply that the difference between presidential and parliamentary structures should derive from unstable parliamentary systems. By contrast, both the legislative cohesion and the selection effects would imply that stable systems are the ones that differ most from presidential ones. Hence our approach allows a better understanding not only of the existence of the link between institutions and policies but also of the underlying mechanisms that generate it.

Empirical strategy
We begin our analysis by introducing our refined classification of countries, which, for parliamentary systems, takes into account the level of stability. To do so we create a stability index, gov life, which is the average length of any electoral cycle, computed for each country i, normalized by the legal length of the electoral cycle, over the period 1975-1989. 4 We limit the time span to 1989 in order not to overlap with the dependent variables, which were originally built by Persson and Tabellini as averages over the period 1990-1998. 5 The index gov life ranges from zero to one, with higher values corresponding to higher stability. 6 Thus, we introduce our finer classification of countries that further partitions parliamentary systems into stable and unstable ones. We classify parliamentary countries as parl stab if their value of gov life is above the median of the stability distribution, and as parl unstab if their value of gov life is below the median. 7
4 More precisely, the index gov life is built using the indicator yrcurnt from the World Bank Database of Political Institutions (DPI 2012). The indicator is coded as follows: zero in the election year, and $L_l - k$ when $k$ years have passed since the election, where $L_l$ is the legal length of any electoral term according to country-specific constitutional rules. From the indicator yrcurnt, for every legislature $l$, we build the variable $D_l$, which characterises the number of years a government has been in office between two elections. The stability index gov life is then defined by Eq. (1), where $E_l$ is a dummy which indicates elections.
In the empirical analysis, we focus on two separate datasets. The first one is the dataset used in Persson and Tabellini (2003) (hereafter PT dataset). The dataset is composed of 85 countries, and it includes data on a set of economic and social indicators. 8 We do not update the variables in the PT dataset as we use it to evaluate how our specification compares to the original one.
The second one is our general dataset (BCIM dataset). It is obtained from the dataset used in Blume et al. (2009), which extends the PT dataset to include 116 countries.
The descriptive statistics of the dependent and independent variables that we consider are in Table 1.
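As an illustration of the stability index and the median split described in this section, the following Python sketch computes a gov life value from a list of election years and then assigns parliamentary countries to the two groups. It is a minimal sketch under stated assumptions, not the authors' procedure: the index is taken to be the mean over legislatures of $D_l / L_l$ (capped at one), the function names gov_life and classify are hypothetical, the countries A, B and C are invented, and countries within 0.05 of the median are simply left unclassified.

```python
from statistics import mean, median


def gov_life(election_years, legal_term):
    """Average realised term length divided by the legal term length.

    Assumed form (not taken from the paper): the mean over legislatures of
    D_l / L, where D_l is the number of years between two consecutive
    elections and L is the legal term length, capped at 1.
    """
    durations = [b - a for a, b in zip(election_years, election_years[1:])]
    return mean(min(d / legal_term, 1.0) for d in durations)


def classify(indices, band=0.05):
    """Split parliamentary countries at the median of the stability index.

    Countries within `band` of the median are left unclassified (None),
    mirroring the exclusion of countries too close to the median.
    """
    med = median(indices.values())
    labels = {}
    for country, value in indices.items():
        if abs(value - med) <= band:
            labels[country] = None
        elif value > med:
            labels[country] = "parl stab"
        else:
            labels[country] = "parl unstab"
    return labels


# Hypothetical parliamentary countries with a 4-year legal term.
indices = {
    "A": gov_life([1975, 1979, 1983, 1987], legal_term=4),        # full terms
    "B": gov_life([1976, 1978, 1979, 1982, 1984], legal_term=4),  # early elections
    "C": gov_life([1975, 1978, 1982, 1986], legal_term=4),        # one short term
}
labels = classify(indices)
```

In this toy example country C sits at the median and is dropped, A is classified as stable and B as unstable. With real data the inputs would come from the yrcurnt indicator rather than from hand-entered election years.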
Recall that our analysis focuses on the role of stability in the policy differences between presidential and parliamentary systems. Specifically, we investigate whether the differences found by Persson and Tabellini (2003) are homogeneously driven by stable and unstable parliamentary countries. We therefore first run a set of regressions described by Eq. (2), estimated through OLS:
$$Y_i = \alpha + \beta_1\,\text{parl stab}_i + \beta_2\,\text{parl unstab}_i + \gamma X_i + \varepsilon_i \qquad (2)$$
where pres is the baseline category that represents the control group in our setting.
Dependent variables The main dependent variables are central government expenditure (cgexp) and central government revenues (cgrev), computed as a percentage of GDP, and averaged over the period 1990-1998. 9
Independent variables The main explanatory variables are the dummies that classify countries according to the stability index, as discussed above (pres, parl stab and parl unstab). The controls include: maj, which is equal to one for majoritarian electoral rules and zero for proportional ones; indicators for the continental location, such as OECD (oecd), Central and Latin America and the Caribbean (laam), Africa (africa), South and Central Asia (asiae); indicators for the colonial history, such as col_espa (the country is a former colony of Spain or Portugal), col_uka (the country is a former English colony) and col_otha (the country is a former colony of a country other than England, Spain and Portugal); 10 democracy level (gastil); per capita income (lyp); proportion of people aged 15-64 (prop1564) and over 65 (prop65); population size (lpop); and a dummy indicating a federalist system (federal).
5 Our strategy does not take into account that structural changes in stability due to constitutional changes may have happened during the time span in which we compute the dependent variable, i.e. 1990-1998, and therefore after the last year we consider for the independent variables (i.e. 1989). In order to take this into account, we measured the (in)stability of parliamentary systems using the index gov life over the period 1975-1998, instead of 1975-1989, thus including all the years that are relevant for the dependent variable. Results, which are available upon request, are qualitatively identical to those we report in this paper, thus suggesting that the constitutional changes that may have happened in the period 1990-1998, on average, do not affect our analysis.
6 As a robustness check, we introduce different measures of stability in Sect. 4.2.
7 More precisely, we first drop three countries which are within 0.05 points from the median of the stability distribution in order to avoid a random assignment of countries due to measurement errors.
8 For a detailed list of variables and sources, see Persson and Tabellini (2003).
9 We consider additional dependent variables in Sect. 4.2 to provide robustness checks.
For comparison, we also replicate the following specification used by Persson and Tabellini (2003) and Blume et al. (2009):
$$Y_i = \alpha + \beta\,\text{pres}_i + \gamma X_i + \varepsilon_i \qquad (3)$$
where the only difference from specification (2) is that countries are partitioned into presidential and parliamentary countries only, and parl is the omitted category.
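To make the role of the omitted category concrete, the sketch below fits a stripped-down version of specification (2) on invented data. It is only an illustration under stated assumptions: there are no controls $X_i$, the six observations are hypothetical, and the helper functions solve and ols are written here for self-containment rather than taken from any statistical package.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS coefficients obtained from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical observations: (regime, government expenditure as % of GDP).
data = [
    ("pres", 20.0), ("pres", 24.0),
    ("parl stab", 30.0), ("parl stab", 34.0),
    ("parl unstab", 22.0), ("parl unstab", 26.0),
]

# Design matrix: intercept, parl stab dummy, parl unstab dummy (pres is omitted).
X = [[1.0, float(g == "parl stab"), float(g == "parl unstab")] for g, _ in data]
y = [v for _, v in data]
alpha, beta1, beta2 = ols(X, y)
```

With dummies only, the intercept recovers the mean of the omitted presidential group (22 in this toy data), while the two coefficients recover the gaps of the stable (10) and unstable (2) parliamentary groups relative to it, up to floating-point rounding. This is the sense in which each coefficient in Eq. (2) is read against the presidential baseline.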
IV approach The OLS analysis may be biased due to the possible endogeneity of our classification of constitutional systems (Acemoglu 2005). As a matter of fact, political institutions may be shaped by the same factors that determine policies, thus making them endogenous. Therefore, we also perform our analysis using an instrumental variables (IV) approach (Sect. 4.1).

Results
We begin our analysis with a preliminary look at Table 1, which reports the descriptive statistics of the dependent and independent variables, split by constitutional groups. We note that both government size and government revenues are significantly smaller in presidential than parliamentary countries. This is also true for the share of welfare spending in the total government size. Instead, the variation within parliamentary systems is less striking, and the only conclusion we can draw at this stage is that the average government revenue is significantly higher in stable parliamentary countries than in unstable ones.
We then compare our refined classification with the standard analysis by Persson and Tabellini (2003) and Blume et al. (2009). In order to do so, we first run regressions (2) and (3) on the PT dataset using central government expenditure and central government revenues as dependent variables (Table 2). Columns (1) and (3) replicate the standard results in the literature, namely that presidential systems spend systematically less than parliamentary ones, regardless of the chosen measure of the government size.
Columns (2) and (4), instead, show that the difference between constitutional systems is driven by the subgroup of stable parliamentary countries. Indeed, the coefficient of parl stab, $\beta_1$, is statistically significant in both regressions. On the contrary, the coefficient of parl unstab, $\beta_2$, is never significantly different from zero, that is, we cannot reject the hypothesis that unstable parliamentary systems behave like presidential ones. A test for $\beta_1 = \beta_2$ shows that we can reject the null hypothesis in both cases (p-values of 0.013 and 0.002 respectively); thus, stable parliamentary systems spend significantly more than unstable ones, regardless of the chosen measure of expenditure.
Notes to Table 1: Entries in columns (3), (4) and (5) are mean values for constitutional categories when using the BCIM extended dataset. Entries in columns (6), (7) and (8) are p values. p(x,y) is the t-test on the probability of falsely rejecting equal means across groups corresponding to columns x and y, under the assumption of equal variances. Column (9) is the probability of falsely rejecting equal means across the original PT dataset and the BCIM extended dataset, under the assumption of equal variances.
10 All the variables related to the colonial history are weighted for the years of independence as follows: col_uka = col_uk * (250 − t_indep)/250, where col_uk = 1 is a dummy indicating a former English colony, t_indep ∈ [0, 250] are the years of independence and 250 is used as the standard value for all non-colonized countries. The same holds for col_espa and col_otha.
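The weighting of the colonial-history variables in footnote 10 can be illustrated with a short worked example. The function name weighted_colony and the example values are hypothetical; only the formula col_uk * (250 − t_indep)/250 comes from the text.

```python
def weighted_colony(is_former_colony, years_independent, horizon=250):
    """Colonial dummy discounted by years of independence.

    Implements col * (horizon - t_indep) / horizon with t_indep in [0, horizon].
    """
    if not 0 <= years_independent <= horizon:
        raise ValueError("years_independent must lie in [0, horizon]")
    return int(is_former_colony) * (horizon - years_independent) / horizon


# Hypothetical cases.
recent = weighted_colony(True, 50)      # former colony, independent for 50 years
never = weighted_colony(False, 250)     # non-colonized country, standard value 250
```

A former colony that has been independent for 50 years receives a weight of 200/250 = 0.8, so more recent colonial ties count more heavily, while a non-colonized country receives zero.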

Let us now comment on the effects of the other control variables. First, we note that countries with a majoritarian electoral system (maj) spend systematically less than those with a proportional one, and this is consistent with the previous literature (see Persson 2002). The same effect holds for federalist countries. The variable gastil, which is inversely related to the quality of the democracy, has a negative effect on both government size and revenues, as expected. On the contrary, expenditure and revenues increase with population (even though the level of significance varies depending on the econometric specification) and with the share of citizens over 65. Geographical dummies, and those indicating colonial history, are only weakly correlated with government expenditure and revenues. Note that the covariates' magnitude, sign and significance are only marginally affected by our specification of the model, which suggests that we do not increase correlation between independent variables when we introduce our finer classification of constitutional systems.
We then run the same set of regressions on the extended BCIM dataset to check the robustness of our approach (Table 3). As shown by Blume et al. (2009), and replicated in columns (1) and (3), the difference between constitutional systems in the traditional classification is no longer significant, even though the coefficients retain the same sign.
Columns (2) and (4) display the results of our specification. Results of Table 2 prove to be robust to changes in the dataset. First, the coefficient $\beta_1$ is still significantly different from zero, showing that stable parliamentary systems spend more than presidential ones. Moreover, $\beta_2$ is not significantly different from zero, that is, we cannot distinguish the performance of unstable parliamentary systems from presidential ones. Finally, we can reject the hypothesis that $\beta_1 = \beta_2$ (p values are 0.049 and 0.009, respectively). In other words, our analysis strongly suggests that parliamentary systems are not a homogeneous group, so that refining their classification is a modelling improvement.
Moreover, as in Blume et al. (2009), the effect of the electoral system remains robust to the change in the dataset. 11 The same is true for the dummy indicating a federalist system and the proportion of people over 65. Again, we highlight that our classification of countries does not interfere with other covariates.

Instrumental variables
As discussed in Sect. 3, our model may give rise to issues of endogeneity, due to omitted variables. This may be particularly relevant given the cross-section nature of our dataset, as we are not able to include fixed effects at country-level to control for time-invariant country-specific characteristics. 12 Hence, we replicate our baseline analysis with an IV approach, as previously done in the relevant literature (Persson and Tabellini 2004; Voigt 2011).
11 Previous literature has analysed the interplay between constitution and voting system, showing that countries adopting presidential constitutions and majoritarian electoral systems spend, on average, 10% less than parliamentary countries with proportional voting (Persson and Tabellini 2004). We have replicated the same analysis in order to verify if our setting provides novel insights also with respect to the interplay between constitution and voting system. Results, which are available upon request, confirm the findings of the previous literature. In particular, the electoral rule may either mitigate or exaggerate the difference between constitutional systems, depending on whether a parliamentary country adopts a majoritarian or a proportional rule, respectively. As a result, the highest government expenditure is observed in stable parliamentary countries adopting a proportional voting system.
We introduce a selection equation, Eq. (4), where we let the constitutional choice $C$ depend on a set of observable characteristics. Subscripts $i$ and $j$ refer to individual observations and outcome alternatives respectively (where $j = 0, 1, 2$ indicates presidential, stable parliamentary and unstable parliamentary systems), $X_i$ is the same set of controls entering Eq. (2), and $Z_i$ is the vector of instruments.
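For orientation, a selection equation of this kind is often written in latent-index form. The expression below is one formulation consistent with the description above and should be read as an assumption, not as the authors' Eq. (4): the coefficient vectors $\delta_j$ and $\theta_j$, the latent index $C^{*}_{ij}$ and the error term $u_{ij}$ are notation introduced here for illustration.

```latex
C^{*}_{ij} = X_i \delta_j + Z_i \theta_j + u_{ij},
\qquad
C_i = \arg\max_{j \in \{0, 1, 2\}} C^{*}_{ij}
```

Under this reading, country $i$ is observed in the constitutional category $j$ with the highest latent index, and the instruments $Z_i$ enter the selection equation only.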
Our analysis differs from the IV strategy adopted by Persson and Tabellini (2004) in two ways. First, we estimate the entire model using the Conditional Mixed Process (CMP), developed by Roodman (2011). In our framework, it is a limited information maximum likelihood (LIML) estimator, which replicates standard IV intuitions but considers both the headline (2) and the selection (4) models as a joint system of equations. This method may potentially generate a gain in efficiency because (i) it directly accounts for the non-continuous nature of the endogenous variables (i.e., the type of constitution), and (ii) it considers potential linkages between the error processes of headline and selection equations. 13 In detail, we estimate the selection equation by a multinomial probit model, and the headline regression through OLS. In both equations, our constitutional setups enter as a single categorical variable, as defined above. However, we recognize that little is known empirically about the actual performance of the CMP estimator with a multinomial first stage; as a result, its behaviour may be unpredictable. Thus, we replicate the IV analysis using alternative approaches: a standard 2SLS, and the H-FUL estimator by Hausman et al. (2012) that allows for heteroskedasticity. 14 In these models, we include two binary endogenous variables, parl stab and parl unstab, in the first stage. This exercise allows us to test the robustness of our IV analysis, and to provide standard diagnostic tests on the validity of the instruments. Results are reported in Table 6, in Appendix A.
12 We acknowledge that the use of panel data techniques would make the empirical analysis more robust. However, there is in general very little variation of constitutional systems at country level, and our data are no exception to this. According to the Database of Political Institutions, issued by the World Bank, among the 116 countries we include in our analysis, in the period 1975-1989, there are only 24 cases in which countries have changed their constitutional system, out of 1,326 observations available, thus representing only 1.81% of total observations. If we extend the time span to the entire period potentially under investigation in our paper (till 1998), we have 35 constitutional changes out of 2,229 observations, 1.57% of total observations. This limits the possibility to run panel data models, as already stressed by several authors (Gurr 1975; Persson and Tabellini 2004) who have defined the constitution as the iron law, due to the typical constitutional inertia (Persson and Tabellini 2004, p. 28).
13 For more information about the econometrics of the CMP approach, see Roodman (2011). Bound et al. (1995), Baum (2014), Rivers and Vuong (1988) and Wooldridge (2010) discuss in detail why ML estimators may be more efficient than standard two-stage techniques when the IV approach involves non-continuous variables. This approach has been extensively adopted in the empirical literature (see for example Perez and Sanz 2005; Petreski et al. 2014; Suarez-Varela et al. 2015).
14 H-FUL has been proposed by Hausman et al. (2012) to deal with issues raised by the presence of many instruments in heteroskedastic data. The proposed solution is a Fuller (1977) like estimator with standard errors that are robust to heteroskedasticity and many instruments. Recently, Anatolyev and Skolkova (2019) have made available a Stata command that performs the H-FUL estimator: mivreg.
Second, we modify the set of instruments to make the analysis more robust. We take into account both Persson and Tabellini's (2004) justification of their instruments, and the critiques discussed in later works (see in particular Acemoglu 2005). Thus, we maintain in our set of instruments only some of those adopted by Persson and Tabellini (2004).
Specifically, we include in $Z_i$ the variables con2150, con5180 and con81, respectively dating the origin of the constitution between 1921 and 1950, between 1951 and 1980, and after 1981, with before 1921 as the omitted category. These instruments are motivated by the fact that waves of adoptions of specific types of constitution have historically been observed, and this guarantees the correlation with the constitutional choice. The exclusion restriction is instead supported by the time lag that separates the constitutional dating variables from more recent policy outcomes. Unlike Persson and Tabellini (2004), we do not include the Hall and Jones (1999) variables as instruments (i.e., engfrac, eurfrac, lat01, which indicate the fraction of the population speaking major European languages as native tongues and the distance from the equator, respectively), as their validity has been widely questioned in the literature (see Acemoglu 2005; Rockey 2012), and by Persson and Tabellini themselves (Persson and Tabellini 2004, p. 37).
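The constitutional dating instruments amount to a simple recoding of the adoption year, which can be sketched as follows. The function name constitution_dummies is hypothetical, and two choices are assumptions made for this illustration: the interval bounds are treated as inclusive, and con81 is taken to include 1981 itself.

```python
def constitution_dummies(adoption_year):
    """Map the year of origin of the constitution to the three dating dummies.

    Years before 1921 form the omitted category, so all three dummies are zero.
    Assumption: bounds are inclusive and con81 covers 1981 onwards.
    """
    return {
        "con2150": int(1921 <= adoption_year <= 1950),
        "con5180": int(1951 <= adoption_year <= 1980),
        "con81": int(adoption_year >= 1981),
    }


# Hypothetical adoption years.
old = constitution_dummies(1900)
postwar = constitution_dummies(1948)
recent = constitution_dummies(1990)
```

Each country therefore falls into exactly one of four mutually exclusive periods, with the pre-1921 group serving as the reference.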
Moreover, we include a novel instrument in the analysis. We do so as the predictive power of the constitutional dating variables is somewhat weak, even though the F test on their joint validity significantly rejects the null hypothesis. Our novel instrument, confl mean, is computed as the sum of years a country has been involved in violent conflicts between 1800 and the year the country adopted a constitution for the first time, over the reference period. 15 Violent conflicts include both intra and inter state conflicts. 16 The variable confl mean significantly affects the probability of a country to fall into a particular category of our classification.
First, the degree of conflict may strongly impact the choice of the constitutional system itself. High values of the index may indicate deeply divided societies in which political decision making needs to rely on power-sharing rules. Parliamentary systems offer the ideal political environment for a broad power-sharing executive, given that the cabinet is a collegial decision-making body (Lijphart 2004). On the contrary, presidential systems introduce rules that favour a winner-takes-all outcome, also by facilitating the adoption of a majoritarian electoral rule (Linz 1994). As additional evidence in favour of this correlation, Jung and Deering (2015) show that unstable conditions at the time of the constitutional choice increase the likelihood of adoption of a parliamentary system. Second, this effect should be larger in the case of unstable parliamentary systems given that the degree of stability of a country's political environment may be persistent, i.e., a more unstable environment is more likely to arise in the presence of past political instability (Alesina et al. 1996). Therefore, we argue that our new instrument has a robust predictive power for the endogenous regressor, i.e., the constitution type in our classification.
15 Source: Correlates of War Project.
16 For a detailed review of the definition and categorization of conflicts, see Sarkees (2010).
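One way to read the construction of the conflict instrument is as the share of pre-constitution years spent in violent conflict. The sketch below follows that reading and is an assumption rather than the authors' exact definition: the function name confl_mean, the treatment of the adoption year as excluded from the window, and the example years are all hypothetical.

```python
def confl_mean(conflict_years, adoption_year, start=1800):
    """Share of years between `start` and constitution adoption spent in conflict.

    Assumption: the reference period runs from `start` up to, but not
    including, the adoption year, and each calendar year counts once even
    if several conflicts overlap in it.
    """
    if adoption_year <= start:
        raise ValueError("adoption_year must be later than start")
    window_length = adoption_year - start
    in_window = {y for y in conflict_years if start <= y < adoption_year}
    return len(in_window) / window_length


# Hypothetical country: conflicts in 1805, 1806, 1850 and 1900, constitution adopted in 1850.
example = confl_mean([1805, 1806, 1850, 1900], adoption_year=1850)
```

In this toy case only 1805 and 1806 fall inside the 50-year window, giving 2/50 = 0.04. Conflicts in or after the adoption year are ignored, which is what keeps the instrument temporally separated from later policy choices.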
As far as the exclusion restriction is concerned, we recognize that some issues may arise, given that countries may increase government expenditure to finance conflicts. However, we argue that this is not a crucial issue in our analysis, given that the instrument has been computed for a period that long precedes policy choices. As a matter of fact, we compute the instrument for a period before constitution adoption, and we note that the average time between constitution adoption and policy choices is long (81.2 years). Therefore, the level of conflict that we consider as an instrument is unlikely to have a direct effect on the dependent variable, given the variety of policy relevant reforms, events and shocks that may have occurred in that period (e.g., political reforms, changes in the political environment, ideological orientation of governments). 17 To sum up, the vector $Z_i$ includes the three constitutional dating variables, already adopted by Persson and Tabellini (2004), plus the new instrument confl mean. For the IV analysis, we restrict the focus to our main research goal, thus considering Eq. (2) as headline regression and the BCIM dataset.
Before discussing IV results, we comment on the validity of the instruments. The bottom part of Table 6, in Appendix A, reports the standard diagnostic tests. The performance of the instrumenting strategy is satisfactory, as it significantly rejects the null hypothesis of under-identification, while it does not reject the null of valid overidentifying restrictions. The Stock-Yogo test for weak instruments indicates that the instruments do not satisfy the 5% critical value, but they do satisfy the 10% one. This represents further evidence in favour of the adoption of the LIML approach. The endogeneity test, which under the null indicates that the endogenous variables can be treated as exogenous, rejects the null hypothesis at the 5% level, thus justifying the adoption of the IV strategy.
Let us now discuss IV results of the headline equation, obtained using the LIML estimates. 18 Table 4 tests the model with central government expenditure and central government revenues as dependent variables. Coefficients are similar to those obtained through OLS in terms of direction and statistical significance, but they differ in size. In particular, coefficients associated with our variables of interest, i.e., parl stab and parl unstab, are now larger in magnitude than OLS ones, thus indicating that the OLS estimates are likely to be downward biased, a result already stressed by Persson and Tabellini in the comparison between presidential and parliamentary systems. When we perform the IV strategy using other approaches (Table 6, Appendix A), results do not qualitatively change.

17 To further ensure that we do not have issues of reverse causality between government size and the degree of conflict, we run Eq. (2) including confl mean as additional regressor. Coefficients associated with confl mean are far from being significant at the conventional level, both when the dependent variable is central government expenditure and central government revenues (p-values are 0.966 and 0.803, respectively). Results are available upon request.

18 Results from the instrumenting equation are reported in Appendix A, Table 5.
Therefore, we can conclude that the IV analysis, regardless of the econometric strategy adopted, corroborates our modelling choice and further confirms that taking into account the level of stability of parliamentary systems provides novel and interesting insights into the existing literature on the comparison of constitutional systems.
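The direction of the OLS bias discussed above can be illustrated with a minimal sketch on simulated data. This is not the estimation carried out in the paper: it uses the simple one-instrument IV estimator (which coincides with 2SLS in the just-identified case) rather than LIML, and every variable, value and function name below is a hypothetical chosen for illustration. The error terms are constructed so that the endogenous regressor is negatively correlated with the outcome error, which is one way a downward bias in OLS can arise.

```python
import random

def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)

def ols_slope(y, x):
    """OLS slope of y on a single regressor x (with intercept)."""
    return covariance(x, y) / covariance(x, x)

def iv_slope(y, x, z):
    """Simple IV slope of y on x, using z as the only instrument for x."""
    return covariance(z, y) / covariance(z, x)

# Simulated data (assumption: purely illustrative, unrelated to the paper's data).
rng = random.Random(0)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]            # hypothetical instrument
v = [rng.gauss(0, 1) for _ in range(n)]            # first-stage error
e = [rng.gauss(0, 1) for _ in range(n)]            # idiosyncratic noise
u = [-0.8 * vi + ei for vi, ei in zip(v, e)]       # outcome error, negatively correlated with v
x = [zi + vi for zi, vi in zip(z, v)]              # endogenous regressor
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]  # true coefficient on x is 2

beta_ols = ols_slope(y, x)    # biased towards zero by construction
beta_iv = iv_slope(y, x, z)   # consistent for the true coefficient
print(f"OLS: {beta_ols:.3f}  IV: {beta_iv:.3f}")
```

In this design the instrument affects the outcome only through the regressor, so the IV estimate recovers the true coefficient while OLS understates it, which mirrors the pattern of larger IV coefficients reported in the text.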

Robustness checks and possible transmission channels
The empirical results presented above show that parliamentary systems perform differently depending on their level of stability. However, this interpretation can be questioned in several ways. First of all, is stability what we really capture with our measure? Moreover, does stability matter only for parliamentary systems, as our theoretical considerations suggest? Finally, are there other possible interpretations for the result?
This section addresses these three lines of concern. First, we test the robustness of our results to different measures of stability, to support our claim that stability is what matters. Second, we run a placebo test that also refines the classification of presidential systems according to their stability level. Third, we introduce additional controls to take into account alternative effects that may be relevant, namely the possibility that the distribution of leftist governments, large electoral districts or income inequality may drive variations in government spending. 19 At the end of the section, we also discuss our results in light of the theoretical models, and investigate possible transmission channels.
Different measures of stability. So far, in the analysis, countries were partitioned according to the index gov life. As discussed in Sect. 3, gov life measures stability as the average duration of governments, normalized by the legal duration of a legislature. As a first robustness check, we avoid the normalization and include the legal duration of the legislature as a control variable to make the relationship between (in)stability and the outcome variable more flexible.
A different approach to stability could define stable countries as those in which governments reach the natural end of the legislature, while defining all others as equally unstable. In other words, stability could have a non-linear effect. For this reason, we introduce an alternative index, gov end, defined as the fraction of governments that are successful in reaching the legal term of the mandate. The index is built using the indicator yrcurnt from the DPI dataset. Higher values of the index correspond to higher stability.
Another possible concern is that gov life measures the life of a government between two elections. However, changes in the executive do not map one to one to election years. There is the question of how to classify countries in which the executive changes frequently, but elections are held at regular intervals. To tackle this issue we introduce a third measure of stability, year exec, defined as the average tenure of the head of the executive, weighted by the legal length of any electoral term. This index is built using the indicator yearoff from the DPI dataset, which collects information about the number of years the head of the executive has been in office. Higher values of the index correspond to higher stability. Moreover, parliamentary countries in which we have several changes of government, and early elections, but always the same party in power (such as Italy with Democrazia Cristiana from the 1950s to the 1970s) have a low index of stability when we use gov life. One could argue, however, that those countries are stable, as the leadership is strongly held by the same political group. To take this concern into account, we introduce a fourth index of stability, year party, defined as the average number of years the governing party has been in office weighted by the legal length of the electoral term. This index is built using the indicator prtyin from the DPI dataset, which counts the number of years the chief executive party has been in office. Higher values of the index correspond to higher stability.
Finally, an alternative way to measure (in)stability is the number of excess elections, computed as the number of observed elections minus the number of expected elections by law, in the period under investigation. This last indicator can be squared in order to further penalize very unstable parliamentary systems.
Using these alternative stability indexes, we perform our main regression, as in Eq. (2), on cgexp as dependent variable and, as Table 7 in Appendix A shows, our results do not qualitatively change.
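The arithmetic behind some of the stability measures described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function names mirror the index names in the text, but the inputs (a list of government durations in years, a legal term, and counts of observed and expected elections) are hypothetical, whereas the paper builds its indexes from DPI indicators such as yrcurnt.

```python
def gov_life(government_durations, legal_term):
    """Average government duration, normalized by the legal duration of a legislature."""
    average_duration = sum(government_durations) / len(government_durations)
    return average_duration / legal_term

def gov_end(government_durations, legal_term):
    """Fraction of governments that reach the legal term of the mandate."""
    completed = sum(1 for d in government_durations if d >= legal_term)
    return completed / len(government_durations)

def excess_elections(observed, expected, squared=False):
    """Observed minus legally expected elections; optionally squared to penalize instability."""
    excess = observed - expected
    return excess ** 2 if squared else excess

# Hypothetical country: four governments over a period with a five-year legal term.
durations = [5, 2, 1, 2]
legal_term = 5

print(gov_life(durations, legal_term))          # 2.5 years on average, over a 5-year term
print(gov_end(durations, legal_term))           # one government out of four completes the term
print(excess_elections(4, 2))                   # two elections more than expected by law
print(excess_elections(4, 2, squared=True))     # squared version
```

The first two indexes increase with stability, while the excess-elections measure increases with instability, so the sign of the partition threshold has to be read accordingly.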
Placebo test. Table 8 in Appendix A shows the descriptive statistics of stability by constitutional group. We notice that the average stability is not significantly different between groups. This could raise the concern that we pick up the effect of stability in general, and not the effect of different equilibria in the legislative process of parliamentary systems. To address this issue, we run a placebo test by splitting presidential countries according to their stability, with the same methodology used for parliamentary countries in the main analysis. Table 9 in Appendix A shows that no effect is found when we adopt this classification, that is, stable presidential systems are not significantly different from unstable ones. 20

Additional controls. One may wonder how our results may be affected by an uneven distribution of leftist governments or large electoral districts, which may inflate government spending (Milesi-Ferretti et al. 2002). To address this concern we estimate the model including the executive's ideological position (right_left), and the district magnitude (magn) as additional regressors. 21 Table 10 in Appendix A shows that results do not change.
Moreover, we may be concerned by the effect of income inequality on government spending. The theoretical literature on the size of government (Meltzer and Richard 1981) has shown that public expenditure may depend on the relation between the mean income and the income of the median (decisive) voter. Thus, failing to control for that may lead to an omitted variable bias. As a robustness check, we attempt to include information on the share of poor voters in our analysis. Given the lack of direct data, we consider two variables that together may capture the presence of a poor electorate at country level: (i) a measure of income inequality; (ii) an indicator of the average turnout. We believe that these two variables may effectively account for the link between poor voters and the size of government. In fact, income inequality indicates the concentration of wealth and, as such, it is expected to be positively correlated with a feeling of injustice among the citizens, who may use the ballot box to ask for redistributive policies (Alesina and Angeletos 2005). Hence, income inequality captures the effect of the presence of a poor population on the size of government. Moreover, even if we do not have direct information on the share of poor people among voters, we include the average turnout at country level as an additional variable, so as to control for differences among countries that may be driven by differences in the propensity to vote.

Data on income inequality are taken from the Global Consumption and Income Project (GCIP), which provides comprehensive information about income inequality for a large number of countries, over the period 1960-2015. First, we restrict the sample to countries that are present in the BCIM dataset and to the period under investigation, that is, up to 1989. Then, for each country i, we extract from the GCIP dataset two measures of inequality: the GINI coefficient, which compares cumulative proportions of the population with the cumulative proportions of income they receive, and the P80/P20 ratio, that is, the ratio of the income owned by the richest 20% to that owned by the poorest 20%.

20 Results do not change if we partition presidential countries according to the same threshold of stability that we use to partition parliamentary countries, instead of using the median level for presidential ones.

21 Source: DPI, 2012.
As for the turnout, we use the IDEA Voter Turnout Dataset that registers the turnout for each election in a large number of countries, since 1945. Again, we restrict the dataset to match the same sample of countries and time span, i.e., 1960-1989, as for income inequality. In light of the cross-section nature of our dataset, we average the two variables over the considered period. Table 11, in Appendix A, shows the results: columns (1) and (2) use the GINI coefficient as measure of income inequality, while columns (3) and (4) use the ratio P80/P20. We run this exercise on both the PT (columns 1 and 3) and BCIM (columns 2 and 4) datasets. Results corroborate theoretical intuitions: coefficients associated with income inequality are always positive, statistically significant and large in magnitude, thus indicating that there exists a positive correlation between inequality at country level and the size of government. Moreover, the turnout positively affects the size of government too, even though to a lesser extent. However, our main regressors, i.e., parl stab and parl unstab, retain their original relevance both in terms of statistical power and magnitude.
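For readers less familiar with the two inequality measures used above, the following minimal sketch shows how they could be computed from a list of individual incomes. It is only an illustration: the paper takes both measures directly from an existing dataset, the income values are hypothetical, and the rule used here to pick the top and bottom 20% (integer division of the sample size by five) is an assumption.

```python
def gini(incomes):
    """Gini coefficient as the mean absolute difference between all pairs,
    divided by twice the mean income (0 = perfect equality)."""
    n = len(incomes)
    mean_income = sum(incomes) / n
    total_abs_diff = sum(abs(a - b) for a in incomes for b in incomes)
    return total_abs_diff / (2 * n * n * mean_income)

def p80_p20(incomes):
    """Ratio of the income held by the richest 20% to that held by the poorest 20%."""
    ordered = sorted(incomes)
    k = max(1, len(ordered) // 5)   # assumption: group size is one fifth of the sample
    return sum(ordered[-k:]) / sum(ordered[:k])

# Hypothetical income distribution for five individuals.
incomes = [1, 2, 3, 4, 10]
print(gini(incomes))
print(p80_p20(incomes))
```

Both measures increase with the concentration of income, which is why the text treats them as interchangeable proxies in columns (1)-(2) and (3)-(4).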
Transmission channels. The overall empirical analysis is consistent with the theoretical intuitions stated in Sect. 2. In particular, in a model as in Cella et al. (2017), the performance of parliamentary systems closely approaches the performance of presidential ones if the confidence vote entails a better ability of legislators to reject bad policy proposals, thus creating a more unstable political environment. This mechanism drives stable parliamentary systems far from the performance of presidential ones, but also from the one of unstable parliamentary systems. The same effect is consistent with the presence of the legislative cohesion effect in models as in Baron (1998), according to which the executive and the parliament coordinate to keep politicians in power and to avoid a no confidence motion. Coordination leads either the executive to formulate policy proposals that please the majority of the veto-players or legislators to accept a larger fraction of executive's proposals to avoid early elections. To sum up, the selection effect makes unstable parliamentary systems very similar to presidential ones and legislative cohesion makes stable parliamentary systems very different from presidential ones. We have shown how robust this difference is in explaining an array of implemented policies but our analysis does not allow us to separate the two effects just described.
The main findings described so far do not bring evidence in favour of the disciplining effect that would predict a similar performance by presidential and stable parliamentary systems, by favouring congruent behaviour in both the executive and the parliament. However, this may depend on the fact that we do not empirically isolate the subgroup of fully stable parliamentary countries that should display such effect. This leads us to suppose that the difference between presidential and parliamentary systems is not monotonically increasing in the level of stability of the latter. To test this insight, we further split parliamentary countries into four categories according to their stability distribution. Even if the analysis is sensitive to the small number of countries included in each category, we do find that the difference between constitutional systems is increasing in the stability of the parliamentary constitutional design, but it drops when we consider fully stable parliamentary countries that belong to the top quartile. Results are reported in Appendix A, Table 12. 22
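The four-way partition described above can be sketched as a rank-based split of countries along the stability distribution. The exact cut-off rule used in the paper is not stated here, so the rank-based assignment below is an assumption, and the country labels and stability values are hypothetical.

```python
def split_by_stability(stability, n_groups=4):
    """Assign each country to a group from 0 (least stable) to n_groups - 1 (most stable),
    based on its rank in the stability distribution."""
    ranked = sorted(stability, key=stability.get)
    n = len(ranked)
    return {country: rank * n_groups // n for rank, country in enumerate(ranked)}

# Hypothetical stability index for eight parliamentary countries.
stability = {"A": 0.10, "B": 0.20, "C": 0.30, "D": 0.40,
             "E": 0.50, "F": 0.60, "G": 0.70, "H": 0.80}

quartiles = split_by_stability(stability)          # four categories
halves = split_by_stability(stability, n_groups=2) # two categories, akin to a median split
print(quartiles)
print(halves)
```

Each resulting group can then enter the regression as a separate dummy, so that the coefficient on the top group can be compared with those on the lower ones.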

Conclusions
Our analysis has been driven by the belief that institutional features shape the legislative process by interacting with other underlying characteristics of the political environment. The presence of the confidence vote in parliamentary systems, and its different effect across countries, suggested classifying parliamentary systems according to their degree of stability.
This finer partition of constitutional systems allows us to improve previous results in the literature that studied the effect of constitutions in terms of implemented policies. We show in fact that the difference in the performance of presidential and parliamentary systems is driven by stable parliamentary ones. Our results hold for many policy variables, including government expenditure and revenues, and are robust to changes in the data and the definition of stability.

22 Further evidence in support of the disciplining effect is that results of our main specification hold even when we make the threshold move along the stability distribution. In detail, results remain significant until stable parliamentary countries are in the 75th percentile of the stability distribution. After that, results are no longer significant. This is consistent with the disciplining effect according to which fully stable parliamentary countries should not be significantly different from any other constitutional category.
To address concerns regarding the endogeneity of the constitutional choice, we perform an IV analysis using as instruments the time of adoption of the constitution, as in previous works. To improve our identification strategy, we introduce a new instrument. We build an index that summarizes the degree of conflict in a country from 1800 up to the year of the constitutional choice. The idea is that deeply polarized societies may prefer to rely on power-sharing rules such as those offered by parliamentary constitutions. We provide further evidence for this correlation and claim that the instrument is a robust predictor for our (possibly) endogenous regressor.
We also provide some novel insights on the transmission channels that may generate our empirical results. One of the reasons that may make unstable parliamentary systems more similar, in terms of policy choice, to presidential ones is the selection effect that improves the quality of the executive whenever the legislature does not reach the legal term limit. On the other hand, our results may be driven by legislative cohesion that makes stable parliamentary systems more different from presidential ones. In other words, the coordination of the executive and the parliament to stay in office till the end of the term produces very different policy choices. We also find some mild evidence of a disciplining effect for the very stable parliamentary systems. In countries at the top of the stability distribution the executive is strongly disciplined by the threat of the confidence vote, and the legislative process leads to the same equilibria as in presidential systems.
Funding Open access funding provided by Università degli Studi di Milano-Bicocca within the CRUI-CARE Agreement.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix A: Additional tables

(1) and (2) are estimated by multinomial probit. Entries are marginal effects. The following controls are included: maj, age, gastil, lyp, lpop, trade, prop1564, prop65, federal, oecd, asiae, laam, africa, col_uka, col_espa, col_otha. *p < 0.1, **p < 0.05, ***p < 0.01
Here, the stability index is simply the average length of any electoral cycle computed for each country i. Column (6) considers the number of excess elections as measure of stability, while column (7) accounts for the squared number of excess elections. All the regressions include the following controls: maj, age, lyp, trade, prop1564, prop65, gastil, federal, oecd, lpop, africa, asiae, laam, col_uka, col_espa

(1) and (2) trade, prop1564, prop65, gastil, federal, oecd, lpop, africa, asiae, laam, col_uka, col_espa, col_otha. Inequality in columns (1) and (2) is measured through the GINI coefficient; in columns (3) and (4) through the ratio P80/P20. *p < 0.1, **p < 0.05, ***p < 0.01

23 We also modify the interval in which we compute some of the dependent variables so as to introduce additional sources of variation in the dataset. When dependent variables are computed starting from years more recent than the original ones, then the index gov life ranges from 1975 to the year preceding the one of the dependent variable. We run regressions (2) and (3) on both the original and enlarged datasets whenever possible. The only exception is the regression with ssw as dependent variable, as ssw is only available for a subgroup of countries even in the PT dataset. The change in dependent variables does not alter our results. Moreover, when estimating the effect of the constitutional design on the share of social welfare spending, Persson and Tabellini (2003) slightly modify the original specification by dropping three control variables: lpop, prop1564 and trade. Our specification remains significant even when all controls are included. Results are reported in Tables 13, 14, 15, 16, 17, 18 below.

Notes: White heteroskedasticity-consistent standard errors in parentheses.
PT-modified and BCIM-modified refer to Persson and Tabellini (2003) specification of the model where the authors include all the standard controls (maj, age, lyp, prop65, gastil, federal, oecd, africa, asiae, laam, col_uka, col_espa, col_otha), except that lpop, prop1564 and trade are missing. Then, we re-estimate the model using the same specification as in the previous table. F test (columns (2), (4)) refers to the hypothesis that the coefficients for parl stab and parl unstab are equal ( 1 = 2 ). *p < 0.1, **p < 0.05, ***p < 0.01

logyl is the productivity level as in Persson and Tabellini (2003) (Columns (1) and (2)), and in Blume et al. (2009) (Columns (3) and (4)). The regressions include the following controls: maj, age, lyp, trade, gastil, federal, oecd, lpop, africa, asiae, laam, col_uka, col_espa, col_otha, avelf, prot80, catho80

cpi is the perception of corruption as in Persson and Tabellini (2003) (Columns (1) and (2)), and in Blume et al. (2009) (Columns (3) and (4)). The regressions include: maj, age, lyp, trade, gastil, federal, oecd, lpop, africa, asiae, laam, col_uka, col_espa, col_otha, avelf, prot80, catho80, confu