Voters’ Distance, Information Bias and Politicians’ Salary

This paper presents a theoretical model exploring the role of institutional distance between voters and politicians in the provision of public goods and citizens’ welfare. Proximity eases access to information about public policies, increasing political accountability. However, rent-seeking politicians can bias information reducing citizens’ welfare. We show that the optimal distance depends on the pool of politicians, voters’ political awareness and the cost of distorting information. As these elements differ across regions, a one-size-fits-all institutional reform may be beneficial for some jurisdictions and detrimental for others. A mechanism based on politicians’ remuneration can mitigate possible welfare-decreasing effects of voter-politician proximity.


Introduction
The last decades have seen mounting distrust in political elites, increasingly perceived by citizens as distant and self-interested decision makers (Enikolopov and Zhuravskaya 2007; Treisman 2007; Fan et al. 2009). Reducing the institutional distance between rulers and citizens (i.e., the degree of centralization in the political decision-making process) is often considered a possible way to regain citizens' confidence in politics. Federal constitutional systems and decentralized settings with greater expenditure and fiscal autonomy (Weingast 2009), or electoral rules favoring direct elections of public officials and small electoral districts (Persson and Tabellini 2003; Micozzi 2013), increase the proximity of citizens to elected politicians. This would give citizens more control over policy actions and push politicians toward more transparent and responsive behavior, thus leading to less corruption (Arikan 2004; Ferraz and Finan 2011) and larger provision of public goods (Seabright 1996; Fisman and Gatti 2002; Hindriks and Lockwood 2009; Weingast 2009; De Janvry et al. 2012; Smart and Sturm 2013; Gradstein 2017).
The empirical evidence, however, does not fully support this view. Several studies found that the effect of decentralization reforms on public good provision can be positive or negative, depending on the economic and institutional context (Rodden 2006; Goel et al. 2017; Bordignon et al. 2020; Gong et al. 2020).
To illustrate the ambiguous relation between public good and services provision and voter-politician institutional proximity, in Fig. 1 we consider the association between the number of hospital beds over population, a public good typically provided at the local level, and the number of sub-national government layers as a proxy for decentralization (Brennan and Buchanan 1980; Nelson 1986; Goel and Nelson 2011; Ivanyna and Shah 2014). To obtain Fig. 1, we regress per capita hospital beds on real GDP and country and year fixed effects in a sample of 31 OECD and non-OECD countries over the years 1980-2012. Then, regression residuals are plotted against the number of sub-national government layers for low-corruption (panel A) and high-corruption (panel B) countries (where corruption is a measure of institutional quality). Figure 1 depicts a positive relationship between institutional proximity and public goods provision in low-corruption countries (panel A) and a negative relation between proximity and public goods in high-corruption countries (panel B), suggesting that different institutional conditions may alter the relevance and direction of the effects of citizens-politicians distance.
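The residual step described above can be sketched in a few lines. This is only a toy illustration under our own assumptions: the panel is synthetic and balanced, there is a single regressor, and the function names (`two_way_demean`, `fe_residuals`) are hypothetical. In a balanced panel, regressing on country and year dummies is equivalent to regressing the two-way demeaned outcome on the two-way demeaned regressor, which is what the sketch does.

```python
# Toy illustration only: the panel below is synthetic, not the paper's data.

def two_way_demean(panel):
    """Subtract unit and period means and add back the grand mean."""
    n, m = len(panel), len(panel[0])
    unit_mean = [sum(row) / m for row in panel]
    period_mean = [sum(panel[i][t] for i in range(n)) / n for t in range(m)]
    grand_mean = sum(unit_mean) / n
    return [[panel[i][t] - unit_mean[i] - period_mean[t] + grand_mean
             for t in range(m)] for i in range(n)]


def fe_residuals(y, x):
    """Slope and residuals of y on x with unit and period fixed effects
    (balanced panel, single regressor)."""
    yd, xd = two_way_demean(y), two_way_demean(x)
    sxy = sum(a * b for ry, rx in zip(yd, xd) for a, b in zip(ry, rx))
    sxx = sum(b * b for rx in xd for b in rx)
    slope = sxy / sxx
    resid = [[a - slope * b for a, b in zip(ry, rx)]
             for ry, rx in zip(yd, xd)]
    return slope, resid


# Synthetic balanced panel: 3 units observed over 4 periods.
gdp = [[(i + 1) * (t + 1) for t in range(4)] for i in range(3)]
# Outcome built as 2*gdp plus additive unit and period effects, no noise.
beds = [[2.0 * gdp[i][t] + 5.0 * i + 3.0 * t for t in range(4)]
        for i in range(3)]

slope, resid = fe_residuals(beds, gdp)
print(slope)
```

The residuals returned by `fe_residuals` are what would then be plotted against the decentralization proxy, separately for each group of countries.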
To account for the ambiguous effect of institutional distance, we consider a political agency model augmented by the possibility that incumbent politicians take a costly action to distort public opinion. 1 This action is aimed at altering information on the true cost of public good, so that politicians can get a rent and still be re-elected. The optimal biasing action varies with institutional distance, citizens' characteristics and other context factors, which affect its effectiveness. For instance, information-bias actions tend to be most effective in countries or regions where local media competence, integrity and independence from the political power are poor (Di Tella and Franceschelli 2011; Snyder and Strömberg 2010; Kendall et al. 2015; Dutta and Roy 2016; Ferguson et al. 2019) or where media competition for local news is lower than that for national news (Besley and Prat 2006). 2

Fig. 1 Per capita hospital beds and number of sub-national government layers. Note: In the two graphs, residuals from the pooled OLS regression, $Hospb_{it} = \gamma_0 + \gamma_1 GDP_{it} + \gamma_2 Country_i + \gamma_3 Time_t + \varepsilon_{it}$, are plotted against $CLOSE$, where $Hospb$ is the number of hospital beds (per 1000 people), $GDP$ is real GDP in thousands US$ at constant 2005 national prices, $Country$ and $Time$ are country and time fixed effects, and $CLOSE$ is the number of sub-national governments over resident population (per 100,000 people). Panel A considers the countries with values of the political corruption index below the sample median (low-corruption countries) and Panel B the countries with political corruption above the median level (high-corruption countries), where political corruption is measured by the index of public sector corruption constructed by Coppedge et al. (2015). Data cover the period 1980-2012. The list of countries includes: Albania, Australia, Austria, Belgium, Bulgaria, Canada, Croatia, Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Japan, Lithuania, Netherlands, New Zealand, Norway, Poland, Portugal, Romania, Spain, Sweden, Switzerland, Turkey, United Kingdom and United States.

In equilibrium, the provision of public good increases with the costs of biasing information and voters' ability to correctly process information on the cost of public good. Instead, institutional distance has an ambiguous effect on public good provision, depending on the relationship between proximity and the effectiveness of the biasing action. For example, if local media are more effective watchdogs against incumbent politicians than national media, the distorting action becomes less effective as distance decreases, and the provision of public goods is enhanced by decentralization reforms or direct elections of public officials. By contrast, if local media are easily captured by local politicians, proximity brings about more effective biasing actions and public good provision decreases. 3

In terms of welfare, our model produces the known trade-off between selection and discipline (Besley and Smart 2007; Ashworth et al. 2017). Increasing proximity between voters and politicians might lead to a reduction in social welfare, as rent-seeking politicians are more easily spotted and may then prefer to grab the maximum possible rent today instead of trying to be re-elected. 4 In this setting, a reduced distance between citizens and politicians, even when implying greater accountability, does not always improve public goods provision and welfare. Indeed, since rent-seeking incumbent politicians are compelled to "behave" and reduce rent extraction in order to be re-elected (discipline effect), running for re-election may become a very costly strategy, pushing incumbents to grab the maximum possible rent while in office and forfeit any re-election aspiration (selection effect).
Our main contribution is to characterize this trade-off, showing that the distance threshold for the discipline effect to prevail changes with the context parameters. In particular, the welfare-maximizing voter-politician distance depends on voters' ability, the cost and effectiveness of information biasing, and the share of rent-seeking politicians in the jurisdiction. 5 It follows that the optimal distance generally differs across countries and regions. Therefore, decentralization reforms setting a uniform lower voter-politician distance for all jurisdictions may be welfare improving in some regions, and harmful in others.
Finally, we consider a mechanism that could counteract possible negative effects of proximity on welfare. We extend the previous model by explicitly considering the monetary compensation that politicians receive while in office, and allowing for the pool of politicians to be endogenously determined by the participation of citizens in the political activity. The salary paid to politicians increases the value of being in office for a second term, whence a salary threshold is shown to exist above which rent-seeking politicians, once in office, prefer a pooling strategy to mimic the benevolent ones rather than a separating strategy to extract the maximum rent. This salary threshold is higher, the lower the distance between citizens and politicians. In addition, both salary and distance contribute to determining the pool of politicians. Thus, the remuneration of political officeholders can be used to mitigate the possible welfare-decreasing effects of decentralization reforms in weak regions with low quality of local media and a high share of bad politicians. In particular, paying a relatively high salary to politicians in weak regions and a low salary in strong regions may be welfare-maximizing. In this way, in weak regions a pooling equilibrium would dominate, and bad politicians would find it profitable to provide enough public good to disguise themselves as benevolent. By contrast, in strong regions bad politicians, once in office, would be filtered out at the next election and most likely substituted by good politicians.

4 Consistent with the arguments raised by Dewatripont et al. (1999) and Holmström (1999) for private managers, a greater institutional proximity to citizens can lead incumbent politicians to misbehave, since it makes running for re-election a very costly strategy. Also, the most skilled among the rent-seekers can be discouraged from participating in the political competition (Gersbach 2004; Poutvaara and Takalo 2007). This results in a higher share of benevolent politicians, but also worsens the average administrative skillfulness of political candidates.

5 The share of bad and good politicians in the jurisdiction is a parameter related to the quality of institutions, places' culture and social norms. In Fig. 1 we proxy this parameter with corruption.
Our paper is closely related to the literature exploring the impact of information manipulation on public sector efficiency (Bordignon and Minelli 2001;Besley and Prat 2006), selection of politicians (Poutvaara and Takalo 2007;Bordignon et al. 2020), and the dilemma between selection and discipline (Besley and Smart 2007;Ashworth et al. 2017). In addition, we also relate to the literature on the effects of political decentralization reforms and electoral accountability on public services provision (Seabright 1996;Hindriks and Lockwood 2009;Boffa et al. 2016;Aslim and Neyapti 2017).
The paper is organized as follows. Section 2 presents our model. Section 3 describes the equilibrium, while in Sect. 4 the welfare analysis is developed. Section 5 discusses the results applied to the case of heterogeneous regions and the related policy implications. In Sect. 6, the analysis is extended to paid politicians. Section 7 concludes.

The Model
Consider a two-period economy, $t = 1, 2$, where an election takes place at the end of period 1. The electoral competition is between the incumbent government and a challenger. In order to be elected, candidates must get the majority of votes. In each period, the government in office collects tax revenues $T \ge 0$, which are exogenously determined and identical in the two periods, and provides an amount of public good $G_t \ge 0$ at the unit cost $\theta_t > 0$, which is randomly drawn by Nature from a probability density function (pdf) $f(\theta)$, identically and independently distributed in the two periods. The institutional setting rules out public deficits, so that $\theta_t G_t \le T$ must hold in both periods.
There is a continuum of citizens of measure 1. In each period, citizens gain utility from the public good, net of the taxes they pay. Since the level of taxation is exogenously given, and the value of $\theta$ is determined by Nature, citizens' welfare can be written in terms of the amount of public good available in the two periods, $SW = G_1 + \delta G_2$, where $\delta \in [0, 1]$ is the discount factor. Therefore, citizens' welfare in period $t$ is maximized when the whole amount of tax revenue is used for the provision of public good:

$$G_t^* = \frac{T}{\theta_t}. \tag{1}$$

The amount of the public good provided by politicians is publicly observable, while the realization of the cost variable $\theta_t$ in each period is private information of the incumbent government. Voters know the pdf $f$ and receive a noisy signal about the realization of $\theta$, independent of the amount of public good $G_t$. We assume that the signal $s_i$ received by voter $i$ depends on the realization of the unit cost $\theta_t$, the voter's individual attitude towards politicians $\alpha_i$, and a possible aggregate information signal $z(x, d)$ purposely generated by incumbent politicians to modify voters' perceptions of how much the amount of public good supplied reflects production costs. 6 Specifically:

$$s_i = \alpha_i \, z(x, d) \, \theta_t. \tag{2}$$

The term $\alpha_i \ge 0$ captures the heterogeneity of voters' political awareness and ability to interpret public information about government policies; it is assumed to be randomly distributed across voters with a pdf $h_A(\alpha)$. The value $\alpha_i$ reflects the skeptical ($\alpha_i < 1$) or credulous ($\alpha_i > 1$) attitude of voters with respect to politicians and political life. Skeptical voters tend to systematically underestimate the true unit cost of public good, while credulous voters are inclined to overestimate the value of $\theta$.
The aggregate bias $z(x, d) \ge 1$ is affected by a factor $d \in (0, D)$ representing the institutional distance between voters and elected politicians and by the intensity $x \ge 0$ of possible unobservable actions that incumbent politicians carry out to bias information available to citizens through media, with $z_x(x, d) > 0$, $z_{xx}(x, d) \le 0$, $z(0, d) = 1$ for any $d$, and $z_d(x, d) > 0$ for any $x > 0$. 7 The unit cost for the incumbent politicians of taking biasing actions is $c > 0$.
The distance parameter $d$ is the degree of centralization in the decision-making process or electoral rules. 8 More centralization (i.e., larger $d$) makes it harder for citizens to acquire reliable information about the true value of $\theta$ and easier for the government to cheat (i.e., $s_i$ increases for any given value of $x$). The action of distorting information by the incumbent government aims at affecting voters' beliefs by inducing them to overestimate the true cost of public good, so as to have the chance to get a rent and still be re-elected. As shown in Sect. 3, the optimal biasing action $x^*$ depends on distance and other parameters, so that a change in $d$ yields possibly ambiguous effects on $s_i$. Finally, for simplicity, we assume that, whatever $d$ is, if incumbent politicians do not take any action to bias public information on costs of public goods provision (i.e., if $x = 0$), the signal that voters get from media depends only on their skeptical or credulous attitude towards politicians.
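A small numerical sketch may help fix ideas about the signal structure. It assumes the multiplicative signal $s_i = \alpha_i z(x, d) \theta_t$ and, purely for illustration, a hypothetical functional form $z(x, d) = 1 + d \ln(1 + x)$, which is not taken from the paper but satisfies the stated properties ($z_x > 0$, $z_{xx} \le 0$, $z(0, d) = 1$, and $z_d > 0$ for $x > 0$).

```python
import math


def z(x, d):
    """Hypothetical aggregate bias z(x, d) = 1 + d*ln(1 + x) (our assumption)."""
    return 1.0 + d * math.log1p(x)


def signal(theta, alpha_i, x, d):
    """Signal received by voter i, assumed to be alpha_i * z(x, d) * theta."""
    return alpha_i * z(x, d) * theta


theta = 2.0  # illustrative unit cost of the public good

# No biasing action (x = 0): the signal depends only on the voter's attitude.
neutral_no_bias = signal(theta, 1.0, 0.0, 0.5)
skeptic_no_bias = signal(theta, 0.8, 0.0, 0.5)

# A positive biasing action inflates the perceived cost, more so at larger d.
neutral_biased_near = signal(theta, 1.0, 1.0, 0.5)
neutral_biased_far = signal(theta, 1.0, 1.0, 0.8)

print(neutral_no_bias, skeptic_no_bias, neutral_biased_near, neutral_biased_far)
```

With this form the cross derivative is $z_{xd} = 1/(1+x) > 0$, so the marginal effectiveness of biasing rises with distance; other shapes of $z$ would be needed to represent the opposite case.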
Voters know that the signal may be noisy. However, they do not know the pdf $h_A(\alpha)$ and view themselves as aware citizens, neither systematically skeptical nor gullible, able to correctly interpret the information provided by politicians and not be influenced by possible biasing actions. Therefore, the belief of voter $i$ about the cost of public good is exactly equal to the received signal $s_i$. Unlike citizens, politicians know how $\alpha$ is distributed in the population of voters, and in particular they know its median value $\alpha_m$. The idea is that politicians are able to form a fairly accurate opinion of the distribution of political awareness in the electorate, due to their continued political activity, recurrent opinion polls on political preferences, and participation in electoral campaigns.

6 Alternatively, $z$ can be interpreted as the relative advantage of incumbent politicians over challengers in running for reelection and using media to influence voters (Ansolabehere and Snyder 2002; Ansolabehere et al. 2006).

7 Hereafter, for the sake of notational simplicity, and when no confusion arises, we omit the arguments of the first, second and cross derivative functions and denote them with $z_x$, $z_d$, $z_{xx}$ and $z_{xd}$.

8 We assume a single-region model where the degree of centralization and government accountability are captured by the voter-politician distance, and abstract from selective rent diversion (Ashworth et al. 2017).
Politicians are benevolent or rent-seekers, $\pi \in \{b, r\}$, in proportion $\beta$ and $1 - \beta$, respectively. Benevolent politicians maximize citizens' welfare when they are in office. By contrast, rent seekers maximize the expected discounted amount of tax revenues that they can divert in the two periods:

Citizens know the value of $\beta$, but cannot observe the type of incumbent politicians and challengers. The timeline of the political game is the following.

$t = 1$: the value of $\theta_1$ is observed by the incumbent politician; she decides the intensity of biasing action $x$, and the level of public good $G_1$; payoffs are realized;

$t = 2$: each voter $i$ observes $G_1$ and a signal $s_i$ about the unit cost of production $\theta_1$, forms an expectation about the incumbent politician's type, and decides whether to re-elect the incumbent or vote for the challenger; the elected politician observes $\theta_2$ and decides the amount of public good $G_2$; payoffs are realized.

The set of strategy n-tuples of the incumbent politician is given by the possible public goods provided and the biasing actions carried out in each period, $\sigma^{\pi} = (G_1^{\pi}, x_1^{\pi}, G_2^{\pi}, x_2^{\pi})$ for $\pi \in \{b, r\}$. The set of strategies of the voters consists of the possible voting rules $v_i$ establishing whether to vote for the incumbent, $v_I$, or the challenger, $v_C$, according to the observed amount of public good and the perceived signal $s_i$ about the cost $\theta_1$.

Equilibrium
We characterize the equilibrium as the set of strategies of the benevolent and rent-seeking incumbent politicians in periods 1 and 2, which are best responses to, and consistent with, voters' beliefs about the cost of the public good and the type of the incumbent politician. Proceeding by backward induction, we first consider strategies and payoffs in the last office term.
In period 2, the dominant strategy of benevolent politicians is to provide the welfare-maximizing amount of public good, $G_2^b = G_2^* = T/\theta_2$, and not to take any action to bias the information available to voters, $x_2^b = 0$. Rent-seeking politicians in office in period 2 also have a unique dominant strategy: they do not spend resources to bias information available to voters (i.e., $x_2^r = 0$), and pocket all the tax revenues without providing any public good (i.e., $G_2^r = 0$). 9 Therefore, the amount of public good in period 2 is independent of $d$, and it is equal to the first best or zero according to whether a benevolent or a rent-seeking politician is in office.
At the beginning of period 2, each voter observes the actual G 1 provided by the incumbent politician and the signal s i about the production cost θ 1 . Given this information set, voters express their electoral preferences and decide whether to vote for the incumbent or the challenger. Voting is not strategic, so that each citizen decides on the basis of her own payoffs without taking into account the voting patterns of other citizens. In addition, voting is purely retrospective and the challenger is drawn randomly from the pool of politicians at the date of election, such that screening challengers is impossible for voters. 10 For the sake of simplicity, we assume that voters adopt the following behavioral voting strategy.
From Assumption 1, it follows that under the majority rule, the incumbent politician is re-elected with probability $q = 1$ or $q = 0$, according to whether $G_1$ is greater than or equal to $\hat G_i$ for half of the population or not:

$$q = \begin{cases} 1 & \text{if } G_1 \ge \hat G_m \equiv \dfrac{T}{s_m} = \dfrac{T}{\theta_1 \alpha_m z(x, d)} \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

where the subscript $m$ identifies the median voter. Condition (5) provides some interesting insights on the role of voters' heterogeneity. In general, $s_m$ may be greater than, equal to, or lower than the actual unit cost of public good $\theta_1$, depending on whether the median bias is greater or less than 1. Two broad scenarios may occur.
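Before turning to the two scenarios, the majority rule can be sketched numerically. The snippet assumes the threshold rule $\hat G_i = T/s_i$ with $s_i = \alpha_i z(x, d) \theta_1$; the function name and the three-voter electorates are hypothetical and chosen only to show how the median voter's attitude drives re-election.

```python
def reelection_probability(G1, T, theta1, alphas, z_value):
    """Return q = 1 if at least half of the voters see G1 >= T / s_i, else 0.

    Each voter's signal is assumed to be s_i = alpha_i * z_value * theta1.
    """
    in_favour = sum(1 for a in alphas if G1 >= T / (a * z_value * theta1))
    return 1 if 2 * in_favour >= len(alphas) else 0


T, theta1 = 10.0, 2.0
first_best = T / theta1  # largest amount affordable without a deficit

# Hypothetical electorate with a non-skeptical median voter (alpha_m = 1.0).
neutral_median = [0.8, 1.0, 1.2]
q_first_best = reelection_probability(first_best, T, theta1, neutral_median, 1.0)
q_below = reelection_probability(4.5, T, theta1, neutral_median, 1.0)

# Hypothetical electorate with a skeptical median voter (alpha_m = 0.8):
# even the first-best amount falls short of what the median voter expects.
skeptical_median = [0.7, 0.8, 1.2]
q_skeptical = reelection_probability(first_best, T, theta1, skeptical_median, 1.0)

print(q_first_best, q_below, q_skeptical)
```

With no biasing ($z = 1$), the first-best amount secures re-election only when the median voter is not skeptical, which is the distinction drawn in the two scenarios.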
If the median voter is skeptical and $\alpha_m < 1$, the amount of public good needed for benevolent politicians to be re-elected is greater than the socially optimal amount: $\hat G_m > T/\theta_1$. As a result, given the impossibility of running a deficit, benevolent incumbents cannot be re-elected, unless they provide information to offset voters' excess skepticism. 12 Only rent-seekers can be re-elected if $x$ and $d$ are such that $\alpha_m z(x, d) > 1$ and information biasing actions make it possible to extract some rents. Therefore, when $\alpha_m < 1$, election acts as an adverse selection mechanism and can only be a discipline device for incumbent rent-seekers.

10 Although these are standard simplifying assumptions in the political agency literature (Besley 2006), we acknowledge that they do not fully capture real-world voters' behavior, which also has a prospective component.

11 The voting rule in Assumption 1 abstracts from voters' information on $\beta$ (i.e., about the composition of the pool of politicians in the jurisdiction). However, in Appendix A we show that under certain conditions on voters' out-of-equilibrium beliefs and the Bayesian likelihood ratio, a public-good-threshold voting rule independent of $\beta$, for which a voter $i$ votes for the incumbent (resp., the challenger) if $G_1 \ge \hat G(s_i)$ (resp., $G_1 < \hat G(s_i)$), is consistent with Bayesian updating of prior beliefs about the incumbent politician's type. In the special case in which $\theta$ is uniformly distributed between zero and a finite maximum value, and the distribution $h_A$ of voters' political awareness is such that $h(\alpha_m) = h(1)$, then $\hat G(s_i) = T/s_i$ holds, and the voting strategy in Assumption 1 is part of a perfect Bayesian equilibrium.

12 The literature has recognized the possibility that even benevolent politicians dealing with misinformed voters can alter information (Morris 2001) using public resources otherwise destined to finance the supply of public goods. For example, according to Biglaiser and Mezzetti (1997), if voters correctly update their judgement on politicians' ability by observing a public project, then even a totally benevolent incumbent is willing to pay more for that project than its expected real economic benefit. In addition, benevolent politicians can choose to pander to voters by implementing populist policies contrary to voters' interests (Canes-Wrone et al. 2001). Finally, partially benevolent incumbents (i.e., caring for both the public interest and their private rents) behave more or less opportunistically according to the information available to voters and the likely opportunism of challengers (Beniers and Dur 2007).

When $\alpha_m \ge 1$, the median voter is non-skeptical and does not underestimate the production costs of public goods. The amount of public good needed to be re-elected is then lower than the maximum feasible amount, leaving the possibility for both benevolent and rent-seeking politicians to be re-elected. In this case, election is a twofold device to select and discipline politicians. As this is the most interesting case, and nothing fundamental changes in the welfare analysis and policy discussion, in the rest of the paper we will focus on it. However, in Appendix B, we characterize equilibrium and social welfare for the case of skeptical median voters.
Under Assumption 2, a benevolent incumbent maximizes her utility (3) by providing the socially optimal level of public good $G_1^b = T/\theta_1$ and spending no resources on influencing information available to voters, $x_1^b = 0$, and is always re-elected with $q = 1$.
By contrast, a rent-seeking incumbent has two strictly undominated strategies.
(i) A "hit and run" strategy (henceforth, H-strategy), consisting in grabbing the maximum rent in period 1, taking no information biasing action, $x_1^{r,H} = 0$, and providing $G_1^{r,H} = 0$. Since this strategy reveals that the politician in office is a rent-seeker, she is not re-elected and her payoff is:

(ii) An "election" strategy (henceforth, E-strategy), consisting in providing the amount of public good needed to be re-elected, and then pocketing all tax revenues in period 2. Thus, the E-strategy implies $x_1^{r,E} > 0$ and $G_1^{r,E} = \hat G_m$ in period 1, with a payoff equal to:

Equations (6) and (7) indicate that a rent-seeking incumbent faces a trade-off between getting the whole rent today but giving it up tomorrow, and foregoing some rent today in order to get the full rent in the second period. Since the information bias $z(x, d)$ is non-convex in $x$, the optimal bias is determined by maximizing the payoff under the E-strategy in Eq. (7). The first-order condition, equating the marginal benefit and the marginal cost of the biasing activity, implicitly defines the value $x_1^{r,E} = x^*$ that satisfies:

Given the second order condition (7), the best strategy for rent-seeking politicians is derived by comparing the maximum payoffs under the E- and H-strategy. In particular:

Proposition 1 Under Assumptions 1 and 2, a unique pooling or separating equilibrium exists, according to whether

The optimal strategies for benevolent and rent-seeking incumbent politicians are: Pooling equilibrium.
At time 1, rent-seeking politicians provide $G_1^{r,E}$

As shown by Eq. (8), optimal information biasing depends on distance, voters' awareness and the unit cost of the biasing activity. In particular:

Proposition 2 By applying the implicit function theorem to Eq.

The effect of voter-politician distance on the optimal biasing action is uncertain, since it depends on how $x$ and $d$ interact in affecting the signal received by voters:

The simple intuition is that if the marginal effectiveness of the biasing action decreases or weakly increases with $d$ (for example, because local media are less effective in checking and balancing political power or they are more easily manipulated by local politicians), the intensity of the biasing action and the resources spent on information biasing by rent-seeking politicians are reduced in more centralized political systems. On the other hand, when the marginal effectiveness of biasing actions strongly increases with distance, higher centralization leads incumbent rent-seekers to increase their efforts to bias information available to voters. Concerning $\alpha_m$ and $c$, when voters are more politically unaware, rent-seeking politicians have less need to bias information in order to extract private rents from taxes and be re-elected. Similarly, if biasing actions are more expensive, the optimal value $x^*$ unambiguously drops.
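The payoff and first-order expressions referred to as Eqs. (6)-(8) are not legible in this section, so the following is only a sketch of how the comparative statics just described can be obtained. It rests on assumptions that are ours: the period-1 rent equals tax revenue net of public-good spending and of a linear biasing cost $cx$, the re-election threshold is $\hat G_m = T/(\theta_1 \alpha_m z(x, d))$, and the labels $U^{r,H}$, $U^{r,E}$ and $F$ are hypothetical.

```latex
\begin{align*}
U^{r,H} &= T, \\
U^{r,E}(x) &= T - \theta_1 \hat G_m - c x + \delta T
            = (1+\delta)\,T - \frac{T}{\alpha_m z(x,d)} - c x, \\
F(x^*; d, \alpha_m, c) &\equiv \frac{T\, z_x}{\alpha_m z^2} - c = 0
            \qquad \text{(first-order condition)}, \\
F_x &= \frac{T\,(z z_{xx} - 2 z_x^2)}{\alpha_m z^3} < 0
            \qquad \text{(second-order condition)}, \\
\frac{\partial x^*}{\partial d} &= -\frac{F_d}{F_x}
            = \frac{z z_{xd} - 2 z_x z_d}{2 z_x^2 - z z_{xx}}, \\
\frac{\partial x^*}{\partial \alpha_m} &= -\frac{F_{\alpha_m}}{F_x}
            = -\frac{z z_x}{\alpha_m\,(2 z_x^2 - z z_{xx})} < 0, \\
\frac{\partial x^*}{\partial c} &= -\frac{F_c}{F_x}
            = -\frac{\alpha_m z^3}{T\,(2 z_x^2 - z z_{xx})} < 0 .
\end{align*}
```

Under these assumptions the sign of $\partial x^*/\partial d$ is the sign of $z z_{xd} - 2 z_x z_d$: biasing effort rises with distance only when $z_{xd}$ is sufficiently positive, while it falls unambiguously with $\alpha_m$ and $c$, in line with the intuition given in the text. The exact expressions in the original may differ.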
In a similar way, the impact of distance, voters' political unawareness and biasing costs on $\hat G_m$ may be worked out by differentiating Eq. (5).
An increase in voters' political unawareness and a decrease in information biasing costs unambiguously decrease the provision of public goods by rent-seeking incumbents aiming at being re-elected. By contrast, conditional on choosing the E-strategy, the amount of public good supplied by a rent-seeking incumbent is decreasing with the voter-politician distance only if $z_{xd} > z_{xx} z_d / z_x$, that is, if the marginal effectiveness of the biasing action does not strongly decrease with $d$. Otherwise, if the quality and independence of media is poorer at the local than at the central level, such that $z_{xd} \le z_{xx} z_d / z_x$, in decentralized political systems bad politicians have the opportunity to bias information to voters, neutralize the greater accountability produced by voters' proximity, and supply a smaller amount of public good. Now, we can derive how the optimal strategy of bad incumbents is affected by distance:

Proposition 4 For any given value of $\alpha_m$ and $c$, a unique distance $\bar d(\alpha_m, c)$ exists, such that for $d < \bar d$ rent-seeking politicians prefer the H-strategy, and a separating equilibrium prevails, while for $d \ge \bar d$ the E-strategy is preferred and a pooling equilibrium prevails. The threshold $\bar d$ decreases with voters' political unawareness $\alpha_m$, and increases with the cost of biasing information $c$.

The intuition behind Proposition 4 is the following. Rent-seeking incumbents can disguise themselves as benevolent politicians more easily the larger the distance from voters and the political unawareness of the latter, and the less costly the action to bias information about the real costs of public goods provision. The more difficult cheating is, the higher the threshold $\bar d$. Below the threshold $\bar d$, re-election would require providing an amount of public goods so large that bad politicians find it more profitable not to mimic benevolent politicians and to grab all the taxes in the first electoral term.
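The threshold logic can be illustrated numerically. The sketch below is not the paper's computation: it assumes the hypothetical bias function $z(x, d) = 1 + d \ln(1 + x)$, a rent-seeker's gain from the E-strategy over the H-strategy of the form $\delta T - T/(\alpha_m z(x^*, d)) - c x^*$, and illustrative parameter values. All function names are ours.

```python
import math

T, DELTA = 1.0, 0.9  # illustrative values, not taken from the paper


def z(x, d):
    return 1.0 + d * math.log1p(x)  # hypothetical bias function


def z_x(x, d):
    return d / (1.0 + x)  # its derivative with respect to x


def x_star(d, alpha_m, c):
    """Optimal bias from the assumed condition T*z_x/(alpha_m*z^2) = c."""
    def foc(x):
        return T * z_x(x, d) / (alpha_m * z(x, d) ** 2) - c
    if foc(0.0) <= 0.0:  # corner solution: biasing never pays
        return 0.0
    lo, hi = 0.0, 1.0
    while foc(hi) > 0.0:  # bracket the root
        hi *= 2.0
    for _ in range(200):  # bisection on a decreasing function
        mid = 0.5 * (lo + hi)
        if foc(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gain_from_election(d, alpha_m, c):
    """Assumed E-strategy payoff minus H-strategy payoff at the optimal bias."""
    x = x_star(d, alpha_m, c)
    return DELTA * T - T / (alpha_m * z(x, d)) - c * x


def distance_threshold(alpha_m, c, d_max=10.0):
    """Smallest d at which the E-strategy is weakly preferred (None if never)."""
    lo, hi = 1e-9, d_max
    if gain_from_election(hi, alpha_m, c) < 0.0:
        return None
    if gain_from_election(lo, alpha_m, c) >= 0.0:
        return lo
    for _ in range(100):  # the gain is non-decreasing in d in this sketch
        mid = 0.5 * (lo + hi)
        if gain_from_election(mid, alpha_m, c) >= 0.0:
            hi = mid
        else:
            lo = mid
    return hi


base = distance_threshold(1.0, 0.05)
more_unaware = distance_threshold(1.05, 0.05)  # higher alpha_m
costlier_bias = distance_threshold(1.0, 0.08)  # higher c
print(base, more_unaware, costlier_bias)
```

In this parameterization the computed threshold falls when the median voter is more unaware and rises when biasing is more expensive, which is the direction stated in Proposition 4.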
Proposition 4 is illustrated in Fig. 2, which displays the relationship between distance and the amount of public good provided by a rent-seeking incumbent.
In Panel A we assume that the marginal effectiveness of the biasing-information action increases or slightly decreases with distance (i.e., $z_{xd} > z_{xx} z_d / z_x$ for any $d$). In this case, the amount of public good supplied by rent-seeking incumbents is zero for $d < \bar d$, when the H-strategy is chosen, while it is positive and monotonically decreasing for $d \ge \bar d$, when the E-strategy is chosen. At $d = \bar d$, public goods provided by rent-seeking politicians reach the maximum value $\bar G^r$. Further increases in distance reduce the provision of public goods because less and less information is available to voters, and it becomes easier for rent seekers to pass themselves off as benevolent politicians. The dashed line displays comparative statics with respect to $\alpha_m$ and $c$. An increase in the political unawareness of the median voter and a decrease in biasing costs bring about a decrease in $\hat G_m$, so that the sloped portion of the curve shifts downwards. Moreover, $U$ increases (see Eqs. (C.8) and (C.9)), and therefore the threshold $\bar d$ is lower. Note that in the graph, the value $\bar G^r$ at the new threshold is placed below the value at the initial threshold, but it may also be above or at the same level as the latter. Intuitively, more unaware voters allow rent-seeking incumbents to be re-elected by providing a lower amount of public good, and this makes the E-strategy more rewarding even at lower distances. By contrast, an increase in the unit cost of biasing reduces the gain from re-election and narrows the set of values of $d$ for which the E-strategy is optimal.
In Panel B, we consider the case in which the marginal effectiveness of information biasing decreases with voter-politician distance ($z_{xd} \le z_{xx} z_d / z_x$). Once again, there is a threshold $\bar d$ under which rent-seeking incumbents prefer to play the H-strategy, while for $d \ge \bar d$, the preferred strategy is "election". In this case, however, as the voter-politician distance increases, the optimal biasing action $x^*$ decreases so strongly as to compensate for the negative effects of distance and improve the transparency of information available to voters. In addition, the reduction of expenses for information biasing allows the incumbent to afford the larger amount of public good needed to be re-elected. Therefore, the provision of public goods by rent-seeking politicians increases with distance from voters and reaches its maximum at the highest possible degree of political centralization (i.e., at $d = D$). Comparative statics is similar to that shown in Panel A.

Welfare
In this section, we determine the voter-politician distance that would be set by a constitutional legislator who maximizes the expected welfare of citizens. Some degree of centralization may be recommended to induce rent-seeking politicians to supply a positive amount of public good, since too much proximity would prompt them to adopt the H-strategy.
Since the threshold $\bar d$ does not vary with the realization of $\theta$, citizens' expected welfare may be written as:

$$SW = \beta (1 + \delta)\, E\!\left(\frac{T}{\theta}\right) + (1 - \beta) \begin{cases} \beta \delta\, E\!\left(\dfrac{T}{\theta}\right) & \text{if } d < \bar d \\[2ex] E\!\left(\dfrac{T}{\theta \alpha_m z(x^*(d), d)}\right) & \text{if } d \ge \bar d \end{cases} \tag{10}$$

The first term on the right-hand side of (10) refers to the payoff obtained when a benevolent politician is in office in period 1, $(1 + \delta) E(T/\theta)$, which happens with probability $\beta$. The second term shows the possible payoffs if in period 1 a rent-seeking incumbent is in office, which happens with probability $(1 - \beta)$. In this case, if the distance is lower than $\bar d$, a separating equilibrium prevails, in which the incumbent plays the H-strategy and the expected payoff of citizens is $\beta \delta E(T/\theta)$, that is, the probability of drawing a benevolent government in period 2 times the discounted expected amount of public good supplied by a benevolent politician. If the distance is large enough, $d \ge \bar d$, the equilibrium is pooling: rent-seeking incumbents play the E-strategy, and the expected payoff of citizens is $E\!\left(\frac{T}{\theta \alpha_m z(x^*(d), d)}\right)$. The second term of (10) clearly shows the trade-off between the effects of politicians' discipline and selection connected to the choice of distance. The top line represents the gain from the selection effect, which is the amount of public good expected for the second period thanks to the possible election of a benevolent politician. The bottom line accounts for the welfare gain due to the discipline effect of elections, which pushes the rent-seeking incumbent to provide a positive amount of public good. Since the first addend of $SW$ does not depend on $d$, the optimal distance is obtained by comparing the highest possible expected value of public goods provided by a rent-seeking incumbent in period 1 with the expected value of public goods obtainable by voting for the challenger, $\beta \delta E(T/\theta)$.
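The comparison between selection and discipline can be made concrete with a short sketch of the welfare expression described above. The function `share`, standing in for $1/(\alpha_m z(x^*(d), d))$, the threshold value and all parameter values are hypothetical and chosen only for illustration; the stand-in is decreasing in $d$, as in the Panel A case.

```python
def expected_welfare(d, d_bar, beta, delta, e_first_best, pooled_share):
    """Expected welfare as described in the text around Eq. (10).

    e_first_best : E(T/theta), expected first-best amount of public good.
    pooled_share : callable giving 1/(alpha_m * z(x*(d), d)) for d >= d_bar.
    """
    benevolent = beta * (1.0 + delta) * e_first_best
    if d < d_bar:
        # Separating equilibrium: only the selection gain survives.
        rent_seeker = beta * delta * e_first_best
    else:
        # Pooling equilibrium: discipline gain in period 1, nothing in period 2.
        rent_seeker = pooled_share(d) * e_first_best
    return benevolent + (1.0 - beta) * rent_seeker


def share(d):
    """Hypothetical stand-in for 1/(alpha_m*z(x*(d), d)), decreasing in d."""
    return 1.0 / (1.0 + d)


d_bar, delta = 0.3, 0.9  # illustrative threshold and discount factor

# Moderate share of benevolent politicians: discipline beats selection.
sw_sep_low_beta = expected_welfare(0.1, d_bar, 0.8, delta, 1.0, share)
sw_pool_low_beta = expected_welfare(0.3, d_bar, 0.8, delta, 1.0, share)

# Larger share of benevolent politicians: selection beats discipline.
sw_sep_high_beta = expected_welfare(0.1, d_bar, 0.9, delta, 1.0, share)
sw_pool_high_beta = expected_welfare(0.3, d_bar, 0.9, delta, 1.0, share)

print(sw_sep_low_beta, sw_pool_low_beta, sw_sep_high_beta, sw_pool_high_beta)
```

In this toy parameterization, raising $\beta$ from 0.8 to 0.9 is enough to flip the ranking in favour of a distance below the threshold, since $\beta \delta$ then exceeds the best share of the first-best amount that a disciplined rent-seeker would provide.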

Proposition 5 Let $\bar{d} \ge \hat{d}$ be the value of d that maximizes $E[T/(\theta\alpha_m z(x^*(d), d))]$ over $d \ge \hat{d}$. If $E[T/(\theta\alpha_m z(x^*(\bar{d}), \bar{d}))] < \beta\delta E(T/\theta)$, the expected rewards from selection are greater than the expected rewards from discipline and decentralization is optimal, that is $\hat{d} > d^*$. 13 If $E[T/(\theta\alpha_m z(x^*(\bar{d}), \bar{d}))] \ge \beta\delta E(T/\theta)$, the expected rewards from selection are not greater than the expected rewards from discipline, then: (i) when $z_{xd} > z_{xx} z_d / z_x$, moderate centralization is optimal, $d^* = \bar{d} = \hat{d}$; (ii) when $z_{xd} \le z_{xx} z_d / z_x$, maximum centralization is optimal, $d^* = \bar{d} = D$.

13 In this case, optimal distance is defined as an interval (rather than a point) solution. This is because, for the sake of simplicity, the model ignores that the cost of public goods θ may vary with distance. If, for example, one assumes that local governments provide public goods more efficiently than the central government, optimal distance would be 0. On the other hand, admitting that decentralization is somewhat costly (i.e. sustaining many layers of local governments involves higher overall costs than a more centralized framework), optimal distance would be just below $\hat{d}$.
Proof From Proposition 4 and Eq. (8), the threshold $\hat{d}$ and the optimal biasing $x^*$ are both independent of the realization of θ, while from (C.4) in Appendix C the amount of public good provided by rent-seeking politicians under the pooling equilibrium is strictly decreasing in d when $z_{xd} > z_{xx} z_d / z_x$ and increasing in d when $z_{xd} \le z_{xx} z_d / z_x$. Therefore, all the bullets in Proposition 5 follow straightforwardly.
Proposition 5 clearly shows that institutional factors, as captured by the values of β, $\alpha_m$, c and $z_{xd}$, are key determinants of the optimal voter-politician distance. In particular, building on Eqs. (C.4)-(C.9) and Propositions 4 and 5, we can state:

Proposition 6 Decentralization is more likely to be the optimal setting: (i) the larger the share of benevolent politicians β, and hence the higher the expected rewards from selection; (ii) the larger the political unawareness of voters $\alpha_m$; (iii) the smaller the unit cost of biasing information c.
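To make the trade-off behind Propositions 5 and 6 concrete, the following Python sketch evaluates expected welfare at a few distances. It is only an illustration under assumptions that are not in the paper: the biasing technology is taken to be z(x, d) = 1 + x·d, E(1/θ) is passed in as the parameter `inv_theta`, and all function names and parameter values are hypothetical. The optimal bias solves the first-order condition c = T z_x / (α_m z²) of Eq. (8), and the pooling test uses the condition z(x*, d)(δT − cx*) ≥ T/α_m reported in Appendix B.

```python
import math


def z(x, d):
    """Assumed biasing technology (not from the paper): z(x, d) = 1 + x * d."""
    return 1.0 + x * d


def optimal_bias(d, T, alpha_m, c):
    """Solve c = T * z_x / (alpha_m * z**2) for x under the assumed z, floored at 0."""
    if d <= 0:
        return 0.0  # biasing is ineffective at zero distance under the assumed z
    root = math.sqrt(T * d / (alpha_m * c))  # equals z(x*, d) at an interior solution
    return max(0.0, (root - 1.0) / d)


def is_pooling(d, T, alpha_m, c, delta):
    """Rent-seekers play the E-strategy when z(x*, d) * (delta*T - c*x*) >= T / alpha_m."""
    x = optimal_bias(d, T, alpha_m, c)
    return z(x, d) * (delta * T - c * x) >= T / alpha_m


def social_welfare(d, beta, delta, T, alpha_m, c, inv_theta):
    """Expected welfare: benevolent term plus the separating or pooling payoff."""
    benevolent_term = beta * (1.0 + delta) * T * inv_theta
    if is_pooling(d, T, alpha_m, c, delta):
        x = optimal_bias(d, T, alpha_m, c)
        rent_seeker_term = T * inv_theta / (alpha_m * z(x, d))  # discipline effect
    else:
        rent_seeker_term = beta * delta * T * inv_theta  # selection effect
    return benevolent_term + (1.0 - beta) * rent_seeker_term


# Hypothetical parameter values, chosen only for illustration.
params = dict(T=1.0, alpha_m=1.2, c=0.1, delta=0.6)
for d in (0.5, 1.0, 2.0):
    sw = social_welfare(d, beta=0.5, inv_theta=1.0, **params)
    print(d, is_pooling(d, **params), round(sw, 4))
```

Under the assumed z, $z_{xd} = 1 > z_{xx} z_d / z_x = 0$, so the public good supplied in the pooling region falls with distance, which corresponds to case (i) of Proposition 5. Other functional forms would need their own solver for x*.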

Heterogeneous Regions
As shown in the previous sections, the relation between voter-politician distance, public goods provision and social welfare is not unambiguous: it depends on the quality of the institutional context. So far, we have taken the perspective of a single administrative unit, or implicitly assumed that β, $\alpha_m$ and c are the same across different administrative units. In fact, the quality of institutional factors can be heterogeneous across regions in a country. Let us assume that these parameters are region specific, and that $\beta_\mu$, $\alpha_{m\mu}$ and $c_\mu$ are the country-wide average values of the share of good politicians, the political unawareness of median voters, and the cost of information biasing.
Consider politicians' quality. A high β μ makes a separating equilibrium socially more rewarding than a pooling equilibrium and pushes the constitutional legislator to choose a more decentralized setting, inducing bad politicians to reveal themselves as rent-seekers and exploiting the high probability of challengers being benevolent. However, a reduction of voter-politician distance through a decentralization reform may be harmful for regions populated by a low share of good politicians (lower than β μ ), as it involves forgoing gains from discipline without obtaining much from selection.
A larger average unawareness of voters $\alpha_{m\mu}$ has a negative impact on social welfare for $d \ge \hat{d}$ and no effect for $d < \hat{d}$. This is because voters' unawareness permits the bad politicians who play the E-strategy to supply fewer public goods. This leads the constitutional legislator to prefer more decentralized political systems. However, while a lower distance benefits regions populated by relatively highly credulous voters, where rent-seeking politicians are very likely to aim for a second term by playing the E-strategy, it can damage areas with realistic voters, where bad politicians have less incentive to pool with good politicians and are more prone to switch from the E- to the H-strategy.
Similarly, a lower value of $c_\mu$ decreases social welfare when $d \ge \hat{d}$, by leading rent-seeking politicians to spend more resources on information biasing, and makes decentralization more rewarding. However, a lower d can lead rent-seeking politicians in regions with relatively high biasing costs to play the H-strategy and provide zero public goods. In this case, decentralization reforms might favor some administrative units (the ones with low c) and penalize others.
Finally, suppose that the effectiveness of information biasing differs across regions, and in particular that $z_{xd}$ is larger than $z_{xx} z_d / z_x$ in good regions and lower than that in bad ones. It follows that a reduction in distance is likely to be beneficial for the former regions (as long as welfare is decreasing in distance) and harmful for the latter (where the relationship between distance and welfare is reversed).

Figure 3 illustrates the above discussion on the effects of decentralization with heterogeneous regions, focusing on a two-region case in which diversity arises in either the quality of politicians β (panel A) or the effectiveness of information biasing $z_{xd}$ (panel B). The horizontal line drawn for $d < \hat{d}$ displays the social welfare when a separating equilibrium occurs (i.e. when low distance triggers the H-strategy by rent-seeking politicians). For $d \ge \hat{d}$, social welfare is decreasing or increasing in distance according to the value of $z_{xd}$.
In panel A, the relationship between distance and social welfare is depicted for regions L and S. Region L has a large share of good politicians compared to the average share of benevolent politicians in the country, i.e. the value used by a welfare-maximizing constitutional legislator to decide on the optimal distance. Region S is largely populated by bad politicians, so that $\beta_L > \beta_\mu > \beta_S$. In the case considered in the figure, $SW_H(\beta_\mu) > SW_E(\beta_\mu, \hat{d})$, and therefore the distance maximizing the average social welfare is lower than $\hat{d}$. However, the optimal distance is different in the two regions: lower than $\hat{d}$ for the large-β region L, and equal to $\hat{d}$ for the small-β region S. Therefore, a decentralization reform maximizing the average social welfare increases the welfare of citizens in region L from $SW_E(\beta_L, \hat{d})$ to $SW_H(\beta_L)$, and decreases the welfare of citizens in region S from $SW_E(\beta_S, \hat{d})$ to $SW_H(\beta_S)$.
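The panel A argument can be checked numerically with a minimal Python sketch. It assumes, for illustration only, that the expected public good supplied by a benevolent politician (`g_b`) and by a pooling rent-seeker at the threshold distance (`g_e`) are given constants; the function names and all numbers are hypothetical and are not taken from the paper or from Fig. 3.

```python
def sw_separating(beta, delta, g_b):
    """Welfare when rent-seekers play the H-strategy (distance below the threshold)."""
    return beta * (1 + delta) * g_b + (1 - beta) * beta * delta * g_b


def sw_pooling(beta, delta, g_b, g_e):
    """Welfare when rent-seekers play the E-strategy (distance at the threshold)."""
    return beta * (1 + delta) * g_b + (1 - beta) * g_e


# Hypothetical values: discount factor and expected public good levels.
delta, g_b, g_e = 0.6, 1.0, 0.28
betas = {"L": 0.7, "S": 0.3}  # shares of benevolent politicians by region
beta_mu = sum(betas.values()) / len(betas)  # country-wide average

# The legislator decentralizes if the average region prefers the separating outcome.
reform_adopted = sw_separating(beta_mu, delta, g_b) > sw_pooling(beta_mu, delta, g_b, g_e)

# Regional welfare change from moving to the separating equilibrium.
gains = {
    region: sw_separating(b, delta, g_b) - sw_pooling(b, delta, g_b, g_e)
    for region, b in betas.items()
}
print(reform_adopted, gains)
```

With these illustrative numbers the reform is adopted on the basis of the average β, region L gains and region S loses, which is the pattern described for panel A.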
In panel B, the quality of politicians is assumed to be equal across regions ($\beta_L = \beta_S = \beta_\mu$), so that $SW_H$ is equal as well, while the effectiveness of information biasing $z_{xd}$ differs. For the sake of simplicity, a unique threshold $\hat{d}$ is assumed to hold for both regions. 14 Again, an average-welfare-maximizing policy that considers the dotted curve and chooses $d^* < \hat{d}$ would be beneficial for the high-$z_{xd}$ region (where $SW_E$ is decreasing in distance) and detrimental for the low-$z_{xd}$ region (where $SW_E$ is increasing).
Summarizing, our analysis makes it clear that constitutions choosing a unique level of voter-politician distance based on average values of institutional parameters in the country may fail to be optimal for all the regions. From a policy viewpoint, this result suggests that the optimal design of local/central governance rules should take into account the heterogeneity of regions and be accompanied by measures that can help make the regions more similar to each other in terms, for example, of quality of local media and ruling class. In the next section, we extend the analysis by considering

Endogenous Pool of Politicians
The pool of politicians is one of the elements determining the welfare implications of proximity. In this section we investigate the incentive of citizens to participate in the political process by running for elections. There are monetary and non-monetary rewards from being elected. In particular, we assume that rent-seeking politicians are interested only in monetary incentives while benevolent ones receive gratification from doing their social duties and contributing to the welfare of their community. 15 In this context, the salary paid to elected politicians has an impact on the choice of citizens to stand for election or not as well as on the choice of officeholders to provide public goods or grab taxes, which, under certain circumstances, might offset the negative effect of voters' proximity in weak regions.

Public Good Provision
Let us assume that elected politicians are remunerated with a fixed salary W, independent of their ability or performance. The salary is paid out of taxes T, such that the amount of resources available to produce the public good is (T − W). Besides the salary, benevolent politicians reap a non-monetary payoff B > 0 from doing their duty as civil servants. 16 For rent-seeking politicians B = 0. Also, assume that the cost of the public good is independent of the incumbent's ability and that legal and administrative controls constrain rent diversion to be no larger than (T − W − τ), where higher values of τ indicate more effective control. The utility of the incumbent politician, given in Eq. (13), depends on whether she is benevolent or a rent-seeker playing the H- or E-strategy.

15 The relationship between the remuneration of politicians and their average quality has been investigated both theoretically and empirically (Besley 2004; Gersbach 2004; Messner and Polborn 2004; Beniers and Dur 2007; Poutvaara and Takalo 2007; Kotakorpi and Poutvaara 2011; Gagliarducci and Nannicini 2013; Fedele and Giannoccolo 2020).
Proposition 7 (i) The optimal intensity of biasing action $x^*$ is decreasing in the salary W paid to elected politicians. (ii) A salary $\hat{W}$ exists such that: for $W \ge \hat{W}$, rent-seeking incumbents provide the amount of public good $G^{r,E}(x^*, d)$ in order to be re-elected for a second term; for $W < \hat{W}$ they prefer to divert the maximum rent (T − W − τ) and leave office. (iii) $\hat{W}$ decreases with d and τ, and $\hat{G}_m$ decreases with W.
Proof See Appendix C.4.

Like voter-politician distance, paying a salary to incumbent politicians has the effect of increasing the value of holding office for a second term. Therefore, in decentralized political systems, a higher compensation is required to motivate rent-seeking politicians to provide enough public goods to citizens. On the other hand, unlike distance, a higher salary unambiguously reduces both outlays to bias information and the amount of public goods supplied by benevolent and rent-seeking politicians, because the funds available for both public goods and rents diminish.

Social Welfare
Introducing remuneration for politicians affects social welfare both by draining public resources and by modifying the pool of candidates for election. Assume that there are two possible types of individuals: if elected, some behave congruently with voters' welfare and others dissonantly. Let the mass of congruent and dissonant individuals in the economy be the same, normalized to 1. Each individual can earn a monetary income in the private sector equal to $w a_i$, where $a_i \sim U[0, 1]$ is individual ability, uniformly distributed on the unit interval. Since the elected candidate has to give up the outside income, congruent individuals run for election if $a_i \le (B + W)/w$, while dissonant individuals do so if $a_i \le U^{r,H}/w$ or $a_i \le U^{r,E}/w$, according to whether $W \lessgtr \hat{W}$. Therefore, assuming, as in the previous sections, that the incumbent and the winning candidate in the second term are random draws from the pool of people running for election, the probability of having a benevolent politician in office is given by Eq. (14). Substituting it into (10) yields the social welfare function (15).

From (15), when $W < \hat{W}$, a higher salary improves the pool of candidates, and $SW_H$ reaches its maximum value at a salary equal to or lower than the threshold, $W^*_H \le \hat{W}$. By contrast, when $W \ge \hat{W}$, a further increase in the salary paid to politicians can increase or decrease the probability of having a benevolent politician in office. Therefore, if B and/or $\partial U^{r,E}(x^*)/\partial W$ are sufficiently large, an increase in salary spurs dissonant more than congruent citizens to enter the pool of candidates, implying that social welfare under the pooling equilibrium $SW_E$ can be constantly decreasing in W. Otherwise, SW would increase in W up to the point where the resources available for the production of the public good do not decrease enough to reverse the relationship (i.e., $\hat{W} \le W^*_E < T - \tau$).

16 The benefit from behaving as a civil servant B can clearly be affected by cultural factors and social norms. Although not explored in our model, this link deserves further analysis.
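The entry conditions described in this subsection pin down the share of benevolent candidates. As a sketch, assuming both ability cutoffs lie inside the unit interval (so that $(B+W)/w \le 1$ and $U^{r,k}/w \le 1$) and writing β(W) for the resulting probability (this notation is ours and need not coincide with the paper's Eq. (14)):

```latex
% Share of congruent candidates in the pool, with k = H if W < \hat{W} and k = E if W \ge \hat{W}
\beta(W) \;=\; \frac{(B+W)/w}{(B+W)/w + U^{r,k}/w}
        \;=\; \frac{B+W}{B+W+U^{r,k}},
\qquad
% Effect of a marginal salary increase on the pool quality
\frac{\partial \beta}{\partial W}
  \;=\; \frac{U^{r,k} - (B+W)\,\partial U^{r,k}/\partial W}{\left(B+W+U^{r,k}\right)^{2}} .
```

Under these assumptions the pool improves with W only when $U^{r,k} > (B+W)\,\partial U^{r,k}/\partial W$, which is consistent with the statement that a large B or a large $\partial U^{r,E}(x^*)/\partial W$ can make a higher salary attract relatively more dissonant candidates.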

Multiple Regions
As in Sect. 5, let us consider different administrative units with different parameter values. For any given distance, an optimal salary maximizes the stepwise function (15) of the average region. If politicians can be paid differently across regional governments, the salary can be used to mitigate the possible negative effects of a decentralization reform on weak regions.
Consider two regions L and S, the former characterized by a great difficulty in diverting public resources and the latter by rent-seeking incumbents who can easily grab taxes undetected (i.e., $\tau_L > \tau_S$). From Eq. (14), region L can count on a better pool of candidates under both the separating and the pooling equilibrium. If the average amount of public goods provided by benevolent politicians is greater than the minimum amount that rent-seeking politicians are forced to provide, that is, if $E[(T - W)/\theta] > E(\tau/\theta)$, then social welfare in region L turns out to be higher in any equilibrium. Also, since a higher τ involves a stronger reduction in payoffs for the H-strategy than for the E-strategy, the threshold salary in region L is lower, i.e. $\hat{W}_L < \hat{W}_S$. As a result, welfare-maximizing salaries are in general different in the two regions, and it can be optimal to pay politicians more in weak S-regions than in well-functioning L-regions, $W^*_S > W^*_L$.

In Fig. 4, social welfare is displayed as a function of politicians' wage for a given distance. In region S bad politicians may divert taxes more easily, and a pooling equilibrium with a wage $W^*_S > \hat{W}_S$ is socially preferable to a separating equilibrium. The opposite happens in region L, where diverting resources from the production of public goods is difficult, and $W^*_L = \hat{W}_L$. In the latter case, the high average quality of the pool of candidates participating in the electoral competition makes the selection effect of elections socially more valuable than the discipline effect of high salaries, and the separating equilibrium is welfare-optimal. By allowing for different compensations to elected politicians in the two regions, $W^*_S > W^*_L$, the legislator can maximize social welfare by realizing the separating equilibrium in region L and the pooling equilibrium in region S.

Conclusions
Institutional decentralization reforms are often considered an effective way to increase political accountability, thus increasing social welfare. This is not always supported by the empirical evidence, however. In this paper, we showed that the relationship between distance and public goods provision is non-monotonic, and that the welfare effect of proximity depends on the share of good and bad politicians, the voters' political awareness, and the cost and effectiveness of information biasing activities by the incumbent politician. Since these institutional features may vary across regions within the same country, a decentralization reform may be welfare improving for some regions but detrimental for others. This is a risk in countries with large regional disparities.
A policy implication of our analysis is that moderate or strong centralization of the political system should be recommended in some cases to select benevolent politicians or even to induce rent-seeking politicians to supply a larger amount of public good. In countries characterized by strong institutional disparities across regions, constitutional reforms aimed at increasing decentralization may turn out to be beneficial for advanced regions and detrimental for weak ones.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

when $G_1 = T/s_m$, where $s_m = \theta\alpha_m z(x, d)$ and $x^*$ is the value of x that solves $c = T z_x / (\alpha_m [z(x, d)]^2)$ as in Eq. (8) in the main text.
Therefore, a voter i observing public good provision $G_1$ and a signal $s_i$ knows that there are two possibilities: either the incumbent politician is benevolent, in which case $\theta_1 = T/G_1$ and $\alpha_i = s_i G_1/T$, or the incumbent politician is a rent-seeker, in which case $\theta_1 = T/(\alpha_m z(x^*, d) G_1)$ and $\alpha_i = s_i \alpha_m G_1/T$. Applying Bayes' formula gives the posterior probability that $G_1$ was generated by a benevolent incumbent, and voter i rationally supports the incumbent if and only if condition (A.1) holds. If the Bayesian likelihood ratio is monotonically decreasing in G and $s_i$, a threshold $\hat{G}_i(s_i)$ exists such that voter i supports the incumbent if $G_1 \ge \hat{G}_i(s_i)$ and votes for the challenger if $G_1 < \hat{G}_i(s_i)$. In particular, if $\theta \sim U[0, \bar{\theta}]$ and $h(\alpha_m) = h(1)$, with $\alpha_m > 1$, then when $G_1 = T/s_i$, condition (A.1) holds as a strict equality and $\hat{G}_i(s_i) = T/s_i$. In this case, the voting strategy assumed in Assumption 1, and the pooling and separating equilibria derived in Proposition 1, are characterized as perfect Bayesian equilibria.

Appendix B: Skeptical Median Voter
In this Appendix, we characterize the equilibrium strategies and derive the socially optimal institutional distance for the case of skeptical median voters, α m < 1.

B.1 Equilibrium
If the median voter underestimates the cost of providing public goods, the amount that incumbent politicians must offer to be re-elected without altering the information signals available to voters, $T/(\theta_1\alpha_m)$, is greater than the socially optimal amount, $T/\theta_1$. In this case, providing the socially optimal amount of public good is not enough for benevolent politicians to be re-elected, and they have to spend resources on information to modify voters' beliefs. This, however, reduces the amount of public good that they can provide to citizens. A necessary condition for benevolent politicians in office to be re-elected is that an $\tilde{x}$ exists such that the amount of public good that can be provided after deducting the information expenditures is equal to the amount of public good needed to be re-elected:

$$\frac{T - c\tilde{x}}{\theta_1} = \frac{T}{\theta_1\,\alpha_m\, z(\tilde{x}, d)}. \qquad (B.1)$$
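The feasibility condition can be illustrated with a short Python sketch that looks for the minimum information action equating the affordable public good with the amount needed for re-election, i.e. α_m · z(x, d) · (T − c·x) = T. It assumes, purely for illustration, the biasing technology z(x, d) = 1 + x·d (not taken from the paper) and a skeptical median voter (α_m < 1); the function name and the numbers are hypothetical.

```python
import math


def min_reelection_action(T, alpha_m, c, d):
    """Smallest x >= 0 with alpha_m * (1 + x*d) * (T - c*x) = T, or None if infeasible.

    Assumes z(x, d) = 1 + x * d (an illustrative choice) and alpha_m < 1.
    """
    if d <= 0:
        return None  # z = 1, so alpha_m * (T - c*x) < T for every x >= 0
    # The condition rearranges to a*x**2 + b*x + k = 0 with:
    a = alpha_m * c * d
    b = -alpha_m * (T * d - c)
    k = T * (1.0 - alpha_m)
    disc = b * b - 4.0 * a * k
    if disc < 0:
        return None  # no real root: re-election is not affordable
    x = (-b - math.sqrt(disc)) / (2.0 * a)  # smaller root = minimum action
    return x if x >= 0 else None


x_min = min_reelection_action(T=1.0, alpha_m=0.8, c=0.1, d=1.0)
print(x_min)
print(min_reelection_action(T=1.0, alpha_m=0.5, c=1.0, d=1.0))  # infeasible case
```

When the function returns `None`, no information action makes re-election feasible under the assumed technology, so only the "behave" option would remain for a benevolent incumbent.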
If this condition holds, a benevolent politician has two strictly undominated strategies: (i) A "behave" strategy (henceforth, B-strategy), consisting in providing the socially optimal level of public good in period 1, i.e. $G_1^{b,B} = T/\theta_1$, and taking no information action, $x_1^{b,B} = 0$. In this way, she misses the opportunity to be re-elected. From Eq. (3), the payoff of the benevolent incumbent from the B-strategy profile is given by Eq. (B.2). (ii) An "election" strategy (henceforth, E-strategy), consisting in taking the minimum action $x_1^{b,E} = \tilde{x}$ needed to be re-elected in period 2 with probability q = 1, by providing an amount of public good $G_1^{b,E} = (T - c\tilde{x})/\theta_1$ in period 1. In this way, she gains the opportunity to provide the socially optimal amount of public good in period 2, $G_2^{b,E} = T/\theta_2$, without wasting resources to influence voters' beliefs. From Eq. (3), the payoff of the benevolent incumbent from the E-strategy profile is given by Eq. (B.3).

When the median voter is skeptical and the feasibility condition (B.1) holds, the benevolent politician, similarly to rent-seekers, faces a trade-off between maximizing her utility in the first period but giving it up in the second, and forgoing some utility today in order to get the full utility tomorrow. The best strategy for benevolent politicians is derived by comparing the payoffs under the E- and B-strategy in Eqs. (B.2) and (B.3), which yields condition (B.4).

Moving on to rent-seeking incumbent politicians, they are re-elected if x is such that $\alpha_m z(x, d) > 1$, and information biasing actions make it possible to extract some rents, exactly as when the median voter is non-skeptical. In addition, their behavior is not affected by the choices of benevolent politicians (i.e. whether or not the latter invest in correcting voters' beliefs), since $\hat{G}_m$ in Eq. (5) is unchanged. Summarizing:

Proposition B.1 If the median voter is skeptical and the voting rule is the one reported in Assumption 1, a unique pooling or separating equilibrium exists.
For rent-seeking incumbent politicians, the optimal strategies are the same as reported in Proposition 1 in the main text, according to whether $z(x^*, d)(\delta T - cx^*) \gtrless T/\alpha_m$ and a pooling or separating equilibrium prevails. For benevolent incumbents, the optimal strategies are: the E-strategy if $\theta_1 \delta E(T/\theta) \ge c\tilde{x}$, and the B-strategy otherwise.
Proof When benevolent incumbent politicians adopt the optimal E-strategy, from Proposition B.4 we have that $\mathrm{sign}(\partial G_1^{b,E}/\partial j) = -\mathrm{sign}(\partial\tilde{x}/\partial j)$, with $j \in \{d, \alpha_m, c\}$. The sign of each $\partial\tilde{x}/\partial j$ follows by applying the implicit function theorem to the feasibility condition (B.1) and recalling Lemma B.1.

As $\tilde{x}$ decreases with d, so that re-election becomes cheaper at larger distances, from Eq. (B.4) it is easy to verify that a distance $\tilde{d}(\alpha_m, c, \theta_1)$ exists such that for $d < \tilde{d}$ benevolent politicians prefer "to behave" and supply the socially optimal amount of public good, while for $d \ge \tilde{d}$ they prefer the E-strategy. Following Proposition B.2, when the median voter is skeptical and $d > \tilde{d}$, the amount of public good supplied by a benevolent incumbent increases with the voter-politician distance, independent of the quality of local media and the marginal effectiveness of information actions. However, this quantity is always lower than the amount of public good provided under the B-strategy (see Fig. 5).

B.2 Welfare
According to whether the benevolent politician's optimal strategy is "behave" or "election", citizens' expected welfare is given by Eq. (B.5) or Eq. (B.6).

Proposition B.3
The socially optimal distance $d^*$ is unaffected by the strategy of the benevolent incumbent and takes the same values as in Proposition 5 in the main text.
Proof Since the expected welfare when a benevolent politician is in office in period 1 (the first terms in (B.5) and (B.6)) is the same regardless of the strategy that would be followed by a rent-seeking incumbent, and since it is independent of (Eq. (B.5))

Then, the value $\hat{d}$ for which U = 0 does not depend on θ. Moreover, by applying the implicit function theorem, $\partial\hat{d}/\partial\alpha_m < 0$ and $\partial\hat{d}/\partial c > 0$ straightforwardly follow.

C.4 Proof of Proposition 7
(i) From (13), the value $x^*$ maximizing $U^{r,E}$ solves $(T - W) z_x / (\alpha_m [z(x, d)]^2) = c$; then, by applying the implicit function theorem, $\partial x^*/\partial W < 0$ (C.10).

(ii) Differentiating $U^{r,E}$ with respect to W, and substituting from Eq. (C.10), yields Eq. (C.11). Since $U^{r,H}$ is independent of biasing effort, a salary $\hat{W}$ exists such that for any $W \gtreqless \hat{W}$ the utility from the E-strategy, obtained by providing $G_1^{r,E} = \hat{G}_m(W)$, is greater than, equal to or lower than that from the H-strategy, i.e. $U^{r,E} \gtreqless U^{r,H}$.

(iii) Applying the implicit function theorem to $U^{r,E}(x^*) - U^{r,H} = 0$, and using Eqs. (C.11) and (C.7), it straightforwardly follows that $\partial\hat{W}/\partial d < 0$ and $\partial\hat{W}/\partial\tau < 0$. Finally, differentiating $\hat{G}_1^{r,E}(x^*)$ with respect to W, and substituting from Eq. (C.10):
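For part (i), the implicit-function step on the first-order condition can be sketched as follows. This is our own working under the assumptions $z_x > 0$ and that the second-order condition $z_{xx} z < 2 z_x^2$ holds; it is not a transcription of the paper's Eq. (C.10).

```latex
% First-order condition written as F(x, W) = 0
F(x, W) \;\equiv\; \frac{(T-W)\, z_x}{\alpha_m\, z^{2}} - c \;=\; 0,
\\[1ex]
% Partial derivatives of F
\frac{\partial F}{\partial W} \;=\; -\frac{z_x}{\alpha_m z^{2}},
\qquad
\frac{\partial F}{\partial x} \;=\; \frac{(T-W)\left(z_{xx}\, z - 2 z_x^{2}\right)}{\alpha_m z^{3}},
\\[1ex]
% Implicit function theorem
\frac{\partial x^{*}}{\partial W}
  \;=\; -\frac{\partial F/\partial W}{\partial F/\partial x}
  \;=\; \frac{z_x\, z}{(T-W)\left(z_{xx}\, z - 2 z_x^{2}\right)} \;<\; 0 .
```

The sign follows because the numerator is positive while the denominator is negative under the second-order condition, so a higher salary lowers the optimal biasing effort, as stated in Proposition 7 (i).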