Higher Tax for Top Earners

The literature can justify both increasing and decreasing marginal taxes (IMT & DMT) on top incomes under different welfare objectives and income distributions. Even when DMT are theoretically optimal, they are often politically infeasible. A flat tax then seems to be a constrained optimal solution. We show, however, that given any flat tax, we can increase the total utility of a poor majority by raising the top income tax rate under a simple condition, which can be checked with empirical data. We further generalize our main results, allowing different welfare weights, declining elasticity of labor supply and more tax bands.


Jim Jin: jyj@st-andrews.ac.uk; Felix FitzRoy: frf@st-andrews.ac.uk

Introduction
Throughout most developed and developing economies, income distributions have become increasingly skewed in recent decades (Stiglitz 2012; Piketty 2014). One reason has been declining marginal tax rates for top incomes. Another is the effective high marginal tax faced by low income earners due to withdrawal of benefits as earnings rise, leading to the poverty trap. The existing tax structures in most developed countries are U-shaped, with increasing marginal tax (IMT) on high income earnings (but not on capital gains). The justification of IMT on top incomes is to raise revenue from those most able to pay, and provide a social safety net for the poor. This view is theoretically justified by Diamond (1998) and Saez (2001) based on their assumption that top incomes follow a Pareto distribution (see also Salanié 2003).
However, the shape of the optimal tax curve seems to be sensitive to income distributions. With a bounded distribution, Sadka (1976) and Seade (1977) find a zero optimal marginal tax rate for the top earners. In general optimal tax curves are often inversely U-shaped or even declining (see Tuomala 1984; Kanbur and Tuomala 1994; Boadway et al. 2000; Tarkiainen and Tuomala 2007; Hashimzade and Myles 2007; Boadway and Jacquet 2008; Kaplow 2008), implying decreasing marginal taxes (DMT) on top incomes. But as Warren Buffett famously complained, the lower effective average tax rates paid by the rich, due to low capital gains taxes and various loopholes, are widely perceived to be unfair. This political problem often imposes a binding constraint and suggests a constrained optimal solution to be a flat tax, which by continuity should be closer to the optimal DMT and dominate IMT. Moreover, a flat tax will reduce administrative costs and avoid incentive distortions (see Atkinson 1995 for a good overview). Thus Mankiw et al. (2009) argue that "A flat tax, with a universal lump-sum transfer, could be close to optimal".
On the other hand, Diamond and Saez (2011) and Piketty and Saez (2012, 2013) (DSPS) argue that if policy makers ignore the welfare of the richest group (due to their low marginal utility of income) and focus on the poor majority, the tax rate for top income should be 70-80%, supposedly higher than the tax rate for lower incomes. This policy has been successfully applied in the Scandinavian countries where high top tax rates co-exist with high labour force participation and the highest level of life satisfaction (Kleven 2014). The validity of different policy recommendations, IMT or flat tax, seems crucially dependent on social objectives as well as income distribution. This paper shows that even when DMT are optimal but not feasible, a flat tax may not be the next best alternative. Given any flat tax, we can increase the total utility of a poor majority by raising the tax rate on top earners under a simple condition, which means the optimal tax on top earnings derived by Saez (2001) is higher than the optimal flat tax. This condition generally holds when the poor majority is sufficiently large.
Following DSPS (though they consider more general cases), we ignore the interests of the rich group and focus on the poor. Later we allow different weights given to different poor households, leading to a similar effect as decreasing marginal utility of income assumed by DSPS. Surprisingly, when we put more weight on the very poor households, a higher tax for top earners is less likely to benefit the poor.
We first assume a constant elasticity of labour supply for the whole population. Later we assume more realistically declining elasticity with income and show that a higher top tax is more likely to be justified. This is consistent with the optimal IMT obtained by Aaberge and Colombino (2013) and Andrienko et al. (2014), using data from Norway, US, UK and Australia, with declining elasticity of labour supply.
Continuous tax curves have been criticized as "too far removed from the tax-benefit systems observed in practice to be a useful guide for policy" (Chone and Laroque 2005, p. 396). Apps et al. (2009) remark that "Given its significance in practice, the piecewise linear tax system seems to have received disproportionately little attention in the literature on optimal income taxation." Following Diamond and Saez (2011) who argue for practical and useful research on tax policy, we first consider two-band taxes. This system is the natural first step beyond a flat tax and can model IMT and DMT as well as a flat tax. Furthermore the two-band tax literature finds DMT optimal.
Sheshinski (1989) first argued for increasing two-band taxes. However, Slemrod et al. (1994) find errors in his proof and use numerical simulations to show that DMT maximize maximin and utilitarian objectives. Similarly, Salanié (2003) and Hindriks and Myles (2006) obtain optimal decreasing two-band taxes in a two-class economy. Hence it is interesting to see if both decreasing and increasing two-band taxes dominate any flat tax. We later allow more tax bands and generalize our result accordingly.
We introduce our two-band tax model in the next section. Section 3 shows that any flat tax is Pareto dominated by some DMT. Section 4 gives a sufficient condition for a higher top tax rate to benefit a poor majority and shows it is valid if the majority is sufficiently large. Section 5 extends our model and generalizes the results, allowing different welfare weights for the poor, declining elasticity of labor supply and multiple tax bands. Section 6 concludes the paper.

Basic Model
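We assume that a population, normalized to unity, consists of a continuum of households, whose wage is denoted by \(w\) and is distributed on \([a, b]\), where \(a > 0\) and \(b\) is bounded but can be very large and approximately treated as infinite. The density and cumulative distribution functions of \(w\) are denoted by \(f(w)\) and \(F(w)\). We define the poor population as those with wages below a fixed level \(\bar w\), and denote those with higher wages as the rich. The government's objective is to maximize the total utility of the poor. This is similar to Diamond and Saez (2011), who give virtually zero weight to the rich in the social welfare function due to decreasing marginal utility of income. Our objective can be justified by the political goal of income redistribution. We first treat the poor equally but will give them different welfare weights in Sect. 5(i).
Every household has a quasi-linear utility, \(m - x^{1+1/\varepsilon}/(1 + 1/\varepsilon)\), where \(m\) is net income, \(x\) is labour supply and \(\varepsilon\) is its elasticity. This simple utility function has been widely used in the literature (e.g. Atkinson 1995). We first assume an identical \(\varepsilon\) for the whole population and later allow declining elasticity in Sect. 5(ii).
Given wage \(w\), a household's pre-tax earnings are \(y = wx\). The government imposes two tax rates, \(t_1\) and \(t_2\), for earnings below and above a threshold \(Y\). The tax revenue, after a fixed expenditure is paid, is distributed to all households equally as a basic income, denoted by \(B\). Given our unit population, \(B\) is also equal to the total transfer received by the whole population. The two-band taxes reduce to a flat tax when \(t_1 = t_2\). We will allow more tax bands in Sect. 5(iii).
Given \(t_1\), \(t_2\), \(Y\) and \(B\), households' utility functions can be written as:
\[ u = \begin{cases} (1-t_1)wx + B - \dfrac{x^{1+1/\varepsilon}}{1+1/\varepsilon}, & wx \le Y, \\[2ex] (1-t_2)wx + (t_2-t_1)Y + B - \dfrac{x^{1+1/\varepsilon}}{1+1/\varepsilon}, & wx > Y. \end{cases} \tag{1} \]
Every household chooses labour supply \(x\) to maximize utility. We first consider IMT, i.e. \(t_1 < t_2\), and assume \(Y \ge \bar w^{1+\varepsilon}(1-t_1)^{\varepsilon}\). Thus every poor household faces the lower rate \(t_1\) and chooses optimal labour supply \(x = w^{\varepsilon}(1-t_1)^{\varepsilon}\). This can be justified by the political agenda to help the poor by charging them a low tax rate \(t_1\). Substituting it into (1), we obtain the maximized utility
\[ u = \frac{[(1-t_1)w]^{1+\varepsilon}}{1+\varepsilon} + B. \tag{2} \]
Integrating it over \([a, \bar w]\), we get the total utility of the poor as our objective function:
\[ W = \frac{(1-t_1)^{1+\varepsilon}}{1+\varepsilon}\int_a^{\bar w} w^{1+\varepsilon} f(w)\,dw + F(\bar w)\,B. \tag{3} \]
Given \(Y \ge \bar w^{1+\varepsilon}(1-t_1)^{\varepsilon}\) and \(t_1 < t_2\), the population is divided into three groups. All poor households and some rich ones with \(w \le \hat w\), where \(\hat w^{1+\varepsilon}(1-t_1)^{\varepsilon} = Y\), earn no more than \(Y\) and pay \(t_1\) on all their earnings. Households with \(\hat w < w \le w_1\), where \(w_1^{1+\varepsilon}(1-t_2)^{\varepsilon} = Y\), earn exactly \(Y\). Households with \(w > w_1\) earn more than \(Y\) and pay \(t_2\) on the excess. The total tax revenue from these three groups is:
\[ R = t_1(1-t_1)^{\varepsilon}\int_a^{\hat w} w^{1+\varepsilon} f(w)\,dw + t_1 Y\,[1 - F(\hat w)] + t_2\int_{w_1}^{b}\left[(1-t_2)^{\varepsilon} w^{1+\varepsilon} - Y\right] f(w)\,dw. \tag{4} \]
We assume the fixed expenditure is less than \(R\), so \(B\) is positive and maximized whenever \(R\) is. So we can replace \(R\) by \(B\). Under a flat tax, \(t_1 = t_2 = t\), we have \(\hat w = w_1\), and (4) reduces to \(\int_a^b t(1-t)^{\varepsilon} w^{1+\varepsilon} f(w)\,dw\). Then our objective function (3) reduces to:
\[ W = \frac{(1-t)^{1+\varepsilon}}{1+\varepsilon}\int_a^{\bar w} w^{1+\varepsilon} f(w)\,dw + F(\bar w)\,t(1-t)^{\varepsilon}\int_a^{b} w^{1+\varepsilon} f(w)\,dw. \tag{3'} \]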
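The household behaviour described in this section can be sketched numerically. The following Python fragment is only an illustration under stated assumptions: it replaces the continuous wage density by a short, hypothetical list of wages, it covers the case \(t_1 \le t_2\) only, and the function names and parameter values are invented for the example. It checks that, under a flat tax, average revenue equals \(t(1-t)^{\varepsilon}\) times average zero-tax earnings.

```python
def chosen_earnings(w, t1, t2, Y, eps):
    """Pre-tax earnings y = w*x chosen by a household with wage w (needs t1 <= t2)."""
    low = (1 - t1) ** eps * w ** (1 + eps)   # desired earnings when facing t1
    high = (1 - t2) ** eps * w ** (1 + eps)  # desired earnings when facing t2
    if low <= Y:
        return low   # stays in the lower band
    if high > Y:
        return high  # moves into the upper band
    return Y         # bunches at the threshold


def tax_paid(y, t1, t2, Y):
    """Tax on earnings y: rate t1 up to the threshold Y, rate t2 on the excess."""
    return t1 * min(y, Y) + t2 * max(y - Y, 0.0)


def average_revenue(wages, t1, t2, Y, eps):
    """Average tax revenue over a finite wage list standing in for f(w)."""
    total = sum(tax_paid(chosen_earnings(w, t1, t2, Y, eps), t1, t2, Y) for w in wages)
    return total / len(wages)


wages = [1.0, 2.0, 3.0, 4.0]  # hypothetical wage sample, not data from the paper
eps, t = 0.5, 0.3             # illustrative values only
flat = average_revenue(wages, t, t, Y=5.0, eps=eps)
closed_form = t * (1 - t) ** eps * sum(w ** (1 + eps) for w in wages) / len(wages)
print(abs(flat - closed_form) < 1e-12)
```

With \(t_1 < t_2\), the same functions also reproduce the middle group that earns exactly \(Y\); for example, `chosen_earnings(2.0, 0.2, 0.5, 2.2, 0.5)` returns the threshold itself.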

Flat Tax and DMT
The literature (e.g. Slemrod et al. 1994) has shown that DMT are generally optimal for two-band taxes under maximin or utilitarian objectives. In this section we show that a flat tax is always Pareto dominated by some DMT. We first find the optimal flat tax which maximizes (3'). To simplify the notation we denote the total earnings of the poor under zero tax by \(E_1 \equiv \int_a^{\bar w} w^{1+\varepsilon} f(w)\,dw\) and the corresponding earnings of the rich by \(E_2 \equiv \int_{\bar w}^{b} w^{1+\varepsilon} f(w)\,dw\). The total zero-tax earnings of the whole population are \(E = E_1 + E_2\). Since the population is normalized to 1, \(E\) is also the average zero-tax earnings of the whole population. Then the average earnings of the poor and rich are \(e_1 \equiv E_1/F(\bar w)\) and \(e_2 \equiv E_2/[1 - F(\bar w)]\) respectively. By definition we always have \(e_1 \le E \le e_2\).
We differentiate (3') and find
\[ \frac{\partial W}{\partial t} = (1-t)^{\varepsilon-1} F(\bar w)\left[(1-t)(E - e_1) - \varepsilon t E\right], \]
which is positive for low \(t\), negative for high \(t\), and zero at a single rate. Hence we get the following result.

Proposition 1
The optimal flat tax to maximize (3') is \(t^* = \dfrac{1 - e_1/E}{1 + \varepsilon - e_1/E}\). This result is a special case of Piketty and Saez (2013), who derive an optimal linear tax of \((1 - \bar g)/(1 + \varepsilon - \bar g)\), where \(\bar g\) is the average social welfare weight weighted by pre-tax incomes, which "is also the ratio of the average income weighted by individual social welfare weights \(g_i\) to the actual average income" (p. 21). Given our welfare function, which only values the utility of the poor, \(\bar g = e_1/E\) and their formula reduces to our \(t^*\). Piketty and Saez (2013) further discuss the median voter tax rate, which maximizes the utility of the median earner, and point out "a tight connection between optimal tax theory and political economy". If \(e_1\) equals the median no-tax earnings, \(t^*\) is the median voter tax. Interestingly, the median income in the U.S. was roughly $26,000, when the average top 1% income was estimated by Piketty and Saez as $1.2 million. Given the average earnings of $38,000, the average of the bottom 99%, \(e_1\), was also about $26,000. Thus our flat tax for the 99% majority is also the median voter tax.
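As a rough numerical check of Proposition 1, the sketch below evaluates the closed form \(t^* = (1 - e_1/E)/(1 + \varepsilon - e_1/E)\) with the U.S. figures quoted above and compares it with a brute-force grid search over the flat-tax objective. Two assumptions are made loudly here: the elasticity \(\varepsilon = 0.25\) is an illustrative value that does not come from this text, and observed earnings are used in place of zero-tax earnings. The function names are hypothetical.

```python
def optimal_flat_tax(e1_over_E, eps):
    """Closed form of Proposition 1: t* = (1 - e1/E) / (1 + eps - e1/E)."""
    return (1 - e1_over_E) / (1 + eps - e1_over_E)


def poor_welfare(t, e1, E, share_poor, eps):
    """Total utility of the poor under a flat tax t, with E1 = share_poor * e1."""
    own_part = (1 - t) ** (1 + eps) * share_poor * e1 / (1 + eps)
    transfer_part = share_poor * t * (1 - t) ** eps * E
    return own_part + transfer_part


e1, E, share_poor = 26_000.0, 38_000.0, 0.99  # figures quoted in the text
eps = 0.25                                    # assumed elasticity, for illustration only

t_star = optimal_flat_tax(e1 / E, eps)
grid = [i / 10_000 for i in range(10_000)]    # candidate rates 0.0000 .. 0.9999
t_grid = max(grid, key=lambda t: poor_welfare(t, e1, E, share_poor, eps))
print(round(t_star, 3), abs(t_grid - t_star) < 1e-3)
```

Under these assumptions the closed form gives a flat rate of about 0.558, and the grid maximizer lands within the grid spacing of it.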
Next we show that any flat tax \(t\) with \(0 < t < 1/(1 + \varepsilon)\), including \(t^*\), is Pareto dominated by some DMT. Here \(1/(1 + \varepsilon)\) is the revenue-maximizing flat tax, and we exclude the case of \(t \ge 1/(1 + \varepsilon)\), which lies in the inefficient part of the Laffer curve. Now we lower the tax rate \(t_2\) for earnings beyond \(Y = [1 - \varepsilon t/(1-t)](1-t)^{\varepsilon} b^{1+\varepsilon}\), which is positive given \(t < 1/(1 + \varepsilon)\). As \(b^{1+\varepsilon}\) is the highest no-tax earnings, there is a positive mass earning more than \(Y\), and we can show that each of them will pay more tax with a lower tax rate \(t_2\). The tax payment from a household within this group is \(tY + t_2[(1-t_2)^{\varepsilon} w^{1+\varepsilon} - Y]\). Following Saez (2001), the impact of a tax change \(\Delta t_2\) can be decomposed into two effects, mechanical and behavioral. The former can be obtained under a constant labor supply and expressed as \([(1-t_2)^{\varepsilon} w^{1+\varepsilon} - Y]\,\Delta t_2\). The latter is due to the response of labor supply and is given by \(-\varepsilon t_2 (1-t_2)^{\varepsilon-1} w^{1+\varepsilon}\,\Delta t_2\). Adding them together, the derivative of the tax payment with respect to \(t_2\) is negative at \(t_2 = t\) if \([1 - \varepsilon t/(1-t)](1-t)^{\varepsilon} w^{1+\varepsilon} < Y\), which is guaranteed for any \(w < b\) given our definition of \(Y\). Thus each household earning more than \(Y\) pays more tax when \(t_2\) falls. These households must be better off due to a lower marginal tax rate and higher basic income \(B\). Moreover the poorer households are better off too due to higher \(B\). Therefore a lower \(t_2\) benefits everyone.

Proposition 2 Every flat tax is Pareto dominated by some DMT.
The intuition follows from Saez' (2001) concept of behavioural and mechanical responses. A lower t 2 will motivate rich households to increase their labor supply. If the tax threshold Y is set sufficiently high, the tax revenue loss will be limited, and the extra labor supply from each household can generate significant tax revenue due to its high productivity. So the behavioural effect dominates the mechanical effect, leading to a higher revenue. This lower tax applies to a positive mass, not just the highest earner, different from the zero top marginal tax obtained by Sadka (1976) and Seade (1977).
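The argument can be illustrated with a small numerical sketch. It assumes a threshold of the form \(Y = [1 - \varepsilon t/(1-t)](1-t)^{\varepsilon} b^{1+\varepsilon}\), which is the smallest threshold for which the behavioral effect outweighs the mechanical effect for every \(w < b\); the parameter values, the three sample wages and the function names are hypothetical choices made for this example only.

```python
def dmt_threshold(t, eps, b):
    """Assumed threshold Y = [1 - eps*t/(1-t)] * (1-t)**eps * b**(1+eps)."""
    return (1 - eps * t / (1 - t)) * (1 - t) ** eps * b ** (1 + eps)


def tax_payment_above(w, t1, t2, Y, eps):
    """Tax paid by a household whose earnings exceed Y and respond to the rate t2."""
    y = (1 - t2) ** eps * w ** (1 + eps)
    return t1 * Y + t2 * (y - Y)


t, eps, b = 0.3, 0.5, 10.0  # illustrative flat tax, elasticity and top wage
Y = dmt_threshold(t, eps, b)

for w in (8.6, 9.0, 9.5):   # wages whose flat-tax earnings exceed Y
    flat_earnings = (1 - t) ** eps * w ** (1 + eps)
    before = tax_payment_above(w, t, t, Y, eps)
    after = tax_payment_above(w, t, t - 0.01, Y, eps)  # cut the top rate by 1 point
    print(w, flat_earnings > Y, after > before)
```

In this example each of the three households pays more tax after the cut in \(t_2\), so the basic income rises while nobody faces a higher rate. The check is local: it uses a small rate cut and wages strictly below \(b\).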
When DMT are optimal but politically infeasible, a flat tax seems to be better than IMT since it should be closer to the true optimum DMT by continuity, and thus dominates IMT. However, this monotonicity of tax policy may not be valid. Assuming the government is politically constrained to implement two-band IMT, we will show that the optimal flat tax is dominated by some IMT under a simple condition.

IMT Versus Flat Tax
Given the optimal flat tax \(t^*\), the question now is whether some IMT (\(t_1 < t_2\)) can generate a higher value of (3) than \(t^*\) does. This must be true if we find \(\partial W/\partial t_1 < 0\) and \(\partial W/\partial t_2 > 0\) when \(t_1 = t_2 = t^*\). In fact these two conditions are identical and we can focus on \(\partial W/\partial t_2 > 0\). Notice that the first term in (3) does not depend on \(t_2\). If \(t_2\) maximizes (3), it must maximize \(B\) (i.e. \(R\)). This is essentially the approach taken by Saez (2001). To prove that IMT can dominate the optimal flat tax, we just need to show \(\partial B/\partial t_2 > 0\) when \(t_1 = t_2 = t^*\), instead of finding the optimal \(t_2\).
For simple presentation we let \(Y = \bar w^{1+\varepsilon}(1-t_1)^{\varepsilon}\). This is not the only choice to obtain our results. For example, if we let \(Y = \bar w^{1+\varepsilon}(1-t^*)^{\varepsilon}\), the marginal poor (\(w = \bar w\)) will bunch when we lower \(t_1\) and raise \(t_2\), but this does not change the condition for \(\partial W/\partial t_1 < 0\) and \(\partial W/\partial t_2 > 0\) at \(t_1 = t_2 = t^*\), and has no effect on our result. Since our goal is to show IMT can dominate \(t^*\), this particular \(Y\) serves our purpose. \(Y = \bar w^{1+\varepsilon}(1-t_1)^{\varepsilon}\) implies \(\hat w = \bar w\) and the tax revenue (4) (hence \(B\)) simplifies to:
\[ B = t_1(1-t_1)^{\varepsilon}\int_a^{\bar w} w^{1+\varepsilon} f(w)\,dw + t_1 Y\,[1 - F(\bar w)] + t_2\int_{w_1}^{b}\left[(1-t_2)^{\varepsilon} w^{1+\varepsilon} - Y\right] f(w)\,dw, \tag{4'} \]
where \(w_1\) solves \((1-t_2)^{\varepsilon} w_1^{1+\varepsilon} = Y\). Then we investigate whether two-band taxes with \(t_1 < t_2\) can lead to a higher value of (3) than the optimal flat tax \(t^*\), with \(B\) in (3) replaced by (4'). For simple expression we denote the marginal household's zero-tax earnings, \(\bar w^{1+\varepsilon}\), by \(\bar y\).
Proposition 3 There exists a two-bracket tax schedule with \(t_1 < t^* < t_2\) that dominates the optimal linear tax rate \(t^*\) if, at \(t_1 = t_2 = t^*\), we have
\[ \frac{e_1}{E} > \frac{\bar y}{e_2}. \tag{5} \]
Proof: see "Appendix A".
As we mentioned earlier, \(\partial W/\partial t_1 < 0\) and \(\partial W/\partial t_2 > 0\) depend on the same condition. This is not coincidental. If both \(\partial W/\partial t_1 > 0\) and \(\partial W/\partial t_2 > 0\) at \(t_1 = t_2 = t^*\), it would be possible to increase (3) by raising \(t_1\) and \(t_2\) together. But this is impossible since \(t^*\) is the optimal flat tax to maximize (3').
When (5) holds, any flat tax is dominated by both IMT and DMT. There is no monotonicity in the optimal taxes. However, the superior IMT and DMT require different thresholds for top tax rates. In proving the Pareto superiority of DMT, the threshold \(Y\) associated with DMT must be higher than the threshold \((1-t)^{\varepsilon}\bar y\) used in the IMT case. Hence both a higher and a lower top tax rate would be desirable if implemented at different income thresholds.
Moreover, our result can be obtained by directly comparing the optimal flat tax \(t^*\) with the optimal top income tax rate obtained in Saez (2001). Without an income effect as assumed here, his tax rate becomes \((1 - g)/[1 - g + \varepsilon e_2/(e_2 - \bar y)]\), where \(g\) is the social welfare weight given to the rich (also see Piketty and Saez 2013). In our model \(g = 0\) given zero welfare weight for the rich. Thus Saez' optimal top income tax rate becomes \((e_2 - \bar y)/[(1+\varepsilon)e_2 - \bar y]\). If it is higher than \(t^*\), IMT must dominate \(t^*\). However, no one has explicitly compared these two tax rates. In fact Saez' asymptotic marginal tax rate \(t_a\) can be obtained from \([1 - \varepsilon t_a/(1 - t_a)]e_2 = \bar y\). Both rates have the form \((1 - z)/(1 + \varepsilon - z)\), which decreases in \(z\), with \(z = \bar y/e_2\) for \(t_a\) and \(z = e_1/E\) for \(t^*\). So \(t_a > t^*\) if and only if (5) holds. Otherwise Saez' marginal tax for top income is inconsistent with IMT.
To evaluate (5), it is often convenient to consider the income distribution function \(G(y)\), with \(y = w^{1+\varepsilon}\), instead of the wage distribution \(F(w)\). The validity of (5) may not depend on \(\bar y\). For instance, when the income distribution is nearly unbounded, we may approximate it by a Pareto distribution, \(G(y) = 1 - y^{-\alpha}\) for \(y \ge 1\), \(\alpha > 1\). Then condition (5) holds for any \(\bar y\). This result is consistent with Diamond (1998). One may attribute this result to the thick tail of a Pareto distribution. However, if \(\alpha\) is large, the tail becomes very thin while (5) still holds. To see this point further, we consider a thick-tailed distribution \(G(y) = (y/h)^{\beta}\), with \(0 \le y \le h\) and \(\beta > 0\). The number of rich households may even rise with income (if \(\beta > 1\)). But (5) never holds. The validity of (5) does not require an unbounded income either. For instance, consider a bounded Pareto distribution with \(G(y) = (1 - y^{-\alpha})/(1 - h^{-\alpha})\), with \(1 \le y \le h\) and \(\alpha > 1\). It can be shown that (5) holds for any \(h\) and \(\bar y\), even when the maximum income \(h\) is very low and close to 1. These examples demonstrate the sensitivity of (5) to income distributions.
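The first two examples can be checked with closed-form ratios. The sketch below reads condition (5) as \(e_1/E > \bar y/e_2\), which is the comparison the text applies to the U.S. figures, and uses expressions obtained by integrating the two densities directly; these derivations and the function names are not taken from the source and should be treated as a worked check rather than the authors' own computation.

```python
def pareto_ratios(alpha, y_bar):
    """(e1/E, y_bar/e2) for G(y) = 1 - y**(-alpha), y >= 1, alpha > 1, y_bar > 1."""
    e1_over_E = (1 - y_bar ** (1 - alpha)) / (1 - y_bar ** (-alpha))
    ybar_over_e2 = (alpha - 1) / alpha  # independent of the cut-off y_bar
    return e1_over_E, ybar_over_e2


def power_ratios(beta, h, y_bar):
    """(e1/E, y_bar/e2) for G(y) = (y/h)**beta on [0, h], beta > 0, 0 < y_bar < h."""
    v = y_bar / h
    e1_over_E = v
    ybar_over_e2 = (beta + 1) * v * (1 - v ** beta) / (beta * (1 - v ** (beta + 1)))
    return e1_over_E, ybar_over_e2


def condition_5(ratios):
    """Condition (5), read as e1/E > y_bar/e2."""
    e1_over_E, ybar_over_e2 = ratios
    return e1_over_E > ybar_over_e2


# Pareto: the condition holds at every cut-off tried, including thin tails.
print(all(condition_5(pareto_ratios(a, y))
          for a in (1.2, 1.5, 3.0, 10.0) for y in (1.1, 2.0, 10.0, 100.0)))

# Power distribution: the condition fails at every cut-off tried.
print(any(condition_5(power_ratios(bt, 1.0, y))
          for bt in (0.5, 1.0, 2.0, 5.0) for y in (0.1, 0.5, 0.9)))
```

A finite grid cannot prove the "for any \(\bar y\)" statements, but it is a quick way to catch an algebra slip when adapting the condition to another distribution.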
In spite of such complexity, the validity of (5) may be determined by simple data without knowing income distributions precisely. For instance, Diamond and Saez (2011) estimate the U.S. threshold of the top 1% as $0.4 million and their average earnings as $1.2 million. This implies \(\bar y/e_2 = 1/3\), which is lower than \(e_1/E\), given \(e_1\) = $26,000 and \(E\) = $38,000 as we mentioned earlier. So condition (5) holds.
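The arithmetic behind this example is short enough to reproduce. In the sketch below, condition (5) is read as \(e_1/E > \bar y/e_2\), and both the optimal flat rate and the Saez top rate (with zero weight on the rich) are written in the common form \((1 - z)/(1 + \varepsilon - z)\). The elasticity \(\varepsilon = 0.25\) is an assumed illustrative value that does not appear in this text; the function name is hypothetical.

```python
def rate(z, eps):
    """(1 - z) / (1 + eps - z): flat rate at z = e1/E, top rate at z = y_bar/e2."""
    return (1 - z) / (1 + eps - z)


ybar_over_e2 = 0.4 / 1.2       # top-1% threshold over top-1% average (in $ million)
e1_over_E = 26_000 / 38_000    # bottom-99% average over overall average

condition_5 = e1_over_E > ybar_over_e2
eps = 0.25                     # assumed elasticity, for illustration only
flat, top = rate(e1_over_E, eps), rate(ybar_over_e2, eps)
print(condition_5, round(flat, 3), round(top, 3))  # prints: True 0.558 0.727
```

Because the common form is decreasing in \(z\), the ranking of the two rates does not depend on the assumed elasticity; only their levels do.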

Corollary If high earnings follow a Pareto distribution, a higher tax on a small group of top earners always benefits the remaining population.
This result supports DSPS's view about a higher tax on top earners. But this may only apply to a small rich group, e.g. 1%. The current top tax rate, however, usually applies to a much larger group. Bach et al. (2012) argue that their top tax rate of 2/3 in Germany should only apply to an income level much higher than the current threshold. Indeed when we consider a higher tax on a large group, (5) may not hold. For instance, if we consider a higher tax on the top 50%, i.e. \(\bar y\) equal to the median earnings, (5) does not hold for any lognormal distribution. Therefore, starting from the optimal flat tax, raising the tax rate beyond the median earnings will not benefit the poor 50%.
The question is how large the rich group facing a higher tax should be. It is difficult to answer this question by (5) directly since it is very sensitive to income distributions, which can hardly be identified precisely. It would be desirable to check its validity without assuming specific distributions. This is easier to do using another condition equivalent to (5). It depends on whether we have a decreasing \(e_1/e_2\), the ratio of the average earnings of the poor and the rich (Proof: see "Appendix B").
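The lognormal claim can be checked in closed form. For \(\ln y \sim N(\mu, \sigma^2)\) with the cut-off at the median, standard partial-expectation formulas give \(e_1/E = 2\Phi(-\sigma)\) and \(\bar y/e_2 = e^{-\sigma^2/2}/[2\Phi(\sigma)]\), where \(\Phi\) is the standard normal distribution function; \(\mu\) cancels out. These expressions are derived for this sketch and are not quoted from the source, and condition (5) is read as \(e_1/E > \bar y/e_2\).

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def lognormal_median_ratios(sigma):
    """(e1/E, y_bar/e2) for a lognormal income distribution cut at its median."""
    e1_over_E = 2 * std_normal_cdf(-sigma)
    ybar_over_e2 = math.exp(-sigma ** 2 / 2) / (2 * std_normal_cdf(sigma))
    return e1_over_E, ybar_over_e2


for sigma in (0.1, 0.5, 1.0, 2.0, 3.0):
    e1_over_E, ybar_over_e2 = lognormal_median_ratios(sigma)
    print(sigma, e1_over_E > ybar_over_e2)  # condition (5) at the median
```

For every dispersion parameter tried, the comparison comes out False, in line with the statement that a higher rate starting at the median does not help the poorer half.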

Proposition 4 (5) holds if and only if \(e_1/e_2\) falls around \(\bar y\).
If \(e_1/e_2\) is single peaked, it will fall after its maximum. If earnings are unbounded and \(\bar y\) is sufficiently large, \(e_1\) approaches \(E\) but \(e_2\) goes to infinity. So \(e_1/e_2\) must fall and IMT must dominate any flat tax. The question is how large \(\bar y\) must be to be "sufficient". The answer may not be obtained from the theory alone, but from empirical data.
Our data are obtained from the United Nations' "World Income Inequality Database" (May 2008), and provide each decile's earnings as percentages of aggregate earnings. The data set does not contain the relevant information for all years.
To avoid subjective bias we use the most recent data for each country. Unfortunately, our ratio of \(e_1/e_2\) does not take into account the complex tax systems which generate real data. So we use the actual earnings ratios as an approximation for zero-tax earnings ratios. On the other hand, despite complex tax systems in G8 countries, we find their \(e_1/e_2\) curves are all single peaked and fall from similar thresholds of income deciles.
We use a decile's earnings as a percentage of the aggregate earnings to calculate \(e_1/e_2\). The ratio of the poor group's earnings to those of the whole population is given by \(r \equiv E_1/E\), so that
\[ \frac{e_1}{e_2} = \frac{r\,[1 - G(\bar y)]}{(1 - r)\,G(\bar y)}. \tag{6} \]
The data provide us the values of \(r\) for \(G(\bar y)\) = 10-90%, giving us 9 values of \(e_1/e_2\). The results for G8 countries are given in Table 1.
Apparently, the \(e_1/e_2\) ratios differ significantly across the eight countries. As we mentioned earlier, these values are only approximations since we do not take into account complex non-linear tax systems that differ from the flat tax assumed here. Nonetheless these \(e_1/e_2\) ratios all exhibit a single peak in G8 countries and, surprisingly, they start to decline around 80% of income levels. Hence a higher tax can be justified when it is imposed on less than 20% of top earners on behalf of a poor majority of more than 80%.
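The calculation from decile shares can be sketched as follows. If the bottom fraction \(G\) of households receives a cumulative share \(r\) of aggregate earnings, the definitions of \(e_1\) and \(e_2\) give \(e_1/e_2 = r(1-G)/[(1-r)G]\). The decile shares in the example are hypothetical numbers chosen to sum to 100; they are not taken from the World Income Inequality Database or from Table 1, and the function name is invented for the illustration.

```python
def e1_over_e2_by_decile(decile_shares):
    """Ratio e1/e2 at the nine cut-offs G = 10%, ..., 90%.

    decile_shares: earnings of each decile (poorest first), in any common unit.
    """
    total = sum(decile_shares)
    ratios = []
    cumulative = 0.0
    for k, share in enumerate(decile_shares[:-1], start=1):
        cumulative += share
        r = cumulative / total  # earnings share of the bottom k deciles
        G = k / 10              # population share of the bottom k deciles
        ratios.append(r * (1 - G) / ((1 - r) * G))
    return ratios


hypothetical_shares = [2, 4, 5, 6, 8, 9, 11, 13, 16, 26]  # made-up percentages
ratios = e1_over_e2_by_decile(hypothetical_shares)
peak = max(range(len(ratios)), key=ratios.__getitem__)
print([round(x, 3) for x in ratios])
print("ratio peaks at the", 10 * (peak + 1), "percent cut-off")
```

With real decile data, the cut-off after which the ratio starts to fall indicates how small the group facing the higher rate needs to be for the condition in Proposition 4 to hold.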

Extensions
(i) Welfare weight So far we have treated all the poor equally. Ideally we may give them different welfare weights, and allow a continuous treatment across the rich and the poor. This is similar to the approach taken by DSPS based on decreasing marginal utility of income. Decreasing welfare weights have a similar effect to assuming decreasing marginal utility of income. Intuitively, one may expect that this should increase the chance of justifying a higher top tax rate. However, like the conventional belief that the maximin is most likely to justify IMT, this conjecture is not correct. Given \(\bar w\), we assign a welfare weight \(s(w)\) to every poor household \(w\) (\(\le \bar w\)) such that \(\int_a^{\bar w} s(w) f(w)\,dw = F(\bar w)\). We then multiply \(s(w)\) by each poor household's net utility \([(1 - t_1)w]^{1+\varepsilon}/(1 + \varepsilon) + B\), and integrate the product over \([a, \bar w]\), to get a weighted total utility of the poor as our new objective function:
\[ W = \frac{(1-t_1)^{1+\varepsilon}}{1+\varepsilon}\int_a^{\bar w} s(w)\,w^{1+\varepsilon} f(w)\,dw + F(\bar w)\,B. \tag{7} \]
Objective (7) reduces to (3) when \(s(w) = 1\) for any \(w \le \bar w\). Since \(s(w)\) falls with \(w\), \(\int_a^{\bar w} s(w) w^{1+\varepsilon} f(w)\,dw < \int_a^{\bar w} w^{1+\varepsilon} f(w)\,dw\). We use \(\tilde e_1\) to denote the weighted average no-tax earnings of the poor, \(\int_a^{\bar w} s(w) w^{1+\varepsilon} f(w)\,dw / F(\bar w)\). The more weight is given to the poorer households, the lower \(\tilde e_1\) is. As in the previous case, we first obtain the optimal flat tax \(\tilde t^*\) which maximizes (7). It is similar to \(t^*\), except for \(e_1\) being replaced by \(\tilde e_1\), i.e. \(\tilde t^* = (1 - \tilde e_1/E)/(1 + \varepsilon - \tilde e_1/E)\). Then a higher top tax rate raises (7) if \(\partial W/\partial t_1 < 0\) and \(\partial W/\partial t_2 > 0\) when \(t_1 = t_2 = \tilde t^*\), which holds under a new condition (see "Appendix C").
Proposition 5 IMT give a higher value of (7) than any flat tax if, at \(t_1 = t_2 = \tilde t^*\),
\[ \frac{\tilde e_1}{E} > \frac{\bar y}{e_2}. \tag{8} \]
When \(s(w) = 1\), \(\tilde e_1 = e_1\) and (8) reduces to (5). Condition (8) can also be linked to Saez' asymptotic marginal tax rate. Given any \(\tilde e_1 < e_1\), the optimal tax rate for top income remains the same as before, but the optimal flat tax \(\tilde t^*\) is higher given higher welfare weights on the very poor. So the former is less likely to be higher than the latter, (8) is less likely to hold than (5) is, and, unexpectedly, a higher top tax is less likely to be justifiable. The intuition is that the poorer households are less productive, and rely more on income transfer. A higher tax on low earnings is less damaging to them and more beneficial due to more money transfer from the rich. So a flat tax is less likely to be dominated if we give most weight to the poorest. The validity of (5) only implies a higher top tax rate can benefit the poor as a whole, not necessarily each of them. (8) can tell us if it benefits a particular household. Our objective (7) is identical to maximizing the utility of a household with earnings of \(\tilde e_1\), as a representative family. When (8) holds, a higher top tax rate benefits those with earnings equal to or higher than \(\tilde e_1\). If \(\tilde e_1\) is the lowest earnings, all poor will be better off. For instance, given a Pareto distribution with \(\alpha = 1.5\) for the top 1% earners, \(\bar y/e_2 = 1/3\), and (8) becomes \(\tilde e_1/E > 1/3\). In most OECD countries (except for the US), the ratio of the minimum wage to the average wage is more than 1/3. So a higher tax on the top 1% can benefit all of the 99%. Similarly, with \(\alpha = 2\) for the top German earnings, (8) becomes \(\tilde e_1/E > 0.5\). The lowest and average monthly German salaries are €1,832 and €3,449. Thus virtually all poor can benefit from a higher top tax.
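The two numerical examples and the comparative statics can be reproduced in a few lines. The sketch assumes the Pareto-tail identity \(\bar y/e_2 = (\alpha - 1)/\alpha\), which matches the values 1/3 and 0.5 quoted for \(\alpha = 1.5\) and \(\alpha = 2\), and reads condition (8) as \(\tilde e_1/E > \bar y/e_2\). The elasticity value and the function names are illustrative assumptions.

```python
def pareto_cut_ratio(alpha):
    """y_bar / e2 for a Pareto upper tail with shape alpha > 1."""
    return (alpha - 1) / alpha


def condition_8(e1_tilde_over_E, ybar_over_e2):
    """Condition (8), read as weighted e1/E exceeding y_bar/e2."""
    return e1_tilde_over_E > ybar_over_e2


def weighted_flat_tax(e1_tilde_over_E, eps):
    """Optimal flat tax with welfare weights: (1 - z) / (1 + eps - z)."""
    return (1 - e1_tilde_over_E) / (1 + eps - e1_tilde_over_E)


german_ratio = 1832 / 3449  # lowest over average monthly salary, as quoted in the text
print(round(german_ratio, 3), condition_8(german_ratio, pareto_cut_ratio(2.0)))

eps = 0.25  # assumed elasticity, for illustration only
# More weight on the very poor lowers the weighted e1/E and raises the flat rate.
print(weighted_flat_tax(0.50, eps) > weighted_flat_tax(0.68, eps))
```

Both checks return True: the German ratio of about 0.531 exceeds 0.5, and the flat rate rises as the weighted ratio falls, which is why (8) is harder to satisfy than (5).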
(ii) Declining elasticity Empirical data show that full-time and high income earners are less responsive to tax changes than part-time and low income earners (see Aaberge and Colombino 2013 and Andrienko et al. 2014). So our assumption of constant elasticity of labor supply is unrealistic. In fact this assumption is unfavorable for IMT. Now we allow the elasticity to be declining with income. Our objectives (3) and (3'), and the tax revenue (4'), remain valid, except that \(\varepsilon\) cannot be taken out of the integrals. We follow the same approach as before, i.e. first obtain the optimal flat tax \(\tilde t^*\), which maximizes (3'), then evaluate \(\partial W/\partial t_1\) and \(\partial W/\partial t_2\) when \(t_1 = t_2 = \tilde t^*\).

Proposition 6
With declining elasticity of labor supply, some IMT dominate any flat tax if, at \(t_1 = t_2 = \tilde t^*\), condition (9) holds. Proof: see "Appendix D".
(iii) More tax bands Finally, we consider the case of more than two tax bands. We assume \(t_1\) only applies to incomes between \(Y\) and another lower threshold \(Y_0\), below which different tax rates may apply. So \(t_1\) is imposed on households with \(w \ge w_0\), where \(w_0\) is the wage at which earnings reach \(Y_0\). Let \(u(w)\) be the utility of households with \(w \le w_0\), who are not subject to either \(t_1\) or \(t_2\). Then the utility of the poor, (3), can be rewritten as objective (10). Let \(F(w_0)\) be the proportion of the households with \(w \le w_0\). Note that the tax revenue from earnings below \(Y_0\) is independent of \(t_1\) and \(t_2\). When \(t_1 = t_2 = t\), the basic income can be written as (11), with \(B_0\) representing the part independent of \(t\). The question is whether a higher tax on income above \(Y\) (\(t_2 > t\)) can lead to a higher value of (10) than any partial flat tax \(t\) on incomes above \(Y_0\), given that other tax rates below \(Y_0\) are fixed. To answer this question, we follow the same approach as before. We first obtain the optimal partial flat tax \(t^*\) on incomes above \(Y_0\). Then we find the condition for \(\partial W/\partial t_1 < 0\) and \(\partial W/\partial t_2 > 0\) when \(t_1 = t_2 = t^*\). We let \(y_0 \equiv w_0^{1+\varepsilon}\), and let \(E_0\) be the zero-tax earnings of households with \(w \ge w_0\), i.e. \(E_0 \equiv \int_{w_0}^{b} w^{1+\varepsilon} f(w)\,dw\).
Thus we can generalize (5) to the case with more than two tax bands (see "Appendix E").

Proposition 7 IMT can do better than any partial flat tax if, at \(t_1 = t_2 = t^*\), condition (12) holds.
In our previous two-band tax case, \(y_0 = 0\), \(w_0 = a\) and \(E_0 = E\), so (12) reduces to (5).
Although (12) is more complex than (5), its validity may be determined with simple data. In particular (12) must hold when \(y_0/e_0 \ge \bar y/e_2\). For instance, if earnings above \(y_0\) follow a Pareto distribution with \(y_0/e_0 = \bar y/e_2\), (12) must hold and a higher tax rate above \(\bar y\) is desirable. Moreover, letting \(Y_0\) = $0.15 million and \(Y\) = $0.4 million, we have \(y_0/e_0 = 0.5\) according to Saez (2001), and \(\bar y/e_2 = 1/3\) according to Diamond and Saez (2011). Again (12) holds and the tax rate above $0.4 million should be higher. These results again support DSPS' higher taxes for top earners.

Concluding remarks
In this paper we argue that a large poor majority is often better off under IMT than under any flat tax. We obtain a sufficient condition, which only depends on aggregate features of the income distribution and the tax threshold. Using empirical data from G8 countries we find supporting evidence that a higher tax rate is justifiable when it is imposed on a small group (less than 20%). However, IMT become less likely to dominate any flat tax if we give more welfare weight to the very poor households. Similar to our original condition (5), more general results are obtained with declining elasticity of labor supply and multiple tax bands. These findings support the argument of DSPS for higher taxes on top earners. They also have interesting political economy implications, and might perhaps be interpreted as an explanation for, or at least as consistent with, IMT on high income earners in most democracies, in contrast to much optimal tax theory. In this paper we do not consider categorical benefits associated with unemployment or low income. Those benefits create high marginal tax rates for participation in the labour market, the 'poverty trap'. This phenomenon, however, does not affect the larger part of the working population. We focus on the tax rates relevant to the working population and do not consider more complex structures. We do not focus on the optimal difference in tax rates and the magnitude of social gains. Both tend to be small in our model, but would be more significant given low marginal utility of income and low elasticity of labour supply for the rich. Though highly stylized, we hope that this paper contributes to the debate on tax policies.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.