Abstract
I numerically compute Borda-optimal, i.e., optimal based on the Borda count as the normative criterion, labour-income tax schedules for the United States. I do so in the context of a Mirrlees-style model with quasilinear preferences and a constant elasticity of labour supply. Because the Borda count is defined for finitely many alternatives, the computations restrict attention to a finite subset of the set of continuous, piecewise linear tax schedules with (in the baseline analysis) four or fewer pieces.
Introduction
I numerically compute Borda-optimal (BO), i.e., optimal based on the Borda count as the normative criterion, labour-income tax schedules for the United States. I do so in the context of a Mirrlees-style model with quasilinear preferences and a constant elasticity of labour supply. I perform the computations separately for three different values of the elasticity of labour supply, \(\sigma\).
A major challenge is that the Borda count is defined for finitely many alternatives whereas there are infinitely many possible tax schedules. To deal with this, I identify a subset of the feasible direct mechanisms (DMs) that (a) loosely speaking, corresponds to the set of continuous, piecewise linear tax schedules with N or fewer pieces and (b) lends itself to transparent, finite discretisations. Using \(N=4\) and one such discretisation in the baseline numerical analysis, I compute, for each value of \(\sigma\), the BO DM within the resulting finite set of DMs as well as the corresponding BO tax schedule.
The main findings in terms of the BO tax schedules are that (i) for each value of \(\sigma\), all marginal rates are positive, (ii) depending on the value of \(\sigma\), the marginal rate at the highest incomes may or may not be strictly higher than the marginal rates at all lower incomes, (iii) for each value of \(\sigma\), average rates are nevertheless (possibly, weakly) increasing in income (to a close approximation), and (iv) this progressivity is attenuated as \(\sigma\) increases. These findings hold up well under a number of robustness checks that use alternative values of N and alternative discretisations.
The existing literature on optimal taxation is largely based on utilitarianism (Bentham 1789; Mirrlees 1971), Rawls’ maxmin principle (Rawls 1971; Piketty 1997), or equality of opportunity (Roemer 1998; Fleurbaey 2008). Although these normative approaches have their appeal, they also have two important limitations. First, they seem disconnected from the idea of democracy. This is awkward given the broad consensus in many countries that public policy should be determined through a democratic process.^{Footnote 1} Second, excluding some notions of equality of opportunity, implementing these approaches requires taking a stand on non-ordinal properties of utility.
My findings (i)–(iv) above are in line with well-known findings in this literature. For example, Seade (1982) shows theoretically that, in a Mirrlees-style model with a utilitarian criterion, the optimal tax schedule must be strictly increasing (in line with finding (i)). Also, using numerical analysis in a Mirrlees-style model with a utilitarian and a Rawlsian criterion, Saez (2001) obtains marginal rates at high incomes that are lower than marginal rates at low incomes (in line with finding (ii)).^{Footnote 2} What is novel in my paper is that findings (i)–(iv) have been derived based on a different normative foundation.
An alternative, normatively appealing approach to optimal taxation is to use majority rule. Unfortunately, for general sets of tax schedules, a Condorcet winner is not guaranteed to exist. However, if we restrict attention to linear tax schedules, a Condorcet winner does exist under some assumptions (Roberts 1977). In these settings, three key findings regarding the linear tax schedule selected by majority rule are that (a) under plausible assumptions, the marginal rate is positive, (b) the intercept can be positive, so that the average rate can be decreasing, and (c) the marginal rate is increasing in the ratio between mean and median income (at least when government consumption is zero).^{Footnote 3} The current paper differs from this literature in that it uses a different normative criterion and considers more flexible tax schedules.
The Borda count has several important advantages as a normative criterion. First, it has been characterised in terms of normatively appealing axioms (Young 1974; Maskin 2021).^{Footnote 4} Second, preference aggregation seems central to the idea of democracy. Third, the Borda count can be implemented without going beyond ordinal utility.
Of course, the Borda count also has limitations. First, it is defined for finitely many alternatives and the results could be sensitive to the discretisation employed. Second, although the Borda count exhibits some sensitivity to the intensity of preferences between any two alternatives by taking into account the number of alternatives ranked in between by each individual, policymakers may wish to be more sensitive to preference intensities (e.g., based on introspection or individuals’ verbal reports).
There is also a literature that studies labour-income taxation in various descriptive (as opposed to normative) political economy models. For example, Röell (2012), Brett and Weymark (2017), and De Donder and Hindriks (2003) study labour-income taxation in a two-step model: at the first step, each individual proposes a tax schedule that is selfishly optimal for her; at the second step, majority rule is applied to the proposed tax schedules (among which a Condorcet winner exists under some assumptions). Chen (2000), Carbonell-Nicolau and Efe (2007), Roemer (2012), and Bierbrauer and Boyer (2016) consider models of political competition in which politicians choose tax policies on which to run for office. Bierbrauer et al. (2021) characterise when a monotonic labour-income tax reform (i.e., a reform such that the change in the tax burden is a monotonic function of income) is politically feasible in the sense that it is preferred by a majority over the status quo.^{Footnote 5}
Preferences and productivities
Individuals have preferences over consumption \(c \ge 0\) and labour \(l \ge 0\) represented by the utility function \(c - \frac{\sigma }{1+\sigma } l^{\frac{1+\sigma }{\sigma }}\), where \(\sigma >0\) is the (Hicksian and Marshallian) elasticity of labour supply. Each individual has a productivity (or type) which is her private information. When type w puts in labour l, she earns pre-tax income wl. The set of types is \([{\underline{w}},{\overline{w}}]\), where \(0<{\underline{w}}<{\overline{w}}\). Types are distributed according to the probability density function f, which has full support on \([{\underline{w}},{\overline{w}}]\).
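To see where the income expressions used below come from, consider, purely for illustration, a type w facing a constant marginal rate \(t<1\) and a lump-sum transfer G (both symbols are introduced here only for this derivation), so that \(c=(1-t)wl+G\). Because utility is strictly concave in l, the first-order condition characterises the optimum:

$$\begin{aligned} (1-t)w = l^{\frac{1}{\sigma }} \quad \Longrightarrow \quad l^*(w) = \big ((1-t)w\big )^{\sigma }, \qquad y^*(w) = w\, l^*(w) = (1-t)^{\sigma } w^{1+\sigma }. \end{aligned}$$

Labour supply therefore has constant elasticity \(\sigma\) with respect to the net wage \((1-t)w\), and because utility is quasilinear there are no income effects, which is why the Hicksian and Marshallian elasticities coincide.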
DMs
Feasible DMs
Given the revelation principle, we can restrict attention to DMs. A DM is a tuple (Y, C), where \(Y:[{\underline{w}},{\overline{w}}] \rightarrow [0,\infty )\) and \(C:[{\underline{w}},{\overline{w}}] \rightarrow [0,\infty )\). Y(w) and C(w) are the income and the consumption, respectively, assigned to an individual reporting to be of type w.
A DM is feasible if the following conditions hold.

(a)
Incentive compatibility: Y is nondecreasing and, for all \(w \in [{\underline{w}},{\overline{w}}]\),
$$\begin{aligned} C(w) = C({\underline{w}}) - \frac{\sigma }{1+\sigma } \left( \frac{Y({\underline{w}})}{{\underline{w}}} \right) ^{\frac{1+\sigma }{\sigma }} + \frac{\sigma }{1+\sigma } \left( \frac{Y(w)}{w} \right) ^{\frac{1+\sigma }{\sigma }} + \int _{{\underline{w}}}^w \left( \frac{Y({\tilde{w}})}{{\tilde{w}}} \right) ^{\frac{1+\sigma }{\sigma }} \frac{1}{{\tilde{w}}}\, d {\tilde{w}}. \end{aligned} \tag{1}$$
(b)
Government budget constraint:
$$\begin{aligned} \int _{{\underline{w}}}^{{\overline{w}}} (Y(w)-C(w)) f(w)\, dw = R, \end{aligned} \tag{2}$$
where \(R \ge 0\) is the exogenously given government consumption per capita.^{Footnote 6}
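Once Y is fixed, (1) gives C up to the constant \(C({\underline{w}})\), and (2) then pins down that constant. The following is a minimal numerical sketch of this two-step logic, assuming a uniform grid of types and the trapezoidal rule; the function name `consumption_schedule` and the grid size are hypothetical choices, not the paper's code. The check uses a linear tax with a lump-sum transfer, for which C is known in closed form.

```python
def consumption_schedule(Y, f, w_lo, w_hi, sigma, R, n=20_001):
    """Return (types, C) on a uniform grid, with C implied by (1) and (2).

    Hypothetical helper. Y: income assignment (callable); f: type density on
    [w_lo, w_hi]; integrals use the trapezoidal rule (an illustrative choice).
    """
    h = (w_hi - w_lo) / (n - 1)
    ws = [w_lo + k * h for k in range(n)]
    ys = [Y(w) for w in ws]
    e = (1 + sigma) / sigma
    a = sigma / (1 + sigma)
    g = [(y / w) ** e for y, w in zip(ys, ws)]  # (Y(w)/w)^((1+sigma)/sigma)

    # Cumulative integral of g(w)/w from w_lo up to each grid point.
    cum = [0.0]
    for k in range(1, n):
        cum.append(cum[-1] + 0.5 * h * (g[k - 1] / ws[k - 1] + g[k] / ws[k]))

    # Equation (1): C(w) - C(w_lo) at every grid point.
    dC = [-a * g[0] + a * gk + ck for gk, ck in zip(g, cum)]

    def trap(vals):
        return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

    # Equation (2): integral of (Y - C) f equals R; solve it for C(w_lo).
    fs = [f(w) for w in ws]
    c_lo = (trap([(y - d) * fk for y, d, fk in zip(ys, dC, fs)]) - R) / trap(fs)
    return ws, [c_lo + d for d in dC]


# Linear tax t = 0.3, sigma = 0.5, types uniform on [1, 2], R = 0.1.
ws, C = consumption_schedule(lambda w: 0.7 ** 0.5 * w ** 1.5, lambda w: 1.0,
                             1.0, 2.0, 0.5, 0.1)
```

In the spirit of the definition of \({\mathcal {D}}\) given below, a candidate Y would be discarded whenever the first entry of the returned consumption list, \(C({\underline{w}})\), is negative.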
A finite subset of the feasible DMs
Because the Borda count is defined for a finite set of alternatives, it is necessary to restrict attention to a finite subset of the feasible DMs. To this end, I augment conditions (a) and (b) with two further conditions, the first one being the following.

(c)
Y is of the form:
$$\begin{aligned} Y(w) = \begin{cases} (1-t_1)^\sigma w^{1+\sigma } & \text {if } w=w_0, \\ (1-t_i)^\sigma w^{1+\sigma } & \text {if } w_{i-1}< w \le w_i \text { and } t_{i-1} > t_i, \\ (1-t_{i-1})^\sigma w_{i-1}^{1+\sigma } & \text {if } w_{i-1}< w \le \left( \frac{1-t_{i-1}}{1-t_i} \right) ^{\frac{\sigma }{1+\sigma }} w_{i-1} \text { and } t_{i-1}< t_i, \\ (1-t_i)^\sigma w^{1+\sigma } & \text {if } \left( \frac{1-t_{i-1}}{1-t_i} \right) ^{\frac{\sigma }{1+\sigma }} w_{i-1}< w \le w_i \text { and } t_{i-1} < t_i, \end{cases} \end{aligned} \tag{3}$$
where (i) \(i \in \{1,\ldots ,n\}\), \(n \ge 1\), (ii) \(w_0={\underline{w}}\), \(w_n={\overline{w}}\), and \(w_{i-1} < w_i\) for all i, (iii) \(t_0=1\), \(t_i<1\) for all i, and \(t_{i-1} \ne t_i\) for all i, (iv) \(\left( \frac{1-t_{i-1}}{1-t_i} \right) ^{\frac{\sigma }{1+\sigma }} w_{i-1} \le w_i\) for all i such that \(t_{i-1} < t_i\), and (v) \(\left( \frac{1-t_{i-1}}{1-t_i} \right) ^{\frac{\sigma }{1+\sigma }} w_{i-1} < w_i\) for all \(i<n\) such that \(t_{i-1}< t_i < t_{i+1}\).
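The case distinction in (3) can be evaluated directly. The sketch below, a hypothetical helper named `income_assignment`, takes the thresholds \(w_0,\ldots ,w_n\) and rates \(t_1,\ldots ,t_n\) and returns Y(w); it assumes, rather than checks, requirements (iv) and (v). The illustrative inputs show the two regimes: a rising rate produces bunching at the income of the threshold type, and a falling rate produces a jump in Y.

```python
def income_assignment(w, ws, ts, sigma):
    """Evaluate Y(w) from equation (3) (hypothetical helper).

    ws: thresholds [w_0, ..., w_n], with w_0 = w_lo and w_n = w_hi;
    ts: marginal rates [t_1, ..., t_n], each below 1.
    Requirements (iv) and (v) of condition (c) are assumed, not checked.
    """
    t = [1.0] + list(ts)  # t_0 = 1, so for i = 1 the falling-rate branch applies
    if w == ws[0]:
        return (1 - t[1]) ** sigma * w ** (1 + sigma)
    for i in range(1, len(ws)):
        if ws[i - 1] < w <= ws[i]:
            if t[i - 1] > t[i]:
                # Falling rate: types just above w_{i-1} jump onto piece i.
                return (1 - t[i]) ** sigma * w ** (1 + sigma)
            # Rising rate: types bunch at the kink up to w_star, then use piece i.
            w_star = ((1 - t[i - 1]) / (1 - t[i])) ** (sigma / (1 + sigma)) * ws[i - 1]
            if w <= w_star:
                return (1 - t[i - 1]) ** sigma * ws[i - 1] ** (1 + sigma)
            return (1 - t[i]) ** sigma * w ** (1 + sigma)
    raise ValueError("w lies outside [w_0, w_n]")


# Illustrative two-piece case with sigma = 1: rates rise from 0.2 to 0.5 at w_1 = 2,
# so types in (2, 2 * 1.6 ** 0.5] all bunch at income 0.8 * 2 ** 2 = 3.2.
bunched = income_assignment(2.1, [1.0, 2.0, 4.0], [0.2, 0.5], 1.0)
```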
The following proposition shows that a DM satisfying (a) and (c) can be interpreted in terms of a corresponding tax schedule.^{Footnote 7}
Proposition 1
Suppose (Y, C) satisfies (a) and (c). Then, there exists a unique tax schedule, T, such that the following hold.

(i)
T implements (Y, C).

(ii)
T is continuous and piecewise linear with n pieces.

(iii)
For each \(i \in \{1,\ldots ,n\}\), \(t_i\) is the slope of the \(i{\text {th}}\) piece of T.^{Footnote 8}

(iv)
If \(n \ge 2\), then, for each \(i \in \{2,\ldots ,n\}\) such that \(t_{i-1} > t_i\), \(w_{i-1}\) is the highest type that chooses a point on the \((i-1){\text {st}}\) piece of T.

(v)
If \(n \ge 2\), then, for each \(i \in \{2,\ldots ,n\}\) such that \(t_{i-1} < t_i\), \(w_{i-1}\) is the lowest type that chooses at the kink between the \((i-1){\text {st}}\) and \(i{\text {th}}\) pieces of T.^{Footnote 9}
Thus, given (Y, C) satisfying (a) and (c), \(t_i\) (\(i=1,\ldots ,n\)) is the marginal rate on the \(i{\text {th}}\) piece of the corresponding tax schedule, T, and \(w_{i-1}\) (\(i=2,\ldots ,n\)) is the threshold type where types switch to locating on the \(i{\text {th}}\) piece of T.
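T implements (Y, C) when every Y(w) maximises \(y - T(y) - \frac{\sigma }{1+\sigma } (y/w)^{\frac{1+\sigma }{\sigma }}\) over \(y \ge 0\). A simple way to check this numerically for a given T is brute-force grid search, sketched below with a hypothetical helper `best_income`; grid search is used instead of a first-order condition because a falling marginal rate makes the problem non-concave. The two-piece schedule is illustrative: with rates 0.2 and 0.5, \(\sigma =1\), and \(w_1=2\), the kink sits at income \(0.8 \cdot 2^2 = 3.2\), and type 2.1 chooses the kink, consistent with Proposition 1 and the bunching case of (3).

```python
def best_income(T, w, sigma, y_max, n=100_001):
    """Grid-search maximiser of y - T(y) - sigma/(1+sigma) * (y/w)**((1+sigma)/sigma).

    Hypothetical helper: T is any callable tax schedule, y_max an assumed upper
    bound on relevant incomes, n the number of grid points on [0, y_max].
    """
    e = (1 + sigma) / sigma
    a = sigma / (1 + sigma)
    best_y, best_u = 0.0, float("-inf")
    for k in range(n):
        y = y_max * k / (n - 1)
        u = y - T(y) - a * (y / w) ** e
        if u > best_u:
            best_y, best_u = y, u
    return best_y


def two_piece_tax(y):
    """Illustrative schedule: 20% up to income 3.2, 50% above, zero intercept."""
    return 0.2 * y if y <= 3.2 else 0.2 * 3.2 + 0.5 * (y - 3.2)


kink_choice = best_income(two_piece_tax, 2.1, 1.0, 10.0)
```

The grid step (here \(10^{-4}\)) bounds the error of the returned income, so a finer grid or a local refinement would be needed for tighter checks.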
The next proposition provides a kind of converse of Proposition 1.
Proposition 2
Suppose that (i) (Y, C) is implemented by some continuous, piecewise linear tax schedule with N pieces and (ii) if \(w={\underline{w}}\) or w is a jump point of Y, Y is strictly increasing on \((w,w+\delta )\) for some \(\delta >0\). Then (Y, C) satisfies (a) and Y satisfies (c) almost everywhere for some \(n \le N\).
Condition (ii) seems weak: it applies to at most N, arbitrarily narrow intervals^{Footnote 10} on each of which it, moreover, allows Y to be arbitrarily close to constant. Thus, abstracting from what seem like technical details, Propositions 1 and 2 tell us that a DM satisfies (a) and (c) for some \(n \le N\) if and only if it is implemented by a continuous, piecewise linear tax schedule with N or fewer pieces.
Letting w(p) denote the \(p{\text {th}}\) type percentile, the next condition provides a finite, numerically tractable discretisation of the set of Y functions satisfying (c).

(d)
\(n \le 4\). Given n, \(t_i \in \{ -2,-1.5,-1,-.8,-.6,-.4,-.2,0,.1,.2,.3,.4,.5,.6,.7,.8,.9 \}\) for all \(i \in \{1,\ldots ,n\}\) and \(w_i \in \{w(10),w(20),w(30),w(40),w(50),w(60),w(70),w(80),w(90),w(95),w(99),w(99.9) \}\) for all \(i \in \{1,\ldots,n-1\}\).
Thus, the discretisation in (d) in effect restricts attention to continuous, piecewise linear tax schedules with four or fewer pieces such that (i) the marginal rate on any of the pieces lies on the given grid for the \(t_i\)’s and (ii) threshold types lie on the given grid for the \(w_i\)’s (e.g., tax schedules such that types just below the \(45\text {th}\) percentile choose on the second piece and types just above the \(45\text {th}\) percentile choose on the third piece are ruled out). I have somewhat arbitrarily truncated marginal tax rates at \(-2\) from below, noting that even lower marginal tax rates could probably only apply to a small fraction of the population if they are to be feasible.
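To get a sense of the size of the resulting search space, one can enumerate the parameter tuples allowed by (d) together with requirements (ii) and (iii) of (c): strictly increasing thresholds and adjacent rates that differ. The sketch below, with hypothetical function names, does this. It ignores requirements (iv) and (v) and the requirement that \(C({\underline{w}}) \ge 0\), so its count is only an upper bound on the number of DMs in \({\mathcal {D}}\).

```python
from itertools import combinations, product
from math import comb

# Grids from condition (d): marginal rates and threshold percentiles.
T_GRID = [-2, -1.5, -1, -.8, -.6, -.4, -.2, 0, .1, .2, .3, .4, .5, .6, .7, .8, .9]
P_GRID = [10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99, 99.9]


def candidates(t_grid, p_grid, n_max):
    """Yield (threshold percentiles, rates) for schedules with n <= n_max pieces.

    Thresholds are strictly increasing (combinations keep grid order) and
    adjacent rates differ. t_1 != t_0 = 1 holds automatically since all rates
    are below 1. Requirements (iv), (v) and C(w_lo) >= 0 are NOT checked.
    """
    for n in range(1, n_max + 1):
        for ps in combinations(p_grid, n - 1):
            for ts in product(t_grid, repeat=n):
                if all(a != b for a, b in zip(ts, ts[1:])):
                    yield ps, ts


def count_candidates(n_rates, n_thresholds, n_max):
    """Closed-form count of the tuples that `candidates` yields."""
    return sum(
        n_rates * (n_rates - 1) ** (n - 1) * comb(n_thresholds, n - 1)
        for n in range(1, n_max + 1)
    )
```

For the baseline grids, `count_candidates(17, 12, 4)` evaluates to 15,609,553, which is why the Borda computations below work with rankings over a large but finite set.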
From here on, I restrict attention to the set of DMs satisfying (a)–(d). Let \({\mathcal {D}}\) denote this set. Because Y pins down C through constraints (1) and (2), \({\mathcal {D}}\) corresponds to the set of Y functions such that (c) holds, (d) holds, and \(C({\underline{w}})\) obtained after plugging in for C(w) from (1) into (2) is nonnegative.^{Footnote 11}
Before proceeding, let us consider the following question: Why look for a BO DM in \({\mathcal {D}}\) rather than for a BO continuous, piecewise linear tax schedule with four or fewer pieces? There are three disadvantages to the latter approach. First, to discretise the set of continuous, piecewise linear tax schedules with four or fewer pieces, one would need to choose the grid of income levels at which the kinks can be located. However, it is not obvious how to do that. In contrast, the grid for the \(w_i\)’s in condition (d) seems transparent and natural. Second, one would need to solve each type’s labour-supply optimisation problem given each tax schedule, and this is likely to considerably slow down the numerical calculations. Third, there can be multiple continuous, piecewise linear tax schedules with four or fewer pieces implementing the same DM and one would need to eliminate such duplicates before applying the Borda count.^{Footnote 12} However, duplicate tax schedules may be tricky to identify as they may incorrectly appear to implement slightly different DMs due to imperfect numerical precision.
The Borda count
Given \((Y,C) \in {\mathcal {D}}\), let \(\Delta (Y,C,w)\) denote the number of DMs in \({\mathcal {D}}\) that are strictly worse than (Y, C) according to type w minus the number of DMs in \({\mathcal {D}}\) that are strictly better than (Y, C) according to type w.^{Footnote 13} The Borda count of (Y, C) is:^{Footnote 14}\(^,\)^{Footnote 15}
$$\begin{aligned} B(Y,C) = \int _{{\underline{w}}}^{{\overline{w}}} \Delta (Y,C,w) f(w)\, dw. \end{aligned} \tag{4}$$
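For a single type and a finite list of alternatives, \(\Delta\) can be computed by sorting that type's utilities (here assumed to be the indirect utilities \(\phi _w\) defined in the notes) and counting, for each alternative, how many utilities lie strictly below and strictly above it. The helper name `borda_delta` is hypothetical; exact ties count as neither better nor worse, as in the definition above.

```python
from bisect import bisect_left, bisect_right


def borda_delta(utilities):
    """Delta for each alternative, given one type's utilities over all alternatives.

    Delta = (# alternatives strictly worse) - (# alternatives strictly better).
    Exact ties count as neither. In floating point, near-ties may need rounding
    first, since the same DM can be computed with slightly different utilities.
    """
    ranked = sorted(utilities)
    n = len(ranked)
    # bisect_left counts strictly smaller values; n - bisect_right counts strictly larger.
    return [bisect_left(ranked, u) - (n - bisect_right(ranked, u)) for u in utilities]


# Four alternatives, two of them tied at the top for this type.
deltas = borda_delta([3.0, 1.0, 3.0, 2.0])
```

Sorting makes this \(O(J \log J)\) per type for J alternatives, rather than the \(O(J^2)\) of pairwise comparisons. The values of \(\Delta\) for one type always sum to zero, since every strict comparison is counted once with each sign.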
\((Y,C) \in {\mathcal {D}}\) is BO if \(B(Y,C) \ge B({\hat{Y}},{\hat{C}})\) for all \(({\hat{Y}},{\hat{C}}) \in {\mathcal {D}}\).
Note that evaluating B(Y, C) requires computing all types’ rankings over \({\mathcal {D}}\), which is numerically infeasible. Therefore, to obtain my numerical results, I approximate the integral in (4) based on the rankings of a finite set of “representative” types. The main idea is to approximate \(\Delta (Y,C,\cdot )\) via a step function by (i) partitioning \([{\underline{w}},{\overline{w}}]\) into 14 subintervals and (ii) replacing \(\Delta (Y,C,\cdot )\) over each subinterval with \(\Delta (Y,C,w_m)\), where \(w_m\) is the median (i.e., “representative”) type in that subinterval.^{Footnote 16} I will refer to a DM maximising the approximation of the integral in (4) as “BO” even though, strictly speaking, it’s only BO if it maximises the actual integral in (4).
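If a subinterval is \([w(a), w(b)]\) in terms of type percentiles, its median type is \(w((a+b)/2)\) and its population share is \((b-a)/100\), so the step-function approximation of (4) is a mass-weighted sum of \(\Delta\) at the representative types. The sketch below assumes a particular set of 14 percentile bands: the text does not list them, and these hypothetical cut points were chosen only to be consistent with the lowest and highest representative types being \(w(5)\) and \(w(99.995)\), as stated in the notes. All names are illustrative.

```python
# Hypothetical percentile cut points giving 14 subintervals of [w_lo, w_hi]
# (an assumption, not the paper's documented partition).
CUTS = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99, 99.9, 99.99, 100]


def representative_percentiles(cuts):
    """Median type of the band [w(a), w(b)] is w((a + b) / 2)."""
    return [(a + b) / 2 for a, b in zip(cuts, cuts[1:])]


def band_masses(cuts):
    """Population share of each band, used as the step-function weight."""
    return [(b - a) / 100 for a, b in zip(cuts, cuts[1:])]


def approx_borda(delta_by_type, masses):
    """Approximate B for each alternative as the sum over bands of mass * Delta.

    delta_by_type[m][j] is Delta for alternative j and representative type m.
    """
    n_alt = len(delta_by_type[0])
    return [sum(m * row[j] for m, row in zip(masses, delta_by_type)) for j in range(n_alt)]


reps = representative_percentiles(CUTS)
```

The approximate Borda winner is then the alternative with the largest entry of `approx_borda`.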
Calculations for the United States
Calibration
Elasticity of labour supply
Given the considerable controversy in the literature on the elasticity of labour supply,^{Footnote 17} I will perform the analysis separately for \(\sigma \in \{0.25,0.5,1\}\). In choosing these values, I am following Saez and Stantcheva (2018).
Distribution of types
The main idea for calibrating the distribution of types goes as follows. First, I assume that the actual labour-income tax schedule is linear with a 30 percent marginal tax rate. Given this tax schedule, type w’s optimal pre-tax labour income is \(y^*(w)=0.7^\sigma w^{1+\sigma }\). Second, I back out the distribution of types based on \(y^*(\cdot )\) and data from the World Inequality Database (WID) on the empirical distribution of pre-tax labour income for individuals over age 20 in the US in 2014.^{Footnote 18}
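Because \(y^*(w)=0.7^\sigma w^{1+\sigma }\) is strictly increasing in w, the p-th type percentile is obtained by inverting \(y^*\) at the p-th percentile of income. A minimal sketch of that inversion follows; the function names are hypothetical, no WID figures are reproduced, and the full procedure for backing out the density is described in the appendix.

```python
def income_from_type(w, sigma, t=0.3):
    """Optimal pre-tax income y*(w) = (1 - t)^sigma * w^(1 + sigma) under a flat rate t."""
    return (1 - t) ** sigma * w ** (1 + sigma)


def type_from_income(y, sigma, t=0.3):
    """Invert y*(w). Requires y > 0, since the model assumes w_lo > 0."""
    if y <= 0:
        raise ValueError("income must be positive to back out a type")
    return (y / (1 - t) ** sigma) ** (1 / (1 + sigma))


# Round trip for sigma = 1: type 2 earns 0.7 * 2 ** 2 = 2.8 under a flat 30% rate.
w_back = type_from_income(income_from_type(2.0, 1.0), 1.0)
```

Applying `type_from_income` to each income percentile gives the type percentiles \(w(p)\) used in the grids of condition (d). The positivity check reflects the fact, noted in the notes, that a negative 0th income percentile is incompatible with \({\underline{w}}>0\).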
Government consumption per capita
According to WID, US national income per individual over age 20 in 2014 was $65,192.^{Footnote 19} According to Piketty, Saez and Zucman (2018), total (i.e., federal, state, and local) government consumption in the US has been around 18 percent of national income since the end of World War II. Thus, I set \(R=65{,}192 \times 0.18 \approx 11{,}735\). This calculation assumes that government consumption must be financed entirely from labour income taxation, which seems like the natural theoretical benchmark based on Atkinson and Stiglitz (1976).^{Footnote 20}
Main results
For each \(\sigma \in \{0.25,0.5,1\}\), I compute the (as it turns out, unique) BO DM and the corresponding (in the sense of Proposition 1) BO tax schedule.^{Footnote 21} The main features of the BO tax schedules are presented in Table 1 as well as in Figs. 1 and 2. Table 1 shows, for each value of \(\sigma\), the BO Universal Basic Income (UBI), i.e., the negative of the intercept of the BO tax schedule. Figure 1 (Fig. 2) depicts, for each value of \(\sigma\), the BO marginal (average, respectively) tax rate as a function of income.
The first finding is the following.
Finding 1
For \(\sigma \in \{0.25,0.5,1\}\), all BO marginal tax rates are positive.
In particular, there is no equivalent to the Earned Income Tax Credit at low incomes.
The next finding is perhaps at odds with what is often taken for granted in popular discourse.
Finding 2
For \(\sigma =0.25\), the BO marginal tax rate at the highest incomes is strictly higher than the BO marginal tax rates at all lower incomes. However, this is not true for \(\sigma =1\).^{Footnote 22}
Nevertheless, because of the UBI and marginal rates that don’t decrease sufficiently with income, the BO tax schedule is (possibly, weakly) progressive in terms of average rates.
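The link between the UBI, the marginal rates, and average rates can be made concrete with a small sketch that builds a continuous, piecewise linear schedule from a UBI (the negative of the intercept), kink incomes, and marginal rates, and evaluates the average rate \(T(y)/y\). The helper names and all numbers below are hypothetical and are not the BO schedules reported in Table 1 and Figs. 1 and 2.

```python
def make_tax(ubi, kinks, rates):
    """Continuous, piecewise linear T with T(0) = -ubi (hypothetical helper).

    kinks: increasing incomes at which the marginal rate changes;
    rates: marginal rates on the len(kinks) + 1 pieces, left to right.
    """
    if len(rates) != len(kinks) + 1:
        raise ValueError("need exactly one more rate than kinks")
    bounds = [0.0] + list(kinks) + [float("inf")]

    def T(y):
        tax = -ubi
        for r, lo, hi in zip(rates, bounds, bounds[1:]):
            if y > lo:
                tax += r * (min(y, hi) - lo)  # income falling on this piece
        return tax

    return T


def average_rate(T, y):
    """Average tax rate T(y) / y for y > 0."""
    return T(y) / y


flat = make_tax(10_000, [], [0.3])                # UBI with one flat rate
falling = make_tax(10_000, [100_000], [0.5, 0.3])  # marginal rate falls at 100,000
```

In the illustrative `falling` schedule, the average rate is 0.4 at income 100,000 but 0.31 at income 1,000,000: above the kink the marginal rate of 0.3 lies below the average rate, so the average rate declines despite the UBI. A UBI alone therefore does not guarantee progressivity; marginal rates must also not fall too far.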
Finding 3
The BO average tax rate is strictly increasing in income for \(\sigma \in \{0.25,0.5\}\) and, to a close approximation, weakly increasing in income for \(\sigma =1\).
Furthermore, the following holds.
Finding 4
For any incomes \(y_1\) and \(y_2\) such that \(0< y_1< y_2 < 925653\), the difference between the BO average tax rate at \(y_2\) and at \(y_1\) is strictly decreasing in \(\sigma\) on \(\{0.25,0.5,1\}\).^{Footnote 23}
Thus, the progressivity of the BO tax schedule is decreasing in \(\sigma\), at least at the income levels that are relevant for the vast majority of the population.^{Footnote 24} This occurs because (i) the BO UBI falls substantially as \(\sigma\) increases and (ii) abstracting from some minor exceptions at low incomes, at any income level the BO marginal tax rate weakly decreases as \(\sigma\) increases on \(\{0.25,0.5,1\}\). For \(\sigma =1\), the progressivity is attenuated to the point that the average tax rate is approximately flat for a wide range of incomes (for incomes between $32,878 and $925,653, to be precise).^{Footnote 25}
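The two channels (i) and (ii) can be read off the slope of the average rate. Write a continuous, piecewise linear schedule as \(T(y) = -U + \int _0^y t(s)\, ds\), where U is the UBI and \(t(\cdot )\) the marginal rate (notation introduced here for illustration). At every income other than a kink,

$$\begin{aligned} \frac{d}{dy}\left( \frac{T(y)}{y}\right) = \frac{t(y)\, y - T(y)}{y^2} = \frac{U}{y^2} + \frac{1}{y^2}\int _0^y \big ( t(y) - t(s) \big )\, ds. \end{aligned}$$

Holding marginal rates fixed, a smaller UBI lowers this slope at every income, and the second term is negative whenever the marginal rate at y is below the marginal rates at all lower incomes. This slope is the derivative \(a(y,\sigma )\) used in the notes to establish Finding 4.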
Robustness checks
I explore the robustness of Findings 1–4 to the discretisation in condition (d) by redoing the numerical analysis for each of the following variations of that condition.

(d1)
Same as condition (d) except that \(n \le 3\) instead of \(n \le 4\).

(d2)
\(n \le 4\). Given n, \(t_i \in \{ -.8,-.6,-.4,-.2,0,.2,.4,.6,.8 \}\) for all \(i \in \{1,\ldots ,n\}\) and \(w_i \in \{ w(20),w(40),w(60),w(80),w(95),w(99) \}\) for all \(i \in \{1,\ldots ,n-1\}\).

(d3)
\(n \le 5\). Given n, \(t_i \in \{ -.8,-.6,-.4,-.2,0,.2,.4,.6,.8 \}\) for all \(i \in \{1,\ldots ,n\}\) and \(w_i \in \{ w(10),w(20),w(30),w(40),w(50),w(60),w(70),w(80),w(90),w(95),w(99),w(99.9) \}\) for all \(i \in \{1,\ldots,n-1\}\).

(d4)
\(n = 3\). Letting \({\mathcal {A}}\) denote a set of 20 million points drawn from a uniform distribution on \(\{ (p_1,p_2,t_1,t_2,t_3) \mid 10 \le p_1<p_2 \le 99.99,\ -1 \le t_i <1 \text { for } 1\le i \le 3 \}\), \((w_1,w_2,t_1,t_2,t_3)\) are such that \((w_1,w_2,t_1,t_2,t_3)=(w(p_1),w(p_2),t_1,t_2,t_3)\) for some \((p_1,p_2,t_1,t_2,t_3) \in {\mathcal {A}}\).^{Footnote 26}\(^,\)^{Footnote 27}
The discretisations in (d1) and (d2) are coarsenings of the discretisation in (d). Relative to (d), (d3) coarsens the grid for the \(t_i\)’s, but allows for tax schedules with five pieces. The discretisation in (d4) is quite different in that the \(w_i\)’s and \(t_i\)’s are drawn randomly.
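The random design in (d4) can be mimicked in a few lines. In the sketch below, the function name, the seed, and the small number of draws are illustrative (the paper uses 20 million points), and rates are drawn from \([-1,1)\). Sorting two independent uniform draws yields a uniform point on the region \(p_1<p_2\), because the pair of order statistics has constant density on that triangle.

```python
import random


def sample_d4(n_draws, seed=0):
    """Draw (p1, p2, t1, t2, t3) uniformly from the set used in (d4).

    p1 < p2 are percentiles in [10, 99.99]; each t_i lies in [-1, 1).
    Hypothetical helper; seed and n_draws are illustrative choices.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Sorting two i.i.d. uniforms gives a uniform point on {p1 < p2}.
        p1, p2 = sorted(rng.uniform(10, 99.99) for _ in range(2))
        # random() lies in [0, 1), so 2 * random() - 1 lies in [-1, 1).
        ts = tuple(2 * rng.random() - 1 for _ in range(3))
        draws.append((p1, p2) + ts)
    return draws


draws = sample_d4(1000)
```

Each draw is then mapped to thresholds via \(w_1=w(p_1)\) and \(w_2=w(p_2)\). With continuous draws, adjacent rates coincide with probability zero, so the requirement \(t_{i-1} \ne t_i\) in (c) holds almost surely.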
Findings 1–4 hold up well under (d1)–(d4).^{Footnote 28} In particular, Finding 1 continues to hold across the board.
Finding 2 also continues to hold under (d1), (d2), and (d4). Under (d3), the BO marginal tax rate at the highest incomes is not strictly higher than the BO marginal tax rates at all lower incomes for \(\sigma = 0.25\) either. However, this is only because of high BO marginal rates over the narrow income intervals [0, 4505] and [25406, 32510].
Finding 3 continues to hold with the following exceptions. Under (d2), the BO average tax rate for \(\sigma =1\) modestly declines from 0.355 to 0.27 between incomes $60,699 and $134,115. Under (d3), the BO average tax rate for \(\sigma =1\) modestly declines from 0.343 to 0.249 between incomes $46,434 and $134,115. Given the flatness of the BO average tax rate over these income ranges under (d) and the coarseness of the grids for the \(t_i\)’s under (d2) and (d3), these exceptions seem minor.
Finally, the results under (d1)–(d4) are roughly in line with Finding 4 and Fig. 2 in the sense that, under each of these conditions, the BO average-rate schedule rotates clockwise as \(\sigma\) increases. Having said this, there are some instances in which the BO average-rate schedule over a particular income range is not flatter for a higher value of \(\sigma\).^{Footnote 29}
Comments on the WID data
A few comments regarding the WID data on pre-tax labour income are in order. First, this data is based on all individuals over age 20 and it counts income from public and private pensions as labour income. This is not ideal for the purpose of backing out productivities because the relationship between pension income and productivity is probably different from the relationship between a working-age individual’s labour income and productivity.
Second, income is split equally within couples, which forces us to treat spouses as having the same productivity. This seems preferable for the purposes of the current paper because it ensures that the same preference over tax schedules is imputed to both spouses.
Third, although using cross-sectional data on the distribution of annual income to back out productivities is common (e.g., see Saez (2001)), this probably leads us to exaggerate the dispersion in lifetime productivities. The latter are probably more relevant if we are concerned with the design of a long-term tax system.^{Footnote 30}
Concluding remarks
This paper is an attempt to apply the idea of democracy, as embodied in the Borda count, to the optimal taxation of labour income. Undoubtedly, the analysis has important limitations. Notably, it relies on (i) a simple, static model of labour supply with quasilinear preferences and a constant elasticity of labour supply, (ii) finite discretisations of the set of feasible DMs, and (iii) imperfect data on pre-tax labour income. For these reasons, Findings 1–4 focused on qualitative aspects of the BO tax schedules and, even so, I view these findings as no more than indicative. More broadly, I hope the current paper will encourage research on BO public policies.
Notes
In some voting models, there is a connection between utilitarianism and majority rule. (See Krishna and Morgan (2015) and the references therein.) However, in these models, voters choose between only two policies. (These two policies may have been endogenously selected from a larger set of policies at a pre-voting stage in which candidates strategically decide on which policies to run. However, such a strategic pre-voting stage is hardly a key component of the normative ideal of democracy.)
I believe that Saez’s findings are also in line with findings (iii) and (iv), though it’s hard to be sure based on the information provided in his paper.
It is well known that the Borda count violates Arrow’s independence of irrelevant alternatives (IIA) (Arrow 1951). However, Maskin (2021) argues that IIA is too stringent and shows that the Borda count satisfies a normatively appealing weakening of IIA. Pearce (2021) also argues forcefully against IIA.
They also characterise when a small perturbation in the status-quo tax schedule is both politically feasible and welfare-improving under (weighted) utilitarianism. Thus, their paper connects to the normative literature based on utilitarianism mentioned above (a key difference being that they consider incremental reforms rather than globally optimal tax schedules).
By imposing equality in (2), I am not allowing the government to burn money. This may not be inconsequential as the Borda rule can be sensitive to the deletion of alternatives. The justification for imposing equality in (2) is twofold. First, this dramatically reduces the number of DMs I’ll need to consider. Second, the model already implicitly leaves out many alternatives that are dominated according to any reasonable preferences, such as alternatives that entail destroying all bridges. Ruling out the burning of money seems to be in the same spirit.
A tax schedule is a function \(T:[0,\infty ) \rightarrow \mathbb {R}\). T(y) is the tax owed by an individual earning income y. T implements (Y, C) if, for all \(w \in [{\underline{w}},{\overline{w}}]\), \(Y(w) \in \mathop {{\mathrm{arg\, max}}}\limits _{y \ge 0} \left[ y - T(y) - \frac{\sigma }{1+\sigma } \left( \frac{y}{w} \right) ^{\frac{1+\sigma }{\sigma }} \right]\) and \(C(w)=Y(w) - T(Y(w))\).
I’m counting the pieces (in the graph) of T from left to right.
All proofs are in the appendix.
As made clear in the proof, condition (i) ensures that Y has at most \(N-1\) jump points.
Condition (c) ensures that Y is nondecreasing and \(Y(w) \ge 0\) for all \(w \in [{\underline{w}},{\overline{w}}]\).
Including duplicate tax schedules would not only slow down the numerical calculations, but it could also change the BO tax schedule because the Borda winner is not necessarily invariant to the inclusion of duplicates. Given that the model already implicitly leaves out many duplicates (e.g., it conflates (i) tax schedule T and a blue tax form and (ii) the same tax schedule T and a green tax form), eliminating duplicate tax schedules seems to be in the same spirit.
Type w’s ranking over \({\mathcal {D}}\) is based on the indirect utility function \(\phi _w(Y,C) = C(w) - \frac{\sigma }{1+\sigma } \left( \frac{Y(w)}{w}\right) ^{\frac{1+\sigma }{\sigma }}\). Note that this implicitly assumes that each individual’s preference over DMs is selfish. This assumption is discussed further in the appendix.
I assume the integral in (4) exists.
The Borda count in (4) generalizes the usual Borda count to the case in which individuals can exhibit indifference between alternatives (which is the relevant case in the current context). Note that Maskin (2021) assumes that individuals’ preferences over alternatives are strict. However, as Ivanov (2022) shows, the Borda count in (4) satisfies (extensions to the case of weak preferences of) the axioms in Maskin (2021) (as well as an additional normatively appealing axiom).
The details are in the appendix.
Section 6 discusses some important aspects of the WID data. The details of how I back out the distribution of types are in the appendix.
All dollar amounts in the paper are in 2014 dollars.
The computations were done in Mathematica 12. The code is provided in separate files (see https://doi.org/10.1007/s00355-022-01411-9).
It is also not true for \(\sigma =0.5\), but this doesn’t survive all robustness checks in section 5.3 below as well as the robustness check mentioned in footnote 20.
To establish this, I compute, for each \(\sigma \in \{0.25,0.5,1\}\), the derivative of the BO average tax rate with respect to income. Denoting this derivative at income y by \(a(y,\sigma )\), I obtain that \(a(y,0.25)> a(y,0.5) > a(y,1)\) for almost all \(y \in (0,925653)\). The finding follows because the BO average tax rate is an absolutely continuous function of income so that the increase in the average tax rate over \([y_1,y_2]\) equals \(\int _{y_1}^{y_2} a(y,\sigma ) dy\).
For \(\sigma \in \{0.25,0.5,1\}\), around 99.9 percent of the population choose an income below $925,653 when faced with the BO tax schedule.
One may ask: Are the BO tax schedules more or less progressive than utilitarian-optimal ones? In the appendix, I address this question (without reaching any firm conclusions).
The cases \(n=1\) and \(n=2\) are also somewhat covered under (d4) because \(w_1\) can be arbitrarily close to \(w_2\), \(t_1\) can be arbitrarily close to \(t_2\), and \(t_2\) can be arbitrarily close to \(t_3\).
Recall that the Borda count in (4) is approximated based on the rankings of representative types. The lowest such type is w(5) and the highest such type is w(99.995). Thus, all representative types have aligned incentives to exploit types in \([{\underline{w}},w(5))\) and \((w(99.995),{\overline{w}}]\) by applying a separate marginal rate to a narrow interval of types at the bottom and, respectively, top of \([{\underline{w}},{\overline{w}}]\). The requirement in (d4) that \(w_1 \ge w(10)\) and \(w_2 \le w(99.99)\) constrains such customised targeting of marginal rates at the bottom and top of \([{\underline{w}},{\overline{w}}]\).
In particular, under (d1), the BO average-rate schedules for \(\sigma =0.25\) and \(\sigma =0.5\) have virtually identical slopes between incomes $108,645 and $226,927. Under (d2) and (d3), the BO average-rate schedule is in fact (i) steeper for \(\sigma =0.5\) than for \(\sigma =0.25\) at incomes above $241,642 and (ii) slightly steeper for \(\sigma =1\) than for \(\sigma =0.5\) between incomes $134,115 and $241,642.
Guvenen et al. (2021) have recently provided data on the distribution of lifetime labour incomes. This data is also not ideal for the purposes of the current paper. Remarkably, in the WID data and the Guvenen et al. data, the distribution of income across the population is very similar. I elaborate on these points in the appendix.
WID defines pre-tax labour income as the sum of all pre-tax personal income flows accruing to the individual owners of labor as a production factor, before taking into account the operation of the tax/transfer system, but after taking into account the operation of the pension system. The base unit is the individual (rather than the household) but resources are split equally within couples.
For brevity, in the rest of this section I will write “income” although in fact I mean “pre-tax labour income”.
WID reports a negative \(0\text {th}\) income percentile. (I believe this is largely due to the partial imputation of the losses of privately owned businesses to labour income.) However, this is not consistent with the assumption \({\underline{w}}>0\).
As is typical in the utilitarian approach, I (and Saez) offer no justification for the choice of a particular utility representation of each individual’s preferences.
In particular, I was either assuming that Y(w) is continuous and piecewise linear in w or I was assuming that Y(w)/w is continuous and piecewise linear in w. These approaches involved various ad hoc assumptions and were sensitive to changes in these assumptions. Thus, I abandoned them when I came up with the finite discretisation of the set of feasible DMs based on conditions (c) and (d).
This cohort is the most recent cohort for which data for the whole period between the ages of 25 and 55 is available.
If I understand correctly, these two tables display the same information based on different samples from the same data. Also, these tables (like most of the analysis in that paper) restrict attention to individuals who have had sufficient attachment to the labour market and have been employed in certain sectors. However, comparing data on the distribution of income for this narrower subset of the population (see the last six lines in Table C.12 in Guvenen et al.) and for the whole population (see Table F.2 in Guvenen et al.) reveals that the distributions of earnings in the narrower subset and in the whole population are quite similar.
To apply the maximum theorem, we need the constraint set in problem (6) to be compact. Let \({\overline{y}}\) denote an income level such that (i) it is strictly higher than the income level where the last kink in Z occurs and (ii) type \({\overline{w}}\)’s indifference curves in income-consumption space at income \({\overline{y}}\) are steeper than the last piece of Z. Because no type would choose an income level above \({\overline{y}}\), the constraint \(y \ge 0\) in problem (6) can be replaced by the constraint \(0 \le y \le {\overline{y}}\).
I haven’t required that \(Z(y) \ge 0\) for all \(y \ge 0\). Indeed, the unique Z in the claim may be such that \(Z(y) < 0\) for some \(0 \le y<Y({\underline{w}})\). (Analogously, I haven’t required that \(T(y) \le y\) for all \(y \ge 0\), and the unique T in Proposition 1 may be such that \(T(y) > y\) for some \(0 \le y<Y({\underline{w}})\).) This isn’t a problem in practice given that, in the calibrated distribution of types, \({\underline{w}}\) is very close to zero so that, even for very low values of \(t_1\), \(Y({\underline{w}})\) is very close to zero.
The fact that types’ optimal consumption-income choices under Z must be incentive compatible (because each type could have mimicked any other type’s consumption-income choice), (i), and (ii) imply that, for all \(w \in [{\underline{w}},{\overline{w}}]\),
$$\begin{aligned}&Z(Y(w)) = \\&Z(Y({\underline{w}})) - \frac{\sigma }{1+\sigma } \left( \frac{Y({\underline{w}})}{{\underline{w}}} \right) ^\frac{1+\sigma }{\sigma } + \frac{\sigma }{1+\sigma } \left( \frac{Y(w)}{w} \right) ^\frac{1+\sigma }{\sigma } + \int _{{\underline{w}}}^w \left( \frac{Y({\tilde{w}})}{{\tilde{w}}} \right) ^\frac{1+\sigma }{\sigma } \frac{1}{{\tilde{w}}} d {\tilde{w}}= \\&C({\underline{w}}) - \frac{\sigma }{1+\sigma } \left( \frac{Y({\underline{w}})}{{\underline{w}}} \right) ^\frac{1+\sigma }{\sigma } + \frac{\sigma }{1+\sigma } \left( \frac{Y(w)}{w} \right) ^\frac{1+\sigma }{\sigma } + \int _{{\underline{w}}}^w \left( \frac{Y({\tilde{w}})}{{\tilde{w}}} \right) ^\frac{1+\sigma }{\sigma } \frac{1}{{\tilde{w}}} d {\tilde{w}} = \\&C(w). \end{aligned}$$
Given the concavity in y of the maximand in problem (6), the first-order condition is sufficient.
For all \(i \in \{1,\ldots ,k-1\}\), “\(\left( \frac{1-t_{i-1}}{1-t_i} \right) ^\frac{\sigma }{1+\sigma } w_{i-1} \le w_i\) whenever \(t_{i-1} < t_i\)” implies “\(\left( \frac{1-t_{i-1}}{1-t_i} \right) ^\frac{\sigma }{1+\sigma } {\tilde{w}}_{i-1} \le {\tilde{w}}_i\) whenever \(t_{i-1} < t_i\)”. For all \(i \in \{1,\ldots ,k-2\}\), “\(\left( \frac{1-t_{i-1}}{1-t_i} \right) ^\frac{\sigma }{1+\sigma } w_{i-1} < w_i\) whenever \(t_{i-1}< t_i < t_{i+1}\)” implies “\(\left( \frac{1-t_{i-1}}{1-t_i} \right) ^\frac{\sigma }{1+\sigma } {\tilde{w}}_{i-1} < {\tilde{w}}_i\) whenever \(t_{i-1}< t_i < t_{i+1}\)”.
It should be clear that \(Y_{-1}\) coincides with Y on \([{\underline{w}},w_{k-2}]\). To see that the rest of the statement is true, note that, for \(w \in (w_{k-2},{\overline{w}}]\), we have
$$\begin{aligned} Y_{-1}(w) = \left\{ \begin{array}{ll} (1-t_{k-1})^\sigma w^{1+\sigma } &{} \text {if } w_{k-2}< w \le w_k, t_{k-2} > t_{k-1} \\ (1-t_{k-2})^\sigma w_{k-2}^{1+\sigma } &{} \text {if } w_{k-2}< w \le \left( \frac{1-t_{k-2}}{1-t_{k-1}} \right) ^\frac{\sigma }{1+\sigma } w_{k-2}, t_{k-2}< t_{k-1} \\ (1-t_{k-1})^\sigma w^{1+\sigma } &{} \text {if } \left( \frac{1-t_{k-2}}{1-t_{k-1}} \right) ^\frac{\sigma }{1+\sigma } w_{k-2}< w \le w_{k-1}, t_{k-2}< t_{k-1} \\ (1-t_{k-1})^\sigma w^{1+\sigma } &{} \text {if } w_{k-1}< w \le w_k, t_{k-2} < t_{k-1} \\ \end{array} \right. . \end{aligned}$$
This is clear when \(k=2\). Now assume \(k \ge 3\). Given that \(w_{k-1}>w_{k-2}\) and Y is nondecreasing, it must be that \(Y(w_{k-1})\) is weakly higher than the income level at which the kink between the \((k-2){\text {nd}}\) and \((k-1){\text {st}}\) pieces of \(Z_{-1}\) occurs. (Figure 4 is drawn assuming \(Y(w_{k-1})\) is strictly to the right of that kink, but nothing in the logic of what follows relies on that.)
It is straightforward to verify that this indifference curve has slope \(1-t_{k-1}\) at that point.
Straightforward computations yield \(K=\frac{(1-t_k)^{1+\sigma }-(1-t_{k-1})^{1+\sigma }}{(1+\sigma )(t_{k-1}-t_k)} w_{k-1}^{1+\sigma }\).
This follows because \(y-\frac{\sigma }{1+\sigma } \left( \frac{y}{w} \right) ^\frac{1+\sigma }{\sigma }\) satisfies the single-crossing property.
The only way for Y to be flat immediately to the left of \(w_{k-1}\) is if \(\left( \frac{1-t_{k-2}}{1-t_{k-1}} \right) ^\frac{\sigma }{1+\sigma } w_{k-2} = w_{k-1}\) and \(t_{k-2}<t_{k-1}\). However, \(t_{k-2}<t_{k-1}<t_k\) implies \(\left( \frac{1-t_{k-2}}{1-t_{k-1}} \right) ^\frac{\sigma }{1+\sigma } w_{k-2} < w_{k-1}\).
This follows because \((Y(w_{k-1}),Z(Y(w_{k-1})))\) is available both given Z and given \(Z_{-1}\) while the budget set defined by Z in income-consumption space is a subset of the one defined by \(Z_{-1}\).
This follows because \(y-\frac{\sigma }{1+\sigma } \left( \frac{y}{w} \right) ^\frac{1+\sigma }{\sigma }\) satisfies the single-crossing property.
Note that, because Y is strictly increasing on \((w_{k-1}-\delta ,w_{k-1}]\) for some \(\delta >0\), \(w_{k-1}\) is the lowest type to choose at the kink in Z at income \(K=Y(w_{k-1})\).
This follows because Y is nondecreasing (so that \(Y(w_{k-1}) \le Y(w) \le Y(w_k)\) for all \(w \in (w_{k-1},w_k]\)) and \(w_k\) chooses on the \(k{\text {th}}\) piece of \(Z'\) and \(Z''\) (by (iv) and (v) in Claim 1 applied to \(i=k+1\) if \(k<n\) and by the fact that Y is nondecreasing if \(k=n\)).
Clearly, (i) is equivalent to the assumption that (Y, C) is implemented by some continuous, piecewise linear tax schedule with N pieces.
Because Y is strictly increasing near \({\underline{w}}\), we must have \(y>0\). Thus, \(Y(w_a)\) and \(Y(w_b)\) aren’t corner solutions of problem (6) and the tangency condition must hold.
The limits exist because Y is nondecreasing.
Because Y is strictly increasing near \({\underline{w}}\), \(Y(w-\epsilon )\) and \(Y(w+\epsilon )\) aren’t corner solutions of problem (6) and the tangency condition must hold.
Given Lemma 1, the fact that Y is nondecreasing, and the fact that Z has at most \(N-1\) kinks, there must be at most finitely many intervals on which Y is constant. Thus, the first minimum exists. Given Lemma 2, the fact that Y is nondecreasing, and the fact that Z has at most \(N-1\) kinks, it must be that Y has finitely many (in fact, at most \(N-1\)) jump points. Thus, the second minimum also exists.
That is, on \((w_{i-1},w_i)\), once Y starts increasing, it cannot stop.
\({\hat{y}}>0\) given that Y is strictly increasing near \({\underline{w}}\).
We need \(\epsilon\) to be smaller than the size of the jump in Y at \(w_{i-1}\).
We need \(\epsilon\) to be smaller than the size of the jump in Y at \(w_{i-1}\).
By the definition of \(t_{i-1}\) and Lemma 3, the left-hand side is the expression for Y just to the left of \(w_{i-1}\). The equality holds because Y is continuous at \(w_{i-1}\).
We need \(\epsilon\) to be smaller than the size of the jump in Y at \(w_i\).
See part 1) of Lemma 5.
References
Arrow KJ (1951) Social choice and individual values. Wiley, New York
Atkinson AB, Stiglitz JE (1976) The design of tax structure: direct versus indirect taxation. J Publ Econ 6(1–2):55–75
Bentham J (1789) An introduction to the principles of morals and legislation.
Bierbrauer FJ, Boyer PC (2016) Efficiency, welfare, and political competition. Q J Econ 131(1):461–518
Bierbrauer FJ, Boyer PC, Peichl A (2021) Politically feasible reforms of nonlinear tax systems. Am Econ Rev 111(1):153–91
Brett C, Weymark JA (2017) Voting over selfishly optimal nonlinear income tax schedules. Games Econ Behav 101:172–188
Carbonell-Nicolau O, Efe AO (2007) Voting over income taxation. J Econ Theory 134(1):249–286
Chen Y (2000) Electoral systems, legislative process, and income taxation. J Public Econ Theory 2:71–100
Fleurbaey M (2008) Fairness, responsibility, and welfare. Oxford University Press, Oxford
Guvenen F, Kaplan G, Song J, Weidner J (2021) Lifetime earnings in the United States over six decades. University of Chicago, Becker Friedman Institute for Economics Working Paper 2021-60
Hvidberg KB, Kreiner C, Stantcheva S (2021) Social position and fairness views. National Bureau of Economic Research Working Paper 28099
Ivanov A (2022) The Borda count with weak preferences. Econ Lett 210:110162
Keane MP (2011) Labor supply and taxes: a survey. J Econ Lit 49(4):961–1075
Krishna V, Morgan J (2015) Majority rule and utilitarian welfare. Am Econ J Microecon 7(4):339–75
Maskin E (2021) Arrow’s theorem, May’s axioms, and Borda’s rule. Working paper
Meltzer AH, Richard SF (1981) A rational theory of the size of government. J Polit Econ 89(5):914–927
Mirrlees J (1971) An exploration in the theory of optimal income taxation. Rev Econ Stud 38:175–208
Pearce D (2021) Individual and social welfare: a Bayesian perspective. Working paper
Piketty T (1997) La redistribution fiscale face au chômage. Revue française d’économie 12(1):157–201
Rawls J (1971) A theory of justice. Harvard University Press, Cambridge
Roberts KWS (1977) Voting over income tax schedules. J Publ Econ 8(3):329–340
Röell A (2012) Voting over nonlinear income tax schedules. Unpublished manuscript. School of International and Public Affairs, Columbia University
Roemer JE (1998) Equality of opportunity. Harvard University Press, Cambridge
Roemer JE (2012) The political economy of income taxation under asymmetric information: the twotype case. SERIEs 3(1):181–199
Romer T (1975) Individual welfare, majority voting, and the properties of a linear income tax. J Publ Econ 4(2):163–185
Saez E (2001) Using elasticities to derive optimal income tax rates. Rev Econ Stud 68(1):205–229
Saez E, Slemrod J, Giertz SH (2012) The elasticity of taxable income with respect to marginal tax rates: a critical review. J Econ Lit 50(1):3–50
Saez E, Stantcheva S (2018) A simpler theory of optimal capital taxation. J Publ Econ 162:120–142
Seade J (1982) On the sign of the optimum marginal income tax. Rev Econ Stud 49(4):637–643
Young HP (1974) An axiomatization of Borda’s rule. J Econ Theory 9(1):43–52
World Inequality Database. https://wid.world
Ethics declarations
Conflict of interest
None
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This research utilised Queen Mary’s Apocrita HPC facility, supported by QMUL ResearchIT. https://doi.org/10.5281/zenodo.438045. I acknowledge the assistance of the ITS Research team at QMUL.
Supplementary Information
Below is the link to the electronic supplementary material.
Appendices
Appendix: Selfish preferences
The analysis has made the implicit assumption that each individual’s preference over DMs is selfish, i.e., that it’s determined solely by each DM’s implications for the individual’s own consumption-labour bundle. Although this is a nontrivial assumption, I believe it provides a reasonable normative benchmark for two reasons.
First, Hvidberg et al. (2021) find a strong positive relationship between people’s tolerance towards inequality and their own position in the income distribution. Thus, selfish preferences may be a reasonable approximation.
Second, even if people do care about others, it’s plausible that they consider the Borda count with selfish preferences as inputs to be procedurally fair, so that there is no need to bring in additional fairness considerations by feeding other-regarding preferences into the Borda count.
Appendix: Approximating B(Y, C)
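I approximate B(Y, C) by
$$B(Y,C) \approx \sum _{k=1}^{14} \Delta \left( Y,C,w(0.5q_k+0.5q_{k+1})\right) \left[ F(w(q_{k+1}))-F(w(q_k))\right] , \qquad (5)$$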
where \(q_k\) denotes the \(k\text {th}\) element of \((0,10,\ldots ,90,95,99,99.9,99.99,100)\). This approximation effectively assumes that, for each \(1 \le k \le 14\), the preferences of all types between the \(q_k\text {th}\) and \(q_{k+1}\text {th}\) percentiles coincide with the preferences of the median type between these percentiles. To see this, note that B(Y, C) can be written as \(\sum _{k=1}^{14} \int _{w(q_k)}^{w(q_{k+1})} \Delta (Y,C,w) f(w) dw\). Replacing \(\Delta (Y,C,w)\) in the latter expression by \(\Delta (Y,C,w(0.5q_k+0.5q_{k+1}))\) yields (5).
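As a minimal sketch of the quadrature behind (5), the Python function below assumes the percentile grid runs from the 0th to the 100th percentile and takes a hypothetical callable `delta_at_percentile(p)` standing in for Δ(Y, C, w(p)); it illustrates the midpoint-in-percentile rule and is not the code used for the paper's computations. Because the population mass between the q_k-th and q_{k+1}-th percentiles is (q_{k+1} − q_k)/100, each bin contributes the Δ of its median type weighted by that mass.

```python
from typing import Callable, Sequence

# Percentile grid q_1, ..., q_15 (in percent), assumed to end at the 100th percentile.
PERCENTILE_GRID = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99, 99.9, 99.99, 100]


def approx_borda(delta_at_percentile: Callable[[float], float],
                 grid: Sequence[float] = PERCENTILE_GRID) -> float:
    """Approximate B(Y, C), the integral of Delta(Y, C, w) dF(w), bin by bin.

    delta_at_percentile is a hypothetical stand-in for p -> Delta(Y, C, w(p)).
    Within each bin [q_k, q_{k+1}], every type is treated like the type at the
    midpoint percentile 0.5 q_k + 0.5 q_{k+1}.
    """
    total = 0.0
    for q_lo, q_hi in zip(grid[:-1], grid[1:]):
        mid = 0.5 * q_lo + 0.5 * q_hi     # median type of the bin
        mass = (q_hi - q_lo) / 100.0      # F(w(q_hi)) - F(w(q_lo))
        total += delta_at_percentile(mid) * mass
    return total


# Sanity checks: a constant Delta integrates to that constant, and the
# midpoint rule is exact when Delta is linear in the percentile.
print(approx_borda(lambda p: 1.0))   # close to 1.0
print(approx_borda(lambda p: p))     # close to 50.0
```

The midpoint-in-percentile choice means only 14 evaluations of Δ are needed per DM, which matters when Δ itself requires ranking a type's preferences over a large finite set of DMs.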
Appendix: Distribution of types
I assume that the actual labour-income tax schedule is a 30 percent flat tax. Given this tax schedule, type w’s optimal pre-tax labour income is \(y^*(w)=0.7^\sigma w^{1+\sigma }\).
I use data from WID on pre-tax labour income for individuals over the age of 20 in the US in 2014.^{Footnote 31} In particular, I obtain from WID the data presented in Table 2.
I augment this data in two ways.^{Footnote 32} First, I assume that the lowest income equals $1.^{Footnote 33} Second, WID does not report the income of the highest earner. It does report that the \(99.999\text {th}\) income percentile equals $15,579,290 and that the average income in the top 0.001 percent equals $32,134,644. I impute an income to the highest earner by assuming that this income and the \(99.999\text {th}\) income percentile are symmetrically situated around $32,134,644. That is, I assume that the highest earner has an income of $48,689,999. I make this assumption on simplicity grounds. Given that the top 0.001 percent of earners earned only 0.7 percent of all income, it is unlikely that this assumption is of much consequence.
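The symmetric-imputation rule amounts to: highest income = 2 × (average income in the top 0.001 percent) − (99.999th income percentile). The short Python sketch below applies it to the rounded dollar figures quoted above. With these rounded inputs it gives $48,689,998, one dollar below the reported $48,689,999; the gap presumably reflects rounding in the published WID figures (an assumption, not stated in the text).

```python
# Rounded figures as quoted in the text (WID, US, 2014, pre-tax labour income).
P99_999 = 15_579_290     # 99.999th income percentile, in dollars
TOP_MEAN = 32_134_644    # average income in the top 0.001 percent, in dollars


def impute_top_income(percentile_value: int, top_mean: int) -> int:
    """Highest income placed symmetrically to `percentile_value` around `top_mean`."""
    return 2 * top_mean - percentile_value


top_income = impute_top_income(P99_999, TOP_MEAN)
print(f"${top_income:,}")  # prints $48,689,998
```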
Then, using \(y^*(\cdot )\) and the augmented WID income data, I back out the various type percentiles (i.e., the \(0\text {th}\) percentile, the \(100\text {th}\) percentile, and all the percentiles listed in Table 2). E.g., given that the \(5\text {th}\) income percentile equals 1264.5269, I infer that the \(5\text {th}\) type percentile is \(w(5) = y^{*-1}(1264.5269) = 1264.5269^{\frac{1}{1+\sigma }} / 0.7^{\frac{\sigma }{1+\sigma }}\), where \(y^{*-1}(\cdot )\) denotes the inverse of \(y^*(\cdot )\).
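The inversion step can be sketched in a few lines of Python. The function names are illustrative; the formulas are y*(w) = 0.7^σ w^{1+σ} and its inverse w = y^{1/(1+σ)} / 0.7^{σ/(1+σ)} from the text, with the flat rate exposed as a parameter (defaulting to the assumed 30 percent).

```python
def optimal_income(w: float, sigma: float, flat_rate: float = 0.3) -> float:
    """y*(w) = (1 - flat_rate)^sigma * w^(1 + sigma) under a flat tax."""
    return (1 - flat_rate) ** sigma * w ** (1 + sigma)


def type_from_income(y: float, sigma: float, flat_rate: float = 0.3) -> float:
    """Inverse of optimal_income: the type whose optimal income is y."""
    return y ** (1 / (1 + sigma)) / (1 - flat_rate) ** (sigma / (1 + sigma))


# Back out the 5th type percentile from the 5th income percentile for each sigma.
for sigma in (0.25, 0.5, 1.0):
    w5 = type_from_income(1264.5269, sigma)
    # The last value recovers 1264.5269 up to floating-point rounding.
    print(sigma, w5, optimal_income(w5, sigma))
```

Applying `type_from_income` to every income percentile in the augmented data yields the knots used for the distribution of types.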
Finally, equipped with the various type percentiles, I specify the cumulative distribution function, F, of the distribution of types through linear interpolation. E.g., I assume that on [w(10), w(15)], \(F(w) = 0.1 + \frac{0.15-0.1}{w(15)-w(10)} (w-w(10))\).
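A minimal Python sketch of this linear interpolation, using the standard-library `bisect` module to locate the bracketing knots. The knot values in the example are hypothetical numbers chosen for easy checking, not the calibrated type percentiles; the segment [2, 4] plays the role of [w(10), w(15)].

```python
import bisect
from typing import Callable, Sequence


def piecewise_linear_cdf(knots_w: Sequence[float],
                         knots_p: Sequence[float]) -> Callable[[float], float]:
    """Return F interpolating linearly between the points (knots_w[j], knots_p[j]).

    knots_w: increasing type percentiles, e.g. w(0), w(5), w(10), ...
    knots_p: matching probabilities q / 100, e.g. 0.0, 0.05, 0.10, ...
    """
    def F(w: float) -> float:
        if w <= knots_w[0]:
            return knots_p[0]
        if w >= knots_w[-1]:
            return knots_p[-1]
        j = bisect.bisect_right(knots_w, w)     # knots_w[j-1] <= w < knots_w[j]
        w0, w1 = knots_w[j - 1], knots_w[j]
        p0, p1 = knots_p[j - 1], knots_p[j]
        return p0 + (p1 - p0) / (w1 - w0) * (w - w0)
    return F


# Hypothetical knots: F(2) = 0.10 and F(4) = 0.15 mimic w(10) and w(15).
F = piecewise_linear_cdf([1.0, 2.0, 4.0, 8.0], [0.0, 0.10, 0.15, 1.0])
print(F(3.0))  # halfway between w = 2 and w = 4, so about 0.125
```

Its derivative on each segment is the constant density f used in the Borda approximation.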
Appendix: BO vs. utilitarian-optimal tax schedules
Are the BO tax schedules more or less progressive than utilitarian-optimal (UO) tax schedules? To address this question, I assume that the utilitarian planner solves
In choosing the objective function for the planner, I am following Saez (2001).^{Footnote 34} Note that, to aid comparability to the BO tax schedules, I require that the planner restrict attention to DMs in \({\mathcal {D}}\).
The UO UBI equals $11,965, $7,305, and $3,975 for \(\sigma = 0.25\), \(\sigma =0.5\), and \(\sigma =1\), respectively. Comparing these numbers to the ones in Table 1 reveals the following.
Finding 5
For \(\sigma \in \{0.25,0.5,1\}\), the BO UBI is lower than the UO UBI. The difference shrinks as \(\sigma\) increases on \(\{0.25,0.5,1\}\).
The top/middle/bottom panel in Fig. 3 shows the BO and UO marginal tax rates for \(\sigma =0.25\)/\(\sigma =0.5\)/\(\sigma =1\). The figure reveals the following.
Finding 6
For \(\sigma \in \{0.25,0.5,1\}\), at each level of income the BO marginal tax rate is weakly lower than the UO marginal tax rate.
Findings 5 and 6 suggest that low types fare better under the utilitarian criterion while high types fare better under the Borda count.
Unfortunately, I have low confidence in Findings 5 and 6 for the following reasons. First, in the numerical analysis based on condition (d1) instead of condition (d), the BO tax schedule for \(\sigma =0.5\) has, relative to the UO tax schedule for \(\sigma =0.5\), a slightly higher UBI and weakly higher marginal tax rates at each level of income. Second, at an earlier stage of the project I was using different finite discretisations of the set of feasible DMs.^{Footnote 35} Findings 5 and 6 were not robust to these different approaches.
Appendix: Data on lifetime incomes in Guvenen et al. (2021)
Guvenen et al. (2021) have recently provided data on the distribution of pre-tax lifetime labour incomes. This data is also less than ideal for the purposes of the current paper. For example, it does not include the distribution of fringe benefits, income is computed at the individual level without any splitting within couples, and no information is provided on the distribution of income within the top 1 percent of earners.
Remarkably, the methodological differences in constructing the WID data and the Guvenen et al. data seem largely to offset each other, so that the distribution (in terms of income shares) of annual pre-tax labour income in 2014 according to WID is very similar to the distribution (in terms of income shares) of lifetime pre-tax labour income (between the ages of 25 and 55) for the cohort that turned 25 in 1983 according to Guvenen et al.^{Footnote 36} To see this, consider Table 3. It juxtaposes the share of income earned by individuals falling between different income percentiles according to data from each of these two sources. In particular, the first column refers to the WID data and the second column is computed as an average from the last lines of Tables E.1 and E.2 in Guvenen et al.^{Footnote 37}
Appendix: Proofs
A consumption schedule is a function \(Z:[0,\infty ) \rightarrow \mathbb {R}\). Z(y) is the after-tax income of a person earning income y.
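Type w’s problem given a consumption schedule Z is:
$$\max _{y \ge 0} \; Z(y)-\frac{\sigma }{1+\sigma } \left( \frac{y}{w} \right) ^\frac{1+\sigma }{\sigma } \qquad (6)$$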
Z implements a DM (Y, C) if, for all \(w \in [{\underline{w}},{\overline{w}}]\), Y(w) solves problem (6) and \(C(w)=Z(Y(w))\).
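To illustrate which incomes solve problem (6) when Z is continuous and piecewise linear, the Python sketch below enumerates a finite set of candidates. It assumes Z is described by a hypothetical triple (Z(0), the slopes 1 − t_i, the kink locations) and that the maximand is Z(y) − σ/(1+σ)(y/w)^{(1+σ)/σ}, the quasilinear form implied by the envelope formula in the footnotes. Because this maximand is strictly concave in y on each linear piece, the optimum on a piece is either its tangency point y = (1 − t_i)^σ w^{1+σ} (when that point lies strictly inside the piece) or an endpoint, so comparing y = 0, the kinks, and the interior tangency points suffices.

```python
import math
from typing import Sequence


def disutility(y: float, w: float, sigma: float) -> float:
    """Effort cost sigma/(1+sigma) * (y/w)^((1+sigma)/sigma)."""
    return sigma / (1 + sigma) * (y / w) ** ((1 + sigma) / sigma)


def consumption(y: float, z0: float, slopes: Sequence[float], kinks: Sequence[float]) -> float:
    """Continuous piecewise-linear Z with Z(0) = z0; slopes[i] applies between kinks."""
    value, left = z0, 0.0
    for i, s in enumerate(slopes):
        right = kinks[i] if i < len(kinks) else math.inf
        if y <= right:
            return value + s * (y - left)
        value += s * (right - left)
        left = right
    raise ValueError("need len(slopes) == len(kinks) + 1")


def best_income(w: float, sigma: float, z0: float,
                slopes: Sequence[float], kinks: Sequence[float]) -> float:
    """Solve max_{y >= 0} Z(y) - disutility(y) by comparing finitely many candidates."""
    bounds = [0.0, *kinks, math.inf]
    candidates = [0.0, *kinks]
    for i, s in enumerate(slopes):
        if s > 0:
            y_tan = s ** sigma * w ** (1 + sigma)   # where the indifference slope equals s
            if bounds[i] < y_tan < bounds[i + 1]:
                candidates.append(y_tan)
    return max(candidates,
               key=lambda y: consumption(y, z0, slopes, kinks) - disutility(y, w, sigma))


# Marginal rates 0.2 then 0.4 (kink at income 10), sigma = 1.
print(best_income(2.0, 1.0, 0.0, [0.8, 0.6], [10.0]))   # tangency on piece 1: about 3.2
print(best_income(3.8, 1.0, 0.0, [0.8, 0.6], [10.0]))   # bunches at the kink: 10.0
```

Ties are broken in favour of the first candidate listed, which is one arbitrary selection from the solution set \({\mathcal {Y}}(w)\).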
For future use, let \({\mathcal {Y}}(w)\) denote the set of solutions to problem (6). Note that, by the maximum theorem, \({\mathcal {Y}}: [{\underline{w}},{\overline{w}}] \rightrightarrows [0,\infty )\) is an upper hemicontinuous correspondence with nonempty and compact values if Z is continuous and piecewise linear with finitely many pieces.^{Footnote 38}
Because it is more convenient to work with consumption schedules than with tax schedules, I will prove, instead of Proposition 1, the following claim which restates Proposition 1 in terms of a consumption schedule.
Claim 1
Suppose (Y, C) satisfies (a) and (c). Then, there exists a unique consumption schedule, Z, such that the following hold.

(i)
Z implements (Y, C).

(ii)
Z is continuous and piecewise linear with n pieces.

(iii)
For each \(i \in \{1,\ldots ,n\}\), \(1-t_i\) is the slope of the \(i{\text {th}}\) piece of Z.

(iv)
If \(n \ge 2\), then, for each \(i \in \{2,\ldots ,n\}\) such that \(t_{i-1} > t_i\), \(w_{i-1}\) is the highest type that chooses a point on the \((i-1){\text {st}}\) piece of Z.

(v)
If \(n \ge 2\), then, for each \(i \in \{2,\ldots ,n\}\) such that \(t_{i-1} < t_i\), \(w_{i-1}\) is the lowest type that chooses at the kink between the \((i-1){\text {st}}\) and \(i{\text {th}}\) pieces of Z.^{Footnote 39}
Proof of Claim 1
Suppose (Y, C) satisfies conditions (a) and (c). Observe the following. To show that a consumption schedule, Z, implements (Y, C), it suffices to show that (i) for all \(w \in [{\underline{w}},{\overline{w}}]\), Y(w) solves problem (6) and (ii) \(Z(Y({\underline{w}})) = C({\underline{w}})\).^{Footnote 40}
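As a numerical sanity check on the incentive-compatibility formula for Z(Y(w)) in footnote 40, the Python sketch below evaluates its right-hand side with a composite Simpson rule and compares it with consumption under a flat tax, Z(y) = (1 − t)y, for which Y(w) = (1 − t)^σ w^{1+σ}. The flat-tax example, the function names, and the quadrature settings are illustrative assumptions; only the formula comes from the text.

```python
from typing import Callable


def simpson(f: Callable[[float], float], a: float, b: float, n: int = 20_000) -> float:
    """Composite Simpson rule on [a, b] with n subintervals (n rounded up to even)."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3


def consumption_via_envelope(Y: Callable[[float], float], c_low: float,
                             w_low: float, w: float, sigma: float) -> float:
    """Right-hand side of the footnote-40 formula for a given income function Y."""
    e = (1 + sigma) / sigma

    def g(v: float) -> float:
        return sigma / (1 + sigma) * (Y(v) / v) ** e

    integral = simpson(lambda v: (Y(v) / v) ** e / v, w_low, w)
    return c_low - g(w_low) + g(w) + integral


# Flat tax at rate t: Y(w) = (1-t)^sigma w^(1+sigma) and C(w) = (1-t) Y(w).
t, sigma, w_low, w = 0.3, 0.5, 1.0, 3.0


def flat_tax_income(v: float) -> float:
    return (1 - t) ** sigma * v ** (1 + sigma)


c_formula = consumption_via_envelope(flat_tax_income, (1 - t) * flat_tax_income(w_low),
                                     w_low, w, sigma)
print(c_formula, (1 - t) * flat_tax_income(w))   # agree up to quadrature error
```

The same function also handles income profiles with bunching, since Y stays continuous there; only jumps in Y would call for splitting the integral at the jump points.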
Next, I prove by induction the existence of a consumption schedule satisfying (i)–(v) in Claim 1. After that, I will turn to proving uniqueness.
Case \(n=1\):
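Define Z by
$$Z(y) = C({\underline{w}}) + (1-t_1)\left( y-Y({\underline{w}})\right) \quad \text {for all } y \ge 0.$$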
It is straightforward to show that, for all \(w \in [{\underline{w}},{\overline{w}}]\), \(Y(w) = (1-t_1)^\sigma w^{1+\sigma }\) satisfies the first-order condition for problem (6).^{Footnote 41} Also, \(Z(Y({\underline{w}})) = C({\underline{w}})\) holds.
Case \(n=k-1\) (where \(k \ge 2\) ):
Assume that Claim 1 holds for this case.
Case \(n=k\) (where \(k \ge 2\) ):
Define \(Y_{-1}\) by
where (i) \(i \in \{1,\ldots ,k-1\}\) and (ii) \({\tilde{w}}_0={\underline{w}}\), \({\tilde{w}}_i=w_i\) for all \(i \le k-2\) and \({\tilde{w}}_{k-1}={\overline{w}}\). Observe that \(Y_{-1}\) is of the form (c) with \(n=k-1\).^{Footnote 42} Also, observe that \(Y_{-1}\) coincides with Y on \([{\underline{w}},w_{k-1}]\) and \(Y_{-1}(w)=(1-t_{k-1})^\sigma w^{1+\sigma }\) on \((w_{k-1},{\overline{w}}]\).^{Footnote 43}
Also, let \(C_{-1}: [{\underline{w}},{\overline{w}}] \rightarrow [0,\infty )\) be such that \(C_{-1}({\underline{w}})=C({\underline{w}})\) and \((Y_{-1},C_{-1})\) satisfies incentive compatibility as in (1).
By the assumption in the “\(n=k-1\)” case, there exists a consumption schedule, \(Z_{-1}\), such that the following hold.

(i)
\(Z_{-1}\) implements \((Y_{-1},C_{-1})\).

(ii)
\(Z_{-1}\) is continuous and piecewise linear with \(k-1\) pieces.

(iii)
For each \(i \in \{1,\ldots ,k-1\}\), \(1-t_i\) is the slope of the \(i{\text {th}}\) piece of \(Z_{-1}\).

(iv)
If \(k-1 \ge 2\), then, for each \(i \in \{2,\ldots ,k-1\}\) such that \(t_{i-1} > t_i\), \(w_{i-1}\) is the highest type that chooses a point on the \((i-1){\text {st}}\) piece of \(Z_{-1}\).

(v)
If \(k-1 \ge 2\), then, for each \(i \in \{2,\ldots ,k-1\}\) such that \(t_{i-1} < t_i\), \(w_{i-1}\) is the lowest type that chooses at the kink between the \((i-1){\text {st}}\) and \(i{\text {th}}\) pieces of \(Z_{-1}\).
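Define Z by
$$Z(y) = \left\{ \begin{array}{ll} Z_{-1}(y) &{} \text {if } 0 \le y \le K, \\ Z_{-1}(K) + (1-t_k)(y-K) &{} \text {if } y > K. \end{array} \right.$$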
The value of K will depend on whether \(t_{k-1} > t_k\) or \(t_{k-1} < t_k\). Given that in either case \(K \ge Y({\underline{w}})\) will hold, we will have \(Z(Y({\underline{w}})) = Z_{-1}(Y({\underline{w}})) = C_{-1}({\underline{w}}) = C({\underline{w}})\).
Subcase \(t_{k-1} > t_k:\)
Define K as follows. Referring to Fig. 4, consider the \((k-1)\text {st}\) piece of \(Z_{-1}\). Note that income level \(Y(w_{k-1})\) lies below (the graph of) this piece.^{Footnote 44} Take the indifference curve of type \(w_{k-1}\) through the point \((Y(w_{k-1}),Z_{-1}(Y(w_{k-1})))\).^{Footnote 45} Compute K as the income level at which the \((k-1)\text {st}\) piece of \(Z_{-1}\) intersects a straight line that has slope \(1-t_k\) and is tangent to the indifference curve.^{Footnote 46} Let \({\hat{y}}\) denote the income level where the line with slope \(1-t_k\) is tangent to the indifference curve.
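The closed form for K in footnote 46, K = [(1 − t_k)^{1+σ} − (1 − t_{k−1})^{1+σ}] w_{k−1}^{1+σ} / [(1+σ)(t_{k−1} − t_k)], can be cross-checked against the geometric construction just described. The Python sketch below (function names are illustrative) builds the tangent line of slope 1 − t_k to type w_{k−1}'s indifference curve through (Y(w_{k−1}), Z_{−1}(Y(w_{k−1}))), intersects it with the (k−1)st piece, and compares the result with the closed form. It assumes, as footnote 45 states, that the (k−1)st piece is tangent to that indifference curve at Y(w_{k−1}) = (1 − t_{k−1})^σ w_{k−1}^{1+σ}; the consumption level at that point is normalised to 0 because K does not depend on it.

```python
from typing import Tuple


def kink_closed_form(t_prev: float, t_k: float, w_prev: float, sigma: float) -> float:
    """K as stated in footnote 46; requires t_prev > t_k."""
    a, b = 1 - t_prev, 1 - t_k
    return (b ** (1 + sigma) - a ** (1 + sigma)) / ((1 + sigma) * (t_prev - t_k)) * w_prev ** (1 + sigma)


def kink_by_construction(t_prev: float, t_k: float, w_prev: float, sigma: float,
                         c_at_choice: float = 0.0) -> Tuple[float, float, float]:
    """Return (Y(w_prev), y_hat, K) from the tangent-line construction in the proof."""
    a, b = 1 - t_prev, 1 - t_k

    def cost(y: float) -> float:
        return sigma / (1 + sigma) * (y / w_prev) ** ((1 + sigma) / sigma)

    y_choice = a ** sigma * w_prev ** (1 + sigma)       # tangency with piece k-1 (slope a)
    y_hat = b ** sigma * w_prev ** (1 + sigma)          # tangency with a line of slope b
    c_hat = c_at_choice + cost(y_hat) - cost(y_choice)  # same utility as at y_choice
    # Piece k-1: c = c_at_choice + a (y - y_choice); new line: c = c_hat + b (y - y_hat).
    K = (c_hat - c_at_choice + a * y_choice - b * y_hat) / (a - b)
    return y_choice, y_hat, K


y_choice, y_hat, K = kink_by_construction(0.4, 0.2, 10.0, 0.5)
print(y_choice, K, y_hat, kink_closed_form(0.4, 0.2, 10.0, 0.5))
```

Since the two tangent lines touch a convex indifference curve at Y(w_{k−1}) and at ŷ, their intersection K lies strictly between these two incomes.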
Because income \(Y(w_{k-1})\) is optimal for type \(w_{k-1}\) given \(Z_{-1}\), it is obvious from the way Z was constructed that incomes \(Y(w_{k-1})\) and \({\hat{y}}\) are optimal for type \(w_{k-1}\) given Z. Thus, when faced with Z, all types below \(w_{k-1}\) find it optimal to choose incomes weakly below \(Y(w_{k-1})\) and all types above \(w_{k-1}\) find it optimal to choose incomes weakly above \({\hat{y}}\).^{Footnote 47} Because (i) \(Y(w) \le Y(w_{k-1})\) is optimal for all \(w \in [{\underline{w}},w_{k-1}]\) given \(Z_{-1}\) and (ii) Z and \(Z_{-1}\) coincide over \([0,Y(w_{k-1})]\), it must be that Y(w) is optimal for all \(w \in [{\underline{w}},w_{k-1}]\) given Z. For \(w \in (w_{k-1},{\overline{w}}]\), it is straightforward to show that the optimal income above \({\hat{y}}\) given Z is \(Y(w) = (1-t_k)^\sigma w^{1+\sigma }\). Thus, Z implements (Y, C). Moreover, it should be clear that Z is continuous and piecewise linear with k pieces and satisfies (i)–(v) in Claim 1 with \(n=k\).
Subcase \(t_{k-1} < t_k:\)
Set \(K=Y(w_{k-1})\). Note that K is the location of the kink between the \((k-1){\text {st}}\) and \(k{\text {th}}\) pieces of Z. This is clear when \(k=2\). Now assume \(k \ge 3\). Given that \(w_{k-1}>w_{k-2}\), Y is nondecreasing, and Y is strictly increasing on \((w_{k-1}-\delta ,w_{k-1}]\) for some \(\delta >0\),^{Footnote 48} it must be that \(Y(w_{k-1})\) is strictly higher than the income level at which the kink between the \((k-2){\text {nd}}\) and \((k-1){\text {st}}\) pieces of \(Z_{-1}\) occurs.
Given that income \(Y(w_{k-1})\) is optimal for type \(w_{k-1}\) given \(Z_{-1}\), it must be optimal given Z.^{Footnote 49} Thus, when faced with Z, all types below \(w_{k-1}\) find it optimal to choose incomes weakly below \(Y(w_{k-1})\) and all types above \(w_{k-1}\) find it optimal to choose incomes weakly above \(Y(w_{k-1})\).^{Footnote 50} Because (i) \(Y(w) \le Y(w_{k-1})\) is optimal for all \(w \in [{\underline{w}},w_{k-1}]\) given \(Z_{-1}\) and (ii) Z and \(Z_{-1}\) coincide over \([0,Y(w_{k-1})]\), it must be that Y(w) is optimal for all \(w \in [{\underline{w}},w_{k-1}]\) given Z.^{Footnote 51} For \(w \in (w_{k-1},{\overline{w}}]\), it is straightforward to show that the optimal income above K given Z is
$$Y(w) = \left\{ \begin{array}{ll} K &{} \text {if } w_{k-1} < w \le \left( \frac{1-t_{k-1}}{1-t_k} \right) ^\frac{\sigma }{1+\sigma } w_{k-1}, \\ (1-t_k)^\sigma w^{1+\sigma } &{} \text {if } w > \left( \frac{1-t_{k-1}}{1-t_k} \right) ^\frac{\sigma }{1+\sigma } w_{k-1}. \end{array} \right.$$
Thus, Z implements (Y, C). Moreover, it should be clear that Z is continuous and piecewise linear with k pieces and satisfies (i)–(v) in Claim 1 with \(n=k\).
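In the subcase t_{k−1} < t_k, types just above w_{k−1} bunch at the kink K = Y(w_{k−1}) until the tangency income on the kth piece, (1 − t_k)^σ w^{1+σ}, overtakes K, that is, up to the type ((1 − t_{k−1})/(1 − t_k))^{σ/(1+σ)} w_{k−1} (the threshold that also appears in footnote 42). The Python sketch below encodes the optimal income above K as max{K, (1 − t_k)^σ w^{1+σ}}, assuming, as footnote 48 guarantees, that w_{k−1} reaches K as a tangency on the (k−1)st piece so that K = (1 − t_{k−1})^σ w_{k−1}^{1+σ}. It then checks this against a brute-force grid search over incomes in [K, 3K]; the helper names and the grid search are illustrative, not part of the paper's method.

```python
def bunching_end(t_prev: float, t_k: float, w_prev: float, sigma: float) -> float:
    """Highest type choosing the kink K when t_prev < t_k."""
    return ((1 - t_prev) / (1 - t_k)) ** (sigma / (1 + sigma)) * w_prev


def income_above_kink(w: float, t_prev: float, t_k: float, w_prev: float, sigma: float) -> float:
    """Optimal income of a type w > w_prev, who chooses an income of at least K."""
    K = (1 - t_prev) ** sigma * w_prev ** (1 + sigma)
    return max(K, (1 - t_k) ** sigma * w ** (1 + sigma))


def grid_argmax_above_kink(w: float, t_prev: float, t_k: float, w_prev: float,
                           sigma: float, steps: int = 100_000) -> float:
    """Brute-force check: maximise utility on the k-th piece over a grid on [K, 3K]."""
    K = (1 - t_prev) ** sigma * w_prev ** (1 + sigma)

    def utility(y: float) -> float:
        # Consumption is measured relative to Z(K); that constant does not move the argmax.
        return (1 - t_k) * (y - K) - sigma / (1 + sigma) * (y / w) ** ((1 + sigma) / sigma)

    grid = (K + 2 * K * j / steps for j in range(steps + 1))
    return max(grid, key=utility)


print(bunching_end(0.2, 0.4, 10.0, 0.5))              # about 11.006
print(income_above_kink(10.5, 0.2, 0.4, 10.0, 0.5))  # equals K, about 28.28
```

At the end of the bunching interval the two branches of the maximum coincide, which is why Y is continuous there.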
It remains to show uniqueness. Suppose \(Z'\) and \(Z''\) are consumption schedules such that (i)–(v) in Claim 1 hold (when applied to \(Z'\) and \(Z''\), respectively).
Let us make the following observations. First, for each \(i \in \{1,\ldots ,n\}\), the \(i\text {th}\) piece of \(Z'\) has the same slope as the \(i\text {th}\) piece of \(Z''\) (by (iii) in Claim 1). Second, we must have \(Z'(Y({\underline{w}}))=Z''(Y({\underline{w}}))=C({\underline{w}})\) (by (i) in Claim 1). Thus, if \(n=1\), we must have \(Z'=Z''\).
From here on, suppose \(n \ge 2\). Assume \(Z' \ne Z''\). The two observations in the previous paragraph and \(Z' \ne Z''\) imply that, for some \(i \in \{2,\ldots ,n\}\), the kink between the \((i-1){\text {st}}\) and \(i{\text {th}}\) pieces of \(Z'\) occurs at a different income level than the kink between the \((i-1){\text {st}}\) and \(i{\text {th}}\) pieces of \(Z''\). Let k be the lowest i for which this occurs. We need to consider two cases, \(t_{k-1}>t_k\) and \(t_{k-1}<t_k\).
First, suppose \(t_{k-1}>t_k\). Then, \(w_{k-1}\) must be the highest type that chooses a point on the \((k-1){\text {st}}\) piece of both \(Z'\) and \(Z''\) (by (iv) in Claim 1). Moreover, each type \(w \in (w_{k-1},w_k]\) chooses on the \(k{\text {th}}\) piece of \(Z'\) and \(Z''\).^{Footnote 52} By the fact that \({\mathcal {Y}}\) is upper hemicontinuous with nonempty and compact values, we have \(\lim _{{\tilde{w}} \downarrow w_{k-1}} Y({\tilde{w}}) \in {\mathcal {Y}}(w_{k-1})\). Thus, it must also be optimal for type \(w_{k-1}\) to choose on the \(k{\text {th}}\) piece of \(Z'\) and \(Z''\). That is, type \(w_{k-1}\) is indifferent between choosing on the \((k-1){\text {st}}\) and on the \(k{\text {th}}\) piece of \(Z'\) and is also indifferent between choosing on the \((k-1){\text {st}}\) and on the \(k{\text {th}}\) piece of \(Z''\). But then, for \(Z'\) and \(Z''\), the kink between the \((k-1){\text {st}}\) and \(k{\text {th}}\) pieces must occur at the same income level. We have reached a contradiction.
Next, suppose \(t_{k-1}<t_k\). Then, \(w_{k-1}\) chooses at the kink between the \((k-1){\text {st}}\) and \(k{\text {th}}\) pieces of both \(Z'\) and \(Z''\) (by (v) in Claim 1). Hence, for both \(Z'\) and \(Z''\), this kink must occur at the same income level, namely \(Y(w_{k-1})\). We have again reached a contradiction. \(\square\)
Proof of Proposition 2
Suppose that (i) (Y, C) is implemented by some continuous, piecewise linear consumption schedule, Z, with N pieces and (ii) if \(w={\underline{w}}\) or w is a jump point of Y, Y is strictly increasing on \((w,w+\delta )\) for some \(\delta >0\).^{Footnote 53} Note that types’ optimal consumption-income choices under Z must be incentive compatible (because each type could have mimicked any other type’s consumption-income choice), so that (Y, C) must satisfy (a). It remains to show that Y satisfies (c) almost everywhere. Let us begin with a few lemmas.
Lemma 1
If Y equals some constant y on \((w',w'')\), then Z has a kink at y.
Proof
Assume Z has no kink at y and take \(w_a\) and \(w_b\) such that \(w'<w_a<w_b<w''\). Given that Z must be linear in some neighbourhood of y, it must be that, for each of the types \(w_a\) and \(w_b\), its indifference curve in income-consumption space is tangent to Z at income level y.^{Footnote 54} This is impossible because two types’ indifference curves cannot have the same slope at the same income level. \(\square\)
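The last step of Lemma 1 relies on the slope of type w's indifference curve at income y, y^{1/σ}/w^{1+1/σ} (the expression used in the proof of Lemma 3), being strictly decreasing in w. The Python sketch below computes this slope, compares it with a central finite difference of the effort cost σ/(1+σ)(y/w)^{(1+σ)/σ}, and checks the monotonicity in w at a sample income; the parameter values are arbitrary.

```python
def effort_cost(y: float, w: float, sigma: float) -> float:
    """sigma/(1+sigma) * (y/w)^((1+sigma)/sigma)."""
    return sigma / (1 + sigma) * (y / w) ** ((1 + sigma) / sigma)


def indifference_slope(y: float, w: float, sigma: float) -> float:
    """dc/dy along type w's indifference curve: y^(1/sigma) / w^(1 + 1/sigma)."""
    return y ** (1 / sigma) / w ** (1 + 1 / sigma)


def numerical_slope(y: float, w: float, sigma: float, h: float = 1e-6) -> float:
    """Central difference of the effort cost, which equals the indifference-curve slope."""
    return (effort_cost(y + h, w, sigma) - effort_cost(y - h, w, sigma)) / (2 * h)


y, sigma = 5.0, 0.5
for w in (2.0, 3.0, 4.0):
    print(w, indifference_slope(y, w, sigma), numerical_slope(y, w, sigma))
# The slope falls as w rises, so two distinct types cannot both be tangent
# to the same linear piece of Z at the same income.
```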
Lemma 2
If w is a jump point of Y, then Z has a kink on \((\lim _{{\tilde{w}} \uparrow w} Y({\tilde{w}}),\lim _{{\tilde{w}} \downarrow w} Y({\tilde{w}}))\).^{Footnote 55}
Proof
Assume that Z exhibits no kinks on \((\lim _{{\tilde{w}} \uparrow w} Y({\tilde{w}}),\lim _{{\tilde{w}} \downarrow w} Y({\tilde{w}}))\). Then, given the strict convexity of type w’s indifference curves in income-consumption space, it is impossible for both income \(\lim _{{\tilde{w}} \uparrow w} Y({\tilde{w}})\) and income \(\lim _{{\tilde{w}} \downarrow w} Y({\tilde{w}})\) to be optimal for type w. On the other hand, by the fact that \({\mathcal {Y}}\) is upper hemicontinuous with nonempty and compact values, \(\lim _{{\tilde{w}} \uparrow w} Y({\tilde{w}}) \in {\mathcal {Y}}(w)\) and \(\lim _{{\tilde{w}} \downarrow w} Y({\tilde{w}}) \in {\mathcal {Y}}(w)\). We have reached a contradiction. \(\square\)
Lemma 3
Suppose Y is continuous and strictly increasing on \((w',w'')\). Then, Z is linear with strictly positive slope on \((\lim _{{\tilde{w}} \downarrow w'}Y({\tilde{w}}),\lim _{{\tilde{w}} \uparrow w''}Y({\tilde{w}}))\). Moreover, denoting this slope by \(1-t\), we have \(Y(w) = (1-t)^\sigma w^{1+\sigma }\) on \((w',w'')\).
Proof
Suppose Z has a kink at \(y \in (\lim _{{\tilde{w}} \downarrow w'}Y({\tilde{w}}),\lim _{{\tilde{w}} \uparrow w''}Y({\tilde{w}}))\). Let \(w \in (w',w'')\) be such that \(Y(w)=y\). Let \(z^-\) and \(z^+\) denote the slopes of Z just to the left and just to the right, respectively, of y.
First, suppose \(z^->z^+\). In that case, there must exist \(\delta >0\) such that, for all \(0< \epsilon < \delta\), type \((w-\epsilon )\)’s indifference curve has slope \(z^-\) at income \(Y(w-\epsilon )\) and type \((w+\epsilon )\)’s indifference curve has slope \(z^+\) at income \(Y(w+\epsilon )\),^{Footnote 56} i.e., \(\frac{Y(w-\epsilon )^{1/\sigma }}{(w-\epsilon )^{1+1/\sigma }} = z^- > z^+ = \frac{Y(w+\epsilon )^{1/\sigma }}{(w+\epsilon )^{1+1/\sigma }}\). However, because Y is continuous at w, taking the limit of the leftmost and rightmost terms in the last expression as \(\epsilon \downarrow 0\) yields \(\frac{Y(w)^{1/\sigma }}{w^{1+1/\sigma }} = z^- > z^+ = \frac{Y(w)^{1/\sigma }}{w^{1+1/\sigma }}\), a contradiction.
Next, suppose \(z^-<z^+\). Then, given the smoothness of indifference curves in income-consumption space, either the piece of Z just to the left of y or the piece of Z just to the right of y would cut into the upper-contour set of type w’s indifference curve passing through (y, Z(y)). This contradicts \(Y(w)=y\) being optimal for type w.
Now suppose \(1-t \le 0\). Then, for types in \((w',w'')\) earning more does not increase consumption, so that Y must be flat on \((w',w'')\), a contradiction.
Finally, \(Y(w) = (1-t)^\sigma w^{1+\sigma }\) on \((w',w'')\) follows immediately from the requirement that the indifference curve of type \(w \in (w',w'')\) in income-consumption space be tangent to the piece of Z over \((\lim _{{\tilde{w}} \downarrow w'}Y({\tilde{w}}),\lim _{{\tilde{w}} \uparrow w''}Y({\tilde{w}}))\). \(\square\)
The plan for the rest of the proof is to define \(w_0,w_1,\ldots ,w_n\), define \(t_0,t_1,\ldots ,t_n\), show that these \(w_i\)’s and \(t_i\)’s satisfy the requirements in condition (c), and show that Y must be of the form in expression (3) on each \((w_{i-1},w_i)\).
Let us start by defining \(w_0,w_1,\ldots ,w_n\) recursively as follows. Let \(w_0={\underline{w}}\) and, given \(w_{i-1}<{\overline{w}}\) (where \(i \ge 1\)), define \(w_i\) as follows. Let \(w_{i,1} = \min \{ w \in [{\underline{w}},{\overline{w}}] \mid w> w_{i-1} \text { and } Y(w-\epsilon )<Y(w+\epsilon ) \text { for all } \epsilon>0 \text { and } Y \text { is constant on } (w,w+\delta ) \text { for some } \delta >0 \}\) and \(w_{i,2} = \min \{ w \in [{\underline{w}},{\overline{w}}] \mid w > w_{i-1} \text { and } w \text { is a jump point of } Y \}\), where I adopt the convention that the minimum of the empty set equals \(\infty\).^{Footnote 57} Let \(w_i = \min \{ w_{i,1},w_{i,2},{\overline{w}} \}\). That is, \(w_i\) is the lowest value of \(w \in [{\underline{w}},{\overline{w}}]\) strictly to the right of \(w_{i-1}\) where either a flat segment of Y begins or Y jumps. If no such value exists, \(w_i={\overline{w}}\).
The \(w_i\)’s thus constructed satisfy the following requirements.
Lemma 4
\(w_0={\underline{w}}\), \(w_n={\overline{w}}\) for some \(n \le N\), and \(w_{i-1} < w_i\) for all \(i \in \{1,\ldots ,n\}\).
Proof
The only non-obvious statement is that \(w_n={\overline{w}}\) for some \(n \le N\). Let us prove that.
Suppose \(N=1\) so that Z has a single piece. Then, Y cannot have flat segments or jumps (by Lemmas 1 and 2) so that \(w_{1,1}=w_{1,2} = \infty\) and \(w_1={\overline{w}}\).
Next, suppose \(N \ge 2\) and consider some i such that \(1 \le i<N\). If \(w_i={\overline{w}}\), then \(n=i<N\) and we are done. Assume \(w_i < {\overline{w}}\). If \(w_i\) is a jump point of Y, Z has a kink in \((\lim_{{\tilde{w}} \uparrow w_i} Y({\tilde{w}}),\lim_{{\tilde{w}} \downarrow w_i} Y({\tilde{w}}))\) (by Lemma 2). If \(w_i\) is where a flat segment of Y begins, Z has a kink at \(\lim _{{\tilde{w}} \downarrow w_i} Y({\tilde{w}})\) (by Lemma 1).^{Footnote 58} Thus, \(w_i\) “eats up” at least one kink of Z. Moreover, this has to be a new kink, one not “eaten up” by \(w_j\) for any \(1 \le j<i\). To see this last point, suppose j is such that \(1 \le j<i\), and consider the following exhaustive cases.

1.
If \(w_i\) is a jump point (i.e., \(w_i=w_{i,2}\)), Z must have a kink in \((\lim_{{\tilde{w}} \uparrow w_i} Y({\tilde{w}}),\lim_{{\tilde{w}} \downarrow w_i} Y({\tilde{w}}))\) as well as in \((\lim_{{\tilde{w}} \uparrow w_j} Y({\tilde{w}}),\lim_{{\tilde{w}} \downarrow w_j} Y({\tilde{w}})) \cup \{ \lim _{{\tilde{w}} \downarrow w_j} Y({\tilde{w}})\}\). Given that \(w_j<w_i\) and Y is nondecreasing, these sets are disjoint, so the two kinks must be distinct.

2.
The case in which \(w_j\) is a jump point is analogous to the previous case.

3.
If both \(w_j\) and \(w_i\) are where a flat segment of Y begins (i.e., \(w_j=w_{j,1}\) and \(w_i=w_{i,1}\)), Z must have a kink at \(\lim _{{\tilde{w}} \downarrow w_j} Y({\tilde{w}})\) as well as at \(\lim _{{\tilde{w}} \downarrow w_i} Y({\tilde{w}})\). Given that \(w_j<w_i\) and Y is nondecreasing, the flat segment of Y starting at \(w_i\) must lie higher than the flat segment of Y starting at \(w_j\). Thus, we have \(\lim _{{\tilde{w}} \downarrow w_j} Y({\tilde{w}}) < \lim _{{\tilde{w}} \downarrow w_i} Y({\tilde{w}})\) so that the two kinks must be distinct.
Thus, each \(w_i<{\overline{w}}\) eats up a distinct kink of Z, and Z has at most \(N-1\) kinks. Hence, for some \(n \le N\), \(w_1,\ldots ,w_{n-1}\) must have “eaten up” all kinks of Z. Then, by Lemmas 1 and 2, we must have \(w_{n,1}=w_{n,2}=\infty\) and, hence, \(w_n={\overline{w}}\). \(\square\)
Before we can define the \(t_i\)’s, we need the following lemma.
Lemma 5
For all \(i \in \{1,\ldots ,n\}\), the following hold.

(1)
Y is continuous on \((w_{i1},w_i)\).

(2)
For all \(w',w'',w''' \in (w_{i1},w_i)\) such that \(w'<w''<w'''\), \(Y(w')<Y(w'')\) implies \(Y(w'')<Y(w''')\).^{Footnote 59}

(3)
Y is nonconstant in any neighbourhood of \(w_{i-1}\).
Proof
The lemma follows directly from the definition of \(w_0,w_1,\ldots ,w_n\) and the requirement in Proposition 2 that Y be strictly increasing near \({\underline{w}}\). \(\square\)
Let us proceed by defining \(t_0,t_1,\ldots ,t_n\) as follows. Let \(t_0 = 1\). For \(i \in \{1,\ldots ,n\}\), consider the following exhaustive cases: (i) Y is nonconstant on \((w_{i-1},w_i)\) and (ii) \(Y(w)={\hat{y}}\) for all \(w \in (w_{i-1},w_i)\).^{Footnote 60} In case (i), we know from Lemma 3 and part (2) of Lemma 5 that Z is linear with strictly positive slope on \((\lim _{{\tilde{w}} \downarrow {\hat{w}}}Y({\tilde{w}}),\lim _{{\tilde{w}} \uparrow w_i}Y({\tilde{w}}))\) for some \({\hat{w}}\) such that \(w_{i-1} \le {\hat{w}} < w_i\). Set \(t_i\) to be such that \(1-t_i\) equals this slope. In case (ii), define \(t_i\) by the equation \((1-t_i)^\sigma w_i^{1+\sigma } = {\hat{y}}\).
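For computation, the two cases in the definition of \(t_i\) reduce to closed-form rules. A minimal Python sketch follows; the function names are hypothetical, and in case (ii) the defining equation is simply inverted, which requires \(\sigma > 0\), \(w_i > 0\) and \(\hat{y} > 0\).

```python
def rate_from_slope(slope):
    """Case (i): t_i is chosen so that 1 - t_i equals the slope of the linear piece of Z."""
    if slope <= 0:
        raise ValueError("the relevant piece of Z must have strictly positive slope")
    return 1.0 - slope


def rate_from_flat_segment(y_hat, w_i, sigma):
    """Case (ii): solve (1 - t_i)**sigma * w_i**(1 + sigma) == y_hat for t_i."""
    if sigma <= 0 or w_i <= 0 or y_hat <= 0:
        raise ValueError("need sigma > 0, w_i > 0 and y_hat > 0")
    return 1.0 - (y_hat / w_i ** (1.0 + sigma)) ** (1.0 / sigma)


# Usage: a piece of Z with slope 0.75 corresponds to t_i = 0.25; a flat segment at
# the income that type w_i = 2 would earn under t = 0.3 (with sigma = 0.5) returns,
# up to rounding, t_i = 0.3.
print(rate_from_slope(0.75))
print(rate_from_flat_segment(0.7 ** 0.5 * 2.0 ** 1.5, 2.0, 0.5))
```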
The next lemma, taken in conjunction with Lemma 4, demonstrates that the \(w_i\)’s and \(t_i\)’s fulfil the requirements in condition (c).
Lemma 6

(1)
For all \(i \in \{1,\ldots ,n\}\), \(t_i<1\).

(2)
For all \(i \in \{1,\ldots ,n\}\), \(t_{i-1} \ne t_i\).

(3)
For all \(i \in \{1,\ldots ,n\}\) such that \(t_{i-1} < t_i\), we have \(\left( \frac{1-t_{i-1}}{1-t_i} \right) ^\frac{\sigma }{1+\sigma } w_{i-1} \le w_i\).

(4)
For all \(i \in \{1,\ldots ,n-1\}\) such that \(t_{i-1}< t_i < t_{i+1}\), we have \(\left( \frac{1-t_{i-1}}{1-t_i} \right) ^\frac{\sigma }{1+\sigma } w_{i-1} < w_i\).
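Given candidate lists \(w_0,\ldots ,w_n\) and \(t_0,\ldots ,t_n\), the ordering in Lemma 4 and the four statements of Lemma 6 can be checked mechanically. The Python sketch below is a hypothetical checker written for illustration; it does not check the bound \(n \le N\), and the numbers in the usage lines are made up.

```python
def threshold(w_prev, t_prev, t_cur, sigma):
    """((1 - t_prev) / (1 - t_cur))**(sigma / (1 + sigma)) * w_prev, as in statements (3) and (4)."""
    return ((1.0 - t_prev) / (1.0 - t_cur)) ** (sigma / (1.0 + sigma)) * w_prev


def check_lemmas_4_and_6(ws, ts, sigma):
    """Check w_0 < ... < w_n (Lemma 4) and statements (1) to (4) of Lemma 6.

    ws = [w_0, ..., w_n] and ts = [t_0, ..., t_n] (t_0 = 1 in the text).
    """
    n = len(ws) - 1
    if len(ts) != n + 1:
        raise ValueError("ws and ts must have the same length")
    for i in range(1, n + 1):
        if not ws[i - 1] < ws[i]:                 # Lemma 4: strictly increasing cut-offs
            return False
        if not ts[i] < 1.0:                       # statement (1)
            return False
        if ts[i - 1] == ts[i]:                    # statement (2)
            return False
        if ts[i - 1] < ts[i] and threshold(ws[i - 1], ts[i - 1], ts[i], sigma) > ws[i]:
            return False                          # statement (3)
    for i in range(1, n):                         # statement (4), i = 1, ..., n - 1
        if ts[i - 1] < ts[i] < ts[i + 1] and threshold(ws[i - 1], ts[i - 1], ts[i], sigma) >= ws[i]:
            return False
    return True


sigma = 0.5
print(check_lemmas_4_and_6([1.0, 2.0, 3.0, 4.0], [1.0, 0.2, 0.4, 0.1], sigma))  # passes
print(check_lemmas_4_and_6([1.0, 2.0, 2.1, 4.0], [1.0, 0.2, 0.4, 0.1], sigma))  # violates (3)
```

Because \(t_0 = 1\) and every later \(t_i < 1\), the condition \(t_0 < t_1\) never holds, so the threshold is never evaluated with a zero denominator.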
Proof
Statement (1) is obvious. Statements (2) to (4) are obvious for \(i=1\), since \(t_0 = 1 > t_1\). Let us take an arbitrary \(i \in \{2,\ldots ,n\}\) and let us consider the following exhaustive cases.

1.
Y is strictly increasing on \((w_{i1},w_i)\).
In this case, \(w_{i-1}\) must be a jump point of Y (because it cannot be a point where a flat segment of Y starts). Also, by the way \(t_i\) was defined and Lemma 3, we have \(Y(w)=(1-t_i)^\sigma w^{1+\sigma }\) on \((w_{i-1},w_i)\). If \(Y(w) = {\hat{y}}\) on \((w_{i-2},w_{i-1})\), the following must hold: for some \(\epsilon >0\),^{Footnote 61} \({\hat{y}}+\epsilon = (1-t_{i-1})^\sigma w_{i-1}^{1+\sigma } +\epsilon < (1-t_i)^\sigma w^{1+\sigma }\) for w arbitrarily close to \(w_{i-1}\). If Y is nonconstant on \((w_{i-2},w_{i-1})\), the following must hold: for some \(\epsilon >0\),^{Footnote 62} \((1-t_{i-1})^\sigma w_a^{1+\sigma } +\epsilon < (1-t_i)^\sigma w_b^{1+\sigma }\) for \(w_a\) and \(w_b\) arbitrarily close to \(w_{i-1}\). Thus, both when Y is constant on \((w_{i-2},w_{i-1})\) and when Y is nonconstant on \((w_{i-2},w_{i-1})\), we must have \(t_{i-1}>t_i\).

2.
\(Y(w)={\hat{y}}\) on \((w_{i1},w_i)\).
In this case, \(w_{i-1}\) cannot be a jump point of Y (by condition (ii) in Proposition 2). Hence, \(w_{i-1}\) must be where a flat segment of Y begins, so that Y must be strictly increasing just to the left of \(w_{i-1}\). Thus, we must have \((1-t_{i-1})^\sigma w_{i-1}^{1+\sigma } = {\hat{y}}\).^{Footnote 63} Also, from the definition of \(t_i\), \((1-t_i)^\sigma w_i^{1+\sigma } = {\hat{y}}\). Thus, \(t_{i-1} < t_i\) and \(\left( \frac{1-t_{i-1}}{1-t_i} \right) ^\frac{\sigma }{1+\sigma } w_{i-1} = w_i\).
Moreover, if \(i \le n-1\), we must have \(t_i > t_{i+1}\). To see this, note that \(w_i\) must be a jump point of Y and, hence, Y must be strictly increasing just to the right of \(w_i\) (by condition (ii) in Proposition 2). Hence, \(Y(w) = (1-t_{i+1})^\sigma w^{1+\sigma }\) on \((w_i,w_{i+1})\) and the following must hold: for some \(\epsilon >0\),^{Footnote 64} \({\hat{y}}+\epsilon = (1-t_i)^\sigma w_i^{1+\sigma } +\epsilon < (1-t_{i+1})^\sigma w^{1+\sigma }\) for w arbitrarily close to \(w_i\). Thus, we must have \(t_i > t_{i+1}\).

3.
For some \({\hat{w}}\) such that \(w_{i-1}<{\hat{w}}<w_i\), \(Y(w)={\hat{y}}\) on \((w_{i-1},{\hat{w}}]\) and Y is strictly increasing on \(({\hat{w}},w_i)\).
In this case, \(w_{i-1}\) cannot be a jump point of Y (by condition (ii) in Proposition 2). Hence, \(w_{i-1}\) must be where a flat segment of Y begins, so that Y must be strictly increasing just to the left of \(w_{i-1}\). Thus, we must have \((1-t_{i-1})^\sigma w_{i-1}^{1+\sigma } = {\hat{y}}\).
Further, by the definition of \(t_i\) and Lemma 3, \(Y(w) = (1-t_i)^\sigma w^{1+\sigma } > {\hat{y}}\) for all \(w \in ({\hat{w}},w_i)\). Thus, \(\left( \frac{1-t_{i-1}}{1-t_i} \right) ^\frac{\sigma }{1+\sigma } w_{i-1} < w_i\). Also, by the continuity of Y on \((w_{i-1},w_i)\),^{Footnote 65} we must have \((1-t_{i-1})^\sigma w_{i-1}^{1+\sigma } = (1-t_i)^\sigma {\hat{w}}^{1+\sigma }\) so that \(t_{i-1}<t_i\) and \({\hat{w}} = \left( \frac{1-t_{i-1}}{1-t_i} \right) ^\frac{\sigma }{1+\sigma } w_{i-1}\).
\(\square\)
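The last step of case 3 solves the continuity condition at \(\hat{w}\) for \(\hat{w}\). Written out, and using \(\sigma>0\), \(1-t_i>0\) (part (1) of the lemma) and, as an assumption here, \(w_{i-1}>0\):

```latex
\begin{aligned}
(1-t_{i-1})^\sigma w_{i-1}^{1+\sigma} &= (1-t_i)^\sigma \hat{w}^{1+\sigma} \\
\iff \hat{w}^{1+\sigma} &= \left( \frac{1-t_{i-1}}{1-t_i} \right)^{\sigma} w_{i-1}^{1+\sigma} \\
\iff \hat{w} &= \left( \frac{1-t_{i-1}}{1-t_i} \right)^{\frac{\sigma}{1+\sigma}} w_{i-1}.
\end{aligned}
```

Because \(\hat{w} > w_{i-1}\), the ratio \((1-t_{i-1})/(1-t_i)\) must exceed one, which is equivalent to \(t_{i-1} < t_i\).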
Let us now turn to the functional form of Y on \((w_{i-1},w_i)\) for each \(i \in \{ 1,\ldots ,n \}\). First, consider \(i=1\). Y must be strictly increasing on \(({\underline{w}},w_1)\) so that, by the definition of \(t_1\) and Lemma 3, \(Y(w) = (1-t_1)^\sigma w^{1+\sigma }\). Also, \(t_0 = 1 > t_1\) by part (1) of Lemma 6. Thus, Y has the functional form (3) on \(({\underline{w}},w_1)\).
Next, consider \(i \in \{ 2,\ldots ,n \}\). In the proof of Lemma 6, we have shown that

case 1 implies \(t_{i-1}>t_i\),

case 2 implies \(t_{i-1}<t_i\) and \(\left( \frac{1-t_{i-1}}{1-t_i} \right) ^\frac{\sigma }{1+\sigma } w_{i-1} = w_i\), and

case 3 implies \(t_{i-1}<t_i\) and \(\left( \frac{1-t_{i-1}}{1-t_i} \right) ^\frac{\sigma }{1+\sigma } w_{i-1} < w_i\).
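Thus, if \(t_{i-1}>t_i\), case 1 in that proof applies and we must have \(Y(w)=(1-t_i)^\sigma w^{1+\sigma }\) on \((w_{i-1},w_i)\). If \(t_{i-1}<t_i\) and \(\left( \frac{1-t_{i-1}}{1-t_i} \right) ^\frac{\sigma }{1+\sigma } w_{i-1} = w_i\), case 2 in that proof applies and \(Y(w) = (1-t_{i-1})^\sigma w_{i-1}^{1+\sigma }\) on \((w_{i-1},w_i)\). If \(t_{i-1}<t_i\) and \(\left( \frac{1-t_{i-1}}{1-t_i} \right) ^\frac{\sigma }{1+\sigma } w_{i-1} < w_i\), case 3 in that proof applies and
\[ Y(w) = \begin{cases} (1-t_{i-1})^\sigma w_{i-1}^{1+\sigma } & \text{if } w \le \left( \frac{1-t_{i-1}}{1-t_i} \right) ^\frac{\sigma }{1+\sigma } w_{i-1}, \\ (1-t_i)^\sigma w^{1+\sigma } & \text{if } w > \left( \frac{1-t_{i-1}}{1-t_i} \right) ^\frac{\sigma }{1+\sigma } w_{i-1}, \end{cases} \]
on \((w_{i-1},w_i)\).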
The bottom line is that, for \(i \in \{ 1,\ldots ,n \}\), the functional form of Y on \((w_{i-1},w_i)\) can be written as:
The penultimate equality holds because the expression after that equality just adds a redundant case. The last equality holds because the expression after that equality just combines the middle two cases as well as the last two cases from the previous expression. Finally, note that, given part (3) of Lemma 6, \(\left( \frac{1-t_{i-1}}{1-t_i} \right) ^\frac{\sigma }{1+\sigma } w_{i-1} \le w_i\) is guaranteed to hold if \(t_{i-1} < t_i\) and can hence be dropped from the last expression above. \(\square\)
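The case analysis above can be turned into a direct evaluation of a type’s income from the \(w_i\)’s and \(t_i\)’s. The sketch below is a hypothetical Python translation (not code from the paper); it assumes \(t_{i-1} \ne t_i\), as guaranteed by part (2) of Lemma 6, evaluates Y only strictly inside the intervals \((w_{i-1},w_i)\), and uses made-up cut-offs and rates in the usage lines.

```python
def income(w, ws, ts, sigma):
    """Y(w) for a type strictly inside some (w_{i-1}, w_i), following cases 1 to 3."""
    for i in range(1, len(ws)):
        if ws[i - 1] < w < ws[i]:
            t_prev, t_cur = ts[i - 1], ts[i]
            if t_prev > t_cur:
                # Case 1: tangency with the piece of Z whose slope is 1 - t_i.
                return (1.0 - t_cur) ** sigma * w ** (1.0 + sigma)
            # t_prev < t_cur: types up to w_hat earn the income (1 - t_{i-1})^sigma w_{i-1}^(1 + sigma)
            # (case 2 if w_hat = w_i, case 3 if w_hat < w_i).
            w_hat = ((1.0 - t_prev) / (1.0 - t_cur)) ** (sigma / (1.0 + sigma)) * ws[i - 1]
            if w <= w_hat:
                return (1.0 - t_prev) ** sigma * ws[i - 1] ** (1.0 + sigma)
            return (1.0 - t_cur) ** sigma * w ** (1.0 + sigma)
    raise ValueError("w must lie strictly inside one of the intervals (w_{i-1}, w_i)")


# Illustrative (hypothetical) cut-offs and rates, with t_0 = 1 as in the text.
ws = [1.0, 2.0, 3.0, 4.0]
ts = [1.0, 0.2, 0.4, 0.1]
sigma = 0.5
for w in (1.5, 2.1, 2.5, 3.5):
    print(w, income(w, ws, ts, sigma))
```

In this example, types in \((2, \hat{w}]\) with \(\hat{w} \approx 2.2\) all earn the income of type \(w_1 = 2\), and Y is continuous at \(\hat{w}\), as required in case 3.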
Appendix: Results based on Conditions (d1)–(d4)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ivanov, A. Borda-optimal taxation of labour income. Soc Choice Welf (2022). https://doi.org/10.1007/s00355-022-01411-9
JEL Classification
 D71
 H21
 H24