Borda-optimal taxation of labour income

I numerically compute Borda-optimal, i.e., optimal based on the Borda count as the normative criterion, labour-income tax schedules for the United States. I do so in the context of a Mirrlees-style model with quasilinear preferences and a constant elasticity of labour supply. Because the Borda count is defined for finitely many alternatives, the computations restrict attention to a finite subset of the set of continuous, piecewise linear tax schedules with (in the baseline analysis) four or fewer pieces.


Introduction
I numerically compute Borda-optimal (BO), i.e., optimal based on the Borda count as the normative criterion, labour-income tax schedules for the United States. I do so in the context of a Mirrlees-style model with quasilinear preferences and a constant elasticity of labour supply. I perform the computations separately for three different values of the elasticity of labour supply, ε.
A major challenge is that the Borda count is defined for finitely many alternatives whereas there are infinitely many possible tax schedules. To deal with this, I identify a subset of the feasible direct mechanisms (DMs) that (a) loosely speaking, corresponds to the set of continuous, piecewise linear tax schedules with N or fewer pieces and (b) lends itself to transparent, finite discretisations. Using N = 4 and one such discretisation in the baseline numerical analysis, I compute, for each value of ε, the BO DM within the resulting finite set of DMs as well as the corresponding BO tax schedule.
The main findings in terms of the BO tax schedules are that (i) for each value of ε, all marginal rates are positive, (ii) depending on the value of ε, the marginal rate at the highest incomes may or may not be strictly higher than the marginal rates at all lower incomes, (iii) for each value of ε, average rates are nevertheless (possibly, weakly) increasing in income (to a close approximation), and (iv) this progressivity is attenuated as ε increases. These findings hold up well under a number of robustness checks that use alternative values of N and alternative discretisations.
The existing literature on optimal taxation is largely based on utilitarianism (Bentham 1789; Mirrlees 1971), Rawls' maxmin principle (Rawls 1971; Piketty 1997), or equality of opportunity (Roemer 1998; Fleurbaey 2008). Although these normative approaches have their appeal, they also have two important limitations. First, they seem disconnected from the idea of democracy. This is awkward given the broad consensus in many countries that public policy should be determined through a democratic process. 1 Second, excluding some notions of equality of opportunity, the implementation of these approaches requires taking a stand on nonordinal properties of utility.
My findings (i)-(iv) above are in line with well-known findings in this literature. For example, Seade (1982) shows theoretically that, in a Mirrlees-style model with a utilitarian criterion, the optimal tax schedule must be strictly increasing (in line with finding (i)). Also, using numerical analysis in a Mirrlees-style model with a utilitarian and a Rawlsian criterion, Saez (2001) obtains marginal rates at high incomes that are lower than marginal rates at low incomes (in line with finding (ii)). 2 What is novel in my paper is that findings (i)-(iv) have been derived based on a different normative foundation.
An alternative, normatively appealing approach to optimal taxation is to use majority rule. Unfortunately, for general sets of tax schedules, a Condorcet winner is not guaranteed to exist. However, if we restrict attention to linear tax schedules, a Condorcet winner does exist under some assumptions (Roberts 1977). In these settings, three key findings regarding the linear tax schedule selected by majority rule are that (a) under plausible assumptions, the marginal rate is positive, (b) the intercept can be positive, so that the average rate can be decreasing, and (c) the marginal rate is increasing in the ratio between mean and median income (at least when government consumption is zero). 3 The current paper differs from this literature in that it uses a different normative criterion and considers more flexible tax schedules.
1 In some voting models, there is a connection between utilitarianism and majority rule. (See Krishna and Morgan (2015) and the references therein.) However, in these models, voters choose between only two policies. (These two policies may have been endogenously selected from a larger set of policies at a pre-voting stage in which candidates strategically decide on which policies to run. However, such a strategic pre-voting stage is hardly a key component of the normative ideal of democracy.)
2 I believe that Saez's findings are also in line with findings (iii) and (iv), though it's hard to be sure based on the information provided in his paper.
3 For (a) and (b), see Romer (1975) and Roberts (1977). For (c), see Meltzer and Richard (1981).

The Borda count has several important advantages as a normative criterion. First, it has been characterised in terms of normatively appealing axioms (Young 1974; Maskin 2021). 4 Second, preference aggregation seems central to the idea of democracy. Third, the Borda count can be implemented without going beyond ordinal utility.
Of course, the Borda count also has limitations. First, it is defined for finitely many alternatives and the results could be sensitive to the discretisation employed. Second, although the Borda count exhibits some sensitivity to the intensity of preferences between any two alternatives by taking into account the number of alternatives ranked in between by each individual, policy-makers may wish to be more sensitive to preference intensities (e.g., based on introspection or individuals' verbal reports).
There is also a literature that studies labour-income taxation in various descriptive (as opposed to normative) political economy models. For example, Röell (2012), Brett and Weymark (2017), and De Donder and Hindriks (2003) study labour-income taxation in a two-step model: at the first step, each individual proposes a tax schedule that is selfishly optimal for her; at the second step, majority rule is applied to the proposed tax schedules (on which a Condorcet winner exists under some assumptions). Chen (2000), Carbonell-Nicolau and Efe (2007), Roemer (2012), and Bierbrauer and Boyer (2016) consider models of political competition in which politicians choose tax policies on which to run for office. Bierbrauer et al. (2021) characterise when a monotonic labour-income tax reform (i.e., a reform such that the change in the tax burden is a monotonic function of income) is politically feasible in the sense that it is preferred by a majority over the status quo. 5

Preferences and productivities
Individuals have preferences over consumption c ≥ 0 and labour l ≥ 0 represented by the utility function $c - \frac{\varepsilon}{1+\varepsilon} l^{\frac{1+\varepsilon}{\varepsilon}}$, where ε > 0 is the (Hicksian and Marshallian) elasticity of labour supply. Each individual has a productivity (or type) which is her private information. When type w puts in labour l, she earns pre-tax income wl. The set of types is $[\underline{w}, \overline{w}]$, where $0 < \underline{w} < \overline{w}$. Types are distributed according to the probability density function f which has full support on $[\underline{w}, \overline{w}]$.
4 It is well-known that the Borda count violates Arrow's independence of irrelevant alternatives (IIA) (Arrow 1951). However, Maskin (2021) argues that IIA is too stringent and shows that the Borda count satisfies a normatively appealing weakening of IIA. Pearce (2021) also argues forcefully against IIA.
5 They also characterise when a small perturbation in the status-quo tax schedule is both politically feasible and welfare-improving under (weighted) utilitarianism. Thus, their paper connects to the normative literature based on utilitarianism mentioned above (a key difference being that they consider incremental reforms rather than globally optimal tax schedules).

Feasible DMs
Given the revelation principle, we can restrict attention to DMs. A DM is a tuple (Y, C), where Y(w) and C(w) are the income and the consumption, respectively, assigned to an individual reporting to be of type w.
A DM is feasible if the following conditions hold.
(a) Incentive compatibility: Y is nondecreasing and, for all $w \in [\underline{w}, \overline{w}]$, equation (1) holds.
(b) Government budget constraint: equation (2) holds, where R ≥ 0 is the exogenously given government consumption per capita. 6
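As a sketch only, and not a quotation of the displayed conditions: under the preferences stated above, incentive compatibility is usually written as an envelope condition, and the remark later in the text that Y pins down C through (1) and (2) suggests that the budget constraint holds with equality. Under those assumptions, (1) and (2) would read as follows.

$$C(w) - \frac{\varepsilon}{1+\varepsilon}\left(\frac{Y(w)}{w}\right)^{\frac{1+\varepsilon}{\varepsilon}} = C(\underline{w}) - \frac{\varepsilon}{1+\varepsilon}\left(\frac{Y(\underline{w})}{\underline{w}}\right)^{\frac{1+\varepsilon}{\varepsilon}} + \int_{\underline{w}}^{w} \frac{Y(s)^{\frac{1+\varepsilon}{\varepsilon}}}{s^{\frac{1+2\varepsilon}{\varepsilon}}}\, ds \quad \text{(assumed form of (1))}$$

$$\int_{\underline{w}}^{\overline{w}} \big(Y(w) - C(w)\big) f(w)\, dw = R \quad \text{(assumed form of (2))}$$

The integrand in the first line is the derivative with respect to w of type w's utility from a fixed bundle (Y, C), which is where the exponent $(1+2\varepsilon)/\varepsilon$ comes from.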

A finite subset of the feasible DMs
Because the Borda count is defined for a finite set of alternatives, it is necessary to restrict attention to a finite subset of the feasible DMs. To this end, I augment conditions (a) and (b) with two further conditions, the first one being the following.
(c) Y is of the form: […] such that $t_{i-1} < t_i$, and (v) […]
The following proposition shows that a DM satisfying (a) and (c) can be interpreted in terms of a corresponding tax schedule. 7
Proposition 1 Suppose (Y, C) satisfies (a) and (c). Then, there exists a unique tax schedule, T, such that the following hold.
(i) T implements (Y, C).
(ii) T is continuous and piecewise linear with n pieces.
[…] chooses at the kink between the (i − 1)st and ith pieces of T. 9
Thus, given (Y, C) satisfying (a) and (c), $t_i$ (i = 1, … , n) is the marginal rate on the ith piece of the corresponding tax schedule, T, and $w_{i-1}$ (i = 2, … , n) is the threshold type where types switch to locating on the ith piece of T.
The next proposition provides a kind of converse of Proposition 1.

Proposition 2
Suppose that (i) (Y, C) is implemented by some continuous, piecewise linear tax schedule with N pieces and (ii) if $w = \underline{w}$ or w is a jump point of Y, Y is strictly increasing on (w, w + δ) for some δ > 0. Then (Y, C) satisfies (a) and Y satisfies (c) almost everywhere for some n ≤ N.
Condition (ii) seems weak: it applies to at most N, arbitrarily narrow intervals 10 on each of which it, moreover, allows Y to be arbitrarily close to constant. Thus, abstracting from what seem like technical details, Propositions 1 and 2 tell us that a DM satisfies (a) and (c) for some n ≤ N if and only if it is implemented by a continuous, piecewise linear tax schedule with N or fewer pieces.
Letting w(p) denote the pth type percentile, the next condition provides a finite, numerically tractable discretisation of the set of Y functions satisfying (c).
7 A tax schedule is a function T ∶ [0, ∞) → ℝ. T(y) is the tax owed by an individual earning income y. T implements (Y, C) if, for all $w \in [\underline{w}, \overline{w}]$, Y(w) is an optimal income for type w given T and C(w) = Y(w) − T(Y(w)).
8 I'm counting the pieces (in the graph) of T from left to right.
9 All proofs are in the appendix.
10 As made clear in the proof, condition (i) ensures that Y has at most N − 1 jump points.
(d) n ≤ 4. Given n, $t_i$ ∈ {−2, −1.5, −1, −.8, −.6, −.4, −.2, 0, .1, .2, .3, .4, .5, .6, .7, .8, .9} for all i ∈ {1, … , n} and $w_i$ ∈ {w(10), […]
Thus, the discretisation in (d) in effect restricts attention to continuous, piecewise linear tax schedules with four or fewer pieces such that (i) the marginal rate on any of the pieces lies on the given grid for the $t_i$'s and (ii) threshold types lie on the given grid for the $w_i$'s (e.g., tax schedules such that types just below the 45th percentile choose on the second piece and types just above the 45th percentile choose on the third piece are ruled out). I have somewhat arbitrarily truncated marginal tax rates at −2 from below, noting that even lower marginal tax rates could probably only apply to a small fraction of the population if they are to be feasible.
From here on, I restrict attention to the set of DMs satisfying (a)-(d). Let D denote this set. Because Y pins down C through constraints (1) and (2), D corresponds to the set of Y functions such that (c) holds, (d) holds, and $C(\underline{w})$ obtained after plugging in for C(w) from (1) into (2) is nonnegative. 11
Before proceeding, let us consider the following question: Why look for a BO DM in D rather than for a BO continuous, piecewise linear tax schedule with four or fewer pieces? There are three disadvantages to the latter approach. First, to discretise the set of continuous, piecewise linear tax schedules with four or fewer pieces, one would need to choose the grid of income levels at which the kinks can be located. However, it is not obvious how to do that. In contrast, the grid for the $w_i$'s in condition (d) seems transparent and natural. Second, one would need to solve each type's labour-supply optimisation problem given each tax schedule, and this is likely to considerably slow down the numerical calculations.
Third, there can be multiple continuous, piecewise linear tax schedules with four or fewer pieces implementing the same DM and one would need to eliminate such duplicates before applying the Borda count. 12 However, duplicate tax schedules may be tricky to identify as they may incorrectly appear to implement slightly different DMs due to imperfect numerical precision.
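To give a feel for the size of the candidate set that a discretisation like (d) generates, the following is a minimal Python sketch. The marginal-rate grid is the one in condition (d). The threshold percentiles are a hypothetical three-point stand-in, since the actual grid is not reproduced here, and the requirement that adjacent pieces carry different rates is an assumption made so that a candidate with n pieces cannot be rewritten with fewer. The sketch does not check feasibility (budget balance and nonnegative consumption), which the set D additionally requires.

```python
from itertools import combinations, product

# Marginal-rate grid taken from condition (d).
RATE_GRID = [-2, -1.5, -1, -0.8, -0.6, -0.4, -0.2, 0,
             0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# HYPOTHETICAL threshold percentiles, for illustration only.
THRESHOLD_PERCENTILES = [10, 50, 90]


def candidate_schedules(max_pieces=4):
    """Yield (rates, thresholds) pairs with at most max_pieces pieces.

    Assumption: adjacent pieces have different marginal rates.
    """
    for n in range(1, max_pieces + 1):
        # n pieces need n - 1 increasing threshold percentiles.
        for thresholds in combinations(THRESHOLD_PERCENTILES, n - 1):
            for rates in product(RATE_GRID, repeat=n):
                if all(a != b for a, b in zip(rates, rates[1:])):
                    yield rates, thresholds


# Count the candidates by number of pieces.
counts = {}
for rates, thresholds in candidate_schedules():
    counts[len(rates)] = counts.get(len(rates), 0) + 1
```

Even with only three admissible thresholds the four-piece candidates dominate the count, which illustrates why the ranking of every candidate by every type has to be approximated with a small set of representative types.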


The Borda count
Given (Y, C) ∈ D, let Δ(Y, C, w) denote the number of DMs in D that are strictly worse than (Y, C) according to type w minus the number of DMs in D that are strictly better than (Y, C) according to type w. 13 The Borda count of (Y, C) is: 14, 15

$$B(Y, C) = \int_{\underline{w}}^{\overline{w}} \Delta(Y, C, w)\, f(w)\, dw. \tag{4}$$

Note that evaluating B(Y, C) requires computing all types' rankings over D, which is numerically infeasible. Therefore, to obtain my numerical results, I approximate the integral in (4) based on the rankings of a finite set of "representative" types. The main idea is to approximate Δ(Y, C, ⋅) via a step function by (i) partitioning $[\underline{w}, \overline{w}]$ into 14 subintervals and (ii) replacing Δ(Y, C, ⋅) over each subinterval with $\Delta(Y, C, w_m)$, where $w_m$ is the median (i.e., "representative") type in that subinterval. 16 I will refer to a DM maximising the approximation of the integral in (4) as "BO" even though, strictly speaking, it's only BO if it maximises the actual integral in (4).
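The definition of Δ and its aggregation over representative types can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Mathematica code used for the paper: the function name `borda_scores`, the utilities, and the weights are all hypothetical, and each alternative stands in for one DM in D.

```python
def borda_scores(utilities, weights):
    """Weighted Borda scores when indifference is allowed.

    utilities[m][j]: utility of representative type m from alternative j.
    weights[m]: population share represented by type m.
    For each type, alternative j scores
    (# alternatives strictly worse than j) - (# strictly better than j).
    """
    n_alt = len(utilities[0])
    scores = [0.0] * n_alt
    for u, share in zip(utilities, weights):
        for j in range(n_alt):
            worse = sum(1 for x in u if x < u[j])
            better = sum(1 for x in u if x > u[j])
            scores[j] += share * (worse - better)
    return scores


# Hypothetical example: three alternatives (A, B, C), two equally sized groups.
utilities = [
    [3.0, 2.0, 1.0],  # group 0 ranks A above B above C
    [1.0, 2.0, 2.0],  # group 1 is indifferent between B and C, both above A
]
weights = [0.5, 0.5]
scores = borda_scores(utilities, weights)
best = max(range(len(scores)), key=scores.__getitem__)
```

In this example alternative B wins with a score of 0.5: it is nobody's strict worst, whereas A and C are each ranked last by one group. Only ordinal comparisons of each type's utilities enter the calculation.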

Elasticity of labour supply
Given the considerable controversy in the literature on the elasticity of labour supply, 17 I will perform the analysis separately for ε ∈ {0.25, 0.5, 1}. In choosing these values, I am following Saez and Stantcheva (2018).

Distribution of types
The main idea for calibrating the distribution of types goes as follows. First, I assume that the actual labour-income tax schedule is linear with a 30 percent marginal tax rate. Given this tax schedule, type w's optimal pretax labour income is $y^*(w) = 0.7^{\varepsilon} w^{1+\varepsilon}$. Second, I back out the distribution of types based on y*(⋅) and data from the World Inequality Database (WID) on the empirical distribution of pretax labour income for individuals over age 20 in the US in 2014. 18
13 Type w's ranking over D is based on the indirect utility function […]. Note that this implicitly assumes that each individual's preference over DMs is selfish. This assumption is discussed further in the appendix.
14 I assume the integral in (4) exists.
15 The Borda count in (4) generalizes the usual Borda count to the case in which individuals can exhibit indifference between alternatives (which is the relevant case in the current context). Note that Maskin (2021) assumes that individuals' preferences over alternatives are strict. However, as Ivanov (2022) shows, the Borda count in (4) satisfies (extensions to the case of weak preferences of) the axioms in Maskin (2021) (as well as an additional normatively appealing axiom).
16 The details are in the appendix.
17 Keane (2011) and Saez et al. (2012) provide surveys of this literature.

Government consumption per capita
According to WID, US national income per individual over age 20 in 2014 was $65,192. 19 According to Piketty, Saez and Zucman (2018), total (i.e., federal, state, and local) government consumption in the US has been around 18 percent of national income since the end of World War II. Thus, I set R = 65,192 × 0.18 ≈ 11,735. This calculation assumes that government consumption must be financed entirely from labour income taxation, which seems like the natural theoretical benchmark based on Atkinson and Stiglitz (1976). 20

Main results
For each ε ∈ {0.25, 0.5, 1}, I compute the (as it turns out, unique) BO DM and the corresponding (in the sense of Proposition 1) BO tax schedule. 21 The main features of the BO tax schedules are presented in Table 1 as well as in Figs. 1 and 2. Table 1 shows, for each value of ε, the BO Universal Basic Income (UBI), i.e., the negative of the intercept of the BO tax schedule. Figure 1 (Fig. 2) depicts, for each value of ε, the BO marginal (average, respectively) tax rate as a function of income.
The first finding is the following.
Finding 1 For each ε ∈ {0.25, 0.5, 1}, all BO marginal tax rates are positive.
In particular, there is no equivalent to the Earned Income Tax Credit at low incomes.
The next finding is perhaps at odds with what is often taken for granted in popular discourse.
Finding 2 For ε = 0.25, the BO marginal tax rate at the highest incomes is strictly higher than the BO marginal tax rates at all lower incomes. However, this is not true for ε = 1. 22
18 Section 6 discusses some important aspects of the WID data. The details of how I back out the distribution of types are in the appendix.
19 All dollar amounts in the paper are in 2014 dollars.
20 As a robustness check, I redid the calculations with R = 0. Findings 1, 2, and 4 below continue to hold. Finding 3 also continues to hold except that, for ε = 1, the BO average tax rate is strictly, rather than weakly, increasing in income.
21 The computations were done in Mathematica 12. The code is provided in separate files (see https://doi.org/10.1007/s00355-022-01411-9).
22 It is also not true for ε = 0.5, but this doesn't survive all robustness checks in section 5.3 below as well as the robustness check mentioned in footnote 20.
Nevertheless, because of the UBI and marginal rates that don't decrease sufficiently with income, the BO tax schedule is (possibly, weakly) progressive in terms of average rates.

Finding 3 The BO average tax rate is strictly increasing in income for ε ∈ {0.25, 0.5} and, to a close approximation, weakly increasing in income for ε = 1.
Furthermore, the following holds.
Finding 4 For any incomes $y_1$ and $y_2$ such that $0 < y_1 < y_2 < 925{,}653$, the difference between the BO average tax rate at $y_2$ and at $y_1$ is strictly decreasing in ε on {0.25, 0.5, 1}. 23
Thus, the progressivity of the BO tax schedule is decreasing in ε, at least at the income levels that are relevant for the vast majority of the population. 24 This occurs because (i) the BO UBI falls substantially as ε increases and (ii) abstracting from some minor exceptions at low incomes, at any income level the BO marginal tax rate weakly decreases as ε increases on {0.25, 0.5, 1}. For ε = 1, the progressivity is attenuated to the point that the average tax rate is approximately flat for a wide range of incomes (for incomes between $32,878 and $925,653, to be precise). 25
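The interplay between the UBI, the marginal rates, and the average rate can be illustrated with a short Python sketch. The schedule below is purely hypothetical (a UBI of 10,000, a 50 percent marginal rate up to an income of 50,000, and 30 percent above) and is not one of the BO schedules; the function names are likewise illustrative.

```python
def tax_owed(y, ubi, kinks, rates):
    """Tax at income y for a continuous, piecewise linear schedule.

    ubi: transfer received at zero income (the negative of the intercept).
    kinks: increasing income levels at which the marginal rate changes.
    rates: marginal rates, one per piece (len(rates) == len(kinks) + 1).
    """
    tax = -ubi
    lower = 0.0
    for upper, rate in zip(list(kinks) + [float("inf")], rates):
        if y <= lower:
            break
        tax += rate * (min(y, upper) - lower)
        lower = upper
    return tax


def average_rate(y, ubi, kinks, rates):
    """Average tax rate T(y) / y at a strictly positive income y."""
    return tax_owed(y, ubi, kinks, rates) / y


# HYPOTHETICAL schedule: the marginal rate falls from 0.5 to 0.3 at 50,000.
ubi, kinks, rates = 10_000, [50_000], [0.5, 0.3]
avg_low = average_rate(20_000, ubi, kinks, rates)
avg_mid = average_rate(50_000, ubi, kinks, rates)
avg_high = average_rate(100_000, ubi, kinks, rates)
```

Here the average rate rises from 0 at an income of 20,000 to 0.3 at 50,000 and then stays flat, even though the marginal rate falls at the kink. This is the mechanism behind average rates that are weakly increasing while marginal rates are not.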

Robustness checks
I explore the robustness of Findings 1-4 to the discretisation in condition (d) by redoing the numerical analysis for each of the following variations of that condition.
[…]
23 To establish this, I compute, for each ε ∈ {0.25, 0.5, 1}, the derivative of the BO average tax rate with respect to income. Denoting this derivative at income y by a(y, ε), I obtain that a(y, 0.25) > a(y, 0.5) > a(y, 1) for almost all y ∈ (0, 925653). The finding follows because the BO average tax rate is an absolutely continuous function of income so that the increase in the average tax rate over $[y_1, y_2]$ equals $\int_{y_1}^{y_2} a(y, \varepsilon)\, dy$.
24 For ε ∈ {0.25, 0.5, 1}, around 99.9 percent of the population choose an income below $925,653 when faced with the BO tax schedule.
25 One may ask: Are the BO tax schedules more or less progressive than utilitarian-optimal ones? In the appendix, I address this question (without reaching any firm conclusions).
The discretisations in (d1) and (d2) are coarsenings of the discretisation in (d). Relative to (d), (d3) coarsens the grid for the t i 's, but allows for tax schedules with five pieces. The discretisation in (d4) is quite different in that the w i 's and t i 's are drawn randomly.
Findings 1-4 hold up well under (d1)-(d4). 28 In particular, Finding 1 continues to hold across the board.
Finding 2 also continues to hold under (d1), (d2), and (d4). Under (d3), the BO marginal tax rate at the highest incomes is not strictly higher than the BO marginal tax rates at all lower incomes for ε = 0.25 either. However, this is only because of high BO marginal rates over the narrow income intervals [0, 4505] and [25406, 32510].
Finding 3 continues to hold with the following exceptions. Under (d2), the BO average tax rate for ε = 1 modestly declines from 0.355 to 0.27 between incomes $60,699 and $134,115. Under (d3), the BO average tax rate for ε = 1 modestly declines from 0.343 to 0.249 between incomes $46,434 and $134,115. Given the flatness of the BO average tax rate over these income ranges under (d) and the coarseness of the grids for the $t_i$'s under (d2) and (d3), these exceptions seem minor.
Finally, the results under (d1)-(d4) are roughly in line with Finding 4 and Fig. 2 in the sense that, under each of these conditions, the BO average-rate schedule rotates clockwise as ε increases. Having said this, there are some instances in which the BO average-rate schedule over a particular income range is not flatter for a higher value of ε. 29
26 The cases n = 1 and n = 2 are also somewhat covered under (d4) because $w_1$ can be arbitrarily close to $w_2$, $t_1$ can be arbitrarily close to $t_2$, and $t_2$ can be arbitrarily close to $t_3$.
27 Recall that the Borda count in (4) […]

Comments on the WID data
A few comments regarding the WID data on pretax labour income are in order. First, this data is based on all individuals over age 20 and it counts income from public and private pensions as labour income. This is not ideal for the purpose of backing out productivities because the relationship between pension income and productivity is probably different from the relationship between a working-age individual's labour income and productivity.
Second, income is split equally within couples, which forces us to treat spouses as having the same productivity. This seems preferable for the purposes of the current paper because it ensures that the same preference over tax schedules is imputed to both spouses.
Third, although using cross-sectional data on the distribution of annual income to back out productivities is common (e.g., see Saez (2001)), this probably leads us to exaggerate the dispersion in lifetime productivities. The latter are probably more relevant if we are concerned with the design of a long-term tax system. 30

Concluding remarks
This paper is an attempt to apply the idea of democracy, as embodied in the Borda count, to the optimal taxation of labour income. Undoubtedly, the analysis has important limitations. Notably, it relies on (i) a simple, static model of labour supply with quasilinear preferences and a constant elasticity of labour supply, (ii) finite discretisations of the set of feasible DMs, and (iii) imperfect data on pretax labour income. For these reasons, Findings 1-4 focused on qualitative aspects of the BO tax schedules and, even so, I view these findings as no more than indicative. More broadly, I hope the current paper will encourage research on BO public policies.

Appendix: Selfish preferences
The analysis has made the implicit assumption that each individual's preference over DMs is selfish, i.e., that it's determined solely by each DM's implications for the individual's own consumption-labour bundle. Although this is a nontrivial assumption, I believe it provides a reasonable normative benchmark for two reasons.
First, Hvidberg et al. (2021) find a strong positive relationship between people's tolerance towards inequality and their own position in the income distribution. Thus, selfish preferences may be a reasonable approximation.
Second, even if people do care about others, it's plausible that they consider the Borda count with selfish preferences as inputs to be procedurally fair, so that there is no need to bring in additional fairness considerations by feeding other-regarding preferences into the Borda count.
30 Guvenen et al. (2021) have recently provided data on the distribution of lifetime labour incomes. This data is also not ideal for the purposes of the current paper. Remarkably, in the WID data and the Guvenen et al. data, the distribution of income across the population is very similar. I elaborate on these points in the appendix.

Appendix: Approximating B(Y, C)
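I approximate B(Y, C) by:

$$\sum_{k=1}^{14} \frac{q_{k+1} - q_k}{100}\, \Delta\big(Y, C, w(0.5 q_k + 0.5 q_{k+1})\big), \tag{5}$$

where $q_k$ denotes the kth element of (0, 10, … , 90, 95, 99, 99.9, 99.99, 100). This approximation effectively assumes that, for each 1 ≤ k ≤ 14, the preferences of all types between the $q_k$th and $q_{k+1}$th percentiles coincide with the preferences of the median type between these percentiles. To see this, note that B(Y, C) can be written as

$$\sum_{k=1}^{14} \int_{w(q_k)}^{w(q_{k+1})} \Delta(Y, C, w)\, f(w)\, dw.$$

Replacing Δ(Y, C, w) in the latter expression by $\Delta(Y, C, w(0.5 q_k + 0.5 q_{k+1}))$ yields (5), because the share of the population between the $q_k$th and $q_{k+1}$th percentiles is $(q_{k+1} - q_k)/100$.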
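The percentile grid translates into population weights and representative (median) percentiles as in the following Python sketch. It assumes that the grid ends at the 100th percentile and that each subinterval is weighted by its population share; the variable names are illustrative.

```python
# Percentile grid: 0, 10, ..., 90, then 95, 99, 99.9, 99.99 and (assumed) 100.
Q = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99, 99.9, 99.99, 100]

# Population share of each of the 14 subintervals.
weights = [(hi - lo) / 100 for lo, hi in zip(Q, Q[1:])]

# Percentile of the median ("representative") type in each subinterval.
median_percentiles = [0.5 * lo + 0.5 * hi for lo, hi in zip(Q, Q[1:])]
```

The first ten subintervals each carry a weight of 0.1, while the last four together cover only the top 5 percent, so the grid is much finer at the top of the distribution, where types differ most.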

Appendix: Distribution of types
I assume that the actual labour-income tax schedule is a 30 percent flat tax. Given this tax schedule, type w's optimal pretax labour income is $y^*(w) = 0.7^{\varepsilon} w^{1+\varepsilon}$.
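Because y*(w) is strictly increasing in w, it can be inverted to recover the type behind any observed income. The Python sketch below shows this inversion under the flat-tax assumption just stated; the function names are hypothetical and the income used in the example is arbitrary rather than taken from the WID data.

```python
def optimal_income(w, eps, t=0.3):
    """Income chosen by type w under a flat tax at rate t: (1 - t)^eps * w^(1 + eps)."""
    return (1 - t) ** eps * w ** (1 + eps)


def implied_type(y, eps, t=0.3):
    """Invert optimal_income: the type that would choose income y > 0."""
    return (y / (1 - t) ** eps) ** (1 / (1 + eps))


# Illustrative round trip for an arbitrary income level of 50,000.
types = {eps: implied_type(50_000, eps) for eps in (0.25, 0.5, 1)}
```

Applying `implied_type` to each income percentile would give the corresponding type percentile, since the ranking of individuals by income and by type coincides under a flat tax.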
I use data from WID on pretax labour income for individuals over the age of 20 in the US in 2014. 31 In particular, I obtain from WID the data presented in Table 2.
I augment this data in two ways. 32 First, I assume that the lowest income equals $1. 33 Second, WID does not report the income of the highest earner. It does report that the 99.999th income percentile equals $15,579,290 and the average income in the top 0.001 percent equals $32,134,644. I impute an income to the highest earner by assuming that this income and the 99.999th income percentile are symmetrically situated around $32,134,644. That is, I assume that the highest earner has an income of $48,689,999. I make this assumption on simplicity grounds. Given that the top 0.001 percent of earners earned only 0.7 percent of all income, it is unlikely that this assumption is of much consequence.
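The symmetric imputation can be written out as a one-line calculation. Using the rounded dollar figures quoted above, it gives

$$2 \times 32{,}134{,}644 - 15{,}579{,}290 = 48{,}689{,}998,$$

which is within one dollar of the imputed income reported in the text. The one-dollar gap presumably reflects rounding in the underlying, unrounded WID figures; that explanation is an assumption rather than something stated in the data description.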
31 WID defines pretax labour income as the sum of all pretax personal income flows accruing to the individual owners of labor as a production factor, before taking into account the operation of the tax/transfer system, but after taking into account the operation of the pension system. The base unit is the individual (rather than the household) but resources are split equally within couples.
32 For brevity, in the rest of this section I will write "income" although in fact I mean "pretax labour income".
33 WID reports a negative 0th income percentile. (I believe this is largely due to the partial imputation of the losses of privately owned businesses to labour income.) However, this is not consistent with the assumption $\underline{w} > 0$.

Appendix: BO vs. utilitarian-optimal tax schedules
Are the BO tax schedules more or less progressive than utilitarian-optimal (UO) tax schedules? To address this question, I assume that the utilitarian planner solves […]. In choosing the objective function for the planner, I am following Saez (2001). 34 Note that, to aid comparability to the BO tax schedules, I require that the planner restrict attention to DMs in D.

The top/middle/bottom panel in
Finding 6 For ε ∈ {0.25, 0.5, 1}, at each level of income the BO marginal tax rate is weakly lower than the UO marginal tax rate.
Findings 5 and 6 suggest that low types fare better under the utilitarian criterion while high types fare better under the Borda count.
Unfortunately, I have low confidence in Findings 5 and 6 for the following reasons. First, in the numerical analysis based on condition (d1) instead of condition (d), the BO tax schedule for ε = 0.5 has, relative to the UO tax schedule for ε = 0.5, a slightly higher UBI and weakly higher marginal tax rates at each level of income. Second, at an earlier stage of the project I was using different finite discretisations of the set of feasible DMs. 35 Findings 5 and 6 were not robust to these different approaches.
34 As is typical in the utilitarian approach, I (and Saez) offer no justification for the choice of a particular utility representation of each individual's preferences.

Appendix: Data on lifetime incomes in Guvenen et al. (2021)
Guvenen et al. (2021) have recently provided data on the distribution of pretax lifetime labour incomes. This data is also less than ideal for the purposes of the current paper. For example, it does not include the distribution of fringe benefits, income is computed at the individual level without any splitting within couples, and no information is provided on the distribution of income within the top 1 percent of earners.
Remarkably, the methodological differences in constructing the WID data and the Guvenen et al. data seem to largely offset so that the distribution (in terms of income shares) of annual pretax labour income in 2014 according to WID is very similar to the distribution (in terms of income shares) of lifetime pretax labour income (between the ages of 25 and 55) for the cohort that turned 25 in 1983 according to Guvenen et al. 36 To see this, consider Table 3. It juxtaposes the share of income earned by individuals falling between different income percentiles according to data from each of these two sources. In particular, the first column refers to the WID data […]
35 In particular, I was either assuming that Y(w) is continuous and piecewise linear in w or I was assuming that Y(w)/w is continuous and piecewise linear in w. These approaches involved various ad hoc assumptions and were sensitive to changes in these assumptions. Thus, I abandoned them when I came up with the finite discretisation of the set of feasible DMs based on conditions (c) and (d).
36 This cohort is the most recent cohort for which data for the whole period between the ages of 25 and 55 is available.
37 […] If I understand correctly, these two tables display the same information based on different samples from the same data. Also, these tables (like most of the analysis in that paper) restrict attention to individuals who have had sufficient attachment to the labour market and have been employed in certain sectors. However, comparing data on the distribution of income for this narrower subset of the population (see the last six lines in Table C.12 in Guvenen et al.) and for the whole population (see Table F.2 in Guvenen et al.) reveals that the distribution of earnings in the narrower subset and in the whole population are quite similar.

Appendix: Proofs
A consumption schedule is a function Z ∶ [0, ∞) → ℝ. Z(y) is the after-tax income of a person earning income y. Type w's problem given a consumption schedule Z is:

$$\max_{y \ge 0}\; Z(y) - \frac{\varepsilon}{1+\varepsilon}\left(\frac{y}{w}\right)^{\frac{1+\varepsilon}{\varepsilon}}. \tag{6}$$

For future use, let $\mathcal{Y}(w)$ denote the set of solutions to problem (6). Note that, by the maximum theorem, $\mathcal{Y} : [\underline{w}, \overline{w}] \rightrightarrows [0, \infty)$ is an upper hemicontinuous correspondence with nonempty and compact values if Z is continuous and piecewise linear with finitely many pieces. 38
Because it is more convenient to work with consumption schedules than with tax schedules, I will prove, instead of Proposition 1, the following claim which restates Proposition 1 in terms of a consumption schedule.
38 To apply the maximum theorem, we need the constraint set in problem (6) to be compact. Let $\overline{y}$ denote an income level such that (i) it is strictly higher than the income level where the last kink in Z occurs and (ii) type $\overline{w}$'s indifference curves in income-consumption space at income $\overline{y}$ are steeper than the last piece of Z. Because no type would choose an income level above $\overline{y}$, the constraint y ≥ 0 in problem (6) can be replaced by the constraint $0 \le y \le \overline{y}$.

Claim 1 Suppose (Y, C) satisfies (a) and (c). Then, there exists a unique consumption schedule, Z, such that the following hold.
(i) Z implements (Y, C).
(ii) Z is continuous and piecewise linear with n pieces.
[…] (6) and (ii) Z(Y(w)) = C(w). 40
Next, I prove by induction the existence of a consumption schedule satisfying (i)-(v) in Claim 1. After that, I will turn to proving uniqueness.
39 I haven't required that $Z(y) \ge 0$ for all $y \ge 0$. Indeed, the unique $Z$ in the claim may be such that $Z(y) < 0$ for some $0 \le y < Y(\underline{w})$. (Analogously, I haven't required that $T(y) \le y$ for all $y \ge 0$, and the unique $T$ in Proposition 1 may be such that $T(y) > y$ for some $0 \le y < Y(\underline{w})$.) This isn't a problem in practice given that, in the calibrated distribution of types, $\underline{w}$ is very close to zero so that, even for very low values of $t_1$, $Y(\underline{w})$ is very close to zero.
40 The fact that types' optimal consumption-income choices under $Z$ must be incentive compatible (because each type could have mimicked any other type's consumption-income choice), (i), and (ii) imply that, for all $w \in [\underline{w}, \overline{w}]$, .
41 Given the concavity in $y$ of the maximand in problem (6), the first-order condition is sufficient.
43 Also, let $C_{-1} : [\underline{w}, \overline{w}] \to [0, \infty)$ be such that $C_{-1}(\underline{w}) = C(\underline{w})$ and $(Y_{-1}, C_{-1})$ satisfies incentive compatibility as in (1).
By the assumption in the "$n = k - 1$" case, there exists a consumption schedule, $Z_{-1}$, such that the following hold.
type that chooses at the kink between the $(i-1)$st and $i$th pieces of $Z_{-1}$.
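Define $Z$ by
$$Z(y) = \begin{cases} Z_{-1}(y) & \text{if } y \le K, \\ Z_{-1}(K) + (1 - t_k)(y - K) & \text{if } y > K. \end{cases}$$
The value of $K$ will depend on whether $t_{k-1} > t_k$ or $t_{k-1} < t_k$. Given that in either case $K \ge Y(\underline{w})$ will hold, we will have $Z(Y(\underline{w})) = Z_{-1}(Y(\underline{w})) = C_{-1}(\underline{w}) = C(\underline{w})$.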
To see that the rest of the statement is true, note that, for $w \in (w_{k-2}, \overline{w}]$, we have
Subcase $t_{k-1} > t_k$: Define $K$ as follows. Referring to Fig. 4, consider the $(k-1)$st piece of $Z_{-1}$. Note that income level $Y(w_{k-1})$ lies below (the graph of) this piece. 44 Take the indifference curve of type $w_{k-1}$ through the point $(Y(w_{k-1}), Z_{-1}(Y(w_{k-1})))$. 45 Compute $K$ as the income level at which the $(k-1)$st piece of $Z_{-1}$ intersects a straight line that has slope $1 - t_k$ and is tangent to the indifference curve. 46 Let $\hat{y}$ denote the income level where the line with slope $1 - t_k$ is tangent to the indifference curve.
Because income $Y(w_{k-1})$ is optimal for type $w_{k-1}$ given $Z_{-1}$, it follows from the way $Z$ was constructed that incomes $Y(w_{k-1})$ and $\hat{y}$ are both optimal for type $w_{k-1}$ given $Z$. Thus, when faced with $Z$, all types below $w_{k-1}$ find it optimal to choose incomes weakly below $Y(w_{k-1})$ and all types above $w_{k-1}$ find it optimal to choose incomes weakly above $\hat{y}$. 47 Because (i) $Y(w) \le Y(w_{k-1})$ is optimal for all $w \in [\underline{w}, w_{k-1}]$ given $Z_{-1}$ and (ii) $Z$ and $Z_{-1}$ coincide over $[0, Y(w_{k-1})]$, it must be that $Y(w)$ is optimal for all $w \in [\underline{w}, w_{k-1}]$ given $Z$. For $w \in (w_{k-1}, \overline{w}]$, it is straightforward to show that the optimal income above $\hat{y}$ given $Z$ is $Y(w) = (1 - t_k)^{\varepsilon} w^{1+\varepsilon}$. Thus, $Z$ implements $(Y, C)$. Moreover, it should be clear that $Z$ is continuous and piecewise linear with $k$ pieces and satisfies (i)-(v) in Claim 1 with $n = k$.
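The geometric construction of $K$ in this subcase can be sketched numerically. The code below is a minimal illustration under two assumptions that are mine rather than the paper's: utility takes the form $c - \frac{\varepsilon}{1+\varepsilon}(y/w)^{(1+\varepsilon)/\varepsilon}$, and the type at the boundary is at a tangency on the old piece with slope $1 - t_{k-1}$. The function name `kink_location` and the parameter values are hypothetical.

```python
def kink_location(w, t_prev, t_new, eps):
    """Return (y0, K, y_hat) when the marginal rate drops from t_prev to t_new.

    Assumes type w is at a tangency on the old piece (slope 1 - t_prev) and
    that utility is c - eps/(1+eps) * (y/w)**((1+eps)/eps). Consumption is
    measured relative to the point type w currently chooses, which does not
    affect any income level.
    """
    if not t_prev > t_new:
        raise ValueError("this construction is for t_prev > t_new")
    s_prev, s_new = 1.0 - t_prev, 1.0 - t_new
    power = (1.0 + eps) / eps
    disutility = lambda y: eps / (1.0 + eps) * (y / w) ** power

    y0 = s_prev ** eps * w ** (1.0 + eps)     # current choice of type w
    y_hat = s_new ** eps * w ** (1.0 + eps)   # tangency with a line of slope s_new
    # Height of the indifference curve through (y0, 0) at income y_hat.
    c_hat = disutility(y_hat) - disutility(y0)
    # Intersect the old piece, c = s_prev * (y - y0), with the tangent line,
    # c = c_hat + s_new * (y - y_hat).
    K = (c_hat - s_new * y_hat + s_prev * y0) / (s_prev - s_new)
    return y0, K, y_hat


# Hypothetical example: eps = 0.5, boundary type w = 2, rate falls from 40% to 20%.
y0, K, y_hat = kink_location(w=2.0, t_prev=0.4, t_new=0.2, eps=0.5)
```

Because the indifference curve is strictly convex, the two tangent lines cross strictly between the two tangency incomes, so the kink lies between `y0` and `y_hat` and the boundary type is indifferent between the two incomes.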
Subcase $t_{k-1} < t_k$: Set $K = Y(w_{k-1})$. Note that $K$ is the location of the kink between the $(k-1)$st and $k$th pieces of $Z$. This is clear when $k = 2$. Now assume $k \ge 3$. Given that $w_{k-1} > w_{k-2}$, $Y$ is nondecreasing, and $Y$ is strictly increasing on $(w_{k-1} - \delta, w_{k-1}]$ for some $\delta > 0$, 48 it must be that $Y(w_{k-1})$ is strictly higher than the income level at which the kink between the $(k-2)$nd and $(k-1)$st pieces of $Z_{-1}$ occurs.
Given that income $Y(w_{k-1})$ is optimal for type $w_{k-1}$ given $Z_{-1}$, it must be optimal given $Z$. 49 Thus, when faced with $Z$, all types below $w_{k-1}$ find it optimal to choose incomes weakly below $Y(w_{k-1})$ and all types above $w_{k-1}$ find it optimal to choose incomes weakly above $Y(w_{k-1})$. 50 Because (i) $Y(w) \le Y(w_{k-1})$ is optimal for all $w \in [\underline{w}, w_{k-1}]$ given $Z_{-1}$ and (ii) $Z$ and $Z_{-1}$ coincide over $[0, Y(w_{k-1})]$, it must be that $Y(w)$ is optimal for all $w \in [\underline{w}, w_{k-1}]$ given $Z$. 51 For $w \in (w_{k-1}, \overline{w}]$, it is straightforward to show that the optimal income above $K$ given $Z$ is $Y(w)$.
48 The only way for $Y$ to be flat immediately to the left of $w_{k-1}$ is if
49 This follows because $(Y(w_{k-1}), Z(Y(w_{k-1})))$ is available both given $Z$ and given $Z_{-1}$ while the budget set defined by $Z$ in income-consumption space is a subset of the one defined by $Z_{-1}$.
50 This follows because the utility function $c - \frac{\varepsilon}{1+\varepsilon}\left(\frac{y}{w}\right)^{\frac{1+\varepsilon}{\varepsilon}}$ satisfies the single-crossing property.
51 Note that, because $Y$ is strictly increasing on $(w_{k-1} - \delta, w_{k-1}]$ for some $\delta > 0$, $w_{k-1}$ is the lowest type to choose at the kink in $Z$ at income $K = Y(w_{k-1})$.
Thus, Z implements (Y, C). Moreover, it should be clear that Z is continuous and piecewise linear with k pieces and satisfies (i)-(v) in Claim 1 with n = k.
It remains to show uniqueness. Suppose $Z'$ and $Z''$ are consumption schedules such that (i)-(v) in Claim 1 hold (when applied to $Z'$ and $Z''$, respectively).
Let us make the following observations. First, for each $i \in \{1, \dots, n\}$, the $i$th piece of $Z'$ has the same slope as the $i$th piece of $Z''$ (by (iii) in Claim 1). Second, we must have $Z'(Y(\underline{w})) = Z''(Y(\underline{w})) = C(\underline{w})$ (by (i) in Claim 1). Thus, if $n = 1$, we must have $Z' = Z''$.
From here on, suppose $n \ge 2$. Assume $Z' \ne Z''$. The two observations in the previous paragraph and $Z' \ne Z''$ imply that, for some $i \in \{2, \dots, n\}$, the kink between the $(i-1)$st and $i$th pieces of $Z'$ occurs at a different income level than the kink between the $(i-1)$st and $i$th pieces of $Z''$. Let $k$ be the lowest $i$ for which this occurs. We need to consider two cases, $t_{k-1} > t_k$ and $t_{k-1} < t_k$.
First, suppose $t_{k-1} > t_k$. Then, $w_{k-1}$ must be the highest type that chooses a point on the $(k-1)$st piece of both $Z'$ and $Z''$ (by (iv) in Claim 1). Moreover, each type $w \in (w_{k-1}, w_k]$ chooses on the $k$th piece of $Z'$ and $Z''$. 52 By the fact that $\mathcal{Y}$ is upper hemicontinuous with nonempty and compact values, we have $\lim_{w \downarrow w_{k-1}} Y(w) \in \mathcal{Y}(w_{k-1})$. Thus, it must also be optimal for type $w_{k-1}$ to choose on the $k$th piece of $Z'$ and $Z''$. That is, type $w_{k-1}$ is indifferent between choosing on the $(k-1)$st and on the $k$th piece of $Z'$ and is also indifferent between choosing on the $(k-1)$st and on the $k$th piece of $Z''$. But then, for $Z'$ and $Z''$, the kink between the $(k-1)$st and $k$th pieces must occur at the same income level. We have reached a contradiction.
Next, suppose $t_{k-1} < t_k$. Then, $w_{k-1}$ chooses at the kink between the $(k-1)$st and $k$th pieces of both $Z'$ and $Z''$ (by (v) in Claim 1). Hence, for both $Z'$ and $Z''$, this kink must occur at the same income level, namely $Y(w_{k-1})$. We have again reached a contradiction. ◻

Proof of Proposition 2
Suppose that (i) $(Y, C)$ is implemented by some continuous, piecewise linear consumption schedule, $Z$, with $N$ pieces and (ii) if $w = \underline{w}$ or $w$ is a jump point of $Y$, $Y$ is strictly increasing on $(w, w + \delta)$ for some $\delta > 0$. 53 Note that types' optimal consumption-income choices under $Z$ must be incentive compatible (because each type could have mimicked any other type's consumption-income choice), so that $(Y, C)$ must satisfy (a). It remains to show that $Y$ satisfies (c) almost everywhere. Let us begin with a few lemmas.
Lemma 1 If $Y$ equals some constant $y$ on $(w', w'')$, then $Z$ has a kink at $y$.
Proof Assume $Z$ has no kink at $y$ and take $w_a$ and $w_b$ such that $w' < w_a < w_b < w''$. Given that $Z$ must be linear in some neighbourhood of $y$, it must be that, for each of the types $w_a$ and $w_b$, its indifference curve in income-consumption space is tangent to $Z$ at income level $y$. 54 This is impossible because two types' indifference curves cannot have the same slope at the same income level. ◻
Lemma 2 If $w$ is a jump point of $Y$, then $Z$ has a kink on $(\lim_{w' \uparrow w} Y(w'), \lim_{w' \downarrow w} Y(w'))$. 55
Proof Assume that $Z$ exhibits no kinks on $(\lim_{w' \uparrow w} Y(w'), \lim_{w' \downarrow w} Y(w'))$. Then, given the strict convexity of type $w$'s indifference curves in income-consumption space, it is impossible for both income $\lim_{w' \uparrow w} Y(w')$ and income $\lim_{w' \downarrow w} Y(w')$ to be optimal for type $w$. On the other hand, by the fact that $\mathcal{Y}$ is upper hemicontinuous with nonempty and compact values, $\lim_{w' \uparrow w} Y(w') \in \mathcal{Y}(w)$ and $\lim_{w' \downarrow w} Y(w') \in \mathcal{Y}(w)$. We have reached a contradiction. ◻
Lemma 3 Suppose $Y$ is continuous and strictly increasing on $(w', w'')$. Then, $Z$ is linear with strictly positive slope on $(\lim_{w \downarrow w'} Y(w), \lim_{w \uparrow w''} Y(w))$. Moreover, denoting this slope by $1 - t$, we have $Y(w) = (1 - t)^{\varepsilon} w^{1+\varepsilon}$ on $(w', w'')$.

Fig. 4
Determination of $K$, the income level at which the kink between the $(k-1)$st and $k$th pieces of $Z$ occurs, for the case $t_{k-1} > t_k$
54 Because $Y$ is strictly increasing near $\underline{w}$, we must have $y > 0$. Thus, $Y(w_a)$ and $Y(w_b)$ aren't corner solutions of problem (6) and the tangency condition must hold.
55 The limits exist because $Y$ is nondecreasing.
Proof Suppose $Z$ has a kink at $y \in (\lim_{w \downarrow w'} Y(w), \lim_{w \uparrow w''} Y(w))$. Let $w \in (w', w'')$ be such that $Y(w) = y$. Let $z_-$ and $z_+$ denote the slopes of $Z$ just to the left and just to the right, respectively, of $y$. First suppose $z_- > z_+$. In that case, there must exist $\bar{\delta} > 0$ such that, for all $0 < \delta < \bar{\delta}$, type $(w - \delta)$'s indifference curve has slope $z_-$ at income $Y(w - \delta)$ and type $(w + \delta)$'s indifference curve has slope $z_+$ at income $Y(w + \delta)$, 56 i.e.,
$$\frac{Y(w-\delta)^{1/\varepsilon}}{(w-\delta)^{1+1/\varepsilon}} = z_- > z_+ = \frac{Y(w+\delta)^{1/\varepsilon}}{(w+\delta)^{1+1/\varepsilon}}.$$
However, taking the limit of the left-most and right-most terms in the last expression as $\delta \downarrow 0$ yields $\frac{Y(w)^{1/\varepsilon}}{w^{1+1/\varepsilon}}$ in both cases (because $Y$ is continuous at $w$), which contradicts $z_- > z_+$.
Next, suppose $z_- < z_+$. Then, given the smoothness of indifference curves in income-consumption space, either the piece of $Z$ just to the left of $y$ or the piece of $Z$ just to the right of $y$ would cut into the upper-contour set of type $w$'s indifference curve passing through $(y, Z(y))$. This contradicts $Y(w) = y$ being optimal for type $w$. Hence, $Z$ has no kink on the interval and is linear there; denote its slope by $1 - t$.
Now suppose $1 - t \le 0$. Then, for types in $(w', w'')$, earning more does not increase consumption, so that $Y(w)$ must be flat on $(w', w'')$, a contradiction.
Finally, $Y(w) = (1 - t)^{\varepsilon} w^{1+\varepsilon}$ on $(w', w'')$ follows immediately from the requirement that the indifference curve of type $w \in (w', w'')$ in income-consumption space be tangent to the piece of $Z$ over $(\lim_{w \downarrow w'} Y(w), \lim_{w \uparrow w''} Y(w))$. ◻
The plan for the rest of the proof is to define $w_0, w_1, \dots, w_n$, define $t_0, t_1, \dots, t_n$, show that these $w_i$'s and $t_i$'s satisfy the requirements in condition (c), and show that $Y$ must be of the form in expression (3) on each $(w_{i-1}, w_i)$.
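Lemmas 1-3 can be illustrated numerically. The sketch below solves each type's problem by grid search under a hypothetical three-piece consumption schedule: marginal rates of 20%, 50% and 10%, so that the first kink is concave and the second is convex. It assumes the utility form $c - \frac{\varepsilon}{1+\varepsilon}(y/w)^{(1+\varepsilon)/\varepsilon}$ with $\varepsilon = 0.5$; the schedule, the grids, and all names are illustrative assumptions rather than the paper's calibration.

```python
def make_schedule(kinks, slopes, intercept=0.0):
    """Continuous piecewise linear Z with the given kink incomes and slopes."""
    def Z(y):
        c, left = intercept, 0.0
        for kink, slope in zip(kinks, slopes):
            if y <= kink:
                return c + slope * (y - left)
            c, left = c + slope * (kink - left), kink
        return c + slopes[-1] * (y - left)
    return Z


def chosen_income(Z, w, eps, incomes):
    """Grid approximation of type w's optimal income under Z."""
    power = (1.0 + eps) / eps
    return max(incomes, key=lambda y: Z(y) - eps / (1.0 + eps) * (y / w) ** power)


eps = 0.5
Z = make_schedule(kinks=[1.0, 2.0], slopes=[0.8, 0.5, 0.9])
incomes = [j / 200 for j in range(1201)]        # 0, 0.005, ..., 6
types = [0.5 + 0.025 * i for i in range(101)]   # 0.5, 0.525, ..., 3
Y = [chosen_income(Z, w, eps, incomes) for w in types]

# Lemma 1: a flat segment of Y sits at a kink (here the concave kink at 1.0).
bunched = [w for w, y in zip(types, Y) if y == 1.0]
# Lemma 2: Y jumps over an income range containing a kink (the convex kink at 2.0).
in_gap = [y for y in Y if 1.75 < y < 2.2]
```

In this example a range of types bunches exactly at the first kink, no type chooses an income close to the second kink, and away from both kinks the chosen income tracks $(1-t)^{\varepsilon} w^{1+\varepsilon}$ up to the grid spacing, as in Lemma 3.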
Let us start by defining $w_0, w_1, \dots, w_n$ recursively as follows. Let $w_0 = \underline{w}$ and, given $w_{i-1} < \overline{w}$ (where $i \ge 1$), define $w_i$ as follows. Let $w_{i,1} = \min\{w \in [\underline{w}, \overline{w}] \mid w > w_{i-1}$ and $Y(w - \delta) < Y(w + \delta)$ for all $\delta > 0$ and $Y$ is constant on $(w, w + \delta)$ for some $\delta > 0\}$ and $w_{i,2} = \min\{w \in [\underline{w}, \overline{w}] \mid w > w_{i-1}$ and $w$ is a jump point of $Y\}$, where I adopt the convention that the minimum of the empty set equals $\infty$. 57 Let $w_i = \min\{w_{i,1}, w_{i,2}, \overline{w}\}$. That is, $w_i$ is the lowest value of $w \in [\underline{w}, \overline{w}]$ strictly to the right of $w_{i-1}$ where either a flat segment of $Y$ begins or $Y$ jumps. If no such value exists, $w_i = \overline{w}$.
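A discretised analogue of this recursive definition may help fix ideas. The sketch below scans a sampled income function and records the types at which a flat segment begins or a jump occurs. It is a heuristic with tolerances (`flat_tol`, `jump_tol`) that I introduce for illustration; the function names and the stylised income function, which uses an elasticity of 0.5, are hypothetical.

```python
def break_types(types, Y, flat_tol=1e-9, jump_tol=0.1):
    """Grid analogue of the recursive definition of w_1, w_2, ...

    A type is recorded where a flat segment of Y begins (Y rises up to the
    type and is constant just after it) or where Y jumps by more than
    jump_tol between neighbouring grid types. The last grid type closes
    the list, mirroring w_n being the highest type.
    """
    breaks = []
    for j in range(1, len(types) - 1):
        rise_before = Y[j] - Y[j - 1]
        rise_after = Y[j + 1] - Y[j]
        flat_starts = rise_before > flat_tol and rise_after <= flat_tol
        jumps = rise_after > jump_tol
        if flat_starts or jumps:
            breaks.append(types[j])
    breaks.append(types[-1])
    return breaks


def stylised_income(w):
    """Hypothetical Y: a flat segment starting at w = 1 and a jump at w = 2."""
    if w <= 1.0:
        return 0.8 ** 0.5 * w ** 1.5
    if w <= 2.0:
        return max(0.8 ** 0.5, 0.5 ** 0.5 * w ** 1.5)
    return 0.9 ** 0.5 * w ** 1.5


types = [0.5 + 0.001 * j for j in range(2501)]   # 0.5, 0.501, ..., 3.0
Y = [stylised_income(w) for w in types]
breaks = break_types(types, Y)
```

Note that the type at which the flat segment ends is not recorded: only the start of a flat segment and jump points generate a new $w_i$, which is why the flat-then-increasing shape is handled within a single interval $(w_{i-1}, w_i)$.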
The w i 's thus constructed satisfy the following requirements.
Lemma 4 $w_0 = \underline{w}$, $w_n = \overline{w}$ for some $n \le N$, and $w_{i-1} < w_i$ for all $i \in \{1, \dots, n\}$.
Proof The only nonobvious statement is that $w_n = \overline{w}$ for some $n \le N$. Let us prove that. Suppose $N = 1$ so that $Z$ has a single piece. Then, $Y$ cannot have flat segments or jumps (by Lemmas 1 and 2) so that $w_{1,1} = w_{1,2} = \infty$ and $w_1 = \overline{w}$.

for some $\hat{w}$ such that $w_{i-1} \le \hat{w} < w_i$. Set $t_i$ to be such that $1 - t_i$ equals this slope. In case (ii), define $t_i$ by the equation $(1 - t_i)^{\varepsilon} w_i^{1+\varepsilon} = \hat{y}$. The next lemma, taken in conjunction with Lemma 4, demonstrates that the $w_i$'s and $t_i$'s fulfil the requirements in condition (c).
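(3) For all $i \in \{1, \dots, n\}$ such that $t_{i-1} < t_i$, we have $\left(\frac{1-t_{i-1}}{1-t_i}\right)^{\frac{\varepsilon}{1+\varepsilon}} w_{i-1} \le w_i$.
Proof Statement 1) is obvious. Statements 2)-4) are obvious for $i = 1$. Let us take an arbitrary $i \in \{2, \dots, n\}$ and let us consider the following exhaustive cases.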
1. $Y$ is strictly increasing on $(w_{i-1}, w_i)$.
In this case, $w_{i-1}$ must be a jump point of $Y$ (because it cannot be a point where a flat segment of $Y$ starts). Also, by the way $t_i$ was defined and Lemma 3, we have for $w_a$ and $w_b$ arbitrarily close to $w_{i-1}$. Thus, both when $Y$ is constant on $(w_{i-2}, w_{i-1})$ and when $Y$ is nonconstant on $(w_{i-2}, w_{i-1})$, we must have $t_{i-1} > t_i$.
2. $Y(w) = \hat{y}$ on $(w_{i-1}, w_i)$.
In this case, $w_{i-1}$ cannot be a jump point of $Y$ (by condition (ii) in Proposition 2). Hence, $w_{i-1}$ must be where a flat segment of $Y$ begins so that $Y$ must be strictly increasing just to the left of $w_{i-1}$. Thus, we must have $(1 - t_{i-1})^{\varepsilon} w_{i-1}^{1+\varepsilon} = \hat{y}$. 63 Also, from the definition of $t_i$, $(1 - t_i)^{\varepsilon} w_i^{1+\varepsilon} = \hat{y}$. Thus, $t_{i-1} < t_i$ and $\left(\frac{1-t_{i-1}}{1-t_i}\right)^{\frac{\varepsilon}{1+\varepsilon}} w_{i-1} = w_i$. It also holds that $t_i > t_{i+1}$. To see this, note that $w_i$ must be a jump point of $Y$ and, hence, $Y$ must be strictly increasing just to the right of $w_i$ (by condition (ii) in Proposition 2). Hence, $Y(w) = (1 - t_{i+1})^{\varepsilon} w^{1+\varepsilon}$ on $(w_i, w_{i+1})$ and the following must hold: for some $\delta > 0$, for $w$ arbitrarily close to $w_i$. Thus, we must have $t_i > t_{i+1}$.
3. For some $\hat{w}$ such that $w_{i-1} < \hat{w} < w_i$, $Y(w) = \hat{y}$ on $(w_{i-1}, \hat{w}]$ and $Y$ is strictly increasing on $(\hat{w}, w_i)$.
61 We need $\delta$ to be smaller than the size of the jump in $Y$ at $w_{i-1}$.
62 We need $\delta$ to be smaller than the size of the jump in $Y$ at $w_{i-1}$.
63 By the definition of $t_{i-1}$ and Lemma 3, the left-hand side is the expression for $Y$ just to the left of $w_{i-1}$. The equality holds because $Y$ is continuous at $w_{i-1}$.
64 We need $\delta$ to be smaller than the size of the jump in $Y$ at $w_i$.
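In this case, $w_{i-1}$ cannot be a jump point of $Y$ (by condition (ii) in Proposition 2). Hence, $w_{i-1}$ must be where a flat segment of $Y$ begins so that $Y$ must be strictly increasing just to the left of $w_{i-1}$. Thus, we must have $(1 - t_{i-1})^{\varepsilon} w_{i-1}^{1+\varepsilon} = \hat{y}$. Further, by the definition of $t_i$ and Lemma 3, $Y(w) = (1 - t_i)^{\varepsilon} w^{1+\varepsilon} > \hat{y}$ for all $w \in (\hat{w}, w_i)$. Thus, $\left(\frac{1-t_{i-1}}{1-t_i}\right)^{\frac{\varepsilon}{1+\varepsilon}} w_{i-1} < w_i$. Also, by the continuity of $Y$ on $(w_{i-1}, w_i)$, 65 we must have $(1 - t_{i-1})^{\varepsilon} w_{i-1}^{1+\varepsilon} = (1 - t_i)^{\varepsilon} \hat{w}^{1+\varepsilon}$ so that $t_{i-1} < t_i$ and $\hat{w} = \left(\frac{1-t_{i-1}}{1-t_i}\right)^{\frac{\varepsilon}{1+\varepsilon}} w_{i-1}$.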

◻
Let us now turn to the functional form of $Y$ on $(w_{i-1}, w_i)$ for each $i \in \{1, \dots, n\}$. First, consider $i = 1$. $Y$ must be strictly increasing on $(\underline{w}, w_1)$ so that, by the definition of $t_1$ and Lemma 3, $Y(w) = (1 - t_1)^{\varepsilon} w^{1+\varepsilon}$. Also, $t_0 > t_1$. Thus, $Y$ has the functional form (3) on $(\underline{w}, w_1)$.
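Next consider $i \in \{2, \dots, n\}$. In the proof of Lemma 6, we have that
- case 1 implies $t_{i-1} > t_i$,
- case 2 implies $t_{i-1} < t_i$ and $\left(\frac{1-t_{i-1}}{1-t_i}\right)^{\frac{\varepsilon}{1+\varepsilon}} w_{i-1} = w_i$, and
- case 3 implies $t_{i-1} < t_i$ and $\left(\frac{1-t_{i-1}}{1-t_i}\right)^{\frac{\varepsilon}{1+\varepsilon}} w_{i-1} < w_i$.
Thus, if $t_{i-1} > t_i$, case 1 in that proof applies and we must have $Y(w) = (1 - t_i)^{\varepsilon} w^{1+\varepsilon}$ on $(w_{i-1}, w_i)$. If $t_{i-1} < t_i$ and $\left(\frac{1-t_{i-1}}{1-t_i}\right)^{\frac{\varepsilon}{1+\varepsilon}} w_{i-1} = w_i$, case 2 in that proof applies and $Y(w) = (1 - t_{i-1})^{\varepsilon} w_{i-1}^{1+\varepsilon}$ on $(w_{i-1}, w_i)$. If $t_{i-1} < t_i$ and $\left(\frac{1-t_{i-1}}{1-t_i}\right)^{\frac{\varepsilon}{1+\varepsilon}} w_{i-1} < w_i$, case 3 in that proof applies and
$$Y(w) = \begin{cases} (1 - t_{i-1})^{\varepsilon} w_{i-1}^{1+\varepsilon} & \text{if } w \le \left(\frac{1-t_{i-1}}{1-t_i}\right)^{\frac{\varepsilon}{1+\varepsilon}} w_{i-1}, \\ (1 - t_i)^{\varepsilon} w^{1+\varepsilon} & \text{if } w > \left(\frac{1-t_{i-1}}{1-t_i}\right)^{\frac{\varepsilon}{1+\varepsilon}} w_{i-1} \end{cases}$$
on $(w_{i-1}, w_i)$.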
The bottom line is that, for $i \in \{1, \dots, n\}$, the functional form of $Y$ on $(w_{i-1}, w_i)$ can be written as: The penultimate equality holds because the expression after that equality just adds a redundant case. The last equality holds because the expression after that equality just combines the middle two cases as well as the last two cases from the previous expression. Finally, note that, given part 3) in Lemma 6, $\left(\frac{1-t_{i-1}}{1-t_i}\right)^{\frac{\varepsilon}{1+\varepsilon}} w_{i-1} \le w_i$ is guaranteed to hold if $t_{i-1} < t_i$ and can hence be dropped from the last expression above. ◻
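The three cases distinguished in the proof translate directly into a small function. The sketch below is an illustrative implementation of the income function on a single interval $(w_{i-1}, w_i)$, written from the case distinction in the text; the function names and the parameter values in the example are hypothetical, and `eps` stands for the elasticity of labour supply.

```python
def w_hat(w_prev, t_prev, t_i, eps):
    """Type at which the flat part of Y ends when t_prev < t_i."""
    return ((1.0 - t_prev) / (1.0 - t_i)) ** (eps / (1.0 + eps)) * w_prev


def income_on_segment(w, w_prev, t_prev, t_i, eps):
    """Y(w) for w in (w_prev, w_i), following the three cases in the proof."""
    increasing = (1.0 - t_i) ** eps * w ** (1.0 + eps)
    if t_prev > t_i:
        return increasing                      # case 1: Y jumps at w_prev
    flat = (1.0 - t_prev) ** eps * w_prev ** (1.0 + eps)
    if w <= w_hat(w_prev, t_prev, t_i, eps):
        return flat                            # cases 2 and 3: flat part
    return increasing                          # case 3: beyond w_hat


# Hypothetical example: eps = 0.5, marginal rate rises from 20% to 50% at w_prev = 1.
eps = 0.5
cutoff = w_hat(1.0, 0.2, 0.5, eps)
y_flat = income_on_segment(1.1, 1.0, 0.2, 0.5, eps)
y_rising = income_on_segment(1.5, 1.0, 0.2, 0.5, eps)
```

When the marginal rate rises, the function equals the larger of the flat level and the increasing branch, so it is continuous at `cutoff`; whether the interval ends before `cutoff` is reached (case 2) or after it (case 3) is determined by where the next $w_i$ lies, not by this function.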