Introduction

I numerically compute Borda-optimal (BO), i.e., optimal based on the Borda count as the normative criterion, labour-income tax schedules for the United States. I do so in the context of a Mirrlees-style model with quasilinear preferences and a constant elasticity of labour supply. I perform the computations separately for three different values of the elasticity of labour supply, \(\sigma\).

A major challenge is that the Borda count is defined for finitely many alternatives whereas there are infinitely many possible tax schedules. To deal with this, I identify a subset of the feasible direct mechanisms (DMs) that (a) loosely speaking, corresponds to the set of continuous, piecewise linear tax schedules with N or fewer pieces and (b) lends itself to transparent, finite discretisations. Using \(N=4\) and one such discretisation in the baseline numerical analysis, I compute, for each value of \(\sigma\), the BO DM within the resulting finite set of DMs as well as the corresponding BO tax schedule.

The main findings in terms of the BO tax schedules are that (i) for each value of \(\sigma\), all marginal rates are positive, (ii) depending on the value of \(\sigma\), the marginal rate at the highest incomes may or may not be strictly higher than the marginal rates at all lower incomes, (iii) for each value of \(\sigma\), average rates are nevertheless (possibly, weakly) increasing in income (to a close approximation), and (iv) this progressivity is attenuated as \(\sigma\) increases. These findings hold up well under a number of robustness checks that use alternative values of N and alternative discretisations.

The existing literature on optimal taxation is largely based on utilitarianism (Bentham 1789; Mirrlees 1971), Rawls’ maxmin principle (Rawls 1971; Piketty 1997), or equality of opportunity (Roemer 1998; Fleurbaey 2008). Although these normative approaches have their appeal, they also have two important limitations. First, they seem disconnected from the idea of democracy. This is awkward given the broad consensus in many countries that public policy should be determined through a democratic process.Footnote 1 Second, excluding some notions of equality of opportunity, the implementation of these approaches requires taking a stand on nonordinal properties of utility.

My findings (i)–(iv) above are in line with well-known findings in this literature. For example, Seade (1982) shows theoretically that, in a Mirrlees-style model with a utilitarian criterion, the optimal tax schedule must be strictly increasing (in line with finding (i)). Also, using numerical analysis in a Mirrlees-style model with a utilitarian and a Rawlsian criterion, Saez (2001) obtains marginal rates at high incomes that are lower than marginal rates at low incomes (in line with finding (ii)).Footnote 2 What is novel in my paper is that findings (i)–(iv) have been derived based on a different normative foundation.

An alternative, normatively appealing approach to optimal taxation is to use majority rule. Unfortunately, for general sets of tax schedules, a Condorcet winner is not guaranteed to exist. However, if we restrict attention to linear tax schedules, a Condorcet winner does exist under some assumptions (Roberts 1977). In these settings, three key findings regarding the linear tax schedule selected by majority rule are that (a) under plausible assumptions, the marginal rate is positive, (b) the intercept can be positive, so that the average rate can be decreasing, and (c) the marginal rate is increasing in the ratio between mean and median income (at least when government consumption is zero).Footnote 3 The current paper differs from this literature in that it uses a different normative criterion and considers more flexible tax schedules.

The Borda count has several important advantages as a normative criterion. First, it has been characterised in terms of normatively appealing axioms (Young 1974; Maskin 2021).Footnote 4 Second, preference aggregation seems central to the idea of democracy. Third, the Borda count can be implemented without going beyond ordinal utility.

Of course, the Borda count also has limitations. First, it is defined for finitely many alternatives and the results could be sensitive to the discretisation employed. Second, although the Borda count exhibits some sensitivity to the intensity of preferences between any two alternatives by taking into account the number of alternatives that each individual ranks in between them, policy-makers may wish to be more sensitive to preference intensities (e.g., based on introspection or individuals’ verbal reports).

There is also a literature that studies labour-income taxation in various descriptive (as opposed to normative) political economy models. For example, Röell (2012), Brett and Weymark (2017), and De Donder and Hindriks (2003) study labour-income taxation in a two-step model: at the first step, each individual proposes a tax schedule that is selfishly-optimal for her; at the second step, majority rule is applied to the proposed tax schedules (on which a Condorcet winner exists under some assumptions). Chen (2000), Carbonell-Nicolau and Efe (2007), Roemer (2012), and Bierbrauer and Boyer (2016) consider models of political competition in which politicians choose tax policies on which to run for office. Bierbrauer et al. (2021) characterise when a monotonic labour-income tax reform (i.e., a reform such that the change in the tax burden is a monotonic function of income) is politically feasible in the sense that it is preferred by a majority over the status quo.Footnote 5

Preferences and productivities

Individuals have preferences over consumption \(c \ge 0\) and labour \(l \ge 0\) represented by the utility function \(c - \frac{\sigma }{1+\sigma } l^\frac{1+\sigma }{\sigma }\), where \(\sigma >0\) is the (Hicksian and Marshallian) elasticity of labour supply. Each individual has a productivity (or type) which is her private information. When type w puts in labour l, she earns pre-tax income wl. The set of types is \([{\underline{w}},{\overline{w}}]\), where \(0<{\underline{w}}<{\overline{w}}\). Types are distributed according to the probability density function f which has full support on \([{\underline{w}},{\overline{w}}]\).
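As a minimal sketch of how these preferences translate into labour supply, the following Python snippet computes a type's optimal labour and pre-tax income when she faces a constant marginal tax rate. The function names, the lump-sum `transfer` argument, and the numbers in the usage lines are illustrative assumptions and do not come from the paper. The closed form follows from the first-order condition \((1-t)w = l^{1/\sigma}\).

```python
def after_tax_utility(l, w, t, sigma, transfer=0.0):
    """Utility c - sigma/(1+sigma) * l^((1+sigma)/sigma) with c = (1-t)*w*l + transfer.

    'transfer' is a hypothetical lump-sum term; it does not affect the optimum
    because preferences are quasilinear in consumption.
    """
    consumption = (1 - t) * w * l + transfer
    return consumption - sigma / (1 + sigma) * l ** ((1 + sigma) / sigma)


def optimal_labour(w, t, sigma):
    """Labour supply solving the first-order condition (1-t)*w = l^(1/sigma)."""
    return ((1 - t) * w) ** sigma


def optimal_income(w, t, sigma):
    """Pre-tax income w*l at the optimum, i.e. (1-t)^sigma * w^(1+sigma)."""
    return w * optimal_labour(w, t, sigma)


# Illustrative values only (not taken from the paper).
w, t, sigma = 20.0, 0.3, 0.5
l_star = optimal_labour(w, t, sigma)
y_star = optimal_income(w, t, sigma)
```

Because utility is quasilinear, the elasticity of `optimal_labour` with respect to the net-of-tax wage \((1-t)w\) equals \(\sigma\) regardless of the transfer, which is why the Hicksian and Marshallian elasticities coincide.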

DMs

Feasible DMs

Given the revelation principle, we can restrict attention to DMs. A DM is a tuple \((Y,C)\), where \(Y:[{\underline{w}},{\overline{w}}] \rightarrow [0,\infty )\) and \(C:[{\underline{w}},{\overline{w}}] \rightarrow [0,\infty )\). \(Y(w)\) and \(C(w)\) are the income and the consumption, respectively, assigned to an individual reporting to be of type w.

A DM is feasible if the following conditions hold.

  1. (a)

    Incentive compatibility: Y is nondecreasing and, for all \(w \in [{\underline{w}},{\overline{w}}]\),

    $$\begin{aligned} C(w) = C({\underline{w}}) - \frac{\sigma }{1+\sigma } \left( \frac{Y({\underline{w}})}{{\underline{w}}} \right) ^\frac{1+\sigma }{\sigma } + \frac{\sigma }{1+\sigma } \left( \frac{Y(w)}{w} \right) ^\frac{1+\sigma }{\sigma } + \int _{{\underline{w}}}^w \left( \frac{Y({\tilde{w}})}{{\tilde{w}}} \right) ^\frac{1+\sigma }{\sigma } \frac{1}{{\tilde{w}}} d {\tilde{w}}. \end{aligned}$$
    (1)
  2. (b)

    Government budget constraint:

    $$\begin{aligned} \int _{{\underline{w}}}^{{\overline{w}}} (Y(w)-C(w)) f(w) dw = R, \end{aligned}$$
    (2)

    where \(R \ge 0\) is the exogenously given government consumption per capita.Footnote 6
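Conditions (a) and (b) together pin down C once Y is given: (1) determines \(C(w)-C({\underline{w}})\), and (2) then determines \(C({\underline{w}})\). A rough numerical sketch of this step is shown below, assuming that Y and f are supplied on a grid of types and that integrals are approximated with the trapezoid rule. The function names, the grid, and the quadrature rule are assumptions made for illustration and are not the paper's implementation.

```python
def disutility(y, w, sigma):
    """Labour disutility of type w when earning income y."""
    return sigma / (1 + sigma) * (y / w) ** ((1 + sigma) / sigma)


def trapezoid(xs, ys):
    """Trapezoid-rule approximation of the integral of ys over the grid xs."""
    return sum((xs[k + 1] - xs[k]) * (ys[k] + ys[k + 1]) / 2
               for k in range(len(xs) - 1))


def consumption_schedule(ws, Y, f, sigma, R):
    """Return C on the grid ws implied by (1) and (2), given Y and the density f.

    ws: increasing grid of types; Y, f: values of Y(w) and f(w) on that grid.
    The returned values may be negative, in which case the DM is not feasible.
    """
    # Integrand of the last term in (1).
    integrand = [(y / w) ** ((1 + sigma) / sigma) / w for y, w in zip(Y, ws)]
    rent = [0.0]  # running integral from the lowest type up to each grid point
    for k in range(1, len(ws)):
        step = (ws[k] - ws[k - 1]) * (integrand[k] + integrand[k - 1]) / 2
        rent.append(rent[-1] + step)
    # C(w) - C(w_low) from (1).
    delta = [disutility(Y[k], ws[k], sigma) - disutility(Y[0], ws[0], sigma) + rent[k]
             for k in range(len(ws))]
    # Solve (2) for C(w_low); dividing by the numerical mass of f guards
    # against f not integrating to exactly one on the grid.
    mass = trapezoid(ws, f)
    net = trapezoid(ws, [(Y[k] - delta[k]) * f[k] for k in range(len(ws))])
    c_low = (net - R) / mass
    return [c_low + d for d in delta]
```

A quick sanity check is to feed in \(Y(w)=(1-t)^\sigma w^{1+\sigma}\), the allocation induced by a linear tax with rate t: the resulting \(C(w)-(1-t)Y(w)\) should then be the same for every type, since it equals the lump-sum transfer.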

A finite subset of the feasible DMs

Because the Borda count is defined for a finite set of alternatives, it is necessary to restrict attention to a finite subset of the feasible DMs. To this end, I augment conditions (a) and (b) with two further conditions, the first one being the following.

  1. (c)

    Y is of the form:

    $$\begin{aligned} Y(w) = \left\{ \begin{array}{ll} (1-t_1)^\sigma w^{1+\sigma } &{} \text {if } w=w_0 \\ (1-t_i)^\sigma w^{1+\sigma } &{} \text {if } w_{i-1}< w \le w_i, t_{i-1} > t_i \\ (1-t_{i-1})^\sigma w_{i-1}^{1+\sigma } &{} \text {if } w_{i-1}< w \le \left( \frac{1-t_{i-1}}{1-t_i} \right) ^\frac{\sigma }{1+\sigma } w_{i-1}, t_{i-1}< t_i \\ (1-t_i)^\sigma w^{1+\sigma } &{} \text {if } \left( \frac{1-t_{i-1}}{1-t_i} \right) ^\frac{\sigma }{1+\sigma } w_{i-1}< w \le w_i, t_{i-1} < t_i \\ \end{array} \right. , \end{aligned}$$
    (3)

    where (i) \(i \in \{1,\ldots ,n\}\), \(n \ge 1\), (ii) \(w_0={\underline{w}}\), \(w_n={\overline{w}}\), and \(w_{i-1} < w_i\) for all i, (iii) \(t_0=1\), \(t_i<1\) for all i, and \(t_{i-1} \ne t_i\) for all i, (iv) \(\left( \frac{1-t_{i-1}}{1-t_i} \right) ^\frac{\sigma }{1+\sigma } w_{i-1} \le w_i\) for all i such that \(t_{i-1} < t_i\), and (v) \(\left( \frac{1-t_{i-1}}{1-t_i} \right) ^\frac{\sigma }{1+\sigma } w_{i-1} < w_i\) for all \(i<n\) such that \(t_{i-1}< t_i < t_{i+1}\).
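To make the case distinction in (3) concrete, here is a small Python sketch that evaluates \(Y(w)\) for given thresholds and marginal rates. The function name and argument layout are assumptions for illustration, and the function does not validate restrictions (ii)–(v). When the rate rises at a threshold (\(t_{i-1}<t_i\)), types just above \(w_{i-1}\) bunch at the income chosen by \(w_{i-1}\); when the rate falls, income jumps upward.

```python
def income(w, thresholds, rates, sigma):
    """Evaluate Y(w) as in (3).

    thresholds: [w_0, ..., w_n] with w_0 the lowest and w_n the highest type.
    rates: [t_1, ..., t_n]; t_0 = 1 is prepended as in restriction (iii).
    Restrictions (ii)-(v) are assumed to hold and are not checked here.
    """
    t = [1.0] + list(rates)
    if w == thresholds[0]:
        return (1 - t[1]) ** sigma * w ** (1 + sigma)
    for i in range(1, len(rates) + 1):
        lo, hi = thresholds[i - 1], thresholds[i]
        if lo < w <= hi:
            if t[i - 1] > t[i]:
                # Rate falls (or i = 1): type w is on the i-th piece.
                return (1 - t[i]) ** sigma * w ** (1 + sigma)
            # Rate rises: types up to bunch_end stay at the income of w_{i-1}.
            bunch_end = ((1 - t[i - 1]) / (1 - t[i])) ** (sigma / (1 + sigma)) * lo
            if w <= bunch_end:
                return (1 - t[i - 1]) ** sigma * lo ** (1 + sigma)
            return (1 - t[i]) ** sigma * w ** (1 + sigma)
    raise ValueError("w lies outside [w_0, w_n]")


# Illustrative values only: two pieces with a rising marginal rate.
example = [income(w, [1.0, 2.0, 3.0], [0.2, 0.5], 1.0) for w in (1.5, 2.3, 3.0)]
```

In the illustrative call, the type 2.3 lies in the bunching interval and is assigned the same income as the threshold type 2.0.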

The following proposition shows that a DM satisfying (a) and (c) can be interpreted in terms of a corresponding tax schedule.Footnote 7

Proposition 1

Suppose \((Y,C)\) satisfies (a) and (c). Then, there exists a unique tax schedule, T, such that the following hold.

  1. (i)

    T implements \((Y,C)\).

  2. (ii)

    T is continuous and piecewise linear with n pieces.

  3. (iii)

    For each \(i \in \{1,\ldots ,n\}\), \(t_i\) is the slope of the \(i{\text {th}}\) piece of T.Footnote 8

  4. (iv)

    If \(n \ge 2\), then, for each \(i \in \{2,\ldots ,n\}\) such that \(t_{i-1} > t_i\), \(w_{i-1}\) is the highest type that chooses a point on the \((i-1){\text {st}}\) piece of T.

  5. (v)

    If \(n \ge 2\), then, for each \(i \in \{2,\ldots ,n\}\) such that \(t_{i-1} < t_i\), \(w_{i-1}\) is the lowest type that chooses at the kink between the \((i-1){\text {st}}\) and \(i{\text {th}}\) pieces of T.Footnote 9

Thus, given \((Y,C)\) satisfying (a) and (c), \(t_i\) (\(i=1,\ldots ,n\)) is the marginal rate on the \(i{\text {th}}\) piece of the corresponding tax schedule, T, and \(w_{i-1}\) (\(i=2,\ldots ,n\)) is the threshold type where types switch to locating on the \(i{\text {th}}\) piece of T.

The next proposition provides a kind of converse of Proposition 1.

Proposition 2

Suppose that (i) \((Y,C)\) is implemented by some continuous, piecewise linear tax schedule with N pieces and (ii) if \(w={\underline{w}}\) or w is a jump point of Y, Y is strictly increasing on \((w,w+\delta )\) for some \(\delta >0\). Then \((Y,C)\) satisfies (a) and Y satisfies (c) almost everywhere for some \(n \le N\).

Condition (ii) seems weak: it applies to at most N, arbitrarily narrow intervalsFootnote 10 on each of which it, moreover, allows Y to be arbitrarily close to constant. Thus, abstracting from what seem like technical details, Propositions 1 and 2 tell us that a DM satisfies (a) and (c) for some \(n \le N\) if and only if it is implemented by a continuous, piecewise linear tax schedule with N or fewer pieces.

Letting w(p) denote the \(p{\text {th}}\) type percentile, the next condition provides a finite, numerically tractable discretisation of the set of Y functions satisfying (c).

  1. (d)

    \(n \le 4\). Given n, \(t_i \in \{ -2,-1.5,-1,-.8,-.6,-.4,-.2,0,.1,.2,.3,.4,.5,.6,.7,.8,.9 \}\) for all \(i \in \{1,\ldots ,n\}\) and \(w_i \in \{w(10),w(20),w(30),w(40),w(50),w(60),w(70),w(80),w(90),w(95),w(99),w(99.9) \}\) for all \(i \in \{1,\ldots ,n-1\}\).

Thus, the discretisation in (d) in effect restricts attention to continuous, piecewise linear tax schedules with four or fewer pieces such that (i) the marginal rate on any of the pieces lies on the given grid for the \(t_i\)’s and (ii) threshold types lie on the given grid for the \(w_i\)’s (e.g., tax schedules such that types just below the \(45\text {th}\) percentile choose on the second piece and types just above the \(45\text {th}\) percentile choose on the third piece are ruled out). I have somewhat arbitrarily truncated marginal tax rates at \(-2\) from below, noting that even lower marginal tax rates could probably only apply to a small fraction of the population if they are to be feasible.
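To give a sense of the size of this discretisation, the following Python sketch counts the parameter combinations \((n, t_1,\ldots ,t_n, w_1,\ldots ,w_{n-1})\) allowed by the grids in (d), using only the restrictions that thresholds are strictly increasing and adjacent rates differ. The count is an upper bound on the number of candidates, not a figure reported in the paper: restrictions (iv) and (v) and the nonnegativity of consumption remove further combinations. The function name is an assumption for illustration.

```python
from math import comb

# Grids copied from condition (d).
RATE_GRID = [-2, -1.5, -1, -.8, -.6, -.4, -.2, 0, .1, .2, .3, .4, .5, .6, .7, .8, .9]
PERCENTILE_GRID = [10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99, 99.9]


def count_candidates(max_pieces, n_rates, n_thresholds):
    """Upper bound on the number of (thresholds, rates) combinations.

    For n pieces: choose n-1 interior thresholds in increasing order, and
    choose n rates such that adjacent rates differ.
    """
    total = 0
    for n in range(1, max_pieces + 1):
        threshold_choices = comb(n_thresholds, n - 1)
        rate_choices = n_rates * (n_rates - 1) ** (n - 1)
        total += threshold_choices * rate_choices
    return total


upper_bound = count_candidates(4, len(RATE_GRID), len(PERCENTILE_GRID))
# upper_bound == 15_609_553 before applying (iv), (v), and C >= 0.
```

Almost all of this bound comes from the four-piece schedules, which suggests why allowing an additional piece is paired with a coarser grid of rates in one of the robustness checks.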

From here on, I restrict attention to the set of DMs satisfying (a)–(d). Let \({\mathcal {D}}\) denote this set. Because Y pins down C through constraints (1) and (2), \({\mathcal {D}}\) corresponds to the set of Y functions such that (c) holds, (d) holds, and \(C({\underline{w}})\) obtained after plugging in for C(w) from (1) into (2) is nonnegative.Footnote 11

Before proceeding, let us consider the following question: Why look for a BO DM in \({\mathcal {D}}\) rather than for a BO continuous, piecewise linear tax schedule with four or fewer pieces? There are three disadvantages to the latter approach. First, to discretise the set of continuous, piecewise linear tax schedules with four or fewer pieces, one would need to choose the grid of income levels at which the kinks can be located. However, it is not obvious how to do that. In contrast, the grid for the \(w_i\)’s in condition (d) seems transparent and natural. Second, one would need to solve each type’s labour-supply optimisation problem given each tax schedule, and this is likely to considerably slow down the numerical calculations. Third, there can be multiple continuous, piecewise linear tax schedules with four or fewer pieces implementing the same DM and one would need to eliminate such duplicates before applying the Borda count.Footnote 12 However, duplicate tax schedules may be tricky to identify as they may incorrectly appear to implement slightly different DMs due to imperfect numerical precision.

The Borda count

Given \((Y,C) \in {\mathcal {D}}\), let \(\Delta (Y,C,w)\) denote the number of DMs in \({\mathcal {D}}\) that are strictly worse than \((Y,C)\) according to type w minus the number of DMs in \({\mathcal {D}}\) that are strictly better than \((Y,C)\) according to type w.Footnote 13 The Borda count of \((Y,C)\) is:Footnote 14\(^,\)Footnote 15

$$\begin{aligned} B(Y,C) = \int _{{\underline{w}}}^{{\overline{w}}} \Delta (Y,C,w) f(w) dw. \end{aligned}$$
(4)

\((Y,C) \in {\mathcal {D}}\) is BO if \(B(Y,C) \ge B({\hat{Y}},{\hat{C}})\) for all \(({\hat{Y}},{\hat{C}}) \in {\mathcal {D}}\).

Note that evaluating \(B(Y,C)\) requires computing all types’ rankings over \({\mathcal {D}}\), which is numerically infeasible. Therefore, to obtain my numerical results, I approximate the integral in (4) based on the rankings of a finite set of “representative” types. The main idea is to approximate \(\Delta (Y,C,\cdot )\) via a step function by (i) partitioning \([{\underline{w}},{\overline{w}}]\) into 14 subintervals and (ii) replacing \(\Delta (Y,C,\cdot )\) over each subinterval with \(\Delta (Y,C,w_m)\), where \(w_m\) is the median (i.e., “representative”) type in that subinterval.Footnote 16 I will refer to a DM maximising the approximation of the integral in (4) as “BO” even though, strictly speaking, it is only BO if it maximises the actual integral in (4).
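The step-function approximation can be sketched as follows: each representative type contributes her \(\Delta\) for every DM, weighted by the probability mass of her subinterval. The Python snippet below is a minimal illustration under that reading. The function name, the data layout, and the toy numbers are assumptions, and the quadratic-time counting is chosen for clarity, so it would not scale to a set of DMs as large as \({\mathcal {D}}\).

```python
def approximate_borda_counts(utilities, weights):
    """Approximate B for every DM from representative types' utilities.

    utilities[m][d]: utility of representative type m under DM d.
    weights[m]: probability mass of the subinterval represented by type m.
    Returns a list with one approximate Borda count per DM.
    """
    n_dms = len(utilities[0])
    counts = [0.0] * n_dms
    for type_utilities, weight in zip(utilities, weights):
        for d in range(n_dms):
            own = type_utilities[d]
            worse = sum(1 for u in type_utilities if u < own)
            better = sum(1 for u in type_utilities if u > own)
            counts[d] += weight * (worse - better)  # weight * Delta(d, w_m)
    return counts


# Toy example (hypothetical numbers): three representative types, three DMs.
toy_utilities = [[3, 2, 1], [1, 3, 2], [1, 2, 3]]
toy_weights = [0.5, 0.3, 0.2]
toy_counts = approximate_borda_counts(toy_utilities, toy_weights)
best_dm = max(range(len(toy_counts)), key=toy_counts.__getitem__)
```

In the toy example, the second DM has the highest count even though the largest group ranks another DM first, because no type ranks it last. Ties in utility contribute to neither the "worse" nor the "better" count, matching the strict comparisons in the definition of \(\Delta\).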

Calculations for the United States

Calibration

Elasticity of labour supply

Given the considerable controversy in the literature on the elasticity of labour supply,Footnote 17 I will perform the analysis separately for \(\sigma \in \{0.25,0.5,1\}\). In choosing these values, I am following Saez and Stantcheva (2018).

Distribution of types

The main idea for calibrating the distribution of types goes as follows. First, I assume that the actual labour-income tax schedule is linear with a 30 percent marginal tax rate. Given this tax schedule, type w’s optimal pretax labour income is \(y^*(w)=0.7^\sigma w^{1+\sigma }\). Second, I back out the distribution of types based on \(y^*(\cdot )\) and data from the World Inequality Database (WID) on the empirical distribution of pretax labour income for individuals over age 20 in the US in 2014.Footnote 18
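The second step amounts to inverting \(y^*(\cdot )\): since \(y^*\) is strictly increasing in w, the type at a given percentile is the type whose optimal income equals the income at that percentile. A minimal Python sketch of this inversion follows. The function names and the income figures are hypothetical placeholders rather than WID data, and the sketch does not address how incomes of zero are handled.

```python
def income_from_type(w, sigma, actual_rate=0.3):
    """y*(w) = (1 - rate)^sigma * w^(1 + sigma) under the assumed linear tax."""
    return (1 - actual_rate) ** sigma * w ** (1 + sigma)


def type_from_income(y, sigma, actual_rate=0.3):
    """Invert y*(w): the type whose optimal pre-tax income equals y."""
    return (y / (1 - actual_rate) ** sigma) ** (1 / (1 + sigma))


# Hypothetical income percentiles (placeholders, not WID figures).
placeholder_incomes = {50: 30_000.0, 90: 100_000.0, 99: 400_000.0}
types_by_percentile = {
    sigma: {p: type_from_income(y, sigma) for p, y in placeholder_incomes.items()}
    for sigma in (0.25, 0.5, 1.0)
}
```

Note that the backed-out type distribution differs across the three values of \(\sigma\), because the same observed income corresponds to a different productivity under each elasticity.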

Government consumption per capita

According to WID, US national income per individual over age 20 in 2014 was $65,192.Footnote 19 According to Piketty, Saez and Zucman (2018), total (i.e., federal, state, and local) government consumption in the US has been around 18 percent of national income since the end of World War II. Thus, I set \(R=65,192 \times 0.18 \approx 11,735\). This calculation assumes that government consumption must be financed entirely from labour income taxation, which seems like the natural theoretical benchmark based on Atkinson and Stiglitz (1976).Footnote 20

Main results

Table 1 BO UBI
Fig. 1 BO marginal tax rates. Although all lines should technically be perfectly flat, some of them are drawn as squiggles to distinguish the marginal tax rates for the different values of \(\sigma\). The marginal rate for \(\sigma =1\) equals 0.7 for incomes up to $3,154 (this is barely visible in the top left corner of the figure) and jumps to 0.4 at income $925,653 (this is not shown in the figure).

Fig. 2 BO average tax rates. The average rates for \(\sigma = 0.25\) and \(\sigma =0.5\) monotonically increase towards 0.7 and 0.5, respectively, as income increases beyond the values shown in the figure. The average rate for \(\sigma =1\) monotonically declines from 0.313 to 0.3 between incomes $32,878 and $925,653 and monotonically increases towards 0.4 at higher incomes.

For each \(\sigma \in \{0.25,0.5,1\}\), I compute the (as it turns out, unique) BO DM and the corresponding (in the sense of Proposition 1) BO tax schedule.Footnote 21 The main features of the BO tax schedules are presented in Table 1 as well as in Figs. 1 and 2. Table 1 shows, for each value of \(\sigma\), the BO Universal Basic Income (UBI), i.e., the negative of the intercept of the BO tax schedule. Figure 1 (Fig. 2) depicts, for each value of \(\sigma\), the BO marginal (average, respectively) tax rate as a function of income.

The first finding is the following.

Finding 1

For \(\sigma \in \{0.25,0.5,1\}\), all BO marginal tax rates are positive.

In particular, there is no equivalent to the Earned Income Tax Credit at low incomes.

The next finding is perhaps at odds with what is often taken for granted in popular discourse.

Finding 2

For \(\sigma =0.25\), the BO marginal tax rate at the highest incomes is strictly higher than the BO marginal tax rates at all lower incomes. However, this is not true for \(\sigma =1\).Footnote 22

Nevertheless, because of the UBI and marginal rates that do not decrease sufficiently with income, the BO tax schedule is (possibly, weakly) progressive in terms of average rates.

Finding 3

The BO average tax rate is strictly increasing in income for \(\sigma \in \{0.25,0.5\}\) and, to a close approximation, weakly increasing in income for \(\sigma =1\).
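The mechanism behind this finding can be illustrated with a stylised example: when the tax schedule has a negative intercept (a UBI), the average rate can rise with income even if the marginal rate falls at some point. The Python sketch below uses a hypothetical two-piece schedule whose numbers are chosen for illustration only and are not the BO schedules from Table 1 or Figs. 1 and 2. The function names are likewise assumptions.

```python
def tax_liability(y, ubi, kinks, rates):
    """Continuous, piecewise linear tax T(y) with intercept -ubi.

    kinks: increasing income levels at which the marginal rate changes.
    rates: marginal rates on each piece, so len(rates) == len(kinks) + 1.
    """
    liability = -ubi
    lower = 0.0
    for kink, rate in zip(kinks, rates):
        if y <= kink:
            return liability + rate * (y - lower)
        liability += rate * (kink - lower)
        lower = kink
    return liability + rates[-1] * (y - lower)


def average_rate(y, ubi, kinks, rates):
    """Average tax rate T(y)/y for y > 0."""
    return tax_liability(y, ubi, kinks, rates) / y


# Hypothetical schedule: UBI of 10,000; marginal rate 0.5 up to 50,000, then 0.4.
schedule = dict(ubi=10_000.0, kinks=[50_000.0], rates=[0.5, 0.4])
profile = [average_rate(y, **schedule) for y in (20_000.0, 50_000.0, 100_000.0, 1_000_000.0)]
# profile is approximately [0.0, 0.3, 0.35, 0.395]
```

In this example the average rate above the kink equals \(0.4 - 5000/y\), which increases in income despite the drop in the marginal rate from 0.5 to 0.4. A larger drop in the marginal rate, or a smaller UBI, would reverse this.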

Furthermore, the following holds.

Finding 4

For any incomes \(y_1\) and \(y_2\) such that \(0< y_1< y_2 < 925653\), the difference between the BO average tax rate at \(y_2\) and at \(y_1\) is strictly decreasing in \(\sigma\) on \(\{0.25,0.5,1\}\).Footnote 23

Thus, the progressivity of the BO tax schedule is decreasing in \(\sigma\), at least at the income levels that are relevant for the vast majority of the population.Footnote 24 This occurs because (i) the BO UBI falls substantially as \(\sigma\) increases and (ii) abstracting from some minor exceptions at low incomes, at any income level the BO marginal tax rate weakly decreases as \(\sigma\) increases on \(\{0.25,0.5,1\}\). For \(\sigma =1\), the progressivity is attenuated to the point that the average tax rate is approximately flat for a wide range of incomes (for incomes between $32,878 and $925,653, to be precise).Footnote 25

Robustness checks

I explore the robustness of Findings 1–4 to the discretisation in condition (d) by redoing the numerical analysis for each of the following variations of that condition.

  1. (d1)

    Same as condition (d) except that \(n \le 3\) instead of \(n \le 4\).

  2. (d2)

    \(n \le 4\). Given n, \(t_i \in \{ -.8,-.6,-.4,-.2,0,.2,.4,.6,.8 \}\) for all \(i \in \{1,\ldots ,n\}\) and \(w_i \in \{ w(20),w(40),w(60),w(80),w(95),w(99) \}\) for all \(i \in \{1,\ldots ,n-1\}\).

  3. (d3)

    \(n \le 5\). Given n, \(t_i \in \{ -.8,-.6,-.4,-.2,0,.2,.4,.6,.8 \}\) for all \(i \in \{1,\ldots ,n\}\) and \(w_i \in \{ w(10),w(20),w(30),w(40),w(50),w(60),w(70),w(80),w(90),w(95),w(99),w(99.9) \}\) for all \(i \in \{1,\ldots ,n-1\}\).

  4. (d4)

    \(n = 3\). Letting \({\mathcal {A}}\) denote a set of 20 million points drawn from a uniform distribution on \(\{ (p_1,p_2,t_1,t_2,t_3) | 10 \le p_1<p_2 \le 99.99, -1 \le t_i <1 \text { for } 1\le i \le 3 \}\), \((w_1,w_2,t_1,t_2,t_3)\) are such that \((w_1,w_2,t_1,t_2,t_3)=(w(p_1),w(p_2),t_1,t_2,t_3)\) for some \((p_1,p_2,t_1,t_2,t_3) \in {\mathcal {A}}\).Footnote 26\(^,\)Footnote 27

The discretisations in (d1) and (d2) are coarsenings of the discretisation in (d). Relative to (d), (d3) coarsens the grid for the \(t_i\)’s, but allows for tax schedules with five pieces. The discretisation in (d4) is quite different in that the \(w_i\)’s and \(t_i\)’s are drawn randomly.

Findings 1–4 hold up well under (d1)–(d4).Footnote 28 In particular, Finding 1 continues to hold across the board.

Finding 2 also continues to hold under (d1), (d2), and (d4). Under (d3), the BO marginal tax rate at the highest incomes is not strictly higher than the BO marginal tax rates at all lower incomes for \(\sigma = 0.25\) either. However, this is only because of high BO marginal rates over the narrow income intervals [0, 4505] and [25406, 32510].

Finding 3 continues to hold with the following exceptions. Under (d2), the BO average tax rate for \(\sigma =1\) modestly declines from 0.355 to 0.27 between incomes $60,699 and $134,115. Under (d3), the BO average tax rate for \(\sigma =1\) modestly declines from 0.343 to 0.249 between incomes $46,434 and $134,115. Given the flatness of the BO average tax rate over these income ranges under (d) and the coarseness of the grids for the \(t_i\)’s under (d2) and (d3), these exceptions seem minor.

Finally, the results under (d1)–(d4) are roughly in line with Finding 4 and Fig. 2 in the sense that, under each of these conditions, the BO average-rate schedule rotates clockwise as \(\sigma\) increases. Having said this, there are some instances in which the BO average-rate schedule over a particular income range is not flatter for a higher value of \(\sigma\).Footnote 29

Comments on the WID data

A few comments regarding the WID data on pretax labour income are in order. First, this data is based on all individuals over age 20 and it counts income from public and private pensions as labour income. This is not ideal for the purpose of backing out productivities because the relationship between pension income and productivity is probably different from the relationship between a working-age individual’s labour income and productivity.

Second, income is split equally within couples, which forces us to treat spouses as having the same productivity. This seems preferable for the purposes of the current paper because it ensures that the same preference over tax schedules is imputed to both spouses.

Third, although using cross-sectional data on the distribution of annual income to back out productivities is common (e.g., see Saez (2001)), this probably leads us to exaggerate the dispersion in lifetime productivities. The latter are probably more relevant if we are concerned with the design of a long-term tax system.Footnote 30

Concluding remarks

This paper is an attempt to apply the idea of democracy, as embodied in the Borda count, to the optimal taxation of labour income. Undoubtedly, the analysis has important limitations. Notably, it relies on (i) a simple, static model of labour supply with quasilinear preferences and a constant elasticity of labour supply, (ii) finite discretisations of the set of feasible DMs, and (iii) imperfect data on pretax labour income. For these reasons, Findings 1–4 focus on qualitative aspects of the BO tax schedules and, even so, I view these findings as no more than indicative. More broadly, I hope the current paper will encourage research on BO public policies.