1 Introduction

Generations of economists have struggled with the question of the optimal degree of tax progressivity. In its modern form, this question was first posed by Vickrey (1945), who stated that a full characterization of the optimum “produces a completely unwieldy expression,” leading him to the conclusion that “the problem resists any facile solution.” Indeed, it took another quarter of a century until Mirrlees (1971, 1976) offered a first solution to the problem. The solution was obtained by way of an indirect approach: he first solved for the optimal allocation, subject to resource and incentive compatibility constraints, and only then determined the tax system that would implement this allocation. Ever since, this has been the dominant approach in the literature whenever it concerns nonlinear taxation (e.g., Stiglitz, 1982; Tuomala, 1990; Diamond, 1998).

The advantage of this indirect approach is its mathematical rigor. The problem of finding the optimal allocation conveniently lends itself to the toolbox of optimal control theory, yielding a mathematically well-defined procedure for solving it. But this solution procedure also harbors the main disadvantage of this indirect approach, namely the lack of intuition involved with the derivation of the optimal tax schedule. In reality, government does not exercise any direct control over individuals’ allocations, that is, over how much they work and how much they consume of every good in the economy. Instead, it controls the tax system. Interpreting the problem of optimal taxation as choosing the most preferred incentive-compatible allocation may well alienate the applied world of tax policy, as well as students, from the academic discipline of tax design. In the worst case, it could lead policy makers to disregard academic insights, and academics to focus too much on technical issues that might be of limited practical relevance. In short, it could reduce the practical impact of an academic field whose raison d’être is its potential for practical impact.Footnote 1

A more intuitive way of solving for optimal taxes is by directly considering the social welfare effects of changes in taxes rather than allocations. For optimal linear taxes, this has always been the dominant solution procedure (e.g., Diamond and Mirrlees, 1971; Sheshinski, 1972; Diamond, 1975; Dixit and Sandmo, 1977). The likely reason for this is that a linear tax can be captured by a single parameter, which allows for straightforward optimization techniques. The same techniques cannot directly be applied to solve for the optimal nonlinear tax schedule, as the object to be optimized is a function rather than a parameter. Some recent contributions have circumvented this problem heuristically (e.g., Saez, 2001; 2002; Piketty and Saez, 2013; Jacquet et al., 2013). They consider a small perturbation of the tax schedule and heuristically—i.e., verbally—deduce the social-welfare effects of this perturbation. Equating these social-welfare effects to zero solves for the optimum. To prove that their heuristic is valid, they subsequently show that their results correspond to results obtained by solving for the optimal incentive-compatible allocation. This last step is necessary as it may be unclear whether the heuristic derivation picked up on all the relevant welfare effects.

In what follows, I use the term “primal approach” to refer to the indirect approach of first solving for the optimal allocation. I use the term “dual approach” to describe the method of directly solving for optimal taxes.Footnote 2 I show how one can apply the dual approach to determine the optimal nonlinear income tax without relying on a verbal derivation of social-welfare effects. By doing so, I combine the intuitive appeal of the dual approach with the mathematical rigor of the primal approach. All that is needed is a minor adjustment to the definition of the tax schedule, which makes it amenable to simple optimization techniques.

The key to this adjustment is to recognize that a person’s tax burden can change for two different reasons: due to a change in his taxable income and due to a reform of the tax schedule. Thus, instead of defining a nonlinear tax as T(z), with z a person’s taxable income, I define it as \({T(z,\kappa )\equiv {\mathcal {T}}(z)+\kappa \tau (z)}\). Here, \(\kappa\) is an arbitrary parameter and \(\tau (z)\) is the schedule of any nonlinear tax reform one might want to consider. Writing social welfare in terms of \(T(z,\kappa )\), one can deduce the marginal welfare effects of a reform by simply taking the derivative with respect to the parameter \(\kappa\), and substituting for the specific reform of interest \(\tau (z)\). Expressions for the optimal nonlinear tax schedule are derived by optimizing over \(\kappa\) for any possible function \(\tau (z)\). In other words, at the optimum, social welfare is unaffected by any possible nonlinear reform of the tax schedule.

Beyond its intuitive appeal, a second advantage of the dual approach is that it allows for a large degree of flexibility regarding individual behavior. More specifically, I show that it is straightforward to account for heterogeneity not just in individuals’ income, but also in their responsiveness to tax reforms. Doing so, I replicate findings by Jacquet and Lehmann (2021), who apply the primal approach to show that standard optimal tax formulas are adjusted by using income-conditional average elasticities. Moreover, the dual approach can easily incorporate individual behavior that is not based on utility maximization. Utility maximization might not be an appropriate behavioral framework when individuals form mistaken beliefs about the shape of their budget curve or about the functional form of their own utility function. In that case, optimal tax formulas include a corrective term, prescribing higher marginal taxes for individuals who work “too much” and lower marginal taxes for individuals who work “too little.”Footnote 3 The importance of such a corrective term crucially depends on misoptimizers’ responsiveness to tax reforms.

Finally, I show how the dual approach can be applied to determine the welfare effects of tax reforms outside the tax optimum. Contrary to the primal approach, which deals with variations in allocations rather than tax schedules, the dual approach is ideally suited to study small nonlinear reforms of a given tax schedule. This is likely to be of more relevance to actual tax policy than a characterization of the optimum. Moreover, determining the desirability of a reform may be empirically less demanding than determining the optimal tax schedule. The reason for this is that the former depends in part on the responsiveness of taxable income at the actual tax system, whereas the latter depends on the responsiveness at the optimal tax system. While we typically cannot be certain about either of the two, it is arguably less problematic to use available elasticity estimates as measures of the responsiveness of taxable income at the actual tax system than as measures of the responsiveness in the optimum.

The contribution of this paper is mostly methodological and pedagogical in nature. The optimal-tax results are themselves not novel. However, they are typically derived in ways that are either mathematically daunting or verbal and therefore mathematically imprecise. The aim of this paper is to show the reader how known results on optimal taxation can be derived in a fairly simple but precise way. The hope is that this will contribute to a deeper understanding of these results among a broader audience.

Beyond the above-mentioned references, this paper relates to a number of earlier studies. To the best of my knowledge, Christiansen (1981, 1984) was the first to parameterize the nonlinear tax schedule to make it amenable to the analysis of tax reforms. His focus is on the evaluation of public projects and commodity taxation, however, and he does not consider a full characterization of the optimal nonlinear income tax—which is the focus of this study. More recently, Golosov et al. (2014) formalize the dual approach to optimal nonlinear income taxation in a dynamic model by applying Gateaux differentials with respect to the tax schedule; Hendren (2020) uses the dual approach to derive implicit welfare weights; and Spiritus et al. (2022) employ the dual approach to derive optimal taxes when households earn multiple incomes and differ across multiple dimensions. Finally, this paper also relates to earlier contributions that identify desirable tax reforms within any given non-optimal tax system (e.g., Tirole and Guesnerie, 1981; Weymark, 1981; Guesnerie, 1995; Bierbrauer et al., 2022).

Section 2 introduces the parameterization of the tax schedule, and Sect. 3 shows how this helps in deriving the welfare effects of any nonlinear tax reform. Section 4 derives expressions for optimal tax rates using the dual approach, allowing for preference heterogeneity and individuals who do not maximize their utility. Section 5 illustrates how the dual approach can be usefully applied to obtain insights into more limited tax reforms outside the optimum. Section 6 discusses the broader applicability of the dual approach and I wrap up with some concluding remarks.

2 Taxes, revenue, and social welfare

2.1 Parameterization of the tax schedule

I assume that individuals in the economy constitute a continuum \({\mathcal {I}}\) of unit mass, and that an individual \(i\in {\mathcal {I}}\) earns taxable income \(z^{i}\). I furthermore assume that \(\{z^{i}:i\in {\mathcal {I}}\}\) is a closed interval so that it is integrable over the population \({\mathcal {I}}\), and denote the cumulative distribution function of taxable income by H(z) and its density by h(z). A person’s income tax is denoted by \(T^{i}\) and depends on his taxable income. As such, the tax can be affected by both a change in income and a reform of the tax schedule. I formalize this by writing the income tax as the following function of gross income and a parameter \(\kappa\):

$$\begin{aligned} T^{i}\equiv T(z^{i},\kappa )={\mathcal {T}}(z^{i})+\kappa \tau (z^{i}), \end{aligned}$$
(1)

which is assumed to be twice differentiable in \(z^i\). This parameterization of the tax function is central to this paper, as it ensures that optimal tax rules can be derived with simple optimization techniques, similar to the case of optimal linear taxation. I refer to \(\kappa\) as the reform parameter, and to \(\tau (z^i)\) as the reform function or simply the reform. The reform parameter takes on an arbitrary value and the reform function depends on whatever reform of the tax schedule one would like to study. The function \({\mathcal {T}}(z^{i})\) is determined to ensure that \(T(z^{i},\kappa )\) gives the actual tax schedule around which a reform is evaluated. A marginal reform of the income tax can be studied by considering a change \(\textrm{d}\kappa\). For a given taxable income z, such a reform increases the tax burden by \(\tau (z)\textrm{d}\kappa\). As I allow the reform function to depend on z, I can analyze any nonlinear marginal reform of the tax schedule.
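To make the parameterization concrete, the following Python sketch implements Eq. (1) for a purely illustrative case. The flat 30% baseline schedule, the quadratic reform function, and all function names are assumptions introduced here for illustration and are not part of the analysis. The sketch checks numerically that, at a fixed income z, a marginal change in \(\kappa\) raises the tax burden by \(\tau (z)\).

```python
def baseline_tax(z):
    """Hypothetical baseline schedule (script T in Eq. 1): a flat 30% tax."""
    return 0.30 * z


def reform(z):
    """Hypothetical smooth reform function tau(z), rising progressively with income."""
    return 1e-7 * z ** 2


def tax(z, kappa):
    """Eq. (1): T(z, kappa) = baseline_tax(z) + kappa * tau(z)."""
    return baseline_tax(z) + kappa * reform(z)


def mechanical_effect(z, kappa=0.0, step=1e-6):
    """Finite-difference derivative of the tax burden in kappa, holding income fixed."""
    return (tax(z, kappa + step) - tax(z, kappa)) / step


if __name__ == "__main__":
    z = 80_000.0
    # The two numbers should coincide up to rounding error.
    print(mechanical_effect(z), reform(z))
```

Because the schedule is linear in \(\kappa\), the finite difference is exact up to floating-point rounding for any step size; income responses to the reform are deliberately held fixed here.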

2.2 The nature of behavioral responses to taxation

Studies on optimal taxation typically introduce a structural model of individual decision making that determines equilibrium levels of income \(z^i\). However, while behavioral responses to taxation are crucial determinants of optimal taxes, they could be captured by reduced-form elasticities without the need for a structural model. I therefore do not impose any specific model of individual behavior. Nevertheless, writing behavioral responses in terms of elasticities does require me to specify how changes in tax rates may affect individual income. To that end, I assume that \(z^i\) is differentiable in \(\kappa\). In other words, I rule out that marginal changes in the tax schedule lead to discrete changes in individuals’ taxable income. In the typical model of utility-maximizing individuals, this implies that individuals’ indifference curves are tangent to the budget curve at exactly one point and that there is no extensive margin.Footnote 4 I moreover assume that the derivative of \(z^i\) is integrable over the population \({\mathcal {I}}\).

2.3 The impact of a tax reform on individual taxes

The effect of a reform on an individual’s tax burden is obtained by taking the total derivative of Eq. (1):

$$\begin{aligned} \frac{\textrm{d}T^{i}}{\textrm{d}\kappa }=\tau (z^{i})+T_{z}^{i}\cdot \frac{\textrm{d}z^{i}}{\textrm{d}\kappa }, \end{aligned}$$
(2)

where a subscript denotes a partial derivative, such that \(T_{z}^{i}\equiv {\partial }T(z^{i},\kappa )/{\partial }z^{i}\) gives the marginal tax rate of an individual with income \(z^{i}\). An individual’s tax burden may be affected both directly by the reform of the tax schedule (first term) and indirectly by an income response to the reform (second term).

The same general point can be made for the change in the individual’s marginal tax rate, obtained by taking the partial derivative of Eq. (1) with respect to z, and subsequently taking the total derivative with respect to \(\kappa\):

$$\begin{aligned} \frac{\textrm{d}T_{z}^{i}}{\textrm{d}\kappa }=\tau _{z}(z^{i})+T_{zz}^{i}\cdot \frac{\textrm{d}z^{i}}{\textrm{d}\kappa }. \end{aligned}$$
(3)

The first term illustrates that the reform raises the marginal tax rate at income level \(z^i\) by \(\tau _z (z^i)\textrm{d}\kappa\). A reform-induced change in individual i’s taxable income further alters his marginal tax rate as long as the tax schedule is locally nonlinear (\(T^i_{zz}\ne 0\)). This latter effect is illustrated by the second term in Eq. (3).
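As an illustration of Eqs. (2) and (3), consider a hypothetical case that is not taken from the analysis above: a linear baseline schedule \({\mathcal {T}}(z)=tz\) with constant rate t, combined with the quadratic reform \(\tau (z)=z^{2}/2\). Then \(T_{z}^{i}=t+\kappa z^{i}\), \(T_{zz}^{i}=\kappa\), and \(\tau _{z}(z^{i})=z^{i}\), so that:

$$\begin{aligned} \frac{\textrm{d}T^{i}}{\textrm{d}\kappa }=\frac{(z^{i})^{2}}{2}+(t+\kappa z^{i})\cdot \frac{\textrm{d}z^{i}}{\textrm{d}\kappa },\qquad \frac{\textrm{d}T_{z}^{i}}{\textrm{d}\kappa }=z^{i}+\kappa \cdot \frac{\textrm{d}z^{i}}{\textrm{d}\kappa }. \end{aligned}$$

Evaluated at \(\kappa =0\), the schedule is linear, the second term of Eq. (3) vanishes, and the marginal tax rate changes only through the reform itself.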

2.4 Government revenue

Government revenue equals the integral of all individuals’ taxes and is given by:

$$\begin{aligned} {\mathcal {R}}\equiv \int _{{\mathcal {I}}}T(z^{i},\kappa )\textrm{d}i. \end{aligned}$$
(4)

I do not here concern myself with the expenditure side of the government, but as usual it is straightforward to allow for expenditures on public goods or on some exogenous spending requirement (cf. Christiansen, 1981). The effect on government revenue of a tax reform is obtained by taking the derivative of Eq. (4):

$$\begin{aligned} \frac{\textrm{d}{\mathcal {R}}}{\textrm{d}\kappa }=\int _{{\mathcal {I}}}\left( \tau (z^{i})+T_{z}^{i}\cdot \frac{\textrm{d}z^{i}}{\textrm{d}\kappa }\right) \textrm{d}i, \end{aligned}$$
(5)

which equals the integral of Eq. (2). The revenue effects of a tax reform can be decomposed into a mechanical effect and a behavioral effect on the tax base. The mechanical effect indicates that the reform raises an amount \(\tau (z^{i})\textrm{d}\kappa\) of resources from every individual \(i\in {\mathcal {I}}\). But a tax reform also tends to affect individuals’ taxable income, changing tax revenue by \(T^i_{z}\textrm{d}z^{i}\).

2.5 Individual utility

A benevolent social planner cares not only about revenue, but also about the utility of its citizens. Utility in this context refers to an individual’s actually experienced utility.Footnote 5 The utility of individual i is assumed to be an individual-specific function of his own after-tax and before-tax income, denoted by \(u^{i}(z^{i}-T^{i},z^{i})\).Footnote 6 For a given gross income, higher net income allows the individual to consume more and thus tends to raise his utility. For a given net income, higher gross income implies that the individual needs to exert more effort in earning income and thus tends to lower his utility. As the income tax is itself a function of gross income and the reform parameter, we can write utility as the following function:

$$\begin{aligned} U^{i}\equiv U^{i}(z^{i},\kappa )=u^{i}(z^{i}-T(z^{i},\kappa ),z^{i}). \end{aligned}$$
(6)

As with taxable income, I assume that utility and its derivatives are integrable over the population \({\mathcal {I}}\).

Individuals may or may not choose their income levels to maximize utility.Footnote 7 I define \(\omega ^{i}\) as the “behavioral wedge” between the marginal rate of substitution and the marginal rate of transformation between gross and net income:

$$\begin{aligned} \omega ^{i}\equiv \frac{-U_{z}^{i}}{u_{c}^{i}}=\left( \frac{-u_{z}^{i}}{u_{c}^{i}}-(1-T_{z}^{i})\right) , \end{aligned}$$
(7)

where \(u_{c}^{i}\equiv \partial u^i/\partial (z^i-T^i)\). The behavioral wedge measures the degree to which an individual works “too much” because of behavioral biases. The marginal rate of substitution (\(-u^i_z/u^i_c\)) measures the consumption-equivalent utility loss of raising gross income. The marginal rate of transformation (\(1-T_z^i\)) measures the consumption-equivalent utility gain of raising gross income. If individual i maximizes his utility, both terms cancel out and the behavioral wedge equals \(\omega ^i=0\). However, the behavioral wedge will be positive (\(\omega ^i>0\)) or negative (\(\omega ^i<0\)) if individual i works “too much” or “too little,” respectively. To be precise, individual i’s income would have corresponded to utility maximization had his marginal tax rate (\(T^i_z\)) been \(\omega ^i\) percentage points lower.
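The behavioral wedge in Eq. (7) can be illustrated numerically. The sketch below assumes, purely for illustration, a quasi-linear utility function \(u(c,z)=c-(z/w)^{1+1/\varepsilon }/(1+1/\varepsilon )\) and a flat marginal tax rate. The functional form, the parameter values, and the function names are hypothetical and not part of the analysis.

```python
def marginal_rate_of_substitution(z, wage, elasticity):
    """-u_z/u_c for the assumed utility u(c, z) = c - (z/wage)**(1 + 1/e) / (1 + 1/e)."""
    return (z / wage) ** (1.0 / elasticity) / wage


def behavioral_wedge(z, marginal_tax_rate, wage, elasticity):
    """Eq. (7): omega = MRS - (1 - T_z)."""
    return marginal_rate_of_substitution(z, wage, elasticity) - (1.0 - marginal_tax_rate)


def utility_maximizing_income(marginal_tax_rate, wage, elasticity):
    """Income at which MRS = 1 - T_z under the assumed utility function."""
    return wage ** (1.0 + elasticity) * (1.0 - marginal_tax_rate) ** elasticity


if __name__ == "__main__":
    wage, elasticity, rate = 20.0, 0.5, 0.30  # illustrative values
    z_star = utility_maximizing_income(rate, wage, elasticity)
    cases = [("optimum", z_star), ("too much", 1.1 * z_star), ("too little", 0.9 * z_star)]
    for label, z in cases:
        print(label, round(behavioral_wedge(z, rate, wage, elasticity), 4))
```

In this example the wedge is zero at the utility-maximizing income, positive for an individual who earns more than that, and negative for one who earns less, in line with the sign convention of Eq. (7).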

2.6 Social welfare

I assume a welfarist social objective, such that social welfare can be written as a (weighted) integral of individual utility:

$$\begin{aligned} {\mathcal {W}}\equiv \int _{{\mathcal {I}}}\gamma ^{i}U^{i}\textrm{d}i, \end{aligned}$$
(8)

where \(\gamma ^{i}\) is an individual-specific weight that determines the importance of individual i’s utility within the social objective. In the special case of a utilitarian social objective, \(\gamma ^{i}=\gamma\) for all i. The effect of a tax reform on the social objective is obtained by taking the derivative of Eq. (8) with respect to \(\kappa\). Doing so, while substituting for Eq. (7), yields:

$$\begin{aligned} \frac{\textrm{d}{\mathcal {W}}}{\textrm{d}\kappa }= -\int _{{\mathcal {I}}}\gamma ^{i}u_{c}^{i}\left( \tau (z^{i})+\omega ^{i}\cdot \frac{\textrm{d}z^{i}}{\textrm{d}\kappa }\right) \textrm{d}i. \end{aligned}$$
(9)

As with government revenue, a reform’s effect on social welfare can be decomposed into a mechanical effect and a behavioral effect. The first term within brackets, representing the mechanical effect, reflects the direct social welfare loss from reducing individuals’ net income by \(\tau (z^{i})\textrm{d}\kappa\). The second term within brackets represents the reform’s behavioral effect on social welfare. If the reform causes individuals to increase their gross income (\(\textrm{d}z^{i}/\textrm{d}\kappa >0\)), it reduces social welfare if their income is already chosen too high (\(\omega ^{i}>0\)) and raises social welfare if their income is chosen too low (\(\omega ^{i}<0\)). The opposite holds if the reform causes individuals to reduce their gross income (\(\textrm{d}z^{i}/\textrm{d}\kappa <0\)). Naturally, if individuals choose their tax base to maximize utility (\(\omega ^{i}=0\)), a reform only affects social welfare through its mechanical effect.
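The step from Eq. (8) to Eq. (9) can be spelled out as follows. Differentiating Eq. (6) with respect to \(\kappa\) at given income yields \(U_{\kappa }^{i}=-u_{c}^{i}\tau (z^{i})\), while Eq. (7) implies \(U_{z}^{i}=-u_{c}^{i}\omega ^{i}\). Treating the weights \(\gamma ^{i}\) as independent of \(\kappa\), which is implicit in Eq. (9), the total derivative of individual utility is:

$$\begin{aligned} \frac{\textrm{d}U^{i}}{\textrm{d}\kappa }=U_{\kappa }^{i}+U_{z}^{i}\cdot \frac{\textrm{d}z^{i}}{\textrm{d}\kappa }=-u_{c}^{i}\left( \tau (z^{i})+\omega ^{i}\cdot \frac{\textrm{d}z^{i}}{\textrm{d}\kappa }\right) . \end{aligned}$$

Multiplying by \(\gamma ^{i}\) and integrating over the population gives Eq. (9).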

3 Net social welfare effect of a tax reform

3.1 ...in terms of the behavioral policy response

The net social welfare effects of a tax reform account for its impact on both the social welfare function and government revenue. Denote the social marginal value of public resources by \(\lambda\). Furthermore, denote the social welfare weight of individual i by \(g^i\equiv \gamma ^iu^i_c/\lambda\). It equals the social marginal value of individual consumption in terms of public resources. I can now formulate the following Proposition.

Proposition 1

The marginal net social welfare effect of a nonlinear reform \(\tau (\cdot )\) is given by:

$$\begin{aligned} \frac{\textrm{d}{\mathcal {W}}/\lambda }{\textrm{d}\kappa }+\frac{\textrm{d}{\mathcal {R}}}{\textrm{d}\kappa }=\int _{\mathcal {I}}\left( (1-g^i)\tau (z^i)+(T^i_z-g^i\omega ^i)\frac{\textrm{d}z^i}{\textrm{d}\kappa }\right) \textrm{d}i. \end{aligned}$$
(10)

Proof

The left-hand side is the definition of the net social welfare effect of a tax reform. The right-hand side is obtained by substituting Eqs. (5) and (9). \(\square\)

Equation (10) decomposes the welfare impact of any reform into a redistributional and a behavioral effect. The first term within large brackets gives the redistributional gain of the tax reform. The government redistributes \(\tau (z^i)\textrm{d}\kappa\) resources away from every individual i, and toward the government budget. The net gain per unit of redistributed resources equals the revenue gain minus the utility loss (\(1-g^i\)). The second term within large brackets gives the behavioral effect of the reform. To the extent that a reform raises individual i’s gross income (\(\textrm{d}z^i / \textrm{d}\kappa >0\)), the behavioral effect raises government revenue proportional to the marginal tax rate (\(T^i_z\)) and lowers utility proportional to the behavioral wedge (\(\omega ^i\)). The behavioral wedge is weighted by the social welfare weight (\(g^i\)) to obtain the social value of the behavioral utility effect.

Proposition 1 writes the net social welfare effect in terms of the reform’s effect on taxable income \(\textrm{d}z^i/\textrm{d}\kappa\)—what Hendren (2016) dubs the policy response or policy elasticity. For most reforms, however, we typically lack empirical estimates of the policy response to that specific reform. This makes it impossible to directly calibrate Eq. (10). It is therefore useful to rewrite the net social welfare effect in terms of elasticities that we do often measure.

3.2 ...in terms of net-of-tax rate elasticities

Although we typically lack evidence on the policy response to any given reform, we do have estimates of how individuals’ taxable income responds to changes in their net-of-tax rates, as well as how it responds to changes in disposable income (e.g., Blundell and MaCurdy, 1999; Saez et al., 2012; Chetty, 2012). Writing the welfare effects of a tax reform in terms of these “known” elasticities requires us to impose more structure on the behavioral responses to taxation. In particular, it necessitates the assumption that an individual’s income only responds to changes in the “own” marginal tax rate (\(T_z^i\)) and tax burden (\(T^i\)).Footnote 8 In that case, we can define the policy response of taxable income as:

$$\begin{aligned} \frac{\textrm{d}z^{i}}{\textrm{d}\kappa }\equiv -\frac{z^{i}}{1-T_{z}^{i}}\left( e_{c}^{i}\tau _{z}(z^{i})+\eta ^{i}\cdot \frac{\tau (z^{i})}{z^{i}}\right) , \end{aligned}$$
(11)

where \(e^i_c\) and \(\eta ^i\) measure the responsiveness of individual i’s tax base to, respectively, changes in marginal tax rates and changes in the tax burden.

More specifically, the compensated elasticity \(e_c^i\) is defined as the percent change in taxable income (\(z^i\)) due to a reform that raises the marginal net-of-tax rate (\(1-T_z^i\)) by one percent, while leaving the tax burden unchanged. To see this, set \(\tau (z^i)=0\) and rearrange Eq. (11) to find:

$$\begin{aligned} e^i_c\equiv -\left. \frac{\textrm{d}z^i}{\tau _z(z^i)\textrm{d}\kappa }\frac{1-T_z^i}{z^i}\right| _{\tau (z^i)=0}. \end{aligned}$$
(12)

Notice that the exogenous change in the net of tax rate is given by \(-\tau _z(z^i)\textrm{d}\kappa\), so that \(e^i_c\) indeed measures the compensated net-of-tax rate elasticity of taxable income.

The income effect \(\eta ^i\) measures the change in disposable income due to a reform that lowers the tax burden by one unit, while leaving marginal taxes unchanged. To see this, set \(\tau _z(z^i)=0\) and rearrange Eq. (11) to find:

$$\begin{aligned} \eta ^i\equiv -\left. \frac{(1-T_z^i)\textrm{d}z^i}{\tau (z^i)\textrm{d}\kappa }\right| _{\tau _z(z^i)=0}. \end{aligned}$$
(13)

Notice that the exogenous change in the tax burden is given by \(\tau (z^i)\textrm{d}\kappa\), so that \(\eta ^i\) indeed measures the effect of an exogenous reduction in the tax burden.

The compensated elasticity and the income effect are defined as relative changes in income along the actual budget curve—as in Jacquet et al. (2013)—and not as changes along a linearized ‘virtual’ budget line—as in Saez (2001). That is, \(e^i_c\) and \(\eta ^i\) take into account that changes in taxable income affect an individual’s marginal tax rate, which in turn affects taxable income, and so on. The advantage of defining behavioral effects as moves along the actual budget curve is that it allows me to later on express the optimal tax schedule in terms of these elasticities and characteristics of the actual income distribution, rather than a virtual income distribution.Footnote 9
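Equations (11) to (13) can be checked against each other numerically. The following sketch evaluates the policy response in Eq. (11) for one individual and then recovers \(e^i_c\) and \(\eta ^i\) from reforms with \(\tau (z^i)=0\) and \(\tau _z(z^i)=0\), respectively. The function names and all parameter values are hypothetical and chosen only for illustration.

```python
def policy_response(z, marginal_rate, tau, tau_z, e_c, eta):
    """Eq. (11): dz/dkappa = -z / (1 - T_z) * (e_c * tau_z + eta * tau / z)."""
    return -z / (1.0 - marginal_rate) * (e_c * tau_z + eta * tau / z)


def implied_compensated_elasticity(z, marginal_rate, tau_z, dz_dkappa):
    """Eq. (12): valid for a reform with tau(z) = 0 at this income level."""
    return -dz_dkappa / tau_z * (1.0 - marginal_rate) / z


def implied_income_effect(marginal_rate, tau, dz_dkappa):
    """Eq. (13): valid for a reform with tau_z(z) = 0 at this income level."""
    return -(1.0 - marginal_rate) * dz_dkappa / tau


if __name__ == "__main__":
    z, rate, e_c, eta = 50_000.0, 0.40, 0.25, -0.10  # illustrative values
    # Reform that only changes the marginal rate at z (tau = 0, tau_z = 0.01).
    dz_sub = policy_response(z, rate, tau=0.0, tau_z=0.01, e_c=e_c, eta=eta)
    # Reform that only changes the tax burden at z (tau = 100, tau_z = 0).
    dz_inc = policy_response(z, rate, tau=100.0, tau_z=0.0, e_c=e_c, eta=eta)
    print(implied_compensated_elasticity(z, rate, 0.01, dz_sub))
    print(implied_income_effect(rate, 100.0, dz_inc))
```

The two printed values should reproduce the elasticity and income effect that were fed into Eq. (11), up to rounding.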

Decomposing the policy response in terms of elasticities with respect to the “own” tax rates yields the following Corollary.

Corollary 1

If income responds only to changes in “own” tax rates, the marginal net social welfare effect of a nonlinear reform \(\tau (\cdot )\) is given by:

$$\begin{aligned} \small \frac{\textrm{d}{\mathcal {W}}/\lambda }{\textrm{d}\kappa }+\frac{\textrm{d}{\mathcal {R}}}{\textrm{d}\kappa }=\int _{{\mathcal {I}}}\left( (1-g^{i})\tau (z^{i})-\frac{T_{z}^{i}-g^{i}\omega ^{i}}{1-T_{z}^{i}}\cdot (\eta ^{i}\tau (z^{i})+z^ie_c^i\tau _{z}(z^{i}))\right) \textrm{d}i. \end{aligned}$$
(14)

Proof

Substitute Eq. (11) into Eq. (10) to obtain Eq. (14). \(\square\)

A tax reform can be seen to have three effects on social welfare, illustrated in expression (14). The first term gives the redistributional gain as discussed above. The second term gives the behavioral income effects of a tax reform. As long as \(\eta ^{i}<0\), an increase in individual i’s tax burden (\(\tau (z^{i})>0\)) leads him to increase his income. This leads to an increase in tax revenue if the tax rate is positive (\(T_{z}^{i}>0\)) and a decrease in utility if the behavioral wedge is positive (\(\omega ^i>0\)) or an increase in utility if the behavioral wedge is negative (\(\omega ^i<0\)). The third term gives the behavioral substitution effects of a tax reform. An increase in the marginal income tax (\(\tau _{z}(z^{i})>0\)) leads to a reduction in taxable income as long as \(e_{c}^{i}>0\). This reduction leads to tax revenue losses (if \(T_{z}^{i}>0\)) and to utility gains (if \(\omega ^{i}>0\)) or losses (if \(\omega ^{i}<0\)).

Corollary 1 and Eq. (14) are central to the analysis in the rest of this paper. They determine both optimal taxes and the desirability of limited reforms outside the optimum. To see this, notice that taxes can only be set optimally if the marginal net social welfare effect of any reform is nil. Thus, the optimal tax schedule is determined by equating expression (14) to zero for any possible nonlinear tax reform \(\tau (\cdot )\). Indeed, the next section sheds more light on the optimal tax schedule by considering two specific reforms for which the net marginal social welfare gains are set to zero. Furthermore, expression (14) also plays a central role when considering limited tax reforms outside the optimum. Such a tax reform is desirable if and only if expression (14) is positive for that specific reform \(\tau (\cdot )\). In Sect. 5, I further elaborate on this.
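Expression (14) lends itself to direct numerical evaluation once the population is approximated by a finite sample. The sketch below does so with equal weights on each sampled individual, which stands in for the integral over a unit-mass population. The sample values, the function name, and the data layout are assumptions made for illustration only.

```python
def net_welfare_effect(people, tau, tau_z):
    """Sample analogue of Eq. (14).

    `people` is a list of dicts with keys z, T_z, g, omega, e_c, eta;
    `tau` and `tau_z` are functions of income describing the reform.
    """
    total = 0.0
    for p in people:
        redistribution = (1.0 - p["g"]) * tau(p["z"])
        distortion = (p["T_z"] - p["g"] * p["omega"]) / (1.0 - p["T_z"])
        behavior = p["eta"] * tau(p["z"]) + p["z"] * p["e_c"] * tau_z(p["z"])
        total += redistribution - distortion * behavior
    return total / len(people)  # equal weights approximate the unit-mass integral


if __name__ == "__main__":
    sample = [  # hypothetical individuals
        {"z": 20_000.0, "T_z": 0.30, "g": 1.3, "omega": 0.0, "e_c": 0.25, "eta": -0.10},
        {"z": 60_000.0, "T_z": 0.40, "g": 0.9, "omega": 0.0, "e_c": 0.25, "eta": -0.10},
        {"z": 150_000.0, "T_z": 0.50, "g": 0.5, "omega": 0.0, "e_c": 0.25, "eta": -0.10},
    ]
    # A uniform increase in the tax burden: tau = 1 and tau_z = 0 at every income.
    print(net_welfare_effect(sample, tau=lambda z: 1.0, tau_z=lambda z: 0.0))
```

A positive value indicates that the reform raises net social welfare at the margin, while at an optimal schedule the expression would be zero for every reform function.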

4 Optimal taxation

4.1 Reform 1: A uniform tax increase

Taxes are set optimally if no reform of the tax schedule can raise net social welfare. A full characterization of optimal tax rates can thus be obtained by equating expression (14) to zero for all possible reforms \(\tau (\cdot )\). To obtain more insight into what constitutes an optimal tax schedule, I focus here on two specific reforms. The first reform raises the tax burden uniformly across individuals by \(\textrm{d}\kappa\), such that \(\tau (z^{i})=1\) and \(\tau _{z}(z^{i})=0\) for all i. This reform is illustrated in Fig. 1, panel (a). Substituting the reform function into expression (14), while equating it to zero, yields:

$$\begin{aligned} \int _{{\mathcal {I}}}\left( 1-g^{i}-\frac{T_{z}^{i}-g^{i}\omega ^{i}}{1-T_{z}^{i}}\cdot \eta ^{i}\right) \textrm{d}i=0. \end{aligned}$$
(15)

As the reform leaves all marginal tax rates unchanged, it does not generate any substitution effect, and only affects social welfare through its redistribution from taxpayers to government and through behavioral income effects.

Fig. 1

Three reform functions \(\tau (z)\). Panel a illustrates reform 1, a uniform tax increase. Panel b illustrates reform 2, an increase in the marginal tax rate around one income level \(z^*\). Panel c illustrates reform 3, an increase in a bracket’s tax rate between \(z^a\) and \(z^b\)

To further interpret Eq. (15), it is useful to introduce a term to denote the social marginal value of individual i’s private resources in terms of public resources. This term is given by:

$$\begin{aligned} \alpha ^{i}\equiv g^{i}+\frac{T_{z}^{i}-g^{i}\omega ^{i}}{1-T_{z}^{i}}\cdot \eta ^{i}. \end{aligned}$$
(16)

Denoted in terms of public resources, a marginal unit increase in individual i’s income yields additional social utility of consumption equal to \(g^{i}\). On top of that, it induces an income effect on taxable income, causing a revenue effect equal to \(T_{z}^{i}\eta ^{i}/(1-T_{z}^{i})\), and a further social utility effect equal to \(-g^{i}\omega ^{i}\eta ^{i}/(1-T_{z}^{i})\). Taken together, \(\alpha ^{i}\) indicates how many resources government is willing to give up in order to provide individual i with an additional unit of income.Footnote 10 The pattern of \(\alpha ^{i}\) determines the social willingness to redistribute between any pair of individuals, i.e., the social planner values redistribution of resources from individual i to individual j if \(\alpha ^{i}<\alpha ^{j}\). I can now formulate the following proposition.

Proposition 2

In the tax optimum, the average social marginal value of private resources must equal the social marginal value of public resources:

$$\begin{aligned} \int _{{\mathcal {I}}}\alpha ^{i}\textrm{d}i=1. \end{aligned}$$
(17)

Proof

Substitute Eq. (16) into (15) and rearrange to obtain Eq. (17). \(\square\)
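The rearrangement in this proof can be made explicit. Using Eq. (16), the integrand of Eq. (15) equals \(1-\alpha ^{i}\). Because the population has unit mass, \(\int _{{\mathcal {I}}}\textrm{d}i=1\), so that:

$$\begin{aligned} \int _{{\mathcal {I}}}(1-\alpha ^{i})\textrm{d}i=0\quad \Longleftrightarrow \quad \int _{{\mathcal {I}}}\alpha ^{i}\textrm{d}i=\int _{{\mathcal {I}}}\textrm{d}i=1. \end{aligned}$$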

Proposition 2 implies that in the optimum, a marginal transfer of resources from everyone in the private sector to the public sector does not affect net social welfare. This simple optimality condition has important consequences for public policy. As documented by Jacobs (2018), it implies that the marginal cost of public funds—defined as the inverse of the left-hand side of Eq. (17)—equals one in the tax optimum.Footnote 11 As a result, evaluations of public projects should not inflate the financing costs of these projects simply because of the existence of distortive taxes. Since a nonlinear tax schedule implies that government has access to nondistortive taxes—as illustrated by the reform I consider here—distortions are irrelevant for the marginal financing costs of a project. This validates standard cost-benefit analyses (cf., Christiansen, 1981).

4.2 Reform 2: Raising the marginal tax rate

The second reform I consider raises the marginal tax rate over an infinitesimal interval above some income level \(z^*\), such that \(\tau _z(z^i)>0\) for \(z^*\le z^i<z^*+\textrm{d}z\). As a result, the tax burden remains constant for everyone below this interval, but is raised for everyone above the interval. That is, \(\tau (z^i)=0\) for all \(z^i<z^*\), and \(\tau (z^i)>0\) for \(z^i\ge z^*+\textrm{d}z\). This reform is illustrated in Fig. 1, panel (b). Substituting the reform function into the social welfare effects as given by Eq. (14), equating it to zero, and letting the interval \(\textrm{d}z\) go to zero, yields:

$$\begin{aligned} \small \int _{{\mathcal {I}}:z^{i}>z^{*}}\left( 1-g^{i}-\frac{T_{z}^{i}-g^{i}\omega ^{i}}{1-T_{z}^{i}}\cdot \eta ^{i}\right) \textrm{d}i= \lim _{\textrm{d}z\rightarrow 0} \int _{{\mathcal {I}}:z^*\le z^{i}<z^{*}+\textrm{d}z}\left( \frac{T_{z}^{i}-g^{i}\omega ^{i}}{1-T_{z}^{i}}\cdot z^{i}e_{c}^{i}\right) \frac{\textrm{d}i}{\textrm{d}z}. \end{aligned}$$
(18)

To derive this equation, I substituted for \(\tau _z(z^*)=\lim _{\textrm{d}z\rightarrow 0}\tau (z^*+\textrm{d}z)/\textrm{d}z\), which follows from the definition of the derivative.
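The intermediate step behind Eq. (18) can be sketched as follows, under the simplifying assumption that the reform raises the marginal tax rate by a constant amount within the interval. Write \(\Delta \tau \equiv \tau (z^{*}+\textrm{d}z)\) for the resulting increase in the tax burden above the interval; this symbol is introduced here only for exposition. Then \(\tau _{z}(z^{i})=\Delta \tau /\textrm{d}z\) inside the interval and \(\tau _{z}(z^{i})=0\) elsewhere, while \(\tau (z^{i})=\Delta \tau\) above the interval. Substituting into Eq. (14) and equating to zero gives, up to terms that vanish as \(\textrm{d}z\rightarrow 0\):

$$\begin{aligned} \Delta \tau \int _{{\mathcal {I}}:z^{i}\ge z^{*}+\textrm{d}z}\left( 1-g^{i}-\frac{T_{z}^{i}-g^{i}\omega ^{i}}{1-T_{z}^{i}}\cdot \eta ^{i}\right) \textrm{d}i-\frac{\Delta \tau }{\textrm{d}z}\int _{{\mathcal {I}}:z^{*}\le z^{i}<z^{*}+\textrm{d}z}\left( \frac{T_{z}^{i}-g^{i}\omega ^{i}}{1-T_{z}^{i}}\cdot z^{i}e_{c}^{i}\right) \textrm{d}i=0. \end{aligned}$$

Dividing by \(\Delta \tau\) and letting \(\textrm{d}z\rightarrow 0\) yields Eq. (18). The omitted terms stem from the tax-burden changes of individuals inside the interval, which are bounded by \(\Delta \tau\) and apply to a mass of individuals that shrinks to zero.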

4.2.1 ...when individuals maximize utility

To clarify the implications of Eq. (18) for optimal income taxes, I first concentrate on the special case in which individuals maximize utility, such that \(\omega ^{i}=0\) for all i. The number of individuals that face an increase in marginal taxes can be written as \(\lim _{\textrm{d}z\rightarrow 0}(H(z^*+\textrm{d}z)-H(z^*))=\lim _{\textrm{d}z\rightarrow 0}h(z^*)\textrm{d}z\). This allows me to write the average compensated elasticity of individuals with income level \(z^*\) as:

$$\begin{aligned} {\bar{e}}_{c}^{*}\equiv \lim _{\textrm{d}z\rightarrow 0}\int _{{\mathcal {I}}:z^*\le z^{i}<z^{*}+\textrm{d}z}e_{c}^{i}\frac{\textrm{d}i}{h(z^*)\textrm{d}z} \end{aligned}$$
(19)

Moreover, I define the average social marginal value of private resources of individuals who earn more than z as \({\bar{\alpha }}_{z^i>z}\equiv \int _{{\mathcal {I}}:z^{i}>z}\alpha ^i\textrm{d}i / \int _{{\mathcal {I}}:z^{i}>z}\textrm{d}i\). With the help of these definitions, I can now formulate the following Proposition.

Proposition 3

In the tax optimum with utility-maximizing individuals, the marginal tax rate at income level \(z^{*}\), denoted by \(T_{z}^{*}\equiv T_{z}(z^{*},\kappa )\), must satisfy the following condition:

$$\begin{aligned} \frac{T_{z}^{*}}{1-T_{z}^{*}}=\frac{1}{{\bar{e}}_{c}^{*}}\cdot \frac{1-H(z^{*})}{z^{*}h(z^{*})}\cdot \Big (1-{\bar{\alpha }}_{z^i>z^*} \Big ). \end{aligned}$$
(20)

Proof

Substituting \(\omega ^i=0\), Eq. (19), and the definition of \({\bar{\alpha }}_{z^i>z}\) into Eq. (18) and rearranging yields Eq. (20). \(\square\)
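The rearrangement in this proof can be spelled out in two steps. The sketch below rests on two assumptions that are consistent with Eqs. (18) and (20) but are not restated in this section: the population has unit mass, so that \(\int _{{\mathcal {I}}:z^{i}>z^{*}}\textrm{d}i=1-H(z^{*})\), and with \(\omega ^i=0\) the social marginal value of private resources takes the form \(\alpha ^{i}=g^{i}+\frac{T_{z}^{i}}{1-T_{z}^{i}}\eta ^{i}\).

$$\begin{aligned} \int _{{\mathcal {I}}:z^{i}>z^{*}}\left( 1-\alpha ^{i}\right) \textrm{d}i&=\big (1-H(z^{*})\big )\Big (1-{\bar{\alpha }}_{z^i>z^*}\Big ), \\ \lim _{\textrm{d}z\rightarrow 0}\int _{{\mathcal {I}}:z^*\le z^{i}<z^{*}+\textrm{d}z}\frac{T_{z}^{i}}{1-T_{z}^{i}}\cdot z^{i}e_{c}^{i}\frac{\textrm{d}i}{\textrm{d}z}&=\frac{T_{z}^{*}}{1-T_{z}^{*}}\cdot z^{*}{\bar{e}}_{c}^{*}h(z^{*}). \end{aligned}$$

The first line rewrites the left-hand side of Eq. (18), and the second line rewrites its right-hand side, using that everyone in the shrinking interval has income \(z^*\) and faces the same marginal tax rate \(T_z^*\). Equating the two and dividing by \({\bar{e}}_{c}^{*}z^{*}h(z^{*})\) gives Eq. (20).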

Equation (20) presents the canonical formula for the optimal marginal tax rate if people differ in their elasticity of taxable income. Variations of this result appear in Diamond (1998), Saez (2001), Piketty and Saez (2013), and Jacquet and Lehmann (2021). Those earlier contributions elaborately discuss the intuition behind the optimal tax formula.

The optimal marginal tax rate at income level \(z^*\) depends on four terms. It is decreasing in the average elasticity of the tax base \({{\bar{e}}}_c^*\) as this raises the distortive costs of the marginal tax rate. It is decreasing in the income concentration \(z^*h(z^*)\) as this raises the amount of income that is distorted by the marginal tax rate. It is increasing in the share of people with higher income \(1-H(z^*)\) as this increases the amount of redistribution caused by the marginal tax rate. And, finally, it is decreasing in the marginal value of private resources of the relatively rich \({\bar{\alpha }}_{z^i>z^*}\) as this lowers the value of redistribution.
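As a numerical illustration of these comparative statics, the following minimal sketch solves Eq. (20) for \(T_z^*\). The function name and the calibration are hypothetical: the example assumes a Pareto upper tail with shape parameter 2 (so that \((1-H(z^*))/(z^*h(z^*))=1/2\)), an average compensated elasticity of 0.25, and a zero social marginal value of private resources above \(z^*\).

```python
def optimal_marginal_rate(elasticity, hazard_ratio, alpha_above):
    """Solve Eq. (20) for the marginal tax rate T_z* at income z*.

    elasticity   : average compensated elasticity at z*
    hazard_ratio : (1 - H(z*)) / (z* h(z*))
    alpha_above  : average social marginal value of private resources
                   of individuals earning more than z*
    """
    # Right-hand side of Eq. (20), which equals T / (1 - T).
    rhs = (1.0 / elasticity) * hazard_ratio * (1.0 - alpha_above)
    # Invert T / (1 - T) = rhs to obtain T.
    return rhs / (1.0 + rhs)


# Hypothetical calibration: a Pareto tail with shape parameter 2 has
# (1 - H(z)) / (z h(z)) = 1 / 2 at every income level in the tail.
pareto_shape = 2.0
rate = optimal_marginal_rate(
    elasticity=0.25, hazard_ratio=1.0 / pareto_shape, alpha_above=0.0
)
print(f"optimal marginal rate: {rate:.4f}")
```

Raising the elasticity or the value of `alpha_above`, or lowering the hazard ratio, lowers the computed rate, in line with the four effects described above.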

4.2.2 ...when individuals do not maximize utility

Now, consider the general case in which individuals do not necessarily choose their tax base to maximize utility, so that \(\omega ^i\) might be nonzero. Before deriving the optimal tax formula, it is useful to define the income-conditional covariance between two variables as \(\textrm{cov}[x^i,y^i]\equiv \overline{x^iy^i}-{{\bar{x}}}^i {{\bar{y}}}^i\), where an overline indicates average values conditional on labor income \(z^i\). This definition allows us to rewrite part of the right-hand side of Eq. (18) as:

$$\begin{aligned} \small \lim _{\textrm{d}z\rightarrow 0} \int _{{\mathcal {I}}:z^*\le z^{i}<z^{*}+\textrm{d}z}g^i\omega ^ie^i_c\frac{\textrm{d}i}{\textrm{d}z}&= \overline{g^*\omega ^*e^*_c}h(z^*) \\&= \left( \overline{g^*\omega ^*e^*_c}-\overline{g^*\omega ^*}\bar{e}^*_c+\overline{g^*\omega ^*}{{\bar{e}}}^*_c -\bar{g}^*{\bar{\omega }}^*{{\bar{e}}}^*_c+{{\bar{g}}}^*{\bar{\omega }}^*\bar{e}^*_c \right) h(z^*) \nonumber \\&=\left( \bar{g}^*{\bar{\omega }}^*+\textrm{cov}[g^*,\omega ^*]+\textrm{cov}\left[ g^*\omega ^*,\frac{e^*_c}{\bar{e}^*_c}\right] \right) {{\bar{e}}}_c^*h(z^*). \nonumber \end{aligned}$$
(21)

The first line follows from the recognition that \(\int _{{\mathcal {I}}:z^*\le z^{i}<z^{*}+\textrm{d}z} x^i\textrm{d}i\) measures the sum of \(x^i\) over all individuals that earn around \(z^*\). Thus, division by the number of individuals \(h(z^*)\textrm{d}z\) yields the income-conditional average \(\overline{x^*}\). The second line follows from adding and subtracting identical terms. The third line follows from the definition of the income-conditional covariance. Substituting this back into Eq. (18) yields the following Proposition.

Proposition 4

In the tax optimum with individuals who might not maximize their own utility, the marginal tax rate at income level \(z^{*}\) must satisfy the following condition:

$$\begin{aligned} \small \frac{T_{z}^{*} - {\bar{g}}^{*} \bar{\omega }^{*} - \textrm{cov}[g^*,\omega ^*] - \textrm{cov}\left[ g^*\omega ^*,\frac{e_c^*}{{\bar{e}}_c^*}\right] }{1-T_z^*}= \frac{1}{{\bar{e}}_{c}^{*}}\cdot \frac{1-H(z^{*})}{z^{*}h(z^{*})}\cdot \Big ( 1- {\bar{\alpha }}_{z^i>z^*} \Big ). \end{aligned}$$
(22)

Proof

Substituting Eqs. (19) and (21), the definitions of the income distribution, and the average social value of private resources into Eq. (18), and rearranging yields Eq. (22). \(\square\)
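The decomposition in Eq. (21) that this proof relies on is an algebraic identity, which can be checked numerically. The sketch below draws a hypothetical sample of individuals who all earn \(z^*\), with arbitrary values for \(g^i\), \(\omega ^i\), and \(e_c^i\), and compares the income-conditional average \(\overline{g^*\omega ^*e^*_c}\) with the three-term expression in the last line of Eq. (21). The sample and its ranges are assumptions made purely for illustration.

```python
import random

random.seed(0)
n = 1000

# Hypothetical individuals with the same income z*: welfare weights,
# degrees of misoptimization, and compensated elasticities.
g = [random.uniform(0.5, 1.5) for _ in range(n)]
omega = [random.uniform(-0.2, 0.4) for _ in range(n)]
e_c = [random.uniform(0.1, 0.6) for _ in range(n)]


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    # Income-conditional covariance: mean of the product minus product of means.
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)


e_bar = mean(e_c)
g_omega = [gi * wi for gi, wi in zip(g, omega)]

# First line of Eq. (21), per individual at z* (i.e. without the factor h(z*)).
direct = mean([gw * ei for gw, ei in zip(g_omega, e_c)])

# Last line of Eq. (21), again without the factor h(z*).
decomposed = (
    mean(g) * mean(omega)
    + cov(g, omega)
    + cov(g_omega, [ei / e_bar for ei in e_c])
) * e_bar

print(direct, decomposed)
```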

Equation (22) gives the canonical formula for the optimal marginal tax rate with behavioral biases. Elaborate discussions of the intuition behind this result are provided by Kanbur et al. (2006), Gerritsen (2016), and Farhi and Gabaix (2020). Compared to the results in Eq. (20), there are three new terms on the left-hand side of the optimal tax formula. These terms indicate how the marginal tax rate at \(z^*\) should be adjusted to “correct” the behavior of individuals with income \(z^*\).

The first novel term (\({{\bar{g}}}^*{\bar{\omega }}^*\)) shows that marginal tax rates should be higher if individuals work too much (\({\bar{\omega }}^*>0\)) and government cares about their utility (\({{\bar{g}}}^*>0\)). The second novel term (\(\textrm{cov}[g^*,\omega ^*]\)) shows that taxes should be even higher if government cares more about the utility of people who overwork more (that is, if \(g^i\) is increasing in \(\omega ^i\) for individuals with income \(z^i=z^*\)).

Finally, the third term (\(\textrm{cov}[g^*\omega ^*,e_c^*/{\bar{e}}_c^*]\)) shows that the corrective argument for taxation depends on the tax responsiveness of biased individuals. In particular, it could be that individuals with larger deviations from utility maximization are less responsive to changes in tax rates.Footnote 12 This would imply that the degree of misoptimization is negatively correlated with behavioral elasticities (\(\textrm{cov}[g^*\omega ^*,e^*_c/ \bar{e}^*_c]<0\)) if individuals with income \(z^*\) mistakenly earn too much on average. Conversely, this correlation would be positive (\(\textrm{cov}[g^*\omega ^*,e^*_c/{{\bar{e}}}^*_c]>0\)) if they earn too little on average. This would mean that the covariance of the third novel term in Eq. (22) takes on the opposite sign of the first novel term (\({{\bar{g}}}^*{\bar{\omega }}^*\)). The corrective argument for taxation becomes weaker as a result, bringing optimal tax rates closer to the ones obtained with utility-maximizing individuals.
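To see how the three corrective terms shift the optimal rate, the sketch below solves Eq. (22) for \(T_z^*\). Writing the right-hand side of Eq. (22) as \(R\) and the sum of the three corrective terms as \(C\), the condition \((T_z^*-C)/(1-T_z^*)=R\) rearranges to \(T_z^*=(R+C)/(1+R)\). The function name and all parameter values are hypothetical and chosen only for illustration.

```python
def optimal_rate_with_bias(elasticity, hazard_ratio, alpha_above,
                           g_bar, omega_bar,
                           cov_g_omega=0.0, cov_gomega_elasticity=0.0):
    """Solve Eq. (22) for the marginal tax rate T_z* at income z*."""
    # Right-hand side of Eq. (22), identical to that of Eq. (20).
    rhs = (1.0 / elasticity) * hazard_ratio * (1.0 - alpha_above)
    # Sum of the three corrective terms on the left-hand side of Eq. (22).
    corrective = g_bar * omega_bar + cov_g_omega + cov_gomega_elasticity
    # (T - corrective) / (1 - T) = rhs  =>  T = (rhs + corrective) / (1 + rhs).
    return (rhs + corrective) / (1.0 + rhs)


# Hypothetical calibration shared by all three cases.
common = dict(elasticity=0.25, hazard_ratio=0.5, alpha_above=0.0)

no_bias = optimal_rate_with_bias(g_bar=1.0, omega_bar=0.0, **common)
overwork = optimal_rate_with_bias(g_bar=1.0, omega_bar=0.15, **common)
# Biased individuals respond less to taxes: the third term turns negative.
overwork_unresponsive = optimal_rate_with_bias(
    g_bar=1.0, omega_bar=0.15, cov_gomega_elasticity=-0.06, **common
)

print(no_bias, overwork, overwork_unresponsive)
```

In this example the negative covariance pulls the rate back toward the no-bias benchmark without reaching it, which is the weakening of the corrective argument described above.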

5 The desirability of limited reforms

5.1 Reform 3: Raising a bracket’s tax rate

Contrary to much of the literature on optimal taxation, actual tax policy is typically concerned with some limited tax reform rather than a search for the best possible tax system. Moreover, the actual tax system might be far from optimal so that the reform should be evaluated outside the tax optimum. The primal approach is ill-equipped to deal with these issues, as it is concerned with the effects of changes in allocations rather than changes in taxes. The dual approach, on the other hand, is ideally suited to deal with issues of actual tax policy. To see this, note that as long as small changes in tax rates lead to only small behavioral changes in income, the welfare effects identified in Eq. (14) are valid for any small reform \(\tau (z)\) and for any optimal or suboptimal initial allocation.Footnote 13

To show how the dual approach can directly generate insights for actual tax policy, I consider a reform that is part of a policy maker’s or politician’s typical range of policy options: a tax rate increase for a specific tax bracket.Footnote 14 Rather than focusing on the optimal level of the tax rate, I simply determine whether raising the rate is desirable or not, and how this depends on features of the actual, possibly suboptimal, tax system. For simplicity, I disregard income effects on the tax base (\(\eta ^i=0\)) and suboptimal behavior (\(\omega ^i=0\)).Footnote 15 Consider a tax bracket that applies to gross income between \(z^{a}\) and \(z^{b}\). A tax reform that raises this bracket’s tax rate by \(\textrm{d}\kappa\) can be modeled as \(\tau (z)=0\) for \(z<z^{a}\), \(\tau (z)=(z-z^{a})\) for \(z\in [z^{a},z^{b}]\), and \(\tau (z)=(z^{b}-z^{a})\) for \(z>z^{b}\). This indeed implies that \(\tau _z(z)=1\) for \(z\in [z^a,z^b]\) and \(\tau _z(z)=0\) otherwise. The reform is illustrated in Fig. 1, panel (c). Corollary 1 establishes that this reform raises net social welfare if and only if expression (14) is strictly positive. Substituting the reform into expression (14), we thus get the following desirability condition for increasing the bracket’s tax rate:

$$\begin{aligned} \small \int _{{\mathcal {I}}:z^{i}\in [z^{a},z^{b}]}( z^{i}-z^{a})( 1-g^{i}) \textrm{d}i&+\int _{{\mathcal {I}}:z^{i}>z^{b}}(z^{b}-z^{a})( 1-g^{i}) \textrm{d}i \\&\qquad >\int _{z^{a}}^{z^{b}} \frac{T_{z}}{1-T_{z}} \cdot {\bar{e}}_c \cdot z^{i}h(z^i) \cdot \textrm{d}z^i, \nonumber \end{aligned}$$
(23)

where I substituted for the income density on the right-hand side. The left-hand side of Eq. (23) represents the redistributional benefits of the reform. It gives the difference between the social marginal value of public resources and the social marginal value of private resources for every mechanical unit of tax revenue raised from individuals within the bracket (first integral) and from individuals above the bracket (second integral). Specifically, an individual i within the bracket sees his tax burden increase by \((z^i-z^a)\textrm{d}\kappa\), whereas the tax burden of an individual i above the tax bracket increases by \((z^b-z^a)\textrm{d}\kappa\). The total redistributional benefits of the reform generally depend on welfare weights \(g^i\), which ultimately makes desirability a matter of political judgment.Footnote 16
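The desirability condition in Eq. (23) can be evaluated directly on micro data. The following minimal sketch does so for a hypothetical sample of four individuals, each described by gross income, welfare weight, marginal tax rate, and compensated elasticity. Integrals over individuals are replaced by sample averages, and the right-hand side is computed in its per-individual form, summing \(\frac{T_z^i}{1-T_z^i}e_c^iz^i\) over individuals within the bracket. All numbers, including the welfare weights, are assumptions made for illustration.

```python
def bracket_reform_effects(people, z_a, z_b):
    """Return (redistributional benefit, distortionary cost) per capita
    of raising the tax rate on the bracket [z_a, z_b], as in Eq. (23)."""
    benefit = 0.0
    cost = 0.0
    for z, g, t_z, e_c in people:
        # Reform function: 0 below the bracket, z - z_a inside, z_b - z_a above.
        tau = min(max(z - z_a, 0.0), z_b - z_a)
        benefit += tau * (1.0 - g)
        # Only individuals inside the bracket face a higher marginal rate.
        if z_a <= z <= z_b:
            cost += t_z / (1.0 - t_z) * e_c * z
    n = len(people)
    return benefit / n, cost / n


# Hypothetical sample: (income, welfare weight g, marginal tax rate, elasticity).
people = [
    (20000.0, 1.4, 0.3, 0.2),
    (40000.0, 1.0, 0.4, 0.2),
    (60000.0, 0.8, 0.4, 0.3),
    (100000.0, 0.5, 0.5, 0.3),
]

benefit, cost = bracket_reform_effects(people, z_a=35000.0, z_b=70000.0)
desirable = benefit > cost
print(benefit, cost, desirable)
```

With different welfare weights the same reform can become undesirable, which reflects the dependence on political judgment noted above.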

Whereas the redistributional benefits of the reform crucially depend on political values, we can say more about the distortionary costs of the reform, given by the right-hand side of Eq. (23). As usual, these costs are increasing with the responsiveness of the tax base, as measured by the compensated elasticity, the marginal tax wedges within the bracket, and the amount of income that falls within the bracket. Notice, however, that the distortionary costs do not simply equal the product of these three factors’ averages. As can be seen from Eq. (23), it also matters how these factors are correlated. This issue is sidestepped by almost every study that measures the distortionary costs of raising the tax rate within a certain income interval. That is, the literature typically assumes that both the marginal tax rates and the elasticity are constant over the interval of interest. In that case, the marginal distortionary costs indeed reduce to the product of the elasticity, the tax wedge, and the amount of income within the interval.Footnote 17

In reality, tax schedules tend to be highly nonlinear, causing this approach to yield biased estimates of the marginal distortionary costs of taxation. Nonlinearities in actual tax schedules stem from means-tested welfare programs such as an earned income tax credit, rental support, or child benefits, as well as different tax brackets. As means-tested benefits are being phased out with income, marginal taxes tend to decline with income from relatively high rates in the phase-out interval to relatively low rates in the phased-out interval. The same income range is typically associated with increasing income concentrations. High tax wedges then coincide with low income concentrations, and vice versa. Equation (23) then tells us that the distortionary costs of a bracket’s tax rate are lower than the product of the average tax wedge and the bracket’s total income would suggest if this bracket overlaps with the phase-out (and phased-out) income interval of means-tested welfare programs.
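The role of the correlation between tax wedges and income concentrations can be illustrated numerically. The sketch below evaluates the right-hand side of Eq. (23) for a hypothetical bracket in which the effective marginal tax rate falls from 70% to 40% halfway through the bracket, while \(z\,h(z)\) rises linearly with income (in arbitrary units). It then compares this with the common shortcut that multiplies the average tax wedge, the elasticity, and the total income in the bracket. The schedule, the density, and the constant elasticity are all assumptions.

```python
def tax_wedge(z):
    # Hypothetical schedule: a 70% effective marginal rate while benefits are
    # phased out (below 30,000) and 40% once they are fully phased out.
    rate = 0.7 if z < 30000.0 else 0.4
    return rate / (1.0 - rate)


def income_concentration(z):
    # Hypothetical z * h(z) in arbitrary units, increasing over the bracket.
    return z / 40000.0


z_a, z_b = 20000.0, 40000.0
elasticity = 0.25  # assumed constant over the bracket
steps = 1000
dz = (z_b - z_a) / steps
midpoints = [z_a + (k + 0.5) * dz for k in range(steps)]

# Right-hand side of Eq. (23), integrated with the midpoint rule.
exact_cost = sum(
    tax_wedge(z) * elasticity * income_concentration(z) * dz for z in midpoints
)

# Shortcut: average wedge times elasticity times total income in the bracket.
average_wedge = sum(tax_wedge(z) for z in midpoints) / steps
bracket_income = sum(income_concentration(z) * dz for z in midpoints)
approx_cost = average_wedge * elasticity * bracket_income

print(exact_cost, approx_cost)
```

In this example the shortcut overstates the distortionary costs, because the high tax wedges apply where relatively little income is located.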

6 Broader applicability of the dual approach

The focus of this paper has been on illustrating how the dual approach can be applied to solve for optimal nonlinear income taxes. I show this within a standard context with individuals that only make one intensive-margin decision on the size of their tax base—while allowing for heterogeneous preferences and individual utility misoptimization. However, the dual approach is versatile enough to be much more broadly applicable. In what follows, I therefore illustrate how the above analysis can be adjusted to take into account various nonlinear reforms outside the optimum, multiple intensive decision margins, a participation margin, and multiple tax bases that are subject to separate nonlinear tax schedules.

Nonlinear reforms outside the optimum—The third reform in the previous section just looked at one specific tax reform that might be relevant for actual policy making. That reform was essentially linear—raising the proportional tax rate of a specific bracket—though evaluated within the context of an actual nonlinear schedule of effective marginal tax rates. However, the dual approach can be readily applied to more complicated nonlinear reforms that play a role in actual policy discussions. For example, one could analyze different types of phase-out schedules for the EITC or other welfare programs, or changes to a quadratic tax schedule.Footnote 18 Is it better to phase out the EITC at a linear rate—raising effective marginal tax rates by the same amount across the phase-out range—or at an increasing or decreasing rate? Introducing an increasing phase-out rate within the range \([z^{a},z^{b}]\) could be modeled with a specific reform function \(\tau (z)\) with \(\tau _z(z)>0\) and increasing over the phase-out range. Conversely, a decreasing phase-out rate could be modeled with a reform function that has \(\tau _z(z)>0\) and decreasing over the phase-out range. As before, substituting these reforms into Eq. (14) allows one to readily evaluate the welfare consequences of either phase-out function for any arbitrary initial tax schedule.
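As a minimal sketch of such a comparison, the code below constructs three hypothetical reform functions on a range \([z^a,z^b]\): a linear one, one with an increasing \(\tau _z\), and one with a decreasing \(\tau _z\). All three are normalized so that the tax burden above the range rises by the same amount. The welfare effect is computed as the sample average of \(\tau (z^i)(1-g^i)-\frac{T_z^i}{1-T_z^i}e_c^iz^i\tau _z(z^i)\), which is the form implied by Eq. (23) when \(\eta ^i=\omega ^i=0\); Eq. (14) itself is not restated here, so this integrand should be read as an assumption. The functional forms and the sample are illustrative only.

```python
def make_reform(kind, z_a, z_b):
    """Return (tau, tau_z) for a hypothetical reform on the range [z_a, z_b]."""
    width = z_b - z_a

    def position(z):
        # Share of the range already passed at income z, clipped to [0, 1].
        return min(max((z - z_a) / width, 0.0), 1.0)

    def inside(z):
        return z_a <= z <= z_b

    if kind == "linear":
        return (lambda z: width * position(z),
                lambda z: 1.0 if inside(z) else 0.0)
    if kind == "increasing":
        return (lambda z: width * position(z) ** 2,
                lambda z: 2.0 * position(z) if inside(z) else 0.0)
    if kind == "decreasing":
        return (lambda z: width * (2.0 * position(z) - position(z) ** 2),
                lambda z: 2.0 * (1.0 - position(z)) if inside(z) else 0.0)
    raise ValueError(f"unknown reform kind: {kind}")


def welfare_effect(people, tau, tau_z):
    """Per-capita welfare effect, assuming no income effects and no biases."""
    total = 0.0
    for z, g, t_z, e_c in people:
        total += tau(z) * (1.0 - g) - t_z / (1.0 - t_z) * e_c * z * tau_z(z)
    return total / len(people)


# Hypothetical sample: (income, welfare weight g, marginal tax rate, elasticity).
people = [
    (20000.0, 1.4, 0.3, 0.2),
    (40000.0, 1.0, 0.4, 0.2),
    (60000.0, 0.8, 0.4, 0.3),
    (100000.0, 0.5, 0.5, 0.3),
]

effects = {
    kind: welfare_effect(people, *make_reform(kind, 35000.0, 70000.0))
    for kind in ("linear", "increasing", "decreasing")
}
print(effects)
```

The ranking of the three reforms in this output is specific to the assumed sample; with other welfare weights, elasticities, or initial tax rates it can reverse.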

Multiple intensive margins—It is straightforward to allow individuals to make more decisions than only the one that determines their tax base. As long as these decisions are unobservable to the tax authority, and therefore untaxed, the analysis remains unchanged in the case of utility-maximizing individuals. Then, even if a tax reform affects individual behavior on these additional decision margins, this does not affect their utility (because of individual utility maximization), nor does it affect government revenue (because the additional decisions are untaxed).

This convenient conclusion no longer holds if individuals do not perfectly maximize utility when making these additional decisions. To see this, notice that the term \(\omega ^{i}\) enters Eq. (14) as a welfare effect of the tax reform. With multiple decision margins, similar terms for every decision margin would enter Eq. (14), thereby yielding multiple corrective reasons for marginal taxes. As a simple example, imagine that individuals perfectly maximize utility when deciding on their (taxed) labor income, but mistakenly consume too much and save too little of their earned income. Then if future consumption is complementary with leisure, higher labor income taxes would be helpful in correcting individuals’ savings decision even though there is no need for a labor-supply correction.

Participation margin—The analysis can further be adapted to allow for a participation margin. For simplicity, I only consider the standard case in which individuals with the same income have the same intensive-margin elasticities, and in which individuals maximize their utility. The latter assumption ensures that a small tax reform only mechanically affects individuals’ utility due to changes in tax burdens, but not through behavioral changes. As a result, a reform of the marginal income tax affects individuals’ utility in essentially the same way as in the case without a participation margin. I can therefore focus attention on how adding a participation margin affects a reform’s effect on government revenue.

For this, I redefine \(z^{i}\) as the “notional tax base,” i.e., the tax base individual i would choose if he decides to participate. His actual tax base when deciding not to participate equals 0. I furthermore introduce a parameter \(\pi ^{i}(\kappa )\) that indicates the share of labor market participants among individuals with notional income \(z^i\). The government budget can then be rewritten as:

$$\begin{aligned} {\mathcal {B}}=\int _{{\mathcal {I}}}\left( \pi ^{i}(\kappa )T(z^{i},\kappa )+(1-\pi ^{i}(\kappa ))T(0,\kappa )\right) \textrm{d}i, \end{aligned}$$
(24)

which gives the integral over participants’ and non-participants’ tax burdens. Taking derivatives, the effect of a marginal tax reform on government revenue can be seen to equal:

$$\begin{aligned} \frac{\textrm{d}{\mathcal {B}}}{\textrm{d}\kappa }=\int _{{\mathcal {I}}}\left( \pi ^{i}(\kappa )\left( \tau (z^{i})+T^i_{z}\frac{\textrm{d}z^{i}}{\textrm{d} \kappa }\right) +(1-\pi ^{i}(\kappa ))\tau (0)+\left( T^i-T^0\right) \frac{\textrm{d}\pi ^{i}}{\textrm{d}\kappa }\right) \textrm{d}i, \end{aligned}$$
(25)

with \(T^0\equiv T(0,\kappa )\). Thus, the reform yields mechanical revenue changes for both participants and non-participants, an intensive behavioral effect on the tax base (\(\textrm{d}z^{i}/\textrm{d}\kappa\)), and an extensive behavioral effect on the tax base (\(\textrm{d}\pi ^{i}/\textrm{d}\kappa\)). The latter behavioral response would typically be unaffected by changes in marginal taxes, but responsive to changes in average tax rates. As a result, the total welfare effect of an increase in the marginal tax rate at \(z^{*}\) now includes the reduced government revenue due to lower participation rates among individuals whose notional income exceeds \(z^{*}\). This additional cost of taxation should be taken into account in the optimum and tends to reduce optimal marginal tax rates.
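The decomposition in Eq. (25) can be verified numerically against a finite-difference derivative of the budget in Eq. (24). The sketch below does this for a hypothetical three-person economy in which notional incomes and participation shares respond linearly to \(\kappa\). The baseline tax schedule, the reform function, and the behavioral responses are all assumptions chosen for illustration.

```python
def baseline_tax(z):
    # Hypothetical initial schedule T(z, 0), with a transfer of 2000 at z = 0.
    return -2000.0 + 0.3 * z + 1e-6 * z * z


def baseline_marginal_rate(z):
    # Derivative of baseline_tax with respect to z.
    return 0.3 + 2e-6 * z


def reform(z):
    # Hypothetical reform function tau(z).
    return 100.0 + 0.1 * z


def tax(z, kappa):
    # Parameterized schedule T(z, kappa) = T(z, 0) + kappa * tau(z).
    return baseline_tax(z) + kappa * reform(z)


# (notional income, dz/dkappa, participation share, dpi/dkappa)
people = [
    (20000.0, -3000.0, 0.70, -0.10),
    (50000.0, -6000.0, 0.90, -0.05),
    (90000.0, -9000.0, 0.95, -0.02),
]


def budget(kappa):
    """Per-capita government budget, Eq. (24)."""
    total = 0.0
    for z, dz, pi, dpi in people:
        z_k = z + kappa * dz
        pi_k = pi + kappa * dpi
        total += pi_k * tax(z_k, kappa) + (1.0 - pi_k) * tax(0.0, kappa)
    return total / len(people)


def revenue_effect():
    """Per-capita revenue effect at kappa = 0, Eq. (25)."""
    total = 0.0
    for z, dz, pi, dpi in people:
        mechanical = pi * reform(z) + (1.0 - pi) * reform(0.0)
        intensive = pi * baseline_marginal_rate(z) * dz
        extensive = (baseline_tax(z) - baseline_tax(0.0)) * dpi
        total += mechanical + intensive + extensive
    return total / len(people)


step = 1e-4
numeric = (budget(step) - budget(-step)) / (2.0 * step)
analytic = revenue_effect()
print(numeric, analytic)
```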

Multiple tax bases—The dual approach can also be fruitfully employed to study the desirability of other types of government policy in combination with a nonlinear tax schedule. For linear commodity taxation and public good provision, this has previously been illustrated by Christiansen (1981, 1984). But one can also deal with multiple nonlinear tax schedules as in the case of labor-income and capital-income taxes (e.g., Gerritsen et al., 2022). For example, let \(T^{z}\) denote a nonlinear labor-income tax with tax base z, and \(T^{y}\) a nonlinear capital-income tax with tax base y. Similar to the analysis above, both nonlinear taxes can be parameterized as \(T^{z}(z,\kappa ^{z})\) and \(T^{y}(y,\kappa ^{y})\) to allow for straightforward welfare analysis of any nonlinear reform of either tax.

7 Conclusion

This paper develops a method to solve for the optimal nonlinear income tax based on the dual approach. The procedure is intuitive and remains close in spirit to actual tax policy. It moreover relies on optimization techniques that are well-known to any undergraduate student of economics, which should make it easier to convey key results to policy makers and students, as well as other academic scholars. I showed that the approach can be applied to not only obtain well-known results in a more intuitive way, but also to solve for optimal nonlinear taxes when individuals have heterogeneous preferences and when they do not perfectly maximize their utility. It moreover allows one to gain new insights into the welfare effects of limited tax reforms outside the optimum, something for which the primal approach is especially ill-suited. I furthermore sketched how the dual approach can be applied to deal with nonlinear tax reforms outside the optimum, and with multiple decision margins, a participation margin, and multiple nonlinear tax bases.