
Optimal tax problems with multidimensional heterogeneity: a mechanism design approach

  • Original Paper
  • Social Choice and Welfare

Abstract

We propose a new method, which we call an allocation perturbation, to derive optimal nonlinear income tax schedules when individuals have multidimensional characteristics on which taxes cannot be conditioned. It is well established that, when individuals differ in their preferences as well as in their skills, optimal marginal tax rates can be negative. In contrast, we show that when individuals differ in their behavioral responses and skills, optimal marginal tax rates are positive under both utilitarian and maximin social welfare criteria.


Fig. 1, Fig. 2 and Fig. 3 (images not reproduced here)


Notes

  1. John Weymark has contributed extensively to the literature on optimal income taxation; see, e.g., Weymark (1986), Weymark (1987) and Brett and Weymark (2011).

  2. Our paper studies the optimal tax system when individual characteristics, although observable by the tax authority, cannot be used as tags (Akerlof 1978) for legal or horizontal-equity reasons.

  3. Our definition of “group” is identical to the one in Werning (2007), p. 13.

  4. A smoothly increasing (decreasing) function is also called an increasing (decreasing) diffeomorphism for which the derivative maps the positive real line onto itself.

  5. In (Jacquet and Lehmann 2021, Proposition 5), we show that assuming a smoothly-increasing-in-types allocation amounts to assuming (i) that the tax function \(T(\cdot )\) is twice differentiable, (ii) that for all \((w,\theta )\in \mathbb {R}_+^*\times \Theta\), the second-order condition associated with the individual maximization program holds strictly, and (iii) that for all \((w,\theta )\in \mathbb {R}_+^*\times \Theta\), the function \(y\mapsto \mathscr {U}\left( y-T(y),y;w,\theta \right)\) admits a unique global maximum over \(\mathbb {R}_+\).

  6. For instance, we never found cases where the second-order incentive-compatibility constraints were violated in the large set of simulations we ran on US data with taxpayers differing in gender and labor supply elasticities; see Jacquet and Lehmann (2021).

  7. More precisely, on the left-hand side of Eq. (14a), the term \(-\frac{v_{y}\left[ w,\theta \right] }{w\cdot v_{yw}\left[ w,\theta \right] }\), which equals the ratio of \(\varepsilon (w,\theta )\) to \(\alpha (w,\theta )\) [see Eq. (35) in the Appendix], is weighted by the skill times the conditional density, \(W(w,\theta )\ f(W(w,\theta )|\theta )\). On the right-hand side of (14a), which encapsulates the mechanical and income effects, the weights are the conditional skill densities.

  8. In Hellwig (2007), under a utilitarian criterion, positive optimal tax rates are obtained with more general preferences.

  9. If the utility function \(u(\cdot )\) in (1) were parameterized by both w and \(\theta\) while \(v(\cdot )\) were parameterized by w only, individuals who earn the same income would have distinct social marginal welfare weights. This could lead to negative marginal tax rates. Similarly, if both \(u(\cdot )\) and \(v(\cdot )\) were parameterized by w and \(\theta\), one would also expect negative marginal tax rates. Note that our method cannot be used in this framework, since the pooling function (10) cannot depend simultaneously on Y and C.

  10. Hence function \({\underline{W}}(\cdot ,\theta )\) coincides with the pooling function \(W(\cdot ,\theta )\).

  11. Indeed, at \(m=0\), \(\mathscr {Y}^R_y\) no longer depends on the direction R of the tax reform.

References

  • Akerlof GA (1978) The economics of tagging as applied to the optimal income tax, welfare programs, and manpower planning. Am Econ Rev 68(1):8–19

  • Blumkin T, Sadka E, Shem-Tov Y (2015) International tax competition: zero tax rate at the top re-established. Int Tax Public Finance 22(5):760–776

  • Boadway R, Marchand M, Pestieau P, del Mar Racionero M (2002) Optimal redistribution with heterogeneous preferences for leisure. J Public Econ Theory 4(4):475–498

  • Brett C, Weymark JA (2003) Financing education using optimal redistributive taxation. J Public Econ 87(11):2549–2569

  • Brett C, Weymark JA (2011) How optimal nonlinear income taxes change when the distribution of the population changes. J Public Econ 95(11):1239–1247

  • Choné P, Laroque G (2010) Negative marginal tax rates and heterogeneity. Am Econ Rev 100(5):2532–2547

  • Cremer H, Lozachmeur J-M, Pestieau P (2012) Income taxation of couples and the tax unit choice. J Popul Econ 25(2):763–778

  • Cuff K (2000) Optimality of workfare with heterogeneous preferences. Can J Econ 33(1):149–174

  • Diamond P (1998) Optimal income taxation: an example with U-shaped pattern of optimal marginal tax rates. Am Econ Rev 88(1):83–95

  • Gomes R, Lozachmeur J-M, Pavan A (2018) Differential taxation and occupational choice. Rev Econ Stud 85(1):511–557

  • Guesnerie R (1995) A contribution to the pure theory of taxation. Cambridge University Press, Cambridge

  • Hammond P (1979) Straightforward individual incentive compatibility in large economies. Rev Econ Stud 46(2):263–282

  • Hellwig MF (2007) A contribution to the theory of optimal utilitarian income taxation. J Public Econ 91(7):1449–1477

  • Hendren N (2020) Measuring economic efficiency using inverse-optimum weights. J Public Econ 187

  • Jacquet L, Lehmann E (2013) Optimal redistributive taxation with both extensive and intensive responses. J Econ Theory 148(5):1770–1805

  • Jacquet L, Lehmann E (2021) Optimal income taxation with composition effects. J Eur Econ Assoc 19(2):1299–1341

  • Kleven HJ, Kreiner CT, Saez E (2009) The optimal income taxation of couples. Econometrica 77(2):537–560

  • Lehmann E, Simula L, Trannoy A (2014) Tax me if you can! Optimal nonlinear income tax between competing governments. Q J Econ 129(4):1995–2030

  • Lockwood BB, Weinzierl M (2015) De Gustibus non est Taxandum: heterogeneity in preferences and optimal redistribution. J Public Econ 124:74–80

  • Mirrlees J (1971) An exploration in the theory of optimum income taxation. Rev Econ Stud 38(2):175–208

  • Piketty T (1997) La Redistribution fiscale contre le chômage. Revue française d’économie 12(1):157–203

  • Ramsey FP (1927) A contribution to the theory of taxation. Econ J 37(145):47–61

  • Rochet J (1985) The taxation principle and multi-time Hamilton-Jacobi equations. J Math Econ 14(2):113–128

  • Rochet J-C, Stole LA (1998) Ironing, sweeping, and multidimensional screening. Econometrica 66(4):783–826

  • Rochet J-C, Stole LA (2002) Nonlinear pricing with random participation. Rev Econ Stud 69(1):277–311

  • Rothschild C, Scheuer F (2013) Redistributive taxation in the Roy model. Q J Econ 128(2):623–668

  • Rothschild C, Scheuer F (2016) Optimal taxation with rent-seeking. Rev Econ Stud 83(3):1225–1262

  • Sachs D, Tsyvinski A, Werquin N (2020) Nonlinear tax incidence and optimal taxation in general equilibrium. Econometrica 88(2):469–493

  • Saez E (2001) Using elasticities to derive optimal income tax rates. Rev Econ Stud 68(1):205–229

  • Saez E (2002) Optimal income transfer programs: intensive versus extensive labor supply responses. Q J Econ 117:1039–1073

  • Salanié B (2011) The economics of taxation, 2nd edn. MIT Press

  • Scheuer F (2013) Adverse selection in credit markets and regressive profit taxation. J Econ Theory 148(4):1333–1360

  • Scheuer F (2014) Entrepreneurial taxation with endogenous entry. Am Econ J Econ Policy 6(2):126–163

  • Scheuer F, Werning I (2016) Mirrlees meets Diamond-Mirrlees. Working Paper 22076, National Bureau of Economic Research, March

  • Werning I (2007) Pareto efficient income taxation. MIT Working Paper 2007

  • Weymark JA (1986) A reduced-form optimal nonlinear income tax problem. J Public Econ 30(2):199–217

  • Weymark JA (1987) Comparative static properties of optimal nonlinear income taxes. Econometrica 55(5):1165–1185

  • Wilson RB (1993) Nonlinear pricing. Oxford University Press, Oxford


Author information

Correspondence to Laurence Jacquet.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We thank Craig Brett, two anonymous referees, Pierre Boyer, Vidar Christiansen, Guy Laroque, Emmanuel Saez, Laurent Simula, Stefanie Stantcheva, Kevin Spiritus, Alain Trannoy and Nicolas Werquin. This research was partly carried out while Laurence Jacquet was a research associate at Oslo Fiscal Studies. She gratefully acknowledges support from Labex MME-DII. The usual disclaimer applies.

Etienne Lehmann is also research fellow at IZA, CESifo and CEPR.

A Appendix


A.1 Proof of Lemma 4

Proof

The proof consists of two steps. In step (i), we show that there exists at most one incentive-compatible allocation \((w,\theta )\mapsto ({\underline{C}}(w,\theta ),{\underline{Y}}(w,\theta ))\) that verifies Assumption 3 and such that \(({\underline{C}}(w,\theta _0),{\underline{Y}}(w,\theta _0))=(C(w,\theta _0),Y(w,\theta _0))\). In step (ii), we show that this allocation verifies the whole set of incentive constraints (6).

Step (i). To build up the entire incentive-compatible allocation \((w,\theta )\mapsto ({\underline{C}}(w,\theta ),{\underline{Y}}(w,\theta ))\), we must set \(({\underline{C}}(w,\theta _0),{\underline{Y}}(w,\theta _0))=(C(w,\theta _0),Y(w,\theta _0))\) at every skill level. For each group \(\theta\), \({\underline{Y}}(\cdot ,\theta )\) verifies Assumption 3 if and only if its reciprocal \({\underline{Y}}^{-1}(\cdot ,\theta )\) is smoothly increasing. Let \(y\in \mathbb {R}_+\) be an income level. As \(Y(\cdot ,\theta _0)\) is smoothly increasing by Assumption 3, there exists a unique skill level w such that \(y=Y(w,\theta _0)\). Then, according to Lemma 3, among individuals of group \(\theta\), only those of skill \({\underline{W}}(w,\theta )\) can be assigned to the income level \(y=Y(w,\theta _0)\) without violating incentive compatibility (see Note 10). Therefore, \({\underline{Y}}^{-1}(\cdot ,\theta )\) must be defined by:

$$\begin{aligned} {\underline{Y}}^{-1}(\cdot ,\theta ):\qquad y\overset{Y^{-1}(\cdot ,\theta _0)}{\longmapsto }w=Y^{-1}(y,\theta _0) \overset{{\underline{W}}(\cdot ,\theta )}{\longmapsto }{\underline{W}}\left( Y^{-1}(y,\theta _0),\theta \right) . \end{aligned}$$

\({\underline{Y}}^{-1}(\cdot ,\theta )\) is then smoothly increasing as the composition of two smoothly increasing functions. Moreover, since for each skill level w there exists a single skill level \(\omega\) such that \({\underline{Y}}(\omega ,\theta )=Y(w,\theta _0)\), incentive compatibility requires that \({\underline{C}}(\omega ,\theta )\) be equal to \(C(w,\theta _0)\) as well. This ends the proof of step (i).
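To make the construction in step (i) concrete, the sketch below composes \(Y^{-1}(\cdot ,\theta _0)\) with a pooling function. It assumes, purely for illustration, quasilinear utility \(u(c)=c\), an isoelastic disutility \(\upsilon (y;w,\theta )=(y/w)^{1+1/\theta }/(1+1/\theta )\), a flat benchmark tax with hypothetical rate TAU, and that the pooled skill in group \(\theta\) is the one whose marginal rate of substitution at the bundle of type \((w,\theta _0)\) coincides with that of \((w,\theta _0)\). None of these functional forms come from the paper.

```python
import numpy as np

# Illustrative assumptions (not from the paper): u(c) = c and
# v(y; w, theta) = (y / w) ** (1 + 1/theta) / (1 + 1/theta).
TAU = 0.3      # hypothetical flat marginal tax rate of the benchmark schedule
THETA0 = 0.5   # hypothetical parameter of the reference group theta_0


def mrs(y, w, theta):
    """Marginal rate of substitution v_y / u' under the assumed specification."""
    return (y / w) ** (1.0 / theta) / w


def Y0(w):
    """Incomes of group theta_0 under the flat tax (solves 1 - TAU = mrs)."""
    return (1.0 - TAU) ** THETA0 * w ** (1.0 + THETA0)


def Y0_inv(y):
    """Reciprocal of Y0, well defined because Y0 is smoothly increasing."""
    return (y / (1.0 - TAU) ** THETA0) ** (1.0 / (1.0 + THETA0))


def pooling(w, theta):
    """Skill in group theta sharing the MRS of type (w, theta_0) at its bundle."""
    y = Y0(w)
    target = mrs(y, w, THETA0)
    # Closed-form solution of mrs(y, W, theta) = target for W.
    return (y ** (1.0 / theta) / target) ** (theta / (1.0 + theta))


def Y_underline_inv(y, theta):
    """Composition y -> w = Y0_inv(y) -> pooling(w, theta)."""
    return pooling(Y0_inv(y), theta)


incomes = np.linspace(0.1, 5.0, 50)
skills = Y_underline_inv(incomes, theta=1.0)
print("increasing:", bool(np.all(np.diff(skills) > 0)))
```

Under these assumptions the composite coincides with the inverse of the incomes that group \(\theta\) would choose when facing the same flat tax, \((y/(1-\tau )^{\theta })^{1/(1+\theta )}\), which makes a convenient sanity check.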

Step (ii). Note that the allocation \((w,\theta )\mapsto ({\underline{Y}}(w,\theta ),{\underline{C}}(w,\theta ))\) is built in such a way that \({\underline{Y}}(\omega ,\theta )=Y(w,\theta _0)\text { and } {\underline{C}}(\omega ,\theta )=C(w,\theta _0)\) if and only if \(\omega ={\underline{W}}(w,\theta )\) and (10) holds. Differentiating both sides of these two equations with respect to w and rearranging terms, we obtain

$$\begin{aligned} \frac{{\dot{C}}\left( w,\theta _0\right) }{{\dot{Y}}\left( w,\theta _0\right) }= \frac{\dot{{\underline{C}}}\left( {\underline{W}}(w,\theta ),\theta \right) }{\dot{{\underline{Y}}}\left( {\underline{W}}(w,\theta ),\theta \right) }. \end{aligned}$$

As \(w\mapsto (C(w,\theta _0),Y(w,\theta _0))\) is assumed to verify the within-group incentive constraints in Eq. (8b), we know that the left-hand side of the above equation is equal to

$$\begin{aligned} \mathscr {M}(C(w,\theta _0),Y(w,\theta _0);w,\theta _0). \end{aligned}$$

Using the definition of \({\underline{W}}(\cdot ,\theta )\), we have that \(w\mapsto ({\underline{C}}(w,\theta ),{\underline{Y}}(w,\theta ))\) also verifies Eq. (8b). From Lemma 2, it thus verifies the within-group incentive constraints (7). We now check whether the inequality (6) is verified for any \((w,w',\theta ,\theta ')\in \mathbb {R}_+^2\times \Theta ^2\). We know there exists \(\omega \in \mathbb {R}_+\) such that \({\underline{Y}}(\omega ,\theta )={\underline{Y}}(w',\theta ')\text { and } {\underline{C}}(\omega ,\theta )={\underline{C}}(w',\theta ')\). The incentive constraints in (6) are therefore equivalent to:

$$\begin{aligned} \mathscr {U}\left( {\underline{C}}(w,\theta ),{\underline{Y}}(w,\theta );w,\theta \right) \ge \mathscr {U}\left( {\underline{C}}(\omega ,\theta ),{\underline{Y}}(\omega ,\theta );w,\theta \right) . \end{aligned}$$

The latter inequality holds because \(w\mapsto ({\underline{C}}(w,\theta ),{\underline{Y}}(w,\theta ))\) satisfies Eq. (8b) and hence, by Lemma 2, the within-group incentive constraints (7). \(\square\)
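The incentive constraints (6) can be checked numerically on a finite menu of bundles. The sketch below assumes the same illustrative primitives as before (quasilinear utility, isoelastic disutility) together with a hypothetical flat tax \(T(y)=\tau y-b\), with TAU and B as placeholder values. Every type picks its optimal bundle, and we verify that no type, in either of two hypothetical groups, strictly prefers the bundle of another type on the grid.

```python
import numpy as np

# Illustrative assumptions (not from the paper): u(c) = c, isoelastic disutility,
# and a flat tax T(y) = TAU * y - B.
TAU, B = 0.3, 0.2


def v(y, w, theta):
    return (y / w) ** (1.0 + 1.0 / theta) / (1.0 + 1.0 / theta)


def chosen_income(w, theta):
    """Maximizer of (1 - TAU) * y + B - v(y; w, theta)."""
    return (1.0 - TAU) ** theta * w ** (1.0 + theta)


skills = np.linspace(0.5, 3.0, 40)
w_all = np.concatenate([skills, skills])
theta_all = np.concatenate([np.full(40, 0.5), np.full(40, 1.0)])  # two hypothetical groups
Y = chosen_income(w_all, theta_all)
C = (1.0 - TAU) * Y + B

# utility[i, j]: utility of type i if it takes the bundle chosen by type j.
utility = C[None, :] - v(Y[None, :], w_all[:, None], theta_all[:, None])
own = np.diag(utility)
ic_holds = bool(np.all(own[:, None] >= utility - 1e-10))
print("all incentive constraints hold on the grid:", ic_holds)
```

Because each type solves a strictly concave problem over the whole budget line, its own bundle is a global maximum, so the check covers both within-group and between-group constraints.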

A.2 Derivation of Eq. (17)

Proof

To derive (17), we compute the various Gâteaux derivatives at \(t=0\). For \(A=C,Y,U\) and a given \(\delta\), the Gâteaux derivative of A in the direction \(\Delta _Y(\cdot ,\cdot ;\delta )\) at \(t=0\) is denoted by \(\hat{{\hat{A}}}(x,\theta ;\delta )\). Recall its definition:

$$\begin{aligned} \hat{{\hat{A}}}(x,\theta ;\delta )\overset{\text {def}}{\equiv }\underset{t\mapsto 0}{\lim } \frac{{\hat{A}}(x,\theta ;t,\delta )-A(x,\theta )}{t}. \end{aligned}$$
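Numerically, this definition is a one-sided difference quotient in t. The sketch below is a generic illustration rather than the paper's computation: it takes a hypothetical functional \(A\mapsto \int \log A(x)\, f(x)\,dx\) with a uniform density and a perturbation supported on one skill interval, and checks that the difference quotient approaches the analytical directional derivative \(\int \Delta (x) f(x)/A(x)\,dx\) as t shrinks.

```python
import numpy as np


def trap(values, x):
    """Trapezoidal rule, written out to avoid depending on a NumPy version."""
    return float(np.sum((values[1:] + values[:-1]) * np.diff(x)) / 2.0)


x = np.linspace(0.0, 1.0, 2001)
f = np.ones_like(x)                                        # assumed uniform density on [0, 1]
A = 1.0 + x                                                # hypothetical baseline allocation
direction = np.where((x > 0.4) & (x < 0.6), 1.0, 0.0)      # perturbation on one skill interval


def functional(alloc):
    """Hypothetical objective: integral of log(alloc) against f."""
    return trap(np.log(alloc) * f, x)


def gateaux_fd(t):
    """Difference quotient (functional(A + t * direction) - functional(A)) / t."""
    return (functional(A + t * direction) - functional(A)) / t


analytic = trap(direction / A * f, x)  # the derivative of log is 1 / A
for t in (1e-1, 1e-2, 1e-3):
    print(t, gateaux_fd(t), analytic)
```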

By definition we get: \(\hat{{\hat{Y}}}(x,\theta _0;\delta )=\Delta _Y(x;\delta )\), and from (15b) we obtain:

$$\begin{aligned} \hat{{\hat{Y}}}(x,\theta ;\delta )=0\qquad \text {if}\qquad x \in \left[ 0,W(w-\delta ,\theta )\right] \cup \left[ W(w,\theta ),+\infty \right) . \end{aligned}$$
(24a)

Equation (15c) implies that the Gâteaux derivatives of utilities are zero for skills below \(W(w-\delta ,\theta )\). For skills x between \(W(w-\delta ,\theta )\) and \(W(w,\theta )\), Eq. (15e) implies:

$$\begin{aligned} \hat{{\hat{U}}}(x,\theta ;\delta )= -\int _{W(w-\delta ,\theta )}^{x} \upsilon _{yw}\left( Y(\omega ,\theta );\omega ,\theta \right) \ \hat{{\hat{Y}}}(\omega ,\theta ;\delta )\ d\omega . \end{aligned}$$
(24b)

For skill x above \(W(w,\theta )\), according to (15f), we have:

$$\begin{aligned} \hat{{\hat{U}}}(x,\theta ;\delta )= -\int _{W(w-\delta ,\theta )}^{W(w,\theta )} \upsilon _{yw}\left( Y(\omega ,\theta );\omega ,\theta \right) \ \hat{{\hat{Y}}}(\omega ,\theta ;\delta )\ d\omega . \end{aligned}$$
(24c)
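Equations (24b) and (24c) say that, within a group, the utility response is a running integral of \(-\upsilon _{yw}\,\hat{{\hat{Y}}}\) that starts at \(W(w-\delta ,\theta )\) and is frozen above \(W(w,\theta )\). The sketch below computes it with the trapezoidal rule; the isoelastic disutility, the baseline incomes, the interval endpoints and the tent-shaped perturbation are all hypothetical choices made for illustration.

```python
import numpy as np

THETA = 1.0  # hypothetical group parameter


def v_yw(y, w, theta=THETA):
    """Cross derivative of the assumed disutility (y / w) ** (1 + 1/theta) / (1 + 1/theta)."""
    v_y = (y / w) ** (1.0 / theta) / w
    return -(1.0 + 1.0 / theta) * v_y / w


def cumtrap(values, x):
    """Running trapezoidal integral of values from x[0] to each x[i]."""
    out = np.zeros_like(values)
    out[1:] = np.cumsum((values[1:] + values[:-1]) * np.diff(x) / 2.0)
    return out


x = np.linspace(0.5, 3.0, 2501)          # skill grid within group theta
Y = 0.7 * x ** 2                         # hypothetical baseline incomes
lo, hi = 1.0, 1.5                        # stand-ins for W(w - delta, theta) and W(w, theta)
Y_hat = np.clip(np.minimum(x - lo, hi - x), 0.0, None)  # tent perturbation, zero outside [lo, hi]
U_hat = -cumtrap(v_yw(Y, x) * Y_hat, x)  # (24b) on [lo, hi]; constant value (24c) above hi
print(U_hat[x < lo].max(), U_hat[-1])
```

Since \(\upsilon _{yw}<0\) here and the perturbation is nonnegative, the utility response is zero below the interval and a positive constant above it.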

Moreover, Eq. (15h) implies that the Gâteaux derivatives of income must verify:

$$\begin{aligned} \int _{w-\delta }^{w} \upsilon _{yw}\left( Y(\omega ,\theta _0);\omega ,\theta _0\right) \ \hat{{\hat{Y}}}(\omega ,\theta _0;\delta ) \ d\omega = \int _{W(w-\delta ,\theta )}^{W(w,\theta )} \upsilon _{yw}\left( Y(\omega ,\theta );\omega ,\theta \right) \ \hat{{\hat{Y}}}(\omega ,\theta ;\delta ) \ d\omega . \end{aligned}$$
(24d)

Using Eqs. (12), (24a) and (24c), the Gâteaux derivative of the Lagrangian (16) is:

$$\begin{aligned}&\frac{\partial \hat{{\mathscr {L}}}}{\partial t}(0;\delta )= \int _{\theta \in \Theta }\left\{ \int _{W(w-\delta ,\theta )}^{W(w,\theta )} \left( 1-\frac{\upsilon _y(Y(x,\theta );x,\theta )}{u^\prime (C(x,\theta ))}\right) \hat{{\hat{Y}}}(x,\theta ;\delta ) f(x|\theta )dx\right. \nonumber \\&+ \left. \int _{W(w-\delta ,\theta )}^{W(w,\theta )} \left( \frac{\Phi _U[x,\theta ]}{\lambda }-\frac{1}{u^\prime [x,\theta ]}\right) \hat{{\hat{U}}}(x,\theta ;\delta ) f(x|\theta )dx\right. \nonumber \\&- \left( \int _{W(w-\delta ,\theta )}^{W(w,\theta )} \upsilon _{yw}\left( Y(x,\theta );x,\theta \right) \ \hat{{\hat{Y}}}(x,\theta ;\delta )\ dx\right) \nonumber \\&\times \left. \left( \int _{W(w,\theta )}^{\infty } \left( \frac{\Phi _U[x,\theta ]}{\lambda }-\frac{1}{u^\prime [x,\theta ]}\right) f(x|\theta )dx \right) \right\} d\mu (\theta ). \end{aligned}$$
(25)

Dividing the first-order condition \(\frac{\partial \hat{\mathscr {L}}}{\partial t}(0;\delta )=0\) by \(\int _{w-\delta }^{w} \upsilon _{yw}\left( Y(x,\theta _0);x,\theta _0\right) \ \hat{{\hat{Y}}}(x,\theta _0;\delta )\ dx\) implies, using (24b) and (24d), that:

$$\begin{aligned}&\int _{\theta \in \Theta } \dfrac{\int _{W(w-\delta ,\theta )}^{W(w,\theta )} \left( 1-\dfrac{\upsilon _y(Y(x,\theta );x,\theta )}{u^\prime (C(x,\theta ))}\right) \hat{{\hat{Y}}}(x,\theta ;\delta ) f(x|\theta )dx}{\int _{W(w-\delta ,\theta )}^{W(w,\theta )} \upsilon _{yw}\left( Y(x,\theta );x,\theta \right) \ \hat{{\hat{Y}}}(x,\theta ;\delta )\ dx} \ d\mu (\theta ) \nonumber \\&\quad = \int _{\theta \in \Theta }\left\{ \int _{W(w-\delta ,\theta )}^{W(w,\theta )} \left( \dfrac{\Phi _U[x,\theta ]}{\lambda }-\dfrac{1}{u^\prime [x,\theta ]}\right) \dfrac{\int _{W(w-\delta ,\theta )}^{x} \upsilon _{yw}\left( Y(\omega ,\theta );\omega ,\theta \right) \ \hat{{\hat{Y}}}(\omega ,\theta ;\delta )\ d\omega }{\int _{W(w-\delta ,\theta )}^{W(w,\theta )} \upsilon _{yw}\left( Y(\omega ,\theta );\omega ,\theta \right) \ \hat{{\hat{Y}}}(\omega ,\theta ;\delta )\ d\omega } f(x|\theta )dx \right. \nonumber \\&\qquad + \left. \int _{W(w,\theta )}^{\infty } \left( \frac{\Phi _U[x,\theta ]}{\lambda }-\frac{1}{u^\prime [x,\theta ]}\right) f(x|\theta )dx\right\} d\mu (\theta ). \end{aligned}$$
(26)

We finally take the limit of the latter equality as \(\delta\) tends to 0. Consider the first term on the right-hand side of (26). Since

$$\begin{aligned} \dfrac{\int _{W(w-\delta ,\theta )}^{x} \upsilon _{yw}\left( Y(\omega ,\theta );\omega ,\theta \right) \ \hat{{\hat{Y}}}(\omega ,\theta ;\delta )\ d\omega }{\int _{W(w-\delta ,\theta )}^{W(w,\theta )} \upsilon _{yw}\left( Y(\omega ,\theta );\omega ,\theta \right) \ \hat{{\hat{Y}}}(\omega ,\theta ;\delta )\ d\omega } \in [0,1] \end{aligned}$$

we get that:

$$\begin{aligned}&\left| \int _{\theta \in \Theta } \int _{W(w-\delta ,\theta )}^{W(w,\theta )} \left( \dfrac{\Phi _U[x,\theta ]}{\lambda }-\dfrac{1}{u^\prime [x,\theta ]}\right) \dfrac{\int _{W(w-\delta ,\theta )}^{x} \upsilon _{yw}\left( Y(\omega ,\theta );\omega ,\theta \right) \ \hat{{\hat{Y}}}(\omega ,\theta ;\delta )\ d\omega }{\int _{W(w-\delta ,\theta )}^{W(w,\theta )} \upsilon _{yw}\left( Y(\omega ,\theta );\omega ,\theta \right) \ \hat{{\hat{Y}}}(\omega ,\theta ;\delta )\ d\omega } f(x|\theta )dx\, d\mu (\theta )\right| \\&\quad \le \int _{\theta \in \Theta } \int _{W(w-\delta ,\theta )}^{W(w,\theta )} \left| \dfrac{\Phi _U[x,\theta ]}{\lambda }-\dfrac{1}{u^\prime [x,\theta ]}\right| f(x|\theta )dx\, d\mu (\theta ) . \end{aligned}$$

As the right-hand side of the latter inequality tends to 0 as \(\delta\) tends to 0, taking the limit of (26) as \(\delta\) tends to 0 yields:

$$\begin{aligned}&\underset{\delta \mapsto 0}{\lim }\quad \int _{\theta \in \Theta } \dfrac{\int _{W(w-\delta ,\theta )}^{W(w,\theta )} \left( 1-\dfrac{\upsilon _y(Y(x,\theta );x,\theta )}{u^\prime (C(x,\theta ))}\right) \hat{{\hat{Y}}}(x,\theta ;\delta ) f(x|\theta )dx}{\int _{W(w-\delta ,\theta )}^{W(w,\theta )} \upsilon _{yw}\left( Y(x,\theta );x,\theta \right) \ \hat{{\hat{Y}}}(x,\theta ;\delta )\ dx} d\mu (\theta ) \nonumber \\&\quad = \iint _{\theta \in \Theta ,x\ge W(w,\theta )} \left( \frac{\Phi _U[x,\theta ]}{\lambda }-\frac{1}{u^\prime [x,\theta ]}\right) f(x|\theta )dx\ d\mu (\theta ). \end{aligned}$$
(27)

By continuity, the variations of \(f(x|\theta )\), \(\upsilon _y(Y(x,\theta );x,\theta )\), \(\upsilon _{yw}(Y(x,\theta );x,\theta )\) and \(u^\prime (C(x,\theta ))\) within the skill intervals \([W(w-\delta ,\theta ),W(w,\theta )]\) vanish as \(\delta\) tends to 0. As \(\Theta\) and the intervals \([W(w-\delta ,\theta ),W(w,\theta )]\) are compact, for any small \(e>0\) there exists \({\tilde{\delta }}(e)\) such that, for all \(\theta \in \Theta\) and all \(x\in [W(w-{\tilde{\delta }}(e),\theta ),W(w,\theta )]\), one has:

$$\begin{aligned}&\left[ \left( 1-\frac{\upsilon _y[ W(w,\theta ) ,\theta ]}{u^\prime [W(w,\theta ),\theta ]}\right) f(W(w,\theta )|\theta )-e\right] \hat{{\hat{Y}}}(x,\theta ;\delta ) \le \left( 1-\frac{\upsilon _y[ x ,\theta ]}{u^\prime [x,\theta ]}\right) f(x|\theta )\ \hat{{\hat{Y}}}(x,\theta ;\delta ) \\&\quad \le \left[ \left( 1-\frac{\upsilon _y[W(w,\theta ) ,\theta ]}{u^\prime [W(w,\theta ),\theta ]}\right) f(W(w,\theta )|\theta )+e \right] \hat{{\hat{Y}}}(x,\theta ;\delta ) \end{aligned}$$

and

$$\begin{aligned}&\left( \upsilon _{yw} [W(w,\theta ) ,\theta ] - e\right) \hat{{\hat{Y}}}(x,\theta ;\delta ) \le \upsilon _{yw} [x ,\theta ] \ \hat{{\hat{Y}}}(x,\theta ;\delta ) \le \left( \upsilon _{yw} [W(w,\theta ) ,\theta ] +e \right) \hat{{\hat{Y}}}(x,\theta ;\delta )<0 \end{aligned}$$

so that for all \(\delta <{\tilde{\delta }}(e)\):

$$\begin{aligned}&\int _{\theta \in \Theta } \dfrac{\left( 1-\dfrac{\upsilon _y(Y(W(w,\theta ),\theta );W(w,\theta ),\theta )}{u^\prime (C(W(w,\theta ),\theta ))}\right) f(W(w,\theta )|\theta )+e}{\upsilon _{yw}(Y(W(w,\theta ),\theta );W(w,\theta ),\theta )-e}\ \dfrac{\int _{W(w-\delta ,\theta )}^{W(w,\theta )} \hat{{\hat{Y}}}(x,\theta ;\delta ) dx}{\int _{W(w-\delta ,\theta )}^{W(w,\theta )} \hat{{\hat{Y}}}(x,\theta ;\delta )\ dx} d\mu (\theta ) \\&\quad \le \int _{\theta \in \Theta } \dfrac{\int _{W(w-\delta ,\theta )}^{W(w,\theta )} \left( 1-\dfrac{\upsilon _y(Y(x,\theta );x,\theta )}{u^\prime (C(x,\theta ))}\right) \hat{{\hat{Y}}}(x,\theta ;\delta ) f(x|\theta )dx}{\int _{W(w-\delta ,\theta )}^{W(w,\theta )} \upsilon _{yw}\left( Y(x,\theta );x,\theta \right) \ \hat{{\hat{Y}}}(x,\theta ;\delta )\ dx} d\mu (\theta ) \\&\quad \le \int _{\theta \in \Theta } \dfrac{\left( 1-\dfrac{\upsilon _y(Y(W(w,\theta ),\theta );W(w,\theta ),\theta )}{u^\prime (C(W(w,\theta ),\theta ))}\right) f(W(w,\theta )|\theta )-e}{\upsilon _{yw}(Y(W(w,\theta ),\theta );W(w,\theta ),\theta )+e}\ \dfrac{\int _{W(w-\delta ,\theta )}^{W(w,\theta )} \hat{{\hat{Y}}}(x,\theta ;\delta ) dx}{\int _{W(w-\delta ,\theta )}^{W(w,\theta )} \hat{{\hat{Y}}}(x,\theta ;\delta )\ dx} d\mu (\theta ). \end{aligned}$$

and therefore, for all \(\delta <{\tilde{\delta }}(e)\):

$$\begin{aligned}&\int _{\theta \in \Theta } \dfrac{\left( 1-\dfrac{\upsilon _y(Y(W(w,\theta ),\theta );W(w,\theta ),\theta )}{u^\prime (C(W(w,\theta ),\theta ))}\right) f(W(w,\theta )|\theta )+e}{\upsilon _{yw}(Y(W(w,\theta ),\theta );W(w,\theta ),\theta )-e} d\mu (\theta ) \\&\quad \le \int _{\theta \in \Theta } \dfrac{\int _{W(w-\delta ,\theta )}^{W(w,\theta )} \left( 1-\dfrac{\upsilon _y(Y(x,\theta );x,\theta )}{u^\prime (C(x,\theta ))}\right) \hat{{\hat{Y}}}(x,\theta ;\delta ) f(x|\theta )dx}{\int _{W(w-\delta ,\theta )}^{W(w,\theta )} \upsilon _{yw}\left( Y(x,\theta );x,\theta \right) \ \hat{{\hat{Y}}}(x,\theta ;\delta )\ dx} d\mu (\theta ) \\&\quad \le \int _{\theta \in \Theta } \dfrac{\left( 1-\dfrac{\upsilon _y(Y(W(w,\theta ),\theta );W(w,\theta ),\theta )}{u^\prime (C(W(w,\theta ),\theta ))}\right) f(W(w,\theta )|\theta )-e}{\upsilon _{yw}(Y(W(w,\theta ),\theta );W(w,\theta ),\theta )+e} d\mu (\theta ). \end{aligned}$$

Letting \(\delta\) and then e tend to 0, we conclude that the left-hand side of (27) is equal to the left-hand side of (17). \(\square\)
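The limiting argument rests on a simple fact: a ratio of two integrals weighted by the same positive perturbation converges to the ratio of the integrands at the endpoint as the interval shrinks. The sketch below checks this for a single group, with hypothetical smooth functions standing in for \(\left( 1-\upsilon _y/u^\prime \right) f\) and \(\upsilon _{yw}\) and a sine-shaped \(\hat{{\hat{Y}}}\); it illustrates the limit and is not a reproduction of (17).

```python
import numpy as np


def trap(values, x):
    return float(np.sum((values[1:] + values[:-1]) * np.diff(x)) / 2.0)


def numerator_density(x):   # hypothetical stand-in for (1 - v_y / u') f
    return np.exp(-x) * (1.0 + x)


def upsilon_yw(x):          # hypothetical stand-in for v_yw, negative as in the text
    return -(1.0 + x ** 2)


TOP = 2.0                   # stands in for W(w, theta)


def ratio(delta, n=4001):
    """Ratio of the two perturbation-weighted integrals over [TOP - delta, TOP]."""
    x = np.linspace(TOP - delta, TOP, n)
    y_hat = np.sin(np.pi * (x - (TOP - delta)) / delta)  # positive inside the interval
    return trap(numerator_density(x) * y_hat, x) / trap(upsilon_yw(x) * y_hat, x)


limit = numerator_density(TOP) / upsilon_yw(TOP)
for delta in (0.5, 0.1, 0.01, 0.001):
    print(delta, ratio(delta), limit)
```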

A.3 Proof of Lemma 5

With one-dimensional heterogeneity, we only consider within-group incentive constraints. Adopting a first-order approach, only (8a) is considered when building up the Hamiltonian:

$$\begin{aligned} \left( Y(w,\theta )-\mathscr {C}\left( Y(w,\theta ),U(w,\theta );w,\theta \right) +\dfrac{\Phi \left( U(w,\theta );w,\theta \right) }{\lambda } \right) \cdot f(w|\theta )-q(w|\theta )\cdot v_w\left( Y(w,\theta );w,\theta \right) . \end{aligned}$$

where \(Y(w,\theta )\) and \(U(w,\theta )\) are the control and state variables respectively. Using (12), the necessary conditions are:

$$\begin{aligned} 0= & {} \left( 1-\dfrac{v_y\left[ w,\theta \right] }{u^\prime \left[ w,\theta \right] }\right) \cdot f(w|\theta )- q(w|\theta )\cdot v_{yw}\left[ w,\theta \right] \end{aligned}$$
(28a)
$$\begin{aligned} -{\dot{q}}\left( w|\theta \right)= & {} \left( \dfrac{\Phi _U\left[ w,\theta \right] }{\lambda }-\dfrac{1}{u^\prime \left[ w,\theta \right] }\right) \cdot f(w|\theta ) \end{aligned}$$
(28b)
$$\begin{aligned} 0= & {} q(0|\theta ) \end{aligned}$$
(28c)
$$\begin{aligned} 0= & {} \underset{w\mapsto \infty }{\lim }q(w|\theta ). \end{aligned}$$
(28d)

Combining (28b) with (28d) leads to

$$\begin{aligned} q(w|\theta )=\int _w^\infty \left( \dfrac{\Phi _U\left[ \omega ,\theta \right] }{\lambda }-\dfrac{1}{u^\prime \left[ \omega ,\theta \right] }\right) \cdot f(\omega |\theta )d\omega . \end{aligned}$$
(28e)

Combining (3), (2), (28a) and (28e) leads to (18a). Combining (28c) with (28e) leads to (18b).
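In this one-dimensional problem, (28e) expresses the costate as a tail integral, and (28c) then requires the integral over the whole skill support to vanish. The sketch below evaluates (28e) on a truncated grid for a hypothetical utilitarian case (\(\Phi _U=1\)), taking \(u(c)=\log c\) so that \(1/u^\prime =C\), a hypothetical consumption profile \(C(w)=1+w\) held fixed, and a log-normal skill density. With these inputs held fixed, (28c) pins down \(\lambda =\int \Phi _U f\big/\int (f/u^\prime )\). All inputs are assumptions, and the sketch does not solve the full optimal tax problem.

```python
import numpy as np


def tail_trap(values, x):
    """q[i] = trapezoidal integral of values from x[i] to the top of the grid."""
    seg = (values[1:] + values[:-1]) * np.diff(x) / 2.0
    out = np.zeros_like(values)
    out[:-1] = np.cumsum(seg[::-1])[::-1]
    return out


w = np.linspace(1e-3, 10.0, 5001)
f = np.exp(-np.log(w) ** 2 / (2 * 0.5 ** 2)) / (w * 0.5 * np.sqrt(2 * np.pi))  # log-normal(0, 0.5)
C = 1.0 + w                     # hypothetical consumption profile, so 1/u' = C under u = log
phi_U = np.ones_like(w)         # utilitarian criterion: Phi_U = 1

# (28c) combined with (28e): the integrand must integrate to zero over the whole grid.
lam = tail_trap(phi_U * f, w)[0] / tail_trap(C * f, w)[0]
q = tail_trap((phi_U / lam - C) * f, w)
print(q[0], q.max())
```

In this example q is nonpositive everywhere, so with \(\upsilon _{yw}<0\), condition (28a) together with (37) would imply nonnegative marginal tax rates; this is a property of the chosen inputs, not a general proof.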

A.4 Proof of Proposition 2

A reform of a tax schedule \(y\mapsto T(y)\) is defined by its direction, a differentiable function \(y\mapsto R(y)\) defined on \({\mathbb {R}}_+\), and by its algebraic magnitude \(m\in {\mathbb {R}}\). After the reform, the tax schedule becomes \(y\mapsto T(y)-m \ R(y)\) and the utility of an individual of type \((w,\theta )\) is:

$$\begin{aligned} U^R(m;w,\theta )\overset{\text {def}}{\equiv }\quad \underset{y}{\max }\quad u(y-T(y)+m\ R(y))-\upsilon (y;w,\theta ). \end{aligned}$$
(29)

We denote by \(Y^R(m;w,\theta )\) the income of individuals of type \((w,\theta )\) after the reform; their consumption is then \(C^R(m;w,\theta )=Y^R(m;w,\theta )-T(Y^R(m;w,\theta ))+m\ R(Y^R(m;w,\theta ))\). When \(m=0\), we have \(Y^R(0;w,\theta )=Y(w,\theta )\) and \(C^R(0;w,\theta )=C(w,\theta )\). Applying the envelope theorem to (29), we get:

$$\begin{aligned} \frac{\partial U^R}{\partial m}(m;w,\theta )=u^\prime \left( C^R(m;w,\theta )\right) \ R\left( Y^R(m;w,\theta )\right) . \end{aligned}$$
(30)
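Equation (30) can be checked numerically by solving (29) for small positive and negative m. The sketch assumes \(u(c)=\log c\), the isoelastic disutility used above with hypothetical parameters, a flat baseline schedule \(T(y)=\tau y-b\), and a hypothetical reform direction \(R(y)=1-e^{-y}\). The individual's first-order condition is solved by bisection, which is valid here because, for the small values of m used, it is strictly decreasing in y.

```python
import numpy as np

# Hypothetical parameters: flat tax rate, lump-sum transfer, group parameter, skill.
TAU, B, THETA, W = 0.3, 0.5, 0.5, 1.5


def R(y):
    """Hypothetical reform direction."""
    return 1.0 - np.exp(-y)


def R_prime(y):
    return np.exp(-y)


def consumption(y, m):
    """c = y - T(y) + m R(y) with T(y) = TAU * y - B."""
    return (1.0 - TAU) * y + B + m * R(y)


def v(y):
    return (y / W) ** (1.0 + 1.0 / THETA) / (1.0 + 1.0 / THETA)


def v_y(y):
    return (y / W) ** (1.0 / THETA) / W


def optimal_income(m, lo=1e-12, hi=100.0, iters=200):
    """Bisection on (1 - T'(y) + m R'(y)) u'(c) - v_y(y) = 0 with u = log."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (1.0 - TAU + m * R_prime(mid)) / consumption(mid, m) - v_y(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def indirect_utility(m):
    """U^R(m; w, theta) from (29) for the assumed primitives."""
    y = optimal_income(m)
    return np.log(consumption(y, m)) - v(y)


h = 1e-5
finite_diff = (indirect_utility(h) - indirect_utility(-h)) / (2 * h)
y0 = optimal_income(0.0)
envelope = R(y0) / consumption(y0, 0.0)   # u'(C) R(Y) with u = log
print(finite_diff, envelope)
```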

Using (3), the first-order condition associated with (29) sets the following expression equal to zero:

$$\begin{aligned} {\mathscr {Y}}^R(y,m;w,\theta ) \overset{\text {def}}{\equiv }1-T^\prime (y)+m\ R^\prime (y)-\mathscr {M}\left( y-T(y)+m\ R(y),y;w,\theta \right) . \end{aligned}$$
(31)

For simplicity, we drop the superscript R and write \(\mathscr {Y}_y(Y(w,\theta );w,\theta )\) for \(\mathscr {Y}_y^R(Y(w,\theta ),0;w,\theta )\) (see Note 11). We define the behavioral responses to tax reforms in direction R by applying the implicit function theorem to \(\mathscr {Y}^R(y,m;w,\theta )=0\) at \(m=0\), which yields:

$$\begin{aligned} \frac{\partial Y^R}{\partial m}(0;w,\theta )=-\frac{\mathscr {Y}^R_{m}(Y(w,\theta ),0;w,\theta )}{\mathscr {Y}_{y}(Y(w,\theta );w,\theta )} \end{aligned}$$
(32)
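Formula (32) can likewise be compared with a finite-difference response. Under the same illustrative assumptions (logarithmic utility, isoelastic disutility, flat baseline tax, hypothetical direction \(R(y)=1-e^{-y}\)), one has \(\mathscr {M}=c\,\upsilon _y\), so at \(m=0\) expressions (33a) and (33b) reduce to \(\mathscr {Y}_y=-c\,\upsilon _{yy}-c\,\upsilon _y^2\) and \(\mathscr {Y}_m=R^\prime (y)-R(y)\,\upsilon _y\).

```python
import numpy as np

TAU, B, THETA, W = 0.3, 0.5, 0.5, 1.5   # hypothetical parameters


def R(y):            # hypothetical reform direction
    return 1.0 - np.exp(-y)


def R_prime(y):
    return np.exp(-y)


def v_y(y):
    return (y / W) ** (1.0 / THETA) / W


def v_yy(y):
    return v_y(y) / (THETA * y)


def consumption(y, m):
    return (1.0 - TAU) * y + B + m * R(y)


def optimal_income(m, lo=1e-12, hi=100.0, iters=200):
    """Bisection on (1 - T'(y) + m R'(y)) / c - v_y(y) = 0, i.e. (31) divided by c."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (1.0 - TAU + m * R_prime(mid)) / consumption(mid, m) - v_y(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


h = 1e-5
finite_diff = (optimal_income(h) - optimal_income(-h)) / (2 * h)

y0 = optimal_income(0.0)
c0 = consumption(y0, 0.0)
Y_y = -c0 * v_yy(y0) - c0 * v_y(y0) ** 2      # (33a) at m = 0 with T'' = 0
Y_m = R_prime(y0) - R(y0) * v_y(y0)            # (33b) at m = 0
ift = -Y_m / Y_y                               # (32)
print(finite_diff, ift)
```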

where:

$$\begin{aligned} {\mathscr {Y}}_y^R(y,m;w,\theta )= & {} -T^{\prime \prime }(y)-\mathscr {M}_y(y-T(y)+m\ R(y),y;w,\theta ) \nonumber \\&-{\mathscr {M}}(y-T(y)+m\ R(y),y;w,\theta )\ {\mathscr {M}}_c(y-T(y)+m\ R(y),y;w,\theta ), \end{aligned}$$
(33a)
$$\begin{aligned} \mathscr {Y}_m^R(y,m;w,\theta )= & {} R^\prime (y)- R(y)\ {\mathscr {M}}_c(y-T(y)+m\ R(y),y;w,\theta ). \end{aligned}$$
(33b)

Using (2) and plugging \(R(Y(w,\theta ))=0\) and \(R^\prime (Y(w,\theta ))=1\) into (33b), the compensated elasticity of earnings (19a) can be rewritten as:

$$\begin{aligned} \varepsilon (w,\theta )=\frac{\mathscr {M}(C(w,\theta ),Y(w,\theta );w,\theta )}{-Y(w,\theta )\ \mathscr {Y}_y(Y(w,\theta );w,\theta ) }>0 \end{aligned}$$
(34a)

which is positive since \(\mathscr {Y}_y\left( Y(w,\theta );w,\theta \right) <0\). Plugging \(R(Y(w,\theta ))=1\) and \(R^\prime (Y(w,\theta ))=0\) into (33b), the income effect (19b) can be rewritten as:

$$\begin{aligned} \eta (w,\theta )= \frac{\mathscr {M}_c(C(w,\theta ),Y(w,\theta );w,\theta )}{\mathscr {Y}_y(Y(w,\theta );w,\theta )} \end{aligned}$$
(34b)

which is negative if leisure is a normal good, since then \(\mathscr {M}_c>0\). The elasticity \(\alpha (w,\theta )\) of earnings with respect to the skill level can be expressed as:

$$\begin{aligned} \alpha (w,\theta )= \frac{w\ \mathscr {M}_w(C(w,\theta ),Y(w,\theta );w,\theta )}{Y(w,\theta )\ \mathscr {Y}_y(Y(w,\theta );w,\theta )} >0. \end{aligned}$$
(34c)

Dividing (34a) by (34c) we get:

$$\begin{aligned} \frac{\varepsilon (w,\theta )}{\alpha (w,\theta )}=-\frac{v_{y}\left[ w,\theta \right] }{w\cdot v_{yw}\left[ w,\theta \right] }. \end{aligned}$$
(35)

Plugging (34a) into (34b) leads to:

$$\begin{aligned} \eta (w,\theta )= Y(w,\theta ) \cdot \frac{u^{\prime \prime }\left[ w,\theta \right] }{u^{\prime }\left[ w,\theta \right] } \cdot \varepsilon (w,\theta ). \end{aligned}$$
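Identity (35) and the relation between \(\eta\) and \(\varepsilon\) just derived can be verified by finite differences. The sketch below again assumes \(u(c)=\log c\) (so \(u^{\prime \prime }/u^\prime =-1/C\)), the isoelastic disutility, and a flat baseline tax with hypothetical parameters. It measures \(\varepsilon\) with a reform that raises the retention rate while leaving the tax paid at \(Y(w,\theta )\) unchanged (\(R(y)=y-Y(w,\theta )\), so \(R=0\) and \(R^\prime =1\) there), \(\eta\) with a lump-sum reform (\(R=1\)), and \(\alpha\) by perturbing the skill. With this disutility, \(-\upsilon _y/(w\,\upsilon _{yw})=\theta /(1+\theta )\).

```python
import numpy as np

TAU, B, THETA = 0.3, 0.5, 0.5   # hypothetical flat tax rate, transfer and group parameter


def v_y(y, w):
    return (y / w) ** (1.0 / THETA) / w


def income(w, slope, intercept, lo=1e-12, hi=100.0, iters=200):
    """Optimal y when c = slope * y + intercept and u = log: bisection on slope / c = v_y."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if slope / (slope * mid + intercept) - v_y(mid, w) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


w0, h = 1.2, 1e-5
y0 = income(w0, 1.0 - TAU, B)
c0 = (1.0 - TAU) * y0 + B

# epsilon: R(y) = y - y0, i.e. slope 1 - TAU + m and intercept B - m * y0
dY_rate = (income(w0, 1.0 - TAU + h, B - h * y0) - income(w0, 1.0 - TAU - h, B + h * y0)) / (2 * h)
epsilon = (1.0 - TAU) / y0 * dY_rate
# alpha: elasticity of earnings with respect to the skill level
alpha = w0 / y0 * (income(w0 + h, 1.0 - TAU, B) - income(w0 - h, 1.0 - TAU, B)) / (2 * h)
# eta: lump-sum reform R(y) = 1
eta = (income(w0, 1.0 - TAU, B + h) - income(w0, 1.0 - TAU, B - h)) / (2 * h)

print(epsilon / alpha, THETA / (1.0 + THETA))   # identity (35)
print(eta, y0 * (-1.0 / c0) * epsilon)          # eta = Y (u''/u') epsilon
```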

It is then straightforward to obtain:

$$\begin{aligned} {\hat{\eta }}(Y(w,\theta _0))=Y(w,\theta _0)\cdot \frac{u^{\prime \prime }\left[ w,\theta _0\right] }{u^{\prime }\left[ w,\theta _0\right] }\cdot {\hat{\varepsilon }}(Y(w,\theta _0)). \end{aligned}$$
(36)

Let \(y\in \mathbb {R}_+\). Since \(\mathscr {Y}_y\left( Y(w,\theta );w,\theta \right) <0\), there exists a single skill level w such that \(y=Y(w,\theta _0)\). From (2), we know that:

$$\begin{aligned} 1-T^\prime \left[ w,\theta \right] =\frac{v_y\left[ w,\theta \right] }{u^\prime \left[ w,\theta \right] }. \end{aligned}$$
(37)

The integrand on the left-hand side of (14a) can be rewritten as:

$$\begin{aligned} \frac{v_y\left[ W(w,\theta ),\theta \right] }{- W(w,\theta )\ v_{yw}\left[ W(w,\theta ),\theta \right] }\ W(w,\theta ) \ f(W(w,\theta )|\theta )= & {} \frac{\varepsilon \left( W(w,\theta ),\theta \right) }{\alpha \left( W(w,\theta ),\theta \right) }\cdot W(w,\theta ) \ f(W(w,\theta )|\theta )\\= & {} \varepsilon \left( W(w,\theta ),\theta \right) \ Y(w,\theta _0)\ h(Y(w,\theta _0)|\theta ). \end{aligned}$$

The first equality uses Eq. (35) and the second uses (21). Together with (22b), this implies that Eq. (14a) can be rewritten as:

$$\begin{aligned} \frac{T^\prime \left[ w,\theta _0\right] }{1-T^\prime \left[ w,\theta _0\right] }\cdot {\hat{\varepsilon }}\left( Y(w,\theta _0)\right) \cdot Y(w,\theta _0)\cdot \hat{h}(Y(w,\theta _0)) = J(w) \end{aligned}$$
(38)

where J(w) denotes the right-hand side of (14a). The derivative \({\dot{J}}(w)\) of \(J(\cdot )\) is:

$$\begin{aligned}&{\dot{J}}(w)={\dot{C}}(w,\theta _0) \frac{u^{\prime \prime }\left[ w,\theta _0\right] }{ u^{\prime }\left[ w,\theta _0\right] } J(w)\\&\qquad + \int \limits _{\theta \in \Theta }\left\{ \frac{ \Phi _U\left[ W(w,\theta ),\theta \right] \ u^{\prime }\left[ W(w,\theta ),\theta \right] }{\lambda } -1\right\} {\dot{W}}(w,\theta )\ f\left( W(w,\theta )|\theta \right) d\mu (\theta ) \\&\quad = \int _{\theta \in \Theta }\left\{ g\left( W(w,\theta ),\theta \right) -1\right\} \cdot {\dot{W}}(w,\theta )\cdot f\left( W(w,\theta )|\theta \right) \cdot d\mu (\theta ) +{\dot{C}}(w,\theta _0) \cdot \frac{ u^{\prime \prime }\left[ w,\theta _0\right] }{ u^{\prime }\left[ w,\theta _0\right] } \cdot J(w) \end{aligned}$$

where (20) has been used. Differentiating both sides of (9) and of \(C(w,\theta _0)=Y(w,\theta _0)-T\left( Y(w,\theta _0)\right)\) with respect to the skill w, we obtain:

$$\begin{aligned} {\dot{W}}(w,\theta )=\frac{{\dot{Y}}\left( w,\theta _0\right) }{{\dot{Y}}\left( W(w,\theta ),\theta \right) } \qquad \text {and}\qquad {\dot{C}}(w,\theta _0) =\left( 1-T^\prime \left( Y(w,\theta _0)\right) \right) \ {\dot{Y}}(w,\theta _0). \end{aligned}$$

We thus obtain:

$$\begin{aligned} {\dot{J}}(w)=\left( \int \limits _{\theta \in \Theta }\left\{ g\left( W(w,\theta ),\theta \right) -1\right\} \frac{f\left( W(w,\theta )|\theta \right) }{{\dot{Y}}(W(w,\theta ),\theta )} d\mu (\theta ) + \left( 1-T^\prime \left[ w,\theta _0\right] \right) \frac{u^{\prime \prime }\left[ w,\theta _0\right] }{u^{\prime }\left[ w,\theta _0\right] } J(w) \right) {\dot{Y}}(w,\theta _0). \end{aligned}$$
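The expression \({\dot{W}}(w,\theta )={\dot{Y}}(w,\theta _0)/{\dot{Y}}(W(w,\theta ),\theta )\) used in this step comes from implicitly differentiating (9). The Python sketch below checks it numerically under a hypothetical income schedule \(Y(w,\theta )=\theta w^2\), which is an illustrative assumption and not taken from the paper. It computes W by bisection, differentiates it by central finite differences, and compares the result with the closed-form ratio; all function names are hypothetical.

```python
import math

def Y(w, theta):
    # Hypothetical income schedule, smoothly increasing in skill w (assumption)
    return theta * w ** 2

def Y_dot(w, theta):
    # Partial derivative of the hypothetical Y with respect to w
    return 2.0 * theta * w

def W(w, theta, theta0):
    # Skill in group theta earning the same income as skill w in group theta0,
    # i.e. the solution of Y(W, theta) = Y(w, theta0), found by bisection
    target = Y(w, theta0)
    lo, hi = 0.0, 1.0
    while Y(hi, theta) < target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Y(mid, theta) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def W_dot_formula(w, theta, theta0):
    # Implicit-differentiation result: Y_dot(w, theta0) / Y_dot(W(w, theta), theta)
    return Y_dot(w, theta0) / Y_dot(W(w, theta, theta0), theta)

def W_dot_numeric(w, theta, theta0, step=1e-5):
    # Central finite difference of W with respect to w
    return (W(w + step, theta, theta0) - W(w - step, theta, theta0)) / (2.0 * step)

print(W_dot_formula(1.3, 2.0, 1.0), W_dot_numeric(1.3, 2.0, 1.0))
```

Under this particular schedule both numbers should be close to \(\sqrt{\theta _0/\theta }\), since \(W(w,\theta )=w\sqrt{\theta _0/\theta }\).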

Using (21) and (38), \({\dot{J}}(w)\) can be rewritten as:

$$\begin{aligned} {\dot{J}}(w)= & {} \left( \int \limits _{\theta \in \Theta }\left\{ g\left( W(w,\theta ),\theta \right) -1\right\} h\left( Y(w,\theta _0)|\theta \right) d\mu (\theta ) \right. \\&+ \left. T^\prime \left( Y(w,\theta _0)\right) Y(w,\theta _0) \frac{u^{\prime \prime }\left( C(w,\theta _0)\right) }{ u^{\prime }\left( C(w,\theta _0)\right) } {\hat{\varepsilon }}(Y(w,\theta _0)) \hat{h}(Y(w,\theta _0)) \right) {\dot{Y}}(w,\theta _0). \end{aligned}$$

Using (36) and (22d), we get:

$$\begin{aligned} -{\dot{J}}(w)= & {} \left\{ 1-\hat{g}(Y(w,\theta _0))- {\hat{\eta }}(Y(w,\theta _0))\cdot T^\prime \left( Y(w,\theta _0)\right) \right\} \cdot \hat{h}\left( Y(w,\theta _0)\right) \cdot {\dot{Y}}(w,\theta _0). \end{aligned}$$

Since J(w) vanishes as w tends to infinity, \(J(w)=\int _{x\ge w} (-{\dot{J}}(x))dx\), so that

$$\begin{aligned} J(w)= & {} \int _{x \ge w} \left\{ 1-\hat{g}(Y(x,\theta _0))- {\hat{\eta }}(Y(x,\theta _0))\cdot T^\prime \left( Y(x,\theta _0)\right) \right\} \cdot \hat{h}\left( Y(x,\theta _0)\right) \cdot {\dot{Y}}(x,\theta _0)\cdot dx. \end{aligned}$$

Changing variables to \(z=Y(x,\theta _0)\), so that \(dz={\dot{Y}}(x,\theta _0)\,dx\), we get

$$\begin{aligned} J(w)=\int _{z \ge Y(w,\theta _0)} \left\{ 1-\hat{g}(z)- {\hat{\eta }}(z)\cdot T^\prime (z)\right\} \cdot \hat{h}(z) \cdot dz. \end{aligned}$$
(39)

Plugging (39) into (38) gives (23a). Combining (14b) and (39) leads to (23b).
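The passage from the integral over skills to (39) is the substitution \(z=Y(x,\theta _0)\), which is valid because \(Y(\cdot ,\theta _0)\) is increasing. The Python sketch below checks this identity numerically under stated assumptions: a hypothetical schedule \(Y(x)=x^2+x\) and a hypothetical integrand \(\varphi (z)=e^{-z}\) standing in for \(\left\{ 1-\hat{g}(z)-\hat{\eta }(z)T^\prime (z)\right\} \hat{h}(z)\). Under these assumptions both sides equal \(e^{-Y(w)}\). The infinite upper limits are truncated, and all names are illustrative.

```python
import math

def simpson(f, a, b, n=2000):
    # Composite Simpson rule on [a, b]; n is forced to be even
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def Y(x):
    # Hypothetical increasing income schedule (assumption)
    return x * x + x

def Y_dot(x):
    return 2.0 * x + 1.0

def phi(z):
    # Hypothetical integrand in income space (assumption)
    return math.exp(-z)

def skill_side(w, span=10.0):
    # Integral over x >= w of phi(Y(x)) * Y_dot(x), truncated at w + span
    return simpson(lambda x: phi(Y(x)) * Y_dot(x), w, w + span)

def income_side(w, span=40.0):
    # Integral over z >= Y(w) of phi(z), truncated at Y(w) + span
    return simpson(phi, Y(w), Y(w) + span)

w = 0.5
print(skill_side(w), income_side(w), math.exp(-Y(w)))
```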

A.5 Proof of Proposition 3

Let us denote

$$\begin{aligned} K(w) \overset{\text {def}}{\equiv }\iint \limits _{\theta \in \Theta ,x\ge W(w,\theta )} \left( \frac{1}{u^{\prime }(C(x,\theta ))}\ -\frac{\Phi _{U}(U(x,\theta );x,\theta )}{\lambda }\right) f(x|\theta )dx\ d\mu (\theta ) \end{aligned}$$
(40)

the right-hand side of (14a) at skill level w divided by \(u^\prime \left( Y(w,\theta _0)-T(Y(w,\theta _0))\right)\). According to Proposition 1, Eq. (14a), \(u^\prime >0\) and \(v_y>0>v_{yw}\), the sign of \(T^\prime (Y(w,\theta _0))\) is the sign of K(w).
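Equation (38) can be inverted pointwise: writing \(R=J(w)/\left[ \hat{\varepsilon }\,Y\,\hat{h}\right]\) for \(T^\prime /(1-T^\prime )\) gives \(T^\prime =R/(1+R)\), which has the sign of J(w), and hence of K(w), whenever \(T^\prime <1\). The Python sketch below assumes that the values of J, \(\hat{\varepsilon }\), Y and \(\hat{h}\) at one income level are already known; the function name and the numbers in the usage line are hypothetical.

```python
def marginal_tax_rate(J, eps_hat, y, h_hat):
    """Recover T'(y) from Eq. (38): T'/(1 - T') * eps_hat * y * h_hat = J."""
    if eps_hat <= 0 or y <= 0 or h_hat <= 0:
        raise ValueError("eps_hat, y and h_hat must be positive")
    ratio = J / (eps_hat * y * h_hat)  # equals T'/(1 - T')
    if ratio <= -1.0:
        # T'/(1 - T') <= -1 has no solution with T' < 1
        raise ValueError("no solution with T' < 1")
    return ratio / (1.0 + ratio)

# Hypothetical values at one income level: T' = 0.4 / 1.4, roughly 0.286
print(marginal_tax_rate(J=0.1, eps_hat=0.5, y=2.0, h_hat=0.25))
```

A negative J yields a negative marginal rate and J = 0 yields a zero rate, mirroring the sign argument above.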

Under utilitarian preferences, \(\Phi _{U}=1\). Changing the integration variable in (40) from x to t with \(x=W(t,\theta )\) (i.e., \(Y(x,\theta )\equiv Y(t,\theta _0)\) and \(C(x,\theta )\equiv C(t,\theta _0)\) according to (9)), and noting that \(x\ge W(w,\theta )\) if and only if \(t\ge w\), we get:

$$\begin{aligned} K(w) = \int _{t\ge w}\left( \frac{1}{u^{\prime }(C(t,\theta _0))}-\frac{1}{\lambda }\right) \ \left( \int _{\theta \in \Theta } {\dot{W}}(t,\theta )\ f\left( W(t,\theta )\vert \theta \right) d\mu \left( \theta \right) \right) \ dt \end{aligned}$$

The derivative of K(w) has the sign of \(1/\lambda -1/u^{\prime }(C(w,\theta _0))\), which is decreasing in w because \(C(\cdot ,\theta _0)\) is increasing and \(u(\cdot )\) is concave. Moreover, \(\lim _{w\to \infty }K(w)=0\), and Eq. (14b) implies that \(K(0)=0\). Therefore, \(K(\cdot )\) first increases and then decreases, so it is positive at all interior skill levels. Hence, optimal marginal tax rates are positive.
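The utilitarian argument can be illustrated numerically. The Python sketch below rests entirely on hypothetical assumptions not taken from the paper: log utility (so \(1/u^\prime (c)=c\)), consumption \(C(t,\theta _0)=1+t\), and an aggregate density \(\int _{\theta }{\dot{W}}(t,\theta )f(W(t,\theta )|\theta )d\mu (\theta )=e^{-t}\). The multiplier \(1/\lambda\) is set so that \(K(0)=0\), playing the role of (14b). Under these assumptions \(K(w)=w e^{-w}\), which is zero at 0, hump-shaped, and positive for \(w>0\).

```python
import math

def simpson(f, a, b, n=6000):
    # Composite Simpson rule on [a, b]; n is forced to be even
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

UPPER = 60.0  # truncation point replacing the infinite upper limit

def consumption(t):
    # Hypothetical increasing consumption C(t, theta0)
    return 1.0 + t

def inv_marginal_utility(c):
    # Log utility (assumption): u'(c) = 1/c, so 1/u'(c) = c
    return c

def density(t):
    # Hypothetical aggregate density of the theta0-equivalent skill t
    return math.exp(-t)

# 1/lambda chosen so that K(0) = 0, mirroring the role of Eq. (14b)
INV_LAMBDA = (simpson(lambda t: inv_marginal_utility(consumption(t)) * density(t), 0.0, UPPER)
              / simpson(density, 0.0, UPPER))

def K(w):
    # Utilitarian K(w) after the change of variable x = W(t, theta)
    return simpson(lambda t: (inv_marginal_utility(consumption(t)) - INV_LAMBDA) * density(t), w, UPPER)

for w in (0.5, 1.0, 2.0, 5.0):
    print(w, K(w), w * math.exp(-w))  # numerical K versus the closed form w * exp(-w)
```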

Under maximin, (8a) implies \(U(x,\theta )>U(0,\theta )\) for all \(x>0\). Therefore, within each group, the most deserving individuals are those with skill \(w=0\). The maximin objective thus implies \(\Phi _{U}\left[ x,\theta \right] =0\) for all \(x>0\). Hence, Eq. (40) simplifies to:

$$\begin{aligned} K(w) = \iint \limits _{\theta \in \Theta ,x\ge W(w,\theta )} \frac{1}{u^{\prime }(C(x,\theta ))}\ f(x|\theta )dx\ d\mu (\theta ) \end{aligned}$$

for all \(w>0\). Since its integrand is positive, K(w) is positive, so optimal marginal tax rates are positive.


Cite this article

Jacquet, L., Lehmann, E. Optimal tax problems with multidimensional heterogeneity: a mechanism design approach. Soc Choice Welf 60, 135–164 (2023). https://doi.org/10.1007/s00355-021-01349-4
