1 Introduction

The important and influential literature growing out of Mirrlees’ (1971) seminal paper on optimal income taxation has stressed the trade-offs between incentive and distributional considerations in the design of income tax schedules. These trade-offs arise from an information friction that endogenizes the feasible tax instruments: the government knows the distribution of types in the population and it can also observe the actual earned income of each individual, but is not able to observe the specific type of any given individual. Personalized lump-sum taxes and transfers are therefore not available but public observability of earned income at the individual level allows the government to tax earned income on a nonlinear scale.

The vast majority of papers in the optimal tax literature assume, for reasons of tractability, that agents differ along a single dimension (market ability). Given certain assumptions on the utility function, this yields a monotonic relationship between an agent’s unobserved type and the slope of his/her indifference curve in the earnings-consumption space. This property, referred to as ‘single-crossing’ (hereafter, SC), allows the researcher to provide a full characterization of the set of implementable contracts while restricting attention to local incentive constraints linking adjacent types. In the case of a continuum of types, it also implies that the incentive constraints can conveniently be expressed in terms of differential equations. When agents differ along multiple dimensions, however, the SC property will generally be violated, as there is no natural way to order agents in a multidimensional space.Footnote 1

A comparatively small literature analyzes optimal income taxation with multidimensional unobserved heterogeneity, and these contributions can roughly be divided into four strands. A first strand assumes that the additional dimensions of heterogeneity enter the utility function in an additively separable way, thereby not affecting individuals’ trade-offs between pre-tax and after-tax income (see, e.g., Kleven et al. 2009; Jacquet et al. 2013; Scheuer 2014; Bastani et al. 2020). A second strand imposes restrictions such that the various dimensions of heterogeneity can be collapsed into one dimension and parameterized by a single index (see, e.g., Boadway et al. 2002; Choné and Laroque 2010; Golosov et al. 2013; Rothschild and Scheuer 2014; Lockwood and Weinzierl 2015). A third strand analyzes more general forms of heterogeneity, but restricts attention to the quantitative analysis of models with a small, discrete number of types (see, e.g., Bastani et al. 2013; Judd et al. 2018). Finally, a fourth strand comprises papers that provide a characterization of optimal marginal tax rates while remaining agnostic about which incentive-compatibility constraints are binding in equilibrium (see, e.g., Cremer et al. 1998; Cremer and Gahvari 2002; Micheletto 2008).

Compared to the existing literature referred to above, the purpose of this paper is to provide a more thorough investigation of the consequences of abandoning the SC condition. To this end, we set up a simple two-type model where the SC condition is naturally violated, and we characterize the properties of a second-best optimum by considering the entire second-best Pareto frontier (hereafter, PF).Footnote 2 The model that we consider is a standard intensive-margin optimal income tax model where agents have identical preferences and heterogeneous market abilities, but where we also allow for heterogeneity in “needs” for a work-related good/service, i.e. a good/service that some agents need to purchase in order to work.Footnote 3 It is this bi-dimensional heterogeneity that implies a violation of the SC condition.

Our analysis highlights several results, each of them representing an anomaly with respect to what is obtained in an optimal income tax model under SC. First, a second-best optimum might not preserve the ranking of earned income that prevails under laissez-faire. Second, redistribution via income taxation might be feasible even when the laissez-faire equilibrium is a pooling equilibrium. Third, a second-best optimum might not be unique, in the sense that there might be more than one set of allocations in the (pre-tax income, after-tax income)-space that solve the government’s maximization problem. Fourth, the second-best PF can be disconnected. Fifth, supplementing an optimal nonlinear income tax with an optimal subsidy on work-related expenses may imply that redistribution is achieved through a separating or pooling equilibrium where both self-selection constraints are binding. A final result that we show is that the labor supply of some agents may be distorted even though no self-selection constraint is (locally) binding in equilibrium.

The paper is organized as follows. In Sect. 2 we present our setting and highlight how it implies that the SC condition does not hold. In Sect. 3 we evaluate the properties of the second-best PF and of the allocations that allow implementing the various points on the second-best PF. To simplify the exposition we make the assumption that, for agents who incur a cost for the purchase of a work-related good, the cost is proportional to their labor supply. In Sect. 4 we discuss how our results change when work-related expenses are subsidized by the government, and in Sect. 5 we briefly consider the possibility that job-related expenses vary nonlinearly with hours of work. Finally, Sect. 6 offers concluding remarks.

2 The model

Consider an economy populated by two groups of individuals who have identical preferences represented by the quasi-linear utility function

$$\begin{aligned} U=c-\frac{1}{2}h^{2}, \end{aligned}$$
(1)

where c denotes consumption and h denotes labor supply.Footnote 4

The two groups of agents are assumed to differ with respect to their market ability, reflected in their hourly wage rate, and their needs for a work-related good. One group has no need for any work-related good, whereas agents belonging to the other group incur a monetary cost \(\varphi (h)=qh\), where q is a positive constant. Throughout the paper we will refer to these groups of agents as “non-users” and “users”, and denote their hourly wage rates by, respectively, \(w^{n}\) and \(w^{u}\) (superscript “n” referring to non-users, and superscript “u” referring to users). Moreover, normalizing to 1 the size of the total population, we will denote by \(\pi \) the proportion of users. Furthermore, we will assume that \(w^{u}>w^{n}\), implying that the high-skilled agents are disadvantaged along our second dimension of heterogeneity, and that \(q<w^{u}\) which ensures that the labor supply of users is strictly positive under laissez-faire.

Assume that the government levies a nonlinear income tax T(wh) and let earned income be denoted by Y (i.e., \(Y\equiv wh\)) and after-tax income be denoted by B (i.e., \(B\equiv Y-T\left( Y\right) \)). It is straightforward to verify that the SC property is not satisfied in our two-type economy. This property requires that, at any bundle in the \((Y,B)\)-space, the indifference curves are flatter the higher the wage rate of an agent. In our model, and for a given \((Y,B)\)-bundle, users and non-users have utilities that are respectively given by:

$$\begin{aligned} U^{u}\,=\, & {} B-q\frac{Y}{w^{u}}-\frac{1}{2}\left( \frac{Y}{w^{u}}\right) ^{2}, \\ U^{n}= \,& {} B-\frac{1}{2}\left( \frac{Y}{w^{n}}\right) ^{2}. \end{aligned}$$

Therefore, at a given \((Y,B)\)-bundle, the slope of a user’s indifference curve is equal to

$$\begin{aligned} MRS_{YB}^{u}\left( Y,B\right) \equiv -\frac{\partial U^{u}/\partial Y}{ \partial U^{u}/\partial B}=\frac{1}{w^{u}}\left[ q+\frac{Y}{w^{u}}\right] , \end{aligned}$$
(2)

whereas non-users have an indifference curve with slope equal to

$$\begin{aligned} MRS_{YB}^{n}\left( Y,B\right) \equiv -\frac{\partial U^{n}/\partial Y}{ \partial U^{n}/\partial B}=\frac{Y}{\left( w^{n}\right) ^{2}}. \end{aligned}$$
(3)

From (2) and (3), it follows that users and non-users have equally sloped indifference curves at bundles where

$$\begin{aligned} Y=\frac{q}{w^{u}}\left[ \frac{1}{\left( w^{n}\right) ^{2}}-\frac{1}{\left( w^{u}\right) ^{2}}\right] ^{-1}=\frac{qw^{u}}{\left( w^{u}\right) ^{2}-\left( w^{n}\right) ^{2}}\left( w^{n}\right) ^{2}\equiv \Omega >0, \end{aligned}$$
(4)

whereas at any bundle where \(Y>\left( <\right) \Omega \), users have flatter (steeper) indifference curves than non-users.
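The crossing of the two slopes at \(\Omega \) can be checked numerically. The following is a minimal sketch of expressions (2), (3) and (4); the function names and the parameter values are illustrative assumptions, not taken from the paper.

```python
def mrs_user(Y, w_u, q):
    """Slope of a user's indifference curve in (Y, B)-space, expression (2)."""
    return (q + Y / w_u) / w_u


def mrs_non_user(Y, w_n):
    """Slope of a non-user's indifference curve in (Y, B)-space, expression (3)."""
    return Y / w_n**2


def omega(w_u, w_n, q):
    """Income level at which the two slopes coincide, expression (4)."""
    return q * w_u * w_n**2 / (w_u**2 - w_n**2)


# Hypothetical parameter values satisfying w_u > w_n and q < w_u.
w_u, w_n, q = 2.0, 1.0, 1.2
Om = omega(w_u, w_n, q)

for Y in (0.5 * Om, Om, 1.5 * Om):
    # Below Omega users are steeper, above Omega they are flatter.
    print(f"Y={Y:.3f}  MRS_u={mrs_user(Y, w_u, q):.3f}  MRS_n={mrs_non_user(Y, w_n):.3f}")
```

With these values the slopes coincide at \(Y=\Omega \), users are steeper below it and flatter above it, which is the violation of SC described in the text.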

The fact that the SC property is not satisfied shows that our bi-dimensional heterogeneity (in skills and needs) cannot be reduced to one dimension. Although this complicates the analysis, it also allows us to highlight some interesting results that can arise due to the violation of SC.

In the next section we will evaluate the properties of the second-best PF and of the allocations that allow implementing the various points on the second-best PF. In doing that, we will restrict our attention to the case when \(\pi \), the proportion of users, is lower than \(1-\left( w^{n}\right) ^{2}/\left( w^{u}\right) ^{2}\); this represents the most interesting case for the purpose of illustrating the anomalies that can arise due to the violation of SC.Footnote 5

Before turning to the analysis of the second-best PF, however, we devote the remainder of this section to characterizing first the laissez-faire equilibrium and then the properties of the first-best PF.

2.1 The laissez-faire equilibrium

Under laissez-faire, users choose h to maximize \(\left( w^{u}-q\right) h-h^{2}/2\), implying \(h^{u}=w^{u}-q\), whereas non-users choose h to maximize \(w^{n}h-h^{2}/2\), implying \(h^{n}=w^{n}\).

Therefore, denoting by \(Y_{LF}^{i}\) the laissez-faire income of an individual i, for \(i=n,u\), we have that \(Y_{LF}^{n}=\left( w^{n}\right) ^{2}\), \(Y_{LF}^{u}=\left( w^{u}-q\right) w^{u}\). It then follows that

$$\begin{aligned} Y_{LF}^{u}<\left(>\right) Y_{LF}^{n}\Longleftrightarrow \left( w^{u}-q\right) w^{u}<\left( >\right) \left( w^{n}\right) ^{2}. \end{aligned}$$

Equivalently, defining \({\overline{q}}\) as

$$\begin{aligned} {\overline{q}}\equiv \frac{\left( w^{u}\right) ^{2}-\left( w^{n}\right) ^{2}}{ w^{u}}, \end{aligned}$$

we have that

$$\begin{aligned} Y_{LF}^{u}<\left(>\right) Y_{LF}^{n}\Longleftrightarrow q>\left( <\right) {\overline{q}}. \end{aligned}$$

Consider the case when \(q>{\overline{q}}\), so that \(Y_{LF}^{u}<Y_{LF}^{n}\). Since \(\Omega \) in (4) can be re-expressed as \(\left( w^{n}\right) ^{2}q/{\overline{q}}\), it also follows that \(\Omega >Y_{LF}^{n}\) when \(q> {\overline{q}}\). Similarly, when \(q<{\overline{q}}\), we have that \(\Omega <Y_{LF}^{n}\), and when \(q={\overline{q}}\) we have that \(\Omega =Y_{LF}^{n}\). Thus, whether q is smaller than, equal to, or larger than \({\overline{q}}\) also determines the relative sizes of both types’ MRS at their laissez-faire bundles (i.e. the relations between \(Y_{LF}^{u}\), \(Y_{LF}^{n}\) and the threshold \(\Omega \)).

The following Lemma summarizes the relationship between the value of q and the three possible configurations of a laissez-faire equilibrium.

Lemma 1

Assume that \(w^{u}>w^{n}\).

(i) When \(q<{\overline{q}}\), the laissez-faire equilibrium will be such that \(Y_{LF}^{u}>Y_{LF}^{n}>\Omega \);

(ii) When \(q={\overline{q}}\), the laissez-faire equilibrium will be such that \(Y_{LF}^{u}=Y_{LF}^{n}=\Omega \);

(iii) When \(q>{\overline{q}}\), the laissez-faire equilibrium will be such that \(Y_{LF}^{u}<Y_{LF}^{n}<\Omega \).
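The three configurations in Lemma 1 can be reproduced with a few lines of code. This is a sketch under assumed, purely illustrative parameter values; the function names are hypothetical and only the formulas for \(Y_{LF}^{u}\), \(Y_{LF}^{n}\), \({\overline{q}}\) and \(\Omega \) come from the text.

```python
def q_bar(w_u, w_n):
    """Threshold value of q at which both types earn the same laissez-faire income."""
    return (w_u**2 - w_n**2) / w_u


def laissez_faire_incomes(w_u, w_n, q):
    """Return (Y_LF^u, Y_LF^n, Omega) for given wages and unit cost q."""
    Y_u = (w_u - q) * w_u                    # users supply h = w_u - q
    Y_n = w_n**2                             # non-users supply h = w_n
    Om = w_n**2 * q / q_bar(w_u, w_n)        # Omega re-expressed via q_bar
    return Y_u, Y_n, Om


def lemma1_case(w_u, w_n, q, tol=1e-12):
    """Classify the laissez-faire equilibrium as case 'i', 'ii' or 'iii' of Lemma 1."""
    qb = q_bar(w_u, w_n)
    if abs(q - qb) <= tol:
        return "ii"
    return "i" if q < qb else "iii"


# Hypothetical wages; q_bar equals 1.5 here.
for q in (1.0, 1.5, 1.8):
    print(q, lemma1_case(2.0, 1.0, q), laissez_faire_incomes(2.0, 1.0, q))
```

For the three values of q used above, the printed incomes line up with the orderings in parts (i), (ii) and (iii), respectively.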

A graphical illustration of the laissez-faire equilibrium for the case when \( q>{\overline{q}}\), and of the violation of SC, is provided in Fig. 1 below.

Fig. 1 Laissez-faire equilibrium when \( q > \overline{q}\)

Regarding utilities, denoting by \(U_{LF}^{i}\) the laissez-faire utility of an individual i, for \(i=n,u\), we have that \(U_{LF}^{u}=\left( w^{u}-q\right) ^{2}/2\), \(U_{LF}^{n}=\left( w^{n}\right) ^{2}/2\), and therefore

$$\begin{aligned} U_{LF}^{u}<\left(>\right) U_{LF}^{n}\Longleftrightarrow w^{u}-q<\left( >\right) w^{n}, \end{aligned}$$

or, equivalently

$$\begin{aligned} U_{LF}^{u}<\left(>\right) U_{LF}^{n}\Longleftrightarrow q>\left( <\right) w^{u}-w^{n}. \end{aligned}$$

One thing to notice is that the utility ranking and the income ranking may differ. In particular, while \(Y_{LF}^{u}\le Y_{LF}^{n}\) implies that \( U_{LF}^{u}<U_{LF}^{n}\), knowing that \(Y_{LF}^{u}>Y_{LF}^{n}\) is not sufficient to establish who is better off under laissez faire. When \( Y_{LF}^{u}>Y_{LF}^{n}\), we can have that \(U_{LF}^{u}<U_{LF}^{n}\) (when \( \left( w^{u}-q\right) w^{u}>\left( w^{n}\right) ^{2}>\left( w^{u}-q\right) ^{2}\)), \(U_{LF}^{u}=U_{LF}^{n}\) (when \(\left( w^{u}-q\right) w^{u}>\left( w^{n}\right) ^{2}=\left( w^{u}-q\right) ^{2}\)), or \(U_{LF}^{u}>U_{LF}^{n}\) (when \(\left( w^{u}-q\right) w^{u}>\left( w^{u}-q\right) ^{2}>\left( w^{n}\right) ^{2}\)).
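A small numerical illustration may help to see how the income ranking and the utility ranking can diverge. The sketch below uses hypothetical parameter values chosen so that \(w^{u}-w^{n}<q<{\overline{q}}\); the helper name is an assumption made for this example.

```python
def laissez_faire_outcomes(w_u, w_n, q):
    """Laissez-faire incomes and utilities of users (u) and non-users (n)."""
    return {
        "Y_u": (w_u - q) * w_u,
        "Y_n": w_n**2,
        "U_u": (w_u - q) ** 2 / 2,
        "U_n": w_n**2 / 2,
    }


# Hypothetical values: w_u - w_n = 1.0 < q = 1.2 < q_bar = 1.5.
lf = laissez_faire_outcomes(w_u=2.0, w_n=1.0, q=1.2)
print(lf)
# Users earn more than non-users but end up with lower utility,
# because part of their income is absorbed by work-related expenses.
print("income ranking favors users: ", lf["Y_u"] > lf["Y_n"])
print("utility ranking favors users:", lf["U_u"] > lf["U_n"])
```

In this example users have the higher earned income but the lower utility, which corresponds to the first of the three sub-cases listed above.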

2.2 The shape of the first-best Pareto frontier

In a first-best setting where asymmetric information is not an issue, the shape of the PF can be straightforwardly characterized. The first-best PF goes through the point with coordinates (\(U_{LF}^{n},U_{LF}^{u}\)) and has slope \(dU^{u}/dU^{n}=-(1-\pi )/\pi \) for values of \(U^{n}\) such that \( -\left( w^{n}\right) ^{2}/2\le U^{n}\le \left[ \left( w^{u}-q\right) ^{2}\pi /\left( 1-\pi \right) \right] +\left( w^{n}\right) ^{2}/2\). For \( U^{n}>\left[ \left( w^{u}-q\right) ^{2}\pi /\left( 1-\pi \right) \right] +\left( w^{n}\right) ^{2}/2\) the slope of the PF is such that \( dU^{u}/dU^{n}<-(1-\pi )/\pi \); for \(U^{n}<-\left( w^{n}\right) ^{2}/2\) the slope is such that \(-(1-\pi )/\pi<dU^{u}/dU^{n}<0\).

The intuition is as follows. Starting from the laissez-faire equilibrium, a $1 lump-sum tax levied on non-users, which reduces by 1 the utility of each non-user, allows the government to collect $\((1-\pi )\), which implies that each user can receive a lump-sum transfer of $\(\left( 1-\pi \right) /\pi \), raising the utility of each user by \(\left( 1-\pi \right) /\pi \). This kind of income and utility redistribution, from non-users to users, can go on until all the income earned by non-users under laissez-faire, i.e. \(\left( w^{n}\right) ^{2}\), is confiscated by the government. At that point we have that \(U^{n}=-\left( w^{n}\right) ^{2}/2\) (consumption for non-users is equal to zero and, with no income effects on labor supply, their labor supply is undistorted at its laissez-faire level) and \(U^{u}=\left[ \left( w^{n}\right) ^{2}\left( 1-\pi \right) /\pi \right] +\left( w^{u}-q\right) ^{2}/2\). Once this point on the first-best PF is reached, and assuming that zero represents the lower bound for individual consumption,Footnote 6 a further increase in \(U^{u}\) can only be obtained by pushing the labor supply of non-users above its undistorted level \(h^{n}=w^{n}\) (while keeping at zero their consumption), so that additional resources can be transferred to users. However, due to the distortion on the labor supply of non-users, redistribution becomes costlier and the slope of the PF becomes equal to \(dU^{u}/dU^{n}=-\left( 1-\pi \right) w^{n}/\pi h^{n}\), which is greater than \(-(1-\pi )/\pi \) when \(h^{n}\) exceeds \(w^{n}\), i.e. its laissez-faire value.Footnote 7
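The linear portion of the first-best PF described above can be written as a one-line function. The following sketch assumes lump-sum transfers only and is valid on the range of \(U^{n}\) over which the slope equals \(-(1-\pi )/\pi \); the function name and parameter values are hypothetical.

```python
def first_best_user_utility(U_n, w_u, w_n, q, pi):
    """User utility on the linear segment of the first-best Pareto frontier.

    Valid only where redistribution is carried out through lump-sum
    taxes and transfers, so that both labor supplies stay undistorted.
    """
    U_n_lf = w_n**2 / 2
    U_u_lf = (w_u - q) ** 2 / 2
    lower = -U_n_lf                                   # non-users' consumption hits zero
    upper = U_n_lf + (w_u - q) ** 2 * pi / (1 - pi)   # users' consumption hits zero
    if not lower <= U_n <= upper:
        raise ValueError("U_n lies outside the linear segment of the frontier")
    # Each unit of utility taken from a non-user raises a user's utility by (1-pi)/pi.
    return U_u_lf - (1 - pi) / pi * (U_n - U_n_lf)


# Hypothetical parameter values.
w_u, w_n, q, pi = 2.0, 1.0, 1.2, 0.5
print(first_best_user_utility(0.5, w_u, w_n, q, pi))    # laissez-faire point
print(first_best_user_utility(-0.5, w_u, w_n, q, pi))   # all non-user income confiscated
```

At the lower end of the segment the function returns \(\left[ \left( w^{n}\right) ^{2}\left( 1-\pi \right) /\pi \right] +\left( w^{u}-q\right) ^{2}/2\), in line with the expression given in the text.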

The fact that the non-negativity constraint on consumption becomes binding along some portions of the first-best PF, and consequently the fact that there are portions of the first-best PF where the labor supply of some agents is upward distorted, is an artifact of our assumption that utility is linear in consumption.Footnote 8 Most importantly, it has nothing to do with the fact that the SC property does not hold in our model. For this reason, in our analysis we will hereafter impose the following lower bounds on the utility of, respectively, non-users and users:

$$\begin{aligned} U^{n}\ge & {} -U_{LF}^{n}=-\left( w^{n}\right) ^{2}/2, \end{aligned}$$
(5)
$$\begin{aligned} U^{u}\ge & {} -U_{LF}^{u}=-\left( w^{u}-q\right) ^{2}/2. \end{aligned}$$
(6)

Conditions (5) and (6) ensure that, at each point along the relevant part of the first-best PF, the labor supply of all agents will be left undistorted.

3 Pareto efficient income taxation

Consider now a second-best setting with asymmetric information. Specifically, assume that the government knows the distribution of types in the population but does not know “who is who”. Although individual wages, hours of work and job-related expenses are not observed by the government, earned income is assumed to be publicly observable at an individual level. This allows earned income to be taxed on a nonlinear scale and the government’s problem consists in optimally choosing the nonlinear income tax \(T\left( Y\right) \). Notice however that, while \(T\left( Y\right) \) defines a link between earned income Y and after-tax income B which is a single-valued function, the link that it establishes between earned income and consumption is a multivalued function. This is because, for given Y and corresponding tax payment \(T\left( Y\right) \), an individual’s consumption depends on the amount of job-related expenses.

As customary in the optimal tax literature, we will adopt a mechanism design approach assuming that the government optimally chooses two bundles in the \((Y,B)\)-space subject to the requirement that the chosen set of bundles satisfies public-budget balance, incentive-compatibility, and non-negativity constraints on both consumption and labor supply. Denoting by \((Y^{u},B^{u})\) the bundle intended for users and by \((Y^{n},B^{n})\) the one intended for non-users, a Pareto efficient tax problem can be formalized as follows:

$$\begin{aligned} \underset{Y^{u},B^{u},Y^{n},B^{n}}{\max }{ \ \ }B^{u}-\frac{q}{w^{u}} Y^{u}-\frac{1}{2}\left( \frac{Y^{u}}{w^{u}}\right) ^{2} \end{aligned}$$

subject to:

$$\begin{aligned}&B^{n}-\frac{1}{2}\left( \frac{Y^{n}}{w^{n}}\right) ^{2}\ge {\overline{V}} ^{n},&(\nu ) \\&\left( Y^{u}-B^{u}\right) \pi +\left( Y^{n}-B^{n}\right) \left( 1-\pi \right) \ge 0,&(\mu ) \\&B^{n}-\frac{1}{2}\left( \frac{Y^{n}}{w^{n}}\right) ^{2}\ge B^{u}-\frac{1}{ 2}\left( \frac{Y^{u}}{w^{n}}\right) ^{2},&(\lambda ) \\&B^{u}-\frac{q}{w^{u}}Y^{u}-\frac{1}{2}\left( \frac{Y^{u}}{w^{u}}\right) ^{2}\ge B^{n}-\frac{q}{w^{u}}Y^{n}-\frac{1}{2}\left( \frac{Y^{n}}{w^{u}} \right) ^{2},&(\phi )\\&Y^{u}\ge 0\text {, }Y^{n}\ge 0\text {, }B^{n}\ge 0\text {, } B^{u}-qY^{u}/w^{u}\ge 0. \end{aligned}$$

In the problem above, the \(\nu \)-constraint prescribes a lower bound \( {\overline{V}}^{n}\) for the utility of non-users, the \(\mu \)-constraint represents the government’s budget constraint (the resource constraint of the economy), the \(\lambda \)-constraint is the self-selection constraint requiring non-users not to be tempted to choose the bundle intended for users, and the \(\phi \)-constraint is the self-selection constraint requiring users not to be tempted to choose the bundle intended for non-users. For a given value of \(\ {\overline{V}}^{n}\), we define the set of admissible bundles as the set of bundles \(\{(Y^{u},B^{u}),(Y^{n},B^{n})\}\) satisfying the constraints in the above optimization problem (including the non-negativity constraints on labor supply and consumption for each agent). For given values of \(\pi \), q, \(w^{u}\) and \(w^{n}\), the value function of the optimization problem above defines a value for \(U^{u}\) which is a function of \({\overline{V}}^{n}\), that can be written as \(U_{SB}^{u}\left( {\overline{V}}^{n}\right) \). Repeatedly solving the optimization problem for different values of \({\overline{V}}^{n}\) allows tracing the entire second-best PF. In particular, we have that:
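The set of admissible bundles defined above lends itself to a direct check. The sketch below tests whether a candidate pair of bundles satisfies the \(\nu \)-, \(\mu \)-, \(\lambda \)- and \(\phi \)-constraints together with the non-negativity constraints; it is a feasibility test only, not a solver, and the function names, the tolerance and the numerical values are assumptions made for illustration.

```python
def u_user(Y, B, w_u, q):
    """Utility of a user choosing bundle (Y, B)."""
    return B - q * Y / w_u - 0.5 * (Y / w_u) ** 2


def u_non_user(Y, B, w_n):
    """Utility of a non-user choosing bundle (Y, B)."""
    return B - 0.5 * (Y / w_n) ** 2


def is_admissible(Yu, Bu, Yn, Bn, V_bar, w_u, w_n, q, pi, tol=1e-9):
    """Check all constraints of the Pareto efficient tax problem."""
    checks = {
        "nu": u_non_user(Yn, Bn, w_n) >= V_bar - tol,
        "mu": pi * (Yu - Bu) + (1 - pi) * (Yn - Bn) >= -tol,
        "lambda": u_non_user(Yn, Bn, w_n) >= u_non_user(Yu, Bu, w_n) - tol,
        "phi": u_user(Yu, Bu, w_u, q) >= u_user(Yn, Bn, w_u, q) - tol,
        "nonneg": Yu >= 0 and Yn >= 0 and Bn >= 0 and Bu - q * Yu / w_u >= -tol,
    }
    return all(checks.values()), checks


# Hypothetical parameters with q < q_bar; the laissez-faire bundles are admissible.
params = dict(w_u=2.0, w_n=1.0, q=1.0, pi=0.5)
print(is_admissible(2.0, 2.0, 1.0, 1.0, V_bar=0.5, **params))
# A large transfer to users at undistorted incomes violates the lambda-constraint.
print(is_admissible(2.0, 2.8, 1.0, 0.2, V_bar=-0.3, **params))
```

Returning the individual checks alongside the overall verdict makes it easy to see which constraint rules out a given candidate allocation.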

Definition 1

The second-best Pareto-frontier is defined by the graph of the function \(U_{SB}^{u}\left( {\overline{V}}^{n}\right) \) over the domain of values \({\overline{V}}^{n}\) such that the set of admissible bundles is non-empty and the \(\nu \)-constraint is binding.

We will present our results by means of three Propositions which separately consider the three cases described in Lemma 1 above. In each Proposition we will denote by \(T^{\prime }\left( Y_{SB}^{u}\right) \) and \(T^{\prime }\left( Y_{SB}^{n}\right) \) the marginal income tax rate faced by, respectively, users and non-users at the allocation which allows implementing a given point on the second-best PF. As customary in the optimal tax literature, the marginal income tax rate faced by an individual at a given bundle in the \((Y,B)\)-space is defined as \(1-MRS_{YB}\).

As we will see, the non-standard outcomes which are due to the violation of SC only arise when \(q\ge {\overline{q}}\). For this reason, discussing the results when \(q<{\overline{q}}\) can be regarded as a useful starting point. Proposition 1 summarizes the main findings for this case.

Proposition 1

Assume that \(0<q<{\overline{q}}\), so that \(Y_{LF}^{n}<Y_{LF}^{u}\). Then,

(i) the domain of the function \(U_{SB}^{u}\left( {\overline{V}} ^{n}\right) \) describing the second-best PF is given by \({\overline{V}}^{n}\in [-U_{LF}^{n},U_{LF}^{n}+\frac{\pi }{2}\frac{ (Y_{LF}^{u}-Y_{LF}^{n})^{2}}{\left( w^{u}\right) ^{2}-\pi \left( w^{n}\right) ^{2}}]\);

(ii) for \(U_{LF}^{n}-\frac{\pi }{2}\frac{(Y_{LF}^{u}-Y_{LF}^{n})^{2}}{ \left( w^{n}\right) ^{2}}\le {\overline{V}}^{n}\le U_{LF}^{n}+\frac{\pi }{2} \frac{(Y_{LF}^{u}-Y_{LF}^{n})^{2}}{\left( w^{u}\right) ^{2}}\), the second-best PF coincides with the first-best PF and it is attained through an allocation where \(T^{\prime }\left( Y_{SB}^{n}\right) =T^{\prime }\left( Y_{SB}^{u}\right) =0\);

(iii) for \(U_{LF}^{n}+\frac{\pi }{2}\frac{(Y_{LF}^{u}-Y_{LF}^{n})^{2}}{ \left( w^{u}\right) ^{2}}<{\overline{V}}^{n}\le U_{LF}^{n}+\frac{\pi }{2} \frac{(Y_{LF}^{u}-Y_{LF}^{n})^{2}}{\left( w^{u}\right) ^{2}-\pi \left( w^{n}\right) ^{2}}\), the second-best PF is attained through an allocation where \(T^{\prime }\left( Y_{SB}^{u}\right) =0\) and \(T^{\prime }\left( Y_{SB}^{n}\right) >0\);

(iv) for \(-U_{LF}^{n}\le {\overline{V}}^{n}<U_{LF}^{n}-\frac{\pi }{2} \frac{(Y_{LF}^{u}-Y_{LF}^{n})^{2}}{\left( w^{n}\right) ^{2}}\), the second-best PF is attained through an allocation where \(T^{\prime }\left( Y_{SB}^{n}\right) =0\) and \(T^{\prime }\left( Y_{SB}^{u}\right) <0\).

Proof

See Appendix A. \(\square \)

The results provided in Proposition 1 are qualitatively similar to those that would be obtained in a standard two-type setting where agents only differ in market ability (\(q=0\)).

Part (ii) shows that when the amount of inter-group redistribution is sufficiently small (i.e., \({\overline{V}}^{n}\) is sufficiently close to \( U_{LF}^{n}\)), no distortion is needed to satisfy incentive-compatibility; this means that asymmetric information does not prevent the government from attaining a point on the first-best PF.

Together, parts (iii) and (iv) show instead that, when the amount of redistribution becomes sufficiently large, incentive-compatibility considerations require distorting the labor supply of the transfer-recipients. When these are represented by non-users, as in part (iii) of Proposition 1, their labor supply will be downward distorted by letting them face a positive marginal tax rate. When transfer-recipients are instead represented by users, as in part (iv), their labor supply will be upward distorted by letting them face a negative marginal tax rate. In either case, since Proposition 1 refers to the case when \(Y_{LF}^{n}<Y_{LF}^{u}\), the direction of the distortion imposed on the labor supply of the transfer-recipients is always “coherent” with the income ranking prevailing under laissez-faire. Thus, when \(q<{\overline{q}}\), the laissez-faire income-ranking is preserved at all points on the second-best PF.
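The interval in part (ii) of Proposition 1 can be verified directly: at its endpoints one of the two self-selection constraints holds with equality when both types are kept at their undistorted incomes and redistribution is purely lump-sum. The following sketch illustrates this under hypothetical parameter values; the helper names are assumptions, and the check is numerical rather than a substitute for the proof in Appendix A.

```python
def u_user(Y, B, w_u, q):
    return B - q * Y / w_u - 0.5 * (Y / w_u) ** 2


def u_non_user(Y, B, w_n):
    return B - 0.5 * (Y / w_n) ** 2


def undistorted_allocation(V_bar, w_u, w_n, q, pi):
    """Bundles with laissez-faire incomes, a binding budget and U^n = V_bar."""
    Y_u, Y_n = (w_u - q) * w_u, w_n**2
    B_n = V_bar + 0.5 * w_n**2
    t = Y_n - B_n                       # tax paid by each non-user (negative = transfer)
    B_u = Y_u + (1 - pi) * t / pi       # proceeds shared equally among users
    return (Y_u, B_u), (Y_n, B_n)


def ic_slacks(V_bar, w_u, w_n, q, pi):
    """Slack in the lambda- and phi-constraints at the undistorted allocation."""
    (Y_u, B_u), (Y_n, B_n) = undistorted_allocation(V_bar, w_u, w_n, q, pi)
    lam = u_non_user(Y_n, B_n, w_n) - u_non_user(Y_u, B_u, w_n)
    phi = u_user(Y_u, B_u, w_u, q) - u_user(Y_n, B_n, w_u, q)
    return lam, phi


def first_best_range(w_u, w_n, q, pi):
    """Interval of V_bar stated in part (ii) of Proposition 1."""
    U_n_lf = 0.5 * w_n**2
    dY2 = ((w_u - q) * w_u - w_n**2) ** 2
    return U_n_lf - 0.5 * pi * dY2 / w_n**2, U_n_lf + 0.5 * pi * dY2 / w_u**2


p = dict(w_u=2.0, w_n=1.0, q=1.0, pi=0.5)   # hypothetical values with q < q_bar
lo, hi = first_best_range(**p)
print(lo, ic_slacks(lo, **p))   # lambda-slack is zero at the lower endpoint
print(hi, ic_slacks(hi, **p))   # phi-slack is zero at the upper endpoint
```

Inside the interval both slacks are strictly positive, so no distortion is needed; just outside it one slack turns negative, which is where parts (iii) and (iv) take over.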

Let’s now consider the case when \(q={\overline{q}}\).

Proposition 2

Assume that \(q={\overline{q}}\), so that \(Y_{LF}^{n}=Y_{LF}^{u}\). Then,

(i) the domain of the function \(U_{SB}^{u}\left( {\overline{V}} ^{n}\right) \) describing the second-best PF is given by \({\overline{V}}^{n}\in \left[ -U_{LF}^{n},U_{LF}^{n}\right] \);

(ii) for \((1-\pi )U_{LF}^{n}\le {\overline{V}}^{n}<U_{LF}^{n}\), the second-best PF can be attained through two different allocations, one where \( T^{\prime }\left( Y_{SB}^{n}\right) =0\) and \(T^{\prime }\left( Y_{SB}^{u}\right) <0\), and another one where \(T^{\prime }\left( Y_{SB}^{n}\right) =0\) and \(T^{\prime }\left( Y_{SB}^{u}\right) >0\);

(iii) for \(-U_{LF}^{n}\le {\overline{V}}^{n}<(1-\pi )U_{LF}^{n}\), the second-best PF is attained through an allocation where \(T^{\prime }\left( Y_{SB}^{n}\right) =0\) and \(T^{\prime }\left( Y_{SB}^{u}\right) <0\).

Proof

See Appendix B. \(\square \)

A key insight to understand the properties of the PF when \(q={\overline{q}}\) is that the indifference curve on which non-users locate under laissez-faire lies everywhere above the indifference curve on which users locate under laissez-faire (except at the point \(Y_{LF}^{n}=Y_{LF}^{u}\) where the two indifference curves are tangent). This is illustrated in Fig. 2.

Fig. 2 Laissez-faire equilibrium when \(q = \overline{q}\)

According to Proposition 2, the government can use a nonlinear income tax to redistribute towards users even in cases when both types earn the same income under laissez-faire. This stands in contrast to models where SC holds; under SC, an anonymous nonlinear income tax does not allow the government to convert a pooling laissez-faire equilibrium into a separating equilibrium. However, as shown in parts (ii) and (iii), the labor supply of users is always distorted for \({\overline{V}}^{n}<U_{LF}^{n}\), which shows that the \(\lambda \)-constraint is binding for any degree of redistribution from non-users to users.

The indifference curves represented in Fig. 2 are helpful to get an intuition for the result that redistribution towards users is feasible. Suppose in fact that non-users were offered an undistorted bundle on an indifference curve that is below the one on which they locate under laissez-faire. Looking at Fig. 2 it is easy to see that a downward shift in the indifference curve of non-users would make it possible to find a set of bundles that are at the same time above the users’ laissez-faire indifference curve and below the downward shifted indifference curve of non-users. This means that, starting from the equilibrium described in Fig. 2, it is feasible to move non-users to a lower indifference curve without violating the incentive-compatibility constraint requiring them not to be tempted to mimic users (the \(\lambda \)-constraint).

According to part (ii) of Proposition 2, for each \({\overline{V}} ^{n}\in [(1-\pi )U_{LF}^{n},U_{LF}^{n})\) the corresponding point on the second-best PF can be achieved through two different allocations. The two allocations are equivalent in the sense that they induce the same utility distribution. Although at both allocations non-users get the same \((Y,B)\)-bundle and face no distortion on their labor supply (\(T^{\prime }\left( Y_{SB}^{n}\right) =0\) and \(Y_{SB}^{n}=Y_{LF}^{n}\)), one implementing allocation entails a downward distortion on the labor supply of users (\(T^{\prime }\left( Y_{SB}^{u}\right) >0\) and \(Y_{SB}^{u}<Y_{LF}^{u}\)), whereas the other implementing allocation entails an upward distortion on their labor supply (\(T^{\prime }\left( Y_{SB}^{u}\right) <0\) and \(Y_{SB}^{u}>Y_{LF}^{u}\)). Intuitively, the reason why there are two different allocations that allow achieving the same point on the second-best PF is that, with \(q={\overline{q}}\), the magnitude of the distortion on users’ labor supply that is needed to deter mimicking by non-users is the same regardless of its direction.
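The two implementing allocations can be constructed explicitly. The sketch below assumes, as stated in Proposition 2, that non-users stay at their undistorted bundle and that both the budget constraint and the \(\lambda \)-constraint bind; solving the resulting quadratic in \(Y^{u}\) gives two roots placed symmetrically around \(Y_{LF}^{u}=\left( w^{n}\right) ^{2}\). Function names and parameter values are hypothetical.

```python
import math


def two_allocations(V_bar, w_u, w_n, pi):
    """User bundles implementing a point on the second-best PF when q = q_bar.

    Assumes an undistorted bundle for non-users and binding budget and
    lambda-constraints; returns the downward- and upward-distorted options.
    """
    q = (w_u**2 - w_n**2) / w_u                 # impose q = q_bar
    U_n_lf = 0.5 * w_n**2
    if not (1 - pi) * U_n_lf <= V_bar < U_n_lf:
        raise ValueError("V_bar outside the range of Proposition 2(ii)")
    t = U_n_lf - V_bar                          # tax paid by each non-user
    gap = w_n * math.sqrt(2 * t / pi)           # |Y^u - Y_LF^u| making lambda bind
    result = []
    for Y_u in (w_n**2 - gap, w_n**2 + gap):
        B_u = Y_u + (1 - pi) * t / pi
        U_u = B_u - q * Y_u / w_u - 0.5 * (Y_u / w_u) ** 2
        mtr_u = 1 - (q + Y_u / w_u) / w_u       # marginal tax rate, 1 - MRS
        result.append({"Y_u": Y_u, "B_u": B_u, "U_u": U_u, "mtr_u": mtr_u})
    return result


low, high = two_allocations(V_bar=0.4, w_u=2.0, w_n=1.0, pi=0.5)
print(low)    # downward distortion: positive marginal tax rate
print(high)   # upward distortion: negative marginal tax rate
```

Both roots deliver the same utility to users because, with \(q={\overline{q}}\), user utility is symmetric in \(Y^{u}-\left( w^{n}\right) ^{2}\). The lower root reaches \(Y^{u}=0\) exactly when \({\overline{V}}^{n}=(1-\pi )U_{LF}^{n}\), which matches the lower end of the range in part (ii).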

According to part (iii), for \({\overline{V}}^{n}<(1-\pi )U_{LF}^{n}\), a point on the second-best PF always requires that the labor supply of users is upward distorted (\(T^{\prime }\left( Y_{SB}^{u}\right) <0\)). To understand why this is the case, consider the point on the second-best PF that corresponds to \({\overline{V}}^{n}=(1-\pi )U_{LF}^{n}\). Of the two allocations that allow implementing this point, the allocation entailing a downward distortion on the labor supply of users prescribes offering them the bundle \(\left( Y,B\right) =\left( 0,(1-\pi )U_{LF}^{n}\right) \). At this bundle their labor supply is pushed to its lower bound. Given that incentive-compatibility (the \(\lambda \)-constraint) requires that a reduction in \({\overline{V}}^{n}\) is accompanied by a larger (in absolute value) distortion on users, it follows that once \({\overline{V}}^{n}\) has reached \((1-\pi )U_{LF}^{n}\), a further reduction cannot be accommodated by magnifying the downward distortion on the labor supply of users. Therefore, for \({\overline{V}}^{n}<(1-\pi )U_{LF}^{n}\), the implementing allocation becomes unique and it requires distorting the users’ labor supply upwards.

Finally, Proposition 2 shows that when the two types are pooled at the laissez-faire equilibrium, it is never possible to use a nonlinear income tax to redistribute from users to non-users, i.e. there is no point on the second-best PF where non-users get a utility higher than \(U_{LF}^{n}\). An intuition for this result can again be grasped by looking at the indifference curves depicted in Fig. 2. Given that the laissez-faire indifference curve of users lies everywhere below the laissez-faire indifference curve of non-users (except at \(Y_{LF}^{u}=Y_{LF}^{n}\) where they are tangent), it is impossible to move users on a lower indifference curve without violating the incentive-compatibility constraint requiring them not to be tempted to mimic non-users (the \(\phi \)-constraint). Taking into account that, as previously noticed, for \({\overline{V}}^{n}<U_{LF}^{n}\) the labor supply of users is always distorted, it also follows that when the laissez-faire equilibrium features pooling, the first-best- and the second-best PF share only one point, i.e. the laissez-faire utility distribution.

Let’s now move to the last case that is left to consider, i.e. the case when \(q>{\overline{q}}\).

Proposition 3

Assume that \(q>{\overline{q}}\), so that \(Y_{LF}^{n}>Y_{LF}^{u}\). Then,

  1. (i)

    when \(q<{\overline{q}}\frac{\sqrt{2}+\sqrt{\pi }}{2\sqrt{\pi }}- \frac{\left( \sqrt{2}-\sqrt{\pi }\right) \sqrt{\pi }w^{u}}{2}\), the second-best PF is disconnected and the domain of the function \( U_{SB}^{u}\left( {\overline{V}}^{n}\right) \)is given by

    $$\begin{aligned} {\overline{V}}^{n}\in [-U_{LF}^{n},\left( 1-\pi \right) U_{LF}^{n}-\delta )\cup \left[ \left( 1-\pi \right) U_{LF}^{n},{\overline{V}} _{\max }^{n}\right] , \end{aligned}$$

    where \(\delta >0\) and \({\overline{V}}_{\max }^{n}>U_{LF}^{n}+\frac{\pi }{2} \frac{(Y_{LF}^{u}-Y_{LF}^{n})^{2}}{\left( w^{u}\right) ^{2}}\); when \(q\ge {\overline{q}}\frac{\sqrt{2}+\sqrt{\pi }}{2\sqrt{\pi }}-\frac{\left( \sqrt{2}- \sqrt{\pi }\right) \sqrt{\pi }w^{u}}{2}\), the domain is instead given by

    $$\begin{aligned} {\overline{V}}^{n}\in \left[ \left( 1-\pi \right) U_{LF}^{n},{\overline{V}} _{\max }^{n}\right] ; \end{aligned}$$
  2. (ii)

    for \(U_{LF}^{n}-\frac{\pi }{2}\frac{(Y_{LF}^{u}-Y_{LF}^{n})^{2}}{ \left( w^{n}\right) ^{2}}\le {\overline{V}}^{n}\le U_{LF}^{n}+\frac{\pi }{2} \frac{(Y_{LF}^{u}-Y_{LF}^{n})^{2}}{\left( w^{u}\right) ^{2}}\), the second-best PF coincides with the first-best PF and any point on the frontier is attained through an allocation where \(T^{\prime }\left( Y_{SB}^{n}\right) =T^{\prime }\left( Y_{SB}^{u}\right) =0\);

  3. (iii)

    for \(U_{LF}^{n}+\frac{\pi }{2}\frac{(Y_{LF}^{u}-Y_{LF}^{n})^{2}}{ \left( w^{u}\right) ^{2}}<{\overline{V}}^{n}\le {\overline{V}}_{\max }^{n}\), any point on the second-best PF corresponds to an allocation at which \( T^{\prime }\left( Y_{SB}^{u}\right) =0\) and \(T^{\prime }\left( Y_{SB}^{n}\right) <0\);

  (iv) for \((1-\pi )U_{LF}^{n}\le {\overline{V}}^{n}<U_{LF}^{n}-\frac{\pi }{2}\frac{(Y_{LF}^{u}-Y_{LF}^{n})^{2}}{\left( w^{n}\right) ^{2}}\), any point on the second-best PF corresponds to an allocation at which \(T^{\prime }\left( Y_{SB}^{n}\right) =0\) and \(T^{\prime }\left( Y_{SB}^{u}\right) >0\);

  (v) when the second-best PF includes a region where \(-U_{LF}^{n}\le {\overline{V}}^{n}<(1-\pi )U_{LF}^{n}\), any point on that region corresponds to an allocation at which \(T^{\prime }\left( Y_{SB}^{n}\right) =0\) and \( T^{\prime }\left( Y_{SB}^{u}\right) <0\).

Proof

See Appendix C. \(\square \)
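The case distinctions in Proposition 3 can be made concrete with a small numerical sketch. The code below is illustrative only and rests on assumptions that are not quoted from the proposition itself: preferences \(U=c-h^{2}/2\), laissez-faire incomes \(Y_{LF}^{u}=w^{u}(w^{u}-q)\) and \(Y_{LF}^{n}=(w^{n})^{2}\), hence \(U_{LF}^{n}=(w^{n})^{2}/2\) and \({\overline{q}}=((w^{u})^{2}-(w^{n})^{2})/w^{u}\). The function names are hypothetical. The sketch only locates a given \({\overline{V}}^{n}\) relative to the thresholds stated in parts (ii) to (v); it does not compute \({\overline{V}}_{\max }^{n}\) or \(\delta \), so it cannot tell whether a point in range (v) actually belongs to the frontier.

```python
import math


def laissez_faire(q, w_u, w_n):
    """Return (Y_LF^u, Y_LF^n, U_LF^n) under the assumed preferences U = c - h^2/2."""
    y_u = w_u * (w_u - q)   # assumption: users pay q per hour worked, so h = w_u - q
    y_n = w_n ** 2          # assumption: non-users choose h = w_n
    u_n = 0.5 * w_n ** 2    # non-users' laissez-faire utility under the same assumption
    return y_u, y_n, u_n


def q_bar(w_u, w_n):
    """Value of q at which Y_LF^u = Y_LF^n (inferred from the assumed incomes)."""
    return (w_u ** 2 - w_n ** 2) / w_u


def frontier_is_disconnected(q, w_u, w_n, pi):
    """Inequality of part (i), combined with the maintained assumption q > q_bar."""
    r2, rp = math.sqrt(2.0), math.sqrt(pi)
    bound = q_bar(w_u, w_n) * (r2 + rp) / (2.0 * rp) - (r2 - rp) * rp * w_u / 2.0
    return q_bar(w_u, w_n) < q < bound


def regime(v_bar, q, w_u, w_n, pi):
    """Return the part of Proposition 3 whose interval contains v_bar.

    The upper bound V_max and the gap delta are not computed here.
    """
    y_u, y_n, u_n = laissez_faire(q, w_u, w_n)
    gap_sq = (y_u - y_n) ** 2
    lower_fb = u_n - 0.5 * pi * gap_sq / w_n ** 2   # lower end of the first-best range
    upper_fb = u_n + 0.5 * pi * gap_sq / w_u ** 2   # upper end of the first-best range
    if v_bar > upper_fb:
        return "(iii)"   # T'(Y^u) = 0, T'(Y^n) < 0
    if v_bar >= lower_fb:
        return "(ii)"    # no distortion
    if v_bar >= (1.0 - pi) * u_n:
        return "(iv)"    # T'(Y^n) = 0, T'(Y^u) > 0
    if v_bar >= -u_n:
        return "(v)"     # T'(Y^n) = 0, T'(Y^u) < 0, if this region exists
    return "outside"
```

With the hypothetical values \(w^{u}=2\), \(w^{n}=1\), \(q=1.8\) and \(\pi =0.25\), the inequality in part (i) holds, whereas raising \(\pi \) to 0.9 makes it fail for every \(q>{\overline{q}}\).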

Qualitatively, some of the results provided in Proposition 3 are standard. For instance, according to part (ii), no distortion is needed to satisfy incentive-compatibility when the amount of inter-group redistribution is sufficiently small (i.e., for values of \({\overline{V}}^{n}\) sufficiently close to \(U_{LF}^{n}\)). Another standard result is part (iii), which states that, for \({\overline{V}}^{n}>U_{LF}^{n}\), if incentive-compatibility considerations require distorting the bundle offered to the transfer recipients (in this case, non-users), the direction of the distortion is “coherent” with the income ranking under laissez-faire.

Two results instead stand out as non-standard and are specifically due to the violation of the SC condition. The first, stated in part (i), highlights the possibility that the second-best PF is disconnected. The second, which is a consequence of parts (iv) and (v), highlights that, moving along the portion of the second-best PF where \( {\overline{V}}^{n}<U_{LF}^{n}\), the sign of \(T^{\prime }\left( Y_{SB}^{u}\right) \) may change. In particular, despite the fact that \( Y_{LF}^{n}>Y_{LF}^{u}\), users do not necessarily face a downward distortion on their labor supply at all points on the PF where the \(\lambda \)-constraint is binding, i.e. at all points where the labor supply of users needs to be distorted to prevent mimicking by non-users.

These two results are closely related: the sign of \( T^{\prime }\left( Y_{SB}^{u}\right) \) is not everywhere non-negative if and only if the domain of the function \(U_{SB}^{u}\left( {\overline{V}}^{n}\right) \) is a disconnected set, which in turn happens when \(q<{\overline{q}}\frac{ \sqrt{2}+\sqrt{\pi }}{2\sqrt{\pi }}-\frac{\left( \sqrt{2}-\sqrt{\pi }\right) \sqrt{\pi }w^{u}}{2}\).

To understand these results, consider first Fig. 3, which illustrates the qualitative features of the solution to the government’s problem for any given value of \({\overline{V}}^{n}\) such that \({\overline{V}}^{n}\in [(1-\pi )U_{LF}^{n},U_{LF}^{n}-\frac{\pi }{2}\frac{(Y_{LF}^{u}-Y_{LF}^{n})^{2} }{\left( w^{n}\right) ^{2}})\).

Fig. 3. A constrained Pareto-efficient allocation featuring \( T'(Y^u) >0 {\text { and }} h^u>0 \)

In the figure, the dashed 45\(^\circ \) line represents the laissez-faire budget line (neither taxes nor transfers), and points I and V represent the bundles chosen under laissez-faire by, respectively, non-users and users (\( Y_{LF}^{n}>Y_{LF}^{u}\)). Bundle II represents the undistorted bundle offered to non-users on their indifference curve associated with \(U^{n}={\overline{V}} ^{n}\). The blue 45\(^\circ \) line represents the virtual budget line on which, given the revenue extracted from non-users, a bundle for users can be offered.Footnote 9 On this virtual budget line, incentive-compatibility considerations (the need to satisfy the \( \lambda \)-constraint) prevent the government from offering users the undistorted bundle labelled VI. To prevent non-users from behaving as mimickers, users can only be offered, on the virtual budget line, either bundles to the left of III or bundles to the right of IV, with both bundle III and bundle IV belonging to the set of admissible bundles. The difference between these two sets of bundles is that, whereas with bundle III, or bundles to the left of it, type separation is achieved by imposing a sufficiently large downward distortion on the users’ labor supply, in the case of bundle IV, or bundles to the right of it, type separation is achieved by imposing a sufficiently large upward distortion on the users’ labor supply.

The black curve passing through bundle III is an indifference curve pertaining to users. The figure shows that, among all the admissible bundles that can be offered to users, bundle III is the one at which their utility is maximized. In particular, notice that the utility of users is strictly higher at bundle III than at bundle IV. The intuition is that, even though the \(\lambda \)-constraint can be satisfied by imposing either a sufficiently large downward- or a sufficiently large upward distortion on the labor supply of users, the size of the required distortion is smaller when type separation is obtained by distorting downwards the users’ labor supply (\( T^{\prime }\left( Y_{SB}^{u}\right) >0\)). This allows achieving type separation at a lower efficiency cost.

Consider now Fig. 4, which illustrates the solution to the government’s problem for the case when \({\overline{V}}^{n}\) is lowered to \((1-\pi )U_{LF}^{n}\).

Fig. 4. A constrained Pareto-efficient allocation featuring \( T'(Y^u) >0 {\text { and }} h^u=0 \)

In Fig. 4, the dashed 45\(^\circ \) line represents the laissez-faire budget line, and the point labelled I on this line represents the bundle selected by non-users under laissez-faire. Bundle II represents the undistorted bundle offered to non-users lying on the indifference curve where \({\overline{V}}^{n}=\left( 1-\pi \right) U_{LF}^{n}\). The blue 45\(^\circ \) line represents the virtual budget line on which a bundle for users can be offered given the revenue extracted from non-users. Incentive compatibility requires that, on the blue virtual budget line, users can only be offered either bundle III or bundles to the right of IV, with bundle IV belonging to the set of admissible bundles. The black curve passing through bundle III is an indifference curve pertaining to users and it shows that bundle III is strictly preferred by users to bundle IV. Comparing bundle III in Fig. 4 with the corresponding bundle in Fig. 3, we can also see that the size of the downward distortion on the users’ labor supply is larger in Fig. 4.Footnote 10 The important thing to notice, however, is that at bundle III the users’ labor supply has been pushed to its lower bound (\(Y^{u}=0\)).Footnote 11

In a standard model where the SC condition holds, the utility achieved by users at bundle III in Fig. 4 would represent their maximal utility along the second-best PF. The reason is straightforward. Suppose that single-crossing were satisfied and that at all bundles in the \(\left( Y,B\right) \)-space users had steeper indifference curves. Then, the users’ indifference curve represented in Fig. 4 would lie everywhere above the indifference curve of non-users, except at bundle III. But this would necessarily imply that, if non-users were to be offered a bundle on a lower indifference curve (to increase the tax revenue collected from them), any \(\left( Y,B\right) \)-bundle that makes users better off (compared to bundle III in Fig. 4) would violate incentive-compatibility since it would induce non-users to behave as mimickers.

With SC being violated, instead, things are different. In Fig. 4 all the bundles that are included in the gray area represent bundles that would at the same time: (i) make users better off (compared to the utility that they achieve at bundle III), and (ii) be incentive-compatible in the sense that they would not induce non-users to reject bundle II. Even though the bundles in the gray area cannot be offered to users since they violate the public-budget constraint (when \({\overline{V}}^{n}=\left( 1-\pi \right) U_{LF}^{n}\) and non-users are offered bundle II), users might be offered a bundle in the gray area if more revenue were collected from non-users, so that the blue virtual budget line could be shifted up. However, since collecting more revenue from non-users implies moving them onto a lower indifference curve, and since this implies that the set of bundles in the gray area shrinks, the violation of SC is in general not sufficient to guarantee that the utility of users can be raised above the utility reached at bundle III. What is required is that the simultaneous upward shift in the virtual budget line, and downward shift in the indifference curve of non-users, push their point of intersection (currently at point IV in Fig. 4) inside the gray area. This is more likely to happen the smaller \(\pi \) is and the smaller the difference \(Y_{LF}^{n}-Y_{LF}^{u}\left( >0\right) \) is.Footnote 12

Notice also that at any bundle inside the gray area the labor supply of users is upward distorted (\(T^{\prime }\left( Y_{SB}^{u}\right) <0\)). Thus, if it is indeed possible, by lowering \({\overline{V}}^{n}\) below \(\left( 1-\pi \right) U_{LF}^{n}\), to raise \(U^{u}\) above the level that it achieves at bundle III, users will need to be assigned a bundle at which their labor supply is upward distorted. Moreover, since the users’ utility is strictly higher at bundle III than at bundle IV, raising \(U^{u}\) above the value achieved at bundle III would necessarily require a discrete downward jump in \({\overline{V}}^{n}\). This is illustrated in Fig. 5 below which shows the second-best PF with the property that the domain of the function \( U_{SB}^{u}\left( {\overline{V}}^{n}\right) \) is disconnected.

Finally, notice that when the second-best PF has the shape shown in Fig. 5, the earned-income ranking that corresponds to the various points on the frontier is not always consistent with the income ranking under laissez-faire. Along the region where \({\overline{V}}^{n}<U_{LF}^{n}\), one moves from a portion of the second-best PF that coincides with the first-best frontier (the green part with slope \(-\left( 1-\pi \right) /\pi \)), to a portion where \( T^{\prime }\left( Y_{SB}^{u}\right) >0\) (the red part of the curve in Fig. 5), and finally to a portion where \(T^{\prime }\left( Y_{SB}^{u}\right) <0\) (the blue part of the curve in Fig. 5). When entering this last portion, the earned-income ranking is no longer consistent with the one under laissez-faire since we have \(Y_{LF}^{n}>Y_{LF}^{u}\) but \( Y_{SB}^{n}<Y_{SB}^{u}\).

Fig. 5. A disconnected second-best Pareto frontier

Both the possibility that the second-best PF is disconnected and the possibility of income re-ranking follow from the circumstance that in our setting the SC condition is violated.Footnote 13 Similarly, it is because of the violation of the SC condition that, when redistribution favors users, it might be optimal to let them face a negative marginal tax rate even in cases when they earn less than non-users under laissez-faire. This shows that the violation of SC can provide a novel rationale for negative marginal tax rates.Footnote 14

4 Subsidizing work-related expenses

In our analysis we have so far maintained the assumption that the only policy instrument is a nonlinear income tax. In this setting we have highlighted the consequences of the violation of the SC condition. Most governments, however, allow special tax treatments for work-related expenses.Footnote 15 To consider this possibility, and given that a “special” tax treatment usually implies a more lenient one, we will now investigate how our results are affected when job-related expenses are subsidized at a flat rate \(s>0\) that is optimally chosen.Footnote 16 Moreover, since a subsidy on job-related expenses is only valuable to users, we will confine our attention to the portion of the PF where \({\overline{V}}^{n}<U_{LF}^{n}\), i.e. to the portion of the PF where redistribution goes from non-users to users.

The first thing to notice is that the subsidy has a flattening effect on the indifference curves for users in the \(\left( Y,B\right) \)-space. For a given (positive) value of s and a given bundle in the \(\left( Y,B\right) \)-space, we have that \(MRS_{YB}^{u}=\left( \left( 1-s\right) q+\frac{Y}{w^{u}}\right) /w^{u}\). Thus, the threshold value for Y, separating the bundles where \( MRS_{YB}^{u}>MRS_{YB}^{n}\) from those where \(MRS_{YB}^{u}<MRS_{YB}^{n}\), falls from \(\Omega \), as defined in (4), to

$$\begin{aligned} Y=(1-s)\Omega . \end{aligned}$$
(7)

Hence, the SC property is restored if \(s\ge 1\).Footnote 17
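The way the subsidy moves the crossing point of the two marginal rates of substitution can be checked numerically. The sketch below is a hedged illustration: it takes \(MRS_{YB}^{u}\) from the expression above, assumes \(MRS_{YB}^{n}=Y/\left( w^{n}\right) ^{2}\) (which follows if non-users' utility is \(B-\frac{1}{2}\left( Y/w^{n}\right) ^{2}\)), and solves for the income at which the two coincide. The closed form used for that income is derived under these assumptions rather than copied from definition (4), and the function names are hypothetical.

```python
def mrs_nonuser(y, w_n):
    """Assumed MRS of non-users: slope of B - 0.5 * (Y / w_n)**2 = const."""
    return y / w_n ** 2


def mrs_user(y, s, q, w_u):
    """MRS of users when job-related expenses are subsidized at rate s."""
    return ((1.0 - s) * q + y / w_u) / w_u


def crossing_income(s, q, w_u, w_n):
    """Income at which the two MRS coincide (requires w_u > w_n).

    Solves y / w_n**2 = ((1 - s) * q + y / w_u) / w_u for y.
    """
    return (1.0 - s) * q * w_u * w_n ** 2 / (w_u ** 2 - w_n ** 2)
```

For the hypothetical values \(q=1.8\), \(w^{u}=2\), \(w^{n}=1\), the crossing income is 1.2 without a subsidy, 0.6 when \(s=0.5\), and 0 when \(s=1\), in line with the scaling by \(1-s\) in (7).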

Most importantly, notice that in our setting a subsidy on job-related expenses represents a very effective instrument to redistribute towards users. This is because non-users derive no benefit from the subsidy. Therefore, channeling at least part of the resources transferred to users through a subsidy on job-related expenses makes it less attractive for non-users to behave as mimickers. One can then expect that, by supplementing an optimal nonlinear income tax with an optimally chosen s, the first-best PF and the second-best PF will coincide over a larger set of values for \( {\overline{V}}^{n}\). In particular, since we know from the analysis in Sect. 3 that an optimal nonlinear income tax is sufficient to implement a first-best optimum (i.e., a point on the first-best PF) when \({\overline{V}} ^{n}\in [U_{LF}^{n}-\frac{\pi }{2}\frac{(Y_{LF}^{u}-Y_{LF}^{n})^{2}}{ Y_{LF}^{n}},U_{LF}^{n})\), one can expect that using s as an additional policy tool allows implementing a first-best optimum also for a range of values for \({\overline{V}}^{n}\) that are strictly lower than \(U_{LF}^{n}-\frac{ \pi }{2}\frac{(Y_{LF}^{u}-Y_{LF}^{n})^{2}}{Y_{LF}^{n}}\). As shown in Proposition 4 below, which looks at the solution to the government’s problem for values of \({\overline{V}}^{n}\) such that \(-U_{LF}^{n}\le {\overline{V}} ^{n}<U_{LF}^{n}-\frac{\pi }{2}\frac{(Y_{LF}^{u}-Y_{LF}^{n})^{2}}{Y_{LF}^{n}}\) , this intuition is indeed correct.Footnote 18\(^{\text {,}}\)Footnote 19

Proposition 4

Assume that \(-U_{LF}^{n}\le {\overline{V}} ^{n}<U_{LF}^{n}-\frac{\pi }{2}\frac{(Y_{LF}^{u}-Y_{LF}^{n})^{2}}{Y_{LF}^{n}}\) and that the government is optimizing a nonlinear income tax and a proportional subsidy on work-related expenses. Moreover, let \({\widehat{V}} \equiv U_{LF}^{n}-\frac{\pi }{2}\frac{2w^{u}-q}{w^{u}}\left( Y_{LF}^{n}-Y_{LF}^{u}\right) \) be a threshold value for \({\overline{V}}^{n}\).

  (i) Suppose that \(q\le {\overline{q}}\) (i.e., \(Y_{LF}^{u}\ge Y_{LF}^{n} \)); then, the second-best PF will coincide with the first-best PF.

  (ii) Suppose that \(q>{\overline{q}}\) (i.e., \(Y_{LF}^{u}<Y_{LF}^{n}\)). For \({\overline{V}}^{n}\ge {\widehat{V}}\), the second-best PF coincides with the first-best PF. For \({\overline{V}}^{n}<{\widehat{V}}\), instead, both self-selection constraints will be binding and any point on the second-best PF corresponds to an allocation at which both types of agents face a distortion on their labor supply.

Proof

See Appendix D. \(\square \)

According to Proposition 4, there is a crucial difference between cases where \(q\le {\overline{q}}\) and cases where \(q> {\overline{q}}\). In the first scenario, using s as an additional policy instrument always allows implementing a first-best optimum. Instead, when \(q> {\overline{q}}\), a first-best optimum can only be implemented as long as the utility of non-users does not fall below a given threshold value \({\widehat{V}} \). Below, we discuss in two separate subsections the results provided by Proposition 4.

4.1 Part (i)

Consider an initial equilibrium where an optimal nonlinear income tax is used in isolation (\(s=0\)) and users are offered a distorted bundle to prevent mimicking by non-users. The transfer received by each user is equal to \(B^{u}-Y^{u}\) at the initial equilibrium. Introducing a small subsidy on job-related expenses (\(ds>0\)), while at the same time adjusting \(B^{u}\) downwards by \(dB^{u}=-\left( qY^{u}/w^{u}\right) ds\), would leave unchanged the net transfer received by each user.Footnote 20 Such a reform, however, would make mimicking less attractive for non-users.Footnote 21 Therefore, by relaxing the incentive-compatibility constraint for non-users, the reform would make it possible to offer users a bundle where their labor supply is less distorted and their utility is higher. When \(Y_{LF}^{u}\ge Y_{LF}^{n}\), one can repeat the kind of reform described above (which hinges on raising s, lowering \(B^{u}\) and moving \(Y^{u}\) closer to its undistorted level) until a first-best optimum is achieved where no agent’s labor supply is distorted. This is because one can set s with the sole purpose of deterring mimicking by non-users, safely disregarding the other self-selection constraint, i.e. the one requiring users not to behave as mimickers. The intuition is provided in Fig. 6 below.

Fig. 6. Subsidizing work-related expenses when \(q< \overline{q}\)

In Fig. 6, the bundle labelled I represents the undistorted bundle offered to non-users and lying on the red indifference curve where \(U^{n}={\overline{V}}^{n}<U_{LF}^{n}\). The blue 45\(^\circ \) line represents the virtual budget line on which a bundle for users can be offered, given the revenue extracted from non-users, when a nonlinear income tax is used in isolation (\(s=0\)). Incentive compatibility prevents the government from offering users the (first-best) undistorted bundle labelled IV. Instead, users will be offered the incentive-compatible bundle labelled III. Keeping fixed \({\overline{V}}^{n} \) and supplementing a nonlinear income tax with a subsidy on job-related expenses implies that users can be offered a bundle on a virtual budget line that is flatter than the one prevailing when \(s=0\). In particular, while its intercept does not change,Footnote 22 its slope drops from 1 to \(1-sq/w^{u}\). The dashed blue line represents the virtual budget line generated by supplementing a nonlinear income tax with a subsidy which is just large enough to allow the government to offer an undistorted bundle to users (bundle labelled II) without inducing mimicking by non-users. Notice that the vertical distance between bundle IV and bundle II is equal to \(sqY_{LF}^{u}/w^{u}\). Taking into account that, at bundle IV, the subsidy was set equal to zero, whereas at bundle II users save an amount \(sqY_{LF}^{u}/w^{u}\) on job-related expenses, users get the same net consumption at both bundles, and therefore enjoy the same utility (since labor supply is the same). It is also obvious from the figure that users, whose indifference curve is depicted in black, have no incentive to behave as mimickers since they strictly prefer bundle II to bundle I. The reason is easy to grasp. At bundle II their indifference curve is tangent to the virtual budget line generated by supplementing the income tax with a subsidy on job-related expenses. 
Thus, along the black indifference curve, at all bundles to the left of II we have that \( MRS_{YB}^{u}<1-sq/w^{u}<1\). Instead, along the red indifference curve (for non-users), at all bundles between I and II we have that \(MRS_{YB}^{n}>1\). Therefore, the fact that the two indifference curves cross at bundle II necessarily implies that bundle II is strictly preferred by users to bundle I.
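The accounting behind the reform discussed in this subsection can be written out explicitly. As a sketch, assume that a user's net consumption at a bundle \((Y^{u},B^{u})\) is \(c^{u}=B^{u}-\left( 1-s\right) qY^{u}/w^{u}\), consistently with the objective function of the pooling problem stated later in this section, and that a non-user choosing the same bundle obtains \(B^{u}-\frac{1}{2}\left( Y^{u}/w^{n}\right) ^{2}\). Holding \(Y^{u}>0\) fixed and setting \(dB^{u}=-\left( qY^{u}/w^{u}\right) ds\) with \(ds>0\) gives

$$\begin{aligned} dc^{u}&=dB^{u}+\frac{qY^{u}}{w^{u}}ds=0, \\ d\left[ B^{u}-\frac{1}{2}\left( \frac{Y^{u}}{w^{n}}\right) ^{2}\right]&=dB^{u}=-\frac{qY^{u}}{w^{u}}ds<0. \end{aligned}$$

Users are thus exactly compensated, whereas the payoff of a mimicking non-user falls one-for-one with \(B^{u}\). The same bookkeeping underlies the comparison of bundles IV and II in Fig. 6: under the assumed expression for \(c^{u}\), lowering \(B\) by \(sqY_{LF}^{u}/w^{u}\) while reducing the user's out-of-pocket expenses by the same amount leaves \(c^{u}\) unchanged at \(Y=Y_{LF}^{u}\).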

4.2 Part (ii)

Things are instead different when \(Y_{LF}^{u}<Y_{LF}^{n}\). In this case, setting s large enough to deter mimicking by non-users might imply that users have an incentive to mimic non-users. The intuition why this other self-selection constraint cannot always be disregarded is provided in Fig. 7 below.

Fig. 7. Subsidizing work-related expenses when \(q > \overline{q}\)

In Fig. 7 the bundle labelled I represents the undistorted bundle offered to non-users. The blue 45\(^\circ \) line represents the virtual budget line on which a bundle for users can be offered, given the revenue extracted from non-users, when a nonlinear income tax is used in isolation. Incentive compatibility prevents the government from offering users the undistorted bundle labelled IV; instead users will be offered the incentive-compatible bundle labelled III. The dashed blue line is the virtual budget line generated by supplementing a nonlinear income tax with a subsidy which is just large enough to allow the government to offer an undistorted bundle to users (bundle labelled II) without inducing mimicking by non-users. As was the case in Fig. 6, users get the same net consumption at both bundle IV (without the subsidy) and bundle II (with the subsidy), and therefore enjoy the same utility at both bundles. The figure shows that users, whose indifference curve is depicted in black, are indifferent between choosing the bundle II, intended for them by the government, and choosing the bundle intended for non-users.Footnote 23

The case represented in Fig. 7 shows a situation where both self-selection constraints are binding but the government is still able to implement a first-best optimum.Footnote 24 This happens when \(Y_{LF}^{u}<Y_{LF}^{n}\) and \( {\overline{V}}^{n}={\widehat{V}}\). Further lowering \({\overline{V}}^{n}\) would no longer allow the government to implement a first-best optimum. A higher subsidy would be needed to still offer an undistorted bundle to users without inducing non-users to mimic. But a higher subsidy would induce users to mimic non-users. Thus, lowering \({\overline{V}}^{n}\) below \({\widehat{V}}\) will induce the government to raise s, but not by as much as would be needed to offer users an undistorted bundle. The optimal s will then represent a trade-off between the desirable effects in terms of deterring mimicking by non-users and the undesirable effects of making it more tempting for users to mimic non-users. At the resulting second-best optimum both self-selection constraints are binding and both types face a distortion on their labor supply.Footnote 25

For \({\overline{V}}^{n}\) lower than but sufficiently close to \({\widehat{V}}\), the second-best optimum will be a separating equilibrium where each group is offered a distinct \(\left( Y,B\right) \)-bundle and the labor supply of both types is downward distorted (\(Y_{SB}^{u}<Y_{LF}^{u}\), \(Y_{SB}^{n}<Y_{LF}^{n}\) and \( Y_{SB}^{u}<Y_{SB}^{n}\)). As one keeps lowering \({\overline{V}}^{n}\), the distortions needed to implement a separating equilibrium become larger and larger, and one finally reaches a value for \({\overline{V}}^{n}\) below which it is no longer possible to further increase the users’ utility.

However, notice that when s is an additional policy instrument, the redistributive goals of the government do not necessarily require the implementation of a separating equilibrium, i.e., an equilibrium where each group is offered a distinct \(\left( Y,B\right) \)-bundle. Given that only users benefit from the subsidy s, redistribution can also be achieved by implementing a pooling equilibrium where both groups are offered the same \(\left( Y,B\right) \)-bundle (but have, nonetheless, different consumption). In particular, at a pooling equilibrium the government would solve the following optimization problem:

$$\begin{aligned} \underset{Y,B,s}{\max }{ \ \ }B-\left( 1-s\right) q\frac{Y}{w^{u}}- \frac{1}{2}\left( \frac{Y}{w^{u}}\right) ^{2} \end{aligned}$$

subject to

$$\begin{aligned} B-\frac{1}{2}\left( \frac{Y}{w^{n}}\right) ^{2}= & {} {\overline{V}}^{n}, \end{aligned}$$
(8)
$$\begin{aligned} Y-B= & {} \pi sqY/w^{u}. \end{aligned}$$
(9)

Substitute \(B={\overline{V}}^{n}+\left( Y/w^{n}\right) ^{2}/2\) and \( sqY/w^{u}=\left( Y-B\right) /\pi \), from constraints (8) and (9), respectively, into the objective function. The constrained optimization problem above can then be rewritten as the unconstrained problem

$$\begin{aligned}&\underset{Y}{\max }{\ \ }{\overline{V}}^{n}+\frac{1}{2}\left( \frac{Y}{ w^{n}}\right) ^{2}-q\frac{Y}{w^{u}}+\frac{Y}{\pi }-\frac{{\overline{V}}^{n}}{ \pi }-\frac{1}{2\pi }\left( \frac{Y}{w^{n}}\right) ^{2}-\frac{1}{2}\left( \frac{Y}{w^{u}}\right) ^{2}. \end{aligned}$$
(10)

From the first order condition of the problem above, denoting by \(Y^{p}\) the optimal value of Y, one gets:

$$\begin{aligned} Y^{p}=\frac{\left( w^{n}\right) ^{2}w^{u}}{\left( 1-\pi \right) \left( w^{u}\right) ^{2}+\pi \left( w^{n}\right) ^{2}}\left( w^{u}-q\pi \right) . \end{aligned}$$
(11)

Moreover, when \(w^{u}\left( w^{u}-q\right) <\left( w^{n}\right) ^{2}\) (i.e., \(Y_{LF}^{u}<Y_{LF}^{n}\)), it is straightforward to show that

$$\begin{aligned} Y_{LF}^{u}<Y^{p}<Y_{LF}^{n}. \end{aligned}$$
(12)

From (12) we can conclude that, at a pooling equilibrium, the labor supply of users is upward distorted and the labor supply of non-users is downward distorted. Moreover, from (11) we can also see that, since \(Y^{p}\) does not depend on \({\overline{V}}^{n}\), the magnitude of these distortions does not depend on the specific value of \({\overline{V}}^{n}\). Substituting (11) into the objective function of (10) we get that, at a pooling equilibrium, the users’ utility is given by

$$\begin{aligned} U^{u}=\frac{1}{2\pi }\frac{\left( w^{n}\right) ^{2}\left( w^{u}-q\pi \right) ^{2}}{\left( 1-\pi \right) \left( w^{u}\right) ^{2}+\pi \left( w^{n}\right) ^{2}}-{\overline{V}}^{n}\frac{1-\pi }{\pi }, \end{aligned}$$

which implies that \(\partial U^{u}/\partial {\overline{V}}^{n}=-\left( 1-\pi \right) /\pi \), i.e. the same slope that characterizes the first-best PF.
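The closed-form expressions for the pooling equilibrium can be verified numerically. The sketch below evaluates the reduced objective in (10), the pooling income in (11) and the users' utility formula above, and checks that they agree. The parameter values used for illustration (\(q=1\), \(w^{u}=2\), \(w^{n}=1.5\), \(\pi =0.5\), \({\overline{V}}^{n}=1\)) are hypothetical, as are the function names.

```python
def pooling_income(q, w_u, w_n, pi):
    """Pooling income Y^p from equation (11)."""
    denom = (1.0 - pi) * w_u ** 2 + pi * w_n ** 2
    return w_n ** 2 * w_u * (w_u - q * pi) / denom


def reduced_objective(y, v_bar, q, w_u, w_n, pi):
    """Users' utility after substituting constraints (8) and (9), as in (10)."""
    b = v_bar + 0.5 * (y / w_n) ** 2      # after-tax income implied by (8)
    subsidy_per_user = (y - b) / pi       # s * q * y / w_u implied by (9)
    return b - q * y / w_u + subsidy_per_user - 0.5 * (y / w_u) ** 2


def pooling_user_utility(v_bar, q, w_u, w_n, pi):
    """Closed-form users' utility at the pooling equilibrium."""
    denom = (1.0 - pi) * w_u ** 2 + pi * w_n ** 2
    return (w_n ** 2 * (w_u - q * pi) ** 2 / (2.0 * pi * denom)
            - v_bar * (1.0 - pi) / pi)
```

With the illustrative parameters, `pooling_income(1.0, 2.0, 1.5, 0.5)` lies strictly between the laissez-faire incomes \(w^{u}(w^{u}-q)=2\) and \((w^{n})^{2}=2.25\), as required by (12), and the reduced objective evaluated at that income coincides with the closed-form utility.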

Clearly, for \({\widehat{V}}\le {\overline{V}}^{n}<U_{LF}^{n}\), a pooling equilibrium will never be chosen by the government. The reason is that, for \( {\widehat{V}}\le {\overline{V}}^{n}<U_{LF}^{n}\) the government can implement a separating equilibrium which allows attaining a point on the first-best PF. Under a pooling equilibrium, instead, it is never possible to reach a point on the first-best PF (given that the labor supply of both groups of agents is distorted). For values of \({\overline{V}}^{n}\) that are smaller than but sufficiently close to \({\widehat{V}}\), a separating equilibrium will again dominate a pooling equilibrium; even though both equilibria entail a distortion on the labor supply of both groups and a point on the first-best PF can no longer be attained, the distortions are less severe under a separating equilibrium. However, for sufficiently low values of \({\overline{V}} ^{n}\), a pooling equilibrium will dominate a separating equilibrium. The reason is that the distortions needed to implement a separating equilibrium become larger and larger as one keeps lowering \({\overline{V}}^{n}\); under a pooling equilibrium, instead, the magnitude of the distortions does not depend on the specific value of \({\overline{V}}^{n}\). The possibility of both types of second-best equilibria (separating and pooling), depending on the chosen value for \({\overline{V}}^{n}\), is illustrated by means of a numerical example in Appendix F.Footnote 26 The example also illustrates the fact that the second-best PF can be disconnected even when the nonlinear income tax is supplemented by an optimal subsidy on job-related expenses.

5 Pareto efficient taxation when job-related expenses are a nonlinear function of hours of work

In Sect. 3 we have emphasized three main anomalies arising from the violation of SC: (i) an anonymous nonlinear income tax may allow the government to convert a pooling laissez-faire equilibrium into a separating equilibrium; (ii) the second-best PF may be disconnected; (iii) a second-best optimum may not preserve the income ranking prevailing under laissez-faire.

As we show in a background version of this paper (see Bastani et al. 2019), similar qualitative results generalize, with some nuances, to a setting where the function \(\varphi \left( h\right) \) (describing the work-related monetary costs) is convex or concave. However, when \(\varphi \left( h\right) \) is concave, one additional anomaly may arise. In particular, when redistribution goes from non-users to users, it is possible that a second-best optimum entails a distortion on the labor supply of users even when no self-selection constraint is (locally) binding in equilibrium. The reason is that, when \(\varphi \left( h\right) \) is sufficiently concave, it is no longer the case that \(MRS_{YB}^{u}\) is monotonically increasing in Y.Footnote 27 To see this, notice that, for individual preferences given by \(U=c-h^{2}/2\) and a general nonlinear function \(\varphi \left( h\right) \), \(MRS_{YB}^{u}\) is given by \(MRS_{YB}^{u}=\left[ \varphi ^{\prime }\left( Y/w^{u}\right) +Y/w^{u}\right] /w^{u}\). Assume that \(\varphi \left( h\right) \) is an increasing and concave function which also satisfies the conditions \(\varphi ^{\prime }\left( 0\right) >w^{u}\), \(\varphi ^{\prime \prime }\left( 0\right) <-1\), and \(\varphi ^{\prime \prime \prime }\left( h\right) >0\). Then, while the value of \(MRS_{YB}^{u}\) is always positive for \(Y\ge 0\), it is larger than 1 and decreasing in Y for sufficiently small values of Y. The fact that \(MRS_{YB}^{u}>1\) for sufficiently low values of Y implies that, when incentive-compatibility considerations require that \(Y^{u}\) must be very small (to prevent mimicking by non-users), it may be optimal to offer users a bundle where \(Y^{u}=0\) even though it would be incentive-compatible to let them increase to some extent their labor supply (and enjoy a slightly larger value of consumption). This possibility is illustrated in Fig. 8 below and a numerical example is provided in Appendix F.

Fig. 8. Distortions without binding self-selection constraints

In Fig. 8, the point I represents the bundle selected by non-users under laissez-faire. Bundle II represents the undistorted bundle offered to non-users lying on the indifference curve where \(U^{n}={\overline{V}} ^{n}<U_{LF}^{n}\). The blue 45\(^\circ \) line represents the virtual budget line on which a bundle for users can be offered given the revenue extracted from non-users. Incentive compatibility requires that users can only be offered bundles to the left of bundle V and to the right of bundle VI, with both V and VI belonging to the set of admissible bundles. The three black curves passing through bundles V, IV and III are three different indifference curves pertaining to users.

From the figure, one can see that bundle IV is strictly preferred by users to both the bundle V and bundle VI. But if users are offered the bundle IV, the self-selection constraint requiring non-users not to mimic users is slack. Notice also that users would be better off if they could get bundle III on the blue virtual budget line, i.e. the bundle at which their labor supply is undistorted. However, offering them this bundle would induce mimicking by non-users. Therefore, at a second-best optimum users are offered bundle IV and non-users are offered bundle II; the labor supply of users is downward distorted even though no self-selection constraint is binding at the second-best optimum.Footnote 28\(^{\text {,}}\)Footnote 29
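The non-monotonicity of \(MRS_{YB}^{u}\) that drives this result can be illustrated with a specific cost function. The sketch below uses \(\varphi \left( h\right) =a\left( 1-e^{-bh}\right) \) with \(a=1.5\), \(b=2\) and \(w^{u}=2\). This functional form and these values are purely hypothetical choices, not taken from the paper or its appendix, and were picked only because they satisfy the stated conditions (\(\varphi \) increasing and concave, \(\varphi ^{\prime }\left( 0\right) =3>w^{u}\), \(\varphi ^{\prime \prime }\left( 0\right) =-6<-1\), \(\varphi ^{\prime \prime \prime }>0\)).

```python
import math

A = 1.5       # hypothetical scale of phi(h) = A * (1 - exp(-B_RATE * h))
B_RATE = 2.0  # hypothetical curvature parameter


def phi_prime(h):
    """First derivative of the assumed cost function."""
    return A * B_RATE * math.exp(-B_RATE * h)


def mrs_user(y, w_u):
    """MRS of users: [phi'(Y / w_u) + Y / w_u] / w_u."""
    h = y / w_u
    return (phi_prime(h) + h) / w_u


def turning_income(w_u):
    """Income at which the MRS stops decreasing, i.e. where phi''(h) = -1."""
    h_star = math.log(A * B_RATE ** 2) / B_RATE
    return w_u * h_star
```

Under these assumptions the marginal rate of substitution equals 1.5 at \(Y=0\), falls as \(Y\) rises from zero, and starts increasing again only beyond `turning_income(2.0)`, which is roughly 1.79.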

6 Concluding remarks

In this paper, we have considered a two-type optimal nonlinear income tax model where agents differ both in terms of market ability and in terms of “needs” for a work-related good/service, i.e. a good/service that some agents need to purchase in order to work. Because of this bi-dimensional heterogeneity, the single-crossing condition fails to hold. Ruling out public observability of individual types, we have characterized the properties of a second-best optimum by looking at the entire second-best Pareto frontier.

We have highlighted that, due to the violation of single-crossing, some non-standard results arise. First, a second-best optimum might not preserve the earned-income ranking that prevails under laissez-faire. Second, redistribution via income taxation might be feasible even when the laissez-faire equilibrium is a pooling equilibrium. Third, a second-best optimum might not be unique, in the sense that there might be more than one set of allocations in the (pre-tax income, after-tax income)-space that solve the government’s maximization problem. Fourth, the second-best Pareto frontier may be disconnected. Fifth, supplementing an optimal nonlinear income tax with an optimal subsidy on work-related expenses may imply that redistribution is achieved through a separating or pooling equilibrium where both self-selection constraints are binding. Sixth, we have shown that the labor supply of some agents might be distorted even though no self-selection constraint is (locally) binding in equilibrium.

Before concluding, a final remark is in order. For tractability reasons, we have focused our analysis on a simplified two-type model where skills and needs are perfectly correlated. However, insofar as our non-standard results hinge on the violation of the single-crossing condition, they generalize, with some nuances, to settings with a larger number of types and imperfect correlation between skills and needs.