Pareto efficient income taxation without single-crossing

We provide a full characterization of a two-type optimal nonlinear income tax model where the single-crossing condition is violated due to an assumption that agents differ both in terms of market abilities and in terms of their needs for a work-related good. We set up a Pareto-efficient tax problem and analyze the entire second-best Pareto-frontier, highlighting several non-standard results, such as the possibility of income re-ranking relative to the laissez-faire and gaps in the Pareto-frontier.


Introduction
The important and influential literature growing out of Mirrlees' (1971) seminal paper on optimal income taxation has stressed the trade-offs between incentive and distributional considerations in the design of income tax schedules. These trade-offs arise from an information friction that endogenizes the feasible tax instruments: the government knows the distribution of types in the population and it can also observe the actual earned income of each individual, but is not able to observe the specific type of any given individual. Personalized lump-sum taxes and transfers are therefore not available, but public observability of earned income at the individual level allows the government to tax earned income on a nonlinear scale.
We are grateful to Floris Zoutman and two anonymous referees for valuable comments on an earlier draft of the paper.
The vast majority of papers in the optimal tax literature assume that agents differ along a single dimension (market ability). This is due to tractability considerations. Given certain assumptions on the utility function, a single dimension of heterogeneity yields a monotonic relationship between an agent's unobserved type and the slope of his/her indifference curve in the earnings-consumption space. This property, referred to as 'single-crossing' (hereafter, SC), allows the researcher to provide a full characterization of the set of implementable contracts while restricting attention to local incentive constraints linking adjacent types. In the case of a continuum of types, it also implies that the incentive constraints can conveniently be expressed in terms of differential equations. When agents differ along multiple dimensions, however, the SC property will generally be violated, as there is no natural way to order agents in a multidimensional space.¹
A comparatively small literature analyzes optimal income taxation with multidimensional unobserved heterogeneity, and these contributions can roughly be divided into four strands. A first strand assumes that the additional dimensions of heterogeneity enter the utility function in an additively separable way, thereby not affecting individuals' trade-offs between pre-tax and after-tax income (see, e.g., Kleven et al. 2009; Jacquet et al. 2013; Scheuer 2014; Bastani et al. 2020). A second strand imposes restrictions such that the various dimensions of heterogeneity can be collapsed into one dimension and parameterized by a single index (see, e.g., Boadway et al. 2002; Choné and Laroque 2010; Golosov et al. 2013; Rothschild and Scheuer 2014; Lockwood and Weinzierl 2015). A third strand analyzes more general forms of heterogeneity, but restricts attention to the quantitative analysis of models with a small discrete number of types (see, e.g., Bastani et al. 2013; Judd et al. 2018).
Finally, a fourth strand comprises papers that provide a characterization of optimal marginal tax rates while remaining agnostic about which incentive-compatibility constraints are binding in equilibrium (see, e.g., Cremer et al. 1998;Cremer and Gahvari 2002;Micheletto 2008).
Compared to the existing literature referred to above, the purpose of this paper is to provide a more thorough investigation of the consequences of abandoning the SC condition. For this purpose, we set up a simple two-type model where the SC condition is naturally violated, and we characterize the properties of a second-best optimum by considering the entire second-best Pareto frontier (hereafter, PF).² The model that we consider is a standard intensive-margin optimal income tax model where agents have identical preferences and heterogeneous market abilities, but where we also allow for heterogeneity in "needs" for a work-related good/service, i.e. a good/service that some agents need to purchase in order to work.³ It is this bi-dimensional heterogeneity that implies a violation of the SC condition.
Our analysis highlights several results, each of them representing an anomaly with respect to what is obtained in an optimal income tax model under SC. First, a second-best optimum might not preserve the ranking of earned income that prevails under laissez-faire. Second, redistribution via income taxation might be feasible even when the laissez-faire equilibrium is a pooling equilibrium. Third, a second-best optimum might not be unique, in the sense that there might be more than one set of allocations in the (pre-tax income, after-tax income)-space that solve the government's maximization problem. Fourth, the second-best PF can be disconnected. Fifth, supplementing an optimal nonlinear income tax with an optimal subsidy on work-related expenses may imply that redistribution is achieved through a separating or pooling equilibrium where both self-selection constraints are binding. A final result that we show is that the labor supply of some agents may be distorted even though no self-selection constraint is (locally) binding in equilibrium.
The paper is organized as follows. In Sect. 2 we present our setting and highlight how it implies that the SC condition does not hold. In Sect. 3 we evaluate the properties of the second-best PF and of the allocations that allow implementing the various points on the second-best PF. To simplify the exposition we make the assumption that, for agents who incur a cost for the purchase of a work-related good, the cost is proportional to their labor supply. In Sect. 4 we discuss how our results change when work-related expenses are subsidized by the government, and in Sect. 5 we briefly consider the possibility that job-related expenses vary nonlinearly with hours of work. Finally, Sect. 6 offers concluding remarks.

The model
Consider an economy populated by two groups of individuals who have identical preferences represented by the quasi-linear utility function

$$U = c - \frac{h^2}{2}, \qquad (1)$$

where $c$ denotes consumption and $h$ denotes labor supply.⁴ The two groups of agents are assumed to differ with respect to their market ability, reflected in their hourly wage rate, and their needs for a work-related good. One group has no need for any work-related good, whereas agents belonging to the other group incur a monetary cost equal to $qh$, where $q$ is a positive constant. Throughout the paper we will refer to these groups of agents as "non-users" and "users", and denote their hourly wage rates by, respectively, $w^n$ and $w^u$ (superscript "n" referring to non-users, and superscript "u" referring to users). Moreover, normalizing to 1 the size of the total population, we will denote by $\pi$ the proportion of users. Furthermore, we will assume that $w^u > w^n$, implying that the high-skilled agents are disadvantaged along our second dimension of heterogeneity, and that $q < w^u$, which ensures that the labor supply of users is strictly positive under laissez-faire.
Assume that the government levies a nonlinear income tax $T(wh)$ and let earned income be denoted by $Y$ (i.e., $Y \equiv wh$) and after-tax income be denoted by $B$ (i.e., $B \equiv Y - T(Y)$). It is straightforward to verify that the SC property is not satisfied in our two-type economy. This property requires that, at any bundle in the $(Y, B)$-space, the indifference curves are flatter the higher the wage rate of an agent. In our model, and for a given $(Y, B)$-bundle, users and non-users have utilities that are respectively given by:

$$U^u = B - \frac{q}{w^u}Y - \frac{1}{2}\left(\frac{Y}{w^u}\right)^2, \qquad U^n = B - \frac{1}{2}\left(\frac{Y}{w^n}\right)^2.$$

Therefore, at a given $(Y, B)$-bundle, the slope of a user's indifference curve is equal to

$$MRS^u_{YB} = \frac{q}{w^u} + \frac{Y}{(w^u)^2}, \qquad (2)$$

whereas non-users have an indifference curve with slope equal to

$$MRS^n_{YB} = \frac{Y}{(w^n)^2}. \qquad (3)$$

From (2) and (3), it follows that users and non-users have equally sloped indifference curves at bundles where

$$Y = \Omega \equiv \frac{q\, w^u (w^n)^2}{(w^u)^2 - (w^n)^2}, \qquad (4)$$

whereas at any bundle where $Y > (<)\ \Omega$, users have flatter (steeper) indifference curves than non-users.
3 Several interpretations are possible. One example is child care services, which are needed by parents of young kids in order to work. Other groups who might face needs constraints include workers with relatives who require elderly care, or workers who incur commuting costs or work-related health costs.
4 The specific iso-elastic form of the utility function is here mainly adopted for analytical convenience.
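The failure of single-crossing can be checked numerically. The sketch below is only illustrative: it assumes the quasi-linear specification $U = c - h^2/2$ (which is consistent with the laissez-faire utilities reported in the text), and the parameter values and function names are hypothetical choices made for the example, not values used in the paper.

```python
def mrs_user(Y, w_u, q):
    """Slope of a user's indifference curve at income Y: q/w_u + Y/w_u^2."""
    return q / w_u + Y / w_u**2


def mrs_nonuser(Y, w_n):
    """Slope of a non-user's indifference curve at income Y: Y/w_n^2."""
    return Y / w_n**2


def omega(w_u, w_n, q):
    """Income level at which the two slopes coincide."""
    return q * w_u * w_n**2 / (w_u**2 - w_n**2)


# Illustrative values only (assumed, not taken from the paper).
w_u, w_n, q = 2.0, 1.0, 1.0
Om = omega(w_u, w_n, q)

# Below Omega users have the steeper curve, above Omega the flatter one.
for Y in (0.5 * Om, Om, 2.0 * Om):
    print(Y, mrs_user(Y, w_u, q), mrs_nonuser(Y, w_n))
```

The ordering of the two slopes flips as $Y$ crosses $\Omega$, which is exactly what a single-crossing property would rule out.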
The fact that the SC property is not satisfied shows that our bi-dimensional heterogeneity (in skills and needs) cannot be reduced to one dimension. Although this complicates the analysis, it also allows us to highlight some interesting results that can arise due to the violation of SC.
In the next section we will evaluate the properties of the second-best PF and of the allocations that allow implementing the various points on the second-best PF. In doing that, we will restrict our attention to the case when $\pi$, the proportion of users, is lower than $1 - (w^n)^2/(w^u)^2$; this represents the most interesting case for the purpose of illustrating the anomalies that can arise due to the violation of SC.⁵ Before turning to the analysis of the second-best PF, however, we will devote the remainder of this section to first characterizing the laissez-faire equilibrium and then the properties of the first-best PF.

The laissez-faire equilibrium
Under laissez-faire, non-users choose $h^n = w^n$ and earn $Y^n_{LF} = (w^n)^2$, whereas users choose $h^u = w^u - q$ and earn $Y^u_{LF} = w^u(w^u - q)$. Defining $\bar{q} \equiv [(w^u)^2 - (w^n)^2]/w^u$, it follows that $Y^u_{LF} > Y^n_{LF}$ when $q < \bar{q}$, $Y^u_{LF} = Y^n_{LF}$ when $q = \bar{q}$, and $Y^u_{LF} < Y^n_{LF}$ when $q > \bar{q}$. Since (4) can be re-expressed as $\Omega = (w^n)^2 q/\bar{q}$, it also follows that $\Omega > Y^n_{LF}$ when $q > \bar{q}$. Similarly, when $q < \bar{q}$, we have that $\Omega < Y^n_{LF}$, and when $q = \bar{q}$ we have that $\Omega = Y^n_{LF}$. Thus, whether $q$ is smaller than, equal to, or larger than $\bar{q}$ also determines the relative sizes of both types' MRS at their laissez-faire bundles (i.e. the relations between $Y^u_{LF}$, $Y^n_{LF}$ and the threshold $\Omega$). The following Lemma summarizes the relationship between the value of $q$ and the three possible configurations of a laissez-faire equilibrium.
Lemma 1 Assume that $w^u > w^n$.
(i) When $q < \bar{q}$, the laissez-faire equilibrium will be such that $Y^u_{LF} > Y^n_{LF} > \Omega$;
(ii) When $q = \bar{q}$, the laissez-faire equilibrium will be such that $Y^u_{LF} = Y^n_{LF} = \Omega$;
(iii) When $q > \bar{q}$, the laissez-faire equilibrium will be such that $Y^u_{LF} < Y^n_{LF} < \Omega$.
A graphical illustration of the laissez-faire equilibrium for the case when $q > \bar{q}$, and of the violation of SC, is provided in Fig. 1 below.
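The three configurations in Lemma 1 can be reproduced with a few lines of code. This is a minimal sketch under the assumption that laissez-faire labor supplies are $h^n = w^n$ and $h^u = w^u - q$ (as implied by the laissez-faire utilities in the text); the function names and parameter values are hypothetical.

```python
def laissez_faire(w_u, w_n, q):
    """Return (q_bar, Y_u_LF, Y_n_LF, Omega) for given wages and cost q."""
    q_bar = (w_u**2 - w_n**2) / w_u   # value of q at which incomes coincide
    Y_u = w_u * (w_u - q)             # users work h = w_u - q
    Y_n = w_n**2                      # non-users work h = w_n
    Om = w_n**2 * q / q_bar           # threshold at which the MRS coincide
    return q_bar, Y_u, Y_n, Om


def lemma1_case(w_u, w_n, q, tol=1e-12):
    """Classify the laissez-faire equilibrium as case 'i', 'ii' or 'iii'."""
    q_bar, _, _, _ = laissez_faire(w_u, w_n, q)
    if abs(q - q_bar) <= tol:
        return "ii"
    return "i" if q < q_bar else "iii"


# Illustrative values only: w_u = 2 and w_n = 1 give q_bar = 1.5.
for q in (1.0, 1.5, 1.8):
    print(q, lemma1_case(2.0, 1.0, q), laissez_faire(2.0, 1.0, q))
```

With these values, $q = 1$ falls in case (i), $q = 1.5$ in the pooling case (ii), and $q = 1.8$ in case (iii).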

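Regarding utilities, denoting by $U^i_{LF}$ the laissez-faire utility of an individual of type $i$, for $i = n, u$, we have that $U^u_{LF} = (w^u - q)^2/2$, $U^n_{LF} = (w^n)^2/2$, and therefore

$$U^u_{LF} - U^n_{LF} = \frac{(w^u - q)^2 - (w^n)^2}{2}$$

or, equivalently,

$$U^u_{LF} - U^n_{LF} = \frac{(w^u - q - w^n)(w^u - q + w^n)}{2}.$$

One thing to notice is that the utility ranking and the income ranking may differ. In particular, while $Y^u_{LF} > Y^n_{LF}$ whenever $q < \bar{q}$, we have that $U^u_{LF} > U^n_{LF}$ only when $q < w^u - w^n$, which is a stricter condition since $w^u - w^n < \bar{q}$.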

The shape of the first-best Pareto frontier
In a first-best setting where asymmetric information is not an issue, the shape of the PF can be straightforwardly characterized. The first-best PF goes through the point with coordinates $(U^n_{LF}, U^u_{LF})$ and has slope $dU^u/dU^n = -(1-\pi)/\pi$ for values of $U^n$ such that $-(w^n)^2/2 \le U^n \le (w^u - q)^2 \pi/(1-\pi) + (w^n)^2/2$. For $U^n > (w^u - q)^2 \pi/(1-\pi) + (w^n)^2/2$ the slope of the PF is such that $dU^u/dU^n < -(1-\pi)/\pi$.
The intuition is as follows. Starting from the laissez-faire equilibrium, a one-dollar lump-sum tax levied on non-users, which reduces by 1 the utility of each non-user, allows the government to collect $1-\pi$ dollars, which implies that each user can receive a lump-sum transfer of $(1-\pi)/\pi$ dollars, raising utility by $(1-\pi)/\pi$. This kind of income and utility redistribution, from non-users to users, can go on until all the income earned by non-users under laissez-faire, i.e. $(w^n)^2$, is confiscated by the government. At that point we have that $U^n = -(w^n)^2/2$ (consumption for non-users is equal to zero and, with no income effects on labor supply, their labor supply is undistorted at its laissez-faire level) and $U^u = (w^n)^2 (1-\pi)/\pi + (w^u - q)^2/2$. Once this point on the first-best PF is reached, and assuming that zero represents the lower bound for individual consumption,⁶ a further increase in $U^u$ can only be obtained by pushing the labor supply of non-users above its undistorted level $h^n = w^n$ (while keeping their consumption at zero), so that additional resources can be transferred to users. However, due to the distortion on the labor supply of non-users, redistribution becomes costlier and the slope of the PF becomes equal to $dU^u/dU^n = -(1-\pi) w^n/(\pi h^n)$, which is greater than $-(1-\pi)/\pi$ when $h^n$ exceeds $w^n$, i.e. its laissez-faire value.⁷
The fact that the non-negativity constraint on consumption becomes binding along some portions of the first-best PF, and consequently the fact that there are portions of the first-best PF where the labor supply of some agents is upward distorted, is an artifact of our assumption that utility is linear in consumption.⁸ Most importantly, it has nothing to do with the fact that the SC property does not hold in our model. For this reason, in our analysis we will hereafter impose the following lower bounds on the utility of, respectively, non-users and users:

$$U^n \ge -\frac{(w^n)^2}{2}, \qquad (5)$$

$$U^u \ge -\frac{(w^u - q)^2}{2}. \qquad (6)$$

Conditions (5) and (6) ensure that, at each point along the relevant part of the first-best PF, the labor supply of all agents will be left undistorted.
6 One can think that individual consumption cannot fall below a subsistence level $\underline{c}$. From this perspective, assuming that $\underline{c} = 0$ is simply a matter of normalization.
7 A similar reasoning can be adopted to show that the slope of the first-best PF is equal to $-(1-\pi)/\pi$ for values of $U^n > U^n_{LF}$ and such that $(w^n)^2/2 < U^n \le (w^u - q)^2 \pi/(1-\pi) + (w^n)^2/2$. When $U^n = (w^u - q)^2 \pi/(1-\pi) + (w^n)^2/2$, all the resources available for consumption by users under laissez-faire have been transferred to non-users. Since consumption for users has then reached its lower bound, a further increase in the utility of non-users can only be obtained by requiring users to increase their labor supply, while keeping their consumption at zero, so that additional resources can be transferred to non-users. However, since the required increase in $h^u$ entails a distortion on the labor supply of users, redistribution becomes costlier and the slope of the PF becomes $dU^u/dU^n = -(1-\pi) h^u/[\pi (w^u - q)]$, which is lower than $-(1-\pi)/\pi$ when $h^u$ exceeds $w^u - q$, i.e. its laissez-faire value.
8 The non-negativity constraint on consumption could be safely disregarded if the marginal utility of consumption went to infinity as consumption approaches zero.
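The linear portion of the first-best PF described above can be traced with a short function. This is a sketch under the assumptions stated in the text (lump-sum redistribution starting from laissez-faire, undistorted labor supplies, consumption bounded below by zero); the symbol `pi` for the proportion of users, the function name and the parameter values are illustrative choices.

```python
def first_best_frontier(U_n, pi, w_u, w_n, q, tol=1e-12):
    """Users' utility on the undistorted (linear) part of the first-best PF.

    Raises ValueError when U_n lies outside the range in which both labor
    supplies can be left undistorted.
    """
    U_n_LF = w_n**2 / 2
    U_u_LF = (w_u - q) ** 2 / 2
    U_n_min = -(w_n**2) / 2                        # non-users' income fully taxed
    U_n_max = U_n_LF + pi * (w_u - q) ** 2 / (1 - pi)  # users' consumption at zero
    if not (U_n_min - tol <= U_n <= U_n_max + tol):
        raise ValueError("U_n outside the undistorted range of the first-best PF")
    # Each unit of utility taken from a non-user yields (1 - pi)/pi per user.
    return U_u_LF - (1 - pi) / pi * (U_n - U_n_LF)


# Illustrative values only (assumed, not taken from the paper).
pi, w_u, w_n, q = 0.5, 2.0, 1.0, 1.0
for U_n in (-0.5, 0.5, 1.5):
    print(U_n, first_best_frontier(U_n, pi, w_u, w_n, q))
```

At the two end points of the admissible range the function returns the utility levels at which one group's consumption has been driven to zero.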

Pareto efficient income taxation
Consider now a second-best setting with asymmetric information. Specifically, assume that the government knows the distribution of types in the population but does not know "who is who". Although individual wages, hours of work and job-related expenses are not observed by the government, earned income is assumed to be publicly observable at an individual level. This allows earned income to be taxed on a nonlinear scale, and the government's problem consists in optimally choosing the nonlinear income tax $T(Y)$. Notice however that, while $T(Y)$ defines a link between earned income $Y$ and after-tax income $B$ which is a single-valued function, the link that it establishes between earned income and consumption is a multivalued function. This is because, for given $Y$ and corresponding tax payment $T(Y)$, an individual's consumption depends on the amount of job-related expenses.
As customary in the optimal tax literature, we will adopt a mechanism design approach, assuming that the government optimally chooses two bundles in the $(Y, B)$-space subject to the requirement that the chosen set of bundles satisfies public-budget balance, incentive-compatibility, and non-negativity constraints on both consumption and labor supply. Denoting by $(Y^u, B^u)$ the bundle intended for users and by $(Y^n, B^n)$ the one intended for non-users, a Pareto efficient tax problem can be formalized as follows:

$$\max_{Y^u, B^u, Y^n, B^n} \; B^u - \frac{q}{w^u} Y^u - \frac{1}{2}\left(\frac{Y^u}{w^u}\right)^2$$

subject to:

$$B^n - \frac{1}{2}\left(\frac{Y^n}{w^n}\right)^2 \ge V^n,$$

$$\pi (Y^u - B^u) + (1-\pi)(Y^n - B^n) \ge 0,$$

$$B^n - \frac{1}{2}\left(\frac{Y^n}{w^n}\right)^2 \ge B^u - \frac{1}{2}\left(\frac{Y^u}{w^n}\right)^2,$$

$$B^u - \frac{q}{w^u} Y^u - \frac{1}{2}\left(\frac{Y^u}{w^u}\right)^2 \ge B^n - \frac{q}{w^u} Y^n - \frac{1}{2}\left(\frac{Y^n}{w^u}\right)^2.$$

In the problem above, the first constraint prescribes a lower bound $V^n$ for the utility of non-users, the second constraint represents the government's budget constraint (the resource constraint of the economy), the third constraint is the self-selection constraint requiring non-users not to be tempted to choose the bundle intended for users, and the fourth constraint is the self-selection constraint requiring users not to be tempted to choose the bundle intended for non-users. For a given value of $V^n$, we define the set of admissible bundles as the set of bundles $\{(Y^u, B^u), (Y^n, B^n)\}$ satisfying the constraints in the above optimization problem (including the non-negativity constraints on labor supply and consumption for each agent). For given values of $\pi$, $q$, $w^u$ and $w^n$, the value function of the optimization problem above defines a value for $U^u$ which is a function of $V^n$, denoted by $U^u_{SB}(V^n)$. Solving the optimization problem for different values of $V^n$ allows tracing the entire second-best PF. In particular, we have that:

Definition 1
The second-best Pareto-frontier is defined by the graph of the function $U^u_{SB}(V^n)$ over the domain of values $V^n$ such that the set of admissible bundles is non-empty and the constraint prescribing the lower bound $V^n$ on the utility of non-users is binding.
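Whether a given pair of bundles is admissible can be verified mechanically. The following is a minimal sketch, assuming the quasi-linear utilities $U^u = B - qY/w^u - (Y/w^u)^2/2$ and $U^n = B - (Y/w^n)^2/2$ and writing `pi` for the proportion of users; the function names, the tolerance and the numerical values are hypothetical and serve only as an illustration.

```python
def u_user(Y, B, w_u, q):
    """Utility of a user choosing bundle (Y, B)."""
    return B - q * Y / w_u - 0.5 * (Y / w_u) ** 2


def u_nonuser(Y, B, w_n):
    """Utility of a non-user choosing bundle (Y, B)."""
    return B - 0.5 * (Y / w_n) ** 2


def is_admissible(bundle_u, bundle_n, V_n, pi, w_u, w_n, q, tol=1e-9):
    """Check minimum utility, budget, self-selection and non-negativity."""
    (Y_u, B_u), (Y_n, B_n) = bundle_u, bundle_n
    min_utility = u_nonuser(Y_n, B_n, w_n) >= V_n - tol
    budget = pi * (Y_u - B_u) + (1 - pi) * (Y_n - B_n) >= -tol
    # Non-users must not prefer the users' bundle, and vice versa.
    ic_nonusers = u_nonuser(Y_n, B_n, w_n) >= u_nonuser(Y_u, B_u, w_n) - tol
    ic_users = u_user(Y_u, B_u, w_u, q) >= u_user(Y_n, B_n, w_u, q) - tol
    # Labor supply and consumption (net of work-related costs) are non-negative.
    nonneg = (Y_u >= -tol and Y_n >= -tol
              and B_n >= -tol and B_u - q * Y_u / w_u >= -tol)
    return min_utility and budget and ic_nonusers and ic_users and nonneg


# Illustrative values only: laissez-faire bundles with w_u=2, w_n=1, q=1.
print(is_admissible((2.0, 2.0), (1.0, 1.0), 0.5, 0.5, 2.0, 1.0, 1.0))
# A large undistorted transfer to users tempts non-users to mimic.
print(is_admissible((2.0, 2.5), (1.0, 0.5), 0.0, 0.5, 2.0, 1.0, 1.0))
```

The second call fails only because of the non-users' self-selection constraint, which is the kind of situation in which the second-best PF departs from the first-best one.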
We will present our results by means of three Propositions which separately consider the three cases described in Lemma 1 above. In each Proposition we will denote by $T'(Y^u_{SB})$ and $T'(Y^n_{SB})$ the marginal income tax rate faced by, respectively, users and non-users at the allocation which allows implementing a given point on the second-best PF. As customary in the optimal tax literature, the marginal income tax rate faced by an individual at a given bundle in the $(Y, B)$-space is defined as $1 - MRS_{YB}$.
As we will see, the non-standard outcomes which are due to the violation of SC only arise when $q \ge \bar{q}$. For this reason, discussing the results when $q < \bar{q}$ can be regarded as a useful starting point. Proposition 1 summarizes the main findings for this case.
$(w^u)^2$, the second-best PF coincides with the first-best PF and it is attained through an allocation where
The results provided in Proposition 1 are qualitatively similar to those that would be obtained in a standard two-type setting where agents only differ in market ability ($q = 0$).
Part (ii) shows that when the amount of inter-group redistribution is sufficiently small (i.e., $V^n$ is sufficiently close to $U^n_{LF}$), no distortion is needed to satisfy incentive-compatibility; this means that asymmetric information does not prevent the government from attaining a point on the first-best PF.
Together, parts (iii) and (iv) show instead that, when the amount of redistribution becomes sufficiently large, incentive-compatibility considerations require distorting the labor supply of the transfer-recipients. When these are represented by non-users, as in part (iii) of Proposition 1, their labor supply will be downward distorted by letting them face a positive marginal tax rate. When transfer-recipients are instead represented by users, as in part (iv), their labor supply will be upward distorted by letting them face a negative marginal tax rate. In either case, since Proposition 1 refers to the case when $Y^n_{LF} < Y^u_{LF}$, the direction of the distortion imposed on the labor supply of the transfer-recipients is always "coherent" with the income ranking prevailing under laissez-faire. Thus, when $q < \bar{q}$, the laissez-faire income ranking is preserved at all points on the second-best PF.
Let us now consider the case when $q = \bar{q}$.
A key insight to understand the properties of the PF when $q = \bar{q}$ is that the indifference curve on which non-users locate under laissez-faire lies everywhere above the indifference curve on which users locate under laissez-faire (except at the point $Y^n_{LF} = Y^u_{LF}$ where the two indifference curves are tangent). This is illustrated in Fig. 2.
According to Proposition 2, the government can use a nonlinear income tax to redistribute towards users even in cases when both types earn the same income under laissez-faire. This stands in contrast to models where SC holds; under SC, an anonymous nonlinear income tax does not allow the government to convert a pooling laissez-faire equilibrium into a separating equilibrium. However, as shown in part (ii), the labor supply of users is always distorted for $V^n < U^n_{LF}$, which shows that the self-selection constraint of non-users is binding for any degree of redistribution from non-users to users.
The indifference curves represented in Fig. 2 are helpful to get an intuition for the result that redistribution towards users is feasible. Suppose that non-users were offered an undistorted bundle on an indifference curve that is below the one on which they locate under laissez-faire. Looking at Fig. 2, it is easy to see that a downward shift in the indifference curve of non-users would make it possible to find a set of bundles that are at the same time above the users' laissez-faire indifference curve and below the downward-shifted indifference curve of non-users. This means that, starting from the equilibrium described in Fig. 2, it is feasible to move non-users onto a lower indifference curve without violating the incentive-compatibility constraint requiring them not to be tempted to mimic users.
According to part (ii) of Proposition 2, for each $V^n \in [(1-\pi)U^n_{LF}, U^n_{LF})$ the corresponding point on the second-best PF can be achieved through two different allocations. The two allocations are equivalent in the sense that they induce the same utility distribution. Although at both allocations non-users get the same $(Y, B)$-bundle and face no distortion on their labor supply ($T'(Y^n_{SB}) = 0$ and $Y^n_{SB} = Y^n_{LF}$), one implementing allocation entails a downward distortion on the labor supply of users ($T'(Y^u_{SB}) > 0$ and $Y^u_{SB} < Y^u_{LF}$), whereas the other implementing allocation entails an upward distortion on their labor supply ($T'(Y^u_{SB}) < 0$ and $Y^u_{SB} > Y^u_{LF}$). Intuitively, the reason why there are two different allocations that allow achieving the same point on the second-best PF is that, with $q = \bar{q}$, the magnitude of the distortion on users' labor supply that is needed to deter mimicking by non-users is the same regardless of its direction.
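The symmetry behind the two implementing allocations can be made explicit with a short derivation. It is a sketch that assumes the quasi-linear utilities $U^u = B - qY/w^u - (Y/w^u)^2/2$ and $U^n = B - (Y/w^n)^2/2$, that non-users receive their undistorted bundle with utility exactly $V^n$, that the budget is balanced, and that the self-selection constraint of non-users binds; $\pi$ denotes the proportion of users and $a$ is shorthand introduced here for the per-user transfer.

```latex
\begin{align*}
a &\equiv \frac{1-\pi}{\pi}\,\bigl(U^n_{LF} - V^n\bigr)
  && \text{(transfer per user; users' bundles satisfy } B^u = Y^u + a\text{)} \\
V^n &= Y^u + a - \frac{(Y^u)^2}{2 (w^n)^2}
  && \text{(non-users indifferent to mimicking)} \\
\Rightarrow\; Y^u_{\pm} &= (w^n)^2 \pm w^n \sqrt{\frac{2\,(U^n_{LF} - V^n)}{\pi}}
  && \text{(two roots, symmetric around } Y^n_{LF} = (w^n)^2\text{)} \\
U^u(Y^u) &= a + U^u_{LF} - \frac{\bigl(Y^u - Y^u_{LF}\bigr)^2}{2 (w^u)^2}
  && \text{(users' utility along } B^u = Y^u + a\text{)} \\
q = \bar{q} \;\Rightarrow\; Y^u_{LF} &= Y^n_{LF}
  \;\Rightarrow\; U^u(Y^u_{+}) = U^u(Y^u_{-})
  = a + U^u_{LF} - \frac{(w^n)^2 \bigl(U^n_{LF} - V^n\bigr)}{\pi\,(w^u)^2}.
\end{align*}
```

The lower root $Y^u_{-}$ is non-negative if and only if $V^n \ge (1-\pi)U^n_{LF}$, which matches the lower end of the interval over which two implementing allocations exist.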
According to part (iii), for $V^n < (1-\pi)U^n_{LF}$, a point on the second-best PF always requires that the labor supply of users is upward distorted ($T'(Y^u_{SB}) < 0$). To understand why this is the case, consider the point on the second-best PF that corresponds to $V^n = (1-\pi)U^n_{LF}$. Of the two allocations that allow implementing this point, the allocation entailing a downward distortion on the labor supply of users prescribes offering them the bundle $(Y, B) = (0, (1-\pi)U^n_{LF})$. At this bundle their labor supply is pushed to its lower bound. Given that incentive-compatibility (the self-selection constraint of non-users) requires that a reduction in $V^n$ is accompanied by a larger (in absolute value) distortion on users, it follows that once $V^n$ has reached $(1-\pi)U^n_{LF}$, a further reduction cannot be accommodated by magnifying the downward distortion on the labor supply of users. Therefore, for $V^n < (1-\pi)U^n_{LF}$, the implementing allocation becomes unique and it requires distorting the users' labor supply upwards.
Finally, Proposition 2 shows that when the two types are pooled at the laissez-faire equilibrium, it is never possible to use a nonlinear income tax to redistribute from users to non-users, i.e. there is no point on the second-best PF where non-users get a utility higher than $U^n_{LF}$. An intuition for this result can again be grasped by looking at the indifference curves depicted in Fig. 2. Given that the laissez-faire indifference curve of users lies everywhere below the laissez-faire indifference curve of non-users (except at $Y^u_{LF} = Y^n_{LF}$ where they are tangent), it is impossible to move users onto a lower indifference curve without violating the incentive-compatibility constraint requiring them not to be tempted to mimic non-users.
Taking into account that, as previously noticed, for $V^n < U^n_{LF}$ the labor supply of users is always distorted, it also follows that when the laissez-faire equilibrium features pooling, the first-best and the second-best PF share only one point, i.e. the laissez-faire utility distribution.
Let us now move to the last case that is left to consider, i.e. the case when $q > \bar{q}$.
, the second-best PF is disconnected and the domain of the function $U^u_{SB}(V^n)$ is given by the domain is instead given by

any point on that region corresponds to an allocation at which
Qualitatively, some of the results provided in Proposition 3 are standard. For instance, according to part (ii), no distortion is needed to satisfy incentive-compatibility when the amount of inter-group redistribution is sufficiently small (i.e., for values of $V^n$ sufficiently close to $U^n_{LF}$). Another standard result is represented by part (iii), which states that, for $V^n > U^n_{LF}$, if incentive-compatibility considerations require distorting the bundle offered to the transfer-recipients (in this case, non-users), the direction of the distortion is "coherent" with the income ranking under laissez-faire.
Two results stand out instead as non-standard and are specifically due to the violation of the SC condition. The first, stated in part (i), highlights the possibility that the second-best PF is disconnected. The second, which is a consequence of parts (iv) and (v), highlights that, moving along the portion of the second-best PF where $V^n < U^n_{LF}$, the sign of $T'(Y^u_{SB})$ may change. In particular, despite the fact that $Y^n_{LF} > Y^u_{LF}$, users do not necessarily face a downward distortion on their labor supply at all points on the PF where the self-selection constraint of non-users is binding, i.e. at all points where the labor supply of users needs to be distorted to prevent mimicking by non-users.
These two results are closely related: the sign of $T'(Y^u_{SB})$ is not everywhere non-negative if and only if the domain of the function $U^u_{SB}(V^n)$ is a disconnected set, which in turn happens when $q$ is sufficiently small.
To understand these results, consider first Fig. 3, which illustrates the qualitative features of the solution to the government's problem for a given value of $V^n$. In the figure, the dashed 45° line represents the laissez-faire budget line (no taxes nor transfers), and points I and V represent the bundles chosen under laissez-faire by, respectively, non-users and users ($Y^n_{LF} > Y^u_{LF}$). Bundle II represents the undistorted bundle offered to non-users on their indifference curve associated with $U^n = V^n$. The blue 45° line represents the virtual budget line on which, given the revenue extracted from non-users, a bundle for users can be offered.⁹ On this virtual budget line, incentive-compatibility considerations (the need to satisfy the self-selection constraint of non-users) prevent the government from offering users the undistorted bundle labelled VI. To prevent non-users from behaving as mimickers, users can only be offered, on the virtual budget line, either bundles to the left of III or bundles to the right of IV, with both bundle III and bundle IV belonging to the set of admissible bundles. The difference between these two sets of bundles is that, whereas with bundle III, or bundles to the left of it, type separation is achieved by imposing a sufficiently large downward distortion on the users' labor supply, in the case of bundle IV, or bundles to the right of it, type separation is achieved by imposing a sufficiently large upward distortion on the users' labor supply.
The black curve passing through bundle III is an indifference curve pertaining to users. The figure shows that, among all the admissible bundles that can be offered to users, bundle III is the one at which their utility is maximized. In particular, notice that the utility of users is strictly higher at bundle III than at bundle IV. The intuition is that, even though the self-selection constraint of non-users can be satisfied by imposing either a sufficiently large downward or a sufficiently large upward distortion on the labor supply of users, the size of the required distortion is smaller when type separation is obtained by distorting the users' labor supply downwards ($T'(Y^u_{SB}) > 0$). This allows achieving type separation at a lower efficiency cost.
9 The value of the intercept of the blue 45° line is given by $(U^n_{LF} - V^n)(1-\pi)/\pi$. Thus, the intercept is higher the smaller is $V^n$ (i.e., the larger is the tax collected from each non-user) and the smaller is $\pi$ (i.e., the smaller is the fraction of transfer-recipients).
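Bundles III and IV can be computed in closed form for a numerical example. The sketch below assumes that non-users receive their undistorted bundle with utility exactly $V^n$, that the budget is balanced, and that bundles III and IV are the two points of the virtual budget line at which non-users are indifferent to mimicking; the function names and parameter values (chosen so that $q > \bar{q}$) are hypothetical.

```python
import math


def u_user(Y, B, w_u, q):
    """Utility of a user choosing bundle (Y, B)."""
    return B - q * Y / w_u - 0.5 * (Y / w_u) ** 2


def u_nonuser(Y, B, w_n):
    """Utility of a non-user choosing bundle (Y, B)."""
    return B - 0.5 * (Y / w_n) ** 2


def bundles_III_and_IV(V_n, pi, w_n):
    """Points on the virtual budget line where non-users' utility equals V_n."""
    U_n_LF = w_n**2 / 2
    a = (U_n_LF - V_n) * (1 - pi) / pi            # intercept of the virtual line
    spread = w_n * math.sqrt(2 * (U_n_LF - V_n) / pi)
    Y_III, Y_IV = w_n**2 - spread, w_n**2 + spread
    return (Y_III, Y_III + a), (Y_IV, Y_IV + a)


# Illustrative values only: w_u=2, w_n=1 give q_bar=1.5, so q=1.8 > q_bar.
pi, w_u, w_n, q, V_n = 0.5, 2.0, 1.0, 1.8, 0.4
III, IV = bundles_III_and_IV(V_n, pi, w_n)
Y_u_LF = w_u * (w_u - q)                          # users' undistorted income

print("III:", III, "users' utility:", u_user(*III, w_u, q))
print("IV: ", IV, "users' utility:", u_user(*IV, w_u, q))
print("undistorted income lies between III and IV:", III[0] < Y_u_LF < IV[0])
```

In this example the users' undistorted income lies strictly between the two bundles, so it cannot be offered, and the downward-distorted bundle III gives users the higher utility because it is closer to their undistorted income.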
Consider now Fig. 4, which illustrates the solution to the government's problem for the case when $V^n$ is lowered to $(1-\pi)U^n_{LF}$. In Fig. 4, the dashed 45° line represents the laissez-faire budget line, and the point labelled I on this line represents the bundle selected by non-users under laissez-faire. Bundle II represents the undistorted bundle offered to non-users lying on the indifference curve where $V^n = (1-\pi)U^n_{LF}$. The blue 45° line represents the virtual budget line on which a bundle for users can be offered given the revenue extracted from non-users. Incentive compatibility requires that, on the blue virtual budget line, users can only be offered either bundle III or bundles to the right of IV, with bundle IV belonging to the set of admissible bundles. The black curve passing through bundle III is an indifference curve pertaining to users and it shows that bundle III is strictly preferred by users to bundle IV. Comparing bundle III in Fig. 4 with the corresponding bundle in Fig. 3, we can also see that the size of the downward distortion on the users' labor supply is larger in Fig. 4.¹⁰ The important thing to notice, however, is that at bundle III the users' labor supply has been pushed to its lower bound ($Y^u = 0$).¹¹
In a standard model where the SC condition holds, the utility achieved by users at bundle III in Fig. 4 would represent their maximal utility along the second-best PF. The reason is straightforward. Suppose that single-crossing were satisfied and that at all bundles in the $(Y, B)$-space users had steeper indifference curves. Then, the users' indifference curve represented in Fig. 4 would lie everywhere above the indifference curve of non-users, except at bundle III. But this would necessarily imply that, if non-users were to be offered a bundle on a lower indifference curve (to increase the tax revenue collected from them), any $(Y, B)$-bundle that makes users better off (compared to bundle III in Fig. 4) would violate incentive-compatibility since it would induce non-users to behave as mimickers.
With SC being violated, instead, things are different. In Fig. 4 all the bundles that are included in the gray area represent bundles that would at the same time: (i) make users better off (compared to the utility that they achieve at bundle III), and (ii) be incentive-compatible in the sense that they would not induce non-users to reject the bundle intended for them. Even though the bundles in the gray area cannot be offered to users since they violate the public-budget constraint (when $V^n = (1-\pi)U^n_{LF}$ and non-users are offered bundle II), users might be offered a bundle in the gray area if more revenue were collected from non-users, so that the blue virtual budget line could be shifted up. However, since collecting more revenue from non-users implies moving them onto a lower indifference curve, and since this implies that the set of bundles in the gray area shrinks, the violation of SC is in general not sufficient to guarantee that the utility of users can be raised above the utility reached at bundle III. What is required is that the simultaneous upward shift in the virtual budget line, and downward shift in the indifference curve of non-users, push their point of intersection (currently at point IV in Fig. 4) inside the gray area. This is more likely to happen the smaller is $\pi$ and the smaller is the difference $Y^n_{LF} - Y^u_{LF}$ $(> 0)$.¹² Notice also that at any bundle inside the gray area the labor supply of users is upward distorted. Hence, to raise $U^u$ above the level that it achieves at bundle III, users will need to be assigned a bundle at which their labor supply is upward distorted. Moreover, since the users' utility is strictly higher at bundle III than at bundle IV, raising $U^u$ above the value achieved at bundle III would necessarily require a discrete downward jump in $V^n$. This is illustrated in Fig. 5 below, which shows the second-best PF with the property that the domain of the function $U^u_{SB}(V^n)$ is disconnected.
10 This is easily understood by looking at Fig. 3 and thinking of how bundle III would be affected by a downward shift in the indifference curves of non-users. Since such a downward shift would also entail an upward shift in the intercept of the 45° virtual budget line (as more revenue is collected from non-users), the new bundle III would necessarily be associated with a lower value of the users' labor supply.
11 One can also notice that when $V^n = (1-\pi)U^n_{LF}$ and users are offered bundle III, utilities are equalized: $U^u = U^n = (1-\pi)U^n_{LF}$.
Finally, notice that when the second-best PF looks as in Fig. 5, the earned-income ranking that corresponds to the various points on the frontier is not always consistent with the income ranking under laissez-faire. Along the region where V n < U n LF , one moves from a portion of the second-best PF that coincides with the first-best frontier (the green part with slope −(1 − )∕ ), to a portion where $T'(Y^u_{SB}) > 0$ (the red part of the curve in Fig. 5), and finally to a portion where $T'(Y^u_{SB}) < 0$ (the blue part of the curve in Fig. 5). When entering this last portion, the earned-income ranking is no longer consistent with the one under laissez-faire. Both the possibility that the second-best PF is disconnected and the possibility of income re-ranking follow from the circumstance that in our setting the SC condition is violated. 13 Similarly, it is because of the violation of the SC condition that, when redistribution favors users, it might be optimal to let them face a negative marginal tax rate even in cases when they earn less than non-users under laissez-faire. This shows that the violation of SC can provide a novel rationale for negative marginal tax rates. 14

12 Regarding the effect of , the reason is that a smaller implies that a given upward shift in the blue virtual budget line can be accommodated by a smaller downward shift in the indifference curves of non-users. Regarding the effect of Y n LF − Y u LF , assume that, for given w n , either w u increases or q decreases (while still satisfying the inequality . This would produce a flattening effect on the indifference curve for users that is displayed in Fig. 4, which would in turn imply that its second intersection with the indifference curve of non-users would occur at a lower value of Y. A smaller upward shift in the blue virtual budget line would then be needed to move bundle IV inside the gray area.

Subsidizing work-related expenses
In our analysis we have so far maintained the assumption that the only policy instrument is a nonlinear income tax. In this setting we have highlighted the consequences stemming from the violation of the SC condition. Most governments, however, allow special tax treatments for work-related expenses. 15 To consider this possibility, and given that a "special" tax treatment usually implies a more lenient one, we will now investigate how our results are affected when job-related expenses are subsidized at a flat rate s > 0 that is optimally chosen. 16 Moreover, since a subsidy on job-related expenses is only valuable to users, we will confine our attention to the portion of the PF where V n < U n LF , i.e. to the portion of the PF where redistribution goes from non-users to users.

14 Previous contributions that have highlighted the possibility that negative marginal tax rates are optimal include Stiglitz (1982), Saez (2002) and Choné and Laroque (2010). In these papers the SC condition is satisfied and the justification for negative marginal tax rates either comes from the assumption that wages are endogenous or from specific assumptions on the profile of social weights that apply to the different types of agents in the economy.

15 Recent contributions that have analyzed the optimal tax treatment of work-related expenses include Koehne and Sachs (2017), Bastani et al. (2020) and Ho and Pavoni (2020), where the last two papers explicitly focus on the case of child care expenditures. A common feature of these papers is that they consider a setting where all agents are, according to our terminology, "users".

16 We are implicitly assuming that job-related expenses are not observable by the government at the individual level so that a nonlinear subsidy scheme is not an option. Lack of public observability of personal purchases is an assumption that is often made in the optimal tax literature (see, e.g., Anderberg and Balestrino 2000; Cremer et al. 2001; Blomquist et al. 2010; Jacobs and Boadway 2014; Casarico et al. 2015). In our setting it appears a realistic case to consider since individuals often have the possibility to misreport their true work-related expenses to the tax authority. For purchases of work-related goods, as opposed to work-related services, the possibility of reselling by agents exacerbates the problem of observing consumption at the individual level.

Fig. 5 A disconnected second-best Pareto frontier
The first thing to notice is that the subsidy has a flattening effect on the indifference curves for users in the (Y, B)-space. For a given (positive) value of s and a given bundle in the (Y, B)-space, we have that $MRS^u_{YB} = [(1-s)q + Y/w^u]/w^u$. Thus, the threshold value for Y, separating the bundles where $MRS^u_{YB} > MRS^n_{YB}$ from those where $MRS^u_{YB} < MRS^n_{YB}$, falls from Ω, as defined in (4), to $(1-s)\Omega$. Hence, the SC property is restored if s ≥ 1. 17 Most importantly, notice that in our setting a subsidy on job-related expenses represents a very effective instrument to redistribute towards users. This is because non-users derive no benefit from the subsidy. Therefore, channeling at least part of the resources transferred to users through a subsidy on job-related expenses makes it less attractive for non-users to behave as mimickers. One can then expect that, by supplementing an optimal nonlinear income tax with an optimally chosen s, the first-best PF and the second-best PF will coincide over a larger set of values for V n .
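The flattening effect of the subsidy can be checked numerically. The following is a minimal sketch, assuming the quadratic-disutility preferences used throughout (U = c − h²/2), a linear cost q per hour for users, and w u > w n. The function names and parameter values are illustrative assumptions rather than anything taken from the paper, and the closed form in `crossing_threshold` is derived from these assumptions rather than copied from (4).

```python
def mrs_users(y, w_u, q, s=0.0):
    """MRS of users in (Y, B)-space when work-related expenses are subsidized at rate s."""
    return ((1.0 - s) * q + y / w_u) / w_u


def mrs_non_users(y, w_n):
    """MRS of non-users: with U = B - (Y / w_n)^2 / 2 it equals Y / w_n^2."""
    return y / w_n ** 2


def crossing_threshold(w_u, w_n, q, s=0.0):
    """Income at which the two MRS coincide (derived under the assumption w_u > w_n)."""
    return (1.0 - s) * q * w_u * w_n ** 2 / (w_u ** 2 - w_n ** 2)


# Illustrative (assumed) parameter values.
W_U, W_N, Q = 2.0, 1.0, 0.5
omega = crossing_threshold(W_U, W_N, Q)            # no subsidy
omega_s = crossing_threshold(W_U, W_N, Q, s=0.4)   # subsidy rate s = 0.4
```

Under these assumptions the threshold shrinks in proportion to 1 − s and reaches zero at s = 1, which is the sense in which single-crossing is restored.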
In particular, since we know from the analysis in Sect. 3 that an optimal nonlinear income tax is sufficient to implement a first-best optimum (i.e., a point on the firstbest PF) when , one can expect that using s as an additional policy tool allows implementing a first-best optimum also for a range of values for V n that are strictly lower than U n LF − 2 As shown in Proposition 4 below, which looks at the solution to the government's problem for values of V and that the government is optimizing a nonlinear income tax and a proportional subsidy on work-related expenses. Moreover, let V ≡ U n LF − 2 18 The reason for restricting attention to cases where V n ≥ −U n LF is that it allows us to neglect the possibility that the labor supply of non-users is distorted at a first-best optimum due to the non-negativity constraint on private consumption. See the discussion in Sect. 2.2. 19 As explained in the beginning of Sect. 4, due to the fact that a "special" tax treatment for work-related expenses usually means that these kind of expenses are subject to a more lenient tax treatment, in our analysis we restrict attention to the case when work-related expenses are subsidized. However, one can show that a positive tax on work-related expenses ( s < 0 ) can be used as an instrument that makes it less attractive for users to behave as mimickers. Thus, supplementing a nonlinear income tax with a tax on work-related expenses would allow to shift outwards the PF when V n > U n LF .

(i) Suppose that q ≤ q (i.e., Y u LF ≥ Y n LF ); then, the second-best PF will coincide with the first-best PF.
(ii) Suppose that q > q (i.e., Y u LF < Y n LF ). For V n ≥ V , the second-best PF coincides with the first-best PF. For V n < V , instead, both self-selection constraints will be binding and any point on the second-best PF corresponds to an allocation at which both types of agents face a distortion on their labor supply.
Proof See Appendix D. ◻

According to Proposition 4, there is a crucial difference between cases where q ≤ q and cases where q > q . In the first scenario, using s as an additional policy instrument always allows implementing a first-best optimum. Instead, when q > q , a first-best optimum can only be implemented as long as the utility of non-users does not fall below a given threshold value V . Below, we discuss in two separate subsections the results provided by Proposition 4.

Part (i)
Consider an initial equilibrium where an optimal nonlinear income tax is used in isolation ( s = 0 ) and users are offered a distorted bundle to prevent mimicking by non-users. The transfer received by each user is equal to B u − Y u at the initial equilibrium. Introducing a small subsidy on job-related expenses ( ds > 0 ), while at the same time adjusting B u downwards by dB u = −(qY u ∕w u )ds , would leave unchanged the net transfer received by each user. 20 Such a reform, however, would make mimicking less attractive for non-users. 21 Therefore, by relaxing the incentive-compatibility constraint for non-users, the reform would pave the way for the possibility to offer users a bundle where their labor supply is less distorted and their utility is higher. When Y u LF ≥ Y n LF , one can replicate the kind of reform described above (which hinges on raising s, lowering B u and moving Y u closer to its undistorted level) until a first-best optimum is achieved where no agent's labor supply is distorted. This is because one can set s with the sole purpose of deterring mimicking by non-users, safely disregarding the other self-selection constraint, i.e. the one requiring users not to behave as mimickers. The intuition is provided in Fig. 6 below.
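The logic of this reform can be verified with a small numerical sketch. It assumes the quadratic-disutility preferences of the model and uses purely illustrative parameter values and hypothetical function names. It checks that raising s by ds while cutting B u by (qY u /w u )ds leaves the users' utility unchanged at a given Y u , while lowering the utility that a non-user would obtain by mimicking.

```python
def utility_user(y, b, w_u, q, s):
    """User utility at bundle (y, b): after-tax income minus unsubsidized expenses and effort cost."""
    return b - (1.0 - s) * q * y / w_u - (y / w_u) ** 2 / 2.0


def utility_mimicker(y, b, w_n):
    """Utility of a non-user who picks the users' bundle; non-users get nothing from s."""
    return b - (y / w_n) ** 2 / 2.0


def reform(y_u, b_u, s, ds, w_u, q):
    """Raise s by ds and lower B_u by (q * Y_u / w_u) * ds, holding Y_u fixed."""
    return y_u, b_u - (q * y_u / w_u) * ds, s + ds


# Illustrative (assumed) values: an initial users' bundle with s = 0.
W_U, W_N, Q = 2.0, 1.0, 0.5
y0, b0, s0 = 1.0, 2.0, 0.0
y1, b1, s1 = reform(y0, b0, s0, ds=0.1, w_u=W_U, q=Q)

change_user = utility_user(y1, b1, W_U, Q, s1) - utility_user(y0, b0, W_U, Q, s0)
change_mimicker = utility_mimicker(y1, b1, W_N) - utility_mimicker(y0, b0, W_N)
```

Here `change_user` is zero up to rounding, whereas `change_mimicker` is negative, so the incentive constraint of non-users is relaxed.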
In Fig. 6, the bundle labelled I represents the undistorted bundle offered to nonusers and lying on the red indifference curve where U n = V n < U n LF . The blue 45 • line represents the virtual budget line on which a bundle for users can be offered, given the revenue extracted from non-users, when a nonlinear income tax is used in isolation ( s = 0 ). Incentive compatibility prevents the government from offering users the (firstbest) undistorted bundle labelled IV. Instead, users will be offered the incentive-compatible bundle labelled III. Keeping fixed V n and supplementing a nonlinear income tax with a subsidy on job-related expenses implies that users can be offered a bundle on a virtual budget line that is flatter than the one prevailing when s = 0 . In particular, while its intercept does not change, 22 its slope drops from 1 to 1 − sq∕w u . The dashed blue line represents the virtual budget line generated by supplementing a nonlinear income tax with a subsidy which is just large enough to allow the government to offer an undistorted bundle to users (bundle labelled II) without inducing mimicking by non-users. Notice that the vertical distance between bundle IV and bundle II is equal to sqY u LF ∕w u . Taking into account that, at bundle IV, the subsidy was set equal to zero, whereas at bundle II users save an amount sqY u LF ∕w u on job-related expenses, users get the same net consumption at both bundles, and therefore enjoy the same utility (since labor supply is the same). It is also obvious from the figure that users, whose indifference curve is depicted in black, have no incentive to behave as mimickers since they strictly prefer bundle II to bundle I. The reason is easy to grasp. 
At bundle II their indifference curve is tangent to the virtual budget line generated by supplementing the income tax with a subsidy on job-related expenses. Thus, along the black indifference curve, at all bundles to the left of II we have that $MRS^u_{YB} < 1 - sq/w^u < 1$. Instead, along the red indifference curve (for non-users), at all bundles between I and II we have that $MRS^n_{YB} > 1$. Therefore, the fact that the two indifference curves cross at bundle II necessarily implies that bundle II is strictly preferred by users to bundle I.

22 The intercept is always equal to U n LF − V n (1 − )∕ , which represents the per-user transfer that can be financed when the utility of non-users is set at V n < U n LF .

Fig. 6 Subsidizing work-related expenses when q < q

Part (ii)
Things are instead different when Y u LF < Y n LF . In this case, setting s large enough to deter mimicking by non-users might imply that users have an incentive to mimic non-users. The intuition why this other self-selection constraint cannot always be disregarded is provided in Fig. 7 below.
In Fig. 7 the bundle labelled I represents the undistorted bundle offered to nonusers. The blue 45 • line represents the virtual budget line on which a bundle for users can be offered, given the revenue extracted from non-users, when a nonlinear income tax is used in isolation. Incentive compatibility prevents the government from offering users the undistorted bundle labelled IV; instead users will be offered the incentive-compatible bundle labelled III. The dashed blue line is the virtual budget line generated by supplementing a nonlinear income tax with a subsidy which is just large enough to allow the government to offer an undistorted bundle to users (bundle labelled II) without inducing mimicking by non-users. As was the case in Fig. 6, users get the same net consumption at both bundle IV (without the subsidy) and bundle II (with the subsidy), and therefore enjoy the same utility at both bundles. The figure shows that users, whose indifference curve is depicted in black, are indifferent between choosing the bundle II, intended for them by the government, and choosing the bundle intended for non-users. 23 The case represented in Fig. 7 shows a situation where both self-selection constraints are binding but the government is still able to implement a first-best optimum. 24 This happens when Y u LF < Y n LF and V n =V . Further lowering V n would no 23 Notice that this can only happen when job-related expenses are subsidized. With s = 0 and Y u LF < Y n LF , at any given bundle to the left of Y n LF , users have an indifference curve that is steeper than the one pertaining to non-users. When the two indifference curves cross the second time, it will happen at a bundle where Y > Y n LF . Therefore, with s = 0 , if non-users were indifferent between bundle I and bundle II, users would strictly prefer bundle II. 24 Notice that when a nonlinear income tax is used in isolation, as assumed in Sect. 
3, the solution to the government's problem can never be a separating equilibrium where both self-selection constraints are binding. To see the reason for this, suppose to start from a separating equilibrium where both self-selection constraints are binding. With a binding public budget constraint, one (Y, B)-bundle will be associated with a positive tax payment and another one with a negative tax payment. Then the government could improve upon the initial set of bundles by implementing a pooling allocation where all agents are offered the bundle to which is associated a positive tax payment (the utility of all agents would be unaffected and the government would run a positive surplus). But this cannot be an optimum either, since the government's budget constraint would be slack. Consider instead the case when the income tax is supplemented by a subsidy on job-related expenses. In Fig. 7, the government budget constraint would be violated if both groups were to choose bundle II; it would also be violated if both groups were to choose bundle I (since the dashed blue line, on which a bundle for users can be offered without violating publicbudget balance, lies below bundle I).
longer allow the government to implement a first-best optimum. A higher subsidy would be needed to still offer an undistorted bundle to users without inducing non-users to mimic. But a higher subsidy would induce users to mimic non-users. Thus, lowering V n below V will induce the government to raise s, but not as much as would be needed to offer users an undistorted bundle. The optimal s will then represent a trade-off between the desirable effect of deterring mimicking by non-users and the undesirable effect of making it more tempting for users to mimic non-users. At the resulting second-best optimum both self-selection constraints are binding and both types face a distortion on their labor supply. 25 For V n lower than but sufficiently close to V , the second-best optimum will be a separating equilibrium where each group is offered a distinct (Y, B)-bundle and the labor supply of both types is downward distorted ( Y u SB < Y u LF , Y n SB < Y n LF and Y u SB < Y n SB ). As one keeps lowering V n , the distortions needed to implement a separating equilibrium become larger and larger, and one finally reaches a value for V n below which it is no longer possible to further increase the users' utility.
However, notice that when s is an additional policy instrument, the redistributive goals of the government do not necessarily require the implementation of a separating equilibrium, i.e., an equilibrium where each group is offered a distinct (Y, B)-bundle. Given that only users benefit from the subsidy s, redistribution can also be achieved by implementing a pooling equilibrium where both groups are offered the same (Y, B)-bundle (but have, nonetheless, different consumption). In particular, at a pooling equilibrium the government would solve the following optimization problem: subject to Substitute B = V n + (Y∕w n ) 2 ∕2 and sqY∕w u = (Y − B)∕ , from constraint (8) and (9), respectively, into the objective function. The constrained optimization problem above can then be rewritten in an unconstrained way as From the first order condition of the problem above, denoting by Y p the optimal value of Y, one gets: Moreover, when w u (w u − q) < (w n ) 2 (i.e., Y u LF < Y n LF ), it is straightforward to show that From (12) we can conclude that, at a pooling equilibrium, the labor supply of users is upward distorted and the labor supply of non-users is downward distorted. Moreover, from (11) we can also see that, since Y p does not depend on V n , the magnitude of these distortions does not depend on the specific value of V n . Substituting (11) into the objective function of () we get that, at a pooling equilibrium, the users' utility is given by which implies that U u ∕ V n = −(1 − )∕ , i.e. the same slope that characterizes the first-best PF.

25 As shown in Appendix E, when Y u LF < Y n LF such a second-best equilibrium will be the necessary outcome under a max-min planner.

Fig. 7 Subsidizing work-related expenses when q > q
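The steps of the pooling problem can be written out as follows. This is a sketch derived from the preferences U = c − h²/2 and from the two substitutions mentioned in the text, with both constraints taken as binding. The symbol $\pi$ for the population share of users is notation assumed here for illustration.

```latex
\begin{align*}
&\max_{Y,\,B,\,s}\; B-(1-s)\frac{qY}{w^{u}}-\frac{1}{2}\Big(\frac{Y}{w^{u}}\Big)^{2}
\quad\text{s.t.}\quad
B-\frac{1}{2}\Big(\frac{Y}{w^{n}}\Big)^{2}=V^{n},
\qquad
Y-B=\pi\,s\,\frac{qY}{w^{u}} .
\\[4pt]
&\text{Substituting both constraints into the objective:}\\
&U^{u}(Y)=\frac{Y}{\pi}-\frac{qY}{w^{u}}-\frac{Y^{2}}{2(w^{u})^{2}}
-\frac{1-\pi}{\pi}\Big[V^{n}+\frac{Y^{2}}{2(w^{n})^{2}}\Big].
\\[4pt]
&\text{First-order condition:}\quad
\frac{1}{\pi}-\frac{q}{w^{u}}-\frac{Y^{p}}{(w^{u})^{2}}
-\frac{1-\pi}{\pi}\,\frac{Y^{p}}{(w^{n})^{2}}=0
\;\Longrightarrow\;
Y^{p}=\frac{(w^{u}-\pi q)\,w^{u}\,(w^{n})^{2}}{(1-\pi)(w^{u})^{2}+\pi (w^{n})^{2}} .
\\[4pt]
&\text{Equivalently:}\quad
(1-\pi)\Big[\frac{Y^{p}}{(w^{n})^{2}}-1\Big]
+\pi\Big[\frac{q+Y^{p}/w^{u}}{w^{u}}-1\Big]=0 ,
\qquad
\frac{\partial U^{u}}{\partial V^{n}}=-\frac{1-\pi}{\pi}.
\end{align*}
```

The last line shows why, under these assumptions, $Y^{p}$ lies between $w^{u}(w^{u}-q)$ and $(w^{n})^{2}$: the two bracketed terms, which are the deviations of each type's no-subsidy MRS from 1, must have opposite signs.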
Clearly, for � V ≤ V n < U n LF , a pooling equilibrium will never be chosen by the government. The reason is that, for � V ≤ V n < U n LF the government can implement a separating equilibrium which allows attaining a point on the first-best PF. Under a pooling equilibrium, instead, it is never possible to reach a point on the first-best PF (given that the labor supply of both groups of agents is distorted). For values of V n that are smaller than but sufficiently close to V , a separating equilibrium will again dominate a pooling equilibrium; even though both equilibria entail a distortion on the labor supply of both groups and a point on the first-best PF can no longer be attained, the distortions are less severe under a separating equilibrium. However, for sufficiently low values of V n , a pooling equilibrium will dominate a separating equilibrium. The reason is that the distortions needed to implement a separating equilibrium become larger and larger as one keeps lowering V n ; under a pooling equilibrium, instead, the magnitude of the distortions does not depend on the specific value of V n . The possibility of both types of second-best equilibria (separating and pooling), depending on the chosen value for V n , is illustrated by means of a numerical example in Appendix F. 26 The example also illustrates the fact that the second-best PF can be disconnected even when the nonlinear income tax is supplemented by an optimal subsidy on job-related expenses.
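The properties of the pooling allocation discussed above can be illustrated numerically. The sketch below assumes the quadratic-disutility preferences of the model and a binding budget constraint, so that the per-user subsidy saving sqY/w u equals (Y − B) divided by the users' population share. The closed form for the pooling income is derived under these assumptions, and the parameter values and names (including `share_users`) are illustrative, not taken from the paper.

```python
def pooling_income(w_u, w_n, q, share_users):
    """Income level that maximizes the users' utility at a pooling allocation."""
    num = (w_u - share_users * q) * w_u * w_n ** 2
    den = (1.0 - share_users) * w_u ** 2 + share_users * w_n ** 2
    return num / den


def pooling_user_utility(y, v_n, w_u, w_n, q, share_users):
    """Users' utility when both types get (y, b) and non-users are held at utility v_n."""
    b = v_n + (y / w_n) ** 2 / 2.0            # keeps non-users at utility v_n
    subsidy_saving = (y - b) / share_users    # equals s * q * y / w_u under budget balance
    return b - q * y / w_u + subsidy_saving - (y / w_u) ** 2 / 2.0


# Illustrative (assumed) values with w_u * (w_u - q) < w_n ** 2.
W_U, W_N, Q, SHARE = 2.0, 1.5, 1.0, 0.5
y_p = pooling_income(W_U, W_N, Q, SHARE)

u_low = pooling_user_utility(y_p, 0.5, W_U, W_N, Q, SHARE)
u_high = pooling_user_utility(y_p, 0.6, W_U, W_N, Q, SHARE)
slope = (u_high - u_low) / 0.1   # change in users' utility per unit of V_n
```

With these values the pooling income lies strictly between the two laissez-faire incomes, and `slope` equals −(1 − share)/share, in line with the statement that the pooling branch has the same slope as the first-best frontier.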

Pareto efficient taxation when job-related expenses are a nonlinear function of hours of work
In Sect. 3 we have emphasized three main anomalies descending from the violation of SC: (i) an anonymous nonlinear income tax may allow the government to convert a pooling laissez-faire equilibrium into a separating equilibrium; (ii) the second-best PF may be disconnected; (iii) a second-best optimum may not preserve the income ranking prevailing under laissez-faire. As we show in a background version of this paper (see Bastani et al. 2019), similar qualitative results generalize, with some nuances, to a setting where the function (h) (describing the work-related monetary costs) is convex or concave. However, when (h) is concave, one additional anomaly may arise. In particular, when redistribution goes from non-users to users, it is possible that a second-best optimum entails a distortion on the labor supply of users even when no self-selection constraint is (locally) binding in equilibrium. The reason is that, when (h) is sufficiently concave, it is no longer the case that MRS u YB is monotonically increasing in Y. 27 To see this, notice that, for individual preferences given by U = c − h 2 ∕2 and a general nonlinear function (h) , MRS u YB is given by MRS u YB = � (Y∕w u ) + Y∕w u ∕w u . Assume that (h) is an increasing and concave function which also satisfies the conditions � (0) > w u , �� (0) < −1 ,

Pareto efficient income taxation without single-crossing and ��� (h) > 0 . Then, while the value of MRS u YB is always positive for Y ≥ 0 , it is larger than 1 and decreasing in Y for sufficiently small values of Y. The fact that MRS u YB > 1 for sufficiently low values of Y implies that, when incentive-compatibility considerations require that Y u must be very small (to prevent mimicking by nonusers), it may be optimal to offer users a bundle where Y u = 0 even though it would be incentive-compatible to let them increase to some extent their labor supply (and enjoy a slightly larger value of consumption). This possibility is illustrated in Fig. 8 below and a numerical example is provided in Appendix F.
In Fig. 8, the point I represents the bundle selected by non-users under laissez-faire. Bundle II represents the undistorted bundle offered to non-users lying on the indifference curve where U n = V n < U n LF . The blue 45° line represents the virtual budget line on which a bundle for users can be offered given the revenue extracted from non-users. Incentive compatibility requires that users can only be offered bundles to the left of bundle V and to the right of bundle VI, with both V and VI belonging to the set of admissible bundles. The three black curves passing through bundles V, IV and III are three different indifference curves pertaining to users.
From the figure, one can see that bundle IV is strictly preferred by users to both bundle V and bundle VI. But if users are offered bundle IV, the self-selection constraint requiring non-users not to mimic users is slack. Notice also that users would be better off if they could get bundle III on the blue virtual budget line, i.e. the bundle at which their labor supply is undistorted. However, offering them this bundle would induce mimicking by non-users. Therefore, at a second-best optimum users are offered bundle IV and non-users are offered bundle II; the labor supply of users is downward distorted even though no self-selection constraint is binding at the second-best optimum. 28 , 29

Fig. 8 Distortions without binding self-selection constraints
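The non-monotonicity of the users' MRS under a concave cost function can be illustrated with a simple example. The sketch below assumes a hypothetical exponential cost function, cost(h) = (a/b)(1 − exp(−bh)), chosen only because it is increasing and concave, has a positive third derivative, and can be made to satisfy the two conditions at h = 0. Neither this functional form nor the parameter values come from the paper.

```python
import math


def mrs_users_concave(y, w_u, a, b):
    """Users' MRS in (Y, B)-space when the marginal work-related cost is a * exp(-b * h)."""
    h = y / w_u
    return (a * math.exp(-b * h) + h) / w_u


# Assumed values: marginal cost at h = 0 is a = 1.2 > w_u, and its slope there is -a * b = -3.6 < -1.
W_U, A, B_RATE = 1.0, 1.2, 3.0

# Income at which the MRS stops decreasing (where a * b * exp(-b * h) = 1).
y_turn = W_U * math.log(A * B_RATE) / B_RATE

mrs_at_zero = mrs_users_concave(0.0, W_U, A, B_RATE)
mrs_at_turn = mrs_users_concave(y_turn, W_U, A, B_RATE)
```

In this example the MRS starts above 1 at Y = 0, falls below 1 as Y rises, and increases again beyond `y_turn`, so the users' indifference curves are not convex everywhere and a corner at Y u = 0 can be locally preferred to nearby interior bundles.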

Concluding remarks
In this paper, we have considered a two-type optimal nonlinear income tax model where agents differ both in terms of market ability and in terms of "needs" for a work-related good/service, i.e. a good/service that some agents need to purchase in order to work. Because of this bi-dimensional heterogeneity, the single-crossing condition fails to hold. Ruling out public observability of individual types, we have characterized the properties of a second-best optimum by looking at the entire second-best Pareto frontier.
We have highlighted that, due to the violation of single-crossing, some non-standard results arise. First of all, a second-best optimum might not preserve the earned-income ranking that prevails under laissez-faire. Second, redistribution via income taxation might be feasible even when the laissez-faire equilibrium is a pooling equilibrium. Third, a second-best optimum might not be unique, in the sense that there might be more than one set of allocations in the (pre-tax income, after-tax income)-space that solve the government's maximization problem. Fourth, the second-best Pareto frontier may be disconnected. Fifth, supplementing an optimal nonlinear income tax with an optimal subsidy on work-related expenses may imply that redistribution is achieved through a separating or pooling equilibrium where both self-selection constraints are binding. Sixth, we have shown that the labor supply of some agents might be distorted even though no self-selection constraint is (locally) binding in equilibrium.
Before concluding, a final remark is in order. For tractability reasons, we have focused our analysis on a simplified two-type model where skills and needs are perfectly correlated. However, insofar as our non-standard results hinge on the violation of the single-crossing condition, they generalize, with some nuances, to settings with a larger number of types and imperfect correlation between skills and needs. 28 Nonetheless, the reason why users are offered a distorted bundle is ultimately due to the need to prevent mimicking from non-users and ensure proper self-selection by agents. 29 It should be noticed that the labor supply of users is downward distorted even though at bundle IV the users' MRS is larger than 1, i.e. it satisfies the standard definition of upward distortion. This happens because the standard definition of downward and upward distortion is only valid insofar as an individual's indifference curves are everywhere convex in the (Y, B)-space. To clarify this point, suppose that the indifference curves are everywhere convex and that an individual is located at a bundle A where his MRS is larger (resp.: smaller) than 1. Then, the conclusion that the labor supply of this agent is upward (resp.: downward) distorted is based on the observation that, if the individual could freely choose any bundle along a 45 • line going through bundle A, he would choose a bundle to the left (resp.: right) of bundle A. However, if the indifference curves are not everywhere convex, the fact that MRS > (<)1 at bundle A does not imply that, if the agent were free to choose any bundle along a 45 • line going through bundle A, he/she would necessarily choose a bundle to the left (resp.: right) of bundle A.

Acknowledgements Open access funding provided by Linnaeus University.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix A
Proof of Proposition 1 Assume first that V n < U n LF , so that users are offered a (Y, B)bundle such that Y u − B u < 0 and non-users a (Y, B)-bundle such that Y n − B n > 0 . With income tax revenue collected from each non-user being equal to Y n − B n , the revenue that can be transferred to each user is equal to (Y n − B n )(1 − )∕ . With non-users being offered a bundle on their indifference curve with associated utility value V n , the maximum revenue that the government can collect from them is obtained at the bundle where their labor supply is undistorted, implying a zero implicit marginal income tax rate for non-users. 30 Thus, independently on the value of V n , we will have that Y n = (w n ) 2 . With V n < U n LF and Y n = (w n ) 2 , the government collects from each non-user an amount of revenue equal to Y n − B n = (w n ) This implies that the revenue that can be transferred to each user is equal to (1 − ) (1∕2)(w n ) 2 − V n ∕ , which in turn implies that users will be offered a bundle on the virtual budget line On this virtual budget line, however, some bundles cannot be offered since they would induce mimicking by non-users. To find the set of incentive-compatible bundles on the virtual budget line (A1), one has to identify the values for Y at which the relevant indifference curve for non-users (i.e. the one associated with utility V n ) intersects the virtual budget line.
Taking into account that the relevant indifference curve for non-users has equation 30 This is true as long as the undistorted bundle on the indifference curve V n does not violate the constraint B n ≥ 0 , i.e. as long as V n ≥ −U n LF . As we discuss in Sect. 2.2, in our characterization of the PF we impose the restriction that the utility of non-users cannot fall below −U n LF (see Sect. 2.2 and in particular condition (5)). by equating (A1) and (A2) one can find two values for Y. These are given by: where the term within square root is positive due to the initial assumption that where the RHS of (A6) is strictly lower than (w n ) 2 ∕2 = U n LF . Suppose instead that inequality (A6) does not hold. Offering users an undistorted bundle along the virtual budget line (A1) would then violate the (A3) , 31 Notice that, for sufficiently low values of V n (in particular, V n < (1 − )(w n ) 2 ∕2 ), the lower root of (A3) is negative; when this happens, the set of incentive-compatible bundles on the virtual budget line (A1) is given by those bundles where Y is greater or equal to the larger root.
Pareto efficient income taxation without single-crossing incentive-compatibility constraint for non-users. This implies that users will either be offered the bundle Y A , B A where and their labor supply is distorted downwards ( T ′ Y u SB > 0 ), or the bundle Y B , B B where and their labor supply is distorted upwards ( T ′ Y u SB < 0). For later purposes, notice that from (A7), since Y A cannot take negative values, U n can never fall below (1 − )(w n ) 2 ∕2 when users are offered the bundle ( Y A , B A ).
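The intersection points used in this part of the proof can be computed directly. The sketch below is derived from the model's preferences and from the description of the virtual budget line in the text; it is not a transcription of the numbered equations. It assumes V n < U n LF , and the name `share_users` for the users' population share, like the parameter values, is an illustrative assumption.

```python
import math


def virtual_budget_line(y, v_n, w_n, share_users):
    """After-tax income available to users at income y, given the revenue raised from non-users."""
    transfer = (1.0 - share_users) * (w_n ** 2 / 2.0 - v_n) / share_users
    return y + transfer


def non_user_indifference_curve(y, v_n, w_n):
    """After-tax income that keeps a non-user at utility v_n when earning y."""
    return v_n + (y / w_n) ** 2 / 2.0


def ic_roots_non_users(v_n, w_n, share_users):
    """Incomes at which the non-users' indifference curve crosses the users' virtual budget line."""
    u_n_lf = w_n ** 2 / 2.0
    if v_n >= u_n_lf:
        raise ValueError("this sketch only covers v_n below the laissez-faire utility of non-users")
    spread = w_n * math.sqrt(2.0 * (u_n_lf - v_n) / share_users)
    return w_n ** 2 - spread, w_n ** 2 + spread


def user_utility(y, b, w_u, q):
    """User utility at bundle (y, b) without any subsidy."""
    return b - q * y / w_u - (y / w_u) ** 2 / 2.0


# Illustrative (assumed) values; v_n is set to (1 - share) * U_n_LF.
W_N, SHARE = 1.0, 0.5
v_n = (1.0 - SHARE) * W_N ** 2 / 2.0
y_a, y_b = ic_roots_non_users(v_n, W_N, SHARE)
```

Bundles on the virtual budget line with income at or below `y_a`, or at or above `y_b`, do not attract non-users. At the chosen `v_n` the lower root is zero, which matches the lower bound on the non-users' utility noted above for bundle A; `user_utility` can then be evaluated at either root to compare the two candidate bundles.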
Evaluating the utility of users at the bundle characterized by (A7)-(A8), we have: whereas the utility of users at the bundle characterized by (A9)-(A10) is Before comparing the utility of users at Y A , B A and Y B , B B , notice that a necessary condition for Y A , B A to be part of the second-best PF is that U u Y A , B A ∕ V n < 0 (and similarly, a necessary condition for Y B , B B to be part of the second-best PF is that . This is given by: when Under our assumption that < 1 − (w n ) 2 ∕(w u ) 2 , it follows that (A14) is satisfied as long as where the RHS of (A15) is larger than (1 − )(w n ) 2 ∕2 for < q∕w u . Noticing that q < q ⟹ 1 − (w n ) 2 (w u ) 2 > q w u , we can conclude that, with q < q , offering users the bundle Y A , B A can never be optimal when ≥ q w u . Consider now U u Y B , B B ∕ V n . This is given by: and our assumption that < 1 − (w n ) 2 ∕(w u ) 2 implies that (A17) is always satisfied. Let's now compare U u Y A , B A and U u Y B , B B as given by (A11)-(A12). Simple algebra can be used to show that Therefore, we can conclude that U u Y B , B B > U u Y A , B A for q < q . This shows that, when q < q and (A6) is violated, a second-best optimum will necessarily entail an upward distortion on the labor supply of users ( T ′ Y u SB < 0). Assume now that V n > U n LF . This implies that the optimal bundles offered by the government will entail Y n − B n < 0 and Y u − B u > 0 . With revenue collected from each user being equal to Y u − B u , the revenue that can be transferred to each nonuser is equal to (Y u − B u ) ∕(1 − ) . With users being offered a bundle on their indifference curve with associated utility value U u SB , the maximum revenue that the government can collect from them is obtained at the bundle where their labor supply is undistorted (implying a zero implicit marginal income tax rate for users). In our setting this implies that, independently on the value of U u SB , we will have that Y u = (w u − q)w u . 
$^{32}$ Thus, when the utility obtained by users at a second-best optimum is $U^u_{SB} < U^u_{LF}$ and their labor supply is left undistorted, the government collects from each user an amount of revenue equal to $\frac{1}{2}(w^u-q)^2 - U^u_{SB}$. This implies that the revenue that can be transferred to each non-user is equal to $\pi\left[\frac{1}{2}(w^u-q)^2 - U^u_{SB}\right]/(1-\pi)$, which in turn implies that non-users will be offered a bundle on the virtual budget line

$$B = Y + \frac{\pi}{1-\pi}\left[\frac{1}{2}(w^u-q)^2 - U^u_{SB}\right]. \tag{A19}$$

On this virtual budget line, however, some bundles cannot be offered since they would induce mimicking by users. To find the set of incentive-compatible bundles, one has to identify the two values for $Y$ at which the relevant indifference curve for users (i.e. the one associated with utility $U^u_{SB}$) intersects the virtual budget line. Taking into account that the relevant indifference curve for users has equation (A20), by equating (A19) and (A20) we can find the two relevant values for $Y$; in the resulting expressions, the term within the square root is positive due to our assumption that $U^u_{SB} < U^u_{LF} = (w^u-q)^2/2$. On the virtual budget line (A19), the incentive-compatible bundles (which do not induce users to behave as mimickers) are those satisfying either of two conditions: $Y$ is no larger than the smaller of these two values, or $Y$ is no smaller than the larger one. If incentive-compatibility considerations were not an issue, non-users could be offered on the virtual budget line (A19) the undistorted bundle. Thus, if the undistorted bundle satisfies either of these two conditions, the labor supply of non-users can be left undistorted ($T'(Y^n_{SB}) = 0$). Solving (A21) and (A22) for $U^u_{SB}$, one finds that $T'(Y^n_{SB}) = 0$ when (A23) holds,

where the RHS of (A23) is strictly lower than $(w^u-q)^2/2 = U^u_{LF}$. Taking into account that, when non-users are offered an undistorted bundle, their utility is given by (A24), and substituting for $U^u_{SB}$ in (A24) the value provided by the RHS of (A23), one gets the maximum utility that can be enjoyed by non-users without distorting their labor supply.

Suppose now that inequality (A23) does not hold. Offering non-users an undistorted bundle along the virtual budget line (A19) would violate the incentive-compatibility constraint for users. This implies that non-users will be offered either the bundle $(Y^C, B^C)$ characterized by (A25)-(A26), where their labor supply is distorted downwards ($T'(Y^n_{SB}) > 0$), or the bundle $(Y^D, B^D)$ characterized by (A27)-(A28), where their labor supply is distorted upwards ($T'(Y^n_{SB}) < 0$). For later purposes, notice that from (A25), since $Y^C$ cannot take negative values, $U^u$ can never fall below $\pi(w^u-q)^2/2$ when non-users are offered the bundle $(Y^C, B^C)$.
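The intersection argument above can be illustrated numerically. The following is a minimal sketch, not the authors' derivation: it assumes quasi-linear utilities of the form $B - qh - h^2/2$ for users (with $h = Y/w^u$), which is consistent with the laissez-faire values quoted in the text but is not stated in this appendix. All parameter values and function names are hypothetical.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
W_U, W_N = 20.0, 10.0   # wage rates of users and non-users
Q = 5.0                 # hourly cost of the work-related good
SHARE_U = 0.2           # population share of users

U_LF_USER = (W_U - Q) ** 2 / 2   # users' laissez-faire utility


def u_user(Y, B):
    """Assumed quasi-linear utility of a user at the bundle (Y, B)."""
    h = Y / W_U
    return B - Q * h - h ** 2 / 2


def budget_line(Y, u_sb):
    """Virtual budget line faced by non-users when users get utility u_sb."""
    return Y + SHARE_U * (U_LF_USER - u_sb) / (1 - SHARE_U)


def intersections(u_sb):
    """Incomes at which the users' indifference curve crosses the budget line."""
    centre = W_U * (W_U - Q)     # users' undistorted income
    spread = W_U * math.sqrt(2 * (U_LF_USER - u_sb) / (1 - SHARE_U))
    return centre - spread, centre + spread


def undistorted_nonuser_bundle_is_ic(u_sb):
    """True if Y = W_N**2 lies outside the income interval that tempts users."""
    y_low, y_high = intersections(u_sb)
    return W_N ** 2 <= y_low or W_N ** 2 >= y_high


y_low, y_high = intersections(100.0)
print(y_low, y_high, undistorted_nonuser_bundle_is_ic(100.0))
```

Under these assumptions the smaller intersection income reaches zero exactly when the users' utility equals their population share times their laissez-faire utility, which matches the lower bound on $U^u$ noted above.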
Evaluating the utility of non-users at the bundle characterized by (A25)-(A26) gives (A29), whereas their utility at the bundle characterized by (A27)-(A28) is given by (A30). Before comparing the utility of non-users at $(Y^C, B^C)$ and $(Y^D, B^D)$, notice that a necessary condition for $(Y^C, B^C)$ to be part of the second-best PF is that $\partial U^n(Y^C, B^C)/\partial U^u_{SB} < 0$ (and similarly, a necessary condition for $(Y^D, B^D)$ to be part of the second-best PF is that $\partial U^n(Y^D, B^D)/\partial U^u_{SB} < 0$). Consider first $\partial U^n(Y^C, B^C)/\partial U^u_{SB}$. This is given by (A31). Thus, we have that $\partial U^n(Y^C, B^C)/\partial U^u_{SB} < 0$ when (A32) holds. For $q < \bar{q}$, condition (A32) holds for values of $U^u_{SB}$ satisfying (A33), where the RHS of (A33) defines a lower bound for $U^u_{SB}$ along the second-best PF. Substituting for $U^u_{SB}$ into (A29) the value provided by the RHS of (A33) allows deriving an upper bound for $U^n_{SB}$, and therefore for $V^n$, which is obtained after tedious calculations.$^{33}$ It is easy to verify that the RHS of (A33) is larger than the value of $U^u_{SB}$ that implies $Y^C = \Omega$ (where $Y^C$ is defined by (A25) and $\Omega \equiv \frac{q (w^n)^2 w^u}{(w^u)^2-(w^n)^2} = (w^n)^2 q/\bar{q}$ represents the threshold value for $Y$ separating the bundles where $MRS^u_{YB} > MRS^n_{YB}$, i.e. those bundles where $Y < \Omega$, from the bundles where $MRS^u_{YB} < MRS^n_{YB}$, i.e. those bundles where $Y > \Omega$). This shows that it can never be optimal to discourage the labor supply of non-users to the point where $Y^n_{SB} = 0$. Consider now $\partial U^n(Y^D, B^D)/\partial U^u_{SB}$. This is given by (A35), and it is negative when (A36) holds. However, condition (A36) is never satisfied when $q < \bar{q}$. Therefore, when $q < \bar{q}$ and (A23) is violated, a second-best optimum will necessarily entail a downward distortion on the labor supply of non-users ($T'(Y^n_{SB}) > 0$). ◻
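The role of the threshold $\Omega$ can be checked with a few lines of code. This is a sketch under an assumption not stated in this appendix: the marginal rates of substitution are taken to be $q/w^u + Y/(w^u)^2$ for users and $Y/(w^n)^2$ for non-users, as implied by quadratic disutility of labor. Parameter values are hypothetical.

```python
# Hypothetical parameter values, chosen only for illustration.
W_U, W_N, Q = 20.0, 10.0, 5.0


def mrs_user(Y):
    """Assumed slope of a user's indifference curve in (Y, B)-space."""
    return Q / W_U + Y / W_U ** 2


def mrs_nonuser(Y):
    """Assumed slope of a non-user's indifference curve in (Y, B)-space."""
    return Y / W_N ** 2


# Threshold cost at which (W_U - q) * W_U equals W_N ** 2.
q_bar = (W_U ** 2 - W_N ** 2) / W_U

# Income level at which the two indifference curves have the same slope.
omega = Q * W_N ** 2 * W_U / (W_U ** 2 - W_N ** 2)

print(omega, mrs_user(omega), mrs_nonuser(omega))
```

Below `omega` users have the steeper indifference curve, above it non-users do, which is the failure of single-crossing that drives the results.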

Appendix B
Proof of Proposition 2
Consider first the case when the intended direction of redistribution is from non-users to users. When $q = \bar{q}$, so that $(w^u - q)w^u = (w^n)^2$, the RHS of inequality (A6) simplifies to $(w^n)^2/2$, which is the utility achieved by non-users under laissez-faire. This shows that, when $q = \bar{q}$, it is never possible to redistribute from non-users to users without distorting the labor supply of the latter. In order not to violate the incentive-compatibility constraint for non-users, users can either be offered the distorted bundle characterized by (A7)-(A8), where $T'(Y^u_{SB}) > 0$, or the distorted bundle characterized by (A9)-(A10), where $T'(Y^u_{SB}) < 0$. But from (A18) we can see that, when $(w^n)^2 = w^u(w^u - q)$, users are indifferent between the two bundles. Thus, as long as users prefer these bundles to their laissez-faire bundle, there will be two equivalent second-best optima.

$^{33}$ Details of the calculations are available upon request.
For users to be better off at their laissez-faire bundle than at either bundle (A7)-(A8) or (A9)-(A10), their laissez-faire utility $U^u_{LF}$ must exceed the utility they obtain at those bundles, a requirement which, after simplifying and collecting terms, can be restated as (B1). When $q = \bar{q}$, our assumption that $\pi < 1-(w^n)^2/(w^u)^2$ can be equivalently restated as $\pi < q/w^u$. This implies that $\pi w^u - q < 0$. Then, (B1) holds when $(w^u - q)w^u/2 < V^n$. But since $q = \bar{q}$ implies $(w^n)^2 = w^u(w^u - q)$, we also have that (B1) holds when $V^n > (w^n)^2/2 = U^n_{LF}$. This means that redistribution from non-users to users is feasible and users will face a non-zero marginal tax rate at a second-best optimum. For $(1-\pi)U^n_{LF} \leq V^n < U^n_{LF}$ there are two equivalent second-best optima, one where $T'(Y^u_{SB}) > 0$ and one where $T'(Y^u_{SB}) < 0$. For $V^n < (1-\pi)U^n_{LF}$, since the bundle characterized by (A7)-(A8) becomes inadmissible (it would require $Y^u < 0$), the second-best optimum is unique: users are offered the bundle characterized by (A9)-(A10) and they face a negative marginal tax rate.
Consider now the case when the intended direction of redistribution is from users to non-users. In this case the RHS of inequality (A23) simplifies to $(w^u - q)^2/2$, which is the utility achieved by users under laissez-faire. This shows that, when $(w^n)^2 = w^u(w^u - q)$, it is never possible to redistribute from users to non-users without distorting the labor supply of the latter. In order not to violate the incentive-compatibility constraint for users, non-users can either be offered the distorted bundle characterized by (A25)-(A26) or the distorted bundle characterized by (A27)-(A28). With $(w^n)^2 = w^u(w^u - q)$, non-users are indifferent between the two bundles. However, from (A31) and (A35) we also have that, when $(w^n)^2 = w^u(w^u - q)$, both $\partial U^n(Y^C, B^C)/\partial U^u_{SB}$ and $\partial U^n(Y^D, B^D)/\partial U^u_{SB}$ are strictly positive, which implies that there is no point on the PF where non-users get a higher utility than under laissez-faire. ◻

Proof of Proposition 3
Consider first the case when $V^n < U^n_{LF}$. For $q > \bar{q}$, (A13) takes a negative sign, i.e. $\partial U^u(Y^A, B^A)/\partial V^n < 0$, when (C1) holds. Under our assumption that $\pi < 1-(w^n)^2/(w^u)^2$, inequality (C1) is always satisfied. Therefore, one can keep raising the utility of users until $V^n$ is pushed down to the value $(1-\pi)(w^n)^2/2$ (implying that $Y^A$, as defined by (A7), reaches its lower bound $Y^A = 0$). Turning to the bundle $(Y^B, B^B)$, $\partial U^u(Y^B, B^B)/\partial V^n < 0$ requires (C2). Under our assumption that $\pi < 1-(w^n)^2/(w^u)^2$, (C2) is satisfied as long as (C3) holds, where the RHS of (C3) is smaller than [...] and larger than or equal to [...]. Notice in particular that, when $\pi$ is lower than but sufficiently close to $1-(w^n)^2/(w^u)^2$, the RHS of () defines a value that is smaller than $-(w^n)^2/2$, i.e. it violates our constraint (5).
From (A18) we can also see that, for $q > \bar{q}$ (i.e., $(w^u - q)w^u < (w^n)^2$), $U^u(Y^A, B^A) > U^u(Y^B, B^B)$. Thus, when $q > \bar{q}$ and (A6) is violated, a second-best optimum will entail a downward distortion on the labor supply of users ($T'(Y^u_{SB}) > 0$) as long as $V^n \geq (1-\pi)(w^n)^2/2$. As we have noticed above, when $V^n = (1-\pi)(w^n)^2/2$ and users are offered the corresponding $(Y^A, B^A)$-bundle, their utility is $U^u(Y^A, B^A) = U^n_{SB} = (1-\pi)(w^n)^2/2$. Moreover, when $V^n = (1-\pi)(w^n)^2/2$, $Y^A$ is pushed to its lower bound, i.e. $Y^A = 0$, which means that one has reached the limit of redistribution that can be accomplished by distorting the users' labor supply downwards. Therefore, the second-best PF can include points where $V^n < (1-\pi)(w^n)^2/2$ if and only if, by pushing the utility of non-users below $(1-\pi)(w^n)^2/2$ and offering users the corresponding $(Y^B, B^B)$-bundle (i.e., distorting their labor supply upwards), it is possible to raise the users' utility above $(1-\pi)(w^n)^2/2$. To verify whether this is indeed possible, notice first that, according to (C3), $\partial U^u(Y^B, B^B)/\partial V^n < 0$ requires $V^n$ to be sufficiently small. Taking into account that in our analysis we require the constraint (5) to be satisfied, it then follows that the second-best PF will include points where $V^n < (1-\pi)(w^n)^2/2$ if and only if condition (C4) holds. Evaluating (A11) at $V^n = (1-\pi)(w^n)^2/2$ and (A12) at $V^n = -(w^n)^2/2$, then simplifying, collecting terms and dividing all terms by $(w^n)^2$, condition (C4) can be restated as (C5). Based on (C5), we can then conclude that (C4) is satisfied provided that [...]. Having established under which conditions the second-best PF also contains values of $V^n$ that are lower than $(1-\pi)(w^n)^2/2$, what is left to prove is that, when this happens, the second-best PF is disconnected.
For this purpose, taking into account that $\forall\, V^n \in \left[(1-\pi)\frac{(w^n)^2}{2}, \frac{(w^n)^2}{2}\right)$ we have $U^u(Y^A, B^A) - U^u(Y^B, B^B) > 0$, it is sufficient to show that the following condition holds: [...] Evaluating both (A13) and (A16) at $V^n = (1-\pi)(w^n)^2/2$, the inequality above requires that (C9) holds. Multiplying both sides by $(w^u)^2$, simplifying and collecting terms, one can rewrite (C9) as an inequality which is satisfied given that Proposition 3 refers to the case when $q > \bar{q}$ (which implies $w^u(w^u - q) < (w^n)^2$).
Consider now the case when $V^n > U^n_{LF}$. For $q > \bar{q}$, (A32) is never satisfied, which implies that it is never the case that $\partial U^n(Y^C, B^C)/\partial U^u_{SB} < 0$. Regarding the sign of $\partial U^n(Y^D, B^D)/\partial U^u_{SB}$, we have instead that (A36) holds, and therefore $\partial U^n(Y^D, B^D)/\partial U^u_{SB} < 0$, for values of $U^u_{SB}$ that satisfy (A33). Thus, when $q > \bar{q}$ and (A23) is violated, a second-best optimum will necessarily entail an upward distortion on the labor supply of non-users ($T'(Y^n_{SB}) < 0$). Regarding the maximum value that can be achieved by the non-users' utility along the second-best PF, it will depend on whether or not the lower bound for $U^u_{SB}$, as provided by (A33), defines a value that is larger than the lower bound that we have assumed in (6). If the RHS of (A33) is larger than $-(w^u - q)^2/2$, i.e. if [...], the maximum utility that can be achieved by non-users along the second-best PF is found by evaluating (A30) at the value for $U^u_{SB}$ provided by the RHS of (A33). In this case, after tedious calculations, one obtains that V n max = (w n ) 2 2 + (w n ) 2 (w u ) 2 − 2(w u ) 2 − (w n ) 2 2 (w n ) 2 w u (w u − q) − (w n ) 2 2 2 (w u ) 2 − (w n ) 2 2 .
users to achieve the same utility as under the bundle (D2). The difference is that, while offering (D2) with $s = 0$ is not incentive-compatible, offering (D3) with $s > 0$ prevents mimicking by non-users provided that a condition holds which can be restated as (D4). Solving for the minimum value for $s$, denoted by $s^*$, that satisfies inequality (D4), one gets (D5). Thus, when $s$ is set according to (D5) the government could offer users an undistorted bundle, without inducing mimicking by non-users, even when [...]

[...] tax with a subsidy on job-related expenses allows implementing a first-best optimum. In fact, assume that $-(w^n)^2/2 \leq V^n < (w^n)^2/2 = U^n_{LF}$. By offering to all agents, users and non-users, the bundle $(Y, B) = \left((w^n)^2, \frac{(w^n)^2}{2} + V^n\right)$ and setting $s = \frac{(w^n)^2/2 - V^n}{\pi (w^u - q) q}$, one achieves redistribution ($U^n_{SB} = V^n < U^n_{LF}$; $U^u_{SB} = U^u_{LF} + \frac{1-\pi}{\pi}\left(U^n_{LF} - V^n\right) > U^u_{LF}$), while at the same time leaving undistorted the labor supply of all agents ($Y^n_{LF} = Y^n_{SB} = Y^u_{LF} = Y^u_{SB}$), maintaining incentive-compatibility (given that all agents are offered the same bundle in the $(Y, B)$-space), and satisfying the public budget constraint (since the cost of the subsidy benefiting users, i.e. $\pi (w^u - q) s q$, is exactly matched by the total revenue collected through the income tax, i.e. $\frac{(w^n)^2}{2} - V^n$). ◻
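The first-best implementation described above, with a common bundle for all agents plus a subsidy on job-related expenses, can be sanity-checked numerically. The sketch below rests on assumptions that are not spelled out in this appendix: quadratic disutility of labor, users paying $(1-s)q$ per hour worked, a population of size one, and parameter values (hypothetical) chosen so that $(w^u - q)w^u = (w^n)^2$.

```python
# Hypothetical parameters satisfying W_U * (W_U - Q) == W_N ** 2.
W_U, W_N, Q, SHARE_U = 12.5, 10.0, 4.5, 0.2


def pooling_with_subsidy(v_n):
    """Common bundle for all agents plus a subsidy rate s on users' expenses."""
    Y = W_N ** 2                       # undistorted income of both types
    B = W_N ** 2 / 2 + v_n             # after-tax income offered to everyone
    tax_per_agent = Y - B              # income tax paid by each agent
    hours_u = Y / W_U                  # users' hours, equal to W_U - Q here
    s = tax_per_agent / (SHARE_U * Q * hours_u)
    u_nonuser = B - (Y / W_N) ** 2 / 2
    # Assumption: users bear only the unsubsidized share of their expenses.
    u_user = B - (1 - s) * Q * hours_u - hours_u ** 2 / 2
    subsidy_cost = SHARE_U * s * Q * hours_u
    return s, u_nonuser, u_user, tax_per_agent - subsidy_cost


s, u_n, u_u, budget_surplus = pooling_with_subsidy(48.0)
print(s, u_n, u_u, budget_surplus)
```

With these inputs the budget balances and the users' utility gain equals the non-users' loss scaled by the ratio of population shares, as the closed-form expressions in the text state.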

Setting $V^n = 35$, one gets a separating equilibrium where $U^u = 31.42$ and $U^n = 36.05$; since $U^n > V^n$, it follows that $V^n = 35$ does not belong to the domain of the function $U^u(V^n)$ which describes the Pareto frontier.
Finally, assume that $V^n = 33$. In this case the solution to the government's problem is a pooling equilibrium where $Y^u = Y^n = 84.62$, $B^u = B^n = 68.80$ and $s = 82.25\%$. At this pooling equilibrium $U^u = 32.38$, $U^n = 33$ and both the labor supply and the consumption of users are lower than for non-users. The labor supply of users is upward distorted and the labor supply of non-users is downward distorted.$^{39}$

Numerical example showing the possibility that a distortion arises even though no self-selection constraint is binding at a second-best optimum

Assume that the users' work-related costs are given by the concave function $5h + 0.5\sqrt{h}$. Furthermore, assume that $w^u = 12.87$, $w^n = 10$, and $\pi = 1/5$. Under laissez-faire we have that $Y^u_{LF} = 100.13$ and $Y^n_{LF} = (w^n)^2 = 100$, with $U^u_{LF} = 29.57$ and $U^n_{LF} = 50$. Assume that in the Pareto efficient tax problem $V^n$ is set equal to 40.01. At a second-best optimum we get that $Y^u_{SB} = 0$, so that the labor supply of users is distorted downwards, $Y^n_{SB} = 100$ (no distortion on the labor supply of non-users), $U^n_{SB} = 40.01$ and $U^u_{SB} = 39.96$.$^{40}$ However, since the utility for a non-user choosing the bundle intended for users would be equal to 39.96, and the utility for a user choosing the bundle intended for non-users would be equal to 19.58, it follows that no self-selection constraint is binding at the second-best optimum. Nonetheless, observe that without a self-selection constraint requiring non-users not to be tempted to mimic users, the latter could have been offered an undistorted bundle (in our example, the bundle $(Y, B) = (100.13, 140.09)$).

$^{39}$ Setting $V^n = 32.69$, one would get the second-best optimum that would be chosen by a max-min government. At this second-best optimum $Y^u = Y^n = 84.62$, $B^u = B^n = 68.49$, $s = 83.85\%$ and $U^u = U^n = 32.69$. As shown in Appendix E, with $Y^u_{LF} < Y^n_{LF}$ a max-min social welfare function always delivers a second-best optimum where both self-selection constraints are binding. However, this second-best optimum is not necessarily a pooling equilibrium as in our example. For instance, assume that $w^u = 10.2$, $w^n = 10$, $q = 5$ and $\pi = 1/5$. The max-min optimum would be a separating equilibrium where $U^u < U^n$.

$^{40}$ We also have $B^u_{SB} = 39.96$ and $B^n_{SB} = 90.01$. Notice also that the second-best optimum features income re-ranking with respect to the laissez-faire equilibrium.
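The figures reported in the numerical example with concave work-related costs can be reproduced with a short script. This is a sketch under the assumption that utility is after-tax income minus work-related costs (for users) minus $h^2/2$, with $h = Y/w$; the function names and the bisection helper are illustrative and not part of the paper.

```python
import math

# Values taken from the numerical example with concave work-related costs.
W_U, W_N, SHARE_U = 12.87, 10.0, 0.2


def work_cost(h):
    """Users' work-related cost as a function of hours worked."""
    return 5.0 * h + 0.5 * math.sqrt(h)


def u_user(Y, B):
    """Assumed utility of a user at the bundle (Y, B)."""
    h = Y / W_U
    return B - work_cost(h) - h ** 2 / 2


def u_nonuser(Y, B):
    """Assumed utility of a non-user at the bundle (Y, B)."""
    return B - (Y / W_N) ** 2 / 2


def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection for a root of f bracketed by [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# Laissez-faire hours of users: wage equals marginal cost plus marginal disutility.
h_lf = bisect(lambda h: W_U - 5.0 - 0.25 / math.sqrt(h) - h, 1.0, W_U)
y_u_lf = W_U * h_lf
u_u_lf = u_user(y_u_lf, y_u_lf)

# Second-best bundles reported in the text and in footnote 40.
user_bundle = (0.0, 39.96)
nonuser_bundle = (100.0, 90.01)
budget = (SHARE_U * (user_bundle[0] - user_bundle[1])
          + (1 - SHARE_U) * (nonuser_bundle[0] - nonuser_bundle[1]))

print(round(y_u_lf, 2), round(u_u_lf, 2), round(budget, 6))
print(round(u_nonuser(*user_bundle), 2), round(u_user(*nonuser_bundle), 2))
```

The last line evaluates each type's utility at the bundle intended for the other type, which is the comparison used in the text to argue that neither self-selection constraint binds.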