Optimal Redistributive Tax and Education Policies in General Equilibrium

Should a redistributive government optimally subsidize education to provoke a reduction in the skill premium through general equilibrium effects on wages? To answer this question, this paper studies optimal linear and non-linear redistributive income taxes and education subsidies in two-type models with endogenous human capital formation, endogenous labor supply, and endogenous wage rates. Under optimal linear policies, education should not be subsidized so as to reduce the skill premium. Linear income taxes are distributionally equivalent to (negative) linear education subsidies, but linear taxes do not distort investment in human capital, whether general equilibrium effects are present or not. If skilled labor supply is more elastic than unskilled labor supply, optimal redistributive linear income taxes are lowered as the distributional gains of linear taxes are offset by a rise in the skill premium. Moreover, the optimal linear income tax may even become negative if general equilibrium effects are sufficiently strong. Under non-linear taxation, governments can directly steer the skill premium by exploiting non-linearities in the policy schedules. At the top, the optimal marginal income tax rate is negative, and the optimal marginal education subsidy is positive. At the bottom, the optimal marginal income tax rate is positive, and education is optimally taxed at the margin. Hence, optimal non-linear tax and education policies compress wage differentials, which contributes to redistribution. Simulations show that the top rate and marginal education subsidies are close to zero for a wide range of plausible parameters. Only when high-ability and low-ability workers are rather poor substitutes in production, marginal education subsidies on the high type and marginal education taxes on the low type substantially differ from zero.


Introduction
In his book Income Distribution, the Dutch Nobel-prize winner Jan Tinbergen (1975) extensively discusses the merits of increasing the supply of skilled workers relative to unskilled workers to reduce wage inequality. As the relative supply of skilled workers rises, the skill premium is lowered, and wage inequality diminishes. Tinbergen's concern with growing inequality between skilled and unskilled workers is more relevant today than it was in the 1970s. Many Western countries are currently confronted with sharply increasing skill premiums. Skill-biased technological change causes the demand for skilled workers to increase more rapidly than the supply of skilled workers (Katz and Autor, 1999). In Tinbergen's (1975) terminology: the race between education and technological change is currently lost by education. Also, globalization may jeopardize the prospects for low-skilled workers. In light of the deteriorating labor market position of low-skilled workers, it is not surprising that subsidies to foster skill formation have a strong policy appeal. By boosting human capital formation, equality may be served because general equilibrium effects on wages reduce the skill premium. As there is less pre-tax inequality, the need to redistribute incomes through distorting income taxes may diminish at the same time.
The main question of this paper is: should general equilibrium effects on the wage distribution be exploited in an optimal redistributive tax cum education system? To answer this question, this paper analyzes optimal redistributive tax and education policies in a Mirrlees (1971) framework. Due to imperfect substitution between different skill types in labor demand, the skill premium is determined by both demand and supply conditions in the labor market. Furthermore, skill levels are endogenously determined by human capital investments, and not exogenous as in Mirrlees (1971).
This analysis closely follows Bovenberg and Jacobs (2005). These authors derive, in a partial equilibrium framework, that human capital formation should neither be taxed nor subsidized (on a net basis) in an optimal redistributive program with linear or non-linear taxes and subsidies. The intuition is as follows. On the one hand, subsidies on education are implicit subsidies on work effort, since working and learning are complementary in generating income. On the other hand, education subsidies are regressive, since high-ability individuals invest more in human capital (the 'ability bias'). With the earnings functions used by Bovenberg and Jacobs (2005), both effects cancel out exactly, and the sole role of education subsidies is to offset the distortions of the income tax on skill formation (see also Jacobs and Bovenberg, 2007). This paper maintains the assumptions of Bovenberg and Jacobs (2005) to ensure that human capital is optimally not subsidized in the absence of general equilibrium effects on wages. Therefore, the earnings function is assumed to be weakly separable between ability, labor, and education, both with linear and non-linear policies. Additionally, the production function for human capital is assumed to have a constant elasticity if linear policies are considered, see also Jacobs and Bovenberg (2007). Individuals also have an iso-elastic utility function, so that labor supply elasticities are constant, and income effects are absent. This simple utility function highlights the crucial role of human capital supply elasticities under linear policy instruments. Finally, the analysis is restricted to two types of agents that differ in their ability to acquire human capital. We refer to the high-ability type as the 'skilled' and the low-ability type as the 'unskilled' agent.
Whether education policy should optimally be employed to provoke redistributive general equilibrium effects on the skill premium is shown to depend crucially on two things.
First, can education policy affect the relative supply of human capital in such a way that the skill premium falls? The skill premium will be a downward sloping function of the relative supplies of human capital: $\pi(H_1/H_2)$, $\pi' < 0$. Hence, education subsidies must increase the relative supply $H_1/H_2$ to be helpful in reducing wage inequality. Second, if education subsidies do indeed increase relative supply of human capital, should education subsidies, when optimally combined with an income tax, also be used? The answers to both questions are not obvious, and they differ fundamentally for linear and non-linear policy instruments.
The first part of this paper considers optimal linear tax and education subsidies. Linear education subsidies tend to increase both $H_1$ and $H_2$ at the same time, and it is not clear whether relative supply of human capital is increased at all. The supply of human capital of each agent depends on ability $a$, education $e$, and labor effort $l$, i.e., $H_1(a_1, e_1, l_1)$ and $H_2(a_2, e_2, l_2)$, where the first agent has high ability, and the second low ability. It is demonstrated that general equilibrium effects can never be exploited for redistributional reasons when supply elasticities of both labor and education are equal across agents, and income effects are absent. The reason is that relative human capital supply ($H_1/H_2$) remains fixed. Indeed, linear education subsidies ($s$) will only compress wage differentials if the skilled worker's supply of human capital is more elastic with respect to the subsidy than the unskilled worker's supply of human capital. As agents are assumed to have human capital production functions with a constant elasticity (under linear instruments only), the labor supply elasticity of the skilled worker has to be higher than that of the unskilled worker for education subsidies to have the potential to compress the wage distribution.
The second question is: if the skilled worker has a higher labor supply elasticity, and education subsidies therefore can reduce the skill premium, should education subsidies also be employed in an optimal redistributive program alongside optimal linear taxes? With linear policies, the answer to this question is no. A linear education subsidy is distributionally equivalent to a linear income tax, even if there are general equilibrium effects on wages. The reason is that gross income is linear in education as a result of the constant elasticity of the production function for human capital. Hence, it is not optimal to exploit general equilibrium effects with the education subsidy, because the income tax can do this equally well, while avoiding excessive investment in human capital. Therefore, the efficiency results of Bovenberg and Jacobs (2005) derived in partial equilibrium carry over to the general equilibrium case.
In contrast, the optimal linear income tax is importantly affected by general equilibrium effects on wages. The assumption that skilled workers have a higher labor supply elasticity than unskilled workers implies that higher income taxes will reduce labor supply of the skilled workers relatively more than that of unskilled workers. Linear income taxation thus increases the skill premium, and before-tax income inequality increases. Consequently, general equilibrium effects on wages run against the distributional benefits of a higher income tax, and optimal income taxes are lowered as a result. Theoretically, the optimal linear income tax may even turn negative if the indirect general equilibrium effects on the pre-tax wage distribution are strong enough to offset the direct effects of higher marginal tax rates on the post-tax wage distribution.
The second part of the paper considers the simultaneous setting of optimal non-linear tax and education policies. Optimal non-linear policies differ fundamentally from linear policies when general equilibrium effects on wages are present. The key to understanding why they differ is that the relative supply of human capital ($H_1/H_2$) can, by definition, be directly steered by the non-linear tax and education schedules, even if all supply elasticities are equal. By giving a marginal education subsidy on the high type, and a marginal education tax on the low type, the government can directly increase the relative supply of skilled human capital to lower the skill premium. Therefore, the first question (can education policy affect the relative supply of education in such a way that the skill premium falls?) can be answered affirmatively.
The second question is: should education subsidies be optimally employed in an optimal redistributive program? The answer with non-linear policies is yes. The skilled worker faces a marginal subsidy on education, while the unskilled worker faces a marginal tax on education. The optimal marginal income tax on the skilled worker is negative, while the optimal marginal tax on the unskilled worker is positive, as in Stern (1982) and Stiglitz (1982). We demonstrate that marginal education taxes/subsidies are directly linked to the top rate: marginal subsidies are zero when the top rate is zero. As the skilled worker faces a positive subsidy on both labor effort and education, and the unskilled worker faces a positive tax on both education and labor effort, the skill premium will be reduced. A skilled worker will be less tempted to mimic an unskilled worker, incentive compatibility constraints will be relaxed, and the government can redistribute more income. In contrast to the linear policies, non-linear instruments do exploit general equilibrium effects for redistribution.
We analyze the quantitative importance of general equilibrium effects by simulating optimal non-linear income taxes and education subsidies. We find that the marginal top rate is negative, and rather small for plausible elasticities of substitution between skilled and unskilled labor, which confirms the findings of Stern (1982). Optimal education subsidies are not large either, since there is a direct link between the top rate and education subsidies. However, we demonstrate quantitatively that general equilibrium should particularly be exploited for redistribution when the elasticity of substitution between skilled and unskilled labor is low.
Tinbergen's suggestion to promote skill formation so as to provoke a decline in the skill premium for redistributional reasons is possible only under non-linear policies. The case is lost under linear tax instruments. Intuitively, the generic linear education subsidy is an inefficient instrument to reduce the skill premium, because it increases human capital supply of all agents simultaneously. Non-linear education subsidies avoid this simultaneous increase in human capital supplies and can be tailored to increase supply of the high-ability types, while simultaneously lowering the supply of the low-ability types. However, numerical simulations of non-linear policies give rather weak support to the use of education policies in reducing inequality through general equilibrium effects on wages. Hence, it appears that direct instruments are more efficient than indirect instruments, i.e. education subsidies.
This paper is related to a number of other papers. First, Dur and Teulings (2001, 2004) pioneered the analysis of optimal tax and education policies in the presence of general equilibrium effects on wages. They developed a log-linear matching model with a continuum of agents. These authors show that education is subsidized in an optimal redistributive tax and education policy as long as the regressive incidence of the education subsidy (ability bias) does not offset the progressive general equilibrium effect on wages. Education policy then allows the government to rely less on distortionary income taxation for redistribution. This paper provides different results, and the reasons for this are twofold. First, Dur and Teulings (2001, 2004) only analyze log-linear policies. Log-linear education policies have the property that high-ability types receive a higher marginal subsidy on education than low-ability types.
Thus, the regressive schedule for education subsidies directly reduces the skill premium, as high-ability types will invest relatively more in human capital than low-ability types. Second, the authors assume that an increase in the mean level of human capital always reduces the skill premium, even if all behavioral elasticities are equal and income effects in labor supply are absent, so that relative supplies of human capital remain fixed. This contrasts with the current paper, where an increase in the mean level of human capital, while not changing the relative supplies of human capital, does not affect the wage distribution.
Second, some papers have analyzed the question whether direct or indirect instruments should be used for redistribution in general equilibrium settings. If the government cannot observe different skill-types in production, and cannot tax or subsidize all production inputs at different rates, both the Diamond and Mirrlees (1971) production efficiency theorem, and the Atkinson and Stiglitz (1976) zero commodity tax theorem break down. Indirect taxes should therefore optimally complement the non-linear income tax, even when preferences are weakly separable (Naito, 1999). Saez (2004), in contrast, argues against indirect instruments for redistribution, and restores the validity of the production efficiency and zero commodity tax theorems when skill-types are imperfect substitutes in production. Naito (2004, 2007) finds that this holds true only if more able individuals do not have a comparative advantage in high-skilled occupations. By allowing for comparative advantage in skill formation, Naito (2004, 2007) derives that deviations from aggregate production efficiency, and non-zero commodity taxation are optimal.
This paper contributes to this literature and finds that another powerful property of optimal tax structures is violated under linear and non-linear taxation. Diamond and Mirrlees (1971) derive that optimal tax expressions are the same in partial equilibrium (fixed prices), and general equilibrium (variable prices). This is what Saez (2004) calls the 'tax formula result'. We show that labor demand parameters explicitly enter optimal tax formulae, and factor price changes are exploited for redistribution. Hence, the tax formula result also breaks down with comparative advantage in skill formation, as in Feldstein (1973), Allen (1982), Stern (1982), and Stiglitz (1982). Moreover, indirect instruments, such as education subsidies, should optimally complement the non-linear income tax. Hence, under non-linear income taxation, the zero commodity tax theorem is not applicable to education, which further bolsters the findings by Naito (1999, 2004, 2007).
The rest of this paper is structured as follows. The next section presents the model with optimal linear income taxation and education policies in general equilibrium. Section 3 studies the same problem using non-linear instruments. Section 4 concludes. An appendix contains all the technical derivations.

Model
This section presents the base-line model. The standard models of optimal income taxation with general equilibrium effects on wages are extended with human capital formation. Individuals differ in their capacity to accumulate human capital and earnings capacities of individuals are endogenous rather than exogenously given. Furthermore, individuals with higher ability have a comparative advantage in skill formation. A 'one-shot' model of human capital investments is analyzed. One may view this model as describing life-time investments in human capital, life-time labor supply and life-time consumption, where there are no inter-temporal distortions due to capital taxes or capital market failures, for example. To fully track down the general equilibrium impact of tax and education policies analytically, the analysis is restricted to two types, as in almost the entire literature.

Individuals
There is a unit mass of high-ability and low-ability workers who are indexed by $n = 1$ and $n = 2$, respectively. The fraction of high-ability workers is $g_1$ and the fraction of low-ability workers is $g_2$.
Each worker has an iso-elastic utility function $u(c_n, l_n)$, which is defined over consumption $c_n$ and work effort $l_n$ according to
$$u_n(c_n, l_n) \equiv c_n - \frac{l_n^{1+1/\varepsilon_n}}{1+1/\varepsilon_n}, \quad n = 1, 2,$$
where $\varepsilon_n > 0$ is the (un)compensated wage elasticity of labor supply of individual $n$. Since income effects are absent, compensated and uncompensated elasticities coincide, and labor supply is always upward sloping. This utility function is also used for its analytical simplicity in Diamond (1998), Saez (2001), Dur and Teulings (2001), and Naito (2004). This specification of utility is sufficiently general to stress the main points at stake, while not introducing additional analytical complexity due to income effects, as in Allen (1982). Moreover, it highlights the crucial role of different labor supply elasticities under linear policy instruments. Indeed, elasticities of labor supply are assumed to differ, and $\varepsilon_1$ is not equal to $\varepsilon_2$. In the absence of income effects, different elasticities of labor supply or human capital formation are necessary to obtain general equilibrium effects of policy. As human capital elasticities are assumed to be equal, see below, linear policy instruments would have no general equilibrium effects if labor supply elasticities were identical.
Human capital is accumulated on the intensive margin. Individuals invest $e_n$ of their resources in education. One can think of $e_n$ as the years enrolled in education or the quality of education where each individual has access to the same educational inputs, but transforms them differently into human capital depending on their ability. Gross labor income $z_n$ of each individual is
$$z_n \equiv w_n a_n \phi(e_n) l_n, \quad n = 1, 2,$$
where $\phi'(e_n) > 0$ and $\phi''(e_n) < 0$. $w_n$ denotes the gross wage rate per unit of human capital of an individual of skill-type $n$, and $l_n$ is work effort. $a_n \phi(e_n)$ is the production function for human capital, where $a_n$ is the exogenous productivity of investment in human capital, with $a_1 > a_2$, i.e., high-ability types have a comparative advantage in learning. High-ability types generate more human capital with the same amount of educational efforts because $\frac{\partial^2 z_n}{\partial e_n \partial a_n} = w_n \phi'(e_n) l_n > 0$. To ensure that optimal education subsidies are zero in the absence of general equilibrium effects, the earnings function is weakly separable in ability, education, and labor (Jacobs and Bovenberg, 2007). The elasticity of the production function is also assumed to be constant under linear policies, and is denoted by $\beta \equiv \frac{\phi'(e_n) e_n}{\phi(e_n)}$ (see Jacobs and Bovenberg, 2007). In general equilibrium, type 1 is assumed to earn a higher gross income than type 2, i.e. $w_1 a_1 \phi(e_1) l_1 > w_2 a_2 \phi(e_2) l_2$.
The price of one unit of education is denoted by $p_n$. Note that these costs might differ between individuals. All costs are assumed to be tax deductible since the major costs of education consist of taxed opportunity costs. Investments in education $e_n$ are subsidized at flat rate $s$. Gross incomes $z_n$ are taxed at a constant marginal rate $t$. In addition, every individual may receive a non-individualized lump-sum transfer $b$. Hence, the income tax is progressive in the sense that average tax rates increase with income. The fundamental informational requirements to levy a linear income tax, and to provide linear education subsidies are that aggregate gross incomes and aggregate investment in human capital must be verifiable by the government.
Consumption $c_n$ equals total net labor income minus education expenditures:
$$c_n = (1 - t)\left(w_n a_n \phi(e_n) l_n - (1 - s) p_n e_n\right) + b, \quad n = 1, 2. \tag{3}$$
The first-order conditions for utility maximization yield the following constant elasticity labor supply functions for each individual:
$$l_n = \left((1 - t) w_n a_n \phi(e_n)\right)^{\varepsilon_n}, \quad n = 1, 2. \tag{4}$$
Labor supply $l_n$ increases with the net marginal wage rate and taxes depress labor supply. The first-order condition for optimal human capital investment is given by
$$w_n a_n \phi'(e_n) l_n = (1 - s) p_n, \quad n = 1, 2.$$
Marginal benefits of learning (the left-hand side) should be equal to the marginal costs of learning (the right-hand side). Subsidies increase investment in human capital. Taxes have no direct effect on learning because both marginal costs and marginal benefits are equally affected. Taxation does, however, reduce labor supply and thereby lowers the returns to investments in human capital indirectly.
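As an illustration of the two first-order conditions, the following minimal sketch solves them in closed form. It assumes the constant-elasticity form $\phi(e) = e^{\beta}$ (one functional form satisfying the constant-elasticity assumption, not necessarily the one used in the simulations), takes the wage $w_n$ as given, and uses purely illustrative parameter values; the function names are hypothetical.

```python
def household_choice(w, a, eps, beta, t, s, p):
    """Return (e, l) solving both first-order conditions in closed form.

    Assumption: phi(e) = e**beta, so the elasticity of phi is the constant beta.
    The wage w is taken as given (no general equilibrium feedback here).
    """
    mu = 1.0 - beta * (1.0 + eps)
    if mu <= 0.0:
        raise ValueError("interior solution requires beta * (1 + eps) < 1")
    # Substituting labor supply into the education condition and solving for e.
    e = (beta * (1.0 - t) ** eps * (w * a) ** (1.0 + eps)
         / ((1.0 - s) * p)) ** (1.0 / mu)
    # Labor supply: l = ((1 - t) * w * a * phi(e)) ** eps.
    l = ((1.0 - t) * w * a * e ** beta) ** eps
    return e, l


def foc_residuals(w, a, eps, beta, t, s, p):
    """Residuals of the labor-supply and education first-order conditions."""
    e, l = household_choice(w, a, eps, beta, t, s, p)
    labor_res = l - ((1.0 - t) * w * a * e ** beta) ** eps
    # Marginal benefit w*a*phi'(e)*l minus marginal cost (1 - s)*p.
    educ_res = w * a * beta * e ** (beta - 1.0) * l - (1.0 - s) * p
    return labor_res, educ_res


if __name__ == "__main__":
    base = dict(w=1.0, a=1.5, eps=0.3, beta=0.4, p=1.0)  # illustrative values
    for t, s in [(0.0, 0.0), (0.3, 0.0), (0.3, 0.2)]:
        e, l = household_choice(t=t, s=s, **base)
        print(f"t={t:.1f} s={s:.1f}  e={e:.4f}  l={l:.4f}")
```

In this sketch a higher subsidy raises both education and labor supply, while a higher tax lowers education only through its effect on labor supply, in line with the discussion above.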
First-order conditions are necessary, but not sufficient. The first-order conditions reveal that investments in education increase if larger labor supply increases the utilization rate of human capital. As investments in human capital increase, net wage rates per hour worked increase, and labor supply expands. Larger labor supply, in turn, results in higher investment in human capital, and so on. Sufficiently strong diminishing returns to human capital accumulation (a low $\beta$), or a sufficiently low wage elasticity of labor supply $\varepsilon_n$, should guarantee that this feedback dampens, so that an interior solution in human capital formation and labor supply is attained. Second-order conditions are satisfied by imposing the following restriction on the parameters (see the Appendix for the derivation):
$$\mu_n \equiv 1 - \beta(1 + \varepsilon_n) > 0, \quad n = 1, 2.$$
The tax elasticities of labor supply, $\varepsilon^{lt}_n \equiv -\frac{\partial l_n}{\partial t}\frac{1-t}{l_n} = \frac{(1-\beta)\varepsilon_n}{\mu_n}$, and of education, $\varepsilon^{et}_n \equiv -\frac{\partial e_n}{\partial t}\frac{1-t}{e_n} = \frac{\varepsilon_n}{\mu_n}$, are important determinants of the optimal tax rates, and are derived in the Appendix. The tax elasticity of labor earnings $z_n = w_n a_n \phi(e_n) l_n$ amounts to $\varepsilon^{zt}_n = \varepsilon^{lt}_n + \beta\varepsilon^{et}_n = \frac{\varepsilon_n}{\mu_n}$. The tax elasticity of gross income $\frac{\varepsilon_n}{\mu_n}$ exceeds the wage elasticity of gross income $\varepsilon_n$. The reason is that the tax rate $t$ reduces the after-tax wage $(1 - t) w_n a_n \phi(e_n)$ both directly (by raising the tax wedge $t$ between the before-tax wage and the after-tax wage) and indirectly (by depressing the before-tax wage rate $w_n a_n \phi(e_n)$ through its negative impact on learning $e_n$). Learning is harmed indirectly because lower labor supply depresses the utilization rate of human capital. Similarly, the subsidy elasticities are given by $\varepsilon^{ls}_n \equiv \frac{\partial l_n}{\partial s}\frac{1-s}{l_n} = \frac{\beta\varepsilon_n}{\mu_n}$ and $\varepsilon^{es}_n \equiv \frac{\partial e_n}{\partial s}\frac{1-s}{e_n} = \frac{1}{\mu_n}$.

Firms
There is one sector of production. A representative firm maximizes profits while taking wage rates $w_1$ and $w_2$ for each skill type as given. The firm produces output $Y$ with a neoclassical production function, which features constant returns to scale in labor inputs $H_1$ and $H_2$:
$$Y = F(H_1, H_2),$$
where $F_n(\cdot) > 0$, $F_{nn}(\cdot) < 0$, $F_{12} \geq 0$, $n = 1, 2$, and subscript $n$ refers to the $n$-th argument of differentiation. The income share of the low-income earner is denoted by $\alpha \equiv \frac{F_2(\cdot) H_2}{F(\cdot)}$, and $1 - \alpha = \frac{F_1(\cdot) H_1}{F(\cdot)}$ is, therefore, the income share of the high-income earner. Further, if $\alpha < 1/2$, the low-ability type earns less than the high-ability type. $\sigma \equiv \frac{F_1(\cdot) F_2(\cdot)}{F_{12}(\cdot) F(\cdot)}$ denotes the partial elasticity of substitution between $H_1$ and $H_2$ in the production function $F(\cdot)$.
First-order conditions for profit maximization are necessary and sufficient, and given by
$$w_n = F_n(H_1, H_2), \quad n = 1, 2.$$
The skill premium $\pi$ is the ratio of wages of skilled and unskilled workers, i.e., $\pi \equiv w_1/w_2$. With constant returns to scale in production, $\pi$ is only a function of the relative supplies of skilled and unskilled workers:
$$\pi \equiv \frac{w_1}{w_2} = \frac{F_1(H_1, H_2)}{F_2(H_1, H_2)} = \pi(H_1/H_2), \tag{9}$$
where $\pi'(H_1/H_2) < 0$. The skill premium decreases if the relative supply of skilled workers, $H_1/H_2$, increases. Note that if both types have the same labor supply elasticity $\varepsilon_n$, all the tax and subsidy elasticities will be equal across individuals (see previous section). Hence, linear policy instruments cannot affect the skill premium $\pi$ in that case.
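The dependence of the skill premium on relative supplies alone can be illustrated with a specific technology. The sketch below assumes a CES production function with share parameter `theta` and substitution elasticity `sigma` (an illustrative functional form; the model only requires a constant-returns neoclassical $F$), and all numbers are hypothetical.

```python
import math


def ces_output(h1, h2, theta, sigma):
    """CES output with constant returns to scale (assumed form, sigma != 1)."""
    rho = (sigma - 1.0) / sigma
    return (theta * h1 ** rho + (1.0 - theta) * h2 ** rho) ** (1.0 / rho)


def wages(h1, h2, theta, sigma):
    """Competitive wages w_n = F_n(H_1, H_2) for the CES technology."""
    y = ces_output(h1, h2, theta, sigma)
    w1 = theta * (y / h1) ** (1.0 / sigma)
    w2 = (1.0 - theta) * (y / h2) ** (1.0 / sigma)
    return w1, w2


def skill_premium(h1, h2, theta, sigma):
    """Skill premium pi = w1 / w2; depends on h1 / h2 only."""
    w1, w2 = wages(h1, h2, theta, sigma)
    return w1 / w2


if __name__ == "__main__":
    theta, sigma = 0.5, 1.5  # illustrative values
    for ratio in (0.5, 1.0, 2.0):
        pi = skill_premium(ratio, 1.0, theta, sigma)
        print(f"H1/H2={ratio:.1f}  pi={pi:.4f}  log(pi)={math.log(pi):+.4f}")
```

Under this assumed technology the elasticity of the skill premium with respect to $H_1/H_2$ equals $-1/\sigma$, so the premium responds more strongly to relative supplies when the two labor types are poorer substitutes.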

General equilibrium
Labor market clearing requires that supply equals demand for each labor type:
$$H_n = a_n \phi(e_n) l_n g_n, \quad n = 1, 2. \tag{10}$$
Further, goods market equilibrium requires that total output equals total consumption, plus investments in human capital, plus exogenously given government expenditures $\Lambda$:
$$F(H_1, H_2) = \sum_n (c_n + p_n e_n) g_n + \Lambda.$$

Government
The government maximizes a social welfare function over indirect utilities $v_n(b, t, s, w_n)$:
$$\sum_n \omega_n v_n(b, t, s, w_n) g_n,$$
where $\omega_n$ denotes the weight of type $n$ in social welfare. The welfare weights sum to one: $\omega_1 + \omega_2 = 1$. If $\omega_1 = \omega_2$, the social welfare function is utilitarian, and there is no social preference for redistribution due to the constancy of marginal utility of income at the individual level (no income effects). $\omega_2 > \omega_1$ implies a social preference for redistribution. The government collects taxes to finance the lump-sum transfer, the education subsidies, and the exogenous revenue requirement $\Lambda$. The government budget constraint reads as
$$t \sum_n \left(w_n a_n \phi(e_n) l_n - (1 - s) p_n e_n\right) g_n = b + s \sum_n p_n e_n g_n + \Lambda.$$
The government maximizes social welfare by optimally choosing the lump-sum transfer $b$, the linear marginal tax $t$, and the linear education subsidy $s$. Formally, the following Lagrangian is maximized:
$$\max_{\{b, t, s\}} L \equiv \sum_n \omega_n v_n(b, t, s, w_n) g_n + \eta \left[ t \sum_n \left(w_n a_n \phi(e_n) l_n - (1 - s) p_n e_n\right) g_n - s \sum_n p_n e_n g_n - b - \Lambda \right],$$
where $\eta$ denotes the Lagrange multiplier of the government budget constraint, and the labor market clearing conditions have to be imposed: $w_n = F_n(H_1, H_2)$, and $H_n = a_n \phi(e_n) l_n g_n$, $n = 1, 2$. The first-order condition for the optimal lump-sum transfer is
$$\sum_n \frac{\omega_n}{\eta} g_n = 1, \tag{15}$$
where we used Roy's lemma ($\frac{\partial v_n}{\partial b} = 1$), and $\frac{\omega_n}{\eta}$ is the social marginal utility of income of type $n$. The average social marginal benefits of a higher $b$ (i.e., the left-hand side of (15)) should equal the costs in terms of a higher $b$ (i.e., the right-hand side of (15)).
With the aid of the first-order condition for $b$ (15), the distributional characteristic $\xi^z$ of labor income is defined as (minus) the normalized covariance between the social marginal utility of income $\frac{\omega_n}{\eta}$, and gross labor income $z_n$ (see, e.g., Atkinson and Stiglitz, 1980):
$$\xi^z \equiv -\frac{\sum_n \frac{\omega_n}{\eta} z_n g_n - \sum_n z_n g_n \sum_n \frac{\omega_n}{\eta} g_n}{\sum_n \frac{\omega_n}{\eta} g_n \sum_n z_n g_n} = \frac{\sum_n \left(1 - \frac{\omega_n}{\eta}\right) z_n g_n}{\sum_n z_n g_n},$$
where the second equality follows from (15). With a positive distributional characteristic $\xi^z$, taxing labor income yields distributional benefits, because the high-ability worker has a lower welfare weight than the low-ability worker, i.e. $\frac{\omega_1}{\eta} < \frac{\omega_2}{\eta}$, and earns a higher income, $z_1 > z_2$. Indeed, a zero distributional characteristic implies either that the government is utilitarian ($\omega_2 = \omega_1$), and not interested in redistribution, or that the marginal contribution to the tax base is equal for both ability types (i.e., taxable income $z_n$ is the same for both types).
Similarly, the distributional characteristic of education $\xi^e$ is defined as
$$\xi^e \equiv -\frac{\sum_n \frac{\omega_n}{\eta} e_n g_n - \sum_n e_n g_n \sum_n \frac{\omega_n}{\eta} g_n}{\sum_n \frac{\omega_n}{\eta} g_n \sum_n e_n g_n} = \frac{\sum_n \left(1 - \frac{\omega_n}{\eta}\right) e_n g_n}{\sum_n e_n g_n} = \xi^z.$$
A positive distributional characteristic implies that subsidizing (taxing) education results in distributional losses (gains). If education levels are equal for both workers, there is no educational inequality, and subsidizing education yields no distributional losses. The absence of a redistributional motive renders the distributional characteristic zero. Note that the distributional characteristic of education is equal to the distributional characteristic of income, because gross earnings are linear in education due to the constant elasticity of the production function for human capital. This assumption ensures that (negative) education subsidies are distributionally equivalent to income taxes, see also Jacobs and Bovenberg (2007). In the remainder, the superscripts are dropped, and $\xi \equiv \xi^z = \xi^e$.
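The definition of the distributional characteristic can be made concrete with a small numerical sketch. It assumes hypothetical incomes, welfare weights and population shares, and it imposes that education is proportional to gross income (which follows from the education first-order condition with a constant elasticity when the price of education is the same for both types, an assumption made here for illustration).

```python
def distributional_characteristic(x, omega, g):
    """xi = sum((1 - omega_n/eta) x_n g_n) / sum(x_n g_n), with eta = sum(omega_n g_n)."""
    eta = sum(om * gn for om, gn in zip(omega, g))  # first-order condition for b
    num = sum((1.0 - om / eta) * xn * gn for xn, om, gn in zip(x, omega, g))
    den = sum(xn * gn for xn, gn in zip(x, g))
    return num / den


def covariance_form(x, omega, g):
    """Minus the normalized covariance between omega_n/eta and x_n."""
    eta = sum(om * gn for om, gn in zip(omega, g))
    smu = [om / eta for om in omega]  # social marginal utilities of income
    mean_x = sum(xn * gn for xn, gn in zip(x, g))
    mean_smu = sum(sn * gn for sn, gn in zip(smu, g))
    cov = sum(sn * xn * gn for sn, xn, gn in zip(smu, x, g)) - mean_x * mean_smu
    return -cov / (mean_smu * mean_x)


if __name__ == "__main__":
    g = [0.5, 0.5]        # population shares (illustrative)
    omega = [0.4, 0.6]    # welfare weights, omega_2 > omega_1
    z = [2.0, 1.0]        # gross incomes, z_1 > z_2 (hypothetical)
    e = [0.3 * zn for zn in z]  # education proportional to income (assumption)
    print("xi_z =", distributional_characteristic(z, omega, g))
    print("xi_e =", distributional_characteristic(e, omega, g))
    print("covariance form =", covariance_form(z, omega, g))
```

Because the characteristic is invariant to rescaling its first argument, any education profile proportional to income yields the same value, which is the equivalence used in the text.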
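The first-order condition for the optimal linear income tax is given by
$$\frac{\partial L}{\partial t} = -\sum_n \omega_n \left(w_n a_n \phi(e_n) l_n - p_n (1 - s) e_n\right) g_n + \eta \sum_n \left(w_n a_n \phi(e_n) l_n - p_n (1 - s) e_n\right) g_n + \eta t \sum_n w_n a_n \phi(e_n) \frac{\partial l_n}{\partial t} g_n - \eta s \sum_n p_n \frac{\partial e_n}{\partial t} g_n + \cdots = 0. \tag{18}$$
And the first-order condition for the linear education subsidy is
$$\frac{\partial L}{\partial s} = \sum_n \omega_n (1 - t) p_n e_n g_n - \eta (1 - t) \sum_n p_n e_n g_n + \eta t \sum_n w_n a_n \phi(e_n) \frac{\partial l_n}{\partial s} g_n - \eta s \sum_n p_n \frac{\partial e_n}{\partial s} g_n + \cdots = 0.$$
Roy's lemma has been used in deriving both expressions, i.e., $\frac{\partial v_n}{\partial t} = -\left(w_n a_n \phi(e_n) l_n - (1 - s) p_n e_n\right)$, $\frac{\partial v_n}{\partial s} = (1 - t) p_n e_n$, and $\frac{\partial v_n}{\partial w_n} = (1 - t) a_n \phi(e_n) l_n$.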

Optimal tax policies in general equilibrium
This section first derives the case in which education subsidies are not used ($s = 0$). Furthermore, it starts with deriving the optimal linear income tax in the absence of general equilibrium effects (see, e.g., Sheshinski, 1971; Dixit and Sandmo, 1977; Atkinson and Stiglitz, 1980). In that case, the last two lines in the first-order condition for $t$ (18) are zero, and the optimal linear income tax is (see Appendix):
$$\frac{t}{1 - t} = \frac{\xi}{\bar{\varepsilon}^{lt}/(1 - \beta)}, \tag{20}$$
where $\bar{\varepsilon}^{lt} \equiv (1 - \alpha)\varepsilon^{lt}_1 + \alpha\varepsilon^{lt}_2$, and $\bar{\varepsilon}^{lt}/(1 - \beta) = \bar{\varepsilon}^{zt}$ is the income weighted average of the tax elasticity of total labor earnings. The denominator is the elasticity of the tax base with respect to the tax rate. The optimal income tax formula (20) shows the trade-off between equity (numerator) and efficiency (denominator). The larger the social preference for redistribution is, the larger is $\xi$, and the higher is the optimal marginal income tax. If both groups have an equal weight in social welfare ($\omega_2 = \omega_1$), the optimal marginal income tax is zero, because $\xi = 0$. The larger the income weighted average elasticity of labor earnings is, the lower is the optimal linear income tax, because the labor income tax more heavily distorts labor supply. Both the elasticities of labor supply and human capital formation determine the effective elasticity of earnings, which is due to the feedback between labor supply and human capital formation. See also Bovenberg and Jacobs (2005) for a more elaborate discussion.
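The equity-efficiency trade-off in the partial equilibrium formula can be checked numerically. The following sketch assumes fixed wages, $s = 0$, a common education price, and $\phi(e) = e^{\beta}$, and it reads the partial equilibrium formula as $t/(1-t) = \xi/\bar{\varepsilon}^{zt}$ with $\varepsilon^{zt}_n = \varepsilon_n/(1 - \beta(1+\varepsilon_n))$; this reading, the function names, and all parameter values are assumptions made for illustration. The tax rate solving the formula is found by bisection and then compared with social welfare computed directly from the primitives.

```python
def household(t, w, a, eps, beta, p=1.0):
    """Closed-form (e, l, z) at a given wage with phi(e) = e**beta and s = 0."""
    mu = 1.0 - beta * (1.0 + eps)
    e = (beta * (1.0 - t) ** eps * (w * a) ** (1.0 + eps) / p) ** (1.0 / mu)
    l = ((1.0 - t) * w * a * e ** beta) ** eps
    return e, l, w * a * e ** beta * l


def welfare(t, types, omega, g, beta, p=1.0, revenue_req=0.0):
    """Weighted sum of utilities; the transfer b balances the budget."""
    outcomes = [household(t, w, a, eps, beta, p) for (w, a, eps) in types]
    # Education costs are tax deductible, so the base is z - p * e.
    b = t * sum(gn * (z - p * e) for gn, (e, l, z) in zip(g, outcomes)) - revenue_req
    total = 0.0
    for om, gn, (w, a, eps), (e, l, z) in zip(omega, g, types, outcomes):
        c = (1.0 - t) * (z - p * e) + b
        total += om * gn * (c - l ** (1.0 + 1.0 / eps) / (1.0 + 1.0 / eps))
    return total


def formula_gap(t, types, omega, g, beta, p=1.0):
    """xi - t/(1-t) * (income-weighted earnings elasticity), both evaluated at t."""
    zs = [household(t, w, a, eps, beta, p)[2] for (w, a, eps) in types]
    eta = sum(om * gn for om, gn in zip(omega, g))
    income = sum(gn * z for gn, z in zip(g, zs))
    xi = sum((1.0 - om / eta) * gn * z for om, gn, z in zip(omega, g, zs)) / income
    ebar = sum(gn * z * eps / (1.0 - beta * (1.0 + eps))
               for gn, z, (w, a, eps) in zip(g, zs, types)) / income
    return xi - t / (1.0 - t) * ebar


def optimal_tax(types, omega, g, beta):
    """Bisection on the formula gap over t in [0, 0.9]."""
    lo, hi = 0.0, 0.9
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if formula_gap(mid, types, omega, g, beta) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    types = [(1.0, 1.5, 0.5), (1.0, 1.0, 0.3)]  # (w_n, a_n, eps_n), illustrative
    omega, g, beta = [0.4, 0.6], [0.5, 0.5], 0.3
    t_star = optimal_tax(types, omega, g, beta)
    print("tax rate solving the formula:", round(t_star, 4))
    for t in (t_star - 0.05, t_star, t_star + 0.05):
        print(f"t={t:.4f}  welfare={welfare(t, types, omega, g, beta):.6f}")
```

In this parameterization welfare is highest at the tax rate that solves the formula, which is what the first-order condition implies when wages are fixed.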
With general equilibrium effects, in contrast, the last two lines in the first-order condition for $t$ (18) are not zero, and need to be taken into account as well. To obtain a general expression for the optimal linear income tax with general equilibrium effects on wages, the property is exploited that inverse labor demand equations are homogeneous of degree zero as long as there are constant returns to scale in production, i.e., $\frac{\partial w_n}{\partial H_1} H_1 + \frac{\partial w_n}{\partial H_2} H_2 = 0$, $n = 1, 2$. Simplifying the first-order condition for $t$ then yields the expression for the optimal linear income tax with general equilibrium effects, equation (23) (see Appendix). The expression for the optimal linear income tax in partial equilibrium differs from the expression for the optimal income tax in general equilibrium. Hence, the 'tax-formula result' does not apply (see also Saez, 2004). The difference with the formula without general equilibrium effects is the added term in the brackets on the right-hand side. The term $\left(\frac{\omega_2}{\eta} - \frac{\omega_1}{\eta}\right)\left(\frac{\varepsilon_1}{\mu_1} - \frac{\varepsilon_2}{\mu_2}\right)\alpha(1 - \alpha)\frac{1}{\sigma}$ measures the distributional losses (or gains) arising from general equilibrium effects on wages, and is subtracted from the direct welfare gains of higher income taxes in reducing inequality (as measured by $\xi$). The optimal marginal linear income tax is lower with general equilibrium effects than without if: i) substitution between labor types is finite ($\sigma < \infty$), and ii) the (uncompensated) earnings elasticity of the skilled worker is larger than that of the unskilled type ($\varepsilon^{zt}_1 = \frac{\varepsilon_1}{\mu_1} > \varepsilon^{zt}_2 = \frac{\varepsilon_2}{\mu_2}$). The effects of imperfect substitution in labor types have been the subject of most of the papers in this field, so these results do not come as a surprise. If labor types are perfect substitutes ($\sigma = \infty$), the last term in the expression of the optimal income tax (23) vanishes. In this case, wage rates per hour worked are not affected by changes in relative factor supplies, and the wage distribution is exogenous.
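The role of the two conditions can be illustrated by computing the equilibrium skill premium for different tax rates. The sketch below is a minimal two-type equilibrium under strong assumptions that are not taken from the paper: $\phi(e) = e^{\beta}$, $s = 0$, a unit price of education, and a CES technology with $\sigma \neq 1$ whose zero-profit condition pins down wage levels for a given premium. Function names and parameter values are hypothetical.

```python
import math


def supply_per_worker(w, a, eps, beta, t, p=1.0):
    """Human capital a * phi(e) * l supplied by one worker (phi(e) = e**beta, s = 0)."""
    mu = 1.0 - beta * (1.0 + eps)
    e = (beta * (1.0 - t) ** eps * (w * a) ** (1.0 + eps) / p) ** (1.0 / mu)
    l = ((1.0 - t) * w * a * e ** beta) ** eps
    return a * e ** beta * l


def wages_from_premium(pi, theta, sigma):
    """Wage levels consistent with zero profits (CES unit cost = 1), sigma != 1."""
    w2 = (theta ** sigma * pi ** (1.0 - sigma)
          + (1.0 - theta) ** sigma) ** (-1.0 / (1.0 - sigma))
    return pi * w2, w2


def equilibrium_premium(t, a, eps, beta, g, theta, sigma):
    """Bisection on log(pi): supplied H1/H2 must support pi on the demand side."""
    def excess(log_pi):
        w1, w2 = wages_from_premium(math.exp(log_pi), theta, sigma)
        h1 = g[0] * supply_per_worker(w1, a[0], eps[0], beta, t)
        h2 = g[1] * supply_per_worker(w2, a[1], eps[1], beta, t)
        pi_demand = theta / (1.0 - theta) * (h1 / h2) ** (-1.0 / sigma)
        return log_pi - math.log(pi_demand)  # increasing in log_pi

    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return math.exp(0.5 * (lo + hi))


if __name__ == "__main__":
    a, g, beta, theta, sigma = [1.5, 1.0], [0.5, 0.5], 0.3, 0.5, 1.5
    for label, eps in (("equal elasticities", [0.3, 0.3]),
                       ("skilled more elastic", [0.5, 0.3])):
        low = equilibrium_premium(0.2, a, eps, beta, g, theta, sigma)
        high = equilibrium_premium(0.5, a, eps, beta, g, theta, sigma)
        print(f"{label}: pi(t=0.2)={low:.4f}  pi(t=0.5)={high:.4f}")
```

With equal elasticities the tax rate scales both supplies by the same factor and the premium is unchanged; with a more elastic skilled type a higher tax rate lowers $H_1/H_2$ and raises the premium in this sketch.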
The second condition, on the relative sizes of labor supply elasticities, has received no attention so far. The formula demonstrates that general equilibrium effects only contribute to more equality if high-skilled labor earnings respond less elastically to an increase in taxes than low-skilled labor earnings ($\varepsilon^{zt}_1 = \frac{\varepsilon_1}{\mu_1} < \varepsilon^{zt}_2 = \frac{\varepsilon_2}{\mu_2}$). In that case, higher taxes raise the relative supply of skilled labor ($H_1/H_2$), the skill premium $\pi(H_1/H_2)$ declines, see (9), and lower inequality results. If, however, skilled labor supply responds more elastically to taxes than unskilled labor supply ($\varepsilon^{zt}_1 = \frac{\varepsilon_1}{\mu_1} > \varepsilon^{zt}_2 = \frac{\varepsilon_2}{\mu_2}$), general equilibrium effects work against redistribution of incomes through income taxes by increasing before-tax income inequality as the relative supply of skilled labor declines. Below we derive that a higher elasticity of skilled labor supply is a necessary requirement for education subsidies to work in favor of redistribution via general equilibrium effects under linear policy instruments. Hence, we assume that $\varepsilon^{zt}_1 > \varepsilon^{zt}_2$, or, equivalently, $\varepsilon_1 > \varepsilon_2$. Also from an empirical point of view, a larger elasticity of skilled labor relative to unskilled labor is plausible (see Gruber and Saez, 2003). Furthermore, it can be demonstrated that, if both workers have identical CES utility functions, uncompensated wage elasticities of labor supply unambiguously increase with skill type as long as labor supply functions are upward sloping.¹²
¹² Suppose we hold human capital accumulation fixed.
If utility has the CES form $u(c_n, l_n) \equiv [\delta c_n^{\rho} + (1-\delta)(1-l_n)^{\rho}]^{1/\rho}$, the uncompensated labor supply elasticity with respect to the tax rate is $\varepsilon^{lt}_n \equiv -\frac{\partial l_n}{\partial t}\frac{1-t}{l_n} = \frac{\sigma_{cl} - \zeta}{l_n/(1-l_n) + \zeta}$, where $\sigma_{cl} \equiv 1/(1-\rho)$ is the elasticity of substitution between consumption and leisure, and $\zeta \equiv (1-t)w_n l_n/((1-t)w_n l_n + b)$ is the share of labor income in total income. The labor supply curve is upward sloping (backward bending) if $\sigma_{cl} > \zeta$ ($\sigma_{cl} < \zeta$). With identical tax rates $t$ and transfers $b$, the uncompensated tax elasticity of labor supply for high-skilled workers is higher than for low-skilled workers if the labor supply curve is upward sloping, which follows from differentiating the elasticity with respect to ability $n$.
If the compensated elasticities of effective total labor supply $\varepsilon^{zt}_1$ and $\varepsilon^{zt}_2$ are identical ($\varepsilon^{zt}_1 = \varepsilon^{zt}_2$, so that $\frac{\varepsilon_1}{\mu_1} = \frac{\varepsilon_2}{\mu_2}$), the general equilibrium term vanishes as well. The intuition is straightforward. With equal elasticities, linear taxation does not affect relative labor supply. If relative total labor supply ($H_1/H_2$) remains constant, relative wages ($w_1/w_2$) remain constant as well, cf. the skill premium (9). This is the case, for example, if all individuals have identical preferences ($\varepsilon_1 = \varepsilon_2$). Hence, this example illustrates the necessity of heterogeneous preferences (in the absence of income effects) for our model to make sense in a general equilibrium context.¹³ The expression for the optimal income tax (23) suggests that optimal marginal income taxes may even become negative. If the shares of workers are equal, i.e., $g_1 = g_2 = \frac{1}{2}$, we can find an explicit condition under which marginal tax rates are optimally negative ($\xi = (\omega_2 - \omega_1)\left(\frac{1}{2} - \alpha\right)$ and $\eta = 1/2$).
The last inequality is satisfied if general equilibrium effects are very important (low σ), if there is less inequality (higher α), and if tax elasticities of earnings are higher for high-skilled workers than for low-skilled workers (large (ε zt 1 − ε zt 2 )/(1 − β)). Subsidizing work effort provokes such strong general equilibrium effects that low-ability workers are better off by paying subsidies to the high-ability workers: their before tax wages increase more than is needed to offset the rise in lump-sum taxes to finance the subsidies on work.
Optimal negative income taxes may not be a purely theoretical curiosity. As an illustrative example, suppose that the elasticity of substitution between skilled and unskilled workers equals $\sigma = 1.5$, which is not an uncommon empirical value (see, for example, Katz and Autor, 1999). Assume furthermore that the skilled worker earns 50% more than the unskilled worker, so that $\alpha = 0.4$. This can be justified by a wage increase of 10% for every additional year of schooling and 5 years of higher education. A Mincer return of 10% is not an uncommon empirical estimate either (see Card, 1999). $\alpha = 0.4$ may also correspond to the income share of low-skilled workers in labor earnings. Finally, take $\beta$ to be $1/4$. This can be interpreted as the share of forgone earnings in life-time earnings, or the share of schooling years in total years available for working and education. Then, the difference in human capital supply elasticities must satisfy $\varepsilon^{zt}_1 - \varepsilon^{zt}_2 = 0.24$. Under this parameterization, a difference of 0.24 in the total elasticity of labor earnings is sufficient to have negative optimal marginal tax rates when general equilibrium effects on wages are present.
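The arithmetic behind the choice of $\alpha$ in this example can be retraced with a few lines of Python. This is a minimal sketch, assuming equal population shares of both types and reading the Mincer return as a simple (non-compounded) 10% per year; the function names are hypothetical and not part of the paper.

```python
def low_skilled_income_share(skill_premium: float, share_low: float = 0.5) -> float:
    """Income share of low-skilled workers when a skilled worker earns
    (1 + skill_premium) times an unskilled worker.
    Assumption: share_low is the population share of low-skilled workers."""
    low_income = share_low * 1.0
    high_income = (1.0 - share_low) * (1.0 + skill_premium)
    return low_income / (low_income + high_income)


def premium_from_mincer(return_per_year: float, years: float, compound: bool = False) -> float:
    """Skill premium implied by a per-year schooling return.
    compound=False adds the yearly returns; compound=True compounds them."""
    if compound:
        return (1.0 + return_per_year) ** years - 1.0
    return return_per_year * years


premium = premium_from_mincer(0.10, 5)        # simple reading: 5 years at 10% each
alpha = low_skilled_income_share(premium)      # income share of the low-skilled
print(f"premium = {premium:.2f}, alpha = {alpha:.2f}")
```

Under the compounded reading the premium would be somewhat above 50%, and the implied $\alpha$ correspondingly a little below 0.4.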
It should be noted that uncompensated elasticities determine the general equilibrium impact of taxes. The change in relative supplies determines the effect on wages. This change in relative supply is correctly measured only with uncompensated, and not compensated, changes.¹⁴
... constitutes a smaller part of total income for wealthier individuals, and $\frac{\partial (l_n/(1-l_n))}{\partial n} > 0$. The latter can be proven by taking the differential $\frac{\partial (l_n/(1-l_n))}{\partial n} = \frac{\partial (w_n l_n/(w_n - w_n l_n))}{\partial n} = \frac{l_n \frac{\partial w_n}{\partial n} + w_n \frac{\partial l_n}{\partial n}}{(w_n - w_n l_n)^2} > 0$. This term is positive under standard monotonicity (or 'single-crossing') conditions, which ensure that gross incomes (and consumption) increase in ability, i.e., $\frac{\partial w_n l_n}{\partial n} = l_n \frac{\partial w_n}{\partial n} + w_n \frac{\partial l_n}{\partial n} > 0$, see Mirrlees (1971). Allowing for human capital formation does not qualitatively affect this derivation.
¹³ Using identical utility functions, Allen (1982) found that general equilibrium effects may result in both lower and higher optimal linear income tax rates compared to the case with exogenous wages. Higher optimal linear taxes can be found if income effects in labor supply are sufficiently strong. Even when skilled and unskilled workers have identical preferences, income effects can drive the uncompensated wage elasticity of labor supply of skilled workers below the uncompensated labor supply elasticity of unskilled workers, and general equilibrium effects work in favor of redistribution.
¹⁴ The current model ignores income effects, and compensated and uncompensated elasticities coincide. Whether this is an omission remains an open empirical question. Estimates of the income elasticity of labor supply vary substantially, and there appears to be no consensus in the literature on its size, see also Blundell and MaCurdy (1999), and Gruber and Saez (2003).

Optimal education policies in general equilibrium
This section analyzes the case where the government has only education subsidies at its disposal and does not have access to an income tax ($t = 0$). This is not the relevant case to describe the real world, but it helps to derive the intuition for the results in the next section, where both tax and education policies are simultaneously optimized. Again, education subsidies are first derived in the absence of general equilibrium effects; the optimal subsidy is given by (25) (see Appendix), where $\bar{\varepsilon}^{es} \equiv (1-\alpha)\varepsilon^{es}_1 + \alpha\varepsilon^{es}_2$ is the income-weighted learning elasticity with respect to the subsidy ($\varepsilon^{es}_n \equiv \frac{\partial e_n}{\partial s}\frac{1-s}{e_n} = \frac{1}{\mu_n}$). Since the optimal subsidy is negative, education is taxed if the government has no other means to redistribute incomes. The formula again stresses the trade-off between equity (numerator) and efficiency (denominator). The more education taxes yield distributional gains (larger $\xi$), the larger are taxes on education. If education decisions are more elastic to taxes, $\bar{\varepsilon}^{es}$ increases and optimal taxes on education are lower as a consequence.
In the presence of general equilibrium effects, optimal subsidies on education are derived in the Appendix. The difference with the optimal subsidy without general equilibrium effects (25) is again a term containing the general equilibrium impact of the education subsidy. The intuition of the optimal income tax carries over to the present case. If wage elasticities of labor supply are equal ($\varepsilon_1 = \varepsilon_2$, so that $\mu_1 = \mu_2$), education subsidies will not provoke general equilibrium effects, since relative supplies of human capital are not affected. Similarly, perfect substitution between labor types ($\sigma = \infty$) gives the same result as the expression without general equilibrium effects.
Taxes on human capital formation are lower if imperfect substitution on the labor market is important (low $\sigma$), and if the labor supply elasticity of the high-ability worker is higher than that of the low-ability worker, i.e., $\varepsilon_1 > \varepsilon_2$. The latter condition is necessary to have education subsidies working in favor of redistribution through general equilibrium effects. Indeed, one may get the result that, even in the absence of an income tax, education should be subsidized for distributional reasons if the adverse distributional effects $\xi$ are outweighed by the positive general equilibrium effects on wages. If shares of both types of workers are equal ($g_1 = g_2$), an explicit condition for this can be derived, in which the term $2\alpha(1-\alpha)$ appears. We can infer that with $\sigma = 1.5$ and $\alpha = 0.4$, a difference in subsidy elasticities of education equal to 0.31 is sufficient to provoke such strong general equilibrium effects that education should be subsidized rather than taxed.
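The figure of 0.31 quoted above can be retraced numerically. The sketch below is only a plausibility check under a loudly stated assumption: it supposes that, with equal shares, $\eta = 1/2$ and $\xi = (\omega_2 - \omega_1)(\frac{1}{2} - \alpha)$, the threshold on the elasticity difference takes the form $\sigma(\frac{1}{2} - \alpha) / (2\alpha(1-\alpha))$. That form is a guess chosen because it is consistent with the numbers in the text; it is not a formula taken from the paper, and the function name is hypothetical.

```python
def subsidy_threshold(sigma: float, alpha: float) -> float:
    """Assumed threshold on the difference in subsidy elasticities of education
    above which education would be subsidized rather than taxed.
    Assumption (not from the paper): threshold = sigma * (1/2 - alpha) / (2 * alpha * (1 - alpha))."""
    return sigma * (0.5 - alpha) / (2.0 * alpha * (1.0 - alpha))


threshold = subsidy_threshold(sigma=1.5, alpha=0.4)
print(f"assumed threshold: {threshold:.4f}")
```

Under this assumed form, the threshold rises with $\sigma$, so a higher elasticity of substitution makes a subsidy harder to justify, in line with the discussion of weak general equilibrium effects.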

Optimal tax and education policies in general equilibrium
When the government simultaneously optimizes tax and education policies, the optimal taxes and subsidies follow from solving the first-order conditions for $t$ and $s$. The general solution for optimal education subsidies in the presence of general equilibrium effects is derived in the Appendix. The government sets the subsidies on education to zero. Therefore, educational investments are weakly efficient, i.e., efficient conditional upon distorted labor supply, in the presence of an optimal income tax. The intuition is the same as in Bovenberg and Jacobs (2005) and Jacobs and Bovenberg (2007). The government may use education subsidies as an implicit tax on leisure, because education and work effort are complementary. At the same time, educational investments generate inequality due to the ability bias in education, i.e., high-ability individuals invest more in education than low-ability individuals (ceteris paribus). The constant elasticity in the human capital production function ensures that there is a linear relation between $e_n$ and $z_n$. Hence, subsidizing $e_n$ is equivalent to subsidizing $z_n$. Education subsidies do not reduce the labor supply distortion by more than an equally costly reduction in income taxes. However, besides distorting labor supply, subsidies on education also distort human capital investment. This distortion in education (over-investment) can be avoided by not subsidizing education. Therefore, the government does not want to use education subsidies to reduce the tax wedge on labor supply (Jacobs and Bovenberg, 2007).
The government is also indifferent between education taxes and income taxes to redistribute income. Because education is linear in income, any redistribution that education taxes can achieve, can be achieved equally as well with income taxes. Indeed, the distributional characteristics of income (ξ z ) and education (ξ e ) are equal. While both taxes on income and education reduce labor effort, taxing education e n additionally causes under-investment in human capital. The income tax does not directly distort human capital investment, because costs and benefits of education are equally affected by the marginal tax rate. The government can therefore avoid distortions in human capital accumulation by using the income tax instead of taxes on education to redistribute incomes (Jacobs and Bovenberg, 2007).
General equilibrium effects do not change the logic of these arguments. Even when general equilibrium effects on wages are present, there is still a linear relationship between earnings and education. Hence, education subsidies remain distributionally equivalent to income taxes, and education subsidies still cannot be used as implicit taxes on leisure. Optimal subsidies thus ensure conditional efficiency in human capital formation. We like to speak of conditional efficiency, because labor supply is still distorted below first-best levels. As the utilization rate of human capital is lower, so are the returns to human capital investments, and education is below first-best levels.
It follows from this discussion that education subsidies should not be employed to compress the wage distribution in the optimal redistributive program. This proves the robustness of Bovenberg and Jacobs (2005) in the presence of general equilibrium effects on wages. The optimal linear income tax with optimal education policies remains as in (27). Hence, the reader is referred to the earlier discussion.

Non-linear taxes and subsidies
This section derives the optimal non-linear tax and education policies. We can now allow for general utility and production functions. We do nevertheless maintain the assumption that the earnings function is weakly separable in ability, labor, and education. This ensures that optimal education subsidies are zero in the absence of general equilibrium effects on wages (Jacobs and Bovenberg, 2007). We also provide simulations of the optimal non-linear taxes and subsidies, while maintaining the same structure on preferences and technologies as in the linear case.
The optimal non-linear tax and subsidy rates are found by deriving the optimal second-best allocation of consumption, gross income and education, as in Stern (1982) and Stiglitz (1982). By using the first-order conditions for individual optimization, we can compute the optimal marginal income taxes and optimal marginal education subsidies that would decentralize the optimal second-best allocation.
Any solution to the optimal second-best problem has to satisfy the incentive compatibility constraints, which state that each individual obtains weakly higher utility from the bundle $(c_n, z_n, e_n)$ of consumption, gross income and education intended for them than from the alternative bundle $(c_m, z_m, e_m)$ intended for the other type. That is, $U_n(c_n, z_n, e_n, n) \geq U_n(c_m, z_m, e_m, n)$, $n, m = 1, 2$, where $U_n(c_n, z_n, e_n, n) \equiv u_n\left(c_n, \frac{z_n}{w_n a_n \phi(e_n)}\right) = u_n(c_n, l_n)$. In the current model, the two incentive compatibility constraints are given by
$$u_1(c_1, l_1) = u_1\left(c_1, \frac{z_1}{w_1 a_1 \phi(e_1)}\right) \geq u_1\left(c_2, \frac{z_2}{w_1 a_1 \phi(e_2)}\right) = u_1\left(c_2, \frac{w_2 a_2}{w_1 a_1} l_2\right), \quad (29)$$
$$u_2(c_2, l_2) = u_2\left(c_2, \frac{z_2}{w_2 a_2 \phi(e_2)}\right) \geq u_2\left(c_1, \frac{z_1}{w_2 a_2 \phi(e_1)}\right) = u_2\left(c_1, \frac{w_1 a_1}{w_2 a_2} l_1\right). \quad (30)$$
The government maximizes the social welfare function (12) subject to the economy's resource constraint (11) and the incentive compatibility constraint (29). Under normal circumstances, the second incentive constraint (30) is not binding at an optimal solution, and it will be ignored in the remainder (see also Stiglitz, 1982, and Stern, 1982). Assuming for simplicity that the price of a unit of education is one, and that the numbers of high-ability and low-ability persons are both equal to one, the following Lagrangian for the maximization of social welfare is formulated:¹⁵
$$\max_{\{c_1, l_1, e_1, c_2, l_2, e_2\}} \mathcal{L} = \omega_1 u_1(c_1, l_1) + \omega_2 u_2(c_2, l_2) + \dots \quad (31)$$
where the inverse of the skill premium is denoted by $\pi^{-1}$, see (9). The conditions for labor market equilibrium (10) are substituted. To avoid confusion in notation, utility of the high-skilled type mimicking the low-skilled type is designated by $u^*_1 \equiv u_1\left(c_2, \frac{w_2 a_2}{w_1 a_1} l_2\right)$. $\theta$ is the Lagrange multiplier on the incentive compatibility constraint, and measures the marginal social value of redistributing income from the high-skilled to the low-skilled worker.
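The mimicking logic can be made concrete with a small sketch. It computes the labor time a high-ability worker needs in order to earn the low-ability worker's gross income, $\frac{w_2 a_2}{w_1 a_1} l_2$, and the slack in the corresponding incentive compatibility constraint. The utility function is the iso-elastic form used later in the simulations; the wage rates, consumption levels and function names are hypothetical values chosen only for illustration.

```python
def utility(c: float, l: float, eps: float = 0.25, delta: float = 10.0) -> float:
    """Iso-elastic utility c - delta * l**(1 + 1/eps) / (1 + 1/eps)."""
    power = 1.0 + 1.0 / eps
    return c - delta * l ** power / power


def mimic_labor(l_other: float, w_other: float, a_other: float,
                w_self: float, a_self: float) -> float:
    """Labor time needed to earn the other type's gross income at the same education level."""
    return (w_other * a_other) / (w_self * a_self) * l_other


def ic_slack(c_self, l_self, c_other, l_other,
             w_self, a_self, w_other, a_other, eps=0.25, delta=10.0) -> float:
    """Own-bundle utility minus utility from mimicking; non-negative if the constraint holds."""
    l_mimic = mimic_labor(l_other, w_other, a_other, w_self, a_self)
    return utility(c_self, l_self, eps, delta) - utility(c_other, l_mimic, eps, delta)


# Hypothetical allocation: the high type (index 1) considers mimicking the low type (index 2).
wide = ic_slack(1.0, 0.5, 0.8, 0.5, w_self=1.2, a_self=10.0, w_other=1.0, a_other=5.0)
# Same allocation with a compressed skill premium (w1 falls from 1.2 to 1.0).
narrow = ic_slack(1.0, 0.5, 0.8, 0.5, w_self=1.0, a_self=10.0, w_other=1.0, a_other=5.0)
print(wide, narrow)
```

In this illustration, compressing the skill premium raises the labor time required for mimicking, which increases the slack in the constraint while the allocation is held fixed.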
The first-order conditions for an optimal allocation are given by (32)-(37). From the first-order conditions for $c_1$ (32) and $l_1$ (33) follows the marginal tax rate on the high-ability type, $T_1 \equiv 1 + \frac{1}{w_1 a_1 \phi(e_1)}\frac{\partial u_1/\partial l_1}{\partial u_1/\partial c_1}$, which is given by (38) (see Appendix).¹⁶ Consequently, the marginal top rate is exploited for redistribution with general equilibrium effects on wages. Indeed, the general expression of the optimal marginal tax on the skilled type is almost identical to the one without human capital formation, see Stern (1982) and Stiglitz (1982). The basic intuition of these papers carries over to the present case. A marginal subsidy on work for the high-ability type increases the relative supply of skilled human capital, and lowers the skill premium $\pi$. Hence, the utility costs of mimicking the low-ability type increase, as it takes the skilled type more labor time to mimic the unskilled type. In the absence of general equilibrium effects on wages ($\sigma = \infty$), there is no distortion at the top, and the marginal tax rate is zero, see also Seade (1977) and Mirrlees (1971). Further, if the government is not interested in redistribution, the marginal social value of transferring incomes from high-ability to low-ability workers, $\theta$, is zero, and optimal marginal taxes are zero as well.
Manipulation of the first-order conditions for $c_2$ (35) and $l_2$ (36) yields the marginal tax on labor of the low-ability type, $T_2 \equiv 1 + \frac{1}{w_2 a_2 \phi(e_2)}\frac{\partial u_2/\partial l_2}{\partial u_2/\partial c_2}$, which is given by (39) (see Appendix). In the absence of general equilibrium effects, the marginal tax rate on the low-ability type is unambiguously positive, since the top rate is zero ($T_1 = 0$) in that case (see Mirrlees, 1971; Stiglitz, 1982). The presence of general equilibrium effects (lower $\sigma$) increases the marginal tax rate $T_2$ on the low type (recall $T_1 < 0$). This is intuitive, since a higher marginal tax rate on the low type reduces their labor supply, and therefore results in more before-tax wage equality. Hence, the high type is less tempted to mimic the low type, which relaxes the incentive compatibility constraint, and therefore results in more redistribution.
Optimal education subsidies for the high type are derived by combining the first-order condition for $e_1$ (34) with the optimal marginal income tax for skilled workers (38). We then find that the optimal marginal education subsidy for high-ability workers, $S_1 \equiv 1 - w_1 a_1 \phi'(e_1) l_1$, is positive (see Appendix). The intuition is that a marginal subsidy on human capital investment of the high type (like the marginal subsidy on work) lowers the skill premium, and makes it more costly for the high-ability type to mimic the low-ability type. Similarly, we find the optimal subsidy on education for the low-ability type, $S_2 \equiv 1 - w_2 a_2 \phi'(e_2) l_2$, from the first-order condition for $e_2$ (37) and the tax rate (39) (see Appendix). It implies that education is taxed on the margin for the low-ability type. Again, the mechanism is that general equilibrium effects relax the incentive compatibility constraint, and the government can redistribute more income.
Note that the expressions for the marginal education subsidies are all directly related to the top rate. Indeed, subsidies or taxes on education are larger if the top rate is lower, i.e., when general equilibrium effects are more important (lower σ), and if the government wishes to redistribute incomes more heavily (larger θ). If labor types are perfect substitutes (σ = ∞), the optimal marginal education subsidies for both high and low-skilled workers are zero, i.e., S 1 = S 2 = 0, cf. Bovenberg and Jacobs (2005).
General equilibrium effects on the skill premium should indeed be exploited for redistribution under non-linear policies, in contrast to optimal linear policies. The intuition is that, by using non-linear instruments, the government can directly influence the skill premium $\pi(H_1/H_2)$ by setting different marginal tax and subsidy rates for each worker, as long as the policy remains incentive compatible. This holds true irrespective of preferences or technologies. Hence, by optimally giving marginal subsidies on the high-ability type and marginal taxes on the low-ability type, the skill premium falls, the incentive compatibility constraint is relaxed, and the policy achieves more income redistribution.
The 'tax-formula result' of Saez (2004) is not applicable with non-linear policy instruments either, as the production elasticity (i.e., $\sigma$) enters optimal tax expressions. Further, indirect taxes/subsidies, such as education subsidies, are not optimally zero under non-linear income taxation with weakly separable preferences. This bolsters the findings by Naito (1999, 2004, 2007), who investigated the desirability of non-zero commodity taxes in similar general equilibrium settings with human capital formation and comparative advantage. In the current model, education should optimally be taxed or subsidized under non-linear income taxation to exploit factor price changes for redistribution, and the Atkinson and Stiglitz (1976) theorem does not apply to education subsidies. In the absence of general equilibrium effects, education would not be taxed or subsidized on a net basis, and would be weakly efficient, i.e., efficient conditional upon distorted labor supply.
To check whether general equilibrium effects can be quantitatively important for optimal non-linear tax and education policies, we simulate the model following Stern (1982). We resort to the standard iso-elastic utility function with a constant wage elasticity of labor supply, which is augmented with a preference parameter $\delta$ to calibrate the disutility of labor supply: $u_n(c_n, l_n) \equiv c_n - \delta\frac{l_n^{1+1/\varepsilon_n}}{1+1/\varepsilon_n}$, $n = 1, 2$.
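To see why this utility function delivers a constant wage elasticity, the sketch below solves the implied first-order condition for hours, $W = \delta l^{1/\varepsilon}$, given a net wage $W$ per hour, and then measures the elasticity numerically. The function names and the example wage values are assumptions made for illustration; the net wage is taken as given here, whereas in the model it depends on ability, education and the wage rate.

```python
import math


def labor_supply(net_wage: float, eps: float = 0.25, delta: float = 10.0) -> float:
    """Hours l that maximize net_wage * l - delta * l**(1 + 1/eps) / (1 + 1/eps)."""
    return (net_wage / delta) ** eps


def wage_elasticity(net_wage: float, eps: float = 0.25, delta: float = 10.0,
                    step: float = 1e-6) -> float:
    """Numerical d ln(l) / d ln(W), using a central difference in logs."""
    up = labor_supply(net_wage * math.exp(step), eps, delta)
    down = labor_supply(net_wage * math.exp(-step), eps, delta)
    return (math.log(up) - math.log(down)) / (2.0 * step)


for wage in (1.0, 3.0, 9.0):  # hypothetical net wages per hour
    print(wage, labor_supply(wage), wage_elasticity(wage))
```

The measured elasticity equals `eps` at every wage level, and with `delta = 10` hours stay below one as long as the net wage per hour is below ten.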
In the baseline, we set the wage elasticities of labor supply equal for both types at $\varepsilon_1 = \varepsilon_2 = 0.25$. The preference parameter for leisure is calibrated at $\delta = 10$, so as to keep labor effort of both types between 0 and 1. The production function for human capital is Cobb-Douglas, $a_n\phi(e_n) \equiv a_n e_n^{\beta}$, with an elasticity $\beta = 0.25$. The ability parameters are set at $a_1 = 10$ and $a_2 = 5$. Hence, the high-ability worker is twice as 'smart' as the low-ability worker. The aggregate production function features a constant elasticity of substitution between skilled and unskilled labor: $Y = (\gamma H_1^{\rho} + (1-\gamma)H_2^{\rho})^{1/\rho}$, with $\sigma \equiv \frac{1}{1-\rho}$ and $H_n \equiv a_n\phi(e_n)l_n$. The elasticity of substitution between high- and low-skilled labor is set at $\sigma = 1.5$, see Katz and Autor (1999). The share parameter is calibrated at $\gamma = 0.59$ to get an income share of skilled labor in total output of $1 - \alpha = 0.67$ in the absence of government intervention (see also Stern, 1982). The baseline welfare weights in the social welfare function are $\omega_1 = 1 - \omega_2 = 0.25$. Setting $\omega_1 = \omega_2 = 0.5$ corresponds to a utilitarian criterion, which is non-redistributive because the private marginal utility of income is unity. $\omega_1 = 1 - \omega_2 = 0$ corresponds to a Rawlsian social welfare function. The simulation results are given in table 1. Calculations by Stern (1982) in models without endogenous human capital formation reveal that general equilibrium effects have little impact on simulated optimum tax rates. We largely confirm this. The top rate is indeed only slightly negative, even with endogenous learning.
Notes to table 1: Utility is given by $u_n(c_n, l_n) \equiv c_n - \delta\frac{l_n^{1+1/\varepsilon_n}}{1+1/\varepsilon_n}$. The production function for human capital is $a_n\phi(e_n) \equiv a_n e_n^{\beta}$. Output is $Y = (\gamma H_1^{\rho} + (1-\gamma)H_2^{\rho})^{1/\rho}$, $\sigma \equiv \frac{1}{1-\rho}$, $H_n \equiv a_n\phi(e_n)l_n$. Social welfare is $\omega_1 u_1 + \omega_2 u_2$. The baseline parameters are: $\varepsilon_1 = \varepsilon_2 = 0.25$, $\delta = 10$, $\beta = 0.25$, $a_1 = 10$, $a_2 = 5$, $\sigma = 1.5$, $\gamma = 0.59$, $\omega_1 = 1 - \omega_2 = 0.25$.
Since the expressions for optimal education subsidies are all directly related to the top-rate, we see that the general equilibrium impact on optimum non-linear subsidies is rather limited as well.
As expected, the optimal tax expressions are most sensitive to changes in the elasticity of substitution $\sigma$. For different elasticities $\sigma$, we calibrate the parameter $\gamma$ so as to keep the income share of skilled labor fixed at $1 - \alpha = 0.67$ in the absence of government intervention.¹⁷ We find that marginal education subsidies and taxes differ substantially from zero when the elasticity of substitution between skilled and unskilled labor falls below unity, which is empirically less plausible.
Further, the elasticity of high-ability labor supply $\varepsilon_1$ is important in explaining the pattern of marginal taxes and subsidies. It is a crucial determinant of the incentive compatibility constraint. The higher this elasticity, the more difficult it is for the high type to mimic the low type, and the larger are marginal subsidies on high-ability labor supply and education. The table shows that the main results are not driven by the labor supply elasticity of the low type, nor by the human capital elasticity, the social welfare function, or the government revenue requirement.
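The calibration of $\gamma$ described above can be retraced in closed form for the case without government intervention. The sketch below assumes a common labor supply elasticity for both types (as in the baseline), a unit price of education, and the individual first-order conditions $w_n a_n \phi'(e_n) l_n = 1$ and $w_n a_n \phi(e_n) = \delta l_n^{1/\varepsilon}$ that hold when marginal taxes and subsidies are zero. Under these assumptions relative earnings do not depend on $\delta$. The function names are hypothetical and the derivation is ours rather than the paper's.

```python
import math


def _earnings_exponent(eps: float, beta: float) -> float:
    """Exponent p in z_n proportional to (w_n * a_n)**p; requires beta * (1 + eps) < 1."""
    return (1.0 + eps) / (1.0 - beta * (1.0 + eps))


def skilled_income_share(gamma: float, sigma: float, eps: float = 0.25,
                         beta: float = 0.25, a1: float = 10.0, a2: float = 5.0) -> float:
    """Laissez-faire income share of skilled labor for a CES technology."""
    p = _earnings_exponent(eps, beta)
    log_k = math.log(gamma / (1.0 - gamma))
    # Relative supply h = H1/H2 solves k * h**(1 - 1/sigma) = (k * h**(-1/sigma) * a1/a2)**p.
    log_h = ((p - 1.0) * log_k + p * math.log(a1 / a2)) / (1.0 + (p - 1.0) / sigma)
    # Relative earnings z1/z2 = (w1/w2) * h = k * h**(1 - 1/sigma).
    ratio = math.exp(log_k + (1.0 - 1.0 / sigma) * log_h)
    return ratio / (1.0 + ratio)


def calibrate_gamma(target_share: float, sigma: float, eps: float = 0.25,
                    beta: float = 0.25, a1: float = 10.0, a2: float = 5.0) -> float:
    """Share parameter gamma that delivers the target skilled income share."""
    p = _earnings_exponent(eps, beta)
    d = 1.0 + (p - 1.0) / sigma
    log_ratio = math.log(target_share / (1.0 - target_share))
    log_k = d / p * log_ratio - (1.0 - 1.0 / sigma) * math.log(a1 / a2)
    k = math.exp(log_k)
    return k / (1.0 + k)


print(skilled_income_share(0.59, 1.5))   # should be close to the reported 0.67
print(calibrate_gamma(0.67, 1.5))        # should be close to the reported 0.59
```

The same `calibrate_gamma` call with other values of `sigma` mimics the recalibration used in the sensitivity analysis, again only under the assumptions listed above.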

Conclusions
This paper analyzed the simultaneous setting of optimal linear and non-linear income taxes and education subsidies in two-type models with endogenous labor supply, endogenous human capital formation, and endogenous wage rates. To investigate the potential role of general equilibrium effects in shaping optimal linear tax and education policies, we ensured that optimal subsidies on education are zero in the absence of general equilibrium effects on wages. This required weakly separable earnings functions, and, for linear instruments only, a constant elasticity in the human capital production function. For linear education subsidies to work in favor of redistribution, we further assumed that labor supply of the high-ability ('skilled') type is more elastic than labor supply of the low-ability ('unskilled') type. Linear taxes and subsidies cannot -by assumption -affect the skill premium if all elasticities are equal.
We showed that optimal linear education subsidies are zero, even if linear tax and education policies have the potential to provoke equilibrium effects on wage rates. The intuition is that linear income taxes are distributionally equivalent to (negative) linear education subsidies, and more efficient because income taxes do not generate distortions in human capital formation, whereas linear subsidies cause over-investment. This holds true whether general equilibrium effects are present or not. The optimal linear income tax is, however, lowered due to general equilibrium effects on wages if skilled labor supply is more elastic than unskilled labor supply. A higher income tax increases the skill premium, because skilled labor supply falls more than unskilled labor supply. These general equilibrium effects work against the direct redistributional gains of a higher income tax rate. The optimal linear income tax rate may even turn negative when general equilibrium effects on wages are sufficiently strong.
The results for optimal non-linear policies are found to be fundamentally different. With non-linear instruments, the government can directly affect the relative supply of skilled human capital using specific instruments, such as marginal subsidies on human capital or labor supply of the high type, and marginal taxes on human capital or labor supply of the low type. Consequently, one does not need to impose restrictions on preferences or the production function for human capital to obtain an impact of non-linear policies on the skill premium. The nonlinearity of tax and subsidy schedules is sufficient. Optimal non-linear policies do exploit general equilibrium effects on wages for redistributional purposes. The skilled worker optimally faces marginal subsidies on both work effort and education, whereas the unskilled worker optimally faces marginal taxes on work and education. As a result, wage differences are reduced, and the incentive compatibility constraint is relaxed, because the skilled worker finds it harder to mimic an unskilled worker. However, simulations of optimal non-linear policies revealed that the impact of general equilibrium effects on optimal policies is modest. Only at low levels of the elasticity of substitution between skilled and unskilled labor, marginal subsidies (taxes) on skilled (unskilled) education, and marginal subsidies on skilled work are found to be substantially positive.
In future research, the current analysis can be cast in a model with two production sectors, each exhibiting different factor intensities, so as to investigate the desirability of aggregate production efficiency, and the optimality of zero commodity taxation. Our conjecture is that deviations from aggregate production efficiency and non-zero commodity taxes will be optimal, as Naito (1999, 2004, 2007) has demonstrated in similar settings, but without education policies. However, it remains unclear how optimal education policies will be affected. The current two-type analysis of non-linear income taxation may also be extended to a setting with a continuum of skill types in order to further investigate how factor prices should be exploited for redistribution under non-linear income taxation with more realistic skill distributions, as in, for example, Diamond (1998) and Saez (2001). Nevertheless, the expressions for the incentive compatibility constraints reveal that each of them is dependent on the entire wage distribution, and therefore on the actions of all other agents. As a consequence, the set of incentive compatibility constraints cannot easily be collapsed into a single differential equation on utility, as in Mirrlees (1971), and one needs to resort to numerical simulations. Our results under linear income taxation will also change when more general utility or earnings functions are used to analyze the importance of income effects, and to allow for the possibility that education has a varying degree of complementarity with work effort (see Jacobs and Bovenberg, 2007). Also, extensions with imperfections in labor markets due to, for example, minimum wages, search frictions, unions, and efficiency wages may be interesting avenues for future research.

Optimal non-linear tax policies
The optimal marginal tax rate on the high-ability type follows from multiplying the first-order condition for $c_1$ (32) by $\frac{\partial u_1/\partial l_1}{\partial u_1/\partial c_1}$, and substituting the result in (33). Rearranging while using the definition $T_1 \equiv 1 + \frac{1}{w_1 a_1 \phi(e_1)}\frac{\partial u_1/\partial l_1}{\partial u_1/\partial c_1}$ yields an expression for $T_1$ involving $\theta/\eta$ and $\partial\pi^{-1}/\partial H_1$. Next, derive that $\frac{\partial \pi^{-1}}{\partial H_1} = \frac{1}{\sigma}\frac{\alpha}{1-\alpha}\frac{1}{H_2}$ by differentiating the skill premium, where the property is used that the labor demand function is homogeneous of degree zero (21). Substitution of the last result yields the equation in the text.
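The derivative of the inverse skill premium stated above can be checked numerically for the CES technology used in the simulations. This is a sketch for that special case only, with hypothetical function names and input values; it compares a central finite difference of $\pi^{-1}$ with the closed form $\frac{1}{\sigma}\frac{\alpha}{1-\alpha}\frac{1}{H_2}$, where $\alpha$ is the income share of unskilled labor.

```python
def inverse_skill_premium(h1: float, h2: float, gamma: float, sigma: float) -> float:
    """w2 / w1 for Y = (gamma*H1**rho + (1-gamma)*H2**rho)**(1/rho), sigma = 1/(1-rho)."""
    return (1.0 - gamma) / gamma * (h1 / h2) ** (1.0 / sigma)


def unskilled_income_share(h1: float, h2: float, gamma: float, sigma: float) -> float:
    """alpha = w2*H2 / (w1*H1 + w2*H2) under the same CES technology."""
    rho = 1.0 - 1.0 / sigma
    skilled = gamma * h1 ** rho
    unskilled = (1.0 - gamma) * h2 ** rho
    return unskilled / (skilled + unskilled)


def analytic_derivative(h1: float, h2: float, gamma: float, sigma: float) -> float:
    """Closed form (1/sigma) * alpha/(1-alpha) * (1/H2)."""
    alpha = unskilled_income_share(h1, h2, gamma, sigma)
    return alpha / (1.0 - alpha) / (sigma * h2)


def numeric_derivative(h1: float, h2: float, gamma: float, sigma: float,
                       step: float = 1e-6) -> float:
    """Central finite difference of the inverse skill premium with respect to H1."""
    up = inverse_skill_premium(h1 + step, h2, gamma, sigma)
    down = inverse_skill_premium(h1 - step, h2, gamma, sigma)
    return (up - down) / (2.0 * step)


# Hypothetical supplies of skilled and unskilled human capital.
print(analytic_derivative(2.0, 1.0, 0.59, 1.5), numeric_derivative(2.0, 1.0, 0.59, 1.5))
```

As $\sigma$ grows large the derivative goes to zero, which is the numerical counterpart of the statement that the general equilibrium term vanishes under perfect substitution.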
Repeating the same procedure with the first-order conditions for c 2 (35) and for l 2 (36) yields the marginal tax on unskilled labor income T 2 ≡ 1 +