1 Introduction

According to the seminal contribution of Atkinson and Stiglitz (1976), if the income tax is allowed to be optimally nonlinear, commodity taxes are a redundant policy instrument when preferences are separable between leisure and other goods and individuals only differ in their innate market ability (skill). If the separability condition is not satisfied and the desired direction of redistribution goes from higher- to lower-skilled agents, one should use commodity taxes and subsidies to discourage the consumption of goods/services that are substitutes with labor supply and encourage the consumption of goods/services that are complements with labor supply.Footnote 1

There is now a sizable literature emphasizing that, in Mirrleesian income tax settings, subsidizing (or publicly providing) goods or services that are consumed in conjunction with labor supply can be welfare-enhancing. The argument is that, by subsidizing (or subjecting to a relatively more lenient tax treatment) the purchase of goods or services that are complements with labor supply, one can slacken the binding incentive constraints faced by a government designing a nonlinear income tax for redistributive purposes.Footnote 2 These constraints arise because the government does not directly observe an individual’s earning ability (skill), and therefore cannot levy taxes or transfers that are directly conditioned on innate ability. Instead, the government pursues its redistributive goals by means of an anonymous nonlinear income tax schedule, i.e., it designs a menu of combinations of gross incomes and taxes, and hence disposable income, and lets agents choose their preferred income point. To achieve redistribution, taxes on low-income earners must be lower than on high-income earners, and conceivably negative. High-skilled agents might then find it attractive to mimic low-skilled agents by lowering their labor supply and earning an income qualifying for a lower tax burden. The tax schedule must then be designed in such a way that mimicking is deterred or, in other words, that it satisfies the incentive-compatibility constraints (self-selection constraints) requiring that each agent has no incentive to choose a point other than the one intended for his/her skill type on the income tax schedule set by the government.

One question that has so far been neglected in the literature is whether the welfare-enhancing effects from subsidizing work-related goods/services hinge on the ability of the government to optimize a fully nonlinear income tax, or whether similar welfare gains can be obtained in settings where the government relies on less sophisticated income tax systems like the ones that we typically observe in real economies.Footnote 3 The aim of this paper is to fill this gap in the literature by focusing on child care services as a paradigmatic example of goods/services that are complements with labor supply.Footnote 4 Besides the fact that real-world governments do not typically tax income on a fully nonlinear scale, one reason why the question that we address is interesting is that it is in principle ambiguous whether the effectiveness of subsidies to work-related goods as a welfare-enhancing instrument is increasing or decreasing in the degree of sophistication of the underlying income tax schedule. On the one hand, as the income tax becomes more sophisticated, social welfare increases and it becomes more difficult to reap welfare gains by supplementing income taxation with other policy instruments. On the other hand, as the income tax becomes more sophisticated (flexible), it becomes easier for the government to offset, for each agent, the distortionary effects generated by subsidizing the work-related good.

We employ a fairly canonical model where a subset of agents need child care services in order to work, and evaluate the welfare gains from subsidizing this work-related good under income tax schedules that exhibit different degrees of sophistication. In particular, we consider (1) a linear tax system, (2) a two-bracket piecewise linear income tax, (3) a four-bracket piecewise linear income tax, and (4) a fully nonlinear income tax.Footnote 5 We partition the population into “users” (parents) and “nonusers” (non-parents) of the work-related good, and based on the assumption that the attribute (parental status) identifying “users” is publicly observable, we allow the government to select different tax schedules for each group. This allows us to see how the presence of a need for a work-related consumption good affects optimal marginal income taxes, and how the optimal tax schedules change as a subsidy for the work-related good is introduced.Footnote 6 To simplify the analysis, we focus on the case where for each unit of labor supply, one unit of the work-related good needs to be acquired by “users.” In other words, we assume that parents need one hour of child care services for every hour of market work.

Our contribution is mainly quantitative. We present numerical simulations comparing the welfare gains of subsidies for work-related goods under different income tax systems. We characterize individual behavior based on an empirically relevant labor supply model, and we use administrative wage data from Sweden as our source of taxpayer heterogeneity.Footnote 7 The labor supply model adopts a quadratic utility function specification, inspired by the contributions by Stern (1986) and Tuomala (2010). This utility specification produces realistic labor supply behavior and is also computationally convenient as it admits a closed-form solution for the labor supply choice subject to a linear budget segment. This is especially practical when the government is optimizing piecewise linear tax schedules and the optimal choice of each individual needs to be calculated repeatedly, as in the nonlinear budget set procedure of Hausman (1979).Footnote 8 To compute the optimal fully nonlinear tax systems, we employ a specification with a large number of discrete types, following the simulation approach outlined in Bastani (2015), which also enables us to use exactly the same representation of the wage distribution in all our simulations.

Our results indicate that the effectiveness of subsidies to work-related goods as a welfare-enhancing instrument is increasing in the degree of sophistication of the underlying income tax schedule. While under a linear income tax, the magnitude of the welfare gains obtained by subsidizing the work-related good is negligible, the welfare gains that can be achieved by subsidizing the work-related good under a piecewise linear income tax amount to about the same as the gains that can be achieved by subsidizing the work-related good under an optimal nonlinear income tax. This finding enhances the policy relevance of the optimal tax argument in favor of providing subsidies to work-related goods. The optimal value of the subsidy rate on the work-related good is in general quite large and is also (weakly) increasing in the degree of sophistication of the underlying income tax schedule. Finally, our results also shed light on the relative welfare gains of employing piecewise linear rather than fully nonlinear income taxes, showing that piecewise linear taxes are able to reap a major part of the welfare gains associated with fully nonlinear income taxes.

The paper is organized as follows. In Sect. 2, we outline the nonlinear income tax problem, which serves as our theoretical benchmark. In Sect. 3, we present the government’s problem in the case of linear and piecewise linear tax structures. Section 4 describes our empirical calibration as well as the linear and piecewise linear optimal tax problems. Section 5 describes our results, and finally, Sect. 6 concludes.

2 The nonlinear income tax problem

Consider a setting where agents differ in terms of their labor productivity (wage rates) and their need for a work-related good (child care services). Those who need child care services in order to work are for simplicity labeled “parents” and those who do not need child care services are labeled “non-parents.”Footnote 9

We let Y denote the before tax labor income, given by the product between an agent’s wage rate w and labor supply h. We also make the standard assumption that the policy maker can observe Y but not w or h separately. This rules out first-best personalized lump-sum taxes and transfers but allows labor income to be taxed on a nonlinear scale. Given our focus on child care services as a primary example of a work-related good/service, and given that parental status is an individual characteristic that can reasonably be regarded as publicly observable, we also assume that parental status can be used as a tag in the optimal tax problem, i.e., parents and non-parents face two distinct nonlinear income tax schedules.

The wage rate of an agent of skill type i belonging to group \(j=p,np\), where p refers to parents and np refers to non-parents, is denoted by \( w^{i,j}\). Without loss of generality, we assume that agents are ordered in such a way that \(w^{1,j}<w^{2,j}<\cdots <w^{N,j}\), \(j=p,np\). The total population size is normalized to unity, and the proportion of a type ij-agent in the population is denoted by \(\pi ^{ij}\) and is known by the government. The (exogenous) per unit resource cost of child care services (which would be the price in a competitive market) is denoted by q. Non-parents do not need child care services. For parents, on the other hand, the demand for child care services is strictly related to the hours of work. Assuming that every parent has only one child, for every hour of work parents need one hour of child care services.Footnote 10 Child care services do not represent a good that enters the parents’ utility function directly; for them, it entails a real cost of working, a good which must be acquired in order to work. Thus, in an economy without taxes and public expenditure, the opportunity cost of leisure, which governs the agents’ decisions in an undistorted optimum, is equal to \( {\overline{w}}\equiv w-q\) and w for, respectively, parents and non-parents. All agents have identical preferences over consumption (net of expenditures on child care) c and hours of work h; these are represented by the utility function \(u(c,h)\), possessing the standard properties.

2.1 Work-related good not subsidized

Let us start with a characterization of the solution to the government’s problem when the work-related good is not subsidized. The government’s objective is to maximize a weighted sum of agents’ utilities. Based on the link between pre-tax earnings and post-tax earnings implied by the tax schedule that applies to them, agents choose labor supply to maximize their utility. This allows us to implicitly express the marginal tax rates faced by agents as \(T^{\prime }\left( Y\right) =1-\mathrm{MRS}\), where MRS denotes the marginal rate of substitution between gross labor income and consumption. Defining by \(B\equiv Y-T\left( Y\right) \) the after-tax income associated with gross labor income Y, the government’s problem can be equivalently stated as the problem of selecting bundles in the \(\left( Y,B\right) \)-space subject to a set of self-selection constraints and a public budget constraint. The self-selection constraints require that each agent (weakly) prefers the bundle intended for him/her rather than behaving as a mimicker by choosing a bundle intended for some other agent.

Given that consumption is determined for parents as \(C=B-qh=B-qY/w\) and for non-parents as \(C=B\), we can define the agents’ indirect utility at any given point in the \(\left( Y,B\right) \)-space as \(V^{i,j}\left( B,Y\right) =u\left( B-\mathbf {1}[j=p]qY/w^{i,j},Y/w^{i,j}\right) \) where \(\mathbf {1} [\cdot ]\) denotes an indicator function. The slope of individuals’ indifference curves in the \(\left( Y,B\right) \)-space is given by the MRS expression:

$$\begin{aligned} \mathrm{MRS}^{i,j}(B,Y)= & {} -\frac{V_{Y}^{i,j}}{V_{B}^{i,j}}=\frac{1}{w^{i,j}}\\&\times \left[ \mathbf {1}[j=p]q-\frac{\partial u\left( B-\mathbf {1}[j=p]q\frac{Y}{w^{i,j}}, \frac{Y}{w^{i,j}}\right) /\partial h}{\partial u\left( B-\mathbf {1}[j=p]q \frac{Y}{w^{i,j}},\frac{Y}{w^{i,j}}\right) /\partial c}\right] . \end{aligned}$$

As can be seen from the expression above, the presence of a need for the work-related good affects the shape of the parents’ indifference curves. As a consequence, and in contrast to what happens in models where agents differ only in terms of skills, (weak) normality of c is no longer a sufficient condition to ensure that, at any given point in the \(\left( Y,B\right) \)-space, the indifference curves are flatter the higher the wage rate of an agent. Notice however that, although this agent-monotonicity property does not hold for the population as a whole, it still holds within each of the two groups. Thus, as we are assuming that the government is optimizing separate tax schedules for parents and non-parents, it is sufficient to restrict attention to constraints linking pairs of adjacent types when formalizing the government’s problem.Footnote 11
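The MRS expression above can be checked numerically. The sketch below is only an illustration: it borrows the quadratic utility form used later in the quantitative model as an assumed functional form, and the function names and parameter values are hypothetical and chosen for readability, not taken from the calibration.

```python
def make_quadratic_utility(alpha, beta, gamma, delta, eps, J):
    """Return u(c, h) and its partial derivatives for an assumed quadratic form."""
    def u(c, h):
        leisure = J - h
        return (alpha * c**2 + beta * leisure**2 + gamma * c * leisure
                + delta * leisure + eps * c)

    def u_c(c, h):
        # Partial derivative with respect to consumption.
        return 2 * alpha * c + gamma * (J - h) + eps

    def u_h(c, h):
        # Partial derivative with respect to hours of work (leisure = J - h).
        return -2 * beta * (J - h) - gamma * c - delta

    return u, u_c, u_h


def mrs(B, Y, w, q, is_parent, u_c, u_h):
    """Slope of the indifference curve in (Y, B)-space at the point (B, Y)."""
    cost = q if is_parent else 0.0   # indicator 1[j = p] times q
    h = Y / w                        # hours needed to earn Y at wage w
    c = B - cost * h                 # consumption net of child care outlays
    return (cost - u_h(c, h) / u_c(c, h)) / w


# Illustrative (hypothetical) parameter values.
u, u_c, u_h = make_quadratic_utility(-1.0, -1.0, 0.1, 1.0, 10.0, 1.0)
B, Y, q = 2.0, 0.5, 0.3
print("non-parent, w=1.0:", mrs(B, Y, 1.0, q, False, u_c, u_h))
print("parent,     w=1.0:", mrs(B, Y, 1.0, q, True, u_c, u_h))
print("parent,     w=1.2:", mrs(B, Y, 1.2, q, True, u_c, u_h))
```

With these illustrative numbers the higher-wage parent has a steeper indifference curve than the lower-wage non-parent at the same point, while within the group of parents the higher wage still gives the flatter curve.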

Denote by \(\alpha ^{ij}\) the welfare weight used by the government for agents of type ij, with \(\sum _{ij}\alpha ^{ij}=1\). Furthermore, assume that the chosen welfare weights imply that, for each of the two tagged groups, the government wants to redistribute from higher- to lower-ability agents so that the only (potentially) binding self-selection constraints are those running downwards and linking pairs of adjacent types. Then, the problem solved by the government can be formally written as:

$$\begin{aligned} \underset{\left\{ B^{i,j},Y^{i,j}\right\} }{\max }\qquad \underset{i=1}{ \overset{N}{\sum }}\sum \limits _{j=p,np}\alpha ^{ij}V^{i,j}\left( B^{i,j},Y^{i,j}\right) \end{aligned}$$

subject to:

$$\begin{aligned} V^{i,j}(B^{i,j},Y^{i,j})\ge V^{i,j}(B^{i-1,j},Y^{i-1,j}),i\in \{2,...,N\}, \quad j\in \{p,np\}\,(\lambda ^{i,j}) \end{aligned}$$

and

$$\begin{aligned} \sum \limits _{i=1}^{N}\sum \limits _{j=p,np}\pi ^{ij}\left[ (Y^{i,j}-B^{i,j}) \right] \ge 0\quad (\mu ) \end{aligned}$$

where Lagrange multipliers are within parentheses. The first set of constraints represents the self-selection (incentive-compatibility) constraints, and the second constraint is the government’s budget constraint. Implicit in the formulation of the problem above is the idea that the possibility to tag agents based on parental status allows the government to solve two separate optimal income tax problems, one for parents and one for non-parents, with the possibility of accomplishing lump-sum inter-group transfers. Obviously, tagging is always welfare-improving compared to the case where a single tax schedule applies to the whole population. The welfare-enhancing potential of a tagging scheme derives from the fact that all self-selection constraints linking agents belonging to two separate tagged groups are eliminated.Footnote 12 In the above problem, this is reflected by the fact that we have written the self-selection constraints conditional on j.

As shown in Appendix A, manipulating the first-order conditions of the above problem, the general expression for the marginal tax rate faced by a type i agent, \(i\in \{1,\ldots ,N-1\}\), belonging to group \(j=p,np\) is given by:

$$\begin{aligned} T^{\prime }\left( Y^{ij}\right) =\frac{1}{\mu \pi ^{ij}}\left[ \lambda ^{i+1,j}{\widehat{V}}_{B}^{i+1,j}\left( \mathrm{MRS}^{i,j}(B^{i,j},Y^{i,j})-\mathrm{MRS}^{i+1,j}(B^{i,j},Y^{i,j})\right) \right] \end{aligned}$$
(1)

where \({\widehat{V}}_{B}^{i+1,j}\equiv \frac{d}{dB^{i,j}} V^{i+1,j}(B^{i,j},Y^{i,j})\). Instead, for the highest skilled agent in each group, the standard no-distortion at the top result applies, i.e., for agents \((i,j)=(N,p)\) and \((i,j)=(N,np)\), \(T^{\prime ij}=0\).

The result provided by (1) is a standard one in the optimal tax literature, and we do not discuss it at length. It states that the only reason to distort agents’ (labor supply) behavior is the presence of binding self-selection constraints. Moreover, given that the agent-monotonicity property holds within each of the two tagged groups, (1) implies that the labor supply of all agents, except the highest skilled within each group, is distorted downwards (\(T^{\prime }\left( Y^{ij}\right) >0\) for \( i\in \{1,\ldots ,N-1\}\) and \(j=p,np\)).

Let us now consider how the government’s problem is modified when nonlinear income taxation is supplemented by a child care subsidy.

2.2 Work-related good subsidized

As in our model child care services enter the individual decision problem of parents as a ‘needs constraint’ and are not subject to a separate individual choice, it is straightforward to show that the optimal child care subsidy is 100% when two separate nonlinear tax schedules apply to parents and non-parents. To provide an intuition for this result, suppose that a fully separating equilibrium with \(Y^{1,p}<\cdots <Y^{N,p}\) is achieved as a solution to the government’s problem described in the previous subsection. To show that a Pareto-improvement can be obtained by supplementing income taxation with a child care subsidy, consider the following tax reform. Denote, respectively, by \(\left( Y^{*j,p},B^{*j,p}\right) \) and \(\left( Y^{*j,np},B^{*j,np}\right) \) the bundle offered to parents and non-parents of skill type \(j=1,...,N\) at the solution to the problem where \( s=0\) (i.e., the problem described in the previous subsection). Now introduce a child care subsidy at rate \(s\in (0,1]\) and, while leaving unchanged the set of bundles \(\left( Y^{*j,np},B^{*j,np}\right) \) offered to non-parents, change the set of bundles for parents by offering the following packages: \(\left( Y^{*1,p},B^{*1,p}-sqY^{*1,p}/w^{1,p}\right) \) ,..., \(\left( Y^{*N,p},B^{*N,p}-sqY^{*N,p}/w^{N,p}\right) \).

Notice that, by keeping their labor supply after the reform at the original pre-reform level, the utility of all agents would be unaffected and the government’s budget constraint would still be satisfied since the income tax payment of each type of parents has been increased just enough to cover the cost of the subsidy that they receive (\(sqY^{*j,p}/w^{j,p}\) for \( j=1,...,N \)). The only effects of the reform that are left to evaluate are those on the binding self-selection constraints.

Regarding this, no effects whatsoever are generated on the self-selection constraints that are relevant in the design of the nonlinear income tax faced by non-parents.Footnote 13 Consider now the self-selection constraints requiring higher-ability parents to be prevented from mimicking lower-ability parents. After implementation of the proposed reform, the consumption that a parent of skill type j can get by mimicking a parent of skill type \(j-1\) is now lower (by the amount \(sq\left[ \left( Y^{*j-1,p}/w^{j-1,p}\right) -\left( Y^{*j-1,p}/w^{j,p}\right) \right] \)) than before the reform, whereas the labor effort that he/she has to exert has not changed. We can therefore conclude that a child care subsidy is an unambiguously welfare-enhancing instrument in this case. Moreover, we can also notice that the consumption for a j-type parent behaving as a mimicker is lowered by an amount that is increasing in s, which in turn implies that the optimal subsidy rate is in this case 100%.Footnote 14
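The arithmetic behind the mimicking-deterring effect can be illustrated with a small numerical sketch. The function name and all numbers below are hypothetical; the only ingredients taken from the text are the reformed bundle \(B-sqY/w\) and the private child care outlay \((1-s)q\) per hour worked.

```python
def consumption_after_reform(B, Y, w_own, w_intended, q, s):
    """Consumption of a parent with wage w_own who picks the bundle (Y, B)
    originally designed for a parent with wage w_intended, after the reform."""
    B_reformed = B - s * q * Y / w_intended   # tax raised to recoup the subsidy
    hours = Y / w_own                         # hours this agent needs to earn Y
    return B_reformed - (1 - s) * q * hours   # private child care outlay


# Hypothetical numbers: bundle intended for the low-wage parent.
B, Y, q = 100.0, 80.0, 4.0
w_low, w_high = 10.0, 20.0

for s in (0.0, 0.5, 1.0):
    truthful = consumption_after_reform(B, Y, w_low, w_low, q, s)
    mimicker = consumption_after_reform(B, Y, w_high, w_low, q, s)
    loss = s * q * (Y / w_low - Y / w_high)   # formula stated in the text
    print(f"s={s:.1f}  truthful={truthful:.1f}  mimicker={mimicker:.1f}  loss={loss:.1f}")
```

The truthful low-wage parent's consumption does not depend on s, whereas the mimicker's consumption falls linearly in s, which is the reason the argument pushes the subsidy rate to its upper bound.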

Based on the discussion above, we can then proceed to analyze the case where our work-related good is fully subsidized. In such a setting, the indirect utility is given by \(V^{i,j}\left( B,Y\right) =u\left( B,Y/w^{i,j}\right) \) for both \(j=p\) and \(j=np\) as child care purchases no longer appear in the (private) budget constraints of parents.Footnote 15 Instead, these expenditures enter the government’s budget constraint. The problem solved by the government in the presence of the subsidy is given by:

$$\begin{aligned} \underset{\left\{ B^{i,j},Y^{i,j}\right\} }{\max }\quad \underset{i=1}{ \overset{N}{\sum }}\sum \limits _{j=p,np}\alpha ^{ij}V^{i,j}\left( B^{i,j},Y^{i,j}\right) \end{aligned}$$

subject to:

$$\begin{aligned} V^{i,j}(B^{i,j},Y^{i,j})\ge V^{i,j}(B^{i-1,j},Y^{i-1,j}),i\in \{2,...,N\}, \,j\in \{p,np\}\text { }(\lambda ^{i,j}) \end{aligned}$$

and

$$\begin{aligned} \sum \limits _{i=1}^{N}\sum \limits _{j=p,np}\pi ^{ij}\left[ (Y^{i,j}-B^{i,j}) \right] \ge q\sum _{i=1}^{N}\pi ^{ip}\frac{Y^{i,p}}{w^{i,p}}\quad (\mu ) \end{aligned}$$

where Lagrange multipliers appear within parentheses.

As shown in Appendix B, manipulating the first-order conditions of the government’s problem, a general expression for the marginal tax rate faced by a type i agent, \(i\in \{1,\ldots ,N-1\}\), belonging to group \(j=p,np\) can be derived:

$$\begin{aligned} T^{\prime }\left( Y^{ij}\right)= & {} \frac{1}{\mu \pi ^{ij}}\left[ \lambda ^{i+1,j}{\widehat{V}}_{B}^{i+1,j}\left( \mathrm{MRS}^{i,j}(B^{i,j},Y^{i,j})-\mathrm{MRS}^{i+1,j}(B^{i,j},Y^{i,j})\right) \right] \nonumber \\&+\, \mathbf {1}[j=p]\frac{q}{w^{i,p}} \end{aligned}$$
(2)

where, again, \({\widehat{V}}_{B}^{i+1,j}\equiv \frac{{\text {d}}}{{\text {d}}B^{i,j}} V^{i+1,j}(B^{i,j},Y^{i,j})\). For agents of type \((i,j)=(N,np)\), we still have that \(T^{\prime ij}=0\), whereas for agents of type \((i,j)=(N,p)\) we have \( T^{\prime ij}=\frac{q}{w^{i,p}}\).

Comparing (1) and (2), it can thus immediately be seen that the only difference comes from the presence of a term \(\frac{q}{ w^{i,p}}\) in the expressions for the marginal tax rates faced by parents when income taxation is supplemented with a child care subsidy. The introduction of a subsidy is therefore likely to lead to an increase in the marginal tax rates for parents. However, the total distortions in the economy may in fact still be reduced. Intuitively, the q / w terms that enter the expressions for the marginal tax rates faced by parents do not represent distortionary terms but serve the same role as a market price in letting parents face the right incentives.Footnote 16 At the same time, the subsidy serves the purpose of weakening the self-selection constraints thwarting the government in the design of the nonlinear income tax that applies to parents. For these constraints, the mimicking-deterring effect reduces the need to distort agents for self-selection purposes. It therefore allows a reduction of the truly distortionary component (i.e., the \(\lambda \)-terms) in the formulas for the marginal tax rates.

Notice that the expressions for the marginal tax rates that apply to non-parents do not incorporate the q / w terms. This is important, since for them, these terms would represent a truly distortionary component. Notice also that the fact that the cost of child care is not mirrored in the expressions for the marginal tax rates that apply to non-parents does not mean that the additional resources needed to finance the child care subsidy are raised only from parents. It means that if also non-parents were to participate in the financing of the child care subsidy, the additional revenue extracted from them may to a large extent be collected in a non-distortionary way through an increase in inframarginal income tax rates.Footnote 17

Having analyzed the role of subsidies to work-related goods under a fully nonlinear income tax, in the next section we describe the quantitative model that we employ to compare the welfare-enhancing power of subsidies under different assumptions regarding the flexibility of the income tax at the disposal of the government. Before doing this, however, a final remark is in order. As we have pointed out, in our setting a 100% subsidy rate is optimal under fully nonlinear taxation. This result does not necessarily extend to the case of less sophisticated tax systems such as linear and piecewise linear tax systems. The reason is that with less sophisticated income tax systems the government no longer has the required flexibility to fully offset for each agent the distortion on the leisure-labor choice generated by subsidizing the work-related good. Thus, even though incentive-compatibility constraints are implicitly present also under piecewise linear income taxes,Footnote 18 and therefore one can still regard a subsidy to work-related goods as an instrument exerting mimicking-deterring effects, full subsidization is not necessarily optimal. This also implies that it is in principle ambiguous whether the effectiveness of subsidies to work-related goods as a welfare-enhancing instrument is increasing or decreasing in the degree of sophistication of the underlying income tax schedule. On the one hand, as the income tax becomes more sophisticated, social welfare increases and it becomes more difficult to reap welfare gains by supplementing income taxation with other policy instruments. On the other hand, as we have just remarked, when the income tax becomes more sophisticated, it becomes easier for the government to offset for each agent the distortionary effects generated by subsidizing the work-related good.

3 The linear and piecewise linear tax problems

We now present the government’s maximization problem under a four-bracket piecewise linear tax.Footnote 19 As before, we assume that the population consists of 2N different types of agents with wage rates \(w^{1,j}<w^{2,j}<\cdots <w^{N,j}\) where \(j\in \{np,p\}\). The total population size is normalized to one, and \(\pi ^{ij}\) denotes the population share of a type \((i,j)\) agent, \( i=1,\ldots ,N,j\in \{np,p\}\). The piecewise linear tax function is described by four slope parameters \( t_{1},t_{2},t_{3},t_{4}\), and three ‘break points’ \(Z_{i}\) defined as the points on the x-axis where the slope of T changes. The demogrant is denoted by G. Formally, the tax function as a function of income Y is defined as:

$$\begin{aligned} T(Y){=} {\left\{ \begin{array}{ll} -G+t_{1}Y &{}\quad Y\in [0,Z_{1}]; \\ -G+t_{1}Z_{1}+t_{2}(Y-Z_{1}) &{}\quad Y\in (Z_{1},Z_{2}]; \\ -G+t_{1}Z_{1}+t_{2}(Z_{2}-Z_{1})+t_{3}(Y-Z_{2}) &{}\quad Y\in (Z_{2},Z_{3}]; \\ -G+t_{1}Z_{1}+t_{2}(Z_{2}-Z_{1})+t_{3}(Z_{3}-Z_{2})+t_{4}(Y-Z_{3}) &{}\quad Y>Z_{3}. \end{array}\right. } \end{aligned}$$

The set of parameters of the tax function is denoted by

$$\begin{aligned} \Theta =\{(t_{1},t_{2},t_{3},t_{4},Z_{1},Z_{2},Z_{3},G)\mid t_{i}\in [0,1],Z_{3}>Z_{2}>Z_{1},Z_{i},G>0\}. \end{aligned}$$

The government designs two piecewise linear tax schedules, one for parents and one for non-parents, denoted by \(T(Y;\theta ^{p})\) and \(T(Y;\theta ^{np}) \), respectively. The consumption for an individual belonging to group \(j\in \{p,np\}\), with productivity w, choosing to earn an income of Y under a tax schedule described by the tax parameters \(\theta \), is given by:

$$\begin{aligned} C^{j}(Y;\theta ,w)=Y-T(Y;\theta )-\mathbf {1}[j=p]\cdot (1-s)q\frac{Y}{w} \end{aligned}$$

where \(s\in \left[ 0,1\right] \) denotes the subsidy rate to child care.
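For concreteness, the tax function and the consumption expression above can be written as a short routine. This is a minimal sketch; the function names and the numerical values in the usage lines are hypothetical and only serve to illustrate the definitions.

```python
def piecewise_tax(Y, rates, breaks, G):
    """T(Y) for a piecewise linear schedule with len(rates) == len(breaks) + 1.

    rates  : marginal tax rates t_1, ..., t_K
    breaks : break points Z_1 < ... < Z_{K-1}
    G      : demogrant (tax at zero income is -G)
    """
    tax, lower = -G, 0.0
    for t, upper in zip(rates, list(breaks) + [float("inf")]):
        if Y <= upper:
            return tax + t * (Y - lower)
        tax += t * (upper - lower)   # full tax accrued on this bracket
        lower = upper
    return tax  # not reached: the last bracket is unbounded


def consumption(Y, w, rates, breaks, G, q, s, is_parent):
    """C^j(Y; theta, w): income net of tax and of unsubsidized child care cost."""
    child_care = (1 - s) * q * Y / w if is_parent else 0.0
    return Y - piecewise_tax(Y, rates, breaks, G) - child_care


# Hypothetical four-bracket schedule.
rates, breaks, G = (0.2, 0.3, 0.4, 0.5), (100.0, 200.0, 300.0), 50.0
print(piecewise_tax(150.0, rates, breaks, G))
print(consumption(150.0, 10.0, rates, breaks, G, q=4.0, s=0.5, is_parent=True))
```

The linear and two-bracket cases are obtained from the same routine by passing one rate with no break points, or two rates with a single break point.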

Agents choose Y to maximize \(U(C^{j}(Y),Y)\equiv u(C^{j}(Y),Y/w)\) leading to the indirect utility function:

$$\begin{aligned} V^{j}(\theta ;w)=U(C^{*j}(\theta ;w),Y^{*j}(\theta ;w)). \end{aligned}$$
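The individual's problem on a piecewise linear budget set can be solved segment by segment. The sketch below is one possible way to do this and is not necessarily the routine used for the simulations: it assumes the quadratic utility form of the quantitative model, uses the closed-form stationary point on each linear segment (with the segment's net wage and virtual income), clips it to the segment, and compares it with the segment end points. All names and parameter values are hypothetical.

```python
def utility(c, h, p):
    """Assumed quadratic utility; p is a dict of parameters."""
    leisure = p["J"] - h
    return (p["alpha"] * c**2 + p["beta"] * leisure**2
            + p["gamma"] * c * leisure + p["delta"] * leisure + p["eps"] * c)


def interior_hours(w_net, m, p):
    """Stationary point in h of u(w_net * h + m, h); None if undefined."""
    num = (2 * p["J"] * p["beta"] + m * p["gamma"] + p["delta"]
           - w_net * (2 * m * p["alpha"] + p["J"] * p["gamma"] + p["eps"]))
    den = 2 * (w_net**2 * p["alpha"] + p["beta"] - w_net * p["gamma"])
    return num / den if den != 0 else None


def best_choice(w, rates, breaks, G, p, q=0.0, s=0.0, is_parent=False):
    """Return (utility, hours, consumption) at the agent's preferred point."""
    cost = (1 - s) * q if is_parent else 0.0   # private child care cost per hour
    y_max = w * p["J"]                         # income at the time endowment
    best = None
    lower, tax_at_lower = 0.0, -G
    for t, upper in zip(rates, list(breaks) + [y_max]):
        upper = min(upper, y_max)
        w_net = w * (1 - t) - cost             # net return to an hour of work
        m = t * lower - tax_at_lower           # virtual income on this segment
        candidates = [lower / w, upper / w]    # segment end points (kinks)
        h_star = interior_hours(w_net, m, p)
        if h_star is not None:
            candidates.append(min(max(h_star, lower / w), upper / w))
        for h in candidates:
            c = w_net * h + m
            value = utility(c, h, p)
            if best is None or value > best[0]:
                best = (value, h, c)
        tax_at_lower += t * (upper - lower)
        lower = upper
    return best


# Hypothetical illustration.
p = dict(alpha=-1.0, beta=-1.0, gamma=0.1, delta=20.0, eps=10.0, J=1.0)
rates, breaks, G = (0.2, 0.3, 0.4, 0.5), (1.0, 2.0, 3.0), 0.5
print(best_choice(4.0, rates, breaks, G, p))
print(best_choice(4.0, rates, breaks, G, p, q=1.0, s=0.5, is_parent=True))
```

Including both end points of every segment means the routine also returns the correct choice when an agent locates at a kink, or when utility along a segment is not concave in hours.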

Under a max–min social welfare function, the government solves the following problem:

$$\begin{aligned} \max _{(\theta ^{np},\theta ^{p})\in \Theta ^{2}}\Big [\min \Big \{V\left( \theta ^{np},w^{1,np}\right) ,V\left( \theta ^{p},w^{1,p}\right) \Big \}\Big ], \end{aligned}$$
(3)

subject to the resource constraint:

$$\begin{aligned} \sum _{i=1}^{N}\sum _{j=np,p}\pi ^{ij}\left( Y^{*j}(\theta ^{j};w^{i,j})-C^{*j}(\theta ^{j};w^{i,j})\right) \ge q\sum \limits _{i=1}^{N}\pi ^{ip}\frac{Y^{*p}(\theta ^{p};w^{i,p})}{w^{i,p} }. \end{aligned}$$

The solution to the problem above yields an optimal piecewise linear tax system with associated optimized tax schedules \(T^{*j}=T(Y;\theta ^{*j})\), \(j\in \{p,np\}\).Footnote 20 The solution also provides a value for the inter-group transfer, which will be denoted by \(G^{np,p}\), and which can be calculated as \(\sum _{i=1}^{N}\pi ^{i,np}\left( Y^{*np}(\theta ^{np};w^{i,np})-C^{*np}(\theta ^{np};w^{i,np})\right) \).Footnote 21 We solve this problem using numerical optimization techniques. A similar procedure is used to solve numerically the government’s problem under a two-bracket piecewise linear tax.

The case where the tax system is linear can be thought of as a limit case of the piecewise linear structure that we have described above. Simply, in the linear case, each of the two separate tax schedules features a single income bracket and a single marginal tax rate.Footnote 22

4 Quantitative model

In this paper, we use wages as a proxy for skills and calibrate the wage distribution to Swedish register data using the population distribution. Our wage data consist of individuals who worked at least part time in 2005. Parents are defined as women with at least one child in child care age (for Sweden, this corresponds to ages one to six); non-parents are defined as all men (with and without children) and all women without any child in day care age.Footnote 23 According to this definition, in 2005 the fraction of parents in Sweden was slightly below 10%.Footnote 24 As an estimate of the hourly price for child care, we have chosen a price of 40% of the median wage for parents.

In order to capture empirically relevant behavioral elasticities and facilitate a tractable comparison with different optimum tax models, we choose the following quadratic specification of the direct utility function:Footnote 25

$$\begin{aligned} u(c,h)=\alpha c^{2}+\beta (J-h)^{2}+\gamma c(J-h)+\delta (J-h)+\epsilon c, \end{aligned}$$
(4)

where \(\alpha ,\beta <0\), \(\gamma ,\delta ,\epsilon >0\).Footnote 26 The annual time endowment J is set to 5840 hours. The labor supply function is:

$$\begin{aligned} h(w)=\frac{2J\beta +m\gamma +\delta -w(2m\alpha +J\gamma +\epsilon )}{ 2\left( w^{2}\alpha +\beta -w\gamma \right) }, \end{aligned}$$

where m is virtual income and w is the wage rate. Finally, the (uncompensated) elasticity of labor supply is:

$$\begin{aligned} \eta =\left( \frac{-w^{2}\alpha +\beta }{w^{2}\alpha +\beta -w\gamma }-\frac{ 2J\beta +m\gamma +\delta }{2J\beta +m\gamma +\delta -w(2m\alpha +J\gamma +\epsilon )}\right) . \end{aligned}$$

We make the normalization \(\alpha =-1\) and impose the constraint that the labor supply function evaluated at a (net) wage rate of zero is (on average) equal to zero. This pins down \(\beta \). The remaining parameters that need to be chosen are \(\gamma \), \(\delta \), and \(\epsilon \). We choose \(\gamma =0.07\), \(\delta =95\), and \(\epsilon =2000\), which produce empirically relevant substitution- and income effects on labor supply.
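The labor supply function and the elasticity formula can be cross-checked numerically. The sketch below is illustrative only: the function names are hypothetical, the value of \(\beta \) is obtained from the zero-wage condition at a single assumed virtual income (the text imposes it on average), and the units in which w and m are measured are not specified in this section, so the evaluation point is an arbitrary assumption.

```python
def labor_supply(w, m, alpha, beta, gamma, delta, eps, J):
    """Closed-form hours on a linear budget c = w*h + m."""
    num = 2 * J * beta + m * gamma + delta - w * (2 * m * alpha + J * gamma + eps)
    den = 2 * (w**2 * alpha + beta - w * gamma)
    return num / den


def elasticity(w, m, alpha, beta, gamma, delta, eps, J):
    """Uncompensated wage elasticity of labor supply, (w/h) * dh/dw."""
    a = 2 * J * beta + m * gamma + delta
    first = (-w**2 * alpha + beta) / (w**2 * alpha + beta - w * gamma)
    second = a / (a - w * (2 * m * alpha + J * gamma + eps))
    return first - second


def beta_from_zero_wage(m, gamma, delta, J):
    """beta such that h(0) = 0 at virtual income m (single-m version, an assumption)."""
    return -(m * gamma + delta) / (2 * J)


# Parameter values stated in the text; beta pinned down at m = 0 (assumption).
alpha, gamma, delta, eps, J = -1.0, 0.07, 95.0, 2000.0, 5840.0
beta = beta_from_zero_wage(0.0, gamma, delta, J)

# Arbitrary evaluation point in model units (assumption).
w, m = 0.05, 10.0
h = labor_supply(w, m, alpha, beta, gamma, delta, eps, J)
step = 1e-6
numeric = (labor_supply(w + step, m, alpha, beta, gamma, delta, eps, J)
           - labor_supply(w - step, m, alpha, beta, gamma, delta, eps, J)) / (2 * step) * w / h
print(h, elasticity(w, m, alpha, beta, gamma, delta, eps, J), numeric)
```

The analytical elasticity and the finite-difference approximation should agree up to numerical error, which is a simple way to guard against transcription mistakes in the formulas.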

Fig. 1 Empirical targets

The uncompensated labor supply elasticity as a function of the (hourly) wage rate (denoted in SEK) is shown in the top panel of Fig. 1. Given that the distribution of wages for parents lies to the left of the wage distribution for non-parents, and that parents are interpreted as women with small children, the parameterization is consistent with the empirical finding that the labor supply of women with small children is more responsive to taxation.Footnote 27 The income elasticities of labor supply are shown in panel b) and range between −0.05 and −0.08, consistent with the empirical literature, as it usually documents small income effects. Finally, in the bottom panel of Fig. 1 the labor supply function is graphed.Footnote 28 Compared to parameterizations used in the earlier optimal tax literature, we believe the implied behavioral elasticities depicted in the graphs do, by and large, match more closely estimates found in the contemporary empirical labor supply literature.Footnote 29

To obtain a revenue-based measure of the welfare gains attainable by subsidizing child care under different income tax systems, we consider an equivalent variation type of welfare gain measure, taking as a benchmark the solution to the government’s problem under the linear income tax optimum.Footnote 30 We first calculate the minimum amount of extra revenue that should be injected into the government’s budget, in the linear income tax optimum without child care subsidies, in order to achieve the same social welfare level as under a different tax system (piecewise linear or fully nonlinear income tax, with or without child care subsidies). Once we have found this minimum amount of extra revenue, we divide it by the aggregate GDP at the linear income tax optimum without child care subsidies, to get a revenue-based measure of the welfare gains.
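The revenue-based measure just described amounts to a one-dimensional root-finding problem. The following sketch shows one way it could be computed by bisection. It assumes that welfare at the linear tax optimum is nondecreasing in the extra revenue injected; `welfare_given_revenue` is a hypothetical callable standing in for the re-optimized linear tax problem, and the toy function in the usage lines is a placeholder, not a result from the model.

```python
def revenue_equivalent_gain(welfare_given_revenue, target_welfare, gdp,
                            upper=1.0, tol=1e-10):
    """Minimum extra revenue needed to reach target_welfare, as a share of gdp.

    welfare_given_revenue(R): welfare at the benchmark (linear tax, no subsidy)
    optimum when the budget receives extra revenue R; assumed nondecreasing.
    """
    lower = 0.0
    if welfare_given_revenue(lower) >= target_welfare:
        return 0.0
    # Expand the bracket until the target is attainable.
    for _ in range(200):
        if welfare_given_revenue(upper) >= target_welfare:
            break
        upper *= 2.0
    else:
        raise ValueError("target welfare not reached within the search range")
    # Bisection on the smallest R that attains the target.
    while upper - lower > tol:
        mid = 0.5 * (lower + upper)
        if welfare_given_revenue(mid) >= target_welfare:
            upper = mid
        else:
            lower = mid
    return upper / gdp


# Placeholder welfare function for illustration only.
toy_welfare = lambda R: 1.0 + 0.5 * R
print(revenue_equivalent_gain(toy_welfare, target_welfare=1.2, gdp=8.0))
```

In an actual application each call to the welfare function would itself solve the benchmark optimal tax problem, so the number of bisection steps determines the computational cost.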

Regarding the social welfare function, we focus on the max–min, approximating this social welfare objective with the maximization of the demogrant. This is always a valid approach when the least well-off individual does not work.Footnote 31 In the simulation exercises presented below, the government optimizes two separate income tax schedules for the groups (parents and non-parents) and can transfer resources across the groups. In the case of a max–min social welfare function, this implies that the utility of the least well-off individual has to be the same in each group. When these agents do not work, a social welfare maximum requires the demogrant to be the same for both groups. For all the various tax systems that we consider (fully nonlinear, piecewise linear and linear), we represent the population distribution with 1998 agents, i.e., 999 wage rates from each group. These correspond to the quantiles of each distribution, with the exclusion of the extreme values.Footnote 32
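As a rough illustration of how a wage distribution can be represented by 999 quantile points, the sketch below builds such a grid for one group. Both the log-normal functional form and the quantile levels k/1000 (k = 1, ..., 999) are assumptions made only for this example, as are the parameter values; the wage distributions and the exact quantile definition used in the simulations are not specified here.

```python
import math
from statistics import NormalDist


def quantile_wage_grid(mu: float, sigma: float, n_points: int = 999) -> list[float]:
    """Wage rates at quantile levels k/(n_points + 1), k = 1..n_points.

    Assumes (for illustration only) a log-normal wage distribution with
    log-mean mu and log-standard-deviation sigma. The levels 0 and 1,
    i.e., the extreme values, are excluded by construction.
    """
    std_normal = NormalDist()  # standard normal, used for its inverse CDF
    levels = [k / (n_points + 1) for k in range(1, n_points + 1)]
    return [math.exp(mu + sigma * std_normal.inv_cdf(p)) for p in levels]


# Hypothetical parameters for one group (not taken from the paper).
grid = quantile_wage_grid(mu=math.log(150.0), sigma=0.3)
```

With 999 points per group and two groups, such a discretization yields the 1998 agents mentioned above.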

Table 1 Optimal linear and piecewise linear taxes
Table 2 Welfare gain comparison
Fig. 2 Optimal marginal tax rates for the fully nonlinear and four-segment piecewise linear tax systems

5 Quantitative results

Our main results are contained in Tables 1 and 2 and in Fig. 2. In the figure, we have plotted the optimal fully nonlinear income tax system together with the optimal four-bracket piecewise linear tax system. The two top graphs display the marginal tax rate schedules for parents, whereas the two bottom graphs show the corresponding graphs for non-parents. The graphs to the left refer to the case with an optimally chosen subsidy to child care, whereas the graphs to the right refer to the case with no subsidy. The location of the break points in the piecewise linear tax system is indicated with vertical dashed lines. The marginal tax rates associated with the allocations chosen by agents under an optimal fully nonlinear income tax are indicated with blue dots, and the solid red line represents a kernel density approximation of the optimal schedule.Footnote 33

The values of the marginal tax rates for the linear and piecewise linear tax schedules are displayed in Table 1 together with the value of the optimal subsidy rate and of the demogrant. As we can see from the table, the optimal subsidy rate drops below 100% only when the income tax system is linear. Thus, very large subsidy rates are in general still optimal when the degree of sophistication of the income tax schedule is significantly lower than under a fully nonlinear income tax. At first sight, this may appear counterintuitive given that, under a max–min social welfare function, the government aims at maximizing the utility of the least well off, who are likely not to be working and therefore cannot directly benefit from a subsidy. However, notice that, since in our model all parents, irrespective of their market productivity, face an identical marginal cost of working (given by q, when \(s=0\)), a proportional subsidy on work-related expenditures becomes equivalent to a progressive wage subsidy. Formally, denoting by \(\overline{w}\) the net wage rate of a parent, i.e., \(\overline{w}\equiv \left( 1-t\right) w-\left( 1-s\right) q\), the combined effect of t and s is equivalent to a wage subsidy granted at rate \(\overline{s}=-t+sq/w\):

$$\begin{aligned} w\left( 1+\overline{s}\right) -q=\left( 1-t\right) w-\left( 1-s\right) q\Longrightarrow \overline{s}=s\frac{q}{w}-t, \end{aligned}$$

which turns out to be progressive as \(\partial \overline{s}/\partial w<0\).
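The equivalence stated above is easy to check numerically. The sketch below uses purely illustrative values for t, s, q and the wage grid (none of them taken from Table 1) to verify that \(w(1+\overline{s})-q\) coincides with the net wage \(\overline{w}\), and that \(\overline{s}\) falls as w rises.

```python
def equivalent_wage_subsidy(t: float, s: float, q: float, w: float) -> float:
    """Wage subsidy rate s_bar = s*q/w - t implied by tax rate t and subsidy rate s."""
    return s * q / w - t


def net_wage(t: float, s: float, q: float, w: float) -> float:
    """Net wage of a parent: (1 - t)*w - (1 - s)*q."""
    return (1 - t) * w - (1 - s) * q


# Illustrative values only (not the optimal values reported in the paper).
t, s, q = 0.5, 0.6, 40.0
wages = [60.0, 100.0, 150.0, 250.0]

rates = [equivalent_wage_subsidy(t, s, q, w) for w in wages]

# Gap between w*(1 + s_bar) - q and the net wage; zero up to rounding error.
gaps = [w * (1 + s_bar) - q - net_wage(t, s, q, w) for w, s_bar in zip(wages, rates)]
```

Because sq/w shrinks as w grows, the entries of `rates` decrease along the wage grid, which is the sense in which the implied wage subsidy is progressive.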

Put differently, by supplementing a linear income tax with a proportional subsidy on work-related expenditures, the marginal effective income tax rate (MEITR) faced by an agent is given by \(\tau \equiv t-sq/w\). Thus, even though any given individual faces a constant MEITR, the value of the MEITR is increasing in the market productivity of an agent. Despite the fact that the government does not directly observe the market productivity of an agent, the combination of a flat tax rate t and a flat subsidy s allows the government to offer parents a set of skill-dependent marginal income tax schedules. In a sense, this can be seen as the possibility to introduce in the tax system an additional layer of tagging, albeit of an imperfect kind, given that all parents, irrespective of their skill type, face the same demogrant, and given that the MEITR \(\tau \) is constrained to vary in skill according to the function \(\partial \tau /\partial w=sq/\left( w^{2}\right) \).

The last column of Table 1 shows that, when supplementing the income tax that applies to parents with an optimal subsidy on work-related expenditures, there is less need to engage in inter-group redistribution (the value of \(G^{np,p}\) drops in all cases when s is optimally chosen). This is due to the fact that, by relying on the subsidy, the government succeeds in raising the demogrant that can be self-financed via taxation of parents’ aggregate labor income.

In terms of the effects of the subsidy on the structure of optimal statutory marginal income tax rates (as opposed to MEITR), we can see from Table 1 that, while the subsidy has minor effects on the structure of marginal tax rates for non-parents, it shifts up the structure of statutory marginal tax rates for parents. In the linear income tax case, the optimal marginal tax rate increases by about 22 percentage points, from 33.29 to 55.63%. Taking into account that in our simulations q is set equal to 40% of the median wage for parents, a subsidy of 60% coupled with an increase from 33.29 to 55.63% in t implies, roughly, that the MEITR for parents is lowered for those with a productivity below the median level and is increased for those with a productivity above the median level. For the piecewise linear income tax cases (two brackets and four brackets), we can also see that the introduction of the subsidy, rather than simply shifting up uniformly the structure of statutory marginal tax rates for parents, is accompanied by an increase in the statutory marginal tax rates that becomes smaller as one considers higher-income brackets.Footnote 34 This implies that the statutory marginal tax rate structure faced by parents becomes more regressive as income taxation is supplemented with a subsidy on work-related expenditures.
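The arithmetic behind the statement on the linear income tax case can be made explicit with the MEITR formula \(\tau =t-sq/w\). The sketch below uses the figures quoted in the text (t rising from 33.29 to 55.63%, s equal to 60%, q equal to 40% of the parents' median wage); normalizing the median wage to one is a convenience introduced here, not something taken from the simulations.

```python
def meitr(t: float, s: float, q: float, w: float) -> float:
    """Marginal effective income tax rate tau = t - s*q/w."""
    return t - s * q / w


w_median = 1.0        # normalization introduced for this example
q = 0.40 * w_median   # q is 40% of the median wage for parents
t_no_subsidy = 0.3329
t_with_subsidy = 0.5563
s = 0.60


def meitr_change(w: float) -> float:
    """Change in the MEITR at wage w when moving from s = 0 to s = 0.60."""
    return meitr(t_with_subsidy, s, q, w) - meitr(t_no_subsidy, 0.0, q, w)


# Wage at which the change is zero: s*q/w equals the increase in t.
break_even_wage = s * q / (t_with_subsidy - t_no_subsidy)
```

The break-even wage comes out a few percent above the median wage, so the MEITR falls for parents with wages below that level and rises above it, in line with the "roughly at the median" statement in the text.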

Finally, the welfare comparisons are contained in Table 2. The reported results show that, although very large subsidy rates are in general optimal, the magnitude of the welfare gains that can be achieved by using this additional policy instrument varies significantly depending on the degree of sophistication of the underlying income tax schedule. In particular, whereas the welfare gains are negligible under a linear income tax, they are roughly of the same magnitude under a piecewise linear income tax and under a fully nonlinear income tax (0.52% for the case of a two-bracket piecewise linear tax, 0.48% for the case of a four-bracket piecewise linear tax, and 0.50% for the case of a fully nonlinear income tax).

Irrespective of whether work-related expenditures are subsidized or not, Table 2 also sheds light on the relative merits of increasingly sophisticated tax schedules. For the case when \(s=0\), the results show that, while a fully nonlinear income tax delivers large welfare gains compared to a linear income tax, a two-bracket piecewise linear tax already captures about 86% of the welfare gains achievable through a fully nonlinear optimal income tax, with the share increasing to about 91% for the case of a four-bracket piecewise linear income tax.Footnote 35 An almost identical picture emerges comparing the welfare gains of the various tax systems when the subsidy is optimally chosen.

5.1 Robustness with respect to the choice of social welfare function

Up until now, we have considered a max–min social welfare objective, which in a setting where a nonzero fraction of the population is non-working is equal to the objective of tax revenue maximization from the working population. This represents a simple and transparent benchmark case that has been analyzed extensively in the optimal tax literature. In Appendix D, we examine the sensitivity of our results to the choice of social objective by examining the results when the government is maximizing the following social welfare function:

$$\begin{aligned} \sum _{i=1}^{N}\sum _{j=p,np}\log (V^{i,j}). \end{aligned}$$

This is equivalent to a formulation where the social planner is of the Utilitarian type and preferences are given by \(\log (u)\) where u is defined in (4).Footnote 36 The results from this exercise are displayed in Tables 3, 4 and Fig. 3 (mirroring Tables 1, 2 and Fig. 2). As can be seen from the summary of the welfare gains in Table 4, the gains from nonlinear income taxation under the above social welfare function specification are significantly reduced due to the substantial decrease in the government’s desire for redistribution.Footnote 37 However, it is still the case that the welfare gains of an optimally chosen child care subsidy are about the same under the fully nonlinear income tax as under a piecewise linear tax.
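To make the difference between the two social objectives concrete, the following sketch evaluates the sum-of-logs criterion above and the max–min criterion on small, invented utility profiles (the numbers are hypothetical and the indirect utilities are assumed to be strictly positive). It only illustrates how the two criteria rank allocations and is not part of the simulation model.

```python
import math


def log_social_welfare(v_parents: list[float], v_non_parents: list[float]) -> float:
    """Sum of log indirect utilities over both groups (requires positive utilities)."""
    return sum(math.log(v) for v in v_parents) + sum(math.log(v) for v in v_non_parents)


def max_min_welfare(v_parents: list[float], v_non_parents: list[float]) -> float:
    """Utility of the least well-off agent across both groups."""
    return min(min(v_parents), min(v_non_parents))


# Hypothetical profiles with the same worst-off utility and the same total.
before = ([1.0, 2.0], [1.0, 6.0])
after = ([1.0, 4.0], [1.0, 4.0])  # utility moved from the top agent to a middle one

log_before, log_after = log_social_welfare(*before), log_social_welfare(*after)
mm_before, mm_after = max_min_welfare(*before), max_min_welfare(*after)
```

In this example the sum-of-logs criterion rises while the max–min criterion is unchanged: the former values changes anywhere in the distribution, whereas the latter responds only to the position of the least well-off agent.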

Finally, Table 4 also shows that, compared with the case of a max–min social welfare function, the piecewise linear income tax captures a substantially smaller part (about 59% in the case of a two-bracket piecewise linear tax and about 66% in the case of a four-bracket piecewise linear tax) of the welfare gain associated with a fully nonlinear income tax.Footnote 38

6 Concluding remarks

The previous literature has shown that, in the presence of a fully nonlinear income tax, subsidizing complementary-to-labor private goods may be beneficial due to its role in alleviating the self-selection constraints faced by the government when trying to achieve redistributive goals. In this paper, we have set out to examine whether this finding is a theoretical curiosity, in the sense that such gains are achievable only when the government is optimizing a fully nonlinear income tax, or whether sizable welfare gains can be obtained also when the government is optimizing simpler income tax systems of the kind used in real economies. This comparison is made possible through a computational approach where we are able to compute fully nonlinear optimal income taxes and piecewise linear taxes under identical circumstances in terms of the components of the optimal income tax model (social welfare function, distribution of productivities, and the model of household behavior).

The message that we provide is overall positive. Using a quantitative simulation model with behavioral foundations consistent with the empirical labor supply literature, our analysis indicates that, while the effectiveness of subsidies to work-related goods as a welfare-enhancing instrument is indeed increasing in the degree of sophistication of the underlying income tax schedule, the welfare gains that can be achieved by subsidizing the work-related good under a piecewise linear income tax are roughly the same as the gains that can be achieved by subsidizing the work-related good under an optimal fully nonlinear income tax. Regarding the optimal value of the subsidy, our results indicate that it is in general quite large and is (weakly) increasing in the degree of sophistication of the underlying income tax schedule.

Our results also indicate that, in general, an optimal nonlinear income tax delivers significant welfare gains compared to a linear income tax, even though the magnitudes of these welfare gains vary substantially depending on the chosen social welfare function. In particular, the welfare gains appear to be increasing in the degree of social aversion to inequality embedded in the social welfare function. However, in the context of our stylized model, between 66 and 91% of the gains of a fully nonlinear optimal income tax (over a linear income tax) can be captured by a four-bracket piecewise linear income tax, depending on the choice of social welfare function, and even with a two-bracket piecewise linear tax one could capture between 59 and 86% of the gains of a fully nonlinear optimal income tax.

To conclude, we would like to emphasize that the purpose of this paper has not been to provide realistic measures of the welfare gains that can be derived from subsidizing child care in real economies, as such exercises would require a more sophisticated model of household behavior. Instead, we have used a simple and computationally tractable model to illustrate how the welfare gains that derive from subsidizing a complementary-to-work good in a nonlinear income tax setting depend on the degree of sophistication of the income tax instrument.