1 Introduction

High levels of within-country inequality in many otherwise successful developing countries have become a key policy concern in the global development debate. While some countries have very unequal inherent distributions (e.g., due to historical land ownership arrangements), in others the fruits of economic growth have been unequally shared. Whatever the underlying reason for high inequality, often the only direct way for governments to affect the distribution of income is via redistributive tax and transfer systems. Clearly, public spending on social services also has an impact on the distribution of well-being, although some of the effects (such as skill-enhancing impacts from educational investment) only materialize over a longer time horizon.

Reflecting the desire to reduce poverty and inequality, redistributive transfer systems have, indeed, proliferated in many developing countries. Starting from Latin America, they are now spreading to low-income countries, including those in Sub-Saharan Africa.Footnote 1 In low-income countries, in particular, redistributive arrangements via transfers are still at an early stage, and they often consist of isolated, donor-driven programs. There is an urgent and well-recognized need to move away from scattered programs to more comprehensive tax-benefit systems.

This paper examines the optimal design of cash transfers, commodity taxes (or subsidies), the provision of public and private goods (such as education and housing), and financing them by a linear income tax. The paper also includes an analysis of optimal income taxation in the presence of an informal sector. The paper therefore provides an overview of many of the most relevant instruments for redistributive policies that are needed for a system-wide analysis of social protection. We build on the optimal income tax approach, which is extensively used in the developed country contextFootnote 2, but much less applied for the design of redistributive systems in developing country circumstances. This approach, initiated by Mirrlees (1971), allows for a rigorous treatment of efficiency concerns (e.g., the potentially harmful effect of distortionary taxation on employment) and redistributive objectives. Achieving the government’s redistributive objectives is constrained by limited information: the social planner cannot directly observe individuals’ income-earning capacity, and therefore it needs to base its tax and transfer policies on observable variables, such as gross income. The most general formulations of optimal tax models apply nonlinear tax schedules, but in a developing country context, using fully nonlinear taxes is rarely feasible. In this paper, we therefore limit the analysis to redistributive linear income taxes, which combine a lump-sum transfer with a proportional income tax, and which can be implemented by withholding at source if necessary.

Linear income taxes are not very common in practice: fewer than 30 countries had flat tax rates for personal income in 2012, with some concentration in ex-Soviet Eastern Europe (Peichl 2014). It is noteworthy that even though flat taxes are not particularly common in low-income countries, in many such countries the progressive income tax reaches only a small share of the population. This suggests that, despite the existence of a progressive income tax, these countries do not yet possess enough tax capacity to implement well-functioning progressive income taxes. This is one motivation for our interest in modeling optimal linear taxes. Peichl (2014) suggests that simplification benefits can be especially relevant for developing countries.Footnote 3

In conventional optimal taxation models, the government’s objective function is modeled as a social welfare function, which depends directly on individual utilities. We depart from this welfarist approach by presenting general non-welfarist tax rules, as in Kanbur et al. (2006), and, in particular, optimal tax and public good provision rules when the government is assumed to minimize poverty. We have chosen this approach as it resembles well the tone of much of the policy discussion in developing countries, including the Millennium Development Goals (MDGs) and the new Sustainable Development Goals (SDGs), where the objective is explicitly to reduce poverty rather than maximize well-being.Footnote 4 Similarly, the discussion regarding cash transfer systems is often couched especially in terms of poverty alleviation. While we do not necessarily want to advocate poverty minimization over other social objectives, we regard examining its implications, and contrasting them with traditional welfaristic approaches, useful. Using non-welfarist objectives is, as such, nothing new in economics. In fact, as Sen (1985) has argued, one can be critical of utilitarianism for many reasons. Note also that the objective of poverty minimization is not at odds with the restriction of a linear tax scheme that we impose: a flat tax regime together with a lump-sum income transfer component can achieve similar amounts of redistribution toward the poor as a progressive tax system, if specified suitably (Keen et al. 2008; Peichl 2014). In all our analysis, we first present welfarist tax rules (which are mostly already available in the literature) to provide a benchmark to examine how applying poverty minimization as an objective changes the optimal tax and public service provision rules.

We also deal with some extensions to existing models, which are motivated by the developing country context, such as the case where public provision affects the individuals’ income-earning capacity, thus capturing (albeit in a very stylized way) possibilities to affect their capabilities. An important feature to take into account in tax analysis of developing countries is the presence of a large informal sector, and we also examine the implications of this for optimal redistributive policies.

Our paper is related to various strands of earlier literature. First, Kanbur et al. (1994) and Pirttilä and Tuomala (2004) study optimal income tax and commodity tax rules, respectively, from the poverty alleviation point of view, but their papers build on the nonlinear tax approach, which is not well suited to developing countries. Kanbur and Keen (1989) do consider linear income taxation together with poverty minimization, but they do not derive optimal tax rules; instead they focus on a tax reform perspective and provide tax rate simulations. Others have considered different departures from the welfarist standard. For example, Fleurbaey and Maniquet (2007) consider fairness as an objective of the tax-transfer system and its implications for optimal taxation. Roemer et al. (2003) employ a maximin type of social goal and characterize how well tax and transfer systems achieve the goal of equality of opportunity. Second, our work is related to new contributions in behavioral public finance, which address the situation where the behavioral biases of the individuals lead the social planner to adopt a different objective function than the individuals have; see Chetty (2015), Gerritsen (2016), Farhi and Gabaix (2015). A third strand of literature considers taxation and development more generally, such as Gordon and Li (2009), Keen (2009, 2012), Bird and Gendron (2007) and Besley and Persson (2013).Footnote 5 This field, while clearly very relevant, has not concentrated much on the design of optimal redistributive systems. Finally, optimal linear income taxation has been studied from the standard welfarist perspective. We describe these models in Sect. 2.1. The most recent description of linear income tax models can be found in Piketty and Saez (2013). They also emphasize how linear tax rules, while analytically more tractable, provide the same intuition as the more complicated nonlinear models. The linear tax rules, they argue, are robust to alternative specificationsFootnote 6, and examining this forms part of our motivation: we study optimal linear tax policies, in our understanding for the first time, from the poverty minimization perspective.

The paper proceeds as follows. Section 2 examines optimal linear income taxation, while Sect. 3 turns to optimal provision rules for publicly provided private and public goods that are financed by such a linear income tax. Section 4 analyzes the combination of optimal linear income taxes and commodity taxation and asks under which conditions one should use differentiated commodity taxation if the government is interested in poverty minimization and also has optimal cash transfers at its disposal. The question of how optimal poverty-minimizing income tax policies are altered in the presence of an informal sector is examined in Sect. 5, whereas Sect. 6 presents a numerical illustration of optimal income taxation for poverty minimization. Finally, conclusions are provided in Sect. 7.

2 Linear income taxation

2.1 Optimal linear income taxation under the welfarist objective

In this section, we give an overview of some of the models and results for optimal linear income taxation as they have been presented in the literature. Many formulae for optimal taxation were developed in the 1970s and 1980s (see Dixit and Sandmo 1977; Tuomala 1985 and the survey by Tuomala 1990), and they are still being used, whereas Piketty and Saez (2013) offer fresh expressions of the tax rules. Our exposition mainly follows that of Tuomala (1985), but Appendix 1 shows how the results relate to those in Piketty and Saez (2013).

The government collects a linear income tax \(\tau \), which it uses to finance a lump-sum transfer b, along with other exogenous public spending R. The individuals differ in their income-earning capacity (\(w^{i})\), and \(z^{i}\) denotes individual labor income (\(w^{i}L^{i}\), where \(L^{i}\) represents hours worked). Consumption equals \(c^{i}=(1-\tau )z^{i}+b\), where the superscript-i refers to individuals.Footnote 7 There is a discrete distribution of N individuals, whose heterogeneous preferences over consumption and labor are captured by the utility function \(u^{i}(c^{i},z^{i})\). The maximized (subject to the individual budget constraint) value of this utility function is captured by the indirect utility function, which is denoted by \(V^{i}(1-\tau ,b)\), and we refer to the net-of-tax rate as \(1-\tau =a\). To simplify notation, subscript-a refers to the derivative with respect to the net-of-tax rate.

The government has redistributive objectives represented by a Bergson–Samuelson function \(W\left( V^{1},\ldots ,V^{N}\right) \) with \(W'>0\), \(W''<0\). The government’s problem is to choose the tax rate \(\tau \) and transfer b so as to maximize the social welfare function \(\sum W\left( V^{i}(a,b)\right) \) under the budget constraint \((1-a)\sum z^{i}=Nb+R\).Footnote 8 We denote the social marginal utility of income by \(\beta ^{i}=W_{V}V_{b}^{i}\).
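The budget constraint and the consumption identity can be illustrated with a minimal numerical sketch. It assumes, purely for illustration, that incomes are fixed and do not respond to the tax; the function names and all numbers are hypothetical and do not come from the paper.

```python
def lump_sum_transfer(tau, incomes, R):
    """Transfer b that balances (1 - a) * sum(z) = N * b + R, with a = 1 - tau."""
    n = len(incomes)
    return (tau * sum(incomes) - R) / n


def consumption(tau, b, z):
    """Individual consumption c = (1 - tau) * z + b."""
    return (1 - tau) * z + b


# Hypothetical labor incomes, held fixed (no behavioral response) for illustration.
incomes = [10.0, 20.0, 30.0, 40.0]
R = 10.0  # exogenous public spending (assumed value)

b = lump_sum_transfer(0.3, incomes, R)
c = [consumption(0.3, b, z) for z in incomes]

# With a low tax rate, revenue falls short of R and the transfer turns negative.
b_low = lump_sum_transfer(0.05, incomes, R)
```

In this sketch total consumption equals total income minus R, which is simply the budget constraint aggregated over individuals.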

All the mathematical details are presented in Appendix 1. There it is shown that the optimal tax rule is given by

$$\begin{aligned} \frac{\tau ^{*}}{1-\tau ^{*}}=\frac{1}{\varepsilon }\left( 1-\frac{z(\beta )}{\bar{z}}\right) , \end{aligned}$$
(1)

where \(\varepsilon =\frac{\mathrm{d}\bar{z}}{\mathrm{d}(1-\tau )}\frac{(1-\tau )}{\bar{z}}\) is the elasticity of total income with respect to the net-of-tax rate, \(\bar{z}\) is average income and \(z(\beta )=\frac{\sum \beta ^{i}z^{i}}{\sum \beta ^{i}}\) welfare-weighted average income. Define \(\varOmega =\frac{z(\beta )}{\bar{z}}\), so that \(I=1-\varOmega \) is a normative measure of inequality or, equivalently, of the relative distortion arising from the second-best tax system. Clearly \(\varOmega \) should vary between zero and unity. One would expect it to be a decreasing function of \(\tau \) (given the per capita revenue requirement \(g=\frac{R}{N}\)). There is a minimum feasible level of \(\tau \) for any given positive g, and of course g must not be too large, or no equilibrium is possible. Hence any solution must also satisfy \(\tau >\tau _\mathrm{min}\) if the tax system is to be progressive. That is, if the tax does not raise sufficient revenue to finance the non-transfer expenditure, R, the shortfall must be made up by imposing a poll tax (\(b<0\)) on each individual. One would also expect the elasticity of labor supply with respect to the net-of-tax rate to be an increasing function of \(\tau \) (it need not be).
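As a rough illustration of the welfare-weighted income \(z(\beta )\) and of \(\varOmega \), the following sketch evaluates both for a small discrete population. The incomes and the social marginal utilities \(\beta ^{i}\) are made-up numbers, and the helper names are hypothetical.

```python
def welfare_weighted_income(betas, incomes):
    """z(beta) = sum(beta_i * z_i) / sum(beta_i)."""
    return sum(bi * zi for bi, zi in zip(betas, incomes)) / sum(betas)


def omega(betas, incomes):
    """Omega = z(beta) / z_bar: welfare-weighted income relative to average income."""
    z_bar = sum(incomes) / len(incomes)
    return welfare_weighted_income(betas, incomes) / z_bar


incomes = [10.0, 20.0, 30.0, 40.0]      # hypothetical incomes
betas_equal = [1.0, 1.0, 1.0, 1.0]      # equal weights: no redistributive preference
betas_decl = [4.0, 3.0, 2.0, 1.0]       # weights falling with income (assumed)

omega_equal = omega(betas_equal, incomes)
omega_decl = omega(betas_decl, incomes)
inequality = 1 - omega_decl             # the measure I = 1 - Omega
```

With equal weights \(\varOmega \) equals one and I is zero; with weights that decline in income, \(\varOmega \) falls below one and I becomes positive.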

We can rewrite (1) as \(\tau ^{*}=\frac{1-\varOmega }{1-\varOmega +\varepsilon }\) to illustrate the basic properties of the optimal tax rate. Because \(\varepsilon \ge 0\) and \(0\le \varOmega \le 1\), both the numerator and denominator are nonnegative. The optimal tax rate is thus between zero and one. The formula neatly captures the efficiency-equity trade-off. The optimal rate \(\tau ^{*}\) decreases with \(\varepsilon \) and \(\varOmega \), and we have the following general results: (1) In the extreme case where \(\varOmega =1\), i.e., the government does not value redistribution at all, \(\tau =0\) is optimal. We can call this case libertarian. According to the libertarian view, the level of disposable income is irrelevant (ruling out both basic income b, and other public expenditures, g, funded by the government). (2) If there is no inequality, then again \(\varOmega =1\) and \(\tau =0\). There is no intervention by the government. The inherent inequality will be fully reflected in the disposable income. Furthermore, lump-sum taxation is optimal; \(b=-g\) or \(T=-b\). (3) We can call the case where \(\varOmega =0\) “Rawlsian” or maximin preferences. The government maximizes tax revenue (optimal \(\tau =\frac{1}{1+\varepsilon }\)) as it maximizes the basic income b (assuming the worst off individual has zero labor income). In fact, maximizing b can be regarded as a non-welfarist case, which is the focus in the next subsection.
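The rearranged rule and its polar cases can be checked numerically. The sketch below takes \(\varOmega \) and \(\varepsilon \) as given numbers, although in the model both are endogenous; the function name and the elasticity value are assumptions made for illustration.

```python
def optimal_linear_tax(omega, elasticity):
    """tau* = (1 - Omega) / (1 - Omega + epsilon), rearranged from rule (1)."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("Omega is expected to lie in [0, 1]")
    if elasticity < 0.0:
        raise ValueError("the elasticity is expected to be nonnegative")
    denom = 1.0 - omega + elasticity
    if denom == 0.0:
        raise ValueError("the rule is undefined when Omega = 1 and epsilon = 0")
    return (1.0 - omega) / denom


eps = 0.25  # assumed elasticity of total income

tau_no_redistribution = optimal_linear_tax(1.0, eps)  # Omega = 1
tau_rawlsian = optimal_linear_tax(0.0, eps)           # Omega = 0, revenue-maximizing
tau_intermediate = optimal_linear_tax(0.8, eps)
```

Under these assumed values the rate is zero when \(\varOmega =1\), equals \(\frac{1}{1+\varepsilon }\) when \(\varOmega =0\), and lies in between otherwise.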

2.2 Optimal linear income taxation under non-welfarist objectives

A non-welfarist government is one that follows a different set of preferences than those employed by individuals themselves (Kanbur et al. 2006). Thus, instead of maximizing a function of individual utilities, the government has other, paternalistic objectives that go beyond utilities. A special case taken up in more detail below is the objective of minimizing poverty in the society. To be as general as possible, let us define a “social evaluation function” (as in, e.g., Kanbur et al. 2006) as \(S=\sum F(c^{i},z^{i})\), which the government maximizes instead of the social welfare function. \(F(c^{i},z^{i})\) measures the social value of consumption \(c^{i}\) for a person with income \(z^{i}\) and can be related to \(u(c^{i},z^{i})\) but is not restricted to it. Following Tuomala’s model as above, and given the available instruments (the linear income tax \(\tau \), the lump-sum grant b and other expenditure R), the government thus maximizes \(\sum F(az^{i}+b,z^{i})\) subject to the budget constraint \((1-a)\sum z^{i}-Nb=R\). Define

$$\begin{aligned} \frac{\sum \left( F_{c}(z^{i}+az_{a}^{i})+F_{z}z_{a}^{i}\right) }{\sum \left( F_{c}(1+az_{b}^{i})+F_{z}z_{b}^{i}\right) }\equiv \tilde{F}, \end{aligned}$$
(2)

which reflects the relative impact of taxes and transfers on the social evaluation function. Using this definition, and following the same steps as in the previous section (see Appendix), the optimal tax rate becomes:

$$\begin{aligned} \frac{\tau ^{*}}{1-\tau ^{*}}=\frac{1}{\varepsilon }\left( 1-\frac{\tilde{F}}{\bar{z}} \right) . \end{aligned}$$
(3)

The result resembles the welfarist tax rule in (1). In addition to labor supply considerations via the term \(\frac{1}{\varepsilon }\), they both entail a term that measures the relative benefits of taxes and transfers, in the welfarist case via welfare-weighted income, in the non-welfarist case via \(\tilde{F}\), the relative impact on the social evaluation function. Note that since under non-welfarism individuals are not necessarily at their utility optimum, the envelope condition does not apply and thus the behavioral responses \(z_{a}^{i}\) and \(z_{b}^{i}\) are not cancelled out in \(\tilde{F}\). That is, the impacts of tax changes on labor supply are not trivial under non-welfarism. The terms \(\sum z_{a}^{i}\left( F_{c}a+F_{z}\right) \) in the numerator and \(\sum z_{b}^{i}\left( F_{c}a+F_{z}\right) \) in the denominator of (2) capture these effects on the social evaluation function. If taxation had no behavioral impacts (\(z_{a}^{i}=z_{b}^{i}=0\)), it would affect the value of the social evaluation function only by mechanically altering individual after-tax income. Note that in this case, \(\tilde{F}=\frac{\sum F_{c}z^{i}}{\sum F_{c}}\) would be a more direct equivalent to \(z(\beta )=\frac{\sum \beta ^{i}z^{i}}{\sum \beta ^{i}}\). The same equivalence would be achieved also when \(F_{c}a+F_{z}=0\), that is, the social marginal rate of substitution between income and consumption equals the private rate: \(-\frac{F_{z}}{F_{c}}=a=-\frac{u_{z}^{i}}{u_{c}^{i}}\) (the latter is obtained from the individual’s first-order condition). In these cases, \(\tilde{F}\) would be a purely redistributive term, albeit a non-welfaristic one. Paternalistic concerns additionally enter the optimal tax rule via labor supply changes, captured by the response of z. In this way, the tax rule in (3) can be decomposed, and this decomposition is similar in spirit to the corrective parts of the tax formulae in the new optimal tax literature with behavioral agents, such as Farhi and Gabaix (2015) and Gerritsen (2016).
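The two special cases discussed above can be verified with a small discrete example. The sketch evaluates definition (2) directly; the partial derivatives \(F_{c}\), \(F_{z}\) and the responses \(z_{a}^{i}\), \(z_{b}^{i}\) are supplied as hypothetical numbers rather than derived from any particular F.

```python
def f_tilde(a, z, z_a, z_b, F_c, F_z):
    """Evaluate definition (2) for a discrete population.

    All arguments except `a` are equal-length lists indexed by individual:
    z (income), z_a and z_b (income responses to the net-of-tax rate and to
    the transfer), F_c and F_z (partial derivatives of F for each person).
    """
    num = sum(fc * (zi + a * za) + fz * za
              for zi, za, fc, fz in zip(z, z_a, F_c, F_z))
    den = sum(fc * (1 + a * zb) + fz * zb
              for zb, fc, fz in zip(z_b, F_c, F_z))
    return num / den


a = 0.7                                  # assumed net-of-tax rate
z = [10.0, 20.0, 30.0]                   # hypothetical incomes
F_c = [3.0, 2.0, 1.0]                    # assumed social marginal values of consumption
redistributive_only = sum(fc * zi for fc, zi in zip(F_c, z)) / sum(F_c)

# Case 1: no behavioral responses, with an arbitrary F_z.
case_no_response = f_tilde(a, z, [0.0] * 3, [0.0] * 3, F_c, [-0.5] * 3)

# Case 2: responses are nonzero, but F_c * a + F_z = 0 for every individual.
F_z_aligned = [-a * fc for fc in F_c]
case_aligned = f_tilde(a, z, [5.0, 4.0, 3.0], [-0.1, -0.2, -0.3], F_c, F_z_aligned)
```

In both cases the result coincides with \(\frac{\sum F_{c}z^{i}}{\sum F_{c}}\), so only the redistributive term remains.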

The signs and magnitudes of \(F_{c}\) and \(F_{z}\) and thus of \(\tilde{F}\) depend on the specific objective of the government, that is, on the shape of F. Let us consider the specific case of poverty minimization below.

2.2.1 Special case: poverty minimization

Now let us derive the optimal linear tax results for a government whose objective is to minimize poverty in society. The instruments available to the government are the same, \(\tau \) and b, and other exogenous expenditure is R. Note first that the revenue-maximizing tax rate is in fact equivalent to the tax rate obtained from a maximin objective function, since when the government only cares about the poverty (consumption) of the poorest individual, its only goal is to maximize redistribution to this individual, i.e., maximize tax revenue.

Let us first define the objective function of the government explicitly. Poverty is defined as deprivation of individual consumption \(c^{i}\) relative to some desired level \(\bar{c}\) and measured with a deprivation index \(D\left( c^{i},\bar{c}\right) \), such that \(D>0\,\forall \,c\in [0,\bar{c})\) and \(D=0\) otherwise, and \(D_{c}<0,\,D_{cc}\ge 0\,\forall \,c\in [0,\bar{c})\), as in Pirttilä and Tuomala (2004). A typical example of such an index would be the \(P_{\alpha }\) family of Foster–Greer–Thorbecke (FGT) poverty indices. We discuss the application of FGT indices in our model in Appendix 2. Note, however, that the choice of poverty index depends on the preferences of the government: whether it wishes to minimize the total amount of deprivation in the society or is, for instance, especially concerned about the incomes of the poorest of the poor. The social evaluation function \(F(c^{i},z^{i})\) becomes \(D\left( c^{i},\bar{c}\right) \) and the objective function is \(\min \,P=\sum D\left( c^{i},\bar{c}\right) \). Now \(F_{c}=D_{c}\) and \(F_{z}=0\), so
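For concreteness, a deprivation index of the FGT type can be sketched as follows. The code uses the standard textbook form \(\left( \frac{\bar{c}-c}{\bar{c}}\right) ^{\alpha }\) for individuals below the poverty line; the normalization used in the paper's Appendix 2 may differ, and the function names and sample values are hypothetical. The conditions \(D_{c}<0\) and \(D_{cc}\ge 0\) hold in this form for \(\alpha \ge 1\).

```python
def fgt_deprivation(c, c_bar, alpha):
    """Individual deprivation ((c_bar - c) / c_bar) ** alpha below the line, else 0."""
    if c >= c_bar:
        return 0.0
    return ((c_bar - c) / c_bar) ** alpha


def fgt_deprivation_dc(c, c_bar, alpha):
    """Derivative D_c of the deprivation above with respect to c (intended for alpha >= 1)."""
    if c >= c_bar:
        return 0.0
    return -(alpha / c_bar) * ((c_bar - c) / c_bar) ** (alpha - 1)


def poverty(consumptions, c_bar, alpha):
    """Aggregate poverty P = sum_i D(c_i, c_bar), unnormalized as in the text."""
    return sum(fgt_deprivation(c, c_bar, alpha) for c in consumptions)


consumptions = [5.0, 8.0, 12.0, 20.0]    # hypothetical consumption levels
c_bar = 10.0                             # assumed poverty line

gap = poverty(consumptions, c_bar, alpha=1)          # sum of relative poverty gaps
squared_gap = poverty(consumptions, c_bar, alpha=2)  # puts more weight on the poorest
```

A higher \(\alpha \) raises the relative weight of the poorest individuals, which is one way to express the preference for the poorest of the poor mentioned above.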

$$\begin{aligned} \tilde{F}=\tilde{D}=\frac{\sum D_{c}\left( z^{i}+az_{a}^{i}\right) }{\sum D_{c}\left( 1+az_{b}^{i}\right) }, \end{aligned}$$
(4)

and the optimal tax rule becomes:

$$\begin{aligned} \frac{\tau ^{*}}{1-\tau ^{*}}=\frac{1}{\varepsilon }\left( 1-\frac{\tilde{D}}{\bar{z}} \right) . \end{aligned}$$
(5)

Since now \(F_{z}=0\), the result is closer to (1) than (3) was, although part of the labor supply impacts still remain. Here \(\tilde{D}\) describes the relative efficiency of taxes and transfers in reducing deprivation. Both the numerator and denominator of \(\tilde{D}\) depend on \(D_{c}\), so the difference in the relative efficiency of the two depends on \(z_{a}^{i}\) and \(z_{b}^{i}\). The more people react to taxes (relative to transfers) by earning less, the higher is \(\tilde{D}\) and the lower should the tax rate be. In (1), the higher is the social value of income, the higher is \(z(\beta )\) and the lower should the tax rate be.

Since the form of the result is similar in the welfarist and the poverty minimization cases, the analysis could also be seen as a special case of the argument in Saez and Stantcheva (2016), who derive generalized social welfare weights and express the tax formulae in terms of those.Footnote 9 Here, the generalized social welfare weight would thus be derived from a poverty minimization objective. It could be close to a suitably defined welfarist criterion, and clearly it would be exactly the same only if the welfarist criterion corresponded to the chosen poverty minimization objective.

We can also rewrite \(\tilde{D}\), using \(a=1-\tau \), as: \(\frac{\sum D_{c}\left( z^{i}+(1-\tau )\frac{\partial z^{i}}{\partial (1-\tau )}\right) }{\sum D_{c}\left( 1+(1-\tau )z_{b}^{i}\right) }=\frac{\sum D_{c}\left( 1+\frac{(1-\tau )}{z^{i}}\frac{\partial z^{i}}{\partial (1-\tau )}\right) z^{i}}{\sum D_{c}\left( 1+(1-\tau )z_{b}^{i}\right) }=\frac{\sum D_{c}\left( 1+\varepsilon ^{i}\right) z^{i}}{\sum D_{c}\left( 1+(1-\tau )z_{b}^{i}\right) }\). Thus the \(\tilde{D}\) in the optimal tax result (5) entails a further consideration that depends on labor supply responses. It combines paternalistic preferences—how much poverty is reduced—with the behavioral responses to a tax system—how much labor income increases when the take-home pay goes up. The latter effect tends to lower the optimal tax rate to induce the poor to work more. Kanbur et al. (1994) find a similar result in their nonlinear poverty-minimizing tax model. Here, however, we are restricted to lower the tax on everyone instead of only the poorest individuals.
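The elasticity form of \(\tilde{D}\) and its effect on the tax rate in (5) can be illustrated numerically. This is only a one-shot evaluation under assumed inputs: in the model \(\tilde{D}\), \(D_{c}\) and the elasticities all depend on \(\tau \), so a full solution would require solving a fixed point. All names and numbers are hypothetical.

```python
def d_tilde(tau, z, eps_i, z_b, D_c):
    """D_tilde = sum(D_c * (1 + eps_i) * z) / sum(D_c * (1 + (1 - tau) * z_b))."""
    num = sum(dc * (1 + e) * zi for dc, e, zi in zip(D_c, eps_i, z))
    den = sum(dc * (1 + (1 - tau) * zb) for dc, zb in zip(D_c, z_b))
    return num / den


def implied_tax_rate(d_tilde_value, z_bar, eps):
    """Solve rule (5), tau / (1 - tau) = (1 / eps) * (1 - D_tilde / z_bar), for tau."""
    k = (1 - d_tilde_value / z_bar) / eps
    return k / (1 + k)


z = [10.0, 20.0, 30.0, 40.0]       # hypothetical incomes
D_c = [-0.1, -0.05, 0.0, 0.0]      # only the two poorest are deprived (assumed)
z_b = [0.0] * 4                    # no income effects, for simplicity
z_bar = sum(z) / len(z)
eps = 0.25                         # assumed aggregate elasticity

d_unresponsive = d_tilde(0.3, z, [0.0] * 4, z_b, D_c)
d_responsive = d_tilde(0.3, z, [0.5] * 4, z_b, D_c)

tau_unresponsive = implied_tax_rate(d_unresponsive, z_bar, eps)
tau_responsive = implied_tax_rate(d_responsive, z_bar, eps)
```

In this example a higher earnings elasticity among the poor raises \(\tilde{D}\) and lowers the implied tax rate, in line with the argument in the text.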

To summarize, the non-welfarist tax rules differ from the welfarist ones, depending on the definition of non-welfarism in question (the \(F_{c}\) and \(F_{z}\) terms). However, when we take poverty minimization as the specific case of non-welfarism, the tax rules are quite similar to welfarist ones. The basic difference is that equity is not considered in welfare terms but in terms of poverty reduction effectiveness. A more notable difference arises from efficiency considerations. With linear taxation, taking into account labor supply responses means that everybody’s tax rate is affected, instead of just the target group’s. If we want to induce the poor to work more to reduce their poverty, we need to lower everyone’s tax rate. The welfarist linear tax rule does not take this into account. It is not possible to state that tax rates are optimally lower under poverty minimization than under welfare maximization, since we cannot directly compare the welfare and deprivation terms. Poverty minimization nevertheless involves an additional efficiency consideration. Nonlinear tax rules of course make it possible to target lower tax rates on the poorer individuals, but in a developing country context with lower administrative capacity this is not necessarily possible, and such considerations affect everyone’s tax rate.

3 Public good provision with linear income taxes

3.1 Optimal public provision under the welfarist objective

Let us first extend the welfarist model of linear taxation to include the provision of pure public goods. The government offers a universal pure public good G, which enters individual utilities in addition to the consumption of private goods. The government’s objective function is now \(\sum W\left( V^{i}(a,b,G)\right) \), whereas the budget constraint becomes \((1-a)\sum z^{i}-Nb-N\pi G=R\), where \(\pi \) is the producer price of the public good. The consumer price of private consumption is normalized to 1. Let us now define the marginal willingness to pay for the public good of individual i, that is, the marginal rate of substitution between the public good and income, by \(\sigma ^{i}=\frac{V_{G}^{i}}{V_{b}^{i}}\), and let \(\sigma ^{*}=\frac{\sum \beta ^{i}\sigma ^{i}}{\sum \beta ^{i}}\) denote its welfare-weighted average. The rule for public provision can then be written as

$$\begin{aligned} \pi =\sigma ^{*}-\tau \left( \sigma ^{*}\bar{z}_{b}-\bar{z}_{G}\right) . \end{aligned}$$
(6)

This public good provision rule is a version of a modified Samuelson rule. It equates the relative cost of providing the public good to the welfare-weighted sum of marginal rates of substitution (MRS). It also includes a revenue term, which takes into account the impacts of public good provision and income transfers on labor supply and thus tax revenue.

Consider first the case when labor supply does not depend on public good provision and there are no income effects, i.e., \(\bar{z}_{G}=\bar{z}_{b}=0\). Then we are left with a more familiar rule that welfare-weighted aggregate MRS must equal the cost of the public good. When we add income effects so that \(\bar{z}_{b}<0\), and since \(\sigma ^{*}\) is positive, then because of the second term in (6), the financing costs of the public good are reduced. Likewise, if labor supply and public provision are positively related, the financing costs of the public good are reduced.
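The cases above can be traced through the right-hand side of (6) with a small numerical sketch. The welfare weights, the individual marginal rates of substitution and the response terms are made-up values, and the helper names are hypothetical.

```python
def weighted_mrs(betas, sigmas):
    """sigma* = sum(beta_i * sigma_i) / sum(beta_i)."""
    return sum(b * s for b, s in zip(betas, sigmas)) / sum(betas)


def samuelson_rhs(sigma_star, tau, z_bar_b, z_bar_G):
    """Right-hand side of rule (6): sigma* - tau * (sigma* * z_bar_b - z_bar_G)."""
    return sigma_star - tau * (sigma_star * z_bar_b - z_bar_G)


betas = [4.0, 3.0, 2.0, 1.0]       # assumed social marginal utilities of income
sigmas = [0.5, 1.0, 1.5, 2.0]      # assumed individual willingness to pay
tau = 0.3                          # assumed tax rate

s_star = weighted_mrs(betas, sigmas)

rhs_no_response = samuelson_rhs(s_star, tau, z_bar_b=0.0, z_bar_G=0.0)
rhs_income_effect = samuelson_rhs(s_star, tau, z_bar_b=-0.2, z_bar_G=0.0)
rhs_complement = samuelson_rhs(s_star, tau, z_bar_b=0.0, z_bar_G=0.1)
```

Without responses the right-hand side equals \(\sigma ^{*}\). With \(\bar{z}_{b}<0\) or \(\bar{z}_{G}>0\) it rises above \(\sigma ^{*}\), so in this example the rule is satisfied at a producer price above the welfare-weighted MRS.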

3.2 Optimal provision of public goods under poverty minimization

Now consider a non-welfarist government interested in minimizing poverty. The public good G which it offers enters the deprivation index separately from other, private consumption x: \(D\left( x,G,\bar{x},\bar{G}\right) \). The government still offers a lump-sum cash transfer b as well and finances its expenses with the linear income tax \(\tau \).

Again alternative formulations of the public good provision rule can be written. The first is

$$\begin{aligned} \pi =D^{*}-\tau \left( D^{*}\bar{z}_{b}-\bar{z}_{G}\right) , \end{aligned}$$
(7)

which can be compared with Eq. (6). Here, \(D^{*}=\frac{\sum D_{G}+\sum D_{x}az_{G}^{i}}{\sum D_{x}\left( 1+az_{b}^{i}\right) }\) captures the efficiency of the public good in reducing deprivation relative to the income transfer (because \(D_{G},\,D_{x}<0\), \(D^{*}>0\)). Again, if \(\bar{z}_{G}=\bar{z}_{b}=0\), the equation reduces to \(\pi =D^{*}=\frac{\sum D_{G}}{\sum D_{x}}\). This rule differs considerably from the standard modified Samuelson rules: instead of a welfare-based MRS, it reflects the direct poverty reduction impact of the public good. With \(\bar{z}_{G}\ne 0\) and \(\bar{z}_{b}\ne 0\), \(D^{*}\) also depends on the indirect impacts of the public good via labor supply on consumption. As previously, the right-hand side includes a tax revenue term. Using the same example as in the context of (6), if \(\bar{z}_{G}=0\) and \(\bar{z}_{b}<0\), the price \(\pi \) of the public good would be higher than its relative efficiency in eliminating deprivation.
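A small numerical sketch may help to see how \(D^{*}\) and rule (7) behave. The marginal deprivation terms and the response terms are hypothetical numbers chosen for illustration, with only the two poorest individuals deprived, and every individual is assumed to have the same income effect.

```python
def d_star(a, D_G, D_x, z_G, z_b):
    """D* = (sum(D_G) + sum(D_x * a * z_G)) / sum(D_x * (1 + a * z_b))."""
    num = sum(D_G) + sum(dx * a * zg for dx, zg in zip(D_x, z_G))
    den = sum(dx * (1 + a * zb) for dx, zb in zip(D_x, z_b))
    return num / den


def provision_rhs(d_star_value, tau, z_bar_b, z_bar_G):
    """Right-hand side of rule (7): D* - tau * (D* * z_bar_b - z_bar_G)."""
    return d_star_value - tau * (d_star_value * z_bar_b - z_bar_G)


tau = 0.3                           # assumed tax rate
a = 1 - tau
D_G = [-0.2, -0.1, 0.0, 0.0]        # assumed marginal deprivation from the public good
D_x = [-0.1, -0.05, 0.0, 0.0]       # assumed marginal deprivation from private consumption

# No labor supply responses: D* reduces to sum(D_G) / sum(D_x).
d_direct = d_star(a, D_G, D_x, [0.0] * 4, [0.0] * 4)

# Income effects only: z_b < 0 for everyone, z_G = 0.
z_b = [-0.2] * 4
d_income = d_star(a, D_G, D_x, [0.0] * 4, z_b)
rhs_income = provision_rhs(d_income, tau, z_bar_b=-0.2, z_bar_G=0.0)
```

With income effects and no response to G, the right-hand side exceeds \(D^{*}\) in this example, matching the statement that \(\pi \) would then be higher than the relative efficiency of the public good.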

Here we have allowed the government to be directly interested in the consumption of some pure public good. But if the government is solely interested in reducing income poverty, it might not include such goods in the deprivation measure.Footnote 10 However, suppose that individual welfare does not directly depend on the public good provided but the public good can have a productivity-increasing impact. An example could be publicly provided education services that affect individuals’ productivity via the wage rate. We therefore suppose that the direct impact of the public good on deprivation cancels out (i.e., \(D_{G}=0\)), whereas the wage rate becomes an increasing function of G, i.e., \(w'(G)>0\) (denoting \(z=w(G)L\)). This means that the expression for \(D^{*}\) is rewritten as

$$\begin{aligned} D^{*}=\frac{\sum D_{x}a\left( w\frac{\partial L}{\partial G}+w'L\right) }{\sum D_{x}\left( 1+aw\frac{\partial L}{\partial b}\right) }. \end{aligned}$$
(8)

This means that even if labor supply did not react to changes in public good provision, such provision would still be potentially desirable through its impact on the wage rate. In this way, public good provision can be interpreted as increasing the capability of the individuals to earn a living wage, which serves as a poverty-reducing tool and which can in some cases be a more effective way to reduce poverty than direct cash transfers. The optimality depends on the relative strength of \(w'(G)>0\) versus the direct impact of the transfers.
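The wage channel in (8) can be illustrated with a brief sketch in which labor supply does not respond at all. The wage levels, the wage response \(w'(G)\) and hours worked are hypothetical values, and the function name is an assumption.

```python
def d_star_wage_channel(a, D_x, w, w_prime, L, dL_dG, dL_db):
    """Evaluate (8): sum(D_x * a * (w * dL/dG + w' * L)) / sum(D_x * (1 + a * w * dL/db))."""
    num = sum(dx * a * (wi * lg + wp * li)
              for dx, wi, wp, li, lg in zip(D_x, w, w_prime, L, dL_dG))
    den = sum(dx * (1 + a * wi * lb)
              for dx, wi, lb in zip(D_x, w, dL_db))
    return num / den


a = 0.7                               # assumed net-of-tax rate
D_x = [-0.1, -0.05, 0.0, 0.0]         # only the two poorest are deprived (assumed)
w = [1.0, 2.0, 3.0, 4.0]              # hypothetical wage rates
L = [10.0] * 4                        # hypothetical hours worked
no_response = [0.0] * 4               # labor supply fixed: dL/dG = dL/db = 0

with_wage_effect = d_star_wage_channel(a, D_x, w, [0.1] * 4, L, no_response, no_response)
without_wage_effect = d_star_wage_channel(a, D_x, w, [0.0] * 4, L, no_response, no_response)
```

Even with fixed labor supply, \(D^{*}\) is positive in this example as long as \(w'(G)>0\), and it drops to zero when the wage effect is switched off.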

An alternative provision rule for the public good, which results from extending the Piketty–Saez approach to the usual case where the public good also enters individuals’ utility function, is

$$\begin{aligned} \frac{\int \left( D_{G}+D_{x}(1-\tau )\frac{\partial z^{i}}{\partial G}\right) \,\mathrm {d}\nu (i)}{\int D_{x}\,\mathrm {d}\nu (i)}=\pi -\tau \frac{\mathrm{d}Z}{\mathrm{d}G}. \end{aligned}$$
(9)

In the numerator of the left-hand side, the first term is the direct deprivation effect of G and the second term captures the indirect deprivation effect, operating via the labor supply impacts of the public good, which affect the level of private consumption, x. These impacts are scaled by the poverty alleviation impact of private consumption itself (the impact of a cash transfer). The right-hand side reflects the costs of public good provision: besides the direct cost of the good there is an indirect tax revenue effect operating through labor supply. The condition is directly comparable to the welfarist rule, given in (39) in the Appendix, because even though the welfarist case relies on utilities, in the FOC for G no envelope condition is invoked. The only difference between Eqs. (39) and (9) is that the utility and welfare weight terms are exchanged for deprivation terms.

Consider finally the provision of a quasi-private good, such that in addition to the publicly provided amount, individuals can purchase (“top-up”) the good themselves as well. The good is denoted by s and its total amount consists of private purchases h and public provision G: \(s=G+h\). In addition to good s, individuals consume other private goods, denoted by x. The individual budget constraint is thus \(c^{i}=x^{i}+ph^{i}=(1-\tau )z^{i}+\tau Z(1-\tau )-R-\pi G\), where p is the consumer price of private purchases of the quasi-private good. The price of the good (education, for instance) in the private sector (p) and in the public sector (\(\pi \)) can be equal, or one sector could have access to cheaper technology. Deprivation is determined in terms of consumption of x and s, so the objective function is \(\min \,P=\int D\left( x^{i},s^{i},\bar{x},\bar{s}\right) \,\mathrm {d}\nu (i)\). In this case, the provision rule is

$$\begin{aligned} \frac{\int \left[ D_{x}\left( (1-\tau )\frac{\partial z^{i}}{\partial s}\frac{\partial s}{\partial G}-p\frac{\partial h^{i}}{\partial G}\right) +D_{s}\frac{\partial s^{i}}{\partial G}\right] \,\mathrm {d}\nu (i)}{\int D_{x}\,\mathrm {d}\nu (i)}=\pi -\tau \frac{\mathrm{d}Z}{\mathrm{d}G}. \end{aligned}$$
(10)

The result is analogous to the pure public good result in (9), with the difference that now the impact G has on poverty depends on whether public provision fully crowds out private purchases of the good (i.e., \(\frac{\mathrm{d}h}{\mathrm{d}G}=-1\Leftrightarrow \frac{\mathrm{d}s}{\mathrm{d}G}=0\)) or not (i.e., \(\frac{\mathrm{d}h}{\mathrm{d}G}=0\Leftrightarrow \frac{\mathrm{d}s}{\mathrm{d}G}=1\)). If there is full crowding out, an increase in public provision of G that is fully funded via a corresponding increase in the tax rate has no impact on the consumption of s and consequently no impact on poverty. If there is no crowding out, however, the FOC becomes

$$\begin{aligned} \frac{\int \left[ D_{x}\left( (1-\tau )\frac{\partial z^{i}}{\partial s}\right) +D_{s}\right] \,\mathrm {d}\nu (i)}{\int D_{x}\,\mathrm {d}\nu (i)}=\pi -\tau \frac{\mathrm{d}Z}{\mathrm{d}G}, \end{aligned}$$
(11)

which is the same as in the case of a pure public good in Eq. (9).
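
The link between the two crowding-out cases and Eqs. (10) and (11) can be spelled out in one step. The following is a short sketch that uses only the definition \(s^{i}=G+h^{i}\) given above; it adds no new assumptions beyond treating G as the only source of change.

$$\begin{aligned} s^{i}=G+h^{i}\quad \Rightarrow \quad \frac{\partial s^{i}}{\partial G}=1+\frac{\partial h^{i}}{\partial G}. \end{aligned}$$

With no crowding out, \(\frac{\partial h^{i}}{\partial G}=0\) and \(\frac{\partial s^{i}}{\partial G}=1\), so the term \(p\frac{\partial h^{i}}{\partial G}\) in the numerator of (10) vanishes and \(D_{s}\frac{\partial s^{i}}{\partial G}\) reduces to \(D_{s}\), which gives (11). With full crowding out, \(\frac{\partial h^{i}}{\partial G}=-1\) and \(\frac{\partial s^{i}}{\partial G}=0\), so public provision replaces private purchases one for one and total consumption of s is unchanged.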

To summarize, the welfarist public provision rule, when public goods are financed with linear income taxes and supplemented with lump-sum transfers, differs from the standard modified Samuelson rule. It equates a welfare-weighted sum of MRS to the marginal cost, where tax revenue impacts are taken into account. Indirect effects of public provision (through labor supply decisions and thus private consumption) are incorporated. The poverty-minimizing public provision rule, however, replaces the welfare-weighted sum of MRS with the relative marginal returns to deprivation reduction. Here the “MRS” term measures how effectively the public good translates into reduced poverty (incorporating indirect effects as well), relative to private consumption. Finally, when the public good has positive effects on productivity, its provision can be desirable even if it has no direct impact on poverty.

4 Commodity taxation with linear income taxes

4.1 Optimal commodity taxation with linear income tax under the welfarist objective

This section considers the possibility that the government also uses commodity taxation (subsidies) to influence consumers’ welfare. We follow the modeling of Diamond (1975). Unlike in the analysis above, there are J consumer goods \(x_{j}\) instead of just two. Working with many goods makes it possible to describe more clearly the conditions under which uniform commodity taxation is optimal. The government levies a tax \(t_{j}\) on the consumption of good \(x_{j}\), so that its consumer price is \(q_{j}=p_{j}+t_{j},\) where \(p_{j}\) represents the producer price (a commodity subsidy would be reflected by \(t_{j}<0\)). Let q denote the vector of all consumer prices. In addition, the government can use a lump-sum transfer, b. Note that in this exposition, leisure is the untaxed numeraire commodity. Alternatively, one could impose a linear tax on labor supply as above and treat one of the consumption goods as the untaxed numeraire. However, choosing leisure as the numeraire makes the exposition easier. Thus, the consumer’s budget constraint is \(\sum _{j}q_{j}x_{j}^{i}=z^{i}+b.\)

The government maximizes \(\sum _{i}W\left( V^{i}(b,q)\right) \) subject to its budget constraint \(\sum _{i}\sum _{j}t_{j}x_{j}^{i}-Nb=R.\) It is useful to define, following Diamond (1975),

$$\begin{aligned} \gamma ^{i}=\beta ^{i}+\lambda \sum _{j}t_{j}\frac{\partial x_{j}^{i}}{\partial b} \end{aligned}$$
(12)

as the net social marginal utility of income for person i. This notion takes into account the direct marginal social gain, \(\beta ^{i}\) , and the tax revenue impact arising from commodity demand changes. The rule for optimal commodity taxation for good k is shown to be

$$\begin{aligned} \frac{1}{N}\sum _{i}\sum _{j}t_{j}\frac{\partial \tilde{x}_{k}^{i}}{\partial q_{j}}=\frac{1}{\lambda }\mathrm{cov}(\gamma ^{i},x_{k}^{i}). \end{aligned}$$
(13)

The left-hand side of the rule is the aggregate compensated change (weighted by commodity taxes) in the demand for good k when commodity prices are changed. The right-hand side is the covariance between the net social marginal utility of income and the consumption of the good in question. The rule says that the tax system should discourage the consumption of those goods whose demand is greatest among people with a low net social marginal utility of income (presumably, the rich). Likewise, the consumption of goods such as necessities should be encouraged by the tax system.
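
The direction implied by the covariance term in Eq. (13) can be illustrated with a small numerical sketch. The three-person economy below and the helper names `covariance` and `tax_direction` are hypothetical and serve only to show how the sign of the covariance maps into encouraging or discouraging a good; the sketch does not solve for the tax rates themselves.

```python
def covariance(a, b):
    """Population covariance (divides by N, matching the 1/N averages in the rule)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


def tax_direction(gamma, x_k):
    """Map the sign of cov(gamma, x_k) to the direction suggested by the rule."""
    c = covariance(gamma, x_k)
    if c < 0:
        return "discourage"   # consumed mostly by people with low gamma
    if c > 0:
        return "encourage"    # consumed mostly by people with high gamma
    return "neutral"


# Hypothetical data, ordered from the poorest to the richest person.
gamma = [1.5, 1.0, 0.5]       # assumed net social marginal utility of income
luxury = [1.0, 2.0, 6.0]      # assumed consumption of a luxury good
necessity = [5.0, 4.0, 3.0]   # assumed consumption of a necessity

print(tax_direction(gamma, luxury), tax_direction(gamma, necessity))
```

Note that the sign of the covariance only indicates the direction of the aggregate compensated demand change on the left-hand side of the rule; the individual tax rates \(t_{j}\) also depend on the full matrix of compensated price responses.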

The key policy question is whether, and when, uniform commodity taxes are optimal or, in other words, when a linear income tax combined with an optimal demogrant is sufficient to reach society’s distributional goals at the smallest cost. Deaton (1979) shows that weak separability between consumption and leisure, together with linear Engel curves, is sufficient for the optimality of uniform commodity taxes. These requirements are quite stringent and unlikely to hold in practice; however, the economic importance of departures from them is unclear. If implementing differentiated commodity taxation entails significant administrative costs, these may easily outweigh its potential distributional benefits, which is why economists have typically been quite skeptical about non-uniform commodity taxation in practical tax policy.

4.2 Optimal commodity taxation with linear income tax under poverty minimization

Poverty could be measured in many ways when there are multiple commodities: the government may care about overall consumption, about the consumption of some of the goods (those that are in the basket used to measure poverty), or about both overall consumption and the relative share of different kinds of consumption goods (such as merit goods). We discuss these measurement issues in Appendix 2, but here we examine the simplest set-up, where deprivation depends only on disposable income, \(c^{i}=z^{i}+b\). Using the consumer’s budget constraint, this is equal to the overall consumption level, \(\sum _{j}q_{j}x_{j}^{i}\).

The government thus minimizes the sum over individuals of the deprivation measure \(D\left( \sum _{j}q_{j}x_{j}^{i},\bar{c}\right) \), and the budget constraint is the same as before. It is again useful to define

$$\begin{aligned} \gamma _{P}^{i}=D_{c}\sum _{j}q_{j}\frac{\partial x_{j}^{i}}{\partial b}+\lambda \sum _{j}t_{j}\frac{\partial x_{j}^{i}}{\partial b} \end{aligned}$$
(14)

as the net poverty impact of additional income for person i. This notion takes into account the direct impact on poverty and the tax revenue impact arising from commodity demand changes.

As shown in Appendix 1 section “Commodity taxation”, this leads to an optimal tax rule as below:

$$\begin{aligned} \frac{1}{N}\sum _{i}\sum _{j}t_{j}\frac{\partial \tilde{x}_{k}^{i}}{\partial q_{j}}=-\frac{1}{\lambda }\left[ \frac{1}{N}\sum _{i}D_{c}x_{k}^{i}+\frac{1}{N}\sum _{i}\sum _{j}D_{c}q_{j}\frac{\partial \tilde{x}_{k}^{i}}{\partial q_{j}}\right] +\frac{1}{\lambda }\mathrm{cov}\left( \gamma _{P}^{i},x_{k}^{i}\right) . \end{aligned}$$
(15)

In this formulation, the left-hand side is the same as in the welfarist case and reflects the aggregate compensated change in the demand for good k. The two terms in the square brackets on the right-hand side capture the impacts of tax changes on poverty: the first term is the direct impact of the price change (keeping consumption unaffected) on measured poverty, whereas the second depends on the behavioral shift in consumption. Multiplied by the minus sign, the former term implies that the consumption of the good should be encouraged, whereas if demand decreases when prices increase, the latter term actually serves to discourage consumption. The last term on the right reflects the same principle as the covariance rule in Eq. (13): it is the covariance between the net poverty impact of income and the consumption of the good in question. That is, the covariance part of the tax rule moves it in the direction of favoring goods that have a high poverty reduction impact on the poor (i.e., goods that the poor consume more).

The key lesson from the optimal commodity tax rule in the poverty minimization case is that the conventional conditions for uniform commodity taxation to be optimal are no longer valid. The reason is that even if demand were separable from labor supply, the first term on the right still remains in the rule, and its magnitude clearly varies depending on the quantity of the good consumed. Thus, income transfers are not sufficient to alleviate poverty when the government aims to minimize poverty that depends on disposable income. The intuition is very simple: commodity tax changes have a direct effect on the purchasing power of the consumer, and this effect depends on the amount consumed. The larger the share of a good in the consumption bundles of the poor, the more its consumption should be encouraged. The result resembles that of Pirttilä and Tuomala (2004), meaning that the intuition from optimal nonlinear income taxation under poverty minimization carries over to linear income taxation. A formal proof is provided in Appendix 1.

In sum, the rule for optimal commodity taxation changes when we shift from welfare maximization to poverty minimization. The welfarist rule reflects a fairly straightforward trade-off between efficiency (tax revenue) and equity (distributional impacts). The poverty-minimizing commodity tax rule brings new terms, the interrelations of which are not easy to disentangle. It does, however, also take into account efficiency considerations (tax revenue through indirect labor supply effects) and equity (direct impact of the taxed good on poverty and indirect impact via labor supply effects). Most importantly, the conventional wisdom about when uniform commodity taxation is sufficient fails to hold in the poverty minimization case. Thus, observed commodity subsidies in developing countries, such as fuel or food subsidies, can be considered optimal given the preference for poverty minimization.Footnote 11 In practice, it would be wise to limit the number of differentiated commodity tax rates to a few essential categories such as fuel and food, in order to keep administrative complexity to a minimum.

5 Poverty minimization in the presence of an informal sector

An important challenge for a developing country attempting to collect taxes is the presence of a large informal sector. If part of tax revenue is lost due to tax evasion in the informal sector, which is likely to be the case in less developed economies, then the income transfer is reduced and redistributive targets may not be met. In this section, we discuss the implications of informality for optimal redistributive policies for a government wishing to minimize poverty.Footnote 12 The results can thus be contrasted to those obtained in previous sections.

Following Kanbur (2015) and Kanbur and Keen (2014), informal operators can be categorized as those who should comply with regulations but illegally choose not to, and those who legally remain outside regulation, e.g., due to the smaller size of operations (either naturally or by adjusting size as a response to regulation). For our purposes, however, it is enough to lump these categories into one “informal sector,” where it is possible to avoid taxes at least to some extent. It is also possible for workers to work in both sectors, such that part of total income is declared for taxation and part is evaded (consider, e.g., supplementing official employment income with street vending). Note also that, especially in the case of agriculture, evasion can also consist of home production. In this case, the reason for “informality” would be the small size of the producing entity, such that it is naturally not liable for taxes. Production for own consumption is, however, still relevant for the well-being and measured poverty of the family.

In this application, we follow the approach pioneered in Besley and Persson (2013). They work with a model that fits into the description above, where part of the tax base evades taxes. We thus take informality as given, and do not consider whether informality is “natural,” illegal or a response to taxation. Furthermore, this intensive margin model (what extent of income is earned in the informal sector), they argue, yields essentially similar results as an extensive margin model (whether to participate in the formal job market).

Consider the case of income taxation. We can incorporate informality into the model by noting that people can shelter part e of their labor income from taxation. The extent of evasion is assumed to increase when the tax rate goes up, and thus \(\frac{\partial e}{\partial a}<0\). Income taxes are only paid from income \(z^{i}-e^{i}\). It is noteworthy that for a government wishing to minimize income poverty, this is in fact beneficial: disposable incomes rise. The more this effect is concentrated among the poor who enter the deprivation index, the better. Individual consumption is now \(z^{i}-\tau (z^{i}-e^{i})+b=e^{i}+a(z^{i}-e^{i})+b\). On the other hand, tax collections are reduced: the budget constraint becomes \((1-a)\sum (z^{i}-e^{i})=Nb+R\). Our formulation follows that of Besley and Persson (2013), but we simplify it in order to explicitly consider the problem of optimal taxation, whereas they focus on the issue of investments in the state’s fiscal capacity (we abstract from this issue here and take evasion as given).Footnote 13 The framework, however, nicely captures the essential trade-offs a government faces when there is tax evasion.
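
The accounting in this paragraph can be checked with a small sketch. The function name `disposable_incomes` and the three-person example are hypothetical; evasion is taken as a fixed input here, so the behavioral response \(\frac{\partial e}{\partial a}\) is deliberately left out.

```python
def disposable_incomes(z, e, tau, revenue_requirement=0.0):
    """Return the lump-sum transfer b and each person's consumption.

    z: gross incomes, e: income sheltered from taxation, tau: tax rate.
    The budget constraint is tau * sum(z - e) = N * b + R, with a = 1 - tau.
    """
    n = len(z)
    a = 1.0 - tau
    taxable = [zi - ei for zi, ei in zip(z, e)]
    b = (tau * sum(taxable) - revenue_requirement) / n
    # Consumption written as e + a * (z - e) + b, as in the text.
    consumption = [ei + a * ti + b for ei, ti in zip(e, taxable)]
    return b, consumption


# Hypothetical example: three earners, two of whom shelter some income.
z = [10.0, 20.0, 30.0]
e = [2.0, 0.0, 4.0]
b, c = disposable_incomes(z, e, tau=0.5)
print(b, c)
```

With a zero revenue requirement, total consumption equals total gross income, which confirms that the transfer exactly recycles the taxes collected on the declared base \(z^{i}-e^{i}\).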

The government now minimizes the Lagrangian \(L=\sum D\left( e^{i}+a(z^{i}-e^{i})+b,\bar{c}\right) +\lambda ((1-a)\sum (z^{i}-e^{i})-Nb-R)\). The first-order condition with respect to the net-of-tax rate is:

$$\begin{aligned}&\sum D_{c}\left( \frac{\partial e^{i}}{\partial a}+z^{i}-e^{i}+a\left( \frac{\partial z^{i}}{\partial a}-\frac{\partial e^{i}}{\partial a}\right) \right) \nonumber \\&\quad =\lambda \left( \sum (z^{i}-e^{i})-(1-a)\sum \left( \frac{\partial z^{i}}{\partial a}-\frac{\partial e^{i}}{\partial a}\right) \right) , \end{aligned}$$
(16)

whereas, under the assumption that there are no income effects in evasion, the first-order condition with respect to b stays the same. From here, we can derive a rule for the optimal tax following the same steps as in Sect. 2.2:

$$\begin{aligned} \frac{\tau ^{*}}{1-\tau ^{*}}=\frac{1}{\varepsilon ^{e}}\left( 1-\frac{\tilde{D}^{e}}{\bar{z}^{e}}\right) , \end{aligned}$$
(17)

where now \(\varepsilon ^{e}\) is a tax elasticity of the net-of-evasion tax base \(\bar{z}^{e}=\bar{z}-\bar{e}\) and \(\tilde{D}^{e}\) represents the relative impact of taxes and transfers on the deprivation index (see Appendix 1 for further detail). The rule represents a trade-off between poverty reduction and efficiency, both of which are now altered by evasion. There is a pressure toward lower tax rates, as now distortions of taxation are increased by evasion behavior, so \(\varepsilon ^{e}>\varepsilon \). Contrary to this effect, \(\tilde{D}^{e}\) is reduced compared to \(\tilde{D}\) because reducing taxes (increasing a) is now a less useful instrument for poverty reduction, as part of the taxes have been evaded. As \(\frac{\partial e}{\partial a}<0\), people pay more taxes when tax rates are reduced, and therefore poverty in fact increases. \(\tilde{D}^{e}\) thus works to increase tax rates.
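
The two opposing forces in rule (17) can be made concrete by solving the rule for \(\tau ^{*}\) at given values of its components. This is only a sketch: the function name `tax_rate_from_rule` and all numbers are hypothetical, and in the model \(\varepsilon ^{e}\) and \(\tilde{D}^{e}\) are themselves determined at the optimum rather than fixed inputs.

```python
def tax_rate_from_rule(elasticity, d_tilde, mean_base):
    """Solve tau / (1 - tau) = (1 / elasticity) * (1 - d_tilde / mean_base) for tau."""
    x = (1.0 - d_tilde / mean_base) / elasticity
    return x / (1.0 + x)


# Hypothetical baseline without evasion.
baseline = tax_rate_from_rule(elasticity=0.5, d_tilde=0.4, mean_base=1.0)

# Evasion raises the elasticity of the tax base: pressure toward a lower rate.
more_elastic = tax_rate_from_rule(elasticity=0.8, d_tilde=0.4, mean_base=1.0)

# Evasion lowers the deprivation term: pressure toward a higher rate.
lower_d = tax_rate_from_rule(elasticity=0.5, d_tilde=0.2, mean_base=1.0)

print(baseline, more_elastic, lower_d)
```

Which force dominates depends on the relative size of the two changes, which is why the net effect of informality on the optimal rate is ambiguous.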

Therefore, an interesting trade-off arises: informality increases the cost of raising taxes, but it also means that higher taxes are less harmful as those in the informal sector do not need to pay them (and they are still entitled to the lump-sum transfer).Footnote 14 To our knowledge, these countervailing forces have not been noted in the literature before. The presence of informality therefore seems to give rise to tax policy rules that are far from trivial. Future work could also look more deeply into the issue of the tax mix in the presence of informality. If income tax is more easily evaded than commodity taxation, as Boadway et al. (1994) suggest, this could give rise to policies that focus taxation and redistribution on commodity taxes and subsidies, instead of income taxes and lump-sum transfers. Slemrod and Gillitzer (2014) have also suggested focusing on a “tax systems approach” and including, among other things, evasion behavior in optimal taxation analysis to obtain more useful prescriptions for actual tax policy. This topic certainly deserves a more detailed analysis.

6 A numerical illustration

To further illustrate the differences in tax rates under poverty minimization and welfarism, we provide a simple numerical simulation. Here we concentrate on the special case where there are no income effects on labor supply and the elasticity of labor supply with respect to the net-of-tax wage rate is constant. If \(\varepsilon \) denotes this elasticity, the quasi-linear indirect utility function is given by \(v(w(1-\tau ),b)=b+\frac{\left[ w(1-\tau )\right] ^{1+\varepsilon }}{1+\varepsilon }\). Like most work on optimal nonlinear and linear income taxation, we use the lognormal distribution \(\ln (n;m,\sigma ^{2})\) to describe the distribution of productivities with support \(\left[ 0,\infty \right) \) and parameters m and \(\sigma \) (see Aitchison and Brown 1957). The first parameter, m, is the log of the median wage. The second parameter, the variance of log wage \(\sigma ^{2}\), is itself an inequality measure. As is well known, the lognormal distribution fits reasonably well over a large part of the income range but diverges markedly at both tails. The Pareto distribution in turn fits well at the upper tail. We also use the two-parameter version of the Champernowne distribution (known also as the Fisk distribution). This distribution asymptotically approaches a form of the Pareto distribution for large values of wages, but it also has an interior maximum. In our simulations, the revenue requirement is set to zero; thus, the system is purely redistributive.

To illustrate the poverty-minimizing tax formula in (3), we also need to specify a measure of poverty. Typically, poverty indices consist of computing some average measure of deprivation by setting individual needs as defined above at the agreed upon poverty line \(\bar{c}\). For this purpose, we take a poverty index of the form developed by Foster et al. (1984). They have proposed defining a poverty index as the average of these poverty gaps across individuals raised to some power \(\alpha \). When \(\alpha =1\), it is just the proportion of units below the poverty line multiplied by the average poverty gap. (See Appendix 2 for more details.) We consider the cases where either 30 or 40% of the population lie below the poverty line.
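
A minimal sketch of the Foster et al. (1984) index may help fix ideas. It uses the standard form in which each poor individual's gap is normalized by the poverty line; the function name `fgt_index` and the sample consumption vector are hypothetical.

```python
def fgt_index(consumption, poverty_line, alpha):
    """Foster-Greer-Thorbecke index: mean over all individuals of the
    normalized poverty gap raised to the power alpha (zero for the non-poor)."""
    total = 0.0
    for c in consumption:
        if c < poverty_line:
            gap = (poverty_line - c) / poverty_line
            total += gap ** alpha
    return total / len(consumption)


# Hypothetical consumption levels with a poverty line of 5.
sample = [2.0, 4.0, 6.0, 8.0, 10.0]
headcount = fgt_index(sample, 5.0, alpha=0)     # share of the population that is poor
poverty_gap = fgt_index(sample, 5.0, alpha=1)   # headcount times average normalized gap
severity = fgt_index(sample, 5.0, alpha=2)      # puts more weight on the poorest
print(headcount, poverty_gap, severity)
```

In this example, two of five individuals are poor with normalized gaps of 0.6 and 0.2, so the \(\alpha =1\) index equals the headcount (0.4) multiplied by the average gap among the poor (0.4).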

The results from the simulation of the optimal tax when the government minimizes the poverty gap for the lognormal case are presented in Table 1. Results are shown for two different values of labor supply elasticity \(\varepsilon \), two different values regarding income dispersion \(\sigma \), and two values of the share of population below the poverty line \(F(\bar{w})\). The tax rates are high, above 60%, for all the combinations of parameter values.Footnote 15
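
The kind of computation behind such a table can be sketched with a simple grid search. The sketch below is not the procedure used for Table 1 and is not expected to reproduce its numbers: it assumes earnings \(z=w^{1+\varepsilon }(1-\tau )^{\varepsilon }\), which is what the indirect utility function above implies, approximates the lognormal distribution by a deterministic grid of quantiles, and treats the poverty line as a free input because its calibration is not spelled out in this section. All function names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_wages(m, sigma, n=2000):
    """Deterministic grid of n mid-point quantiles of a lognormal wage distribution."""
    normal = NormalDist(mu=m, sigma=sigma)
    return [math.exp(normal.inv_cdf((k + 0.5) / n)) for k in range(n)]


def consumption(wages, tau, eps):
    """Disposable incomes with earnings z = w^(1+eps) * (1-tau)^eps and R = 0."""
    earnings = [w ** (1 + eps) * (1 - tau) ** eps for w in wages]
    b = tau * sum(earnings) / len(earnings)   # balanced-budget lump-sum transfer
    return [(1 - tau) * z + b for z in earnings]


def poverty_gap(cons, line):
    """Average normalized shortfall below the poverty line (FGT with alpha = 1)."""
    return sum(max(0.0, (line - c) / line) for c in cons) / len(cons)


def poverty_minimizing_tax(wages, eps, line, grid=100):
    """Grid search over tau in {0, 1/grid, ..., (grid - 1)/grid}."""
    taus = [i / grid for i in range(grid)]
    return min(taus, key=lambda t: poverty_gap(consumption(wages, t, eps), line))


# Assumed parameters: median wage 1, sigma = 0.7, elasticity 0.25, poverty line 1.0.
wages = lognormal_wages(m=0.0, sigma=0.7)
best_tau = poverty_minimizing_tax(wages, eps=0.25, line=1.0)
print(best_tau)
```

Because both the transfer and net earnings fall once the tax rate exceeds the revenue-maximizing rate \(\frac{1}{1+\varepsilon }\), the search can never select a rate above that level.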

Comparing these results to the welfarist case is not straightforward, as the welfarist results depend on the chosen welfare function. We adopt a constant relative inequality aversion form of the welfare function: the contribution to social welfare of the ith individual is \(\frac{w_{i}^{1-\eta }}{1-\eta }\), where \(\eta \) is the constant relative inequality aversion coefficient. Hence, the social marginal value of income to an individual with wage rate w is proportional to \(w^{-\eta }\). Using the property of the lognormal distribution \(\ln (E(w^{s}))=sm+s^{2}\frac{\sigma ^{2}}{2}\), we can calculate the optimal tax rate from the following formula: \(\frac{\tau }{1-\tau }=\frac{1}{\varepsilon }[1-e^{-\eta (1+\varepsilon )\sigma ^{2}}]\). Or, using the property of the lognormal distribution that \(\ln (1+cv^{2})=\sigma ^{2}\), where cv is the coefficient of variation, we can rewrite \(\tau =\frac{1}{1+\varepsilon /\left[ 1-(1+cv^{2})^{-\eta (1+\varepsilon )}\right] }\).

A wide range of values for the inequality aversion parameter \(\eta \) have been employed in the literature, varying typically from 0.5 to 2. Note that, as discussed in Sect. 2.1, as \(\eta \rightarrow \infty \), social preferences approach “maximin” preferences, where the optimal tax rate is the same as the revenue-maximizing tax rate, \(\tau =\frac{1}{1+\varepsilon }\), which does not depend on the original income distribution. Naturally, if there is no regard for inequality in the society, \(\eta =0\) and \(\tau =0\). Table 2 displays the welfaristic tax simulation results for two different values of labor supply elasticity \(\varepsilon \), for two different values of income dispersion \(\sigma \), and for five different values of inequality aversion \(\eta \).
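
The closed-form welfarist rule \(\frac{\tau }{1-\tau }=\frac{1}{\varepsilon }[1-e^{-\eta (1+\varepsilon )\sigma ^{2}}]\) and its two limiting cases can be evaluated directly. The function names below are hypothetical, and the parameter values are illustrative rather than those used for Table 2.

```python
import math


def welfarist_tax_rate(eps, eta, sigma):
    """Solve tau / (1 - tau) = (1 / eps) * (1 - exp(-eta * (1 + eps) * sigma^2)) for tau."""
    ratio = (1.0 - math.exp(-eta * (1.0 + eps) * sigma ** 2)) / eps
    return ratio / (1.0 + ratio)


def maximin_tax_rate(eps):
    """Revenue-maximizing rate, the limit of the welfarist rate as eta grows large."""
    return 1.0 / (1.0 + eps)


# Illustrative values: elasticity 0.25 and sigma 0.7.
for eta in (0.0, 0.5, 1.0, 2.0, 50.0):
    print(eta, welfarist_tax_rate(0.25, eta, 0.7))
print("maximin", maximin_tax_rate(0.25))
```

The rate is zero at \(\eta =0\), rises monotonically with inequality aversion, and converges to \(\frac{1}{1+\varepsilon }\) regardless of \(\sigma \), in line with the limits described above.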

The simulation results illustrate clearly that at conventional inequality aversion levels, optimal welfaristic tax rates lie well below the poverty-minimizing rates. Only as inequality aversion becomes extremely high do the welfaristic rates approach the poverty-minimizing ones. With poverty minimization as the social objective, optimal tax rates are close to the revenue-maximizing “maximin” rate.

Another point of comparison could be the welfaristic linear tax simulations of Stern (1976). His calculations differ from ours as he incorporates income effects and a non-constant elasticity of labor supply with respect to the tax rate.Footnote 16 With the elasticity of substitution between consumption and leisure at 0.5 and income dispersion described by \(\sigma =0.39\), as concern for inequality rises from low to medium and high, he finds tax rates rising from 19 to 43 and 48%. The extreme “maximin” result is 80%. These tax rates are also clearly lower than the poverty-minimizing rates, except at very extreme values of inequality aversion.

These numerical examples and Stern’s (1976) results tend to suggest that the tax rates for the poverty minimization case are likely to be higher than for many welfarist examples. The results compare to Kanbur et al. (1994), who also found that the (nonlinear) marginal tax rates on the poor are fairly high under the poverty minimization objective. Both their and our results are interesting from the point of view that the analytical formulae for the optimal tax rate include a term that, ceteris paribus, encourages labor supply, but in computational results its influence is offset, most likely, by the need to minimize the poverty gap. The higher the poverty rate, the higher the lump-sum grant financed by these taxes needs to be, in order to raise more people out of poverty.

Table 1 Simulated tax rates for poverty minimization under different values of \(\varepsilon \), \(\sigma \), and \(F(\bar{w})\)
Table 2 Simulated tax rates in the welfaristic case under different values of \(\varepsilon \), \(\sigma \), and \(\eta \)

7 Conclusion

This paper examined optimal linear income taxation, public provision of public and private goods and the optimal combination of linear income tax and commodity taxes when the government’s aim is to minimize poverty. The linear tax environment was chosen because such taxes are more easily implementable in a developing country context and since optimal linear tax rules are seen to provide similar intuition as the more complex nonlinear tax formulas.

The results show that the linear income tax includes additional components that work toward lowering the marginal tax rate. This result arises from the goal to boost earnings to reduce income poverty. Unlike in the optimal nonlinear income tax framework, this lower marginal tax affects all taxpayers in the society. However, the numerical simulations offered suggest that this mechanism is offset by the distributive concerns and in practice the optimal tax rates for poverty minimization appear high. Public good provision in the optimal tax framework under poverty minimization was shown to depend on the relative efficiency of public provision versus income transfers in generating poverty reductions. One particular avenue where public provision is useful is via its potentially beneficial impact on individuals’ earnings capacity. Thus, public provision can be desirable even if its direct welfare effects were non-existent.

Perhaps more importantly, poverty minimization as an objective completely changes the conditions under which uniform commodity taxation is optimal. When the government’s objective is to minimize poverty that depends on disposable income, uniform commodity taxation is unlikely ever to be optimal: commodity tax changes have first-order effects on consumers’ budgets via the direct impact on the cost of living, and this direct effect depends on the relative importance of different goods in the overall consumption bundle. Separability in demand coupled with linear Engel curves is not sufficient to guarantee the optimality of uniform commodity taxes. In reality, the administrative difficulties of implementing commodity taxation with many tax rates must, of course, be taken into account as well.

We also examined the implications of the presence of an informal sector for optimal tax and transfer policies. The results revealed that when the government is concerned about income poverty, the presence of the informal sector is, on the one hand, useful, as it reduces the poverty-increasing effect of higher taxes but, on the other hand, it is also costly since it is likely to increase the elasticity of the tax base. Examining the implications of informality on the role of other instruments of government policies is an important avenue for future work.

Another strand of follow-up work should address the question of complementary policies for redistribution, such as minimum wages. It should be borne in mind that different policies impose different requirements on administrative capacity,Footnote 17 and examining which poverty reduction instruments become available only as the societies advance on their development path is an interesting avenue for further work.