On redistributive taxation under the threat of high-skill emigration

The increasing international mobility of high-skill individuals is often seen as posing a threat to domestic social welfare, by limiting the ability of governments to tax these individuals and redistribute to the poor. In this paper, we examine a simple dynamic nonlinear income tax model without commitment. In this setting, it is shown that the threat of emigration by high-skill individuals facilitates redistribution and increases social welfare in the short-run, and has no effect on social welfare over the long-run.


Introduction
Optimal tax analyses typically assume that individuals cannot emigrate to avoid domestic taxation. Such an assumption is, however, increasingly viewed as being unrealistic, especially as it relates to high-skill individuals. When high-skill individuals are immobile, redistributive taxation must take into consideration that these individuals may change their labour supply along the intensive margin. That is, high-skill individuals may work less, thus reducing the amount of income available for redistribution. When high-skill individuals are internationally mobile, they have the additional option of changing their labour supply along the extensive margin, i.e., they may emigrate. An often-employed method to capture the threat of emigration is to introduce type-dependent participation constraints into the optimal tax problem; see, e.g., Osmundsen (1999), Krause (2009a), Simula and Trannoy (2010, 2012), and Lehmann et al. (2014). Naturally, these additional constraints reduce the level of social welfare attainable.
In this paper, we introduce the threat of emigration by high-skill individuals into a two-period nonlinear income tax model without commitment. As in the related literature, this threat is captured by introducing a participation constraint for high-skill individuals. However, we show that the introduction of the high-skill type's participation constraint facilitates redistribution and increases social welfare in the short-run, and has no effect on social welfare over the long-run. One may wonder how it is possible that the introduction of a binding constraint into an optimisation problem does not reduce the maximised value of the objective function. The participation constraint appears in period 2 and it does reduce the maximised value of the objective function (social welfare) in that period, but the specific manner in which social welfare is reduced (by limiting redistribution) enables a higher level of social welfare to be obtained in period 1. Thus social welfare summed over both periods remains unchanged.
The intuition can be summarised as follows. In period 1 the government does not know each individual's skill type, and therefore implements standard (incentive-compatible) nonlinear income taxation which induces individuals to reveal their type. However, high-skill individuals know that if they reveal their type in period 1, they lose their information advantage and will be subjected to first-best taxation in period 2. This means that high-skill individuals must be offered a very attractive tax treatment in period 1 to reveal their type, to compensate them for having to face first-best taxation in period 2. Accordingly, the ability of the government to redistribute in period 1 is severely limited. However, the threat of emigration reduces the extent of redistribution possible in period 2, meaning that high-skill individuals require less compensation in period 1 to reveal their type. This enables the government to implement more redistribution in period 1, which correspondingly increases first-period social welfare. The threat of emigration limits redistribution and reduces social welfare in period 2, but optimal taxation balances the short-run benefits against the long-run costs. Thus the threat of emigration has no effect on the level of social welfare summed over both periods.
In terms of previous results, the paper most closely related to ours is that by Leite-Monteiro (1997). He also finds that increased international mobility may enhance redistribution, but his model is entirely different to ours. In Leite-Monteiro's model, the government can always implement first-best personalised lump-sum taxes, and changes in the country's skill composition after migration takes place are what make enhanced redistribution a possibility. By contrast, in our model the participation constraint ensures that no one migrates, so the skill composition remains unchanged. Our paper is also related to the literature on dynamic Mirrlees (1971) nonlinear income taxation without commitment, e.g., Roberts (1984), Apps and Rees (2006), Brett and Weymark (2008a), Krause (2009b), Berliant and Ledyard (2014), and Guo and Krause (2011, 2013, 2014, 2015a). In dynamic Mirrlees models, the question arises as to whether the government can or cannot commit to not using skill-type information revealed by individuals in earlier periods when it implements taxation in later periods. A common theme in the literature is the highlighting of how different and counter-intuitive optimal policy can be when the commitment assumption is relaxed. The present paper provides another example of a result that, at first glance, appears quite counter-intuitive.
The remainder of the paper is organised as follows. Section 2 outlines the model and the structure of optimal taxation, while Sect. 3 presents and discusses our result. Section 4 discusses other possible solutions to the optimal tax problem, and what these would imply for our result. Section 5 concludes, and the proof of our result is contained in an appendix.

A simple model
We consider a two-period model with a unit measure of individuals and with the following timing. In period 1 the government knows that a proportion $\phi \in (0, 1)$ of individuals are high-skill and the remaining $(1 - \phi)$ are low-skill, but it does not know any individual's skill type. The government therefore implements standard second-best (incentive-compatible) nonlinear income taxation, under which each individual is willing to reveal their type. Then, in period 2, the government knows each individual's skill type, and is tempted to use this information to implement first-best redistributive taxation. However, high-skill individuals have the option of emigrating, so redistribution is limited by the high-skill type's participation constraint. We assume that emigration is only possible in period 2. The implications of this assumption are discussed in Sect. 4, though it is consistent with the observation that individuals tend to be immobile in the short-run (period 1), but mobile in the long-run (period 2). For simplicity we assume that individuals do not save or borrow, so the only link between the periods is the revelation and use of skill-type information.
Specifically, the government in period 2 solves the following problem. Choose tax treatments $(c_L^2, y_L^2)$ and $(c_H^2, y_H^2)$ for the low-skill and high-skill individuals, respectively, to maximise:

$$(1 - \phi)\left[u(c_L^2) - \frac{y_L^2}{w_L}\right] + \phi\left[u(c_H^2) - \frac{y_H^2}{w_H}\right] \tag{2.1}$$

subject to:

$$(1 - \phi)\left(y_L^2 - c_L^2\right) + \phi\left(y_H^2 - c_H^2\right) \geq 0 \tag{2.2}$$

$$u(c_H^2) - \frac{y_H^2}{w_H} \geq V_H \tag{2.3}$$

where $c_i^t$ is type $i$'s consumption (or post-tax income) in period $t$, and $y_i^t = w_i l_i^t$ is type $i$'s pre-tax income in period $t$, with $w_i$ denoting type $i$'s wage rate and $l_i^t$ denoting type $i$'s labour supply in period $t$. It is assumed that $w_H > w_L > 0$ and that wages remain constant over time. Equation (2.1) is a utilitarian social welfare function, where $u(\cdot)$ is increasing and strictly concave and the individuals' utility function is quasi-linear in labour. Equation (2.2) is the government's budget constraint, and Eq. (2.3) is the high-skill type's participation constraint. High-skill individuals can obtain a utility level of $V_H$ by emigrating, which is their reservation utility. We assume that $V_H$ is sufficiently high such that Eq. (2.3) is binding, i.e., the threat of emigration is effective. The solution to programme (2.1)-(2.3) yields the value function $W^2(\phi, w_L, w_H, V_H)$, which is the level of social welfare attainable in period 2.
In period 1 the government does not know each individual's skill type, and therefore implements second-best (incentive-compatible) nonlinear income taxation. It chooses tax treatments $(c_L^1, y_L^1)$ and $(c_H^1, y_H^1)$ for the low-skill and high-skill individuals, respectively, to maximise:

$$(1 - \phi)\left[u(c_L^1) - \frac{y_L^1}{w_L}\right] + \phi\left[u(c_H^1) - \frac{y_H^1}{w_H}\right] \tag{2.4}$$

subject to:

$$(1 - \phi)\left(y_L^1 - c_L^1\right) + \phi\left(y_H^1 - c_H^1\right) \geq 0 \tag{2.5}$$

$$u(c_H^1) - \frac{y_H^1}{w_H} + \delta V_H \geq u(c_L^1) - \frac{y_L^1}{w_H} + \delta \max\left\{U_H^2, V_H\right\} \tag{2.6}$$

where $U_H^2 = u(c_L^2) - y_L^2 / w_H$, and $c_L^2$ and $y_L^2$ come from the solution to programme (2.1)-(2.3). We use $\delta \in (0, 1)$ to denote the discount factor. Equation (2.4) is the first-period utilitarian social welfare function. The government in period 1 might care about the level of social welfare in period 2, but its choice of $(c_L^1, y_L^1)$ and $(c_H^1, y_H^1)$ cannot affect the level of social welfare attainable in period 2, which is $W^2(\phi, w_L, w_H, V_H)$. Therefore, the government in period 1 acts as if maximising only first-period social welfare. Equation (2.5) is the government's first-period budget constraint, while Eq. (2.6) is the high-skill type's incentive-compatibility constraint.
In order for a high-skill individual to be willing to choose $(c_H^1, y_H^1)$ in period 1 and thus reveal their type, the utility they obtain from this choice in period 1 plus the utility they obtain in period 2 (which will be their reservation utility, $V_H$) must be greater than or equal to the utility they could obtain by mimicking. A high-skill individual has two mimicking strategies. First, a high-skill individual could choose the low-skill type's tax treatment in period 1 (effectively announcing to the government that they are low-skill) and then not emigrate in period 2. In this case, because in period 1 they announced that they are low-skill, the government in period 2 will treat them as low-skill and give them the low-skill type's tax treatment. The mimicking high-skill individual would therefore obtain utility level $U_H^2$ in period 2. Second, a high-skill individual could choose the low-skill type's tax treatment in period 1 and then emigrate in period 2, to avoid being treated as low-skill in period 2. In this case, they obtain utility level $V_H$ in period 2. Which of these mimicking strategies is best depends upon $U_H^2$ and $V_H$. The first is better when $U_H^2 > V_H$, and the second is better when $V_H > U_H^2$. In the latter case the terms in $\delta V_H$ cancel and the incentive-compatibility constraint reduces to $u(c_H^1) - y_H^1 / w_H \geq u(c_L^1) - y_L^1 / w_H$, which is the same as the condition required for incentive compatibility in static models. However, as we explain below (in Sect. 3.1), $U_H^2 > V_H$ will hold unless $V_H$ is so high as to undo a key feature of dynamic nonlinear taxation without commitment.
We make the standard assumption that any individual who is indifferent between truthfully revealing their type and mimicking will tell the truth. Therefore, no mimicking occurs in the solution to the optimal tax problem. We also omit the low-skill type's incentive-compatibility constraint, because we make the common assumption that the redistributive goals of the government create an incentive for high-skill individuals to mimic low-skill individuals, but not vice versa. As discussed earlier, emigration is only possible in period 2, so a participation constraint does not appear in period 1. The solution to programme (2.4)-(2.6) yields the value function $W^1(\cdot)$, which represents the level of social welfare attainable in the first period.

The threat of emigration and social welfare
It is shown in the appendix that:

Proposition. In our dynamic nonlinear income tax model without commitment, the threat of high-skill emigration increases social welfare in period 1 ($\partial W^1(\cdot)/\partial V_H > 0$), decreases social welfare in period 2 ($\partial W^2(\cdot)/\partial V_H < 0$), and has no effect on social welfare summed over the two periods ($\partial W^1(\cdot)/\partial V_H + \delta\, \partial W^2(\cdot)/\partial V_H = 0$).
The proposition implies that the threat of high-skill emigration facilitates redistribution and increases social welfare in the short-run, and has no effect on the level of social welfare attainable over the long-run. The intuition follows from a key feature of dynamic nonlinear income taxation without commitment. In the absence of a participation constraint, high-skill individuals know that if they reveal their type in period 1, they will be subjected to first-best redistributive taxation in period 2. Therefore, high-skill individuals must be offered an attractive tax treatment in period 1 if they are to be willing to reveal their type, to compensate them for the unattractive tax treatment they will face in period 2. This restricts the government's ability to redistribute and raise social welfare in period 1. Now consider the effects of the participation constraint. The participation constraint means that high-skill individuals obtain more utility in period 2, because they will emigrate unless they receive at least their reservation utility. Therefore, high-skill individuals now have less incentive to conceal their type in period 1. Accordingly, the government can now offer high-skill individuals a less attractive tax treatment in period 1 and still obtain skill-type information. Taken together, this implies that social welfare is lower in period 2 (due to the participation constraint), but higher in period 1 (as the incentive-compatibility constraint is relaxed). Moreover, social welfare summed over the two periods remains unchanged, because optimal taxation balances the short-run benefits from relaxing the incentive-compatibility constraint against the long-run costs of the participation constraint.
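The mechanism can be read off the high-skill type's incentive-compatibility constraint. As a sketch, assuming the quasi-linear utility $u(c) - y/w$ described in Sect. 2, that the constraint binds, and that $U_H^2 > V_H$, rearranging it gives the first-period utility premium that must be paid for revelation:

```latex
\[
\underbrace{\Bigl[u(c_H^1) - \tfrac{y_H^1}{w_H}\Bigr]
          - \Bigl[u(c_L^1) - \tfrac{y_L^1}{w_H}\Bigr]}_{\text{first-period premium for revealing}}
\;=\; \delta\,\bigl(U_H^2 - V_H\bigr) \;>\; 0 .
\]
```

Holding $U_H^2$ fixed, a higher $V_H$ shrinks the right-hand side, and it shrinks further if $U_H^2$ falls because the low-skill type's period-2 treatment becomes less generous. This is the sense in which the participation constraint relaxes the incentive-compatibility constraint.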

A numerical example
As our proposition assumes that $U_H^2 > V_H$, in this subsection we provide a numerical example of our result. We also explain why our assumption, and hence our result, will be valid unless $V_H$ is so high as to undo a key feature of dynamic nonlinear taxation without commitment.
In the numerical example we assume that the utility function takes the form $\ln(c_i^t) - l_i^t$. This is based on Chetty (2006), who concludes that a reasonable estimate of the coefficient of relative risk aversion is one (log utility). The OECD (2014) reports that approximately one-third of adults have attained tertiary level education. We assume that these individuals are high-skill and the remainder are low-skill, i.e., we set $\phi = 1/3$. Fang (2006) and Goldin and Katz (2007) estimate that the college wage premium is approximately 60%. We therefore normalise the low-skill type's wage rate to unity ($w_L = 1$), and set the high-skill type's wage rate at $w_H = 1.6$. Following common practice, we assume an annual discount rate of 4%. However, as most individuals work for around 40 years of their lives, we take each period to be 20 years in length. An annual discount rate of 4% then implies a 20-year discount factor of $\delta = 1.04^{-20} \approx 0.456$. Table 1 reports the parameter values used in the numerical example, as well as the results for the cases of $V_H = -1.0$ and $V_H = -0.9$. It can be seen that as the value of the participation constraint is increased from $V_H = -1.0$ to $V_H = -0.9$, the discounted sum of social welfare over the two periods remains unchanged (at −1.208). Social welfare increases in period 1 and decreases in period 2. These findings are an example of our proposition, since the assumption $U_H^2 > V_H$ is satisfied.
As discussed earlier, a key feature of dynamic nonlinear taxation without commitment is that high-skill individuals know that they will be subjected to first-best taxation in period 2 if they reveal their type in period 1. Accordingly, high-skill individuals must be offered an attractive tax treatment in period 1 to reveal their type, as compensation for the unattractive tax treatment they will receive in period 2. Therefore, compared to static settings, $(c_H^1, y_H^1)$ will be favourable and $(c_L^1, y_L^1)$ will be unfavourable. That is, $u(c_H^1) - y_H^1 / w_H > u(c_L^1) - y_L^1 / w_H$, whereas $u(c_H^1) - y_H^1 / w_H = u(c_L^1) - y_L^1 / w_H$ would hold in static settings. It is only if $V_H$ were so high as to undo this general feature of dynamic nonlinear taxation without commitment that our assumption would no longer be valid.
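The figures quoted in this example can be checked with a short script. The sketch below assumes log utility, an interior separating solution, and binding budget, participation, and incentive-compatibility constraints. The closed-form expressions are worked out here from the first-order conditions of programmes (2.1)-(2.3) and (2.4)-(2.6) under those assumptions and are not taken from the paper, and the function names are hypothetical.

```python
import math

PHI, W_L, W_H = 1 / 3, 1.0, 1.6   # parameter values from the numerical example
DELTA = 1.04 ** -20               # 20-year discount factor at 4% a year (about 0.456)


def period2(v_h):
    """Period-2 welfare and the mimicker's utility U_H^2 (hypothetical helper)."""
    c_l, c_h = W_L, W_H                            # u'(c_i) = 1/w_i under log utility
    y_h = W_H * (math.log(c_h) - v_h)              # participation constraint binds
    y_l = c_l - PHI * (y_h - c_h) / (1 - PHI)      # budget constraint binds
    w2 = (1 - PHI) * (math.log(c_l) - y_l / W_L) + PHI * v_h
    u_mimic = math.log(c_l) - y_l / W_H            # U_H^2 = u(c_L^2) - y_L^2 / w_H
    return w2, u_mimic


def period1(v_h, u_mimic):
    """Period-1 welfare in the case U_H^2 > V_H (hypothetical helper)."""
    theta = PHI * (1 - PHI) * (W_H / W_L - 1)      # multiplier on the IC constraint
    lam = (PHI + theta) / (PHI * W_H)              # multiplier on the budget constraint
    c_h = W_H
    c_l = (1 - PHI - theta) / (lam * (1 - PHI))
    premium = DELTA * (u_mimic - v_h)              # period-1 premium for revealing
    gap = W_H * (math.log(c_h) - math.log(c_l) - premium)  # y_H^1 - y_L^1, binding IC
    y_l = (1 - PHI) * c_l + PHI * c_h - PHI * gap  # budget constraint binds
    y_h = y_l + gap
    return (1 - PHI) * (math.log(c_l) - y_l / W_L) + PHI * (math.log(c_h) - y_h / W_H)


def solve(v_h):
    """Return (W1, W2, W1 + DELTA * W2) for reservation utility v_h."""
    w2, u_mimic = period2(v_h)
    assert u_mimic > v_h, "this sketch only covers the case U_H^2 > V_H"
    w1 = period1(v_h, u_mimic)
    return w1, w2, w1 + DELTA * w2


for v_h in (-1.0, -0.9):
    w1, w2, total = solve(v_h)
    print(f"V_H = {v_h:+.1f}: W1 = {w1:.3f}, W2 = {w2:.3f}, W1 + delta*W2 = {total:.3f}")
```

Under these assumptions the discounted sum comes out at −1.208 for both values of $V_H$ when the unrounded discount factor $1.04^{-20}$ is used, with first-period welfare higher and second-period welfare lower at $V_H = -0.9$.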

A general condition
In this subsection, we provide a general condition under which $U_H^2 > V_H$, assuming log utility. The condition, Eq. (3.1), is derived in the appendix. For any particular value of the participation constraint, Eq. (3.1) can be used to check if $U_H^2 > V_H$. For example, one can check that (3.1) is satisfied for the parameters used in our numerical example.
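One way such a condition can be obtained is sketched below. It assumes log utility, an interior solution to programme (2.1)-(2.3), and binding budget and participation constraints; the resulting inequality is worked out here from those assumptions and may differ in form from Eq. (3.1).

```latex
\begin{align*}
c_L^2 &= w_L, \qquad c_H^2 = w_H
  && \text{(first-order conditions)}\\
y_H^2 &= w_H\bigl(\ln w_H - V_H\bigr)
  && \text{(participation constraint binds)}\\
y_L^2 &= w_L - \frac{\phi}{1-\phi}\,w_H\bigl(\ln w_H - V_H - 1\bigr)
  && \text{(budget constraint binds)}\\
U_H^2 &= \ln w_L - \frac{w_L}{w_H}
        + \frac{\phi}{1-\phi}\bigl(\ln w_H - V_H - 1\bigr)\\[4pt]
U_H^2 > V_H
  \;&\Longleftrightarrow\;
  V_H < (1-\phi)\Bigl(\ln w_L - \frac{w_L}{w_H}\Bigr) + \phi\bigl(\ln w_H - 1\bigr).
\end{align*}
```

With $\phi = 1/3$, $w_L = 1$ and $w_H = 1.6$, the right-hand side is approximately $-0.593$, so the inequality holds at both $V_H = -1.0$ and $V_H = -0.9$.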

Discussion and some caveats
Our proposition is driven by an important feature of dynamic nonlinear income taxation without commitment, namely, that high-skill individuals must be offered an attractive tax treatment in period 1 to reveal their type. This restricts redistribution in period 1 and reduces social welfare. In fact, it is possible that the welfare cost is sufficiently severe that it would be better if the government did not induce skill-type revelation. That is, it may be optimal for the government to implement pooling taxation. This involves the government offering a single tax treatment to all individuals, and therefore skill-type information is not revealed.
Our analysis implicitly assumes that pooling taxation is not optimal. If pooling taxation is optimal, our result no longer holds. However, the question arises as to how likely it is that pooling becomes optimal. Pooling taxation is quite severe in that it imposes the same tax treatment on everyone. Guo and Krause (2015b) show, for a range of empirically plausible parameter values, that pooling taxation is not optimal in either two-period or infinite-horizon settings. Indeed, they show that even the autarkic equilibrium yields a higher level of social welfare than does pooling taxation. Therefore, while it is possible that pooling taxation is optimal, this is less likely than regular separating taxation (under which our result holds) being optimal.
Perhaps a more serious caveat follows from our assumption that the participation constraint appears only in period 2. If one assumes that high-skill individuals are immobile in the short-run and mobile in the long-run, then our assumption seems reasonable. However, if a participation constraint for high-skill individuals were introduced into period 1, and it rendered the incentive-compatibility constraint slack, then our result would no longer hold. For this to occur, the value of the first-period participation constraint would have to be quite high, since incentive-compatibility requires that high-skill individuals be offered a generous first-period tax treatment (as discussed earlier). Nevertheless, it is worth noting that our result would not hold in that case. It is also worth noting that for sufficiently high values of the participation constraint, it might be better to simply let high-skill individuals emigrate, but we do not consider that possibility here.
Finally, nonlinear income taxation à la Mirrlees (1971) can be viewed as an application of principal-agent theory, so one may wonder if an analogue of our result would apply in other applications of principal-agent theory in dynamic settings, for example, repeated contracting over insurance, regulation, or employment agreements. An important difference is that in nonlinear income taxation, the principal (the government) seeks to maximise the aggregate welfare of the agents (the individuals), rather than having its own separate objective. That said, in all dynamic applications without commitment, the agent-type with the information advantage must be compensated to give up that advantage. So it seems possible that an analogue of our result may apply in these other applications as well.

Conclusion
In this paper we have shown, using a simple dynamic nonlinear income tax model without commitment, that the threat of high-skill emigration can increase social welfare in the short-run and have no effect on social welfare over the long-run. While this result may be viewed as being primarily of theoretical interest, it also provides a new example of how restrictions on policy instruments that are clearly welfare-reducing when the government can commit, may in fact be beneficial when the government cannot commit.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix
Proof of the Proposition. Form the Lagrangian corresponding to programme (2.1)-(2.3), where $\lambda^2 > 0$ and $\alpha_H^2 > 0$ are Lagrange multipliers, and take the first-order conditions. Applying the Envelope Theorem to this Lagrangian, and simplifying the resulting expression using (A.3) and (A.5), shows that $\partial W^2(\cdot)/\partial V_H < 0$.
Under the assumption that $U_H^2 > V_H$, form the Lagrangian corresponding to programme (2.4)-(2.6), where $\lambda^1 > 0$ and $\theta_H^1 > 0$ are Lagrange multipliers, and use is made of the fact that $U_H^2 = u(c_L^2(\cdot)) - y_L^2(\cdot)/w_H$. The first-order conditions on $y_L^1$ and $y_H^1$ are (A.10) and (A.11), respectively. The Envelope Theorem then gives an expression for $\partial W^1(\cdot)/\partial V_H$, Eq. (A.12), whose final form makes use of (A.2) and (A.3). The effects of $V_H$ on the second-period allocation are obtained in four steps: one from (A.2) and (A.3); one from (A.3)-(A.5); one from (A.7) and using (A.14); and one from (A.6) and using (A.13)-(A.15). Equation (A.12) can now be simplified, where use is made of (A.10) and (A.11). This shows that $\partial W^1(\cdot)/\partial V_H > 0$.
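The steps of the argument can be traced compactly as follows. This is a sketch, assuming interior solutions, binding constraints, and that $\lambda^t$ is the multiplier on the period-$t$ budget constraint, $\alpha_H^2$ the multiplier on the participation constraint, and $\theta_H^1$ the multiplier on the incentive-compatibility constraint. The ordering and form of the intermediate expressions are a reading of the derivation and are not a transcription of the numbered equations.

```latex
\begin{align*}
\text{Period 2:}\quad
 & \lambda^2 = \frac{1}{w_L}, \qquad
   \alpha_H^2 = \phi\,\frac{w_H - w_L}{w_L}, \qquad
   \frac{\partial W^2}{\partial V_H} = -\alpha_H^2 < 0,\\[4pt]
 & \frac{\partial c_L^2}{\partial V_H} = \frac{\partial c_H^2}{\partial V_H} = 0, \qquad
   \frac{\partial y_H^2}{\partial V_H} = -w_H, \qquad
   \frac{\partial y_L^2}{\partial V_H} = \frac{\phi\,w_H}{1-\phi},\\[4pt]
 & \frac{\partial U_H^2}{\partial V_H}
   = -\frac{1}{w_H}\,\frac{\partial y_L^2}{\partial V_H}
   = -\frac{\phi}{1-\phi}.\\[8pt]
\text{Period 1:}\quad
 & \theta_H^1 = \phi(1-\phi)\,\frac{w_H - w_L}{w_L}
   \quad\text{(from the conditions on } y_L^1 \text{ and } y_H^1\text{)},\\[4pt]
 & \frac{\partial W^1}{\partial V_H}
   = \delta\,\theta_H^1\Bigl(1 - \frac{\partial U_H^2}{\partial V_H}\Bigr)
   = \frac{\delta\,\theta_H^1}{1-\phi}
   = \delta\,\phi\,\frac{w_H - w_L}{w_L} > 0.\\[8pt]
\text{Hence:}\quad
 & \frac{\partial W^1}{\partial V_H} + \delta\,\frac{\partial W^2}{\partial V_H}
   = \delta\,\phi\,\frac{w_H - w_L}{w_L} - \delta\,\phi\,\frac{w_H - w_L}{w_L} = 0.
\end{align*}
```

Because utility is quasi-linear in labour, the multipliers are pinned down by the wage rates and $\phi$ alone, which is why the two effects offset exactly rather than approximately.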
Using (A.18) and after some algebraic manipulation, one obtains Eq. (3.1).