Malas Notches

This paper shows that the sufficient statistic approach to the welfare properties of income (and other) taxes does not easily extend to tax systems with notches, because with notches, changes in bunching induced by changes in tax rates have a first-order effect on tax revenues. In an income tax setting, we show that the marginal excess burden (MEB) of a change in the top rate of tax is given by the Feldstein (1999) formula for the MEB of a proportional tax, plus a correction term. This correction term cannot be calculated just from knowledge of the elasticity of taxable income and, quantitatively, it can be large. An application to VAT is discussed; with a calibration to UK data, the MEB of the VAT is roughly three times what it would be if VAT were simply a proportional tax.


Introduction
In a recent survey, Chetty (2009b) argues that an important new development in public economics is the so-called sufficient statistic approach, which "derives formulas for the welfare consequences of policies that are functions of high-level elasticities rather than deep primitives" (Chetty (2009b), p 451). In turn, this means that to assess the welfare properties of these policies, only these elasticities, rather than fully structural models, need to be estimated. 1 The sufficient statistic approach originated in a seminal paper by Feldstein (1999), who showed that the marginal excess burden (MEB) of a proportional income tax only depends on the behavioral responses to the tax via a sufficient statistic, the personal elasticity of taxable income (ETI). Feldstein's paper has given rise to a large literature devoted to obtaining empirical estimates of the ETI (Gruber and Saez (2002), Saez et al. (2012), Kleven and Schultz (2014), Weber (2014)).
Subsequently, Saez (2001) and Saez et al. (2012) showed that the Feldstein formula for the MEB could be extended to the top rate of tax in a progressive piecewise-linear income tax system, and they also established formulae for the revenue-maximizing and welfare-maximizing rates of tax. These formulae also have the sufficient statistic feature; specifically, they depend only on the ETI, a statistic of the income distribution, which is constant if the top tail of the income distribution is Pareto 2 , and possibly a welfare weight.
In this paper, we ask whether these sufficient statistic properties of key formulae also extend to tax systems with notches. Generally, a tax notch occurs when there is a discontinuous change in the tax liability as the tax base varies (Slemrod (2013), Kleven (2016)).
In practice, we do see notches in several major kinds of taxes, and these are being increasingly studied in the empirical literature. For example, in Pakistan, there are notches of up to 5% in the personal income tax (Kleven and Waseem (2013)), and in Ireland, an emergency income levy after the financial crisis had a notch of up to 4% (Hargaden (2015)) 3 . There are small notches in the federal income tax in the US, and larger notches induced by income-dependent entitlement to tax credits (Slemrod (2013)). In Germany, there is a large notch in income tax generated by the Mini-Job program (Tazhitdinova (2018)). 4
1 Chetty (2009a) also argues that this sufficient statistic approach is valuable in several other contexts, such as evaluating the welfare gain from social insurance programs, and the welfare effects of changes in taxes with optimization frictions.
2 The formula is that the marginal excess burden equals $\frac{tea}{1 - t - tea}$, where $t$ is the rate of tax, $e$ is the personal elasticity of taxable income with respect to the net-of-tax rate $1 - t$, and $a$ is the Pareto parameter.
3 From Table 1 of Hargaden (2015), in 2010, earnings above 26000 Euro incurred a charge of 1040 Euro.
4 This is aimed at increasing the labor supply of low-income individuals: earnings below the mini-job threshold are exempt from income tax and the employee portion of social security taxes, while earnings above the threshold are not.
Notches also exist in other major taxes. For example, notches are, or were until recently, present in housing transactions taxes in the UK and the US (Best and Kleven (2013), Kopczuk and Munroe (2015)). They also arise in the corporate income tax in Costa Rica (Bachas and Soto (2015)). Slemrod (2013) notes that there are many examples of commodity tax notches, where a marginal change in some characteristic can change the product classification so as to produce a discrete change in the tax liability. 5 Finally, as argued by Liu and Lockwood (2015), a VAT threshold can be thought of as a tax notch; a firm's VAT liability changes discontinuously when its sales go over the registration threshold. Indeed, given the importance and near-ubiquity of VAT, this is the most important example of a tax notch.
5 An example in the US is the Gas Guzzler Tax, under which high-performance cars are subject upon initial sale to a per-vehicle tax that is higher, the lower is the fuel economy of the car.
We first study notches in the income tax setting of Saez (2010) and others, where households differ in ability or taste so that the disutility of generating taxable income varies across households. For simplicity, we assume a two-bracket tax, i.e. a tax with a lower rate below a threshold, and a higher rate above. In this setting, our first contribution is to derive an exact formula for the marginal excess burden (MEB) of the higher rate of tax. This formula is similar to Feldstein (1999)'s formula for the MEB of a proportional income tax, but includes a correction factor that captures the effect on tax revenue of the bunching response to an increase in the top rate of tax.
The bunching response measures the change in the number of households bunching at the threshold to avoid paying the top rate of tax, and is thus distinct from the intensive margin response of the taxable income of a given household to the tax rate, i.e. the elasticity of taxable income, or ETI. With a notch, unlike the case of a kink, the bunching response affects tax revenue, because with a notch the tax schedule is discontinuous at the threshold. Specifically, an increase in the top rate of tax increases bunching, which, due to the notch, lowers tax revenue and thus raises the MEB.
Moreover, this correction factor depends on an underlying variable C that cannot be expressed as a simple function of the usual sufficient statistics, i.e. the ETI and the Pareto parameter of the upper tail of the income distribution. It does depend on these variables, but it also depends on the lower rate of tax, the position of the notch, and a counterfactual, i.e. the earnings that the individual at the top of the bunching interval (the top buncher) would choose if faced with the higher rate of tax. So, the sufficient statistic approach seems to break down with tax notches.
However, all is not lost; we show that the counterfactual earnings of the top buncher can either be computed theoretically, using the condition that the top buncher is indifferent between bunching and being above the notch, or, in any empirical study of bunching, computed empirically, using the estimate of excess mass at the notch (the parameter B in Kleven and Waseem (2013)). Thus, this paper is the first to show how bunching estimates at notches can be used to make welfare calculations.
Of course, if the correction factor turns out to be small, the Feldstein formula still provides a good approximation to the MEB. Our third contribution is to investigate whether this is the case. Calibrations show that the percentage error from using the Feldstein formula for the MEB can be very large. At baseline values, the marginal excess burden is underestimated by a factor of six 6 . So, the conclusion is that at least in the income tax setting, the sufficient statistic approach is not practical.
We then turn to apply our approach to the VAT, which is the most empirically important example of a tax notch. We present a simple model of small traders who differ in productivity, and are subject to VAT at rate $t$ above a threshold level of sales. We show that this model is formally equivalent to our income tax model, in the sense that registered firms above the threshold face an effective rate of VAT $t_R$ on value-added, and firms below the threshold face $t_N$, a lower rate 7 .
We then show that the MEB of an increase in the statutory rate of VAT is given by the Feldstein formula for a proportional tax plus a correction factor, as in the income tax case. However, the details of the correction factor are more complex, because an increase in the statutory rate $t$ increases both of the effective rates $t_R$ and $t_N$. A calibration of the model shows that the proportional tax formula for the MEB of the VAT underestimates the true MEB by a factor of up to three. This framework also allows us to evaluate the effect of increased compliance costs of VAT on the MEB via their impact on bunching; increased compliance costs increase bunching and thus increase the MEB, but the effect is quantitatively small relative to the effect of the notch.
Finally, it should be noted that in this paper, we take all parameters of the tax system, including the notch, as given, and only vary the top rate of tax. A broader question, to be addressed in future work, is whether a notch can ever be part of an optimal tax system 8 .
The remainder of the paper is arranged as follows. After the literature review in Section 2, in Section 3, we set up the model. Section 4 has the main analytical results for the income tax, and Section 5 the simulations. Section 6 deals with the extension to the VAT, and Section 7 concludes.
6 We also show that using the formula for the optimal proportional tax substantially overestimates the true optimal top rate of tax with a notch.
7 It may seem counter-intuitive that non-registered firms face a positive rate of effective VAT; this is because non-registered firms cannot claim back VAT on inputs, so-called "embedded" VAT.
8 In the standard Mirrlees framework, where the tax is fully non-linear, this is not the case; the optimal tax schedule is always continuous in income. However, where skills are continuously distributed, and the government is restricted to a finite number of tax rates, the answer to this question is less obvious. In fact, Blinder and Rosen (1985) note that, in the context of subsidies for charitable giving with heterogeneous tastes, a notch can sometimes improve on a linear subsidy.

Related Literature
This paper speaks to a number of related literatures. First, it is already known that due to externalities of one kind or another, the sufficient statistic approach has its limitations. Saez et al. (2012) give the examples of deductibility from income tax of charitable giving and mortgage interest payments for residential housing. In these cases, an increase in the marginal rate of tax will boost charity income and home ownership respectively, which may be valuable objectives in themselves. Saez et al. (2012) call these classical externalities 9 .
Fiscal externalities, where the actions of the household generate additional revenue for the government and thus benefit other households, can also cause the sufficient statistic approach to fail, or at least require adjustment, but in these cases a simple change to the formula is sometimes possible. The analysis of income tax evasion of Chetty (2009b) is a case in point 10 . As Gillitzer and Slemrod (2016) show, in this case the standard formula for the marginal efficiency cost of funds can be adjusted in the same way it must be adjusted for any fiscal externality, i.e. whenever a change in tax rates induces taxpayers to shift income to another tax. Our results are rather different to these cases of both classical and fiscal externalities. In our setting, there is no fiscal or other externality; rather, the sufficient statistic approach fails because the bunching response has a first-order effect on tax revenue.
A second related literature is on VAT. Here, there are two distinct sets of related papers. First, there is a growing literature on the effect of VAT thresholds on firm behavior. Theoretical contributions include Keen and Mintz (2004), Kanbur and Keen (2014) and Liu and Lockwood (2015), and empirical studies include Liu and Lockwood (2015) and Harju et al. (2016). The theoretical work of Kanbur, Keen and Mintz focusses on the optimal threshold of the VAT, holding the rate of tax fixed, and is thus complementary to this paper, which characterizes the MEB of an increase in the rate, holding the threshold fixed. In fact, we effectively ask the question of whether it is legitimate to ignore the threshold altogether when calculating the MEB of the VAT. Therefore, our paper relates to a literature on the marginal excess burden of indirect taxes, including VAT (e.g. Ballard et al. (1985), Rutherford and Paltsev (1999)). In these papers, when the marginal excess burden of VAT is calculated, it is always assumed that the VAT is a proportional tax, i.e. the VAT threshold is ignored. This paper shows that this simplifying assumption yields seriously biased estimates.
9 See Doerrenberg et al. (2015) for a more formal statement of this argument, and estimates of how deductions respond to tax rate changes for the case of Germany.
10 Chetty shows that when the household can evade the personal income tax at a cost, if that cost is a pure transfer payment, i.e. a fine times a probability of detection, there is effectively a positive fiscal externality of evasion: it generates additional revenue for the government and thus benefit for all households. In this case, as we might expect, we see that the elasticity of taxable income over-estimates the excess burden of the tax.
A third related literature is that on the MEB and welfare-maximizing taxes with kinks in the tax schedule. Here, we make a small contribution as a by-product of our main focus, which is on notches. In the case of kinks, it is generally understood that the marginal excess burden of the top rate of income tax, and the welfare-maximizing top rate, depend via simple formulae only on the ETI and the Pareto statistic of the income distribution. However, there seems to be some confusion about the conditions required for this result. Saez et al. (2012) suggest that what is required is the assumption that "behavioral responses take place only along the intensive margin", or more precisely that the bunching response of an increase in the top rate of tax is of second order relative to the extensive margin response. 11 This assumption is very strong, as even with a kink, there is always a bunching response. Our Proposition 1 below shows that this assumption is not necessary, because no matter what the size of the bunching response, the response has no effect on tax revenue, to first order, as the tax schedule is continuous. All that is required is that the distribution of taxpayer types is continuous, a standard assumption.

Set-Up
We follow Saez (2010) in our set-up. There are individual taxpayers indexed by a skill or taste parameter $n \in [\underline{n}, \overline{n}]$, assumed continuously distributed in the population with distribution $H(n)$ and density $h(n)$. A type $n$ individual has preferences over consumption $c$ and taxable income $z$ of the form
$$u = c - \psi(z; n),$$
where $\psi(z; n)$ is the disutility of earning income $z$. So, as utility is linear in $c$, we are assuming away income effects. We also assume:
A1. $\psi_z > 0$, $\psi_{zz} > 0$, $\psi_n < 0$, $\psi_{nz} < 0$.
A1 says that a higher $n$ represents a higher skill level (i.e. higher wage), or a lower taste for leisure. In particular, the higher $n$, the lower the total and marginal disutility of generating a given amount of taxable income. Assumption A1 is satisfied, for example, by the iso-elastic specification of Saez (2010):
$$\psi(z; n) = \frac{n}{1 + 1/e}\left(\frac{z}{n}\right)^{1 + 1/e}. \qquad (1)$$
11 Specifically, they say the following. "The change dt could induce a small fraction dN of the N taxpayers to leave (or join if dt < 0) the top bracket. As long as behavioral responses take place only along the intensive margin, each individual response is proportional to dt so that the total revenue effect of such responses is second order (dN.dt) and hence can be ignored in our derivation."

The budget constraint is $c = z - T(z)$, where $T(z)$ is the tax liability at taxable income $z$.
Finally, for future reference, define the optimal taxable income at tax rate $t$ for a type $n$ taxpayer to be
$$z(1 - t; n) = \arg\max_{z} \{(1 - t)z - \psi(z; n)\}.$$
Note from A1 that $z_{1-t}, z_n > 0$, where subscripts denote derivatives. So, $z_{1-t}$ is the response of taxable income to the net-of-tax rate. Following Saez et al. (2012), we call this the intensive margin response to the tax.

Kinks and Notches
For simplicity, we focus on a two-bracket tax, although our arguments apply straightforwardly to the case of the highest tax in a piecewise-linear tax system with any number of brackets. We will assume that the tax system is progressive; that is, the tax rate on incomes in the higher income bracket is strictly greater than the tax rate on incomes in the lower income bracket. So, with a two-bracket tax, for a kink, the tax function is
$$T(z) = \begin{cases} t_L z, & z \le z_0 \\ t_L z_0 + t_H (z - z_0), & z > z_0 \end{cases}$$
for $z_0 > 0$, $t_H > t_L \ge 0$; that is, all income below the kink point $z_0$ is taxed at the lower rate $t_L$, and all income in excess of the kink is taxed at the higher rate. For a notch, the tax function is
$$T(z) = \begin{cases} t_L z, & z \le z_0 \\ t_H z, & z > z_0 \end{cases}$$
with $t_H > t_L \ge 0$. That is, when taxable income is below $z_0$, a tax at rate $t_L$ is paid on all income, but when taxable income is above $z_0$, a tax at rate $t_H$ is paid on all income.
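The difference between the two schedules can be sketched in a few lines of Python. This is only an illustration: the function names and the numerical values of the threshold and the two rates are assumptions chosen for the example, not part of the paper's calibration. It shows that the liability is continuous at the threshold under a kink but jumps by roughly $(t_H - t_L) z_0$ under a notch.

```python
def tax_kink(z, z0, t_low, t_high):
    """Kinked schedule: t_high applies only to income in excess of z0."""
    if z <= z0:
        return t_low * z
    return t_low * z0 + t_high * (z - z0)


def tax_notch(z, z0, t_low, t_high):
    """Notched schedule: t_high applies to ALL income once z exceeds z0."""
    if z <= z0:
        return t_low * z
    return t_high * z


# Illustrative values (assumptions for this sketch only).
z0, t_low, t_high = 100.0, 0.20, 0.23
eps = 1e-6  # a tiny step over the threshold

kink_jump = tax_kink(z0 + eps, z0, t_low, t_high) - tax_kink(z0, z0, t_low, t_high)
notch_jump = tax_notch(z0 + eps, z0, t_low, t_high) - tax_notch(z0, z0, t_low, t_high)

print(f"jump in liability at z0: kink {kink_jump:.6f}, notch {notch_jump:.6f}")
```

With these values the notch jump is close to $(0.23 - 0.20) \times 100 = 3$, while the kink jump vanishes as the step over the threshold shrinks.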

Bunching
With either a kink or a notch, all types in an interval $n \in [n_L, n_H]$ will bunch at taxable income $z_0$. In both cases, the lowest type who bunches is the one who is just willing to earn taxable income $z_0$ at the lower tax rate. So, $n_L$ is defined by the condition
$$z(1 - t_L; n_L) = z_0. \qquad (4)$$
With a kink, the highest type who bunches, $n_H$, is defined by the condition that the optimal choice of taxable income at tax $t_H$ is just $z_0$, i.e.
$$z(1 - t_H; n_H) = z_0. \qquad (5)$$
With a notch, $n_H$ is defined by the condition that the $n_H$ type must be indifferent between staying at the notch and paying tax $t_L$, and choosing $z$ optimally and paying $t_H$ on all income. To write this indifference condition, we first define the indirect utility
$$v(1 - t; n) = \max_{z} \{(1 - t)z - \psi(z; n)\}.$$
Then, the condition defining $n_H$ can be written:
$$(1 - t_L) z_0 - \psi(z_0; n_H) = v(1 - t_H; n_H). \qquad (6)$$
The left-hand side of (6) is utility when taxable income is constrained to be at the notch value $z_0$. Note that this indifference condition implies $z(1 - t_L; n_H) > z_0$; otherwise, the $n_H$-type could choose $z$ optimally and stay below the notch. Note the difference between the indifference condition (6) and the condition (5).
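As a numerical illustration of these three conditions, the sketch below computes $n_L$ and the two versions of $n_H$ under the iso-elastic specification of Saez (2010), for which optimal income is $z = n(1-t)^e$. The function names, the normalization $z_0 = 1$ and the bisection routine are assumptions made for this example; the rates and elasticity are the baseline values used later in the simulations.

```python
def psi(z, n, e):
    """Iso-elastic disutility of earning z for type n (Saez (2010) form)."""
    return n / (1.0 + 1.0 / e) * (z / n) ** (1.0 + 1.0 / e)


def indirect_utility(t, n, e):
    """Utility at the interior optimum z = n (1 - t)^e when all income is taxed at t."""
    z = n * (1.0 - t) ** e
    return (1.0 - t) * z - psi(z, n, e)


def bunching_interval(z0, t_low, t_high, e):
    """Return (n_L, n_H under a kink, n_H under a notch)."""
    n_low = z0 / (1.0 - t_low) ** e          # optimum at t_low is exactly z0
    n_high_kink = z0 / (1.0 - t_high) ** e   # optimum at t_high is exactly z0

    def gain_from_bunching(n):
        # Utility at the notch minus utility from locating above it.
        return (1.0 - t_low) * z0 - psi(z0, n, e) - indirect_utility(t_high, n, e)

    # The gain is positive at n_high_kink and decreasing above it, so bisect.
    lo, hi = n_high_kink, 2.0 * n_high_kink
    while gain_from_bunching(hi) > 0.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gain_from_bunching(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return n_low, n_high_kink, 0.5 * (lo + hi)


# z0 = 1 is a normalization assumed here; t_L, t_H, e are the baseline values.
n_low, n_kink, n_notch = bunching_interval(z0=1.0, t_low=0.20, t_high=0.23, e=0.25)
print(f"n_L = {n_low:.4f}, n_H (kink) = {n_kink:.4f}, n_H (notch) = {n_notch:.4f}")
```

In this example the bunching interval is considerably wider under the notch than under the kink, which is the sense in which the indifference condition differs from the tangency condition.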

The Bunching Response
Here, we study the effect of a change in $t_H$ on the mass of individuals who bunch, i.e. on the size of the interval $[n_L, n_H]$. Note first from (4) that $n_L$ is unaffected by $t_H$ for both a kink and a notch. Next, in the kink case, we can calculate from (5) that
$$\frac{\partial n_H}{\partial t_H} = \frac{z_{1-t}(1 - t_H; n_H)}{z_n(1 - t_H; n_H)} > 0. \qquad (7)$$
So, we have a bunching response to an increase in $t_H$: an increase in the tax rate above the kink makes going above the kink less attractive, and so more people bunch below the kink.
In the notch case, note that $v_t = -z$, where $v_t$ is the derivative of $v$ with respect to $t$. Then, we can calculate from (6) that
$$\frac{\partial n_H}{\partial t_H} = \frac{z(1 - t_H; n_H)}{\psi_n(z_0; n_H) - \psi_n(z(1 - t_H; n_H); n_H)}. \qquad (8)$$
Also, as $\psi_{nz}(z; n) < 0$ and $z(1 - t_H; n_H) > z_0$, we see that the denominator of (8) is positive, and consequently from (8):
$$\frac{\partial n_H}{\partial t_H} > 0. \qquad (9)$$
So, again we see that there is a bunching response to a change in $t_H$; an increase in the tax rate above the notch makes going above the notch less attractive, and so more people bunch at the notch.

The Effect of the Bunching Response on Tax Revenue
Here, we establish a key result: the effects of the bunching response on tax revenue with a kink and with a notch are qualitatively different, being zero and negative respectively. With a kink, revenue can be written
$$R = t_L \int_{\underline{n}}^{n_L} z(1 - t_L; n) h(n)\,dn + t_L z_0 \int_{n_L}^{\overline{n}} h(n)\,dn + t_H \int_{n_H}^{\overline{n}} \left(z(1 - t_H; n) - z_0\right) h(n)\,dn. \qquad (10)$$
Note that all households with $n \ge n_L$ pay tax at the lower rate on the first $z_0$ of earnings.
In the kink case, the bunching effect on tax revenue, i.e. the effect of a change in $t_H$ on $R$ via the induced change in $n_H$, is, from (10):
$$\frac{\partial R}{\partial n_H}\frac{\partial n_H}{\partial t_H} = -t_H \left(z(1 - t_H; n_H) - z_0\right) h(n_H) \frac{\partial n_H}{\partial t_H} = 0, \qquad (11)$$
where the last equality follows from (5). So, overall, with a kink, the effect of the bunching response on tax revenue is zero. This is simply due to the fact that a kinked tax schedule is continuous in $z$.
With a notch, revenue is
$$R = t_L \int_{\underline{n}}^{n_L} z(1 - t_L; n) h(n)\,dn + t_L z_0 \int_{n_L}^{n_H} h(n)\,dn + t_H \int_{n_H}^{\overline{n}} z(1 - t_H; n) h(n)\,dn. \qquad (12)$$
Comparing this to (10), we see a key difference. Because the higher rate applies to all income for those earning above $z_0$, the threshold $z_0$ no longer enters into the tax base for $t_H$, and so the upper limit of integration on $z_0$ in the tax base for $t_L$ falls from $\overline{n}$ to $n_H$, reflecting the fact that now only individuals below $n_H$ pay any tax at the lower rate. Note from (12) that
$$\frac{\partial R}{\partial n_H} = \left(t_L z_0 - t_H z(1 - t_H; n_H)\right) h(n_H). \qquad (13)$$
This is strictly negative as $t_H > t_L$ and $z(1 - t_H; n_H) > z_0$. So, in contrast to the kink case, the bunching effect on tax revenue $R$ from an increase in $t_H$ is negative, as $\partial n_H / \partial t_H > 0$ from (9). This is because a small increase in $n_H$ has two effects on revenue that are both negative. First, there is a discontinuity in the tax base; the earnings of those who now locate at the notch fall discontinuously from $z(1 - t_H; n_H)$ to $z_0$. Second, there is a discontinuity in the tax rate applying to that base; all these earnings are taxed at a lower rate, $t_L$ rather than $t_H$.
So, we conclude: Proposition 1. The effect of the bunching response on tax revenue is zero for a kink, but strictly negative for a notch.
This result is the key one that drives the rest of the paper. So, to fully appreciate the intuition, we consider the following two figures. Each figure shows how tax revenue collected from an $n$-type household varies with $n$. To make the figures as clear as possible, we assume iso-elastic utility as in (1), in which case it is easily verified that tax revenue collected from an $n$-type household is linear in $n$. In both Figures 1 and 2, the tax revenue as a function of $n$ before the change is shown by the red line. Note that for $n$ between $n_L$ and $n_H$, households are bunching and so revenue is constant. Now suppose that $t_H$ increases. In Figure 1, the green line shows the hypothetical revenue paid by households above $n_H$ following the increase if there were no behavioral response; all households above $n_H$ pay more, proportionally to $n$. The green arrow shows the bunching response; some households move from above the threshold to just below. From this figure, it is clear that this has only a second-order effect on tax revenue; the government loses just the small triangle shown.
Figure 2 shows the same change in the top rate of tax, but for a notched tax. With a notch, there is a discrete increase in the tax liability above the notch. When the top rate of tax $t_H$ increases, as before, the green arrow shows the bunching response; some households move from above the threshold to just below. But now, it is clear that this change causes a first-order drop in tax revenue, as shown by the grey square; this is because tax revenue as a function of $n$ is discontinuous at this point.
Finally, the result that the effect of the bunching response on tax revenue is zero for a kink also helps to clarify some confusion in the literature. As already noted, Saez et al. (2012) argue that for sufficient statistic formulae to apply in the kink case, what is required is the assumption that "behavioral responses take place only along the intensive margin", or more precisely that the bunching response of an increase in the top rate of tax is of second order relative to the extensive margin response. Proposition 1 shows that this assumption is not required, because no matter how large $\partial n_H / \partial t_H$ is, $\partial R / \partial n_H = 0$ in the kink case.
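The contrast drawn in Proposition 1 can be illustrated numerically. The sketch below, which assumes the iso-elastic specification (1), a normalized threshold $z_0 = 1$ and the baseline rates from the simulations, computes the tax paid by each type on a fine grid of $n$ and reports the largest change in revenue between neighbouring types. All function names, the grid and the normalization are assumptions made for this example.

```python
def psi(z, n, e):
    """Iso-elastic disutility of earning z for type n."""
    return n / (1.0 + 1.0 / e) * (z / n) ** (1.0 + 1.0 / e)


def tax(z, z0, t_low, t_high, notch):
    """Liability under a notched (notch=True) or kinked (notch=False) schedule."""
    if z <= z0:
        return t_low * z
    return t_high * z if notch else t_low * z0 + t_high * (z - z0)


def revenue_from_type(n, z0, t_low, t_high, e, notch):
    """Tax paid by type n when taxable income is chosen optimally."""
    candidates = [z0]                   # bunching at the threshold
    z_low = n * (1.0 - t_low) ** e      # interior optimum at the lower rate
    z_high = n * (1.0 - t_high) ** e    # interior optimum at the higher rate
    if z_low <= z0:
        candidates.append(z_low)
    if z_high > z0:
        candidates.append(z_high)
    best = max(candidates,
               key=lambda z: z - tax(z, z0, t_low, t_high, notch) - psi(z, n, e))
    return tax(best, z0, t_low, t_high, notch)


def largest_jump(z0, t_low, t_high, e, notch, n_start=0.9, dn=0.0005, steps=1200):
    """Largest change in revenue between adjacent types on a grid of n."""
    revenues = [revenue_from_type(n_start + dn * i, z0, t_low, t_high, e, notch)
                for i in range(steps + 1)]
    return max(abs(b - a) for a, b in zip(revenues, revenues[1:]))


kink_jump = largest_jump(1.0, 0.20, 0.23, 0.25, notch=False)
notch_jump = largest_jump(1.0, 0.20, 0.23, 0.25, notch=True)
print(f"largest revenue jump across types: kink {kink_jump:.5f}, notch {notch_jump:.5f}")
```

Under the kink the largest change is of the order of the grid step, so revenue is continuous in $n$; under the notch there is a discrete jump at $n_H$ of approximately $t_H z(1 - t_H; n_H) - t_L z_0$, which is the term that makes the bunching response first-order.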

The Marginal Excess Burden
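Here, we derive a formula for the marginal excess burden (MEB) of $t_H$ when there is a notch and show that it can be written as the MEB of a proportional tax plus a correction factor. To define the MEB, note that due to quasi-linearity, the natural measure of welfare is the integral of indirect utilities, say $W$, plus revenue $R$, which is assumed to be redistributed as a lump sum back to households when calculating the MEB. So,
$$MEB = -\frac{dW/dt_H + dR/dt_H}{dR/dt_H}. \qquad (14)$$
The minus sign ensures that the marginal excess burden is measured as a positive number. From (12), we see that the effect of an increase in $t_H$ on tax revenue is:
$$\frac{dR}{dt_H} = B_H - t_H \int_{n_H}^{\overline{n}} z_{1-t}(1 - t_H; n) h(n)\,dn + \frac{\partial R}{\partial n_H}\frac{\partial n_H}{\partial t_H}. \qquad (15)$$
Here
$$B_H = \int_{n_H}^{\overline{n}} z(1 - t_H; n) h(n)\,dn \qquad (16)$$
is the base on which the higher rate of tax is levied. So, (15) is composed of three terms: the mechanical effect $B_H$, and two behavioral effects on tax revenue, the intensive-margin and bunching effects. The intensive-margin effect on tax revenue is standard; it describes how tax revenue changes because of changes in earnings, conditional on the taxpayer staying in the same tax bracket. The bunching effect on tax revenue and its impact on the marginal excess burden is the focus of our investigation.
To compute $dW/dt_H$, note first that the integral of indirect utilities is
$$W = \int_{\underline{n}}^{n_L} v(1 - t_L; n) h(n)\,dn + \int_{n_L}^{n_H} \left[(1 - t_L) z_0 - \psi(z_0; n)\right] h(n)\,dn + \int_{n_H}^{\overline{n}} v(1 - t_H; n) h(n)\,dn. \qquad (17)$$
By definition, a small change in $n_H$ has no effect on welfare, because $n_H$ is defined by (6) above. So, using $v_t = -z$, we see that
$$\frac{dW}{dt_H} = -\int_{n_H}^{\overline{n}} z(1 - t_H; n) h(n)\,dn = -B_H. \qquad (18)$$
So, plugging (15), (18) into (14), we obtain
$$MEB = \frac{t_H e + C}{1 - t_H(1 + e) - C}. \qquad (19)$$
Here,
$$e = \frac{1 - t_H}{B_H} \int_{n_H}^{\overline{n}} z_{1-t}(1 - t_H; n) h(n)\,dn \qquad (20)$$
is the intensive-margin elasticity of the tax base $B_H$ with respect to the net-of-tax rate $1 - t_H$, and
$$C = -\frac{1 - t_H}{B_H}\frac{\partial R}{\partial n_H}\frac{\partial n_H}{\partial t_H}$$
is a correction factor, which captures the effect of a changing $n_H$, the bunching response, on the MEB, via its effect on revenue. Of course, given the specification (1), $e$ is a constant independent of $n_H$. We can then prove 12 :
Proposition 2. Assume iso-elastic utility (1), and that the distribution of $n$ is Pareto, with shape and scale parameters $a$, $\underline{n}$. Then, the MEB with a notch is
$$MEB = \frac{t_H e + C}{1 - t_H(1 + e) - C}, \qquad (21)$$
where the correction factor $C$ is given by the closed-form expression (22) in $e$, $a$, $t_H$, $t_L$, $z_0$ and $\tilde{z}_H$. Moreover, in (22), $\tilde{z}_H = n_H (1 - t_H)^e$ and $n_H$ is defined by (6).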
Some comments are appropriate at this point. First, the MEB (21) is the formula for the marginal excess burden of a proportional income tax, as shown by Feldstein (1999), plus the correction factor $C$. This is intuitive; all households above $n_H$ are paying tax at rate $t_H$ on all their income, so for these households, $t_H$ is indeed a proportional tax. So, as already remarked, the correction factor $C$ just captures the effect of a changing $n_H$, the bunching response, on the MEB, via its effect on revenue.
Second, we can ask how the MEB compares to the MEB in a kinked tax system. As shown, for example, by Saez (2001), the latter is
$$MEB_K = \frac{t_H e a}{1 - t_H - t_H e a}.$$
Clearly, $MEB_K$ depends only on simple sufficient statistics; other than the tax rate $t_H$, it depends only on $e$, the intensive-margin elasticity of taxable income, and $a$, the shape parameter of the income distribution.
By contrast, from (22), it is clear that $C$ is a more complex object. It depends not only on the sufficient statistics $e$, $a$, and the top rate of tax, $t_H$, but also on other parameters of the tax system, $t_L$ and $z_0$, and on $\tilde{z}_H$, which is the unconstrained earnings of the type $n_H$, given that they face the higher rate of tax.
So, there are two ways of solving for $C$. One is simply to compute $C$ using the formulae (22), (6), choosing calibrated values for $e$, $a$, $z_0$, and that is what we do in this paper. Alternatively, as shown by Kleven and Waseem (2013), in any empirical study of a notch, the earnings $n_H (1 - t_L)^e$ can be estimated. Specifically, $n_H (1 - t_L)^e$ is simply $z^* + \Delta z^*$ in the notation of their paper, where $z^*$ is the earnings notch and, as explained there, $\Delta z^* / z^*$ can be estimated from excess bunching at the notch. Given this, $\tilde{z}_H$ can be recovered simply by multiplying $z^* + \Delta z^*$ by $(1 - t_H)^e / (1 - t_L)^e$.
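The rescaling described in the last step can be written as a one-line function. In the sketch below, the function name and the numerical inputs (a notch normalized to 1 and a hypothetical estimate of $\Delta z^*$) are assumptions for illustration only; they are not estimates from Kleven and Waseem (2013) or from this paper.

```python
def counterfactual_top_buncher_income(z_star, delta_z_star, t_low, t_high, e):
    """Earnings the top buncher would choose at the higher rate.

    z_star + delta_z_star estimates n_H (1 - t_low)^e; rescaling by
    ((1 - t_high) / (1 - t_low))^e gives n_H (1 - t_high)^e.
    """
    return (z_star + delta_z_star) * ((1.0 - t_high) / (1.0 - t_low)) ** e


# Hypothetical inputs: notch normalized to 1, estimated earnings response of 0.17.
z_tilde_high = counterfactual_top_buncher_income(
    z_star=1.0, delta_z_star=0.17, t_low=0.20, t_high=0.23, e=0.25
)
print(f"counterfactual earnings of the top buncher: {z_tilde_high:.4f}")
```

Because $t_H > t_L$, the rescaling factor is below one, so the recovered value lies between the notch and $z^* + \Delta z^*$ whenever the estimated response is large enough.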

Simulations
We have seen that the MEB of an increase in $t_H$ is given by the corresponding formula for a proportional tax $t_H$ plus a correction factor, $C$. Moreover, the MEB formula for a proportional tax is very simple, depending only on the intensive-margin elasticity $e$, and thus can easily be calculated.
So, a key question is whether we can get a good approximation to the MEB by setting $C = 0$, i.e. treating $t_H$ as a proportional tax. In this section, we investigate whether the MEB calculated assuming that $t_H$ is a proportional tax is a good approximation to the true MEB.
To do this, we need to calibrate the model. In particular, we require values for $e$, $a$, $t_H$, $t_L$, and $z_0$. Our baseline parameter values are chosen as follows. Following Piketty and Saez (2013), we set $a = 1.5$, and following Saez et al. (2012) and Kleven and Schultz (2014), we set $e = 0.25$. Regarding the tax rates, we first set $t_L = 0.2$, which is broadly in line with the average income and payroll tax paid by US households 13 . It is also the basic rate of income tax in the UK. For the notch, we use the fact that notches in personal income tax, where they exist, are small. For example, Kleven and Waseem (2013) show that in the Pakistani income tax, the notch ranges between 2 and 5 percentage points. So, we will take our baseline notch to be $t_H - t_L = \Delta t = 0.03$.
To choose $\underline{n}$, $z_0$ we assume that only the top 20% of the population pay a higher rate of income tax, roughly the proportion in the UK. Define $n_0$ to be the skill level corresponding to taxable income just at the notch, i.e. $n_0 (1 - t_L)^e = z_0$. This requires that 80% of the population have skills below $n_0$, i.e. $H(n_0) = 1 - (\underline{n}/n_0)^a = 0.8$, or $\underline{n}/n_0 = (0.2)^{1/1.5} = 0.342$. Given that only the ratio $\underline{n}/n_0$ is determined, we set $\underline{n} = 1$, so $n_0 = 2.924$. But then $z_0 = 2.924\,(0.8)^{0.25} = 2.765$.
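The calibration arithmetic in this paragraph can be checked directly. The sketch below simply evaluates the stated expressions with the baseline values $a = 1.5$, $t_L = 0.2$, $e = 0.25$ and a 20% share of higher-rate taxpayers; the variable and function names are assumptions made for this example.

```python
a = 1.5            # Pareto shape parameter
share_top = 0.2    # share of the population paying the higher rate
t_low = 0.2        # lower rate of tax
e = 0.25           # elasticity of taxable income
n_min = 1.0        # Pareto scale parameter, normalized to one


def pareto_cdf(n, scale, shape):
    """Pareto distribution function H(n) = 1 - (scale / n)^shape."""
    return 1.0 - (scale / n) ** shape


ratio = share_top ** (1.0 / a)    # n_min / n_0 implied by H(n_0) = 0.8
n0 = n_min / ratio                # skill level located exactly at the notch
z0 = n0 * (1.0 - t_low) ** e      # taxable income at the notch

print(f"n_min / n0 = {ratio:.3f}, n0 = {n0:.3f}, z0 = {z0:.3f}")
print(f"H(n0) = {pareto_cdf(n0, n_min, a):.3f}")
```

Only the ratio of the scale parameter to $n_0$ is pinned down by the 20% assumption, so the normalization of the scale parameter to one fixes the units of $n_0$ and $z_0$ but not the shape of the distribution.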
Finally, from (22), we need a value for $n_H$. Under the assumption (1), the indifference condition (6) reduces to
$$(1 - t_L) z_0 - \frac{e}{1 + e}\, z_0^{1 + 1/e}\, n_H^{-1/e} = \frac{n_H (1 - t_H)^{1 + e}}{1 + e}. \qquad (23)$$
Equation (23) has two roots, and we take the larger root to ensure that $n_H (1 - t_L)^e > z_0$. Finally, parameter values are chosen so that the denominator in (21) is positive, which is equivalent to $dR/dt_H > 0$, i.e. that the tax rate is on the right side of the Laffer curve. This requires simply that the notch is greater than 0.0015. 14
14 For the denominator in (21) to be positive, we require $1 - t_H(1 + e) > C$, which is satisfied for $t_H - t_L > 0.0015$.
Figures 3, 4 show both the true MEB, as given by (21), and the approximation, treating $t_H$ as a proportional tax, i.e. setting $C = 0$ in (21). The former is denoted by $MEB$ in the Figures, and the latter by $MEB_A$.
The error in using $MEB_A$ at the baseline values can be read off from Figure 3, setting $e = 0.25$. It can be seen that the true MEB is about 0.6, whereas the approximation is about 0.1. So, the error in using the proportional formula is about a factor of six. Figure 3 also shows that $MEB$ is increasing in $e$, at a faster rate than $MEB_A$, so when $e = 0.4$ for example, the error in using $MEB_A$ is almost an order of magnitude. Figure 4 shows that $MEB$ is also increasing in $a$, the Pareto parameter which measures (inversely) the size of the tail of the income distribution. As $MEB_A$ is independent of $a$, this means that the error in using $MEB_A$ is increasing in $a$.
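The proportional-tax approximation is easy to evaluate by hand. The sketch below assumes that the approximation takes the Feldstein form $t e / (1 - t(1 + e))$, which is what setting $C = 0$ gives when the denominator condition is $1 - t_H(1 + e) > C$; the function name is an assumption, and the code does not attempt to reproduce the correction factor or the figures.

```python
def meb_proportional(t, e):
    """Feldstein-style MEB of a proportional tax at rate t with elasticity e."""
    denominator = 1.0 - t * (1.0 + e)
    if denominator <= 0.0:
        raise ValueError("tax rate is on the wrong side of the Laffer curve")
    return t * e / denominator


t_high = 0.23  # baseline top rate: t_L = 0.20 plus a notch of 0.03

for elasticity in (0.25, 0.40):
    value = meb_proportional(t_high, elasticity)
    print(f"e = {elasticity:.2f}: approximate MEB = {value:.3f}")
```

The function takes no argument for the Pareto parameter, which reflects the point made in the text that the approximation is independent of $a$.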

An Application to VAT

The Set-Up
As remarked in the introduction, perhaps the most important example of a tax notch is the value-added tax. In this section, we present a simple model of value-added tax, based on Liu and Lockwood (2015), which is mathematically equivalent to the model developed above. We then calibrate the model using UK data from Liu and Lockwood (2015), to estimate the MEB from the VAT, taking into account bunching at the threshold.
Consider a single industry with a fixed, large number of small traders producing a homogeneous good. Each small trader combines his own labor input $l$ with an intermediate input $x$ to produce output $y$ via a fixed-coefficients technology, where $\gamma$ measures the input requirement per unit of output. In particular, for all traders, to produce one unit of output requires $\gamma$ units of input. A trader of type $m$ has utility
$$u = \pi - \psi(l; m), \qquad (25)$$
where $\pi$ is profit and $\psi(l; m)$ is the disutility of labor. So, traders are differentiated by disutility of labor. This assumption is not essential, but facilitates comparison to the income tax case. 15 For simplicity, it is assumed that traders only sell to final consumers, who have perfectly elastic demand for the good at price $p = 1$. This is analogous to the assumption made in the taxable income literature that the wage is fixed, i.e. labor demand is perfectly elastic at a fixed wage. Finally, the intermediate input is produced only from labor supplied by non-trader households via a fixed-coefficients technology where one unit of labor is needed to produce one unit of the intermediate input. So, the tax-exclusive price of the intermediate input is $w$, the wage, which we also assume to be 1.
The traders and the producer of the intermediate inputs face a VAT system. It is assumed that the producer is VAT-registered. If the trader is registered, he must charge VAT on sales $y$ at rate $t$, but can claim back any VAT paid on inputs. The trader must register for VAT if the value of sales $y$ exceeds the threshold $y_0$, but can register voluntarily even if $y < y_0$.
Note that when the trader is not registered, the price of the input is $1 + t$. So, the profit for the non-registered trader is
$$\pi_N = (1 - \gamma(1 + t))\,y, \qquad (26)$$
where $\gamma$ is the cost of inputs relative to revenue per unit sold. For the registered trader, we reason as follows. This trader must charge VAT on his output. None of the output VAT can be passed on to buyers, as they have perfectly elastic demand. So, revenue per unit sold is $p/(1 + t)$. But, if the trader is registered, he can claim back VAT on the input use $x$, so the price of the input is $w$. So, overall, with $p = w = 1$, the profit for the registered trader is
$$\pi_R = \left(\frac{1}{1 + t} - \gamma\right) y. \qquad (27)$$
We now assume, to make the analysis interesting, that $1 > \gamma(1 + t)$. From (26), this ensures that non-registered firms make a positive profit. Also, it ensures that for a given value of sales $y$, $\pi_N > \pi_R$, so there is no voluntary registration. This is important because then the VAT threshold functions exactly like a tax notch.
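As a quick numerical check of the no-voluntary-registration argument, the following minimal Python sketch compares the two profit levels at a common level of sales. The registered-trader profit is our reading of the verbal argument (net revenue $1/(1+t)$ per unit, input price 1); the function names and the sales level $y = 100$ are hypothetical and chosen only for illustration.

```python
def profit_nonregistered(y: float, t: float, gamma: float) -> float:
    """Profit (26): output sold at p = 1, input bought at the VAT-inclusive price 1 + t."""
    return (1.0 - gamma * (1.0 + t)) * y


def profit_registered(y: float, t: float, gamma: float) -> float:
    """Registered trader: net revenue 1 / (1 + t) per unit, input bought at price w = 1."""
    return (1.0 / (1.0 + t) - gamma) * y


# Illustrative values: t and gamma follow the UK calibration, y is arbitrary.
t, gamma, y = 0.2, 0.45, 100.0

pi_n = profit_nonregistered(y, t, gamma)
pi_r = profit_registered(y, t, gamma)
gap = pi_n - pi_r

# Algebraically the gap is t * (1 - gamma * (1 + t)) * y / (1 + t),
# which is positive exactly when 1 > gamma * (1 + t).
gap_closed_form = t * (1.0 - gamma * (1.0 + t)) * y / (1.0 + t)

print(pi_n, pi_r, gap, gap_closed_form)
```

The closed form for the gap makes the role of the assumption $1 > \gamma(1 + t)$ explicit: it is the same condition that keeps the non-registered profit positive.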

Effective VAT Rates
Now define $n \equiv m(1 - \gamma)$. Then, substituting (26), (27) into (25), after some rearrangement, we can show that the payoff of trader $n$ can be written as a function of value-added $z = y(1 - \gamma)$ and the VAT system, as in (28) and (29). In particular, dividing the per-unit profits in (26), (27) by $1 - \gamma$ gives the effective tax rates
$$t_N = \frac{\gamma t}{1 - \gamma}, \qquad t_R = \frac{t}{(1 + t)(1 - \gamma)}.$$
As $A$ is a free parameter, we set it equal to $1 - \gamma$. Then, (28), (29) describe a utility function and a tax schedule as functions of $z$ that are mathematically equivalent to the income tax model, although, obviously, the economic interpretation of $z$ is different. Here, $t_N$, $t_R$ are the effective tax rates faced by non-registered and registered traders respectively on the value-added they generate. Obviously, both effective rates are increasing in the statutory rate, $t$. Also, note that both rates are increasing in input intensity $\gamma$. Moreover, from our assumption $1 > (1 + t)\gamma$, we have $t_R > t_N$.
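The effective rates can be recovered directly from the profit expressions, since value-added is $z = y(1 - \gamma)$ and each trader keeps a share $1 - t_N$ or $1 - t_R$ of it. The sketch below is a minimal Python illustration under that reading; the function name is hypothetical, and the net-of-tax shares are derived from the per-unit profits rather than copied from (29).

```python
def effective_rates(t: float, gamma: float) -> tuple[float, float]:
    """Return (t_N, t_R), the effective rates on value-added z = y * (1 - gamma)."""
    if not gamma * (1.0 + t) < 1.0:
        raise ValueError("the model assumes 1 > gamma * (1 + t)")
    net_n = (1.0 - gamma * (1.0 + t)) / (1.0 - gamma)   # 1 - t_N, non-registered
    net_r = (1.0 / (1.0 + t) - gamma) / (1.0 - gamma)   # 1 - t_R, registered
    return 1.0 - net_n, 1.0 - net_r


# UK calibration used in the simulations: t = 0.2, gamma = 0.45.
t_n, t_r = effective_rates(0.2, 0.45)

# Special case with no intermediate inputs: t_N = 0 and t_R = t / (1 + t).
t_n0, t_r0 = effective_rates(0.2, 0.0)

print(round(t_n, 2), round(t_r, 2), t_n0, t_r0)
```

Under these assumptions the calibrated values round to the effective rates reported for the UK, and the no-input case collapses to $t_N = 0$, $t_R = t/(1+t)$.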
So, faced with the tax schedule (29), all traders in the interval $n \in [n_L, n_R]$ will bunch at the VAT threshold $z_0$. Moreover, $n_L = z_0/(1 - t_N)^e$, and $n_R$ solves (23) with $t_H$, $t_L$ replaced by $t_R$, $t_N$.
Finally, letting $z(1 - t; n)$ be the value-added chosen by an unconstrained firm facing tax $t$, it can be shown that the revenue from the VAT is as in (12), with $t_H$, $t_L$ replaced by $t_R$, $t_N$, i.e.
$$R = t_N\left[\int_{\underline{n}}^{n_L} z(1 - t_N; n)\,h(n)\,dn + z_0\,(H(n_R) - H(n_L))\right] + t_R\int_{n_R}^{\infty} z(1 - t_R; n)\,h(n)\,dn. \qquad (30)$$
In (30), the base on which $t_N$ is levied is the value-added of non-registered traders, and the base of $t_R$ is the value-added of registered traders.

The Marginal Excess Burden of the VAT
With the VAT, a change in the statutory rate $t$ will change both effective tax rates $t_N$, $t_R$ unless $\gamma = 0$, i.e. no intermediate inputs are used. This is, of course, analogous to a reform that changes both $t_H$ and $t_L$ in the income tax model. So, for the VAT, the formula for the MEB becomes somewhat more complex. To present the formula for the MEB in this case, we need a few more definitions. First, note from (30), using $z(1 - t; n) = (1 - t)^e n$, that the effective bases of $t_N$ and $t_R$ are
$$B_N = (1 - t_N)^e\int_{\underline{n}}^{n_N} n\,h(n)\,dn + z_0\,(H(n_R) - H(n_N)), \qquad B_R = (1 - t_R)^e\int_{n_R}^{\infty} n\,h(n)\,dn, \qquad (31)$$
where $n_N = z_0/(1 - t_N)^e$ is the lower end of the bunching interval. Then, from (31), the intensive-margin elasticities of $B_R$, $B_N$ with respect to the net-of-tax rate are $e$ and $\varepsilon < e$ respectively, where the gap between the two is governed by a term $\phi$. The term $\phi$ captures a new effect of bunching: with bunching, a mass $H(n_R) - H(n_N)$ of the non-registered firms that are bunching are unresponsive to a change in the rate of VAT, which lowers the aggregate intensive-margin elasticity of the tax base $B_N$ with respect to $t_N$. 16 Moreover, recall that an increase in $t$ causes both $t_N$ and $t_R$ to increase, so $\theta$, defined in (34), measures the importance of a change in $t_R$ on revenue relative to $t_N$. Armed with these new definitions, we can state our result.
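To illustrate how bunching dampens the intensive-margin response, the following minimal Python sketch evaluates the two tax bases under a Pareto distribution with scale normalised to 1 and checks the elasticity of $B_N$ by finite differences, holding the bunching thresholds fixed. Several things here are assumptions rather than the paper's own definitions: the bases are taken to be non-registered value-added (interior plus bunched) and registered value-added, the "bunching share" is our stand-in for the role played by $\phi$, and the values $z_0 = 2$ and $n_R = 3$ are hypothetical (in the model $n_R$ solves (23)).

```python
import math

INF = float("inf")


def pareto_cdf(n: float, a: float) -> float:
    """H(n) = 1 - n**(-a) for a Pareto distribution with scale normalised to 1."""
    return 1.0 - n ** (-a)


def partial_mean(lo: float, hi: float, a: float) -> float:
    """Integral of n * h(n) over [lo, hi], with h(n) = a / n**(a + 1) and a > 1."""
    upper = 0.0 if hi == INF else hi ** (1.0 - a)
    return a / (a - 1.0) * (lo ** (1.0 - a) - upper)


def bases(t_n, t_r, e, a, z0, n_lo, n_r):
    """Return (B_N, B_R, bunching share of B_N) at fixed thresholds n_lo and n_r."""
    bunched = z0 * (pareto_cdf(n_r, a) - pareto_cdf(n_lo, a))
    b_n = (1.0 - t_n) ** e * partial_mean(1.0, n_lo, a) + bunched
    b_r = (1.0 - t_r) ** e * partial_mean(n_r, INF, a)
    return b_n, b_r, bunched / b_n


# Rates, a and e follow the calibration; z0 and n_r are hypothetical illustrations.
t_n, t_r, e, a, z0, n_r = 0.16, 0.30, 0.25, 1.2, 2.0, 3.0
n_lo = z0 / (1.0 - t_n) ** e  # lower end of the bunching interval

b_n, b_r, share = bases(t_n, t_r, e, a, z0, n_lo, n_r)

# Finite-difference elasticity of B_N with respect to (1 - t_N), thresholds held fixed.
step = 1e-6
t_n_up = 1.0 - (1.0 - t_n) * (1.0 + step)
b_n_up, _, _ = bases(t_n_up, t_r, e, a, z0, n_lo, n_r)
elasticity_n = (math.log(b_n_up) - math.log(b_n)) / math.log(1.0 + step)

print(share, elasticity_n, e * (1.0 - share))
```

Under these assumptions the elasticity of $B_N$ equals $e$ scaled down by one minus the bunched share of the base, which is the sense in which unresponsive bunchers lower the aggregate response.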
Proposition 4. Assume that the distribution of sales (and pretax-income) is Pareto, with shape and scale parameters $a$, $\underline{n}$. Then, the MEB of the VAT is given by (35), where the intensive-margin term is given in (36) and, finally, the correction factor $C$ is given in (37). We note that bunching impacts the calculation of the MEB in two ways. First, as before, there is a correction factor $C$ in (35). The correction factor is more complex than in the income tax case. The reason for the additional complexity is clear from (37): an increase in $t$ now increases both $t_R$ and $t_N$ and, in turn, both of these effective taxes affect $n_R$, the top of the bunching interval, and thus revenue. An explicit formula for $C$ in terms of parameters can be derived as in (22) above; this is done in the Appendix.
In addition, there is a second, new effect of bunching in (36). Bunching dampens the intensive-margin response to a change in $t$, because at fixed $n_N$, $n_R$, firms in this interval will not adjust their sales in response to a change in $t$. This is captured by the term $\phi > 0$, which lowers the intensive-margin response from $e$ to $\varepsilon$. An interesting special case is where the small traders do not use any intermediate input, i.e. $\gamma = 0$. Then from (29), $t_N = 0$ and $t_R = t/(1 + t) = \tau$, so (35) simplifies to (38). It can be checked that in this case, $C$ is given by the explicit formula (22), replacing $t_H$, $t_L$ by $t_R$, 0 respectively. 17

Simulations
Here we calibrate the VAT model, and plot the true MEB in (35) and an approximation to the MEB as parameters vary. 18 The approximation is the one treating VAT as a proportional tax, i.e. setting $C = 0$ in (38), which gives $MEB_A$. The parameters are calibrated as follows. In the UK, the statutory rate of VAT is 20%, so $t = 0.2$. Liu and Lockwood (2016) calculate that for the universe of firms in the UK that file a corporate tax return, $\gamma = 0.45$. This gives $t_N = 0.16$, $t_R = 0.30$.
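For orientation, the proportional-tax benchmark can be evaluated in a few lines. The sketch below is a minimal Python illustration that assumes the approximation takes the textbook Feldstein form $e\tau/(1 - \tau(1 + e))$; the formula obtained by setting $C = 0$ in (38) is not reproduced in this section, so the functional form, the function name and the choice $e = 0.25$ (the baseline value used in the figures) are assumptions. It covers only the no-input case, where the effective rate is $\tau = t/(1 + t)$.

```python
def meb_proportional(tau: float, e: float) -> float:
    """Feldstein-style MEB of a proportional tax at rate tau with taxable-income elasticity e.

    Assumed form: e * tau / (1 - tau * (1 + e)); requires a positive denominator.
    """
    denom = 1.0 - tau * (1.0 + e)
    if denom <= 0.0:
        raise ValueError("tax rate is on the wrong side of the Laffer curve")
    return e * tau / denom


t = 0.2                # UK statutory VAT rate
tau = t / (1.0 + t)    # effective rate on value-added when gamma = 0
meb_a = meb_proportional(tau, e=0.25)

print(meb_a)
```

Note that this benchmark depends only on the rate and the elasticity, not on the Pareto parameter $a$, which is consistent with the statement that $MEB_A$ is independent of $a$.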
Finally, we need a value for $a$. A prior question is whether the "upper tail" of the distribution of firm sales $y$ is well-described by a Pareto distribution. In the case of personal incomes, a Pareto distribution of the upper tail is widely accepted, but less is known about firms. In the US, there is evidence that the size distribution of firms as measured by sales is Pareto (Luttmer (2007)), and Luttmer estimates a value for the US of $a = 1.06$. In the Online Appendix, we provide evidence that this is also the case for the UK, using firm sales from administrative data on corporate tax returns. We show that for firms above the VAT threshold, the estimate of $a$ is about 1.2. So, this is the figure we will use in the simulations.
18 The details of the calibration are described in the Online Appendix.
Our results are given in Figures 5-8 below. Figures 5 and 6 show the simpler case with no intermediate inputs, i.e. $\gamma = 0$, in which case we know that formula (35) reduces to formula (38). We can see that at the baseline values of the parameters, e.g. $e = 0.25$ in Figure 5, the true MEB is about twice as high as the approximation. This difference is much smaller than in the income tax case, and is driven partly by the lower value of $a$ in the VAT case. Indeed, we can see in Figure 8 that the accuracy of the approximation $MEB_A$ falls rapidly as $a$ rises, because MEB is increasing in $a$ whereas $MEB_A$ is independent of $a$. In the case with intermediate inputs (Figures 7 and 8), the difference between the true MEB and the approximation is somewhat higher; the true MEB is about 3 times higher than the approximation. As in the case with no inputs, the true MEB is increasing in both $e$ and $a$.

Conclusions
This paper shows that the sufficient statistic approach to the welfare properties of income (and other) taxes does not easily extend to tax systems with notches, because with notches, changes in bunching induced by changes in tax rates have a first-order effect on tax revenues. In an income tax setting, we showed that the MEB of a change in the top rate of tax is given by the Feldstein (1999) formula for the MEB of a proportional tax, plus a correction term. Also, under certain conditions, the optimal top rate of tax is given by the formula for the optimal proportional tax, minus a correction term. These correction terms can be computed empirically, using an estimate of excess mass at the notch. Quantitatively, these correction terms can be very large.
An application to VAT was also discussed. A simple model of small traders who differ in productivity, and are subject to VAT at rate $t$ above a threshold level of sales, was shown to be formally equivalent to the income tax model. We showed that the MEB of an increase in the statutory rate of VAT is given by the Feldstein formula for a proportional tax plus a correction factor, as in the income tax case. With a calibration to UK data, the MEB of the VAT is roughly three times what it would be if VAT were simply a proportional tax.

A Appendix
Proof of Proposition 2. It remains to derive a formula for $C$. From (8) and $z(1 - t; n) = (1 - t)^e n$, we have (A.2). Next, from (13) and (16), using the fact that $z(1 - t; n) = (1 - t)^e n$, we have (A.3). So, plugging (A.2), (A.3) into (N.6), we have (A.4), where in the second line we have used $\int_{n \geq n_H} n\,h(n)\,dn = E[n \mid n \geq n_H]\,(1 - H(n_H))$. Now, given that $n$ follows a Pareto distribution with shape and scale parameters $a$, $\underline{n}$, we also know that $E[n \mid n \geq n_H] = a\,n_H/(a - 1)$ (A.5). Plugging (A.5) into (A.4), we get (A.6). Then, using the definition $z_H = n_H(1 - t_H)^e$ in (A.6), and rearranging, we get (22) as required.
Derivation of (28), (29), (30). We first derive (28), (29). Trader utility is profit minus the disutility of labor. So, combining (A.1), (26), (27) and using $n = m(1 - \gamma)$, $l = y$, we get (A.7). Now, using $z = y(1 - \gamma)$ in (A.7), we get (A.8). Finally, we note that for (A.8) to imply (29), we require (A.9). But, solving (A.9) for $t_N$, $t_R$, we get (29) as required. Now we derive (30). Let $y(n)$ be the sales of an $n$-type trader. Then, revenue from the VAT is given by (A.10). The first term is revenue from VAT levied on the value of sales of registered firms, because the sale price is $1/(1 + t)$, and the second term is revenue from inputs sold by the intermediate input producer to firms that do not register for VAT. Using $z(n) = y(n)(1 - \gamma)$, we can write this as (A.11). Finally, replacing $z(n)$ by $z(1 - t_N; n)$, $z_0$, or $z(1 - t_R; n)$ where appropriate, we get (30) as required.
Proof of Proposition 4. Let $B_N$, $B_R$ be the bases of the effective taxes $t_N$, $t_R$ defined in (31). Then from (17), (30), and remembering that a change in the statutory rate of VAT $t$ changes $t_N$, $t_R$ via (29), we have (A.12) and (A.13). So, plugging (A.12), (A.13) into (14), we have, after rearrangement, (A.15), where in the last line, we have used (32). So, dividing top and bottom of (A.15) by $B_R \partial t$ and using the definition of $\theta$ from (34), and the definition of $C$ from (37), we get (A.16), which can be rearranged to (36), as required.

Online Appendix
Details of MEB Simulation for the VAT Case. We need to express all the relevant elements of the MEB in terms of the parameters $t$, $\gamma$, $z_0$, and $n_R$, $n_N$. In turn, we know that $n_N = z_0/(1 - t_N)^e$ and that $n_R$ is determined by (N.1). Assume that the distribution of firms is Pareto with shape and scale parameters $a$, $\underline{n}$.
Without loss of generality, we assume $\underline{n} = 1$; so, the distribution and density of $n$ are $H(n) = 1 - n^{-a}$, $h(n) = a/n^{a+1}$. So, using these formulae and $z(1 - t; n) = (1 - t)^e n$, we have by routine calculation (N.2). Moreover, from the formulae for $t_N$, $t_R$ in the paper, we have (N.3). So, plugging (N.3) into the formula for $\theta$ in the paper, we can write $\theta$ as in (N.5). Plugging (N.2) into (N.5) allows us to compute $\theta$ as a function of $t$, $\gamma$, $z_0$, and $n_N$, $n_R$. Next, using $z(1 - t; n) = (1 - t)^e n$, and the properties of the Pareto distribution, we obtain the corresponding expression for $\phi$. So, using (N.2), (N.5), $\phi$ can be computed as a function of $t$, $\gamma$, $z_0$, and $n_N$, $n_R$. Finally, recalling the definition of $C$ in the paper, we have (N.6), where in the second line, we use (N.3).
It remains to calculate $\partial n_R/\partial t_N$, $\partial n_R/\partial t_R$, and $\partial R/\partial n_R$. From (N.1), we have (N.7). Moreover, from the formula for $\partial R/\partial n_R$ in the paper, and the iso-elastic form of $z(1 - t; n)$, we get
$$\frac{\partial R}{\partial n_R} = \left(t_N z_0 - t_R(1 - t_R)^e n_R\right)h(n_R). \qquad (N.8)$$
Plugging (N.7), (N.8) into (N.6), and using the formula for the Pareto density to substitute out $h(n_R)$, we eventually get an explicit expression for $C$. This expression is computable knowing $t$, $\gamma$, $z_0$, and $n_R$, $n_L$. Thus, all the components of MEB in the paper can be calculated.
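Formula (N.8) can be checked numerically against a direct evaluation of revenue. The sketch below is a minimal Python illustration under stated assumptions: revenue is taken to be $t_N$ times non-registered value-added (interior plus bunched) plus $t_R$ times registered value-added, the Pareto scale is 1, and the values of $z_0$ and $n_R$ are hypothetical rather than solutions of (N.1). Function names are ours.

```python
def pareto_pdf(n: float, a: float) -> float:
    """h(n) = a / n**(a + 1) for a Pareto distribution with scale 1."""
    return a / n ** (a + 1.0)


def revenue(n_r, t_n, t_r, e, a, z0, n_lo):
    """Assumed revenue t_N * B_N + t_R * B_R with iso-elastic z = (1 - t)**e * n."""
    mean_below = a / (a - 1.0) * (1.0 - n_lo ** (1.0 - a))  # integral of n*h(n) on [1, n_lo]
    mean_above = a / (a - 1.0) * n_r ** (1.0 - a)           # integral of n*h(n) on [n_r, inf)
    bunch_mass = n_lo ** (-a) - n_r ** (-a)                 # H(n_r) - H(n_lo)
    b_n = (1.0 - t_n) ** e * mean_below + z0 * bunch_mass
    b_r = (1.0 - t_r) ** e * mean_above
    return t_n * b_n + t_r * b_r


def d_revenue_d_nr(n_r, t_n, t_r, e, a, z0):
    """Analytic derivative from (N.8)."""
    return (t_n * z0 - t_r * (1.0 - t_r) ** e * n_r) * pareto_pdf(n_r, a)


# Calibrated rates, a and e; z0 and n_r are hypothetical illustrations.
t_n, t_r, e, a, z0, n_r = 0.16, 0.30, 0.25, 1.2, 2.0, 3.0
n_lo = z0 / (1.0 - t_n) ** e

# Central finite difference in n_r, holding everything else fixed.
step = 1e-5
numeric = (revenue(n_r + step, t_n, t_r, e, a, z0, n_lo)
           - revenue(n_r - step, t_n, t_r, e, a, z0, n_lo)) / (2.0 * step)
analytic = d_revenue_d_nr(n_r, t_n, t_r, e, a, z0)

print(numeric, analytic)
```

At these illustrative values the derivative is negative: moving the top of the bunching interval up replaces registered value-added taxed at $t_R$ with bunched value-added taxed at $t_N$, which is the first-order revenue effect that the correction factor captures.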
Calculation of the Pareto Parameter for UK firms. We use the method of Luttmer (2007) and others to estimate the distribution of firm size for the UK using corporate tax return data. Firm size $y$ is measured by sales. If the distribution of firm size is Pareto, the size of the upper tail, $1 - F(y)$, is proportional to $y^{-a}$, so the log of the size of the upper tail is linear in $\log y$, with slope $-a$.
We briefly describe the data here; a fuller description is given in Liu and Lockwood (2015). We have annual sales of firms, taken from the universe of corporation tax records (CT600) in financial years 2004/5 to 2009/10. The data is then refined by eliminating companies which are part of a larger VAT group, i.e. using only stand-alone independent companies. We also drop all observations with partial-year corporation tax records. In addition, we eliminate companies that mainly engage in overseas activities. This yields a data-set with 731,706 observations for 435,688 companies between April 1, 2004 and March 30, 2010. To analyze the data, we group firm sales into bins of size £10,000. A visual inspection of the data (available on request) indicates that the log of the size of the upper tail of the distribution of firm size is close to linear in $\log y$. We then regress (by year) the log of the size of the upper tail of the distribution, denoted $1 - F$, on the log of firm size as measured by sales. We define the upper tail as starting at the VAT threshold in any year. The absolute value of the coefficient on sales in this regression gives a value for $a$. Inspection of the results in Table 1 below indicates that for our population of UK firms, $a$ is approximately 1.2.
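The tail regression can be illustrated on simulated data. The sketch below is a minimal Python example, not the estimation code used for Table 1: it draws a synthetic Pareto sample with shape 1.2 (standing in for firm sales above the threshold), skips the £10,000 binning step, and regresses the log of the empirical upper-tail share on log size by ordinary least squares. The function name, sample size and seed are hypothetical.

```python
import math
import random


def tail_index_ols(sizes):
    """OLS slope of log(1 - F) on log(size); returns the implied Pareto shape a = -slope."""
    ordered = sorted(sizes)
    n = len(ordered)
    log_size = [math.log(y) for y in ordered]
    # Empirical upper-tail share at each observation: fraction of firms at least that large.
    log_tail = [math.log((n - i) / n) for i in range(n)]
    mean_x = sum(log_size) / n
    mean_y = sum(log_tail) / n
    sxx = sum((x - mean_x) ** 2 for x in log_size)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_size, log_tail))
    return -sxy / sxx


# Synthetic "firm sales" relative to the threshold: Pareto with shape 1.2 and scale 1.
rng = random.Random(0)
sample = [rng.paretovariate(1.2) for _ in range(20000)]

a_hat = tail_index_ols(sample)
print(a_hat)
```

With real data the same regression would be run year by year on firms above the VAT threshold, and the unweighted OLS slope should be read as a descriptive fit rather than an efficient estimator of $a$.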