Political Preferences for Redistribution in Sweden

We examine preferences for redistribution inherent in Swedish tax policy 1971–2012 using the inverse optimal tax approach. The income distribution is carefully characterized with the help of administrative register data and we employ behavioral elasticities reflecting the perceived distortionary effects of taxation. The revealed social welfare weights are high for non-workers, small for low-income earners, and hump-shaped around the median. At the top, they are always negative, especially so during the high-tax years of the 1970s and 80s. The weights on non-workers increased sharply in the 1970s, fell drastically in the late 80s/early 90s, and have since then increased.


Introduction
By observing how tax policy has evolved over time, one can obtain insights into how the government values redistribution and the priorities policymakers have regarding the well-being of different income groups. The picture would however be incomplete. Inspection of tax schedules in isolation does not reveal the economic environment in which decisions about tax policy are made. If policymakers trade off the value of redistribution against the efficiency costs associated with income taxation, the observed tax policy choices of the government reflect not only policymakers' preferences for income redistribution but also the shape of the income distribution and behavioral responses associated with income taxation. The reason is that by imposing different marginal tax rates at different levels of income, the government can expose different individuals to different tax burdens, thereby achieving redistribution. However, individuals respond to income taxation through behavioral responses, such as by reducing their labor effort when being subject to an increased marginal tax rate. These behavioral responses reduce the size of the tax base and constrain the redistributive capacity of the government. Moreover, the shape of the income distribution plays a central role in the design of the income tax. For example, the total efficiency cost of an increase in the marginal tax rate at a particular level of income depends on the number of individuals who are reporting that level of income and are, at the margin, affected by the tax increase.
The aforementioned effects are incorporated into the analysis of optimal income taxation, pioneered by Mirrlees (1971), that strives to find the income tax that achieves the best possible trade-off between redistribution and economic efficiency. The theory is normative as it relies on a set of 'social welfare weights' that allow the utilities of different individuals to be compared and aggregated into a measure of social welfare. In the literature, the implications of adopting different specifications of the set of social welfare weights have been explored. For example, two important benchmarks are the 'utilitarian' case, where the government attaches the same weight to all individuals in the economy, and the 'max-min' case, where the government is only concerned with maximizing the utility of the least well-off agent in the economy. 1 Recently, scholars have become increasingly interested in a positive application of optimal tax theory where observed tax policy is analyzed through the lens of an optimal income tax model. These exercises, which have become known as 'inverse' optimal income tax analyses, allow researchers to retrieve the set of social welfare weights that would rationalize the observed (actual) structure of income taxation as optimal given empirical knowledge about behavioral responses to taxation and the income distribution. The approach allows us to assess whether preferences for redistribution (as stated by policymakers) are consistent with observed policy choices given available estimates of the magnitude of behavioral responses, the shape of the income distribution, and a specific theoretical framework (the optimal income tax model). It also allows for a systematic way to evaluate how preferences for redistribution have evolved over time, taking into account the changing shape of the income distribution, the evolution of tax policy, and behavioral responses to taxation. 
The bottom line is that, in order to assess the redistributive policy of the government, the researcher should analyze the income distribution and the tax system in combination, not separately, and should explicitly recognize the link between income choices and tax rates in the form of behavioral responses. 2 In this paper we use the inverse optimal taxation approach to analyze preferences for redistribution in Sweden and how they have evolved during the years 1971-2012. To the best of our knowledge, this is the first time the inverse optimal tax approach has been applied to a Nordic country. Sweden is a country with one of the highest tax-to-GDP ratios in the world, and has been successful in combining large amounts of redistribution with high economic output.
It is also a country with a history of highly progressive income taxation and large variation in marginal tax rates both across income groups and over time.
2 While observed tax policy might not be the outcome of a perfect optimization by the government, the key assumption in the inverse optimal tax analysis is that current tax structures have been derived, at least in part, from policymakers' concern that the redistributive benefits of taxation should be weighed against the efficiency costs of taxation. To the extent that the observed tax schedule is non-optimal, as demonstrated by Hendren (2014), the analysis can be seen as quantifying the marginal cost of taxation at different points of the income distribution taking into account the behavioral responses to taxation (as captured by intensive and extensive margins of labor supply) as well as the local shape of the income distribution. The recovered weights can then be used to test the local optimality of the tax system in the spirit of the tax reform literature (see e.g. Kleven and Kreiner (2006) and Saez and Stantcheva (2016) for recent applications). See also Lorenz and Sachs (2015) for a related analysis with an application to Germany. These authors show that a given tax/transfer system is likely to be inefficient if effective marginal tax rates quickly fall with income.
Our analysis contains the following components. We provide a detailed characterization of the evolution of Swedish tax policy over the last 45 years, highlighting the effects of several major Swedish tax reforms. We employ detailed administrative micro data covering the full range of the income distribution and characterize the income distribution non-parametrically.
Due to the high quality of our data we are able to describe the shape of the income distribution well, including at the top. We show that top incomes are approximately Pareto distributed and calculate the Pareto parameter (a measure of the thinness of the tail of the income distribution).
We use these measures to provide a separate analysis of how the social welfare weight on top earners has evolved during our sample period.
In order to interpret our estimated social weights as politicians' implicit preferences for redistribution, we need to know policymakers' beliefs about behavioral elasticities. This is consistent with a recent paper by Lockwood and Weinzierl (2016) who stress the importance of using the perceived distortionary costs of taxation when backing out implied preferences for redistribution using an optimal tax model. We survey reports from the Swedish government that indicate that our baseline intensive-margin elasticity of 0.2 is not far from the beliefs of Swedish politicians, indicating that our estimates may in fact be close to actual distributional preferences of Swedish policymakers.
Our most important results can be summarized as follows. In contrast to what is expected given standard assumptions in the optimal income tax literature, and previous results for the United States, 3 we do not find social weights to be monotonically declining with income. Instead, we document a hump-shaped pattern with the highest weights on middle-income workers. We also document some dramatic changes over time. The weights on welfare recipients increased sharply in the 1970s, then fell substantially in connection with the tax reforms in the late 80s/early 90s, and have since then seen a steady increase. An interesting and general pattern throughout our analysis period is that the social weights on low-income wage-earners are relatively low, while the weights on non-workers are quite high. Social weights for the top decile are always negative, especially so during the high-tax years of the 1970s and 80s.
3 For example, Hendren (2014) and Lockwood and Weinzierl (2016).
The paper is organized as follows. Below we provide a brief review of the related literature. In section 2 we describe the optimal income tax model that is inverted for the purpose of deriving social welfare weights. Section 3 describes our income data and the evolution of the Swedish income distribution. In section 4 we describe the Swedish tax system between 1971 and 2016. In section 5 we present a meta-analysis of recent and historical empirical estimates of behavioral responses to taxation and discuss how we have incorporated them in our analysis. We also discuss politicians' beliefs about these elasticities capturing the perceived distortionary costs of taxation. Section 6 presents our results: social welfare weights for different income groups and how they have evolved over time. Finally, section 7 offers concluding remarks.

Related literature
The origin of the positive approach to optimal income taxation analyzed in this paper dates back to at least Weisbrod (1968), who recognized the importance of using distributional weights in cost-benefit analysis. The theory was further developed by Basu (1980). An early important contribution is Christiansen and Jansen (1978), who derived the social welfare weights inherent in the Norwegian structure of commodity taxation. More recently, beginning with Bourguignon and Spadaro (2012), there has been a surge of papers calculating preferences for redistribution based on observed tax/transfer systems using the inversion of optimal income tax models. For example, Bargain et al. (2014) analyze the redistributive preferences for 17 EU countries and the United States by inverting the Saez (2002) optimal income tax model. Zoutman et al. (2014) analyze the redistributive preferences in the Netherlands by inverting the continuous optimal income tax model with both intensive and extensive margins developed by Jacquet et al. (2013). Relatedly, Zoutman et al. (2015) analyze the political preferences for redistribution of Dutch political parties. Hendren (2014) develops a general approach to compare income distributions recognizing the cost of achieving a more equitable income distribution through modifications to the tax schedule. While the above-mentioned papers focus on analyzing redistributive preferences at a specific point in time, Bargain and Keane (2010) and Lockwood and Weinzierl (2016), like our study, track how preferences for redistribution have evolved over time. The paper most closely related to our study is Lockwood and Weinzierl (2016), who invert the Diamond (1998) continuous pure intensive-margin optimal income tax model to analyze the evolution of social preferences in the US between 1979 and 2010. In line with Zoutman et al. (2014, 2015) and Hendren (2014), we also allow for extensive margin (participation) decisions.

Theoretical framework
The starting point of this paper is the Mirrlees (1971) framework of optimal income taxation. Individuals differ in their income-earning capacities (or skills) and make labor effort choices based on their preferences and the link between pre-tax and post-tax income implied by the tax schedule. The optimal income tax problem is formulated using the tools of mechanism design. The optimal income tax is defined as the structure of income taxation that maximizes the social objective function subject to a set of incentive-compatibility constraints and the government's budget constraint. The incentive-compatibility constraints derive from the government's inability to impose individual-specific taxation, which implies that individuals are free to choose any point on the income tax schedule. 4 The efficiency cost of taxation arises from the fact that raising the disposable income of low-income persons through a reduction in their tax liability (with the purpose of achieving redistribution) results in a tightening of the incentive constraint, as high-income persons now have an incentive to reduce their income to obtain a more lenient tax treatment. Saez (2001) brought the literature on optimal income taxation closer to empirical research by reformulating the original Mirrlees model in terms of empirically quantifiable components.
Notably, behavioral responses were expressed in terms of empirically estimable taxable income elasticities and the differences in earnings capacities (or skills) were directly mapped to the empirical income distribution. The social objective function was specified as a weighted sum of the utilities of the individuals in the economy. These weights are commonly referred to as 'social welfare weights' in the public finance literature.
4 The key contribution in the Mirrlees formulation was the focus on information as the fundamental constraint on public policy, rather than relying on some ad-hoc restrictions on the shape of the income tax function.
To conceptually approach the problem of redistributive taxation we find it useful to consider a population of agents indexed by a multidimensional set Θ. Following Saez and Stantcheva (2016), we define for each θ ∈ Θ a generalized social marginal welfare weight g(θ) which measures the value society places on a marginal increase in consumption for an individual with characteristics θ. The weights g(θ), θ ∈ Θ embody society's judgment of fairness and may vary depending on both income and non-income characteristics including unobservable personal characteristics and circumstances. The social welfare weights can be used to assess the optimality of the tax system. The welfare gain of any policy reform can be computed by calculating the money metric utility loss or gain for each individual and aggregating these gains/losses using the weights g(θ).
Since we are interested in using taxes based on income to recover social welfare weights, we need to aggregate g(θ) over subsets of Θ that are associated with individuals having the same earned income z. Thus we will consider the average social welfare weight at the level of income z which we denote by g(z).
For example, suppose an agent with characteristics $\theta \in \Theta$ maximizes a utility function
$$u = (1 - \tau)z + y - \frac{z_0(\theta)}{1 + 1/k}\left(\frac{z}{z_0(\theta)}\right)^{1 + 1/k},$$
where $\tau$ is the marginal tax rate, $y$ is non-labor income, $k > 0$ is a parameter measuring the intensity of the disutility of work and $z_0(\theta)$ is an individual-specific parameter. Then, the optimal income choice of a $\theta$-type individual will be $z = z_0(\theta)(1 - \tau)^k$, where $z_0(\theta)$ has the interpretation of the potential income of an individual with characteristics $\theta$, equal to the income chosen by a $\theta$-type individual in the absence of taxation ($\tau = 0$). In this simple framework, $g(z)$ will be the average social welfare weight across the set of individuals who report an income level of exactly $z$ when the income tax rate is $\tau$, namely, the set of agents $\{\theta \in \Theta : z = z_0(\theta)(1 - \tau)^k\}$.
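As an illustrative check of the income choice described above, the following minimal sketch assumes a quasi-linear utility with isoelastic disutility of work, u = (1 − τ)z + y − z0/(1 + 1/k) · (z/z0)^(1 + 1/k). This functional form is an assumption, chosen because its first-order condition yields z = z0(1 − τ)^k. The function names and all parameter values are hypothetical.

```python
def utility(z, tau, y, k, z0):
    """Assumed utility: after-tax income plus non-labor income minus effort cost."""
    consumption = (1.0 - tau) * z + y
    effort_cost = z0 / (1.0 + 1.0 / k) * (z / z0) ** (1.0 + 1.0 / k)
    return consumption - effort_cost


def closed_form_income(tau, k, z0):
    """Income implied by the first-order condition: z = z0 * (1 - tau)^k."""
    return z0 * (1.0 - tau) ** k


def grid_search_income(tau, y, k, z0, steps=20000):
    """Brute-force maximizer of utility over an even grid on (0, 2 * z0]."""
    candidates = [2.0 * z0 * i / steps for i in range(1, steps + 1)]
    return max(candidates, key=lambda z: utility(z, tau, y, k, z0))


if __name__ == "__main__":
    # Hypothetical values: 50 percent tax rate, k = 0.2, potential income 300,000.
    tau, y, k, z0 = 0.5, 10_000.0, 0.2, 300_000.0
    print(closed_form_income(tau, k, z0), grid_search_income(tau, y, k, z0))
```

Because the assumed utility is strictly concave in z, the grid maximizer lies within one grid step of the closed-form choice, and non-labor income y does not affect the chosen income (no income effects).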
The social welfare weights g(z) that we derive in this paper will be attached to specific income levels z while we remain agnostic about how the income z has been generated (e.g. whether income is the outcome of luck or effort).
In accordance with Hendren (2014) and Zoutman et al. (2014) we derive social welfare weights in a general optimal tax framework featuring both intensive and extensive margin labor supply decisions. As suggested by Saez (2001), and shown formally by Jacquet and Lehmann (2015), optimal income tax formulas can be extended to the case of multidimensional heterogeneity. 5 However, as discussed at length by Hendren (2014), in the presence of multidimensional heterogeneity (heterogeneity conditional on income), the nonlinear income tax schedule cannot be inverted using the established approach. In line with earlier literature, we therefore abstract from this issue. Our approach is to focus on a representative group of the population (that could be considered a specific subset of Θ) and then analyze how social welfare weights vary across different income levels for individuals belonging to this group.
5 At least when there are no general equilibrium effects on wages; see Rothschild and Scheuer (2014).

Intensive margin
Let us now briefly discuss the determination of optimal income taxes. The basic principle behind optimal income taxation is to weigh the costs of taxation (measured in terms of behavioral responses) against the benefits of taxation (measured by the social welfare weights). A tax system is deemed optimal when there exists no perturbation to the tax schedule that would increase aggregate social welfare. This optimality condition allows one to derive the representation of the Mirrlees (1971) optimal marginal tax rate formula presented in terms of income by Saez (2001):
$$\frac{T'(z)}{1 - T'(z)} = \frac{1 - G(z)}{\alpha(z)\,\varepsilon(z)}, \tag{1}$$
which here is slightly re-expressed in the convenient form presented in Saez and Stantcheva (2015) for the case without income effects in the earned income decision. 6 In the above equation, $\varepsilon(z)$ is the average elasticity of income $z$ with respect to $1 - T'(z)$ for individuals earning $z_\theta = z$, and $\alpha(z)$ is the local Pareto parameter defined as $\alpha(z) = z h(z)/(1 - H(z))$, where $h(z)$ is the PDF of the income distribution and $H(z)$ is the CDF of the income distribution. Finally,
$$G(z) = \frac{\int_z^\infty g(s)\,h(s)\,ds}{1 - H(z)}$$
is the average social weight of people earning more than $z$. 7 We make a few observations before we turn to the inversion of this formula. The formula embeds the key trade-off between equity and efficiency. First of all we see that optimal marginal tax rates are decreasing in $G(z)$. The higher is the social weight on the consumption of individuals earning more than $z$, the lower should $T'(z)$ be, since the size of $T'(z)$ affects the total tax paid by all individuals earning more than $z$. Second, optimal marginal tax rates are decreasing in the product $\alpha(z)\varepsilon(z)$. This product measures the efficiency cost of taxation manifested in the revenue gains or losses induced by the behavioral response to the tax. More specifically, the factor $\varepsilon(z)$ measures the extent to which individuals reduce their income in response to a marginal tax increase locally at $z$, and $\alpha(z)$ captures the size of the tax base (loosely speaking, the number of individuals with income $z$), which measures how costly such a behavioral response will be in terms of tax revenue.
6 To simplify the analysis, we abstract from income effects. Income effects on labor supply are generally considered to be small. For example, McClelland and Mok (2012) survey income elasticities in the range 0-0.1. Moreover, income effects do not seem to matter greatly for the profile of social welfare weights. For example, Zoutman et al. (2015) employ an income elasticity of 0.1 but find that income effects have only a small impact on the results.
7 To simplify notation we have written each variable $\bar{x}$ as $x$, implicitly recognizing that $x$ denotes an average across subsets of Θ.
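To make the roles of G(z), α(z) and ε(z) concrete, the sketch below assumes that formula (1) takes the form T'/(1 − T') = (1 − G)/(αε) and evaluates the local Pareto parameter α(z) = z h(z)/(1 − H(z)) for an exact Pareto distribution, where it is constant. The function names and the numbers are hypothetical and serve only as an illustration.

```python
def pareto_pdf(z, a, z_min):
    """Density of a Pareto distribution with shape a and lower bound z_min."""
    return a * z_min ** a / z ** (a + 1.0)


def pareto_cdf(z, a, z_min):
    """CDF of the same Pareto distribution."""
    return 1.0 - (z_min / z) ** a


def local_pareto_parameter(z, pdf, cdf):
    """alpha(z) = z * h(z) / (1 - H(z)), as defined in the text."""
    return z * pdf(z) / (1.0 - cdf(z))


def optimal_marginal_tax_rate(G, alpha, elasticity):
    """Solve T'/(1 - T') = (1 - G) / (alpha * elasticity) for T' (assumed form of (1))."""
    return (1.0 - G) / (1.0 - G + alpha * elasticity)


if __name__ == "__main__":
    a, z_min = 2.5, 100_000.0  # hypothetical Pareto tail
    alpha = local_pareto_parameter(
        500_000.0,
        lambda z: pareto_pdf(z, a, z_min),
        lambda z: pareto_cdf(z, a, z_min),
    )
    print(alpha)
    print(optimal_marginal_tax_rate(G=0.0, alpha=3.0, elasticity=0.2))
```

Under these assumptions a higher G(z) or a higher product α(z)ε(z) both lower the implied marginal tax rate, in line with the two observations made in the text.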
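Equation (1) can be inverted to obtain an expression for the social welfare weight $g(z)$:
$$g(z) = -\frac{1}{h(z)}\frac{d}{dz}\left[(1 - H(z))\left(1 - \alpha(z)\,\varepsilon(z)\,\frac{T'(z)}{1 - T'(z)}\right)\right]. \tag{2}$$
Assuming $T(z)$ is piece-wise linear (which is indeed the case in our empirical application) and that $\varepsilon(z)$ is constant on a given segment of the tax schedule, we can compute the derivative in (2). Thus a simplified expression of (2), valid in the interior of any segment of the tax schedule, is:
$$g(z) = 1 + \varepsilon(z)\,\frac{T'(z)}{1 - T'(z)}\left[\rho(z) - \alpha(z)\right], \tag{3}$$
where $\rho(z) = z\alpha'(z)/\alpha(z)$ is the elasticity of the local Pareto parameter $\alpha(z)$. The income distribution thus affects the profile of social welfare weights through the term $\rho(z) - \alpha(z)$, which equals $1 + z h'(z)/h(z)$; Hendren (2014) refers to a term of this kind as the local elasticity of the income distribution. 8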
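The inversion on a single linear segment can be illustrated with a short sketch. It assumes that the simplified formula takes the form g(z) = 1 + ε · T'/(1 − T') · (ρ(z) − α(z)), and uses the identity ρ(z) − α(z) = 1 + z h'(z)/h(z), which follows from the definition of α(z), as a cross-check. The function names and numbers are hypothetical.

```python
def weight_from_pareto_terms(mtr, elasticity, alpha, rho):
    """g(z) = 1 + eps * T'/(1 - T') * (rho - alpha) on the interior of a segment."""
    return 1.0 + elasticity * mtr / (1.0 - mtr) * (rho - alpha)


def weight_from_density_slope(mtr, elasticity, z, h, h_prime):
    """Same weight written with the density: rho - alpha = 1 + z * h'(z) / h(z)."""
    return 1.0 + elasticity * mtr / (1.0 - mtr) * (1.0 + z * h_prime / h)


if __name__ == "__main__":
    # Hypothetical exact Pareto tail: alpha is constant (= a), so rho = 0.
    a, z_min, z = 2.0, 100_000.0, 400_000.0
    h = a * z_min ** a / z ** (a + 1.0)
    h_prime = -(a + 1.0) * h / z
    print(weight_from_pareto_terms(0.6, 0.2, alpha=a, rho=0.0))
    print(weight_from_density_slope(0.6, 0.2, z, h, h_prime))
```

With a constant local Pareto parameter the weight turns negative once T' exceeds 1/(1 + αε), the revenue-maximizing rate under these assumptions.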

Intensive and extensive margin
We now add an extensive margin to the model by allowing workers to decide whether or not to enter the labor force. Conceptually this decision is based on a comparison between a fixed cost of work and the financial reward from working, where the latter is affected by the tax and transfer system. The fixed cost of working can be interpreted broadly to accommodate utility costs (e.g. stemming from foregone leisure or searching for a job) or monetary costs (such as commuting or child care costs). 9 In accordance with Zoutman et al. (2015), the formula for the social welfare weights with both intensive and extensive margins takes the following form:
$$g(z) = -\frac{1}{h(z)}\frac{d}{dz}\left[(1 - H(z))\left(1 - \alpha(z)\,\varepsilon(z)\,\frac{T'(z)}{1 - T'(z)}\right)\right] - P(z)\,\frac{T(z) + B_0}{z - T(z) - B_0}. \tag{4}$$
The extensive margin adds an extra term to the intensive-margin formula (2), equal to the product of the participation elasticity
$$P(z) = \frac{\partial e_z}{\partial\,(z - T(z) - B_0)}\,\frac{z - T(z) - B_0}{e_z}$$
and the participation tax rate
$$\frac{T(z) + B_0}{z - T(z) - B_0},$$
where $B_0$ is the income provided to an individual with $z = 0$ (for example through the social assistance system) and $e_z$ is the employment rate among individuals with potential income $z$ (i.e. who would have an income of $z$ if they chose to enter the labor force). 10 To gain some intuition behind the extensive-margin part of the formula, notice that in the absence of intensive-margin responses, i.e. $\varepsilon(z) = 0$, formula (4) is equivalent to:
$$\frac{T(z) + B_0}{z - T(z) - B_0} = \frac{1 - g(z)}{P(z)}, \tag{5}$$
which specifies the optimal participation tax rate at each income level $z$ and has a form that is familiar from earlier work (e.g. Saez 2002). In appendix A we derive this formula heuristically using a perturbation argument. The key intuition behind this equation is that it trades off the mechanical increases in tax revenue resulting from increases in taxes on the working population against the behavioral losses in tax revenue resulting from participation responses.
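The extensive-margin term can be illustrated numerically. The sketch below assumes that the extra term equals the participation elasticity P(z) times (T(z) + B0)/(z − T(z) − B0), so that without intensive-margin responses g(z) = 1 − P(z)(T(z) + B0)/(z − T(z) − B0). The function names and all numbers are hypothetical.

```python
def participation_tax_ratio(tax, benefit_out_of_work, income):
    """(T(z) + B0) / (z - T(z) - B0): tax on participation relative to the net gain."""
    net_gain = income - tax - benefit_out_of_work
    if net_gain <= 0.0:
        raise ValueError("working must raise disposable income for the ratio to be defined")
    return (tax + benefit_out_of_work) / net_gain


def weight_participation_only(participation_elasticity, tax, benefit_out_of_work, income):
    """Weight implied by the participation formula when the intensive elasticity is zero."""
    ratio = participation_tax_ratio(tax, benefit_out_of_work, income)
    return 1.0 - participation_elasticity * ratio


def weight_both_margins(intensive_weight, participation_elasticity, tax,
                        benefit_out_of_work, income):
    """Intensive-margin weight minus the extensive-margin correction term."""
    ratio = participation_tax_ratio(tax, benefit_out_of_work, income)
    return intensive_weight - participation_elasticity * ratio


if __name__ == "__main__":
    # Hypothetical worker: income 200,000, tax 60,000, out-of-work benefit 80,000.
    print(weight_participation_only(0.1, 60_000.0, 80_000.0, 200_000.0))
```

In this illustration a higher participation elasticity or a higher out-of-work benefit lowers the weight that rationalizes a given tax liability at income z.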
9 Technically, the pure intensive-margin model above includes an extensive margin since it allows individuals to choose between z = 0 and z > 0. However, a standard practice in the public finance literature is to employ fixed costs of work to better explain the labor supply behavior at low levels of income. See Hausman (1980) and Cogan (1981).
10 The formula is presented by Zoutman et al. (2015) for the case of uni-dimensional heterogeneity. An equivalent formula is presented by Hendren (2014) in his setting with multidimensional taxpayer types.
As explained in Jacquet et al. (2013), in the presence of intensive-margin responses [$\varepsilon(z) > 0$], optimal participation taxes will deviate from formula (5) since one needs to take into account the distortions induced by the tax schedule on intensive-margin responses. 11 In the presence of intensive-margin responses, (5) instead defines a "target" for the participation tax. In this case, using Zoutman et al. (2015), appendix, equation 34 (or equation 18d in Jacquet et al. 2013), an average of equation (5) should hold in the optimum, i.e.
$$\int_0^\infty \left[1 - g(z) - P(z)\,\frac{T(z) + B_0}{z - T(z) - B_0}\right] h(z)\,dz = 0. \tag{6}$$
11 Notice that assuming away the intensive margin is equivalent to assuming that the government can observe the potential income of all workers, which implies that the only possible adjustment in response to income taxation is to leave the labor force. If the potential income of each worker is, on the other hand, unobservable to the government, individuals can adjust their income in response to taxation and it becomes impossible to implement specific participation tax rates at each level of potential income.

Weight on non-workers
Equation (4) allows us to compute social welfare weights for working individuals. To derive an expression for the social welfare weight on non-workers we proceed as follows. Denote by $E$ the fraction of the population that is working. The possibility for the government to levy a uniform lump-sum transfer that is received by everyone in the population (workers and non-workers) requires the following optimality condition to be satisfied:
$$(1 - E)\,g_0 + E\int_0^\infty g(z)\,dH(z) = 1, \tag{7}$$
where the integral is taken over the income distribution of workers. The RHS of (7) measures, since the population size is normalized to 1, the marginal cost of simultaneously decreasing $T(z)$ by 1 SEK and increasing $B_0$ by 1 SEK (raising the disposable income of all workers by 1 SEK). The LHS of (7) measures the marginal social benefit of such a transfer, where we have denoted the social welfare weight on non-workers by $g_0$. Notice that by construction such a reform does not change the financial reward from working for anyone in the population. Moreover, in the absence of income effects on the decision to supply taxable income, the intensive-margin choices of individuals are unaffected as well. Re-arranging (7) we can get an expression for the social weight placed on the total population of non-workers:
$$g_0 = \frac{1 - E\int_0^\infty g(z)\,dH(z)}{1 - E}. \tag{8}$$
Next, we will present our strategies regarding the three components of the inversion exercise: the income distribution, the tax system and behavioral elasticities. However, before that, we briefly discuss redistribution through other channels than the nonlinear income tax.
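Returning to the weight on non-workers, the lump-sum condition can be illustrated with a small calculation. The sketch assumes the condition reads (1 − E) g0 + E · (average weight among workers) = 1, so that g0 = (1 − E · average) / (1 − E). The function names, the employment rate, the group shares and the weights are all hypothetical.

```python
def average_worker_weight(weights, shares):
    """Average social welfare weight among workers; shares are fractions of all workers."""
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("worker shares must sum to one")
    return sum(g * s for g, s in zip(weights, shares))


def non_worker_weight(employment_rate, weights, shares):
    """g0 implied by (1 - E) * g0 + E * average_worker_weight = 1 (assumed form)."""
    if not 0.0 < employment_rate < 1.0:
        raise ValueError("employment rate must lie strictly between 0 and 1")
    avg = average_worker_weight(weights, shares)
    return (1.0 - employment_rate * avg) / (1.0 - employment_rate)


if __name__ == "__main__":
    # Hypothetical income groups of workers with their weights and shares.
    weights = [0.5, 1.1, 1.2, -0.5]
    shares = [0.3, 0.3, 0.3, 0.1]
    print(non_worker_weight(0.8, weights, shares))
```

Under this assumed condition, whenever the average weight among workers is below one, the weight on non-workers must exceed one for the lump-sum condition to hold.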

Redistribution through other channels
In this paper we derive preferences for redistribution by analyzing the evolution of the income tax through the lens of an optimal income tax model. It should be acknowledged that the government performs redistribution through additional channels as well, such as through the choice of commodity tax structure and through the expenditure side of the government budget.
We have abstracted from many of these alternative channels for redistribution. This can be motivated partly by the fact that it makes sense to focus on the income tax since it is the primary vehicle of redistribution in modern economies and partly because of the complexity involved in assessing the value different income groups assign to different components of the government budget. Our approach is equivalent to assuming that public expenditures benefit individuals equally across the income distribution. This may be reasonable to assume in the Swedish case since Sweden has a history of providing uniform rather than means-tested benefits, perhaps for historical reasons in order to ensure sufficient political support for the welfare state. For example, child care, health care and education are universally provided and it is uncommon for families to choose to opt out and pay for these services out of their own pockets. This is in contrast to the US, which relies to a larger extent on means-tested in-kind transfers. Notable examples in the US that are highly redistributive in nature are the government subsidies directed towards medical care, food consumption/nutritional assistance, housing, and early childhood education. 12 Our simplified approach to dealing with public expenditures boils down to assuming that a given percentage of the public budget is dedicated to the financing of public goods (such as defense, public administration and infrastructure) and that the value of these goods enters additively in the utility function of individual agents in the economy. This implies that the structure of social welfare weights is unaffected by the level of public good provision.

Income distribution
The first component of our inversion exercise is the income distribution. Our income data source is the LINDA database provided by Statistics Sweden, which contains administrative data for a random sample of 3.35 percent of the Swedish population (around 300,000 individuals). We characterize the income distribution non-parametrically using a kernel density estimator with an adaptive bandwidth. The shape of the income distribution for eight representative years is shown in figure 1.
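A kernel density estimator with an adaptive bandwidth can be sketched as follows. The text does not state which adaptive scheme is used, so the sketch assumes a Gaussian kernel with local bandwidths that scale with the inverse square root of a fixed-bandwidth pilot estimate; the function names, the pilot bandwidth and the simulated sample are hypothetical and do not represent the LINDA data.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def fixed_kde(x, sample, h):
    """Fixed-bandwidth kernel density estimate at x."""
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


def adaptive_kde(sample, h, sensitivity=0.5):
    """Return a density function whose bandwidth widens where the pilot density is low."""
    pilot = [fixed_kde(xi, sample, h) for xi in sample]
    # Geometric mean of the pilot values normalizes the local bandwidth factors.
    geo_mean = math.exp(sum(math.log(p) for p in pilot) / len(pilot))
    local_h = [h * (p / geo_mean) ** (-sensitivity) for p in pilot]

    def density(x):
        total = sum(gaussian_kernel((x - xi) / hi) / hi
                    for xi, hi in zip(sample, local_h))
        return total / len(sample)

    return density


if __name__ == "__main__":
    random.seed(0)
    # Simulated log-normal incomes stand in for register data (assumption).
    incomes = [random.lognormvariate(12.0, 0.5) for _ in range(150)]
    density = adaptive_kde(incomes, h=20_000.0)
    print(density(150_000.0))
```

The wider local bandwidths in sparse regions smooth the estimate in the upper tail, where observations are few, while keeping more detail where the data are dense.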
Even though LINDA data is available already from 1968, we choose 1971 as the starting year of our analysis due to the important tax law change that happened in that year when Sweden switched from joint to individual taxation for married couples. Hence our tax unit of analysis will throughout the paper be the individual. Nonetheless, some elements of the transfer system are determined as a function of the income of both spouses. To simplify our calculations, and to be in line with earlier literature that focuses on countries with individual taxation, we focus on single men and women without children. The fact that we focus on childless individuals simplifies the analysis considerably since we do not need to take into account child allowances and various transfer programs which are only available to individuals with children (such as housing allowance). Moreover, to properly apply the inverse optimal tax approach to an economy with couples would require us to invert a family model of optimal income taxation. This is an interesting but formidable task that has not yet been done in the literature and we therefore leave it to future research. 13 14 Our theoretical model concerns the redistribution of pre-tax labor income, including self-employment income. Hence we exclude capital income and transfers, such as pensions and unemployment benefits. We assume that individuals who receive labor income do not simultaneously receive unemployment benefits, student aid or pension payments. Therefore we restrict our sample to individuals aged 25 to 59. Since data availability and definitions change over the
13 To make some progress along these lines, we have in the supplementary material, section C.1, provided a heuristic attempt at calculating social welfare weights for couples with two children, taking into account housing allowance and child allowances while assuming a simple form of correlation between the incomes of spouses.
14 Notice that the conceptual difficulty of calculating social welfare weights for couples is greater when there is individual taxation than when there is joint taxation. In countries with joint taxation (such as the United States), the process of deriving social welfare weights for couples is simplified by focusing on couples who file jointly. In such a case, however, the inversion approach seems only feasible when assuming a unitary model of family decision-making, thereby ignoring potentially important sources of intra-household inequality that previous research has shown can have considerable consequences for the optimal tax problem (as shown for example by Bastani 2013).
15 The earned income tax credit (EITC) was introduced in 2007 and targets precisely the income category that we are interested in. Because the EITC in a sense supersedes the standard deduction, it also determines the marginal tax rate. Therefore we use EITC-eligible income ("underlag för jobbskatteavdrag") for the years 2007-2012. "Primary income" is a closely related income concept which we use for 1993-2006. Since no clear-cut labor income variable is available in LINDA before 1993, we have chosen to use taxable earned income ("[kommunalt] taxerad förvärvsinkomst"). This variable is a bit problematic due to the fact that many social security transfers are taxable in Sweden. For example, it implies that there is a greater mass at low to medium income levels generated by these taxable transfers. However, this issue is somewhat mitigated by the fact that the number of individuals receiving taxable transfers from the government was lower in the 1970s and 80s than after the crisis in the early 1990s. In the supplementary material, figure 12, we show results for 2013-2016 by assuming that the income distribution has remained unchanged since 2012 and scaling it up by nominal wage growth for 2013 to 2016 as estimated by the Swedish Ministry of Finance.

Taxes
The second component of our inversion exercise is the tax system. The period of our analysis contains substantial variation in marginal tax rates reflecting, for example, the changes to the tax system that occurred with the great expansion of the welfare state in the 1970s, and the overhaul of the tax system following the big crisis in the beginning of the 1990s. Below we give a brief overview of the Swedish tax system and how it has evolved and describe the major developments of tax policy that are relevant to our analysis. In subsection 4.2 below we present the effective marginal income tax rates that we use in our model. Readers who are familiar with the Swedish tax system might want to skip directly to that section.
The basic structure of the Swedish tax system is simple. A municipal income tax is levied at a flat rate on incomes exceeding a standard deduction. 16 On top of this, a progressive central government income tax is paid on all income exceeding a certain threshold. The central income tax mostly targets high-income earners and therefore does not generate as much tax revenue as the municipal income tax. The two other major sources of tax revenue for the government are the value-added tax and the social contributions that firms are obliged to pay for their employees. 17

Historical developments of tax policy in Sweden 1971-2016
The 1970s were characterized by a substantial increase in taxes and the size of the public sector.
In the beginning of the 1970s, the municipal income tax and the value-added tax were the two biggest sources of tax revenue, and social security contributions quickly rose to become a third important source of tax revenue. 18 The value-added tax (VAT) was introduced in 1969, replacing the sales tax, and has increased in importance over time. There has been a secular increase in the average VAT rate from about 12 percent in the beginning of our period to around 19 percent at the end of the period (quoted tax-inclusive). 19 Between 1972 and 1982 the rate of social contributions applicable to top incomes rose from 2 percent to 33 percent, as social contributions were made strictly proportional to income. 20
16 For some years this standard deduction varies with income, thereby affecting marginal tax rates.
17 The development of the average municipal income tax and central government income tax, social contributions and the value-added tax since 1971 is shown in figures 8d, 8b, and 8a in the supplementary material. In figure 8c we present the evolution of top tax rates.
18 In figure 8d, section C.2, we plot the average municipal tax rate between 1971 and 2015, showing that municipal taxes have trended upwards, reflecting an increasingly ambitious welfare state, with a particularly sharp increase during the 1970s.
19 These developments can be seen in figure 8a, section C.2.
20 These developments can be seen in figure 8b, section C.2.
At the same time the income tax became more progressive (the top central government tax rate reached 58 percent) and in 1982 the effective top marginal tax rate reached a striking 91 percent. These high marginal tax rates caused a growing concern regarding the distortionary effects of taxation. This eventually resulted in a series of tax reforms, led by the Social Democratic government, that substantially reduced marginal tax rates and simplified the tax system by significantly reducing the number of brackets.
The series of tax reforms was initiated with the tax reform of 1981 ("the Wonderful Night", named after a tense night of negotiations between the Social Democrats, the Liberals and the Center Party), which came into force in 1983. The tax reforms during the 80s culminated in the "Tax Reform of the Century" in 1990-1991, when the central government income tax was greatly simplified and reduced to a single tax bracket of 20 percent applying only to high incomes. This reform was made possible through an agreement between the Social Democrats and the Liberals. With a municipal tax rate of about 30 percent, the idea was to ensure that no one faced a marginal tax rate higher than 50 percent (excluding VAT and social security contributions). Moreover, to appease the main blue-collar union (LO), the standard deduction was made income-dependent and higher for low- to medium-income earners, a targeted tax cut that also affected marginal tax rates. 21
In the 1990s Sweden experienced a severe economic crisis that led to various austerity measures. Notably, the standard deduction was decreased and a second bracket of the central government tax at a level of 25 percent ("värnskatt") was introduced. Some of these austerity measures were then reversed in the early 2000s. For example, the standard deduction was made more generous and a tax credit for the pension fee was gradually phased in.
The most important tax reforms in the last two decades were initiated in 2007, following the election of a new center-right government, when an earned income tax credit was implemented.
It was expanded over the next couple of years and lowered average tax rates for all labor income

Effective marginal income tax rate
We compute the effective marginal rate of income tax from three components: $\tau_l$, the marginal tax on labor; $\tau_c$, the weighted tax on consumption (quoted tax-inclusive); and $\tau_s$, social security contributions 23 (quoted tax-exclusive). VAT is ultimately paid by consumers and affects the tax wedge on labor, and is therefore also relevant to our analysis. Our main source for the tax law of the 1970s and 80s is Söderberg (1996). For the 1990s and 2000s we have mostly relied on primary sources.
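As a minimal sketch of how the three components could be combined, the snippet below assumes the standard tax-wedge formula τ = 1 − (1 − τ_l)(1 − τ_c)/(1 + τ_s). This form is consistent with the quoting conventions stated above (τ_c tax-inclusive, τ_s tax-exclusive), but it is an assumption about the exact expression; the function name and the input values are purely illustrative.

```python
def effective_marginal_tax_rate(tau_l: float, tau_c: float, tau_s: float) -> float:
    """Combine labor, consumption and payroll taxes into one effective marginal rate.

    Assumed formula (not confirmed by the text):
        1 - (1 - tau_l) * (1 - tau_c) / (1 + tau_s)
    with tau_c quoted tax-inclusive and tau_s quoted tax-exclusive.
    """
    if tau_s <= -1.0:
        raise ValueError("tau_s must be greater than -1")
    # Share of the employer's total labor cost that ends up as real consumption.
    retained_share = (1.0 - tau_l) * (1.0 - tau_c) / (1.0 + tau_s)
    return 1.0 - retained_share


# Hypothetical inputs, chosen only for illustration (not taken from the data).
rate = effective_marginal_tax_rate(tau_l=0.50, tau_c=0.20, tau_s=0.30)
print(f"effective marginal tax rate: {rate:.3f}")
```

Under this assumed form, setting τ_c and τ_s to zero returns the labor tax rate itself, which is a convenient sanity check.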
Following the Mirrleesian approach to the treatment of nonlinear income taxes, we view government transfers as negative taxes and analyze an integrated tax-benefit system. The lump-sum component of the nonlinear income tax corresponds to the social assistance system, which is a last-resort income support program provided by municipalities. It depends on household income as well as assets, and varies by municipality. 24 The presence of the social assistance system generates a 100 percent effective marginal tax rate at the bottom of the income distribution. This implies that there is a region at the bottom where it is irrational for any individual to locate (with strictly convex preferences). Correspondingly, our formula for the social welfare weight (which is based on a principle of individual optimization) is undefined in this region.
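The 100 percent effective marginal tax rate at the bottom can be illustrated with a small sketch. It assumes, as a simplification, that social assistance tops up net earnings to a guaranteed level and is therefore withdrawn one-for-one; the flat 30 percent tax, the benefit level and all function names are hypothetical.

```python
def net_income(z, benefit, tax):
    """Disposable income when assistance tops up net earnings to `benefit`."""
    return max(benefit, z - tax(z))


def effective_marginal_rate(z, benefit, tax, dz=1.0):
    """Share of an extra `dz` of earnings that is lost to taxes and benefit withdrawal."""
    gain = net_income(z + dz, benefit, tax) - net_income(z, benefit, tax)
    return 1.0 - gain / dz


def flat_tax(z):
    """Hypothetical 30 percent flat tax, used only for illustration."""
    return 0.30 * z


BENEFIT = 70.0  # hypothetical guaranteed minimum income

for earnings in (50.0, 200.0):
    emtr = effective_marginal_rate(earnings, BENEFIT, flat_tax)
    print(f"earnings={earnings:.0f}  effective marginal rate={emtr:.2f}")
```

In this sketch, anyone whose after-tax earnings fall short of the guaranteed level keeps nothing of an extra unit earned, which is why no optimizing individual would choose to locate in that region.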
of the standard deduction on the marginal tax schedule.
23 Social security contributions (SSC) are paid by employers to finance social insurance benefits such as sick pay, unemployment benefits and pensions. Because the benefits received are a function of income, social security contributions can be considered part taxes and part insurance fees. To exactly separate the tax from the benefit component of the SSC is a laborious economic exercise that is sensitive to assumptions. Here we follow Flood et al. (2013) in assuming that 60 percent of social security contributions constitute fees and the remainder taxes. All social security benefits are capped, which means that after a certain point 100 percent of the SSC constitute taxes. The cap varies by benefit and over time, but we assume it to be 7.5 price base amounts, corresponding to 340,000 SEK in 2015, as this has been the cap for sick pay since 1990. Social contributions were initially levied at different rates at different income levels, with the highest marginal rate applying to middle-income earners because the connection with benefits was the largest in that segment. Gradually the connection with benefits weakened and since 1982 social contributions have been levied at the same rate for all taxpayers. Between 2007 and 2016, younger people have faced a lower rate of social contributions. As we focus on people older than 24 we can ignore this reduction.
24 In appendix B we describe in greater detail how we have constructed the data series of social assistance.
Finally, it should be noted that in our analysis we only include the value-added tax while excluding specific taxes on energy, carbon dioxide, alcohol and tobacco, as these taxes can be considered to be levied to correct for externalities and not primarily employed to achieve redistribution. 25 Our analysis focuses primarily on eight representative years: 1971, 1977, 1983, 1989, 1991, 2000, 2006 and 2012.
Figure 2 shows the effective marginal tax schedule for these years. 26
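The treatment of social security contributions described in footnote 23 can be sketched as follows. The 60 percent fee share and the 340,000 SEK cap for 2015 come from the text; the statutory contribution rate, the function name and the choice to treat income exactly at the cap as above it are assumptions made for illustration.

```python
FEE_SHARE_BELOW_CAP = 0.60  # share of SSC treated as insurance fees below the benefit cap
CAP_2015_SEK = 340_000      # 7.5 price base amounts in 2015, as stated in the text


def ssc_tax_rate(income, ssc_rate, cap):
    """Part of the statutory SSC rate that is counted as a tax at a given income."""
    if income >= cap:
        # Benefits are capped, so the whole contribution is a pure tax at the margin.
        return ssc_rate
    # Below the cap only the non-fee share is counted as a tax.
    return (1.0 - FEE_SHARE_BELOW_CAP) * ssc_rate


STATUTORY_SSC = 0.30  # hypothetical statutory rate, for illustration only

for income in (200_000, 500_000):
    tax_part = ssc_tax_rate(income, STATUTORY_SSC, CAP_2015_SEK)
    print(f"income={income}  SSC counted as tax={tax_part:.2f}")
```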

Elasticities
The third and final component of our inversion exercise is the set of elasticities that we use to capture behavioral responses to the income tax system. We need both intensive-margin and extensive-margin elasticities. Our model is very flexible and allows these to vary across income levels and over time.
25 Presently, the main rate of VAT is 25 percent of the price before tax (i.e., 20 percent quoted tax-inclusive). Lower rates apply to food (12 percent) and books and public transportation (6 percent). Rent and financial services are not subject to VAT. We obtain effective average VAT rates from Du Rietz et al. (2013) and assume that all agents face this consumption tax rate, regardless of income.
26 In figure 9 in the supplementary material we show the effective marginal tax schedule for all years.
Two approaches are possible. The first is to use reasonable estimates from the research literature. Our results will then have a fiscal externalities interpretation in line with Hendren (2014). The question answered then is: What is the opportunity cost of taxing different income groups?
A second possible approach is to use the actual elasticities employed by policymakers when they evaluate proposals for policy reforms. If two politicians disagree on the desirability of tax reform, they could in principle either disagree about labor supply elasticities, or about social weights. Assuming that the elasticities stated by politicians correctly reflect their beliefs about the efficiency costs of taxation, the elicited social welfare weights will be informative about their preferences for redistribution, under the assumption that politicians to some extent weigh the redistributive benefits of taxation against the costs due to behavioral responses. This requires that policymakers truthfully reveal their beliefs, and excludes the possibility that e.g. right-of-center politicians exaggerate behavioral responses to motivate tax cuts, when the real reason is that they have a higher subjective social welfare weight for the group concerned. Sweden is well suited for our case study, as the country has a long tradition of commissioning independent reports which the government then uses as a basis for policymaking. These reports can give us an indication of politicians' beliefs about labor supply elasticities. Although the two approaches are conceptually different, the elasticities are similar for the case of Sweden, as we shall see.
We also need to decide whether to use elasticities estimated on Swedish data or estimates from the international literature. Here there is a trade-off between relevance to the Swedish case on the one hand, and concerns regarding credible identification, on the other hand.
We begin by surveying the research literature. We choose 0.2 as the value of our baseline compensated intensive margin elasticity.

Extensive margin
There is a scarce supply of credible quasi-experimental evidence on extensive-margin responses. Chetty et al. (2012) survey recent microeconometric studies that have estimated extensive-margin elasticities and find that an extensive-margin elasticity of 0.25 is most closely consistent with the evidence. One should keep in mind, however, that the participation elasticity is not a deep structural parameter, but depends on the number of workers who are at the margin indifferent between working and not working. The participation response is therefore a function of the local shape of the distribution of fixed costs/reservation wages at the economy's current equilibrium. Thus we should expect participation elasticities to be different for different income groups (depending on the skill-specific employment level) and at different points in time (depending on the overall employment level in the economy). 28 Of particular relevance to our analysis are two studies on Swedish data, Selin (2011)

The perceived distortionary costs of taxation
The government committee that laid the groundwork for the Swedish "Tax Reform of the Century" in 1991 concluded that an intensive-margin labor supply elasticity of 0.25-0.3 best reflected available evidence, some of which had been commissioned by the committee. 30 The government seemed to find these estimates credible and decided to underfund the tax reform by 5 billion SEK, since the government committee expected beneficial revenue effects from lower marginal tax rates through behavioral responses (what practitioners label dynamic effects). A government report commissioned to evaluate the reform concluded that a compensated hours elasticity of 0.1 for men and 0.4 for married women seemed reasonable. 31 These estimates are fairly close to our baseline elasticity of 0.2, which indicates that our social weights estimates can be seen as representing the social weights of centrist politicians of the late 1980s and early 1990s.
28 See Bastani et al. (2016) for further discussion about the relationship between the participation elasticity and the employment level. Notice that in the context of our model there is a one-to-one correspondence between the tax system and the employment rate.
29 These estimates reported by Selin (2014) and Bastani et al. (2016) might at first glance seem very different. Nonetheless, the studies are consistent if one notices that Selin (2014) reports that the pre-reform share of married women with positive earnings was 67 percent (table 8) whereas the corresponding share in Bastani et al. (2016) is 90 percent.
In 2006 an alliance of center-right parties won the election on a platform of "making work pay" and increasing labor force participation. There is some indication that this was reflected in the choice of elasticities, with greater emphasis on the extensive margin. According to the Fiscal Policy Council (2008), the government used an hours elasticity of 0.1 and a participation elasticity of 0.25 to evaluate its reforms in 2007.
In a report from the Department of Finance (2009), the government assumes an hours elasticity of 0.05 or 0.1 for men and 0.1 or 0.15 for women, and a participation elasticity of 0.1.
The participation elasticity is not very high, but considering the conservative choice of intensive-margin elasticity there is some support for the proposition that the government's rhetoric of labor force participation also was reflected in its beliefs about the relative magnitudes of intensive- and extensive-margin elasticities.

Results
In this section we report the results of our model: the marginal social welfare weights placed by the government on various income groups and how they have evolved over time. We also report how the government's relative valuation of working and non-working individuals has changed over time. We compute social weights using equation (4). This formula requires three components that have been described in detail above: (i) behavioral elasticities on the intensive and extensive margin, (ii) the tax system and (iii) the income distribution. We plot the weights for eight representative years and present the results in figure 3a. These years represent the major episodes of the evolution of the Swedish tax system. 32
The qualitatively most important features of the shape and evolution of the social welfare weights during our period of study can be described as follows. For most years, social weights are increasing up to about the 40th or 50th percentile, producing a hump-like shape. This is an interesting feature of our schedules, but it remains an open question whether the hump reflects support of redistribution towards the middle class or a tendency of politicians to cater to the median voter (a political economy consideration that is not part of our model). During the late 1980s and early 1990s the hump is less clear, but it reappears in the late 1990s. An overall tendency is that the profile of social welfare weights becomes flatter in later years, reflecting a downward trend in the amount of redistribution carried out by the Swedish income tax system.
30 See the government report SOU 1989:33, supplement 4, p. 15 (SOU 1989).
31 See the budget proposition of the Swedish government prop. 1997/98:1, supplement 6, p. 14 (The Swedish Ministry of Finance 1998).
The hump-shaped pattern of social welfare weights is surprising, as one would normally expect the social weights to decline monotonically with income, reflecting the government's desire to redistribute from high- to low-income individuals. The hump-shaped feature can be contrasted with the results of Hendren (2014), who finds a monotonically declining shape for the United States, but is similar to the results for the Netherlands presented by Zoutman et al. (2015).
These differences are interesting, but need not necessarily reflect fundamental differences between countries in terms of preferences for redistribution. Instead, they could reflect inherent differences between countries in terms of the extent the government redistributes through the tax system versus other channels. In particular, countries differ in their reliance on means-tested versus universal support programs (see also the discussion in section 2.3). 33 Turning to the upper half of the distribution, we see that the weights decline quite steeply in this region in the 1970s and 1980s but the decline is much less pronounced in the 1990s and beyond.
We pay special attention to the top of the distribution in section 6.1 below, but make the
We now proceed to analyze the relative social welfare weight on the working and the non-working population and study how it has evolved over time. Individuals who report zero income are assumed to receive social assistance. Thus, the results will depend on the generosity of welfare benefits, relative to the tax treatment of working individuals. Notably, and in accordance with the equations in section 2.2, the results will also strongly depend on the fraction of the
We count as non-participating those individuals who earn less than the welfare benefit level after tax. For some years this is quite a high number, over a third. Our approach is motivated by the fact that there is a large group in our age span (25-59) that is neither working nor receiving benefits. Ignoring this group results in an unrealistically high employment rate. Because of this, the variations in non-work social weights resulting from variations in the proportion of non-employed should be interpreted with caution.
The average social weight attached to those who are working and not working (assumed to receive welfare benefits) is shown in figure 4. The weight on workers is simply the weighted average of the social weights shown in figure 3a. Because the grand average of social weights must be equal to 1 (as we have ruled out income effects), the weight on welfare recipients is simply the mirror image of the weights on workers, adjusting for variations in the proportion of workers and non-workers over time.
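The mirror-image relationship can be made concrete with a short sketch. It assumes only the normalization stated above, namely that the population-weighted average of social weights equals 1, so that p·g_w + (1 − p)·g_0 = 1, where p is the share of workers, g_w their average weight and g_0 the weight on non-workers. The symbols, the function name and the example numbers are hypothetical.

```python
def nonworker_weight(worker_share, avg_worker_weight):
    """Weight on non-workers implied by a grand average social weight of 1."""
    if not 0.0 < worker_share < 1.0:
        raise ValueError("worker_share must lie strictly between 0 and 1")
    # Solve worker_share * g_w + (1 - worker_share) * g_0 = 1 for g_0.
    return (1.0 - worker_share * avg_worker_weight) / (1.0 - worker_share)


# Hypothetical inputs: 80 percent of the population works, average worker weight 0.7.
g0 = nonworker_weight(worker_share=0.80, avg_worker_weight=0.70)
print(f"implied weight on non-workers: {g0:.2f}")
```

Holding the average worker weight fixed below 1, a larger share of workers mechanically raises the implied weight on non-workers, which is the composition effect the text cautions about.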
It seems that the weight on workers is largely driven by variations in marginal tax rates.
The increasing tax rates of the 1970s are clearly visible. There is a remarkable stability after the 1990-1991 tax reform. Fluctuations in the relative gain to working (for example, the introduction of the EITC in 2007) barely seem to affect the weight on the employed. Perhaps this is unsurprising as we have set the extensive-margin elasticities (with an average of 0.1) to generally be lower than the intensive-margin elasticity (equal to 0.2).
The weight on welfare recipients is much more volatile, varying between 1.6 and 3.3. During the high-marginal-tax era of the late 1970s and 80s, the weight on the non-employed is substantially higher than the weight on low-income workers. Also during the 90s and 00s, the weight on non-workers seems high. We interpret this to mean that Swedish politicians find those out of work to be deserving, as opposed to the American view that the working poor are especially worthy of support. Note that the increase since the mid-1990s is entirely driven by a larger proportion of individuals working.

Social welfare weights at the top
The logic of the inverse optimal taxation approach becomes especially clear in the case of top incomes since we can exploit a simplified way of expressing the optimal top income tax rate.
Consider again the optimal income tax formula (1). To derive the social welfare weight for top income earners we assume that for $z$ sufficiently large, say $z \geq \bar{z}$, $G(z)$, $\alpha(z)$ and $\varepsilon(z)$ are all constant and equal to $\hat{g}$, $a$ and $e$, respectively.
Moreover, we assume that the government levies a constant marginal tax rate above $\bar{z}$ equal to $\tau$. Thus, for $z \geq \bar{z}$ the above expression takes the form:
$$\tau = \frac{1 - \hat{g}}{1 - \hat{g} + a e} \qquad (9)$$
This is the well-known top tax rate formula first presented by Saez (2001). Re-arranging (9) we obtain an explicit expression for the social welfare weight on top income earners:
$$\hat{g} = 1 - a e \, \frac{\tau}{1 - \tau} \qquad (10)$$
An important special case occurs when $\hat{g} = 0$, i.e. when the policymaker places no weight on additional consumption of the very rich. This yields the revenue-maximizing (Laffer) tax rate $\tau_L = \frac{1}{1 + a e}$. If $\tau > \tau_L$, cutting taxes for the very rich would raise tax revenue and in that case the social welfare weight placed on the very rich, $\hat{g}$, is negative. We may also notice that when $\hat{g} = 0$ in (10) then $e = \frac{1}{a} \frac{1 - \tau}{\tau}$ is the elasticity of taxable income that would rationalize current marginal tax rates as revenue-maximizing (i.e. they would be the optimal marginal tax rates when $\hat{g} = 0$).
We calculate the Pareto parameter as follows:
$$a = \frac{\bar{z}_1}{\bar{z}_1 - z_1} \qquad (11)$$
where $z_1$ is an arbitrary income level beyond which we wish to apply the Pareto distribution and $\bar{z}_1$ is average income above $z_1$. The advantage of this estimator is that it exploits data in the entire right tail of the income distribution. We choose $z_1$ equal to three times the average income. The Pareto parameter for 1971-2012 calculated according to (11) is plotted in figure 5a and ranges between 3 and 4. There is a tendency toward an increase during the 1970s and 80s and a decrease during the 90s and 00s. 35 We calculate $\tau$ using the effective top marginal tax rate (including VAT and social security contributions) that was in place in Sweden during our period of interest. The evolution of the top tax rate is shown in figure 5b.
34 In figure 11 in the supplementary material we plot the local Pareto parameter and can see that it is indeed quite stable in the upper part of the distribution.
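The Pareto parameter calculation described above can be sketched as follows, assuming the standard tail estimator a = z̄_1/(z̄_1 − z_1), where z̄_1 is the mean income above the threshold z_1 and z_1 is set to three times average income. The function name and the toy income vector are hypothetical and are not drawn from the register data.

```python
def pareto_parameter(incomes, threshold_multiple=3.0):
    """Estimate the Pareto parameter from the right tail of an income sample.

    The threshold z1 is `threshold_multiple` times mean income; the estimate is
    mean(income above z1) / (mean(income above z1) - z1).
    """
    if not incomes:
        raise ValueError("incomes must not be empty")
    mean_income = sum(incomes) / len(incomes)
    z1 = threshold_multiple * mean_income
    tail = [z for z in incomes if z > z1]
    if not tail:
        raise ValueError("no incomes above the threshold")
    tail_mean = sum(tail) / len(tail)
    return tail_mean / (tail_mean - z1)


# Toy sample (hypothetical): many modest incomes and two very high ones.
sample = [10.0] * 98 + [100.0, 200.0]
print(f"estimated Pareto parameter: {pareto_parameter(sample):.3f}")
```

Because the estimator uses the mean of all incomes above the threshold, a thinner tail (mean closer to the threshold) yields a larger parameter.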
The top tax rate peaked at 91 percent in 1982 and then came down as an effect of the tax reforms of the 1980s, culminating in the comprehensive tax reform of 1990-1991. The top tax rate then stayed relatively stable over the 1990s and 2000s, except for the introduction of an extra five percent tax in 1995 (the so-called "värnskatt"). Lastly, the phase-out of the earned income tax credit for high-income earners, which was implemented in 2016, raises the top marginal income tax rate by three percentage points, to 60 percent.
In the same diagram, we also plot the revenue-maximizing top rate, using equation (9) with $\hat{g} = 0$.
35 The shift in 1991 should be interpreted cautiously as the tax reform may have changed how incomes were reported.
We find, given our assumptions about elasticities, that the Laffer rate has varied between 55 and 65 percent since 1971, always substantially below the rates that were in place at the time. As noted above, if the tax rate exceeds the Laffer rate, this implies that the top social weight is negative, i.e. politicians seem to be willing to pay to decrease the income of high-income earners. The social weights on high incomes based on equation (10) are shown in figure 6a. The numbers range from −6 to −0.2, stabilizing just above −1 from the early 1990s and onwards.
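The top-income calculations can be sketched in a few lines. The Laffer rate 1/(1 + ae) and the rationalizing elasticity (1/a)(1 − τ)/τ follow the expressions given earlier in this section; the closed form used for the top weight, ĝ = 1 − ae·τ/(1 − τ), is the inversion of the top-rate formula and is taken here as an assumption about the form of equation (10). The function names and the Pareto parameter of 3.5 are illustrative only.

```python
def laffer_rate(a, e):
    """Revenue-maximizing top rate when the top social weight is zero."""
    return 1.0 / (1.0 + a * e)


def top_welfare_weight(tau, a, e):
    """Top social weight implied by an observed top rate (assumed form of eq. 10)."""
    if not 0.0 <= tau < 1.0:
        raise ValueError("tau must lie in [0, 1)")
    return 1.0 - a * e * tau / (1.0 - tau)


def implied_elasticity(tau, a):
    """Elasticity that would make the observed top rate revenue-maximizing."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    return (1.0 - tau) / (a * tau)


# Illustrative inputs: the 91 percent top rate of 1982, the baseline elasticity
# of 0.2, and a hypothetical Pareto parameter of 3.5.
tau, a, e = 0.91, 3.5, 0.2
print(f"Laffer rate:        {laffer_rate(a, e):.3f}")
print(f"top social weight:  {top_welfare_weight(tau, a, e):.2f}")
print(f"implied elasticity: {implied_elasticity(tau, a):.3f}")
```

By construction, the top weight is zero when the observed rate equals the Laffer rate, and it turns negative as soon as the observed rate exceeds it.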
Of course, the finding that the social welfare weight is negative at the top depends on our choice of intensive-margin elasticity of 0.2. As a final exercise we therefore calculate the elasticities of taxable income that would rationalize historical top marginal rates in Sweden, i.e. the elasticities that would be needed for the top social weight to be zero. As evident from figure 6b, the implied elasticities range between 0.03 and 0.16, which belong to the lower end of the estimates reported in the literature.
The optimal tax literature uses labor supply elasticities, the income distribution and a normative social welfare function to calculate the optimal income tax schedule, balancing the desire to redistribute income with the efficiency losses caused by taxation. This paper is part of a recently emerging inverse optimal tax literature, where the researcher uses behavioral elasticities, the income distribution and observed tax laws to back out policymakers' implied social welfare weights for each income level.
To the best of our knowledge, this is the first paper to offer a comprehensive joint analysis of the tax system, income distribution and measures of the distortionary costs of taxation in a Nordic country using the inverse optimal taxation methodology. Sweden represents an interesting case-study as it is a country with one of the highest tax-to-GDP ratios in the world and has a history of highly progressive income taxation with large variation in marginal tax rates both across income groups and over time.
Using administrative micro-data, we have described the Swedish tax system and the full range of the income distribution in detail, allowing us to analyze political preferences for redistribution in Sweden for more than four decades. By virtue of the Swedish government's long tradition of issuing government reports containing elasticities used by politicians to evaluate policy proposals, we have also been able to gain insights into the distortionary costs of taxation as perceived by politicians. These are the measures of behavioral responses to taxation that, as the literature has highlighted, should ideally be used in the inverse optimal tax exercise. Even though these measures do not deviate significantly from those reported in the empirical public finance literature, their availability strengthens the case that the social welfare weights that we have derived are informative about the actual preferences for redistribution among Swedish politicians.
A Heuristic derivation of the optimal participation tax rate
Moreover, the behavioral effect on tax revenue is $dB = (T(z) - T(0)) \frac{de}{dT} dT$. In an optimum, we must have $dM + dW + dB = 0$, thus we have
$$dT \, e - g \, dT \, e + (T(z) - T(0)) \frac{de}{dT} dT = 0.$$
Dividing through by $dT \, e$ we get
$$1 - g + (T(z) - T(0)) \frac{de}{dT} \frac{1}{e} = 0.$$

B The level of social assistance
In order to model the extensive margin and compare the social weight of working and non-working individuals, we need a measure of the minimum income guaranteed by the government, corresponding to the lump-sum component of a Mirrlees-type model. In Sweden, the social assistance departments of the municipalities are responsible for providing everyone who cannot provide for themselves with a minimum income. Between 1957 and 1981 this was called social aid ("socialhjälp"). There was no binding national norm, but many municipalities had a normal amount tied to the consumer price index. A government committee (SOU 1977:40, p. 317) reports that this amount seems to have been about 90 percent of the base amount per year during the first half of the 1970s. We assume that this remained the case until 1981.
In 1982 a new social assistance act (socialtjänstlag) came into force. Social aid was renamed social benefit ("socialbidrag") and the benefit level seems to have been raised. From this year it also became compulsory for municipalities to provide able-bodied adults with a minimum standard of living. Previously this had not been legally required. In 1985 a national norm for social benefit was introduced. We assume this to be the national minimum income. In 1997-1998 some types of expenditure, e.g. home equipment, were taken out of the norm and instead paid out on an individual basis. We assume that an amount corresponding to the decline in the national norm for those years was paid out in addition to the norm. In 2002 the social assistance act was reformed and social benefit was renamed income support ("försörjningsstöd").