Political preferences for redistribution in Sweden

We examine preferences for redistribution inherent in Swedish tax policy during 1971–2012 using the inverse optimal tax approach. The income distribution is carefully characterized with the help of administrative register data, and we employ behavioral elasticities reflecting the perceived distortionary effects of taxation. The revealed social welfare weights are high for non-workers, small for low-income earners, and hump-shaped around the median. At the top, they are always negative, especially so during the high-tax years of the 1970s and ’80s. The weights on non-workers increased sharply in the 1970s, fell drastically in the late ’80s and early ’90s, and have since then increased.


Introduction
By observing how tax policy has evolved over time, one can obtain insights into how the government values redistribution and the priorities policymakers have regarding the wellbeing of different income groups. The picture would, however, be incomplete. Inspection of tax schedules in isolation does not reveal the economic environment in which decisions about tax policy are made. If policymakers trade off the value of redistribution against the efficiency costs of income taxation, the observed tax policy choices of the government reflect not only policymakers' preferences for income redistribution but also the shape of the income distribution and behavioral responses associated with income taxation. The reason is that by imposing different marginal tax rates at different levels of income, the government can expose different individuals to different tax burdens, thereby achieving redistribution. However, individuals respond to income taxation through behavioral responses, such as by reducing their labor effort when being subject to an increased marginal tax rate. These behavioral responses reduce the size of the tax base and constrain the redistributive capacity of the government. Moreover, the shape of the income distribution plays a central role in the design of the income tax. For example, the total efficiency cost of an increase in the marginal tax rate at a given income level depends on the number of individuals who are reporting that level of income and are, at the margin, affected by the tax increase.
Recently, scholars have become increasingly interested in a positive application of optimal tax theory where observed tax policy is analyzed through the lens of an optimal income tax model, following Mirrlees (1971). These exercises, which have become known as 'inverse' optimal income tax analyses, allow researchers to retrieve the set of social welfare weights that would rationalize the observed (actual) structure of income taxation as optimal given empirical knowledge about behavioral responses to taxation and the income distribution. The approach allows us to assess whether preferences for redistribution (as stated by policymakers) are consistent with observed policy choices given available estimates of the magnitude of behavioral responses, the shape of the income distribution, and a specific theoretical framework (the optimal income tax model). It also allows for a systematic way to evaluate how preferences for redistribution have evolved over time, taking into account the changing shape of the income distribution, the evolution of tax policy, and behavioral responses to taxation. The inverse optimal tax analysis can be seen as an extension of the literature on inequality measurement, pioneered by Atkinson (1970), taking into account the cost (in terms of behavioral responses) of reducing income inequality.¹
In this paper, we use the inverse optimal taxation approach to analyze preferences for redistribution in Sweden, a country with one of the highest tax-to-GDP ratios in the world that has also been successful in combining large amounts of redistribution with high economic output. It is also a country with a history of highly progressive income taxation and large variation in marginal tax rates both across income groups and over time. While Sweden has been part of earlier comprehensive studies analyzing preferences for redistribution in EU countries (e.g., Bargain et al. 2014a, b; Spadaro et al. 2015), to the best of our knowledge, ours is the first study that focuses on a Nordic country, using administrative panel data spanning over four decades.
The contribution of our paper is to examine the evolution of political preferences for redistribution in Sweden during 1971-2012, providing a detailed description of the evolution of Swedish tax policy over the last 45 years, highlighting the effects of several major tax reforms. We characterize the income distribution non-parametrically using detailed administrative micro panel data, which enables us to describe its shape well, including at the top. We show that top incomes are approximately Pareto distributed and calculate the Pareto parameter (a measure of the thinness of the tail of the income distribution, see Atkinson et al. 2011 for a discussion). We use these measures to provide a separate analysis of how the social welfare weight on top earners has evolved during our sample period.
From a methodological viewpoint, we employ an optimal income tax framework featuring both extensive and intensive margins of labor supply. For the extensive margin, we use a decreasing profile of participation elasticities consistent with recent empirical evidence in Bastani et al. (2016). For the intensive margin, we use a uniform elasticity profile as our benchmark, but perform sensitivity analysis considering both a 'low elasticity' and 'high elasticity' scenario. In the case of top incomes, this sensitivity analysis captures the perhaps greater uncertainty regarding top earners' tax responsiveness, for instance due to tax reporting and migration responses.
In order to interpret our estimated social weights as politicians' implicit preferences for redistribution, we need to know policymakers' beliefs about behavioral elasticities. This is consistent with a recent paper by Lockwood and Weinzierl (2016) who stress the importance of using the perceived distortionary costs of taxation when backing out implied preferences for redistribution using an optimal tax model. We survey reports from the Swedish government indicating that our baseline intensive-margin elasticity of 0.2 may not be far from the beliefs of Swedish politicians, suggesting that our estimates may in fact be close to actual distributional preferences of Swedish policymakers.
Our most important results can be summarized as follows. In contrast to what is expected given standard assumptions in the optimal income tax literature, and previous results for the United States,² we do not find social weights to be monotonically declining with income. Instead, we document a hump-shaped pattern with the highest weights on middle-income workers. We also document some dramatic changes over time. The weights on welfare recipients increased sharply in the 1970s, then fell substantially in connection with the tax reforms in the late '80s and early '90s, and have since seen a steady increase. A general pattern throughout our analysis period is that the social weights on low-income wage-earners are relatively low, while the weights on non-workers are quite high.
Perhaps the most interesting results relate to the social welfare weights at the top. Using our benchmark elasticity, social weights for the top decile are always negative, substantially so during the high-tax years of the 1970s and '80s. However, from 1990 onwards, the tax rates at the top have approached the revenue-maximizing rates. In our low-elasticity scenario, social welfare weights at the top are positive during the latter part of our analysis period. The implied revenue-maximizing elasticities (i.e., those that would generate top social marginal weights of zero) range between 0.03 and 0.16.
The paper is organized as follows. In Section 2 below we provide a brief review of the related literature. In Section 3, we describe the optimal income tax model that is inverted for the purpose of deriving social welfare weights. Section 4 describes the empirical application, which includes a description of the evolution of the income distribution and the Swedish tax system between 1971 and 2012, as well as a meta-analysis of empirical estimates of behavioral responses to taxation. Section 5 presents our results: social welfare weights for different income groups and how they have evolved over time. Finally, Section 6 offers concluding remarks.

Related literature
The origin of the positive approach to optimal income taxation analyzed in this paper dates back to at least Weisbrod (1968), who recognized the importance of using distributional weights in cost-benefit analysis. The theory was further developed by Basu (1980). An early important contribution is Christiansen and Jansen (1978), who derived the social welfare weights inherent in the Norwegian structure of commodity taxation. More recently, beginning with Bourguignon and Spadaro (2012), there has been a surge of papers calculating preferences for redistribution based on observed tax/transfer systems using the inversion of optimal income tax models. For example, Bargain et al. (2014a, b) analyze the redistributive preferences of 17 EU countries as well as the United States by inverting the Saez (2002) optimal income tax model. One of the countries they analyze is Sweden. As compared to their work, we use richer administrative panel data, allowing us to provide a more detailed characterization of the income distribution, especially at the top, and importantly, our data allows us to track changes over time for more than four decades.³ Another interesting cross-country application of the inverse-taxation methodology is Spadaro et al. (2015), with a focus on the poverty-aversion of European policy-makers.⁴ Zoutman et al. (2014) analyze the redistributive preferences in the Netherlands by inverting the continuous optimal income tax model with both intensive and extensive margins developed by Jacquet et al. (2013). Relatedly, Zoutman et al. (2015) and Jacobs et al. (2017) analyze the political preferences for redistribution of Dutch political parties. Hendren (2014) develops a general approach to compare income distributions recognizing the cost of achieving a more equitable income distribution through modifications to the tax schedule.⁵
While the above-mentioned papers focus on analyzing redistributive preferences at a specific point in time, Bargain and Keane (2010) and Lockwood and Weinzierl (2016), like our study, track how preferences for redistribution have evolved over time. The paper most closely related to our study is Lockwood and Weinzierl (2016), who invert the Diamond (1998) continuous pure intensive-margin optimal income tax model to analyze the evolution of social preferences in the US between 1979 and 2010. In line with Zoutman et al. (2014, 2015) and Hendren (2014), we also allow for extensive margin (participation) decisions.
³ We also calculate Pareto parameters in order to conduct an asymptotic analysis of top income taxation. From a methodological point of view, Bargain et al. (2014a) employ a discrete rather than continuous optimal income tax model and structurally estimate labor supply elasticities on the same dataset that they use to calculate social welfare weights. In contrast, we employ elasticities from the literature that have been obtained from various sources and different empirical approaches. Nonetheless, it is interesting to note that the elasticities we have chosen to reflect the distortionary costs of taxation as perceived by policymakers are very similar to theirs. At the intensive margin, our benchmark elasticity is 0.2 while Bargain et al. obtain an elasticity of 0.17. At the extensive margin, the elasticities are 0.15 and 0.14, respectively.
⁴ Spadaro et al. (2015) is a study methodologically similar to Bargain et al. (2014a) in the sense that they use a discrete optimal income tax framework in the spirit of Saez (2002). In contrast to Bargain et al., but like our paper, they use exogenous elasticities from the literature.
⁵ While observed tax policy might not be the outcome of a perfect optimization by the government, the key assumption in the inverse optimal tax analysis is that current tax structures have been derived, at least in part, from a policymaker's concern that the redistributive benefits of taxation should be weighed against the efficiency costs of taxation. To the extent that the observed tax schedule is non-optimal, Hendren (2014) demonstrates that the analysis can be seen as the exercise of quantifying the marginal cost of taxation at different points of the income distribution, taking into account the behavioral responses to taxation (as captured by intensive and extensive margins of labor supply) as well as the local shape of the income distribution. The recovered weights can then be used to test the local optimality of the tax system in the spirit of the tax reform literature (see, e.g., Kleven and Kreiner 2006; Saez and Stantcheva 2016 for recent applications). See also Lorenz and Sachs (2016) for a related analysis with an application to Germany. These authors show that a given tax/transfer system is likely to be inefficient if effective marginal tax rates quickly fall with income.

Theoretical framework
The starting point of this paper is the Mirrlees (1971) framework of optimal income taxation, further developed and elucidated in Atkinson and Stiglitz (1980). Individuals differ in their income earnings capacities (or skills) and make labor effort choices based on their preferences and the link between pre-tax and post-tax income implied by the tax schedule. In the language of mechanism design, the optimal income tax is defined as the structure of income taxation that maximizes a social objective function subject to a set of incentive-compatibility constraints and the government's budget constraint. The incentive-compatibility constraints derive from the government's inability to impose individual-specific taxation, which implies that individuals are free to choose any point on the income tax schedule.⁶ The efficiency cost of taxation arises from the fact that raising the disposable income of low-income persons through a reduction in their tax liability (with the purpose of achieving redistribution) results in a tightening of the incentive constraint, as high-income persons now have an incentive to reduce their income to obtain a more lenient tax treatment.
Saez (2001) brought the literature on optimal income taxation closer to empirical research by reformulating the original Mirrlees model in terms of empirically quantifiable components. Notably, behavioral responses were expressed in terms of empirically estimable taxable income elasticities and the differences in earnings capacities (or skills) were directly mapped to the empirical income distribution. The social objective function was specified as a weighted sum of the utilities of the individuals in the economy. These weights are commonly referred to as 'social welfare weights' in the public finance literature.
To conceptually approach the problem of redistributive taxation, we find it useful to consider a population of agents indexed by a multidimensional set Θ. Following Saez and Stantcheva (2016), we define for each θ ∈ Θ a generalized social marginal welfare weight g(θ) which measures the value society places on a marginal increase in consumption for an individual with characteristics θ. The weights g(θ), θ ∈ Θ, embody society's judgment of fairness and may vary depending on both income and non-income characteristics including unobservable personal characteristics and circumstances. The social welfare weights can be used to assess the optimality of the tax system. The welfare gain of any policy reform can be computed by calculating the money metric utility changes for each individual and aggregating them using the weights g(θ).
Since we are interested in using taxes based on income to recover social welfare weights, we need to aggregate g(θ) over subsets of Θ that are associated with individuals having the same earned income z. Thus we will consider the average social welfare weight at the level of income z, which we denote by g(z).
For example, suppose an agent with characteristics θ ∈ Θ maximizes the utility function
$$u(z;\theta) = (1-\tau)\,z + y - \frac{z_0(\theta)}{1+1/k}\left(\frac{z}{z_0(\theta)}\right)^{1+1/k},$$
where τ is the marginal tax rate, y is non-labor income, k > 0 is a parameter measuring the intensity of the disutility of work and z₀(θ) is an individual-specific parameter. Then, the optimal income choice of a θ-type individual will be z = z₀(θ)(1 − τ)^k, where z₀(θ) has the interpretation as the potential income of an individual with characteristics θ, which is equal to the income chosen by a θ-type individual in the absence of taxation (τ = 0). In this simple framework, g(z) will be the average social welfare weight across the set of individuals who report an income level of exactly z when the income tax rate is τ, namely, the set of agents {θ ∈ Θ : z = z₀(θ)(1 − τ)^k}. The social welfare weights g(z) that we derive in this paper will be attached to specific income levels z while we remain agnostic about how the income z has been generated (e.g., whether income is the outcome of luck or effort).
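The choice rule in this example can be checked numerically. The sketch below assumes a quasi-linear, iso-elastic utility, u = (1 − τ)z + y − z₀/(1 + 1/k)·(z/z₀)^(1 + 1/k), which is one functional form consistent with the stated optimum z = z₀(1 − τ)^k. The function names and the parameter values (z₀ = 100, τ = 0.5, k = 0.2) are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def utility(z, z0, tau, k, y=0.0):
    """Assumed quasi-linear, iso-elastic utility over reported income z."""
    consumption = (1.0 - tau) * z + y
    effort_cost = z0 / (1.0 + 1.0 / k) * (z / z0) ** (1.0 + 1.0 / k)
    return consumption - effort_cost


def closed_form_income(z0, tau, k):
    """Income choice stated in the text: z = z0 * (1 - tau)^k."""
    return z0 * (1.0 - tau) ** k


def grid_argmax_income(z0, tau, k, n=200_001):
    """Brute-force maximizer of the assumed utility on a fine income grid."""
    grid = np.linspace(1e-9, 2.0 * z0, n)
    return grid[np.argmax(utility(grid, z0, tau, k))]


# Hypothetical parameter values, chosen only for illustration.
z_star = closed_form_income(z0=100.0, tau=0.5, k=0.2)
z_grid = grid_argmax_income(z0=100.0, tau=0.5, k=0.2)
print(f"closed form: {z_star:.3f}, grid search: {z_grid:.3f}")
```

Under this assumed utility, k is also the elasticity of income with respect to the net-of-tax rate 1 − τ, which is why a single parameter pins down the behavioral response.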
In accordance with Hendren (2014) and Zoutman et al. (2014), we derive social welfare weights in a general optimal tax framework featuring both intensive and extensive margin labor supply decisions. As suggested by Saez (2001), and shown formally by Jacquet and Lehmann (2015), optimal income tax formulas can be extended to the case of multidimensional heterogeneity.⁷ However, as discussed at length by Hendren (2014), in the presence of multidimensional heterogeneity (heterogeneity conditional on income), the nonlinear income tax schedule cannot be inverted using the established approach. In line with earlier literature, we therefore abstract from this issue. Our approach is to focus on a representative group of the population (that could be considered a specific subset of Θ) and then analyze how social welfare weights vary across different income levels for individuals belonging to this group.

Intensive margin
Let us now briefly discuss the determination of optimal income taxes. The basic principle behind optimal income taxation is to weigh the costs of taxation (measured in terms of behavioral responses) against the benefits of taxation (measured by the social welfare weights). A tax system is deemed optimal when there exists no perturbation to the tax schedule that would increase aggregate social welfare.
This optimality condition allows one to derive the representation of the Mirrlees optimal marginal tax rate formula presented in terms of income by Saez (2001):
$$T'(z) = \frac{1 - G(z)}{1 - G(z) + \alpha(z)\,\varepsilon(z)}, \tag{1}$$
which here is slightly re-expressed in the convenient form presented in Saez and Stantcheva (2015) for the case without income effects in the earned income decision.⁸ In the above equation, ε(z) is the average elasticity of income z with respect to 1 − T′(z) for individuals earning z, α(z) is the local Pareto parameter defined as α(z) = zh(z)/(1 − H(z)), where h(z) is the PDF of the income distribution and H(z) is the CDF of the income distribution, and
$$G(z) = \frac{\int_z^{\infty} g(s)\,h(s)\,ds}{1 - H(z)}$$
is the average social weight of people earning more than z.⁹
We make a few observations before we turn to the inversion of this formula. The formula embeds the key trade-off between equity and efficiency. First of all, we see that optimal marginal tax rates are decreasing in G(z). The higher is the social weight on the consumption of individuals earning more than z, the lower should T′(z) be, since the size of T′(z) affects the total tax paid by all individuals earning more than z. Second, optimal marginal tax rates are decreasing in the product α(z)ε(z). This product measures the efficiency cost of taxation manifested in the revenue gains or losses induced by the behavioral response to the tax. More specifically, the factor ε(z) measures the extent to which individuals reduce their income in response to a marginal tax increase locally at z and α(z) captures the size of the tax base (loosely speaking, the number of individuals with income z) which measures how costly such a behavioral response will be in terms of tax revenue.
⁷ At least when there are no general equilibrium effects on wages; see Rothschild and Scheuer (2014).
⁸ To simplify the analysis, we abstract from income effects. Income effects on labor supply are generally considered to be small. McClelland and Mok (2012) survey income elasticities in the range 0-0.1. Moreover, Zoutman et al. (2015) employ an income elasticity of 0.1, but find that this has only a small impact on the results.
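The two comparative statics above can be made concrete with a small sketch. It assumes the Saez and Stantcheva form of the optimal rate, T′(z) = (1 − G(z))/(1 − G(z) + α(z)ε(z)), and uses an exact Pareto distribution as a hypothetical income distribution. All function names and numerical values are illustrative assumptions.

```python
def local_pareto_parameter(z, pdf, cdf):
    """alpha(z) = z * h(z) / (1 - H(z)), as defined in the text."""
    return z * pdf(z) / (1.0 - cdf(z))


def optimal_marginal_rate(G, alpha, eps):
    """Assumed form of the optimal rate: (1 - G) / (1 - G + alpha * eps)."""
    return (1.0 - G) / (1.0 - G + alpha * eps)


# Hypothetical Pareto distribution with shape a = 2 and minimum income 1.
a, z_min = 2.0, 1.0
pareto_pdf = lambda z: a * z_min**a / z ** (a + 1.0)
pareto_cdf = lambda z: 1.0 - (z_min / z) ** a

alpha = local_pareto_parameter(5.0, pareto_pdf, pareto_cdf)
for G in (0.0, 0.5):
    for eps in (0.1, 0.2, 0.4):
        rate = optimal_marginal_rate(G, alpha, eps)
        print(f"G={G:.1f} eps={eps:.1f} alpha={alpha:.1f} -> T'={rate:.3f}")
```

For an exact Pareto distribution the local Pareto parameter equals the shape parameter at every income level, and the printed rates fall both when G rises and when the product of α and ε rises.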
Equation 1 can be inverted to obtain an expression for the social welfare weight g(z):
$$g(z) = -\frac{1}{h(z)}\frac{d}{dz}\Big[\big(1-H(z)\big)\,G(z)\Big] = 1 + \frac{1}{h(z)}\frac{d}{dz}\left[\frac{T'(z)}{1-T'(z)}\,\varepsilon(z)\,z\,h(z)\right]. \tag{2}$$
Assuming T(z) is piece-wise linear (which is indeed the case in our empirical application) and that ε(z) is constant on a given segment of the tax schedule, we can compute the derivative in Eq. 2. Thus a simplified version of Eq. 2, valid in the interior of any segment of the tax schedule, is:
$$g(z) = 1 - \frac{T'(z)}{1-T'(z)}\,\varepsilon(z)\,\big(\alpha(z)-\rho(z)\big), \tag{3}$$
where ρ(z) = zα′(z)/α(z) is the elasticity of the local Pareto parameter α(z). To understand the impact of the income distribution on the profile of social welfare weights, it is convenient to recognize that α(z) − ρ(z) = −(1 + zh′(z)/h(z)), a quantity that Hendren (2014) refers to as the local elasticity of the income distribution.¹⁰
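A minimal sketch of the segment-wise inversion follows. It uses the identity α(z) − ρ(z) = −(1 + zh′(z)/h(z)) stated above, so the weight on a linear segment is computed as g(z) = 1 + T′/(1 − T′)·ε·(1 + zh′(z)/h(z)), with zh′(z)/h(z) approximated by a finite difference in logs. The function name, the Pareto density and the parameter values are assumptions made for illustration.

```python
import math


def welfare_weight_on_segment(z, mtr, eps, pdf, rel_step=1e-4):
    """Social welfare weight in the interior of a linear tax segment.

    Uses g(z) = 1 + mtr / (1 - mtr) * eps * (1 + z h'(z) / h(z)), where
    z h'(z) / h(z) = d ln h / d ln z is taken by a central log difference.
    """
    z_hi, z_lo = z * (1.0 + rel_step), z * (1.0 - rel_step)
    dlnh_dlnz = (math.log(pdf(z_hi)) - math.log(pdf(z_lo))) / (
        math.log(z_hi) - math.log(z_lo)
    )
    return 1.0 + mtr / (1.0 - mtr) * eps * (1.0 + dlnh_dlnz)


# Hypothetical Pareto density with shape a = 2, so alpha = 2 and rho = 0.
pareto_density = lambda z: 2.0 / z**3

g = welfare_weight_on_segment(z=10.0, mtr=0.5, eps=0.2, pdf=pareto_density)
print(f"g(z) = {g:.3f}")
```

In the Pareto case the expression collapses to g = 1 − a·ε·T′/(1 − T′), which is the usual top-income inversion. In an empirical application the density would come from an estimated income distribution rather than a closed form.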

Intensive and extensive margin
We now add an extensive margin to the model by allowing workers to decide whether or not to enter the labor force. Conceptually, this decision is based on a comparison between a fixed cost of work and the financial reward from working, where the latter is affected by the tax and transfer system. The fixed cost of working can be interpreted broadly to accommodate utility costs (e.g., stemming from foregone leisure or searching for a job) or monetary costs (such as commuting or child care costs).¹¹
⁹ To simplify notation, we write each variable x̄ as x, recognizing that x denotes an average across subsets of Θ.
¹⁰ An equally convenient notation is introduced by Zoutman et al. (2015), in which the corresponding quantity is interpreted as the elasticity of the earnings distribution.
¹¹ Technically, the pure intensive-margin model includes an extensive margin since it allows individuals to choose between z = 0 and z > 0. However, a standard practice in the public finance literature is to employ fixed costs of work to better explain the labor supply behavior at low levels of income. See Hausman (1980) and Cogan (1981).
In accordance with Zoutman et al. (2015), the formula for the social welfare weights with both intensive and extensive margins takes the following form:
$$g(z) = 1 + \frac{1}{h(z)}\frac{d}{dz}\left[\frac{T'(z)}{1-T'(z)}\,\varepsilon(z)\,z\,h(z)\right] - \varepsilon_P(z)\,\frac{T(z)+B_0}{z-T(z)-B_0}. \tag{4}$$
The extensive margin adds an extra term to the intensive-margin formula (2) equal to the product of the participation elasticity
$$\varepsilon_P(z) = \frac{\partial e_z}{\partial\big(z-T(z)-B_0\big)}\,\frac{z-T(z)-B_0}{e_z}$$
and the participation tax rate, expressed relative to the net gain from working, (T(z) + B₀)/(z − T(z) − B₀), where B₀ is the income provided to an individual with z = 0 (for example, through the social assistance system) and e_z is the employment rate among individuals with potential income z (i.e., who would have an income of z if they would choose to enter the labor force).¹² To gain some intuition behind the extensive-margin part of the formula, notice that in the absence of intensive-margin responses, i.e., ε(z) = 0, formula (4) is equivalent to:
$$\frac{T(z)+B_0}{z-T(z)-B_0} = \frac{1-g(z)}{\varepsilon_P(z)}, \tag{5}$$
which specifies the optimal participation tax rate at each income level z and has a form that is familiar from earlier work (e.g., Saez 2002).¹³ The key intuition behind this equation is that it trades off the mechanical increases in tax revenue resulting from increases in taxes on the working population against the behavioral losses in tax revenue resulting from participation responses. As explained in Jacquet et al. (2013), in the presence of intensive-margin responses [ε(z) ≠ 0], optimal participation taxes will deviate from formula (5) since one needs to take into account the distortions induced by the tax schedule on intensive-margin responses.¹⁴ In the presence of intensive-margin responses, Eq. 5 instead defines a "target" for the participation tax. In this case, using Zoutman et al. (2015), appendix, equation 34 (or equation 18d in Jacquet et al. 2013), an average of Eq. 5 should hold in the optimum, i.e.
$$\int_0^{\infty}\left[1-g(z)-\varepsilon_P(z)\,\frac{T(z)+B_0}{z-T(z)-B_0}\right]h(z)\,dz = 0. \tag{6}$$
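The pure extensive-margin case can be illustrated with a short sketch. It assumes the Saez (2002)-style condition in which the tax on participation, T(z) + B₀, is measured relative to the net gain from working, z − T(z) − B₀, so that with ε(z) = 0 the weight is g(z) = 1 − ε_P(z)·(T(z) + B₀)/(z − T(z) − B₀). The function names and the SEK amounts are hypothetical. The participation elasticity of 0.15 is the extensive-margin value mentioned earlier in the paper.

```python
def participation_tax_ratio(z, tax, benefit):
    """(T(z) + B0) / (z - T(z) - B0): tax on working relative to net gain."""
    net_gain = z - tax - benefit
    if net_gain <= 0.0:
        raise ValueError("working must pay more than not working")
    return (tax + benefit) / net_gain


def weight_extensive_only(z, tax, benefit, participation_elasticity):
    """g(z) = 1 - eps_P * ratio, valid when intensive responses are absent."""
    return 1.0 - participation_elasticity * participation_tax_ratio(z, tax, benefit)


# Hypothetical worker: earns 300,000 SEK, pays 90,000 SEK in tax, and
# would receive 60,000 SEK in benefits when not working.
g = weight_extensive_only(
    z=300_000.0, tax=90_000.0, benefit=60_000.0, participation_elasticity=0.15
)
print(f"g(z) = {g:.2f}")
```

A higher benefit for non-workers or a higher tax at z raises the ratio and therefore lowers the weight that rationalizes the schedule at that income level.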

Weight on non-workers
Equation 4 allows us to compute social welfare weights for working individuals. To derive an expression for the social welfare weight on non-workers, we proceed as follows. Denote by E the fraction of the population that is working. The possibility for the government to levy a uniform lump-sum transfer that is received by everyone in the population (workers and non-workers) requires the following optimality condition to be satisfied:
$$E\int_0^{\infty} g(z)\,h(z)\,dz + (1-E)\,g_0 = 1. \tag{7}$$
The RHS of Eq. 7 measures, since the population size is normalized to 1, the marginal cost of simultaneously decreasing T(z) by 1 SEK and increasing B₀ by 1 SEK (raising the disposable income of all workers by 1 SEK). The LHS of Eq. 7 measures the marginal social benefit of such a transfer, where we have denoted the social welfare weight on non-workers by g₀. Notice that by construction, such a reform does not change the financial reward from working for anyone in the population. Moreover, in the absence of income effects on the decision to supply taxable income, the intensive-margin choices of individuals are unaffected as well. Re-arranging (7) we can get an expression for the social weight placed on the total population of non-workers:
$$g_0 = \frac{1 - E\int_0^{\infty} g(z)\,h(z)\,dz}{1-E}. \tag{8}$$
¹² The formula is presented by Zoutman et al. (2015) for the case of uni-dimensional heterogeneity. An equivalent formula is presented by Hendren (2014) in his setting with multidimensional taxpayer types.
¹³ See the Online Appendix for a heuristic derivation using a perturbation argument.
¹⁴ Notice that assuming away the intensive margin is equivalent to assuming that the government can observe the potential income of all workers, which implies that the only possible adjustment in response to income taxation is to leave the labor force. If the potential income of each worker on the other hand is unobservable to the government, individuals can adjust their income in response to taxation and it becomes impossible to implement specific participation tax rates at each level of potential income.
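The re-arrangement can be sketched as follows, assuming the optimality condition takes the form E·ḡ + (1 − E)·g₀ = 1, where ḡ is the density-weighted average weight among workers and the population is normalized to 1. The function name and the numbers are hypothetical and serve only to show the mechanics.

```python
def nonworker_weight(employment_rate, mean_worker_weight):
    """Solve E * g_bar + (1 - E) * g0 = 1 for the non-worker weight g0."""
    if not 0.0 < employment_rate < 1.0:
        raise ValueError("employment rate must lie strictly between 0 and 1")
    return (1.0 - employment_rate * mean_worker_weight) / (1.0 - employment_rate)


# Hypothetical inputs: 80% of the group works and the average weight
# on workers is 0.9.
g0 = nonworker_weight(employment_rate=0.8, mean_worker_weight=0.9)
print(f"g0 = {g0:.2f}")
```

Because the weights average to one across the whole population, an average worker weight below one mechanically implies a non-worker weight above one, and the effect is stronger the smaller the non-working share.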

Redistribution through other channels
In this paper, we derive preferences for redistribution by analyzing the evolution of the income tax through the lens of an optimal income tax model. It should be acknowledged that the government performs redistribution through additional channels as well, such as through the choice of commodity tax structure and through the expenditure side of the government budget. We have abstracted from many of these alternative channels for redistribution. This can be motivated partly by the fact that it makes sense to focus on the income tax since it is the primary vehicle of redistribution in modern economies and partly because of the complexity involved in assessing the value that different income groups assign to different components of the government budget. Our approach is equivalent to assuming that public expenditures benefit individuals equally across the income distribution. This may be reasonable to assume in the Swedish case since Sweden has a history of providing uniform rather than means-tested benefits (perhaps for historical reasons) in order to ensure sufficient political support for the welfare state. For example, child care, health care and education are universally provided and it is uncommon for families to switch to private alternatives. This is in contrast to the US, which relies to a larger extent on means-tested in-kind transfers.
Notable examples in the US that are highly redistributive in nature are the government subsidies directed toward medical care, food consumption/nutritional assistance, housing, and early childhood education.¹⁵ Our simplified approach to deal with public expenditures boils down to assuming that a given percentage of the public budget is dedicated to the financing of public goods (such as defense, public administration and infrastructure) and that the value of these goods enters additively in the utility function of individual agents in the economy. This implies that the structure of social welfare weights is unaffected by the level of public good provision.

Income distribution data
We interpret the theoretical model as describing the optimal taxation of labor income. Therefore, we use concepts in the register data that are as close as possible to labor income. Our income data source is the LINDA database provided by Statistics Sweden, which contains annual income data from administrative tax registers for a random sample of 3.35% of the Swedish population (around 300,000 individuals). We characterize the income distribution non-parametrically using a kernel density estimator with an adaptive bandwidth. The shape of the income distribution for eight representative years is shown in Fig. 1.
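The text does not specify which adaptive-bandwidth estimator is used, so the following is only a sketch of one common variant: a Gaussian kernel with Abramson-style local bandwidths, where a fixed-bandwidth pilot estimate widens the kernel in sparse regions such as the upper tail. The function names, the normal-reference pilot bandwidth and the synthetic log-normal sample are assumptions. No LINDA data are used.

```python
import numpy as np


def fixed_kde(x_eval, data, bandwidth):
    """Gaussian kernel density estimate with a single global bandwidth."""
    u = (x_eval[:, None] - data[None, :]) / bandwidth
    return np.exp(-0.5 * u**2).sum(axis=1) / (
        len(data) * bandwidth * np.sqrt(2.0 * np.pi)
    )


def adaptive_kde(x_eval, data, sensitivity=0.5):
    """Gaussian KDE with observation-specific (Abramson-style) bandwidths."""
    n = len(data)
    pilot_bw = 1.06 * data.std(ddof=1) * n ** (-0.2)  # normal-reference rule
    pilot = fixed_kde(data, data, pilot_bw)
    geometric_mean = np.exp(np.mean(np.log(pilot)))
    # Wider kernels where the pilot density is low, narrower where it is high.
    local_bw = pilot_bw * (pilot / geometric_mean) ** (-sensitivity)
    u = (x_eval[:, None] - data[None, :]) / local_bw[None, :]
    kernels = np.exp(-0.5 * u**2) / (local_bw[None, :] * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)


# Synthetic right-skewed "incomes" (hypothetical, for illustration only).
rng = np.random.default_rng(0)
incomes = rng.lognormal(mean=0.0, sigma=0.5, size=1000)
grid = np.linspace(-5.0, 25.0, 3000)
density = adaptive_kde(grid, incomes)
```

The estimated density and its cumulative sum are the inputs needed for the local Pareto parameter zh(z)/(1 − H(z)). A Gaussian kernel places some mass below zero, so an application to incomes might prefer to estimate the density of log income instead.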
Our analysis starts in 1971, the year that Sweden switched from joint to individual taxation for married couples. Thus, the tax unit in our analysis is the individual. Nonetheless, some elements of the transfer system are determined as a function of the income of both spouses. To simplify our calculations, and to be in line with earlier literature that focuses on countries with individual taxation, we focus on single men and women without children. The fact that we focus on childless individuals simplifies the analysis considerably since we do not need to take into account child allowances and various transfer programs which are only available to individuals with children (such as housing allowance). Moreover, to properly apply the inverse optimal tax approach to an economy with couples would require us to invert a family model of optimal income taxation. This is an interesting but formidable task that has not yet been done in the literature and we therefore leave it to future research.¹⁶
Our theoretical model concerns the redistribution of pre-tax labor income, including self-employment income. Hence, we exclude capital income and transfers, such as pensions and unemployment benefits. We assume that individuals who receive labor income do not simultaneously receive unemployment benefits, student aid or pension payments. Therefore, we restrict our sample to individuals aged 25 to 59. Since data availability and definitions change over the 42-year period that we are considering, a completely consistent single income measure for all years is not available. For the period 1971-1992, we study taxable income while taxable labor income is used for the years 1993-2012.¹⁷

Effective marginal tax rates 1971-2012
We now provide a brief overview of the structure and evolution of the Swedish tax system.
The basic structure of the Swedish tax system is simple. A proportional municipal income tax is levied on incomes exceeding a standard deduction (which varies with income and over time). On top of this, a progressive central government income tax is paid on all income exceeding a threshold. The central income tax targets high-income earners and therefore does not generate as much tax revenue as the municipal income tax. The two other major sources of tax revenue are value-added taxes and social contributions (payroll taxes).
The period of our analysis contains substantial variation in marginal tax rates reflecting, for example, the changes to the tax system that occurred with the great expansion of the welfare state in the 1970s, and the overhaul of the tax system following the big crisis in the beginning of the 1990s. Between 1972 and 1982, the rate of social contributions applicable to top incomes rose substantially, from 2 to 33%. During this period, the income tax also became more progressive. In 1982, the effective top marginal tax rate reached a striking 91%.
The high marginal tax rates caused growing concern about the distortionary effects of taxation. This resulted in a series of tax reforms during the 1980s, culminating in the "Tax Reform of the Century" of 1990-1991. The central government income tax was subsequently greatly simplified and reduced to a single tax bracket of 20% applying to high incomes. With a municipal tax rate of about 30%, the idea was to ensure that no one faced a marginal tax rate higher than 50% (excluding VAT and social contributions).
16 ... when there is individual taxation as compared to when there is joint taxation. In countries with joint taxation (such as the United States), it is possible to focus on couples filing jointly. In such a case, however, the inversion approach only seems feasible when assuming a unitary model of family decision-making, thereby ignoring potentially important sources of intra-household inequality that previous research has shown can have considerable consequences for the optimal tax problem (see, for example, Bastani 2013).
17 The earned income tax credit (EITC) was introduced in 2007 and targets precisely the income category that we are interested in. Because the EITC in a sense supersedes the standard deduction, it also determines the marginal tax rate. Therefore we use EITC-eligible income ("underlag för jobbskatteavdrag") for the years 2007-2012. "Primary income" is a closely related income concept which we use for 1993-2006. Since no clear-cut labor income variable is available in LINDA before 1993, we have chosen to use taxable earned income ("[kommunalt] taxerad förvärvsinkomst"). This variable is a bit problematic due to the fact that many social security transfers are taxable in Sweden. For example, it implies that there is a greater mass at low-to-medium income levels generated by these taxable transfers. However, this issue is somewhat mitigated by the fact that the number of individuals receiving taxable transfers from the government was lower in the 1970s and 80s than after the crisis in the early 1990s. In the Supplementary Material, Fig. 12
In the 1990s, Sweden experienced a severe economic crisis that prompted various austerity measures. Notably, the standard deduction was decreased and a second bracket of the central government tax at a level of 25% was introduced. Some of these austerity measures were then reversed in the early 2000s.
The most important tax reform of the last two decades was initiated in 2007, following the election of a new center-right government, when an earned income tax credit (EITC) was implemented. The EITC was expanded in 2008, 2009, 2010 and 2014, causing a substantial reduction in average tax rates for all labor income earners. A detailed graphical representation of how the different components of the tax system have evolved over time can be found in Fig. 8 in the Online Appendix.
We compute the effective marginal rate of income tax, $\tau$, as follows:
$$\tau = 1 - \frac{(1-\tau_l)(1-\tau_c)}{1+\tau_s},$$
where $\tau_l$ is the marginal tax on labor, $\tau_c$ is the weighted tax on consumption (quoted tax-inclusive) and $\tau_s$ denotes social security contributions (quoted tax-exclusive). 18 VAT is ultimately paid by consumers and affects the tax wedge on labor and is therefore also relevant to our analysis. Our main source for the tax law of the 1970s and 80s is Söderberg (1996). For the 1990s and 2000s, we have mostly relied on primary sources.
Following the Mirrleesian approach to the treatment of nonlinear income taxes, we view government transfers as negative taxes and analyze an integrated tax-benefit system. We equate the lump-sum component of the nonlinear income tax with the social assistance system (a last-resort income support program provided by municipalities). Social assistance depends on household income as well as assets, and varies by municipality. 19 Had we included other benefit systems, such as unemployment insurance, the social weights would have been higher. We chose to focus on social assistance, as it is the most salient redistributive program and permits a consistent comparison across years. 20 Finally, it should be noted that in our analysis, we only include the value-added tax while excluding specific taxes on energy, carbon dioxide, alcohol and tobacco, as these taxes can be considered to be levied to correct for externalities and not primarily employed to achieve redistribution. 21
18 Social security contributions (SSC) are paid by employers to finance social insurance benefits such as sick pay, unemployment benefits and pensions. Because the benefits received are a function of income, social security contributions can be considered part taxes and part insurance fees. To exactly separate the tax from the benefit component of the SSC is a laborious economic exercise that is sensitive to assumptions. Here we follow Flood et al. (2013) in assuming that 60% of social security contributions constitute fees and the remainder taxes. All social security benefits are capped, which means that after a certain point, 100% of the SSC constitute taxes. The cap varies by benefit and over time, but we assume it to be 7.5 price base amounts, corresponding to 340,000 SEK in 2015, as this has been the cap for sick pay since 1990. Social contributions were initially levied at different rates at different income levels, with the highest marginal rate applying to middle-income earners because the connection with benefits was the largest in that segment. Gradually the connection with benefits weakened and from 1982 social contributions have been levied at the same rate for all taxpayers.
19 The calculation of social assistance is described in greater detail in Section A.2 of the Online Appendix.
20 The presence of the social assistance system generates a 100% effective marginal tax rate at the bottom of the income distribution. This implies that there is a region at the bottom where it is suboptimal for any individual to locate (with strictly convex preferences). Correspondingly, our formula for the social welfare weight (which is based on a principle of individual optimization) is undefined in this region.
21 We obtain effective average VAT rates from Du Rietz et al. (2013) and assume that all agents face this consumption tax rate, regardless of income.
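To make the composition of the effective marginal tax rate concrete, the following Python sketch combines the three components. It assumes that they compose multiplicatively, which is what a tax-inclusive consumption tax and a tax-exclusive social contribution rate would imply; this functional form, the function names and the numbers in the example are illustrative assumptions rather than the authors' code or data. The treatment of social contributions follows the description in the text: 40% is counted as tax below the benefit cap and 100% above it.

```python
def ssc_tax_share(income, cap, tax_share_below_cap=0.4):
    """Share of social security contributions counted as pure tax.

    Below the benefit cap, 60% of contributions are treated as insurance
    fees, so 40% is tax; above the cap all contributions are tax.
    """
    return 1.0 if income > cap else tax_share_below_cap


def effective_marginal_tax(tau_l, tau_c, tau_s):
    """Effective marginal tax rate on labor income (assumed functional form).

    tau_l: marginal labor income tax rate
    tau_c: consumption tax rate, quoted tax-inclusive
    tau_s: tax component of social contributions, quoted tax-exclusive
    """
    return 1.0 - (1.0 - tau_l) * (1.0 - tau_c) / (1.0 + tau_s)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the mechanics.
    statutory_ssc = 0.30
    cap = 340_000
    for income in (200_000, 500_000):
        tau_s = statutory_ssc * ssc_tax_share(income, cap)
        rate = effective_marginal_tax(tau_l=0.5, tau_c=0.2, tau_s=tau_s)
        print(income, round(rate, 3))
```

Because only the tax component of social contributions enters `tau_s`, the effective rate in this sketch jumps at the cap even when the statutory rates are unchanged.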
Our analysis focuses primarily on eight representative years: 1971, 1977, 1983, 1989, 1991, 2000, 2006 and 2012. Figure 2 shows the effective marginal tax schedule for these years. In Fig. 9 in the Online Appendix we show similar graphs for all years during our sample period.

Elasticities
We now describe the intensive-margin and extensive-margin elasticities that we use to capture behavioral responses to the tax system. Our model is flexible and allows these to vary across income levels and over time.
Two approaches are possible. The first is to use reasonable estimates from the research literature. Our results will then have a fiscal externalities interpretation, in line with Hendren (2014), and will describe the opportunity cost of taxing different income groups. A second possible approach is to use the actual elasticities employed by policymakers when they evaluate proposals for policy reforms. If two politicians disagree on the desirability of a tax reform, they could in principle disagree either about labor supply elasticities or about social weights. Assuming that the elasticities stated by politicians correctly reflect their beliefs about the efficiency costs of taxation, the elicited social welfare weights will be informative about their preferences for redistribution. This requires that policymakers truthfully reveal their beliefs, and excludes the possibility that, e.g., right-of-center politicians exaggerate behavioral responses to motivate tax cuts, when the real reason is that they have a higher subjective social welfare weight for the group concerned. Sweden is well suited for our case study, as the country has a long tradition of commissioning independent reports which the government then uses as a basis for policymaking. These reports can give us an indication of politicians' beliefs about labor supply elasticities. Although the two approaches are conceptually different, the elasticities appear to be similar for the case of Sweden, as we shall see.
We begin by surveying the research literature. We also need to decide whether to use elasticities estimated on Swedish data or estimates from the international literature, recognizing that there is a potential trade-off between relevance to the Swedish context (external validity) on the one hand, and credible identification (internal validity) on the other. Saez et al. (2012) provide a recent survey of the international literature on intensive margin responses (elasticities of taxable income with respect to the net-of-tax rate) and conclude that most of the estimates in the literature range from 0.12 to 0.4, with 0.25 as a reasonable midpoint. In another recent survey, Chetty (2012) finds that, in the presence of optimization frictions, an elasticity of 0.33 is the value that best rationalizes previous studies. Based on this evidence, Hendren (2014), who analyzes social welfare weights in the United States, chooses 0.3 as the elasticity for all taxpayers except those subject to the phase-out of the EITC, who are assigned a lower elasticity. Zoutman et al. (2015) use an uncompensated elasticity of 0.25 based on Dutch studies.

Intensive margin
In terms of Swedish evidence, Sørensen (2010) provides a brief survey of recent Swedish estimates of taxable income elasticities and concludes that 0.2 is a conservative best estimate. Blomquist and Selin (2010) use Swedish tax reforms in the 1980s to estimate an uncompensated elasticity of 0.2 for men and a much higher (but imprecisely estimated) elasticity for women. Gelber (2014) evaluates the Swedish tax reform of 1991 and finds a compensated elasticity between 0.23 and 0.41 for married men and between 0.07 and 0.47 for married women. Additional studies surveyed by Ericson et al. (2015) broadly support these magnitudes. A seemingly notable exception is Bastani and Selin (2014), who present Swedish bunching estimates of taxable income elasticities and find a compensated taxable income elasticity of zero locally around the first central government kink point. However, these authors also recognize that their point estimate is consistent with much higher elasticities in the presence of optimization frictions.
Also relevant to the Swedish case is the recent study by Kleven and Schultz (2014), which evaluates Danish tax reforms during the period 1984-2005 and finds an elasticity of 0.05 on average, with the largest tax changes producing estimates of 0.2-0.3. Like Bastani and Selin (2014), they argue that larger tax reforms may allow individuals to overcome optimization frictions and are therefore possibly more informative about the structural elasticity.
In reviewing the literature on intensive-margin elasticities, it seems that estimates from the Swedish literature do not deviate significantly from the 0.12 to 0.4 range of Saez et al. (2012). We proceed to choose 0.2 as the value of our baseline compensated intensive margin elasticity. For sensitivity analysis, we additionally consider the values 0.1 and 0.3.

Extensive margin
Credible quasi-experimental evidence on extensive-margin responses is scarce. Chetty et al. (2012) survey recent microeconometric studies that have estimated extensive-margin elasticities and find that an extensive margin elasticity of 0.25 is most closely consistent with the evidence. One should keep in mind, however, that the participation elasticity is not a deep structural parameter, but depends on the number of workers who are, at the margin, indifferent between working and not working. Thus we should expect participation elasticities to be different for different income groups (depending on the skill-specific employment level). 22 Of particular relevance to our analysis are two studies on Swedish data, Selin (2014) and Bastani et al. (2016). Selin exploits the Swedish individual tax reform of 1971 to estimate participation responses and finds estimates in the range 0.5-1. Using more recent data, Bastani et al. (2016) exploit a reform in the tax-transfer system in 1997 and find a central estimate of the participation elasticity of 0.13. Their estimates range from 0.24 in the bottom quartile, where the employment level is 80%, to 0.09 in the top quartile, where the employment level is 95.5%. 23 In line with this evidence, we set the extensive margin elasticity to 0.2, 0.15, 0.1, 0.05 and 0 for the lowest to highest quintiles, respectively. Hence the average extensive margin elasticity is 0.1.
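The quintile-specific calibration can be written down as a small lookup, which also makes it easy to check the stated average of 0.1. This is a minimal Python sketch; the function name and the mapping from an income percentile to a quintile are illustrative assumptions, not taken from the paper.

```python
# Extensive-margin elasticities for the lowest to highest income quintile,
# as stated in the text.
EXTENSIVE_ELASTICITY_BY_QUINTILE = [0.2, 0.15, 0.1, 0.05, 0.0]


def extensive_elasticity(percentile):
    """Return the participation elasticity for an income percentile in [0, 100]."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must lie between 0 and 100")
    # Percentile 100 is assigned to the top quintile.
    quintile = min(int(percentile // 20), 4)
    return EXTENSIVE_ELASTICITY_BY_QUINTILE[quintile]


def average_extensive_elasticity():
    """Unweighted mean across quintiles (each quintile holds 20% of earners)."""
    values = EXTENSIVE_ELASTICITY_BY_QUINTILE
    return sum(values) / len(values)


if __name__ == "__main__":
    print(extensive_elasticity(10), extensive_elasticity(95))
    print(round(average_extensive_elasticity(), 3))
```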

The perceived distortionary costs of taxation
The government committee that laid the groundwork for the Swedish "Tax Reform of the Century" in 1991 concluded that an intensive-margin labor supply elasticity of 0.25-0.3 best reflected available evidence, some of which had been commissioned by the committee. 24 A government report commissioned to evaluate the reform concluded that a compensated hours elasticity of 0.1 for men and 0.4 for married women seemed reasonable. 25 These estimates are fairly close to our baseline elasticity of 0.2, which indicates that our social weight estimates can be seen as representing the social weights of centrist politicians of the late 1980s and early 1990s. In 2006, an alliance of center-right parties won the election on a platform of "making work pay" and increasing labor force participation. There is some indication that this was reflected in the choice of elasticities, with greater emphasis on the extensive margin. According to the Fiscal Policy Council (2008), the government used an hours elasticity of 0.1 and a participation elasticity of 0.25 to evaluate its reforms in 2007. In a report from the Swedish Ministry of Finance (2009), the government assumes an hours elasticity of 0.05 or 0.1 for men and 0.1 or 0.15 for women, and a participation elasticity of 0.1. The participation elasticity is not very high, but considering the conservative choice of intensive-margin elasticity, there is some support for the notion that the government's rhetoric emphasizing the importance of the labor force participation channel was also reflected in its beliefs about the relative magnitudes of intensive- and extensive-margin elasticities.

Results
In this section, we report the results of our model: the marginal social welfare weights placed by the government on various income groups and how they have evolved over time. We also report how the government's relative valuation of working and non-working individuals has changed over time. Finally, we provide a special analysis of the social welfare weights at the top of the income distribution.
22 See Bastani et al. (2016) for further discussion about the relationship between the participation elasticity and the employment level.
23 The estimates reported by Selin (2014) and Bastani et al. (2016) may at first glance seem very different. They are, however, consistent once one notes that Selin (2014) reports that the pre-reform share of married women with positive earnings was 67% (Table 8) whereas the corresponding share in Bastani et al. (2016) is 90%.
24 See Swedish Ministry of Finance (1989), supplement 4, p. 15.

Social welfare weights
We plot the weights for eight representative years and present the results in Fig. 3a. 26 These years represent the major episodes of the evolution of the Swedish tax system. We focus on the solid lines, which correspond to the benchmark elasticity. In Fig. 12 in the Online Appendix, we provide a figure for every year 1971-2012.
The qualitatively most important features of the shape and evolution of the social welfare weights during our period of study can be described as follows. For most years, social weights are increasing up to about the 40th or 50th percentile, producing a hump-like shape. This is an interesting feature of our schedules, but it remains an open question whether the hump reflects support for redistribution toward the middle class or a tendency of politicians to cater to the median voter (a political economy consideration that is not part of our model). During the late 1980s and early 1990s, the hump is less clear, but it reappears in the late 1990s. An overall tendency is that the profile of social welfare weights becomes flatter in later years, reflecting a downward trend in the amount of redistribution carried out by the Swedish tax system.
The hump-shaped pattern of social welfare weights is surprising, as one would normally expect the social weights to decline monotonically with income, reflecting the government's desire to redistribute from high- to low-income individuals. The hump-shaped feature can be contrasted with the results of Hendren (2014), who finds a monotonically declining shape for the United States, but is similar to the results for the Netherlands presented by Zoutman et al. (2015).
These differences are interesting, but need not necessarily reflect fundamental differences between countries in terms of preferences for redistribution. Instead, they could reflect inherent differences between countries in the extent to which the government redistributes through the tax system versus other channels. In particular, countries differ in their reliance on means-tested versus universal support programs (see also the discussion in Section 3.4). 27
Turning to the upper half of the distribution, we see that the weights decline quite steeply in this region in the 1970s and 1980s, but the decline is much less pronounced in the 1990s and beyond. Figure 3b offers a simpler way of describing how the social welfare weights at the 30th, 50th, 70th and 90th percentiles have evolved over time. We pay special attention to the very top of the distribution in Section 5.3 below, but we can already make the observation that the social weights for high income earners (represented by the P90 line in Fig. 3b) have increased over time, and then stabilized in the last decade.
Turning now to our sensitivity analysis, represented by the dotted and dashed lines (reflecting the high and low elasticity scenarios, respectively) in Fig. 3a, we see that some interesting patterns emerge. For example, with a higher elasticity, the hump-shaped pattern is amplified. Moreover, to rationalize the actual progressive income tax (that redistributes from high income to low income individuals) as optimal in the presence of a uniformly higher elasticity across the income distribution, the social weights on low income earners must increase, and the social weights on high income earners must decrease, relative to the benchmark scenario.
Fig. 3 The evolution of social welfare weights

Social welfare weights for workers and non-workers
The evolution of the average social weights attached to those who are working and not working is shown in Fig. 4. 28 We may observe that, throughout our sample period, the social welfare weights are substantially higher for non-workers than for low-income earners. 29 As individuals who report zero income are assumed to receive social assistance, these results depend on the generosity of welfare benefits relative to the tax treatment of working individuals. Moreover, in accordance with the equations in Section 3.2, the results also strongly depend on the fraction of the population in each of the working and non-working states.
Notice that we employ a generous definition of non-work, counting as non-workers all individuals who earn less than the net (after-tax) welfare benefit level. For some years the resulting share of non-workers is quite high, over a third. The fact that we use annual income measures also contributes to the high fraction of non-workers, as workers can have temporarily low incomes (for example by working part of the year).
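The classification rule described here is simple enough to state in code. The following Python sketch counts as non-workers everyone whose annual earnings fall below the net welfare benefit level; the function name and the example incomes are hypothetical and serve only to illustrate the rule.

```python
def non_worker_share(annual_incomes, net_benefit_level):
    """Share of individuals classified as non-workers.

    An individual counts as a non-worker if annual labor income is
    strictly below the net (after-tax) welfare benefit level.
    """
    if not annual_incomes:
        raise ValueError("annual_incomes must not be empty")
    non_workers = sum(1 for z in annual_incomes if z < net_benefit_level)
    return non_workers / len(annual_incomes)


if __name__ == "__main__":
    # Hypothetical annual incomes (SEK) and benefit level.
    incomes = [0, 40_000, 90_000, 180_000, 260_000, 410_000]
    print(non_worker_share(incomes, net_benefit_level=100_000))
```

Because the rule uses annual income, someone who works only part of the year at a normal wage can still fall below the benefit level and be counted as a non-worker.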
It seems that the weight on workers is largely driven by variations in marginal tax rates. The increasing tax rates of the 1970s are clearly visible. There is a remarkable stability after the 1990-1991 tax reform. Fluctuations in the relative gain to working (for example, the introduction of the EITC in 2007) barely seem to affect the weight on the employed. Perhaps this is unsurprising as we have set the extensive margin elasticities (with an average of 0.1) to generally be lower than the intensive margin elasticity (equal to 0.2 in the benchmark scenario).
The weight on welfare recipients is much more volatile, varying between 1.6 and 3.3. As benefit levels have remained relatively stable, the volatility is mainly explained by tax reforms and changes to the participation rate. During the high-marginal-tax era of the late 1970s and '80s, the weight on the non-employed is substantially higher than the weight on low-income workers. Also during the 1990s and 2000s, the weight on non-workers seems high. We interpret this to mean that Swedish politicians find those out of work to be deserving, as opposed to the American view that the working poor are especially worthy of support.
28 Notice that the weight on workers is simply the weighted average of the social weights shown in Fig. 3a. The weight on welfare recipients is the mirror image of the weights on workers, adjusting for variations in the proportion of workers and non-workers over time.
29 This is consistent with Bargain et al. (2014a) and Spadaro et al. (2015), but less pronounced in Spadaro et al.
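The "mirror image" relation between the two weights can be illustrated with a short Python sketch. It rests on an assumption that is not spelled out in this section: that social welfare weights are normalized so that their population average equals one. Under that assumption, the average weight on workers follows mechanically from the weight on non-workers and the non-worker share. The function name and example values are hypothetical.

```python
def average_worker_weight(g_nonworkers, share_nonworkers):
    """Average social weight on workers implied by the weight on non-workers.

    Assumes (for illustration only) that weights average to one over the
    whole population:
        share_nonworkers * g_nonworkers + (1 - share_nonworkers) * g_workers = 1
    """
    if not 0 <= share_nonworkers < 1:
        raise ValueError("share_nonworkers must lie in [0, 1)")
    return (1.0 - share_nonworkers * g_nonworkers) / (1.0 - share_nonworkers)


if __name__ == "__main__":
    # Hypothetical example: weight of 2.0 on non-workers, who make up 25%.
    print(average_worker_weight(g_nonworkers=2.0, share_nonworkers=0.25))
```

Under this normalization, a higher weight on non-workers or a larger non-worker share pushes the average worker weight down, which is one way to read the volatility discussed above.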

Social welfare weights at the top
The logic of the inverse optimal taxation approach becomes especially clear in the case of top incomes, since we can exploit a simplified way of expressing the optimal top income tax rate.
Consider again the optimal income tax formula (1). To derive the social welfare weight for top income earners, we assume that for $z$ sufficiently large, say $z \geq \bar{z}$, $G(z)$, $\alpha(z)$ and $\varepsilon(z)$ are all constant and equal to $\hat{g}$, $a$ and $e$, respectively. Moreover, we assume that the government levies a constant marginal tax rate above $\bar{z}$, equal to $\tau$. Thus, for $z \geq \bar{z}$ the above expression takes the form:
$$\tau = \frac{1-\hat{g}}{1-\hat{g}+ae}. \tag{9}$$
This is the well-known top tax rate formula first presented by Saez (2001). The simplification is made possible by the insight that, in actual income data, the distribution of top incomes can be approximated by the statistical Pareto distribution. In regions where the income distribution follows a Pareto distribution, the local Pareto parameter is constant. 30 Re-arranging (9) we obtain an explicit expression for the social welfare weight at the top:
$$\hat{g} = 1 - ae\,\frac{\tau}{1-\tau}. \tag{10}$$
An important special case occurs when $\hat{g} = 0$, i.e., when the policymaker places no weight on additional consumption of the very rich. This yields the revenue-maximizing (Laffer) tax rate $\tau_L = \frac{1}{1+ae}$. If $\tau > \tau_L$, cutting taxes for the very rich would raise tax revenue and in that case the social welfare weight placed on the very rich, $\hat{g}$, is negative. We may also notice that when $\hat{g} = 0$ in Eq. 10 then $e = \frac{1}{a}\,\frac{1-\tau}{\tau}$ is the elasticity of taxable income that would rationalize current marginal tax rates as revenue-maximizing (i.e., they would be the optimal marginal tax rates when $\hat{g} = 0$). We calculate the Pareto parameter as follows:
$$a = \frac{\bar{z}_1}{\bar{z}_1 - z_1}, \tag{11}$$
where $z_1$ is an arbitrary income level beyond which we wish to apply the Pareto distribution and $\bar{z}_1$ is the average income above $z_1$. The advantage of this estimator is that it exploits data in the entire right tail of the income distribution. We choose $z_1$ equal to three times the average income. The Pareto parameter for 1971-2012 calculated according to Eq. 11 is plotted in Fig. 5a and ranges between 3 and 4. There is a tendency of an increase during the 1970s and '80s and a decrease during the 90s and 00s. 31
30 In Fig. 11 in the Supplementary Material, we plot the local Pareto parameter for each year of our analysis. It is in general quite stable in the upper part of the income distribution.
31 The 1991 shift should be interpreted cautiously as the reform may have changed how incomes were reported.
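As an illustration of the Pareto parameter estimator described above (the ratio of the mean income above a threshold to the excess of that mean over the threshold), together with the Laffer rate 1/(1 + ae) stated in the text, consider the following Python sketch. The function names and the toy income vector are hypothetical, and the default threshold of three times the average income follows the choice reported in the text.

```python
def pareto_parameter(incomes, threshold=None, multiple=3.0):
    """Estimate the Pareto parameter a from the right tail of the income data.

    a = mean_tail / (mean_tail - threshold), where mean_tail is the average
    income among those above the threshold. If no threshold is given, it is
    set to `multiple` times the average income.
    """
    if not incomes:
        raise ValueError("incomes must not be empty")
    if threshold is None:
        threshold = multiple * sum(incomes) / len(incomes)
    tail = [z for z in incomes if z > threshold]
    if not tail:
        raise ValueError("no incomes above the threshold")
    mean_tail = sum(tail) / len(tail)
    return mean_tail / (mean_tail - threshold)


def laffer_rate(a, e):
    """Revenue-maximizing top rate when the top social weight is zero."""
    return 1.0 / (1.0 + a * e)


if __name__ == "__main__":
    # Toy data: the tail above 100 has mean 150, so a = 150 / 50 = 3.
    a = pareto_parameter([10, 120, 150, 180], threshold=100)
    print(a, laffer_rate(a, e=0.2))
```

With a Pareto parameter between 3 and 4 and the benchmark elasticity of 0.2, this expression yields rates between roughly 56% and 63%, in line with the 55 to 65% range reported for the Laffer rate.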

Fig. 5 Shape and taxation of top incomes
We calculate τ using the effective top marginal tax rate (including VAT and social security contributions) that was in place in Sweden during our period of interest. The evolution of the top tax rate is shown in Fig. 5b.
In the same diagram, we also plot the revenue-maximizing top rate, using Eq. 9 with ĝ = 0. We find, given our assumptions about elasticities, that the Laffer rate has varied between 55 and 65% since 1971, always substantially below the rates that were in place at the time. The social weights on high incomes based on Eq. 10 are shown in Fig. 6a. For the benchmark elasticity scenario, the numerical values range from −6 to −0.2, stabilizing just above −1 from the early 1990s onwards. Of course, the finding that the social welfare weight is negative at the top depends on our choice of benchmark intensive-margin elasticity of 0.2. For our low elasticity scenario reported in Fig. 6a, social weights at the top are positive within the latter half of our analysis period.
Negative social welfare weights suggest that politicians have been refraining from cutting tax rates even though it would raise tax revenue to do so, violating Pareto efficiency. The convergence of top tax rates to the revenue-maximizing ones could potentially indicate that politicians have enhanced their understanding of the trade-offs involved in income taxation during the last four decades. It is also conceivable that political economy considerations could have made politicians reluctant to cut taxes in order to reap benefits that will take several years to materialize. It is also possible that Swedish politicians valued a compressed income distribution so much that they were willing to give up some tax revenue to attain it. As a complementary exercise, we have in Fig. 6b calculated the elasticities of taxable income that would rationalize historical top marginal rates in Sweden, i.e., the elasticities that would be needed for the top social weight to be zero. As evident from the figure, the implied elasticities range between 0.03 and 0.16, which belong to the lower end of the estimates reported in the literature. 32
32 The finding of a negative social welfare weight at the top is in contrast to Bargain et al. (2014a). This can potentially be explained by the fact that they estimate a low intensive margin elasticity for the top quintile, by their exclusion of consumption taxation, resulting in lower top effective tax rates, or by differences between their top quintile and the asymptotic income level associated with our continuous optimal tax framework. Spadaro et al. (2015) also find positive social welfare weights for top incomes, which can be explained by their choice of intensive margin elasticity drawn from the labor supply literature (which is lower than our benchmark elasticity that is chosen to reflect the taxable income literature).
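The two inversion exercises for top incomes can be sketched in a few lines of Python. The sketch assumes that the rearranged top-rate formula takes the form ĝ = 1 − a·e·τ/(1 − τ), which is consistent with the Laffer rate 1/(1 + ae) and with the implied elasticity e = (1/a)(1 − τ)/τ given in the text. The function names are hypothetical, and the Pareto parameter of 3.5 used in the example is an illustrative value within the reported range of 3 to 4, not a figure from the paper.

```python
def top_social_weight(tau, a, e):
    """Social welfare weight on top earners implied by a top marginal rate tau.

    tau: effective top marginal tax rate (0 <= tau < 1)
    a:   Pareto parameter of the top income distribution
    e:   elasticity of taxable income
    """
    if not 0 <= tau < 1:
        raise ValueError("tau must lie in [0, 1)")
    return 1.0 - a * e * tau / (1.0 - tau)


def implied_elasticity(tau, a):
    """Elasticity that would make tau revenue-maximizing (top weight of zero)."""
    if not 0 < tau < 1:
        raise ValueError("tau must lie in (0, 1)")
    return (1.0 - tau) / (a * tau)


if __name__ == "__main__":
    # 91% is the effective top rate reported for 1982; a = 3.5 is illustrative.
    print(round(top_social_weight(tau=0.91, a=3.5, e=0.2), 2))
    print(round(implied_elasticity(tau=0.91, a=3.5), 3))
```

The first function returns a negative weight whenever the top rate exceeds the Laffer rate, and the second returns the elasticity at which the same top rate would be exactly revenue-maximizing.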

Concluding remarks
We have analyzed political preferences for redistribution in Sweden for more than four decades using administrative micro-data on the evolution of the income distribution, calculating effective marginal tax rates, and employing elasticities reflecting the distortionary costs of taxation. Sweden represents an interesting case-study as it is a country with one of the highest tax-to-GDP ratios in the world and has a history of highly progressive income taxation with large variation in marginal tax rates both across income groups and over time. It is also a country with a long tradition of issuing government reports, informing policymakers about the distortionary costs of taxation, suggesting that the social welfare weights that we have derived are informative about the actual preferences for redistribution among Swedish policymakers.