1 Introduction

The standard public finance approach to analyzing tax-influenced economic decisions presumes a well-informed taxpayer who makes rational decisions while understanding the important features of the economic environment. This paradigm has long been considered unsatisfactory in the context of tax evasion, where the standard (Allingham and Sandmo 1972) model overpredicts the extent of cheating (see Andreoni et al. 1998; Slemrod and Yitzhaki 2002, for surveys of the literature). Recent empirical work also recognizes that behavioral responses are sometimes puzzlingly small and inconsistent across different contexts.Footnote 1 One potential direction for reconciling theory and evidence on non-compliance is to provide a more realistic characterization of the economic environment. The objective of the current paper is to provide empirical evidence on a particular class of explanations for tax-motivated behavior: whether tax avoidance spreads within social networks. Our results show that tax avoidance runs in the family.

Beyond attempts to improve the characterization of the incentives faced by individuals, a recent development is to postulate the existence of optimization frictions that may stop individuals from pursuing otherwise optimal tax adjustments (e.g., Chetty et al. 2011; Chetty 2012; Kleven and Waseem 2013). This is a useful abstraction that can potentially explain inconsistencies in observed empirical patterns, but it encompasses many possibilities: optimization frictions may be due to behavioral biases, lack of information, monetary or time adjustment costs, or non-standard preferences. These possibilities may have very different policy implications, so discriminating between them is important. Furthermore, there are two related but distinct reasons to consider frictions. On the one hand, one may be interested in developing a better understanding of individual behavior. On the other hand, frictions are a potential source of heterogeneity in behavior in the population. Our finding of social interactions in the tax avoidance context provides evidence for both of these lines of thinking: for networks to matter, individual optimization has to depend on network characteristics; at the same time, networks are by their very nature heterogeneous and hence generate differences in the behavior of otherwise similar individuals.

We focus on a particular and natural choice of an exogenous social network: family members. There are many channels through which tax-minimizing behavior may spread, such as information about costs and benefits, awareness of tax avoidance strategies, and perceptions of social acceptance. We do not have data to separate the relative importance of such channels, so our analysis provides estimates of their combined influence on behavior.Footnote 2

Empirical work on tax avoidance and evasion faces many challenges, not least the difficulty of observing the outcomes: participation in, and the extent of, tax avoidance or evasion. We can sidestep this problem because a well-defined tax shelter exists that is observable in our data; we provide more details below. Approximately 8% of Norwegian firm owners adopted this particular tax shelter during the second half of 2005. We can also observe the precise timing of adoption and hence analyze its dynamics. We use very detailed administrative data covering the universe of Norwegian firms, individuals and shareholders, which allows us to link firms with their individual owners. Importantly, our data also allow us to construct extended family networks that we can then use to identify spillover effects.

We begin by showing that decisions to pursue legal tax avoidance are correlated within extended family networks: early adoption in the family network predicts own subsequent take-up. Furthermore, the precise timing of adoption is linked: our evidence reveals a significant increase in adoption within the week after take-up by a network member. This pattern is robust to controls and provides suggestive evidence that take-up in the network stimulates own adoption.

While the timing evidence is suggestive, it does not establish causality. To obtain causal evidence that avoidance spreads within networks, we use a regression discontinuity design that exploits a discontinuity in taxpayers’ eligibility for setting up a tax shelter at 10% ownership. We show that this discontinuity affects own tax avoidance, and then we establish that it also affects tax avoidance of taxpayers in the family network (who are not necessarily themselves on the 10% margin). In other words, similar taxpayers with similar family networks make different decisions as a result of a slight difference in the characteristics of one of their family members, a difference that discontinuously changes the availability of avoidance for that family member rather than for the individual herself. We interpret this evidence as a concrete example of an optimization friction (driven by characteristics of the network) that generates heterogeneity in taxpayer behavior with real tax consequences.Footnote 3

There is a variety of social networks to choose from when studying social interactions, such as family, colleagues, schools, sports, church attendance, shareholders, accountants, board members, neighborhoods, etc. The current paper studies the impact of predetermined family networks on participation in legal tax avoidance. Bohne and Nimczik (2018) study the dynamics of legal tax avoidance within networks of firms and employees. They document that the take-up of legal individual tax deductions for personal expenses spreads as workers and accountants switch firms. There is also a small but growing literature studying the role of various networks in agents’ decisions to participate in illegal tax evasion, whose conclusions correspond to those of the current paper: tax evasion spreads within networks. Pomeranz (2015) highlights that networks matter in the context of VAT evasion and finds evidence of spillover effects in a firm’s trading network. Boning et al. (2018) find that tax compliance by firms is affected by IRS interventions in the firm’s network, whether through a shared tax preparer, geography, or a parent-subsidiary relationship. Paetzold and Winner (2016) use variation in job changes to identify spillover effects from the work environment on individual compliance decisions. They find that job changers moving to companies with a higher fraction of cheaters increase their cheating, while movers to companies with a lower fraction of cheaters tend not to alter their reporting behavior. To our knowledge, the only other paper (concurrent with ours) that uses family relations to study the effect of norms and social interactions on participation in tax minimization is Frimmel et al. (2018). They use Austrian data on claimed commuter tax deductions, where they can check the actual commuting distance and determine whether the deduction was rightfully or wrongfully claimed, the latter constituting tax evasion. By studying father-son pairs, they find that tax evasion runs within the very close family. However, whereas Frimmel et al. (2018) study the intergenerational transmission of illegal tax evasion behavior, we study how legal tax avoidance behavior spreads within broad family networks.

The plan of the paper is as follows. In the next section, we describe Norwegian tax policy and the reform that gives rise to the research design in this paper, and in Sect. 3 we describe our data. In Sect. 4 we present descriptive evidence on timing and find that take-up in the network accelerates overall take-up. Section 5 is devoted to the empirical strategy. Our main results are in Sect. 6, where we present regression discontinuity-based evidence of the effect on individual take-up of the 10% rule that generates the discontinuity, followed by evidence of the spillover effect in the network as well as timing effects. Conclusions are in the final section.

2 The 2006 reform and tax sheltering opportunities

Under the Norwegian dual income tax in effect since 1992, capital gains realized by both individuals and corporations were subject to the basic tax rate of 28% (which also applied to corporate, capital and labor income). Dividends were tax exempt at both the individual and the corporate level.Footnote 4 A shareholder income tax implying a 28% tax on dividends for personal shareholders was announced in 2003 and introduced effective January 1, 2006. As one would expect, this led to massive avoidance responses. Dividends to personal shareholders were extraordinarily high in 2005 but plummeted after 2005.Footnote 5 During the transition period, the tax on realized capital gains on shares for corporate shareholders was removed without warning on March 26, 2004.Footnote 6 These changes unambiguously strengthened the incentive to own shares in a firm through another entity rather than directly.

Indirect ownership in general allows for separating two decisions: extracting resources from a firm and the ultimate transfer to the individual. Such a separation can have non-tax-related benefits to the owners such as shielding personal assets from third parties (creditors, family members) in a holding company, as well as tax-related benefits such as tax-free consumption within a (holding) firm without bearing the economic risk associated with the activity of the original firm (see Alstadsæter et al. 2014, for evidence of this type of tax planning), and arbitrage between personal and corporate taxation.

Founding a holding company entails some costs. A modest registration fee had to be paid at foundation, and at the time of the reform the annual accounts had to be approved by an auditor, which generated additional costs. There are also time costs for the shareholder in keeping and submitting the accounts or, alternatively, out-of-pocket costs if she chooses to hire an accountant for this. However, for a holding company with little or no additional activity, these auditor and accountant fees are modest because the accounts are simple. In addition, there was a minimum equity requirement of NOK 100,000. Another potential cost of establishing a holding company is advisor or lawyer fees if the shareholder seeks advice on how to pursue legal tax avoidance strategies. These costs would be reduced, or would not arise at all, if someone in the shareholder’s network has already established a holding company and can share these insights for free.

Following the reform, the tax exemption of capital gains and dividends for corporate owners creates some additional and very important advantages due to deferral of taxation. First, deferral enables tax-free growth of assets within the holding company. Second, it enables pooling of losses and gains from various enterprises at the holding company level. The shareholder income tax does not allow the rate-of-return allowance (see footnote 5) to be transferred across different types of shares, and at realization, unused rate-of-return allowances are lost at the shareholder level. At the company level, there is no dividend tax and thus no unused allowance to be lost at realization. This will then increase the total allowance of the owner of the holding company, as the individual shareholder’s allowance is based on her share of the external equity in the holding company, which is unaffected by this transaction. Third, the investor may make a policy bet on the dividend tax being removed in the future. Alstadsæter et al. (2014) and Alstadsæter et al. (2016) show that the reform led to very large changes in the tax reporting behavior of business owners. From our perspective, the key point is that individuals have a stronger incentive to own firms indirectly after the reform; the prima facie evidence of this is the massive number of conversions that took place.
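To make the first deferral advantage concrete, here is a stylized Python sketch (our illustration, not a calculation from the paper). It assumes a hypothetical 10% annual return and applies the 28% rate either every year (returns paid out and taxed, with the net amount reinvested) or only once at the end (returns compounding untaxed inside a holding company and then paid out as a taxed dividend). The rate-of-return allowance, fees and risk are ignored, and the function names are hypothetical.

```python
"""Stylized comparison of annual taxation versus deferral (illustrative only).

Assumptions: a single tax rate t applies to returns; the rate-of-return
allowance, fees and risk are ignored.
"""


def direct_wealth(w0: float, r: float, t: float, years: int) -> float:
    """Returns are taxed at rate t every year and the net amount is reinvested."""
    return w0 * (1 + r * (1 - t)) ** years


def deferred_wealth(w0: float, r: float, t: float, years: int) -> float:
    """Returns compound untaxed; the accumulated gain is taxed once at payout."""
    gross = w0 * (1 + r) ** years
    return w0 + (gross - w0) * (1 - t)


for horizon in (1, 5, 10, 20):
    d = direct_wealth(100.0, 0.10, 0.28, horizon)
    h = deferred_wealth(100.0, 0.10, 0.28, horizon)
    print(f"{horizon:>2} years: direct {d:6.1f}, deferred {h:6.1f}")
```

With a one-year horizon the two paths coincide; the gap then widens with the horizon, which is the sense in which deferral enables tax-free growth of assets within the holding company.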

For existing firms, switching from direct (individual) to indirect (holding company) ownership would in principle require the transfer or sale of existing shares and would trigger capital gains tax liability. In order to level the playing field between individual and corporate investors, the so-called Transition Rule E was introduced, which under certain conditions enabled an individual to transfer his or her shares in an existing firm to a holding company during 2005 without triggering capital gains tax.

The Transition Rule E was first proposed on November 19, 2004, and sanctioned on December 10, 2004. It removed capital gains tax liability when an individual shareholder transferred all her shares in a firm to a newly founded corporation, provided that this new holding company in the end held at least 90% of the shares in the transferred company and the compensation was in the form of shares in the new corporation. The new holding corporation had to be founded, and a report sent to the company register, by December 31, 2005. This transition rule turned out to be restrictive, and relatively few shareholders could use it. A more liberal version of the Transition Rule E was proposed on May 13, 2005, and sanctioned on June 17, 2005. Under this new version, the 90% threshold was reduced to 10%. We will refer to a holding corporation that was founded during 2005 in response to the Transition Rule E as a tax shelter or an E-firm.

An individual who already owned 10% of the shares in a firm was in a position to establish an E-firm alone with no additional adjustments, while an individual who owned just below 10% would have to either buy shares or coordinate with others, which increased coordination costs, both in time and in loss of control over the future payout policy of the tax shelter.

To summarize: prior to 2004, the incentive to own corporations directly was fairly strong because corporate capital gains were subject to taxation (thereby resulting in multiple layers of taxation in case of corporate ownership before reaching personal owners), while dividends were tax exempt in any case. As of March 2004, neither corporate capital gains nor dividends were subject to the tax. As a result, indirect ownership of a firm allowed for deferral of taxation of capital gains until the holding company is sold. The incentive for indirect ownership was significantly strengthened by the introduction of individual-level dividend taxation as of January 1, 2006. For the existing ownership stakes, taking advantage of these deferral opportunities should in principle require realizing capital gains and triggering tax liability, but the Transition Rule E provided an opportunity to convert to indirect ownership without the tax. The main purpose of holding companies set up under Transition Rule E appears to be to work as a tax shelter intended to defer taxation, and alternatives to achieve the same outcome would be costly. During 2005, 16,483 holding corporations were set up and approximately 9% of existing non-listed firms at the end of 2004 had at least some of the owners electing to transfer their stake to a holding company. Figure 1 shows the timing of adoption of firms that we classify as being set up under the transition rule. Adoption was slow at the very beginning and increased rapidly toward the end of 2005, just before the opportunity to take advantage of it expired.

3 Dataset description

We use very detailed administrative data covering the universe of Norwegian firms, individuals and shareholders. Every resident in Norway is assigned a unique personal identifier that is present in all databases, enabling us to follow every individual over time and across datasets. The same holds for firms. The shareholder register contains year-end records of every shareholder (firms and individuals) of every Norwegian corporation for 2004–2008. For our sample, we include all individual shareholders in 2004 who resided in Norway, owned shares of a Norwegian non-listed corporation with fewer than 100 individual owners, and were not sole proprietors.Footnote 7

We can also identify holding companies that were set up during 2005 through the sector code assigned to them by Statistics Norway, and determine their ownership structure and holdings.Footnote 8 Because we observe this information for a number of subsequent years, we can also trace changes in the ownership structure, such as transfers of an existing firm to a holding company. Importantly for our analysis, we know the exact date on which each firm (holding companies included) was registered.

The shareholder register was established in 2004, and we do not have information on firms’ dividend distributions, or on individuals’ ownership shares, prior to 2004. In order to avoid selection into eligibility for adopting an E-firm during the second half of 2005 (by increasing ownership in a firm beyond 10%), we define eligibility pre-reform, as of December 31, 2004. This means that our treatment group may include individuals who are not eligible to adopt an E-firm during the second half of 2005, while our control group may include individuals who become eligible during 2005 due to changes in ownership. This may introduce some attenuation bias, making it more difficult to identify an effect. An additional potential complication is that the accuracy of the shareholder register may be lower in its first year, 2004, due to start-up problems in reporting.

Fig. 1 Timing of setting up E-firms

Table 1 Summary statistics

Using other register information, we are able to link individual characteristics to each shareholder, both demographic (gender, age, marital status, immigrant status, education) and economic (including tax-related information such as gross and taxable income, dividend income and capital gains realizations).

To estimate the effect of a tax shelter being set up in shareholder i’s network on the likelihood that the shareholder herself adopts a tax shelter, we need an operational definition of the network. In this paper, we focus on a particular and natural choice of an exogenous network: family members. To do so, we identify the following family members of each shareholder in our 2004 sample: her direct relatives (parents, children, siblings, spouse) and the direct relatives of her direct relatives.Footnote 9
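The two-step network definition can be sketched in Python as follows. The sketch assumes a hypothetical input format: a dictionary mapping each person identifier to the set of her direct relatives (parents, children, siblings, spouse), with the relation recorded symmetrically. The function name is ours, not the paper's.

```python
from typing import Dict, Set


def extended_family(direct: Dict[str, Set[str]], person: str) -> Set[str]:
    """Return the direct relatives of `person` plus their direct relatives.

    `direct` maps an identifier to the set of that individual's direct
    relatives (parents, children, siblings, spouse). The person herself is
    excluded from her own network.
    """
    first_degree = set(direct.get(person, set()))
    second_degree: Set[str] = set()
    for relative in first_degree:
        second_degree |= direct.get(relative, set())
    return (first_degree | second_degree) - {person}


# Toy chain a - b - c - d: a's network reaches c (via b) but not d.
toy = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(sorted(extended_family(toy, "a")))  # ['b', 'c']
```

Stopping after two steps mirrors the definition in the text; relatives three or more links away (such as d for a above) fall outside the network.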

For the descriptive evidence of the timing of adoption of E-firm in the next section we will use the whole sample. Table 1 shows summary statistics for this sample, and we notice that three quarters of the shareholders are male, average age is 46, a vast majority are married and live in urban areas. Also, 8% of the shareholders established an E-firm, and 12% have a family member with an E-firm.

For the regression discontinuity analyses, we focus on observations around the 10% threshold; see Sects. 5.1 and 6 for details. A complication is that inspection of the ownership data reveals clustering of individuals at ownership shares that correspond to splitting the shares of the firm into exact fractions. This is a potential threat to the continuity of characteristics of the underlying population and, hence, to the practical implementation of the regression discontinuity approach, which requires that, absent the treatment, outcomes vary smoothly in the neighborhood of the threshold. It is indeed possible that observations bunched at these points are not similar to neighboring ones: splitting shares equally is likely to be (and in practice is) correlated with many characteristics of individuals and firms. Therefore, for the regression discontinuity analysis in Sect. 6, we exclude exact fractional observations from the sample, as described in more detail in Sect. 6.1 and Appendix B. For the network analyses, we also need to restrict the sample further to operationalize the family network variable, and to ensure that the assumptions of the regression discontinuity design are not violated by family members owning an identical number of shares in the same firm; see Sects. 5.2 and 6.3 for details.
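A minimal Python sketch of the exclusion step, assuming a hypothetical rule: a stake counts as an exact fraction when shares/total reduces to a denominator of at most 20. The paper's actual criterion is described in Appendix B and may differ; the cutoff and the function names are ours.

```python
from fractions import Fraction
from typing import List, Tuple

# Hypothetical cutoff, not taken from the paper: 1/2, 1/3, ..., 1/10, 2/15, ...
# count as "exact fractions", while 101/1000 does not.
MAX_DENOMINATOR = 20


def is_exact_fraction(shares: int, total_shares: int,
                      max_den: int = MAX_DENOMINATOR) -> bool:
    """True if the ownership share reduces to a simple fraction."""
    stake = Fraction(shares, total_shares)  # Fraction normalizes to lowest terms
    return stake.denominator <= max_den


def drop_exact_fractions(holdings: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Keep (shares, total_shares) pairs whose stake is not a simple fraction."""
    return [h for h in holdings if not is_exact_fraction(*h)]


print(drop_exact_fractions([(100, 1000), (101, 1000), (50, 100), (99, 1000)]))
# [(101, 1000), (99, 1000)]
```

Working with integer share counts and exact rationals means that 100 of 1000 shares is recognized as exactly 1/10, with no floating-point tolerance to choose.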

4 Timing of adoption

We start by illustrating the dynamics of setting up tax shelters in the data. Recall that prior to June 17, 2005, one needed 90% ownership to be eligible for an E-firm. From June 17 onward, this eligibility criterion was reduced to 10%. There are few adoptions during the early period, and as is also visible in Fig. 1, adoption is heavily concentrated toward the end of the period. This raises the possibility that these early adoptions may have influenced family members to also adopt a tax shelter.

Figure 2 shows the adoption of the tax shelter by individuals with (“exposed”) and without (“not exposed”) a family member setting up a tax shelter prior to June 17. Exposed individuals end up approximately 6 percentage points more likely to eventually set up a tax shelter. Furthermore, Fig. 3 shows that even conditional on ultimate adoption there are differences in timing—those who have exposed family members adopt earlier than others. These patterns do suggest that there is correlation between adoption by network members in the past and the individual’s own adoption of the tax shelter. They also suggest that there may be an effect on timing: individuals in networks with early adopters are not just more likely to adopt in general, they also tend to adopt earlier than others.

Fig. 2 Adoption by individuals with and without a family member adopting before June 17th 2005

Fig. 3 Timing of adoption by individuals with and without a family member adopting before June 17th 2005 (conditional on ultimately adopting)

We investigate these patterns more formally in a simple regression framework. Table 2 shows the results of regressing the E-firm adoption dummy on an indicator for having somebody in the family network adopting by a particular date, with various sets of controls. Only the coefficient on the network dummy is reported, and each cell corresponds to a different regression. The first panel shows the results of regressing the dummy for ever setting up an E-firm on a dummy for having somebody in the network set up an E-firm by June 17, November 1, or December 1 (one regression for each date). Consistent with the descriptive graphs that we have just discussed, the baseline regressions with no controls show a strong effect in each case. In the second column, we control for a number of demographic characteristics: gender, an immigrant dummy, an urban dummy, self-employment status, education dummies, a business/law education dummy, number of children and age dummies (by decade). Including these controls does not have a strong effect on the estimated coefficient, although many of them are individually very significant (not reported). The final column shows the effect of including economic controls: the logarithms of total income, net worth, capital income and 2004 dividends. Including these variables reduces the estimated network coefficients, but they retain statistical significance. This indicates that early take-up in one’s network correlates with individual economic characteristics that are relevant to take-up decisions, but that its association with own take-up goes beyond them.
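The regressions in Table 2 are linear probability models. A minimal NumPy sketch of one cell is below, assuming hypothetical 0/1 arrays for own adoption and network exposure and an optional matrix of controls; like the table, it returns only the network coefficient, and it omits standard errors. The synthetic data are invented for illustration and are not the paper's.

```python
import numpy as np


def lpm_network_coefficient(adopt, network, controls=None) -> float:
    """OLS of an adoption dummy on a constant, a network dummy and controls.

    adopt, network: 1-D arrays of 0/1 values with the same length n.
    controls: optional array of shape (n,) or (n, k).
    Returns the estimated coefficient on the network dummy.
    """
    y = np.asarray(adopt, dtype=float)
    columns = [np.ones_like(y), np.asarray(network, dtype=float)]
    if controls is not None:
        c = np.asarray(controls, dtype=float).reshape(len(y), -1)
        columns.extend(c.T)  # one column per control variable
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[1])


# Synthetic data: exposure is more common among high earners, and earnings also
# raise adoption, so adding the income control should shrink the estimate.
rng = np.random.default_rng(0)
n = 20_000
log_income = rng.normal(size=n)
network = (rng.random(n) < 0.08 + 0.10 * (log_income > 0)).astype(float)
p_adopt = 0.06 + 0.05 * network + 0.03 * log_income
adopt = (rng.random(n) < p_adopt).astype(float)
print(lpm_network_coefficient(adopt, network))
print(lpm_network_coefficient(adopt, network, log_income))
```

The drop in the coefficient when the control is added is the omitted-variable logic behind the comparison across the columns of Table 2.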

The effect of somebody else adopting may be not just on ultimate adoption but also on its timing. To explore this in a rudimentary way, we note in the following two panels that adoption before December 1 is more robustly explained by family network adoption before November 1, and that adoption before November 1 appears correlated with family network adoption before June 17. Especially in the latter case, the effect of economic controls on the estimated coefficient is weaker. This is consistent with the coefficient on early adoption picking up the effect of inducement over short horizons, while over longer horizons it at least partially reflects the correlation of early adoption in networks with economic characteristics that ultimately matter. At the same time, it is interesting to note that demographic characteristics (while individually significant) do not seem to be correlated with early adoption.

Table 2 The effect of early adoption on take-up

Overall, these results suggest that while adoption of an E-firm is correlated with many demographic characteristics, the correlation of early adoption within family networks does not seem to be related to these factors. At the same time, the link between adoption and the network appears less sensitive to the inclusion of controls as the horizon shortens. This is intuitive: the impact of having someone in the network adopting should be first of all on timing, and while the effect may persist in the longer term, it may be hard to distinguish from the effect of other characteristics correlated with early adoption. This motivates a strategy that treats timing more carefully.

It is possible that taxpayers in the network are exposed to the same shocks (for example, news) at the same time. But it is harder to make the case that, absent common shocks or interactions, individuals would happen to make similar decisions at similar times based purely on correlation in characteristics that are constant over time. We thus regress the dummy for taking up an E-firm in a particular week on a dummy for somebody in the network having taken up an E-firm the week before.
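The weekly regression can be sketched in plain Python. With a single exposure dummy, OLS simply reproduces two cell means per week (the constant is the adoption rate of the unexposed, and the constant plus the slope is the rate of the exposed), so the sketch computes those means directly. It assumes hypothetical inputs: each person's adoption week (None if she never adopts) and her family network as a set of identifiers. It also assumes, as in a hazard setup, that people who have already adopted leave the risk set, and it numbers weeks forward in calendar time rather than backward from the deadline as in Fig. 4.

```python
from typing import Dict, Optional, Set, Tuple

Rate = Optional[float]


def weekly_exposure_rates(adopt_week: Dict[str, Optional[int]],
                          family: Dict[str, Set[str]],
                          n_weeks: int) -> Dict[int, Tuple[Rate, Rate]]:
    """Per-week adoption rates among those still at risk, by prior-week exposure.

    adopt_week: person -> week of adoption (1..n_weeks, forward in time), or None.
    family: person -> set of family members in her network.
    Returns {week: (rate_not_exposed, rate_exposed)}; a rate is None if its cell
    is empty. These equal the OLS constant and constant + exposure coefficient.
    """
    rates: Dict[int, Tuple[Rate, Rate]] = {}
    for week in range(2, n_weeks + 1):  # week 1 has no preceding week
        cells = {False: [0, 0], True: [0, 0]}  # exposed -> [adopters, at risk]
        for person, own_week in adopt_week.items():
            if own_week is not None and own_week < week:
                continue  # already adopted, so no longer at risk
            exposed = any(adopt_week.get(rel) == week - 1
                          for rel in family.get(person, set()))
            cells[exposed][0] += int(own_week == week)
            cells[exposed][1] += 1
        rates[week] = tuple(a / n if n else None
                            for a, n in (cells[False], cells[True]))
    return rates


adopt_week = {"a": 1, "b": 2, "c": None, "d": 2}
family = {"b": {"a"}, "c": {"d"}}
print(weekly_exposure_rates(adopt_week, family, n_weeks=3))
# {2: (0.5, 1.0), 3: (None, 0.0)}
```

Computing cell means rather than running a regression keeps the hazard-like structure visible: each week only the not-yet-adopted are compared, split by whether a relative adopted one week earlier.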

Fig. 4 The effect of a family member adopting a week earlier (OLS with no controls)

Figure 4 shows the results for the family network based on simple OLS regressions. This is again a linear probability model, now in a hazard-like setting. Week 1 corresponds to the last week before January 1, 2006 (and the right-hand side variable is adoption in the family a week before that), and higher numbers correspond to earlier weeks. The figure shows the baseline effect (the constant from the OLS), which represents adoption of the tax shelter by individuals with no exposure in the family network in the preceding week, and the adoption rate of those who were exposed in the preceding week (the sum of the constant and the coefficient on the exposure dummy), together with the 95% confidence interval for the latter. There is a significant effect in the last six weeks of the year and in some weeks before that. At longer horizons, the effect disappears. It is possible that a week is in the right ballpark for the timing of the inducement effect late in the game, but is too short a period earlier on, when there is no reason to rush.Footnote 10

Overall, these results are suggestive of the relationship in timing and eventual take-up, but still may not be causal. We now turn to a regression discontinuity framework to investigate the network effects further using a research design that more readily lends itself to a causal interpretation. We will revisit timing effects again in Sect. 6.4.

5 Basic framework

Our core econometric framework can be described by two closely related equations, one for the network member j who may affect (“treat”) individual i, and one for individual i herself:

$$E_{j}=\alpha _{j}+\beta _{j}X_{j}+\gamma _{j}Z_{j}+\mu _{j} \tag{1}$$
$$E_{i}=\alpha _{i}+\beta _{i}X_{j}+\gamma _{i}Z_{i}+\mu _{i} \tag{2}$$

Equation 1 relates the sheltering decision \(E_{j}\) of network member j to her own incentives, represented by \(X_{j}\), controlling for her own characteristics \(Z_{j}\). Equation 2 relates the sheltering decision \(E_{i}\) of individual i to her own characteristics \(Z_{i}\) and to the incentive variable \(X_{j}\) of the network member. We will refer to individual i as the “treated” individual and to individual j as the “treating” individual.

In most cases we will use a dummy variable for setting up a tax shelter (i.e., E-firm) as the dependent variable and estimate specifications as linear probability models. Given that we will primarily focus on local effects in small (bounded) neighborhoods of the discontinuity point, this is not particularly restrictive. We will also occasionally investigate the timing of decisions by replacing E with adoption of the shelter in some period \(\tau ,\) \(E^{\tau }\), or using the timing of adoption t directly. Some of these specifications will be estimated using tobit and probit methods to address censoring (not everybody adopts before the deadline) or accommodate periods with very low adoption rates.

The establishment of an E-firm in the family network may directly affect individual i’s likelihood of setting up an E-firm, but other channels are likely to be at work too. Indeed, the fact that a network member is eligible to set up an E-firm may result in the collection of information about costs and benefits, and this information may affect individual i regardless of whether j ends up establishing an E-firm. The mere awareness of the tax shelter option in the network, or even perceptions of social acceptance in the network, can also affect the behavior of individual i. In Appendix A, we therefore provide a simple theoretical framework for interpreting \(\beta _{i}\) and \(\beta _{j}\). A non-zero value of \(\beta _{i}\) implies that social interactions are present. Its magnitude is not “structural”; it measures the responsiveness to this particular shock. Remark 3 in Appendix A (under the assumptions leading up to it) provides a way to guide the interpretation of the ratio \(\frac{\beta _{i}}{\beta _{j}}\): a large \(\beta _{i}\) relative to \(\beta _{j}\) indicates either that the interactions are very strong or that the family members influenced by the recipients of the shock have relatively low awareness of sheltering opportunities.

The most restrictive feature of our estimation equation may seem to be that we include \(X_{j}\) for only a single other individual; we will discuss the interpretation below. As mentioned before, we implement a regression discontinuity design that relies on the feature of the reform that required a newly set-up holding company to own at least 10% of the shares of a firm. Hence, an individual who already owned 10% of the shares in a firm was in a position to pursue this path alone with no additional adjustments, while an individual who owned just below 10% would have to either buy shares or coordinate with others. Consequently, we define \(X_{j}=1(S_{j}\ge 0.1)\), where \(S_{j}\) is individual shareholding in a firm.Footnote 11 Crucially, we have information about the exact number of shares that an individual owns in 2004 as well as the total number of shares in a firm, so that we can (1) construct \(S_{j}\) exactly and (2) do so using information that precedes the reform and hence does not reflect the effect of the reform itself.
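Because share counts are integers, \(S_{j}\) and \(X_{j}\) can be computed exactly. A minimal Python sketch using rational arithmetic (the function names are hypothetical), so that a stake of exactly 10% is never misclassified by floating-point rounding:

```python
from fractions import Fraction


def running_variable(shares: int, total_shares: int) -> Fraction:
    """Exact ownership share S_j minus the 10% cutoff, as a rational number."""
    if total_shares <= 0:
        raise ValueError("total_shares must be positive")
    return Fraction(shares, total_shares) - Fraction(1, 10)


def eligible(shares: int, total_shares: int) -> int:
    """X_j = 1(S_j >= 0.1), computed without floating-point rounding."""
    return int(running_variable(shares, total_shares) >= 0)


print([eligible(s, 1000) for s in (99, 100, 101)])  # [0, 1, 1]
```

Keeping the running variable centered at the cutoff is also convenient for the local regressions around the threshold.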

Our basic comparison is between individuals just below and just above the 10% threshold, corresponding to Eq. 1. While, as stated in Remark 2 of Appendix A, a response of the individual to this incentive is not a necessary condition for the presence of network effects, finding network effects in the absence of such a response would certainly be surprising. Equation 1 is important for a number of other reasons. First, we will investigate subsamples with different propensities to set up an E-firm, and we expect those where the direct effect is strongest to also exhibit stronger network effects. Second, our attempts to provide a structural interpretation of the estimates rely on a comparison of direct and indirect effects.
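The basic comparison can be sketched as a local linear regression discontinuity estimate. The NumPy sketch below assumes a uniform kernel, a hypothetical bandwidth of 2 percentage points, and separate linear trends on each side of the cutoff; the paper's own bandwidths and specifications are described in Sect. 6 and may differ. It returns the estimated jump in the outcome at \(S_{j}=0.1\), an estimate of \(\beta _{j}\) in Eq. 1. The noise-free usage data are invented for illustration.

```python
import numpy as np


def rd_jump(s, e, cutoff: float = 0.10, bandwidth: float = 0.02) -> float:
    """Local linear RD estimate of the jump in E at S = cutoff.

    Within |S - cutoff| <= bandwidth, regress e on [1, X, S - c, X * (S - c)],
    where X = 1(S >= c); the coefficient on X is the estimated discontinuity.
    Uniform kernel; no standard errors.
    """
    s = np.asarray(s, dtype=float)
    e = np.asarray(e, dtype=float)
    dist = s - cutoff
    keep = np.abs(dist) <= bandwidth
    d = dist[keep]
    x = (d >= 0).astype(float)
    design = np.column_stack([np.ones_like(d), x, d, x * d])
    coef, *_ = np.linalg.lstsq(design, e[keep], rcond=None)
    return float(coef[1])


# Synthetic, noise-free adoption probabilities with a 4-point jump at 10%.
shares = np.linspace(0.05, 0.15, 1001)
dist = shares - 0.10
above = (dist >= 0).astype(float)
prob = 0.05 + 0.5 * dist + 0.04 * above + 0.3 * above * dist
print(round(rd_jump(shares, prob), 6))  # 0.04
```

Letting the slope differ on each side keeps a change in the trend from being mistaken for a jump in the level. The same function applies to Eq. 2 by passing the treated individual's outcome \(E_{i}\) with the treating member's shareholding \(S_{j}\) as the running variable.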

Equation 2 specifies how the decision of a “treated” individual (i) is related to incentives (\(X_{j}\)) of her “treating” network member (j). Hence, the comparison is between individuals who happen to have in their networks somebody with just over 10% shares in a firm versus those that have in their networks somebody with just under 10% shares.Footnote 12

There are many characteristics of individuals and networks that may matter in general. The regression discontinuity design allows us to abstract from them as long as they do not change discretely at the 10% threshold. We will investigate this assumption for particular variables and test the sensitivity of the results to including controls. Provided that the assumptions for the validity of the regression discontinuity design hold, controlling for such additional characteristics is not necessary for obtaining unbiased estimates of the effect of \(X_{j}\) on \(E_{i}\).
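Under this continuity assumption, the two coefficients can be read as jumps in conditional means at the threshold. In standard sharp regression discontinuity notation (our restatement, not an equation taken from the paper, with \(\mathbb {E}\) denoting the expectation operator):

$$\begin{aligned} \beta _{j}&=\lim _{s\downarrow 0.1}\mathbb {E}\left[ E_{j}\mid S_{j}=s\right] -\lim _{s\uparrow 0.1}\mathbb {E}\left[ E_{j}\mid S_{j}=s\right] ,\\ \beta _{i}&=\lim _{s\downarrow 0.1}\mathbb {E}\left[ E_{i}\mid S_{j}=s\right] -\lim _{s\uparrow 0.1}\mathbb {E}\left[ E_{i}\mid S_{j}=s\right] . \end{aligned}$$

In both cases the running variable is the shareholding \(S_{j}\) of the treating individual j; only the outcome changes, from j's own sheltering decision to that of the treated individual i.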

We will investigate heterogeneity of the response by splitting the sample along some dimensions (such as history of dividends) and/or including interaction effects.

We also note that since any operationally available definition of a network is intrinsically arbitrary, our measure of the presence of a tax shelter within the network will not be fully accurate if we misclassify individuals as members or non-members of a network. Thus, estimates of \(\beta \) may suffer from attenuation bias if one is interested in the effect of any interactions. As long as the assumptions for the validity of the regression discontinuity design hold, however, the estimates still reflect the average effect of exposure to eligibility for sheltering in a family network. While misclassification is a concern in general, the resulting downward bias makes our task harder but should not lead to spurious findings.
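The attenuation argument can be illustrated with a small simulation. It assumes one specific, hypothetical form of misclassification: with probability `p_wrong`, the person recorded as a network member is not actually linked to i, so her eligibility has no effect on i. All parameter values are illustrative and are not estimates from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
true_beta = 0.04   # hypothetical effect of a truly linked member's eligibility
base_rate = 0.10   # hypothetical baseline take-up
p_wrong = 0.3      # hypothetical share of recorded links that are not real links

# Eligibility X_j of the member recorded as being in i's network.
x_recorded = rng.integers(0, 2, size=n)
# Only truly linked members can affect i's adoption.
truly_linked = rng.random(n) >= p_wrong
p_adopt = base_rate + true_beta * x_recorded * truly_linked
e = (rng.random(n) < p_adopt).astype(float)

# Difference in adoption rates by recorded eligibility.
beta_hat = e[x_recorded == 1].mean() - e[x_recorded == 0].mean()
print(f"estimate {beta_hat:.3f}; attenuated target {(1 - p_wrong) * true_beta:.3f}")
```

The estimate is centered on (1 - p_wrong) × true_beta rather than on true_beta: misclassification shrinks the effect toward zero but does not create an effect where none exists.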

5.1 Unit of observation

Our running variable is defined at the level of the shareholding. A shareholding in a particular firm k of a particular individual j may or may not be eligible for establishing an E-firm, depending on whether it corresponds to a share below 10% or of at least 10%. Any individual may have multiple shareholdings in multiple firms that fall on either side of the threshold.

We want to avoid the assumptions necessary to aggregate such information to the individual level, because aggregation discards potentially useful information and raises practical concerns. For example, the largest share owned by a taxpayer in any firm is also a continuous variable to which the 10% discontinuity applies, but it ignores all smaller shareholdings that also correspond to discontinuous incentives (i.e., some taxpayers who own around 10% of one firm also turn out to own a higher share of some other firm). Various forms of averaging are incompatible with the regression discontinuity design because they blur the running variable, so that incentives no longer change discontinuously in such a measure (e.g., at an average shareholding of 10%).

Hence, we instead usually represent our data at the shareholding level. That is, we treat each (jk) as a separate observation and use a statistical correction (clustering) to account for the dependence arising from multiple observations for the same person. As the window around the threshold narrows, the likelihood that more than one observation per individual is used declines, and the distinction between individuals and shareholdings becomes irrelevant in the limit (and is of small consequence for standard errors in practice).

There is a corresponding issue that relates to the definition of the outcome variable. Setting up an E-firm can be defined at the shareholding level: an individual transfers shares of a particular firm to an E-firm and may choose to do so for some firms but not for others. We show some evidence of the effect at the shareholding level but primarily focus on outcomes defined at the shareholder (i.e., individual) level. That is, our outcome variable \(E_{j}\) indicates whether an individual adopted any E-firm for any of her shareholdings. Hence, the unit of observation is (jk) and the corresponding running variable is \(S_{j,k}\), but the outcome \(E_{j}\) is constant across all k.
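A minimal sketch of this data layout, using hypothetical records: each row is a shareholding (j, k) inside the window, the running variable is centered at the threshold, the cluster identifier is the individual j, and the outcome repeats the individual-level adoption indicator \(E_{j}\) for every shareholding of the same person.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shareholding:
    person: str   # individual j
    firm: str     # firm k
    share: float  # S_{j,k} in 2004


def build_rows(holdings, adopted_any, threshold=0.10, window=0.05):
    """One row per shareholding inside the window; outcome is E_j for all k."""
    rows = []
    for h in holdings:
        if abs(h.share - threshold) <= window:
            rows.append({
                "cluster": h.person,                    # standard errors clustered by j
                "running": h.share - threshold,         # S_{j,k} - 0.1
                "above": int(h.share >= threshold),
                "outcome": int(adopted_any[h.person]),  # E_j, constant across k
            })
    return rows


# Hypothetical data: j1 holds one share below and one above the threshold.
holdings = [
    Shareholding("j1", "k1", 0.095),
    Shareholding("j1", "k2", 0.104),
    Shareholding("j2", "k1", 0.120),
    Shareholding("j3", "k3", 0.300),  # outside the window, dropped
]
adopted_any = {"j1": True, "j2": False, "j3": True}
rows = build_rows(holdings, adopted_any)
```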

In the network context, we want to retain the same structure at the treatment level. The discontinuity is defined at the level of the shareholding of the treating network member, (jk). The corresponding treatment affects all individuals i who are related to the network member j. Because networks overlap, there is no straightforward way of collapsing information to the level of the whole network. Instead, we treat each link (i, (jk)) as a separate observation. As a result, a single shareholding k of person j gives rise to multiple observations, one for each individual in the same network as j. We address the corresponding dependence by clustering standard errors at the j level. Person i may also give rise to multiple observations, corresponding to links with the shareholdings of all her network members, but in practice this is of little concern because it is rare for the same person to have multiple network members with shares falling into the same small interval around the threshold. Finally, as before, we define the outcome variable as setting up any E-firm, so that it is the same for all observations corresponding to individual i.
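The construction of link-level observations (i, (jk)) can be sketched as follows, with hypothetical identifiers and data. Each shareholding of a treating member j is copied once for every family member i of j, and the cluster identifier is j.

```python
from collections import defaultdict


def network_members(family_links):
    """Map each person to the set of people linked to them (undirected links)."""
    members = defaultdict(set)
    for a, b in family_links:
        members[a].add(b)
        members[b].add(a)
    return members


def build_link_rows(family_links, holdings, threshold=0.10):
    """One row per (treated i, treating shareholding (j, k)); cluster on j."""
    members = network_members(family_links)
    rows = []
    for j, k, share in holdings:
        for i in sorted(members[j]):
            rows.append({"treated": i, "treating": j, "firm": k,
                         "above": int(share >= threshold), "cluster": j})
    return rows


family_links = [("i1", "j1"), ("i2", "j1"), ("i2", "j2")]
holdings = [("j1", "k1", 0.104), ("j2", "k2", 0.097)]  # (j, k, S_jk)
rows = build_link_rows(family_links, holdings)
```

Here i2 appears twice, once per treating shareholding; this is the kind of dependence on the treated side that the text argues is rare close to the threshold.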

5.2 Interpretation of the estimated coefficients

As we discussed, the unit of observation for our analysis is the (directed) network relationship, and our baseline specifications (Eqs. 1 and 2) include only \(X_{j}\) rather than the characteristics of all individuals in the family network (\(X_{-j}\)); see also Appendix A. In general, individuals may be influenced by many different network members:

$$\begin{aligned} E_{i}=\alpha _{i}+g(X_{j},X_{-j})+\gamma _{i}Z_{i}+\mu _{i} \end{aligned}$$

Suppose that \(X_{j}\perp X_{-j}\) (i.e., in our regression discontinuity context, being below or above the 10% threshold is uncorrelated across network members) and, counterfactually, that for each i we observe just one randomly selected individual j. In that case, our specification would estimate \(\beta _{i}=\text {E}\left[ \frac{\partial g}{\partial X_{j}}\Big |Z_{i}\right] \), the local average treatment effect of exposing an additional network member to tax sheltering opportunities, with equal weights assigned to all individuals. In our application, though, we include an observation for each network relationship (ij), so that we instead weight relationships, rather than individuals, equally.

This strategy makes estimation with relationship data straightforward and, as long as the assumption \(X_{j}\perp X_{-j}\) holds, it remains an unbiased estimator of the effect of treating an additional relationship (not an individual) in the network.
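A toy numerical example, with hypothetical per-relationship effects, shows how weighting relationships rather than individuals changes the estimand when effects differ between individuals with few and with many family links.

```python
# Hypothetical effects of treating each relationship, grouped by individual i.
effects_by_individual = {
    "a": [0.02],              # one family relationship
    "b": [0.06, 0.06, 0.06],  # three family relationships
}


def individual_weighted(effects):
    """Average over individuals of each individual's mean effect."""
    per_person = [sum(v) / len(v) for v in effects.values()]
    return sum(per_person) / len(per_person)


def relationship_weighted(effects):
    """Average over all (ij) relationships, as in a link-level regression."""
    all_links = [x for v in effects.values() for x in v]
    return sum(all_links) / len(all_links)


print(round(individual_weighted(effects_by_individual), 3))    # 0.04
print(round(relationship_weighted(effects_by_individual), 3))  # 0.05
```

Individual b, with three relationships, receives three times the weight of a in the link-level average. Both averages are well defined; they simply answer different questions.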

6 Regression discontinuity evidence

Our main identification strategy exploits differences in eligibility for setting up an E-firm. As discussed before, the newly created E-firm has to hold at least 10% of the shares of the original firm. Hence, taxpayers who own at least that much can set up an E-firm without further complications, while taxpayers who own less than 10% of the shares have to either buy more or set up an E-firm in cooperation with others. However, an examination of the dataset reveals bunching at many points, which threatens identification through the regression discontinuity approach unless addressed.

6.1 Smoothness of the distribution

A closer inspection reveals that bunching is very systematic: it occurs at points that correspond to splitting the shares of a firm into exact fractions.Footnote 13 Observations at these bunching points are therefore not randomly distributed. They correspond to firms that chose to split ownership in such a regular way, and they may differ from neighboring observations, since splitting shares equally is likely to be correlated with many characteristics of individuals and firms.Footnote 14

Hence, we proceed by eliminating exact fractions from the sample, as explained in Appendix B. The number of observations remaining after this trimming procedure is shown in Fig. 5.Footnote 15 While eliminating exact fractions removes much of the bunching, the figure shows that the density is still not completely smooth around a share of 0.25; our rules for eliminating fractions do not seem sufficient to deal with that bunching. Pre-reform tax rules also provided an incentive to keep active ownership below 2/3 in order to avoid the so-called split model, which taxed part of profits at labor income tax rates. As a consequence, there are many examples of firms that assigned a stake of just over 1/3 to passive owners, often dividing it further in half (e.g., between two children) and hence resulting in shareholdings of just over 1/6; some of the irregularities are likely associated with that. Similarly, predetermined characteristics (measured in 2004) are also quite noisy, but mostly away from the threshold and especially for shares above 1/6 (see Online Appendix Figs. A5, A6, A7 and A8). We draw two conclusions. First, the data around the 10% threshold appear reasonably smooth, and we limit the window around the threshold to at most 0.05 on each side, where the case for smoothness of the distribution is strongest. Second, we test the robustness of the results by controlling for demographic characteristics. We proceed with this subsample in what follows.
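The trimming rules themselves are documented in Appendix B and are not reproduced here. Purely as an illustration, the sketch below uses a hypothetical rule: a holding is flagged as an exact fraction if its share, reduced to lowest terms, has a small denominator (the cutoff `max_denominator = 12` is an arbitrary assumption).

```python
from fractions import Fraction


def is_exact_fraction(shares_owned: int, total_shares: int, max_denominator: int = 12) -> bool:
    """Hypothetical rule: flag shares equal to p/q with q <= max_denominator."""
    share = Fraction(shares_owned, total_shares)  # automatically reduced to lowest terms
    return share.denominator <= max_denominator


holdings = [(250, 1000), (104, 1000), (1, 3), (97, 1000)]
kept = [h for h in holdings if not is_exact_fraction(*h)]
print(kept)  # [(104, 1000), (97, 1000)]
```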

In Sect. 6.3 we will show that the density and predetermined variables are similarly smooth around the threshold in the network analysis.

6.2 The effect of 10% rule on individual adoption

Figure 6 shows individual ownership shares in 2004 (i.e., half a year before the 10% eligibility criterion was introduced) and the fraction of individuals setting up E-firms, by 1% bins (starting at round percentage values, inclusive, e.g., [0.10, 0.11)). The unit of observation for this figure is a shareholding: an individual who owns shares in multiple firms corresponds to multiple shareholdings and hence multiple observations.Footnote 16 The adoption of the E-firm is defined at the individual level. Hence, the figure suggests that individuals who happen to have a shareholding just above the 10% mark are more likely to set up an E-firm (overall, not only for this particular shareholding). The figure illustrates a number of points that will be important below. First, there appears to be a discontinuity at the 10% threshold, but there is also enough variation in the data overall that careful testing is necessary to establish its presence.Footnote 17 Second, this is a “fuzzy” regression discontinuity design: E-firms are created by some individuals below the threshold (by coordinating with others, through additional purchases of shares during 2005, or because of imprecision in the running variable when there is corporate ownership), and take-up is far from universal above the threshold. Imperfect assignment implies that the estimated effects are very likely to be heterogeneous across groups, since take-up depends on incentives, and we will investigate such heterogeneity. Third, the pattern of adoption is nonlinear over the whole support but reasonably linear in the neighborhood of 0.1; adoption increases significantly with the shareholding until it reaches a plateau at around 0.2, above which around 20% of the population adopts (and the data are considerably noisier).
Consequently, we restrict the analysis to a reasonably narrow neighborhood of the discontinuity point (in most cases, subsets of the interval (0.05, 0.15)), where nonlinearity is not an important issue.
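The binning used in Fig. 6 (1% bins, closed on the left and open on the right) can be sketched as follows. The function names are hypothetical, and the small tolerance `eps` is our own assumption: it guards against floating-point division placing a share of exactly 0.29 into the 0.28 bin, since `0.29 / 0.01` evaluates to slightly less than 29.

```python
import math
from collections import defaultdict


def bin_index(share: float, width: float = 0.01, eps: float = 1e-9) -> int:
    """Index k of the half-open bin [k*width, (k+1)*width) containing share."""
    return math.floor(share / width + eps)


def binned_adoption(shares, adopted, width=0.01):
    """Mean adoption rate within each bin, keyed by the bin's lower edge."""
    sums, counts = defaultdict(float), defaultdict(int)
    for s, a in zip(shares, adopted):
        k = bin_index(s, width)
        sums[k] += a
        counts[k] += 1
    return {round(k * width, 2): sums[k] / counts[k] for k in sorted(counts)}


shares = [0.095, 0.099, 0.100, 0.105, 0.29]
adopted = [0, 1, 1, 1, 0]
print(binned_adoption(shares, adopted))  # {0.09: 0.5, 0.1: 1.0, 0.29: 0.0}
```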

Fig. 5 Distribution of ownership shares in 2004 around the 10% threshold

Fig. 6 Ownership share in 2004 and probability of ultimate adoption of the E-firm

Fig. 7 Ownership share in 2004 and probability of ultimate adoption of the E-firm: individual level, excluding fractions

Figure 7 zooms in on the smaller region (0, 0.30), which displays the 10% threshold more clearly (with bins corresponding to 0.01 intervals). It also shows point-wise standard errors of the mean within each bin. The likelihood of taking up an E-firm jumps discontinuously at the 10% point. This is formally investigated in the top panel of Table 3. The baseline regression is a linear probability model of the dummy for taking up an E-firm on an indicator for being at or above the 10% mark in 2004, within a narrow band around the 10% point. The “flexible” controls specification additionally allows for linear (and possibly different) terms in the ownership share on the left- and right-hand sides of the threshold. We show the effect on adoption defined at the shareholding level and (of our main interest) at the shareholder level. The results indicate that the discontinuity is present and statistically significant whether adoption is defined for a shareholding or for an individual.Footnote 18 In particular, our preferred estimates (at the shareholder level, using larger windows around the threshold) indicate that individuals just above the threshold are 4 percentage points more likely to adopt the E-firm, relative to a base of approximately 10.5%, an increase of nearly 40%.
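A minimal sketch of the baseline and “flexible” specifications, assuming a linear probability model estimated by OLS with cluster-robust (CR0, no small-sample correction) standard errors clustered by individual. The function `rd_lpm` is hypothetical, not the authors' code, and the simulated data (a jump of 0.04 on a base of 0.105) only mimic the magnitudes quoted above.

```python
import numpy as np


def rd_lpm(share, adopt, cluster, threshold=0.10, window=0.05, flexible=True):
    """Jump in adoption at the threshold and its cluster-robust standard error."""
    share = np.asarray(share, dtype=float)
    adopt = np.asarray(adopt, dtype=float)
    cluster = np.asarray(cluster)
    keep = np.abs(share - threshold) <= window
    s, y, g = share[keep] - threshold, adopt[keep], cluster[keep]
    above = (s >= 0).astype(float)
    cols = [np.ones_like(s), above]
    if flexible:
        cols += [s, s * above]  # separate linear slopes left and right of 10%
    X = np.column_stack(cols)
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    resid = y - X @ beta
    # Sum the scores X_i * u_i within each cluster (CR0 "meat").
    _, idx = np.unique(g, return_inverse=True)
    scores = np.zeros((idx.max() + 1, X.shape[1]))
    np.add.at(scores, idx, X * resid[:, None])
    vcov = xtx_inv @ (scores.T @ scores) @ xtx_inv
    return beta[1], np.sqrt(vcov[1, 1])


rng = np.random.default_rng(1)
n = 100_000
share = rng.uniform(0.05, 0.15, n)
person = rng.integers(0, n // 2, n)  # some individuals hold several shareholdings
adopt = (rng.random(n) < 0.105 + 0.04 * (share >= 0.10)).astype(float)
jump, se = rd_lpm(share, adopt, person)
print(f"jump {jump:.3f} (se {se:.3f}); relative to base: {0.04 / 0.105:.0%}")
```

The last line also reproduces the arithmetic behind the “nearly 40%” figure: 0.04 / 0.105 ≈ 0.38.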

Because all our regressions are estimated using a shareholding as the unit of observation, Table 3 also shows the number of unique individuals in the sample used for each specification (this is also the number of clusters for standard error estimation). The number of individuals is generally very close to (within 5% of) the number of shareholdings, because it is uncommon for the same person to own shares close to 10% in two different firms.

Table 3 The effect of crossing 10% ownership on E-firm take-up

In the following panel, we pursue basic robustness checks by including a set of individual controls: age, gender, number of individual owners and log capital. Including these controls has a small impact on both estimates and standard errors, providing some comfort that composition differences are not driving the results.

While the evidence that the 10% ownership share matters for the decision to adopt the E-firm is robust, we are primarily interested in using this effect of own eligibility (cf. Eq. 1) to trace its implications in the network (cf. Eq. 2). We are more likely to be able to statistically trace such responses if the effect of own eligibility is strong. We further investigate subsamples in order to zoom in on a group, if any, that is particularly strongly affected.

Since the benefit of setting up an E-firm comes from reduced taxation of capital gains or dividends, individuals and firms that generate capital income should be more likely to adopt. Consistent with this, if we further restrict the sample to shareholdings of individuals who received dividends in 2004 (i.e., pre-reform), the results are noisier but arguably more pronounced (see Online Appendix Fig. A14), and there is no discernible effect for the remainder of the sample (Online Appendix Fig. A15). The formal results are shown in the third panel of Table 3: the magnitude of the effect is larger than for the full sample, so that even though this group includes only about 1/3 of the original sample, the t-statistics are of comparable magnitude (consistent with Online Appendix Fig. A15, there is no robust regression evidence of an effect for those without dividends).

The final panel of Table 3 further restricts the sample to individuals who own shares in firms with over 1000 shares, a group for which the abstraction of “continuous” variation in ownership shares is more realistic. The estimated effects are large and robust despite a much smaller sample size.Footnote 19

Overall, the results in this section clearly demonstrate that the 10% discontinuity played an important role in determining the take-up of E-firms. Those with a share of just over 10% are much more likely to adopt than those just below, and the difference is both economically and statistically large. The effect is heterogeneous: it is present for those most likely to benefit, namely individuals with a history of receiving dividends. While this is intuitive, it also indicates that either the alternative means of setting up an E-firm (coordinating with others or purchasing additional shares) are sufficiently costly or information about the availability of the shelter is lacking, so that otherwise similar individuals with shares below 10% do not adopt E-firms to the same extent. Hence, those who take up E-firms as a result of the treatment would, absent the treatment, have either been uninformed about this option or found coordination too costly.

6.3 Network effects

We now turn to the network-level analysis by analyzing the adoption of E-firms by an individual (i) as a function of the ownership share, around 10%, of a network member (j) in a particular firm (k). As discussed before, for this analysis we use data at the shareholding level, so that each shareholding (j, k) of a treating family network member is a separate observation for each affected individual. We focus on network members in the subsamples for which we showed evidence of a discontinuity in adoption: we exclude network members with fractional shares and further zoom in on those receiving capital income and holding shares in firms with a large number of shares. We do not impose any additional restrictions on individuals (i) themselves: the running variable (ownership share) is a property of the network member, who may affect family members regardless of their characteristics (though we will investigate heterogeneity).

Fig. 8 Family member’s ownership share in 2004 and number of individuals with at least 10% shares: individuals not owning the same firm

Fig. 9 Family member’s ownership share in 2004 and ultimate adoption of the E-firm

Before proceeding further, we want to make sure that when we compare individuals with network members on either side of the 10% threshold, this is the only difference between those groups. Online Appendix Fig. A20 shows, however, that as the network member’s share crosses 10%, the share owned by the individual herself is more likely to be above 10% as well. It turns out that this is driven by family members owning an identical number of shares in the same firm. Hence, in what follows, we restrict attention to network links between individuals who do not own shares in the same firm (this is our \(X_{i}\perp X_{j}\) orthogonality assumption). As Fig. 8 shows, in that subsample the likelihood of having a share above 10% passes smoothly through the threshold. Online Appendix Fig. A9 shows that the number of observations is smooth around the threshold but, similar to what we see in Sect. 6.1, there is some noise, especially as the ownership share exceeds 1/6. Similarly, predetermined characteristics (measured in 2004) are quite noisy, but mostly away from the threshold and especially for shares above 1/6 (see Online Appendix Figs. A10, A11, A12, and A13). These patterns are similar to those discussed for the individual-level analysis in Sect. 6.1, so we again limit the window around the threshold to at most 0.05 on each side and test the robustness of the results by controlling for demographic characteristics. Beyond the necessity of exploiting the discontinuity for identification purposes, restricting attention to network links between individuals who do not own shares in the same firm also has economic content: the interaction between “treating” and “treated” individuals is guaranteed not to take place in the context of the firm, but rather has to flow through other channels.
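The restriction to links between individuals who do not own shares in the same firm can be sketched as a simple filter. The data structures (link triples and a person-to-firms map) are hypothetical. In this sketch a link is dropped if i and j co-own any firm, not only the firm k of the treating shareholding.

```python
def drop_coowner_links(links, firms_of):
    """Keep (i, j, k) links only if i and j hold no shares in a common firm."""
    kept = []
    for i, j, k in links:
        common = firms_of.get(i, set()) & firms_of.get(j, set())
        if not common:
            kept.append((i, j, k))
    return kept


firms_of = {"i1": {"k1"}, "i2": {"k9"}, "j1": {"k1", "k2"}}
links = [("i1", "j1", "k2"), ("i2", "j1", "k1"), ("i3", "j1", "k1")]
print(drop_coowner_links(links, firms_of))  # [('i2', 'j1', 'k1'), ('i3', 'j1', 'k1')]
```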

Figure 9 shows discontinuity-based evidence on the effect of a network member’s eligibility on individual adoption,Footnote 20 and the top panel of Table 4 shows the corresponding estimates. The estimates of the discontinuity are generally significant and reasonably stable as the window around 10% is adjusted. The table also shows the number of unique treating network members j underlying each specification; there are about half as many of them as observations. As in Table 3, this is to a small extent driven by the same person having multiple shareholdings close to 10%. The bulk of the difference is explained by the same network member treating multiple individuals in the network.

Table 4 The effect of crossing 10% ownership by a family network member on E-firm take-up

While the network effect may be present regardless of one’s own ownership, individuals who already own at least 10% are eligible to set up an E-firm without any additional arrangements and hence may be more strongly affected. At the same time, by virtue of their eligibility, they are more likely to set up an E-firm regardless, so that the additional network incentive might be expected to be weaker for that reason. Zooming in on individuals who own at least a 10% share in any firm strengthens the results (right columns in Table 4). The estimates for this group are larger, suggesting that the first effect dominates.

The second panel shows that the results are robust to the inclusion of demographic controls, which leave them essentially unaffected. Overall, we observe that the estimated effect of a network member being eligible (Table 4) is roughly similar in magnitude to the effect of the individual herself being eligible (Table 3). The theory in Sect. 5 and Appendix A indicates that a large network effect relative to the own effect is consistent with either strong interactions or low awareness of sheltering opportunities absent interaction with a treating individual.

Following up on our previous discussion, we further split the sample by whether the network member received dividends in 2004. The bottom two panels of Table 4 show that for those with family members who received dividends, the effects are of the expected sign and not very sensitive to the size of the window or the inclusion of controls.Footnote 21 They become significant as the window around the threshold (and the sample size) grows, and in narrow windows when no controls are included (the linear terms in the ownership share are generally insignificant). The results for those with family members who did not receive dividends are smaller and generally insignificant.Footnote 22

This is consistent with take-up by a family member reflecting the presence of the treatment: since the direct effect on take-up for that group was not detectable, an impact on their family members would be surprising.Footnote 23

Table 5 The effect of crossing 10% ownership by a family network member on E-firm take-up: decomposition of response

In Table 5 we split the sample in additional ways. First, we look at individuals whose treating family members received dividends in 2004 and own shares in firms with over 1000 shares. In this group, the effect of own eligibility (cf. Eq. 1) was strong, and the corresponding results are strong here as well. Then, we split the sample by whether the treated individual herself received dividends in 2004. We find more precise statistical evidence for those who did not receive dividends themselves than for those who did, although the large standard errors do not allow us to reject the hypothesis that the two coefficients are equal. Still, even if the coefficients for those without dividends were similar in absolute value, the base take-up for this group is much lower, and hence the impact is economically much more significant: for example, the estimated effect of 0.04 over the (0.05, 0.15) window for the flexible specification corresponds to roughly doubling take-up. Thus, a very rough taxonomy of the results is that treating individuals with the most to gain (those with dividends) are most responsive to the 10% threshold incentive, but they stimulate take-up by treated individuals with less potential to gain (those without past dividends), who are perhaps otherwise the least informed.

6.4 Effect on timing

In order to further substantiate the presence of network effects, we combine the RD evidence with timing. In Sect. 4, we provided evidence of a strong association between the timing of adoption in the network and individual adoption, but these results, though suggestive, cannot be interpreted as causal. Here we use our regression discontinuity approach to further corroborate the presence of timing effects. In all the following specifications, we use a window of 0.05 around the discontinuity point.

Table 6 The effect of crossing 10% ownership by a family network member on E-firm take-up—distance of the date of adoption to 1/1/2006

In Table 6 we look at the effect on the number of days between the date the tax shelter was established and January 1, 2006, with individuals not establishing the shelter assigned a value of zero. We estimate regression discontinuity specifications as in Sect. 6.3, with the distance of adoption to January 1 as the dependent variable. The OLS specification results in Sect. 4 are positive, but for the most part insignificant for the full and dividend samples. Since the distance to January 1 is effectively censored at zero, these OLS results are biased toward zero. As an alternative, in the following panel we assume normality and estimate the effect on the date of setting up a shelter with a Tobit specification. The Tobit estimates have the same sign as the OLS ones but, consistent with the expected OLS bias, are much larger and statistically significant. The results indicate that having a family member exposed to the 10% rule accelerated take-up of the tax shelter by as much as 20 days; the results are robustly significant for the sample with dividends, smaller for the full sample, and possibly zero (with large standard errors) for those with family members who did not have dividends in 2004.
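A minimal sketch of why the Tobit estimates exceed the OLS ones, using simulated data and a hand-written type I Tobit likelihood (left-censored at zero, normal errors) maximized with SciPy. The data-generating values (a latent shift of 20 days for treated individuals, a noise standard deviation of 30 days) are hypothetical, the function names are our own, and the specification omits the running-variable terms used in the paper. Rescaling the outcome before fitting is an assumption made for numerical conditioning; the Tobit model is scale-equivariant, so the estimates are simply multiplied back.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def tobit_nll(params, y, X):
    """Mean negative log-likelihood of a Tobit model left-censored at zero."""
    beta, sigma = params[:-1], np.exp(params[-1])  # log-sigma keeps sigma > 0
    xb = X @ beta
    cens = y <= 0
    ll_cens = norm.logcdf(-xb[cens] / sigma)  # log P(latent <= 0)
    ll_obs = norm.logpdf((y[~cens] - xb[~cens]) / sigma) - np.log(sigma)
    return -(ll_cens.sum() + ll_obs.sum()) / len(y)


def fit_tobit(y, X):
    scale = y[y > 0].std()  # rescaling improves numerical conditioning
    ys = y / scale
    start = np.append(np.linalg.lstsq(X, ys, rcond=None)[0], 0.0)
    res = minimize(tobit_nll, start, args=(ys, X), method="BFGS")
    return res.x[:-1] * scale, np.exp(res.x[-1]) * scale


rng = np.random.default_rng(2)
n = 20_000
treated = rng.integers(0, 2, n).astype(float)  # family member above 10%
X = np.column_stack([np.ones(n), treated])
latent = -40.0 + 20.0 * treated + 30.0 * rng.standard_normal(n)
days = np.maximum(latent, 0.0)  # non-adopters recorded as zero days

beta_ols = np.linalg.lstsq(X, days, rcond=None)[0]
beta_tobit, sigma_tobit = fit_tobit(days, X)
print(f"OLS {beta_ols[1]:.1f} days, Tobit {beta_tobit[1]:.1f} days")
```

Because most observations sit at zero, the OLS slope is only a fraction of the latent shift of 20 days; the Tobit estimator recovers the latent effect when its normality assumption holds.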

Table 7 The effect of crossing 10% ownership by a family network member on E-firm take-up—timing effects, everyone

When exactly did these network effects materialize? In Table 7 and Online Appendix Tables A2–A4 we focus on results for particular periods. We ask whether adoption during various time periods, either separately (1–30, 31–60, 61–90, 91–120 days before the reform) or cumulatively (overall, more than 30, 60 or 90 days before), is explained by network exposure. The results are presented both for the full sample and for those with at least 10% ownership.
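The separate and cumulative outcome indicators can be constructed from the number of days between adoption and the reform date, as in the following sketch. The function and its labels are hypothetical; non-adopters are coded as `None`.

```python
from typing import Optional

SEPARATE = {"1-30": (1, 30), "31-60": (31, 60), "61-90": (61, 90), "91-120": (91, 120)}


def period_outcomes(days_before: Optional[int]) -> dict:
    """Adoption indicators by period; days_before is None for non-adopters."""
    adopted = days_before is not None
    out = {label: int(adopted and lo <= days_before <= hi)
           for label, (lo, hi) in SEPARATE.items()}
    out["overall"] = int(adopted)  # adopted by the time of the reform
    for cutoff in (30, 60, 90):
        out[f">{cutoff}"] = int(adopted and days_before > cutoff)
    return out


print(period_outcomes(45))
# {'1-30': 0, '31-60': 1, '61-90': 0, '91-120': 0, 'overall': 1, '>30': 1, '>60': 0, '>90': 0}
```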

Focusing on the results for everyone, in Table 7 we report results from probit specifications. The table contains the estimated effect of crossing the 10% discontinuity on the probability of adopting, evaluated at the threshold, as well as the effect on the log probability, to allow for a more meaningful comparison across periods.Footnote 24 The effect is strongest in the second month before the deadline, and it appears both for those with and for those without network members who received dividends. The evidence for the last month is weaker, but it suggests that the effect continued up until the deadline for those with family members who received dividends. The results for three or four months prior to the reform do not indicate an effect, though they are noisy and sometimes counterintuitive, reflecting the small number of individuals taking up early on. Online Appendix Table A2 shows the cumulative effect: the impact on adoption by the time of the reform (the same as our main specification) and by 30, 60 or 90 days pre-reform. Consistent with the month-by-month results in Table 7, the cumulative results indicate that the bulk of the effect is already present 30 days before the reform. The results for those with at least 10% ownership (in the Online Appendix tables) are qualitatively similar.
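Why log probabilities help compare periods can be seen from a probit sketch: the same shift in the probit index produces a much larger proportional change when baseline adoption in a period is rare. The index values and the coefficient of 0.2 are hypothetical, and this is one natural way to compute the two quantities, not necessarily the exact procedure described in Footnote 24.

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_jump(index_below: float, coef_above: float):
    """Change in probability and in log probability when crossing the threshold."""
    p0 = norm_cdf(index_below)
    p1 = norm_cdf(index_below + coef_above)
    return p1 - p0, math.log(p1) - math.log(p0)


for period, index in [("rare early adoption", -2.0), ("common late adoption", -1.0)]:
    d_prob, d_log = probit_jump(index, 0.2)
    print(f"{period}: change in probability {d_prob:.3f}, in log probability {d_log:.2f}")
```

The rarer period shows the smaller change in probability but the larger change in log probability, which is why the log scale makes effects in different months easier to compare.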

Overall, we conclude that this discontinuity-based evidence on timing adds support for the presence of a causal relationship in adopting tax shelters running through the network; as such, it also strengthens our observation from Sect. 4 that the timing dimension of the response is important.

7 Conclusions

We considered the effects of eligibility to adopt a legal tax sheltering strategy in Norwegian family networks. In a descriptive analysis of the timing of take-up of the tax shelter, we find that early adoption in the family network is correlated with later individual take-up. The short-term analysis reveals a significant increase in adoption within the week after a network member set up an E-firm. This finding is robust to controls and provides strongly suggestive evidence that take-up in the network stimulates adoption of the tax shelter.

Relying on a regression discontinuity design in the incentives to adopt, we showed that family members of individuals who had a strong incentive to pursue tax sheltering (and who, in fact, responded accordingly) are more likely to pursue tax avoidance themselves. These patterns are not uniform across groups of individuals. The propensity to adopt at the discontinuity is strongest among individuals who are most likely to benefit (as measured by a history of capital income), and it is their family networks that are affected. At the same time, there is suggestive evidence that the members of family networks who themselves do not have a strong reason to pursue tax avoidance respond most strongly. This is consistent with two possibilities: these individuals are either uninformed, or they face high costs of adoption relative to the benefits, costs that are reduced by having a family member familiar with the process.

More generally, our results provide one of the first empirically well-identified examples showing that tax planning is a social phenomenon affected by what others do. This highlights the importance of accounting for social interactions in understanding enforcement and tax avoidance behavior. Recent work by Pomeranz (2015) highlights that network incentives matter in the VAT context; in our case, however, there is no compliance spillover that may explain our findings: the strategy is legal, and networks are not linked by business interests that could explain correlated behavior. Instead, it is likely that knowledge of the benefits of a tax shelter, reduced costs through free advice from a family member, or norms on the acceptability of utilizing a tax shelter are transmitted within a network. Our evidence of heterogeneous patterns of response points to knowledge and cost as likely channels. More research is needed to explore the external validity of these spillover results for other networks, other tax avoidance strategies, and other countries.

Our findings are also related to the literature on optimization frictions. Because different individuals have different networks, they are effectively exposed to planning opportunities to different degrees (either through knowledge or through costs). As a result, networks are linked to optimization frictions: their importance reveals that, absent exposure in the family, individuals do not necessarily react to all theoretical incentives, and that networks drive heterogeneity in behavior that varies with the extent of exposure. The recent literature in public finance stresses the relevance of optimization frictions, often treated as abstract barriers to optimization, and our work points to one possible direction for understanding when, how and why they might be present and vary.