Social Networks and Tax Avoidance: Evidence from a Well-Defined Norwegian Tax Shelter

In 2005, over 8% of Norwegian shareholders transferred their shares to new (legal) tax shelters intended to defer taxation of capital gains and dividends that would otherwise be taxable in the aftermath of the 2006 reform. Using detailed administrative data, we identify family networks and describe how take up of tax avoidance progresses within a network. A feature of the reform was that the ability to set up a tax shelter changed discontinuously with individual shareholding of a firm, and we use this fact to estimate the causal effect of availability of tax avoidance for a taxpayer on tax avoidance by others in the network. We find that take up in a social network increases the likelihood that others will take up. This suggests that taxpayers affect each other's decisions about tax avoidance, highlighting the importance of accounting for social interactions in understanding enforcement and tax avoidance behavior, and providing a concrete example of “optimization frictions” in the context of behavioral responses to taxation.


Introduction
The standard public finance approach to analyzing tax-influenced economic decisions presumes a well-informed taxpayer who makes rational decisions while understanding the important features of the economic environment. This paradigm has long been considered unsatisfactory in the context of tax evasion, where the standard Allingham and Sandmo (1972) model overpredicts the extent of cheating (see Andreoni et al., 1998; Slemrod and Yitzhaki, 2002, for surveys of the literature).
Recent work also recognizes that empirical behavioral responses are sometimes puzzlingly small and inconsistent across different contexts. For example, Saez (2010), Chetty et al. (2011) and Kleven and Waseem (2013) show that elasticities implied by the number of taxpayers who bunch at the kinks of the income tax schedule are very small, Chetty et al. (2009) and Finkelstein (2009) show evidence consistent with "salience" of tax incentives playing a role, Jones (2012) shows that taxpayers do not adjust withholding to reduce refunds, Chetty et al. (2014) show that only a small number of taxpayers make active saving decisions, and a large literature shows the importance of default options in retirement programs (Madrian and Shea, 2001, and the literature that followed) and imperfect take-up of social benefits (Currie, 2006).
The objective of this paper is to provide empirical evidence regarding a particular class of explanations for tax-motivated behavior. We are interested in testing whether the decision to pursue tax-minimizing behavior spreads within social networks, and in particular, family networks. We find evidence that this is so. Our research design leverages the presence of a discontinuity in a taxpayer's eligibility for setting up a particular (legal) tax shelter. We show that this discontinuity affects own tax avoidance and then we establish that it also affects tax avoidance of taxpayers in the family network who are not on the margin. In other words, similar taxpayers who have similar family networks pursue different decisions as the result of a slight difference in characteristics of one of their family members that discontinuously changes availability of avoidance for that family member (rather than for the individual itself). We interpret this evidence as providing a concrete example of an optimization friction (driven by characteristics of the network) that is responsible for generating heterogeneity in taxpayer behavior with real tax consequences.
One potential direction for reconciling theory and evidence on non-compliance is to provide a more realistic characterization of the economic environment. This is what recent work on tax compliance has done by pointing to the importance of third-party reporting (Kleven et al., 2011), attachment to the financial sector (Gordon and Li, 2009) or arms-length transactions (Kopczuk and Slemrod, 2006) as factors limiting the extent of evasion. While this strand of work suggests that the administrative environment plays a very important role in tax compliance, it does not fully account for the empirical patterns that suggest that taxpayers who face seemingly similar circumstances often make different tax decisions. Indeed, in the strongest piece of evidence so far on the importance of third-party reporting, Kleven et al. (2011) find that while accounting for third-party reporting is extremely important for understanding patterns of compliance, only about 40% of taxpayers who are able to cheat do so.
Beyond attempts to improve characterization of the incentives faced by individuals, a recent development is to postulate the existence of "optimization frictions" that may stop individuals from pursuing otherwise optimal tax adjustments (e.g., Chetty et al., 2011; Chetty, 2012; Kleven and Waseem, 2013). This is a useful abstraction that potentially allows for explaining "inconsistencies" in observed empirical patterns, but it encompasses many possibilities: optimization frictions may be due to behavioral biases, lack of information, monetary or time adjustment costs or non-standard preferences. These varying possibilities might have very different policy implications, so discriminating between them is very important. Furthermore, there are two related but distinct reasons to consider frictions. On one hand, one may be interested in developing a better understanding of individual behavior. On the other hand, frictions are a potential source of heterogeneity in behavior in the population. Our evidence about the relevance of social interactions in the tax avoidance context speaks to both of these lines of thinking: for networks to matter, individual optimization has to depend on their characteristics; at the same time, by their very nature, networks are heterogeneous and hence generate differences in behavior of otherwise similar individuals.
Empirical work on tax avoidance and evasion faces many challenges because the outcomes (whether avoidance or evasion is pursued, and to what extent) are difficult to observe. We can sidestep this problem due to the existence of a well-defined tax shelter that is observable in our data; we provide more details below.
Approximately 8% of Norwegian firm owners adopted this particular tax shelter during the second half of 2005. We can also observe the precise timing of adoption and hence analyze its dynamics.
The existence of this well-defined measure of tax avoidance allows us to overcome measurement issues and focus instead on the determinants of pursuing this type of behavior. The particular reform that we analyze introduced a discontinuity in opportunities to set up this shelter that we exploit through the regression discontinuity design. Our data allows for constructing full family networks that we can then use to identify spillover effects.
To our knowledge, the only other (and concurrent to this work) paper that uses family relations to study the effect of norms and social interactions on the participation in tax minimization is Frimmel et al. (2018). They use Austrian data on claimed commuter tax deductions, where they can actually check the commuting distance and determine whether the deduction was rightfully or wrongfully claimed, the latter constituting tax evasion. They study father-child pairs and find that tax evasion runs in the family. Where Frimmel et al. study the intergenerational transmission of illegal tax evasion behavior, we study how legal tax avoidance behavior spreads within broad family networks.
The plan of the paper is as follows. In the next section, we describe Norwegian tax policy and the reform that gives rise to the research design in this paper and in section 3 we describe our data.
Section 4 is devoted to the empirical strategy. Our main reduced form results are in Section 5, where we present regression discontinuity based evidence of the effect of the 10% rule that is the source of the discontinuity on individual take up, followed by demonstrating the spillover effect in the network. In Section 6 we show the corroborating effect of timing and conclude that the take up in the network accelerates overall take up. Conclusions are in the final section.

The 2006 reform and tax sheltering opportunities
Under the Norwegian dual income tax in effect as of 1992, capital gains realized by both individuals and corporations were subject to the basic tax rate of 28% (that applied also to corporate, capital and labor income). Dividends were tax exempt on both individual and corporate levels. 3 The Shareholder Income tax was first proposed by an advisory committee on February 6, 2003. [...] on the corporate level were treated favorably relative to individual ones. As the result, these changes unambiguously strengthened the incentive to own shares in a firm through another entity rather than directly. Indirect ownership in general allows for separating two decisions: extracting resources from a firm and the ultimate transfer to the individual. Such a separation can have non-tax benefits to the owners such as shielding personal assets from third parties (creditors, family members) in a holding company, as well as tax-related benefits such as tax-free consumption within a (holding) firm without bearing the economic risk associated with the activity of the original firm (see Alstadsaeter et al., 2014, for evidence of this type of tax planning), and arbitrage between personal and corporate taxation. Following the reform, exempting capital gains and dividends for corporate owners creates an additional and very important advantage due to deferral of taxation. From our point of view, the key point is that individuals have a stronger incentive to own firms indirectly after the reform; we of course find prima facie evidence of it being so due to the massive conversions that took place.
3 This structure provided incentives for income shifting toward capital income tax base and to prevent it, the split model (1992-2005) imputed a return to the owners' labor effort in the firm, which was taxed as wage income. The split model applied to sole proprietors and corporations with 2/3 or more of shares held by active owners or where active owners were entitled to 2/3 or more of dividends. The split model and the incentives for income shifting are analyzed by Lindhe et al. (2004), Alstadsaeter (2007) and Thoresen and Alstadsaeter (2010).
4 The risk-free return to the share, the so-called Rate-of-Return-Allowance (RRA), is tax exempt. If received dividends are less than the RRA, the remaining amount is added to the imputation basis of the share for the calculation of future RRAs. The unused RRA is carried forward and added to the imputed RRA in the following year. The share-specific RRA cannot be transferred between different types of shares and only the owner at the end of the year benefits from the calculated RRA for that year. Dividends paid to corporations were tax exempt at the introduction of the model, as were corporations' capital gains from realization of shares. Sørensen (2005) and Alstadsaeter and Fjaerli (2009) provide more information on the Shareholder Income tax.
5 This exemption of capital gains from taxation was implemented without warning on March 26, 2004. Anecdotal evidence that this was not expected by the business community is the fact that one of the nation's richer investors sold shares in a corporation that he owned indirectly through his investment company on March 25, 2004. Christian Sveeas' investment company Kistefoss sold its 6.5% stake in the online price comparison service Kelkoo to Yahoo on March 25, 2004. This resulted in a taxable capital gain of 235 Million NOK, and capital gains taxes of 63 Million NOK or appr. 10 million USD. Had this sales contract been signed one day later, the capital gain would have been tax exempt.
For the existing firms, switching from direct (individual) to indirect (holding company) ownership should in principle require transferring/sale of existing shares and would trigger tax liability. In order to level the playing field between individual and corporate investors, the so-called Transition Rule E was introduced, which under certain conditions enabled an individual to transfer his/her shares in an existing firm to a holding company during 2005 without triggering capital gains tax that would otherwise be due. The Transition Rule E was first proposed on November 19, 2004, and sanctioned on December 10, 2004. It removed capital gains tax liability when an individual shareholder transfers all his shares in a firm to a newly founded corporation, given that this new holding company in the end holds at least 90% of the shares in the transferred company and the compensation is in the form of shares in the new corporation. The new holding corporation had to be founded, and a report sent to the company register, by Dec. 31, 2005. It turned out that this transition rule was restrictive and relatively few shareholders could utilize it, and a more liberal version of the Transition Rule E was proposed on May 13, 2005, and later sanctioned on June 17. Under this new version of the Transition Rule E, the 90% threshold was reduced to 10%: to qualify, the holding company has to hold at least a 10% ownership stake in the transferred corporation. Taking advantage of the Rule required that all shares that an individual owns must be transferred, and that the compensation is in the form of shares in the holding corporation. The transfer or foundation must be reported to the Corporate Register by Dec. 31, 2005. We will refer to a holding corporation that was founded during 2005 in response to the transition rule E as a "tax shelter" or an E-firm.
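The conditions of the liberalized Transition Rule E described above can be written down as a simple check. This is only a stylized sketch of the rule as summarized in the text, not a statement of the legal provisions; the function name `qualifies` and its parameter names are hypothetical.

```python
from datetime import date

# Reporting deadline stated in the text.
DEADLINE = date(2005, 12, 31)


def qualifies(holding_stake, all_shares_transferred, paid_in_holding_shares,
              reported_on, threshold=0.10):
    """Stylized eligibility check for Transition Rule E.

    holding_stake: fraction of the transferred firm held by the new
        holding company after the transfer (hypothetical input).
    threshold: 0.10 under the liberalized rule, 0.90 under the first version.
    """
    return (
        holding_stake >= threshold
        and all_shares_transferred
        and paid_in_holding_shares
        and reported_on <= DEADLINE
    )


# A 12% stake qualifies under the liberalized rule but not under the
# original 90% requirement.
liberal = qualifies(0.12, True, True, date(2005, 11, 1))
original = qualifies(0.12, True, True, date(2005, 11, 1), threshold=0.90)
late = qualifies(0.12, True, True, date(2006, 1, 2))
```

The `threshold` argument makes the difference between the two versions of the rule explicit: the same 12% stake passes at 10% and fails at 90%.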
To summarize: prior to 2004, the incentive to own corporations directly was fairly strong because corporate capital gains were subject to taxation (thereby resulting in multiple layers of taxation before reaching personal owners), while dividends were tax exempt in any case. As of March 2004, neither corporate capital gains nor dividends were subject to tax. As a result, indirect ownership of a firm allowed for deferral of taxation of capital gains until the holding company is sold. The incentive for indirect ownership was significantly strengthened by the introduction of individual-level dividend taxation as of 1/1/2006. For the existing ownership stakes, taking advantage of these deferral opportunities should in principle require realizing capital gains and triggering tax liability, but the Transition E rule provided an opportunity to convert to indirect ownership without the tax.
The main purpose of holding companies set up under Transition E rule appears to be to work as a tax shelter intended to defer taxation and alternatives to achieve the same outcome would be costly.
During 2005, 16,483 holding corporations were set up and approximately 9% of existing non-listed firms at the end of 2004 had at least some of the owners electing to transfer their stake to a holding company. 6 Figure 1 shows the timing of adoption of firms that we classified as being set up under the transition rule. As the figure demonstrates, setting up holding companies was not uniform over time. Adoption was slow at the very beginning and increased rapidly toward the end of 2005, just before the opportunity to take advantage of it expired.
6 Statistics Norway identified new holding corporations set up under the transition rule E by an existing sector [...]

Dataset description
We use very detailed administrative data covering the universe of Norwegian firms, individuals and shareholders. Every resident in Norway is provided a unique personal identifier that is present in all databases, enabling us to follow every individual over time and across datasets. The same holds for firms.
The Shareholder Register contains records of every shareholder of every Norwegian corporation for 2004-2008. Relying on the Shareholder Register, we are able to identify for each person and firm their holdings and, correspondingly, for each firm its owners, whether they are corporate or individual. Relying on this dataset, we select individual shareholders in 2004 who resided in Norway, owned shares of a Norwegian non-listed corporation with fewer than 100 individual owners and are not sole proprietors. 8 In particular, we can also identify holding companies that were set up during 2005 through the sector code assigned to them by Statistics Norway, and determine their ownership structure and holdings. Because we observe this information for a number of subsequent years, we can also trace changes in the ownership structure such as transfers of an existing firm to a holding company. Importantly for our analysis, we know the exact date when each firm (holding companies included) was registered. The resulting sample consists of 318,818 personal shareholders at the end of 2004.
Using other register information we are able to link other characteristics, both demographic (gender, age, marital status, immigrant status, education) and economic (including tax-related information such as gross and taxable income, dividend income, capital gains realizations).
To estimate the effect of a tax shelter being set up in shareholder i's network on the likelihood that the shareholder himself adopts a tax shelter, we need to make operational a definition of the network. In this paper, we focus on a particular and natural choice of an exogenous network: family members. To do so, we identify the following family members of each shareholder in our [...]
8 We can also follow indirect ownership (via other firms) but we opted to not use it for our 2004 running variable (ownership share) because each individual owner of such a pre-existing holding company may not have full control over shares and thus may not be in a position to take advantage of the E-firm rule. Correspondingly, allocating shares owned by a firm to its owners is likely to be somewhat arbitrary and introduce noise in the running variable. Having individuals below the 10% mark also owning shares through individual channels is one possible explanation for significant take up for individuals who were not eligible in 2004.

Basic Framework
Our core econometric framework consists of two equations, one for the individual i and one for the network member j that may potentially affect her. Equation 1 relates sheltering decision E to one's own incentives represented by $X_j$, controlling for own characteristics $Z_j$; this is the first stage. Equation 2 relates the sheltering decision of an individual to his own characteristics $Z_i$ and some characteristics $X_j$ of the network member. In most cases we will use a dummy variable for setting up a tax shelter as the dependent variable and estimate specifications as a linear probability model. Given that we will primarily focus on local effects in small (bounded) neighborhoods of the discontinuity point, this is not particularly restrictive. We will also occasionally investigate the timing of decisions by replacing E with adoption of the shelter in some period $\tau$, $E_\tau$, or using the timing of adoption t directly. Some of these specifications will be estimated using tobit and probit methods to address censoring (not everybody adopts before the deadline) or accommodate periods with very low adoption rates.
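The two equations themselves are not reproduced in this text. As a reading aid only, a linear-probability sketch consistent with the verbal description might look as follows; the intercepts $\alpha$, the coefficient vectors $\gamma$ and the error terms $\varepsilon$ are notation assumed here for illustration, and the exact functional form in the original may differ.

```latex
\begin{align}
E_j &= \alpha_j + \beta_j X_j + \gamma_j' Z_j + \varepsilon_j \tag{1} \\
E_i &= \alpha_i + \beta_i X_j + \gamma_i' Z_i + \varepsilon_i \tag{2}
\end{align}
```

In this sketch $\beta_j$ is the direct (first-stage) effect of the network member's own eligibility, and $\beta_i$ is the spillover effect of that same eligibility on individual i.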
In Appendix A, we provide a simple theoretical framework for interpreting $\beta_i$ and $\beta_j$. A non-zero value of $\beta_i$ implies that social interactions are present. Its value provides an indication of the magnitude of the effect that is not "structural": it measures the responsiveness to the particular shock. Remark 3 (under the assumptions leading up to it) provides a way to guide the interpretation of their ratio $\beta_i / \beta_j$: $\beta_i$ being large relative to $\beta_j$ indicates that either the interactions are very strong or that the awareness of sheltering opportunities of the family members that are influenced by the recipients of the shock is relatively low.
The most restrictive feature of our estimation equation may seem to be due to the fact that we include X j for only a single other individual -we will discuss the interpretation below. As mentioned before, we implement a regression discontinuity design that relies on the feature of the reform that required a newly set-up holding company to own at least 10% of shares of a firm.
Hence, an individual who already owns 10% of shares in a firm was in a position to pursue this path with no additional adjustments, while an individual who owned just below 10% would have to either buy more shares or coordinate with others. Consequently, we define $X_j = \mathbf{1}(S_j \geq 0.1)$, where $S_j$ is the individual's shareholding in a firm. Crucially, we have information about the exact number of shares that an individual owns in 2004 as well as the total number of shares in a firm so that we can (1) construct $S_j$ exactly and (2) do so using information that precedes the reform and hence does not reflect the effect of the reform itself.
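The construction of the running variable and the eligibility indicator from exact share counts can be sketched as below. The data and the names `ownership_share` and `eligible` are hypothetical; the sketch uses exact rational arithmetic so that the comparison with the 10% threshold does not depend on floating-point rounding.

```python
from fractions import Fraction

THRESHOLD = Fraction(1, 10)  # the 10% ownership requirement


def ownership_share(shares_held, shares_total):
    """Exact ownership share S computed from integer share counts."""
    if shares_total <= 0 or not 0 <= shares_held <= shares_total:
        raise ValueError("invalid share counts")
    return Fraction(shares_held, shares_total)


def eligible(share):
    """Treatment indicator X = 1(S >= 0.1)."""
    return int(share >= THRESHOLD)


# Hypothetical 2004 holdings: (person, firm, shares held, total shares).
holdings = [
    ("j1", "k1", 100, 1000),  # exactly 10%
    ("j1", "k2", 99, 1000),   # just below the threshold
    ("j2", "k1", 101, 1000),  # just above the threshold
]
rows = [(p, f, ownership_share(h, t)) for p, f, h, t in holdings]
flags = [eligible(s) for _, _, s in rows]
```

Because only 2004 share counts enter the calculation, the indicator is fixed before the reform, which mirrors the point made in the text.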
Our basic comparison is that of individuals just below and just above the 10% threshold. The first stage corresponds to equation 1. While, as we stated in Remark 2, the response of the individual to this incentive is not a necessary condition for the presence of network effects, a combination of the lack of such evidence with the presence of network effects would certainly be surprising. The first stage is important for a number of other reasons. First, we will investigate subsamples with different propensities to set up an e-firm and expect that those where the direct effect is strongest are also likely to exhibit stronger network effects. Second, our attempts to provide a structural interpretation of the estimates rely on comparison of direct and indirect effects. Figure 2 suggests that individuals with an ownership stake smaller than 10% as of 2004 were less likely to ultimately take advantage of the Transition E rule. In particular, there is evidence of a (statistically significant) discontinuity at the 10% threshold that we will exploit in our regression discontinuity design. We will return to details involved in constructing this figure below.
Equation 2 is the second stage. The decision of an individual is related to incentives ($X_j$) of his network member. Hence, the comparison is between individuals who happen to have in their networks somebody with just over 10% of shares in a firm versus those that have in their networks somebody with just under 10% of shares.
There are many characteristics of the individual and the network that may matter as well in general.
The regression discontinuity design allows us to abstract from them as long as they do not change discretely at the 10% threshold. We will investigate this assumption for particular variables and will test sensitivity of results to including controls. Provided that the assumptions for validity of RD hold, controlling for such additional characteristics is not necessary for obtaining unbiased estimates of the effect of $X_j$.
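The basic comparison around the threshold can be illustrated with a local linear fit on each side of the cutoff. This is a minimal sketch only: it fits separate lines within a fixed window and reports the difference in intercepts at the cutoff, without standard errors, clustering or controls. The function names and the synthetic data are assumptions made for illustration, and the outcomes are synthetic adoption probabilities rather than 0/1 dummies so that the result is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_jump(running, outcomes, cutoff=0.10, bandwidth=0.05):
    """Difference in fitted intercepts at the cutoff (right minus left)."""
    left = [(s - cutoff, e) for s, e in zip(running, outcomes)
            if cutoff - bandwidth <= s < cutoff]
    right = [(s - cutoff, e) for s, e in zip(running, outcomes)
             if cutoff <= s <= cutoff + bandwidth]
    a_left, _ = fit_line([x for x, _ in left], [y for _, y in left])
    a_right, _ = fit_line([x for x, _ in right], [y for _, y in right])
    return a_right - a_left


# Synthetic data with a known jump of 0.04 at the 10% threshold.
shares = [0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12, 0.13]
probs = [0.05 + (s - 0.10) + (0.04 if s >= 0.10 else 0.0) for s in shares]
jump = rd_jump(shares, probs)
```

The same function applies to either stage: for the first stage, pass the network member's own shareholding and own outcome; for the second stage, pass the network member's shareholding together with the outcome of the related individual.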
We will investigate heterogeneity of the response by splitting the sample along some dimensions (such as history of reliance on dividends) and/or including interaction effects.
We also note that since any operationally available definition of a network is intrinsically arbitrary, our measure of the presence of a tax shelter within the network will not be fully correct if we do not properly classify individuals as members of a network. Thus, estimates of $\beta$ may suffer from attenuation bias if what one is interested in is the effect of any interactions. As long as the assumptions for the validity of RD hold, however, the estimates reflect the average effect of exposure to sheltering in a family network. While a concern in general, the downward bias due to mis-classification makes our task harder, but should not lead to spurious findings.

Unit of observation
Our running variable is defined on the level of shareholding. A shareholding in a particular firm k of a particular individual j may or may not be eligible for establishing an E-firm depending on whether it corresponds to less or more than 10% share. Any individual may have multiple shareholdings in multiple firms that may fall on either side of the threshold. We want to avoid assumptions necessary to aggregate such information to the individual level. This is because aggregation disposes of potentially useful information and comes with practical concerns.
For example, using instead the largest share owned by a taxpayer in any firm is also a continuous variable to which the 10% discontinuity applies, but it ignores all smaller shareholdings that also correspond to discontinuous incentives and it turns out to correspond to a small sample size around the 10% threshold (i.e., some taxpayers who own around 10% of a firm also turn out to own a higher share of some other firm). Various forms of averaging are incompatible with regression discontinuity design because they blur the running variable, so that there is no longer discontinuity in incentives of such a measure (i.e., at 10% of average shareholding).
Hence, instead, we usually represent our data on the shareholding level. That is, we are treating each (j, k) as a separate observation and use statistical correction (clustering) to correct for the dependence due to potential inclusion of multiple observations for the same person. As the size of the window around the threshold declines, the likelihood that more than one observation per individual is used declines and the distinction between individuals and shareholdings becomes irrelevant in the limit (and is of small consequence for standard errors in practice).
There is a corresponding issue that relates to the definition of the outcome variable. Setting up an e-firm can be defined on a shareholding level: an individual transfers shares of a particular firm to an e-firm and may choose to do so for some firms but not for others. We will show some evidence of the effect on the shareholding level, but will primarily focus on the outcomes defined on the shareholder level. That is, our outcome variable $E_j$ represents whether an individual adopted any e-firm for any of her shareholdings. Hence, in our first stage regression, the unit of observation is (j, k), the corresponding running variable is $S_{j,k}$ but the outcome is $E_j$, which is constant for all k.
In the network context (second stage), we want to retain the same structure on the treatment level. The discontinuity is defined on the level of the shareholding of the network member, (j, k).
The corresponding treatment affects all individuals i who are related to the person j. Because networks overlap, there is no straightforward way of collapsing information to the whole network level. Instead, we treat each link (i, (j, k)) as a separate observation. As the result, a single shareholding k of person j gives rise to multiple observations for all individuals who are in the same network as j. Conversely, person i gives rise to multiple observations corresponding to links with shareholdings of all her network members. As before, we define the outcome variable as setting up any e-firm so that it is the same for all observations corresponding to individual i. We address the corresponding dependence by two-way clustering of standard errors on the i and j level. The likelihood of including multiple observations for an individual i should vanish as the window around the threshold goes to zero, but the likelihood of including multiple observations corresponding to the network member j does not vanish since the same shareholding with a given value $S_j$ corresponds to multiple observations. Hence, correcting for the dependence on the level of a network member is important even asymptotically.
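The link-level data structure described above can be sketched as follows. All names and values are hypothetical; the point is only to show how one shareholding (j, k) generates several observations, one per related individual i, while the outcome stays constant within i.

```python
# Hypothetical inputs.
shareholdings = {  # person j -> list of (firm k, share S_jk)
    "anna": [("firm1", 0.11), ("firm2", 0.30)],
    "bjorn": [("firm1", 0.09)],
}
family = {  # person i -> set of network members j
    "anna": {"bjorn", "carl"},
    "bjorn": {"anna"},
    "carl": {"anna"},
}
adopted = {"anna": 1, "bjorn": 0, "carl": 1}  # E_i: set up any e-firm


def build_links(family, shareholdings, adopted, cutoff=0.10):
    """One row per link (i, (j, k)): individual i, member j, firm k."""
    rows = []
    for i, members in sorted(family.items()):
        for j in sorted(members):
            for k, share in shareholdings.get(j, []):
                rows.append({
                    "i": i,
                    "j": j,
                    "k": k,
                    "S_jk": share,
                    "X_jk": int(share >= cutoff),
                    "E_i": adopted[i],
                })
    return rows


links = build_links(family, shareholdings, adopted)
# The shareholding ("anna", "firm1") appears once for each relative of anna.
anna_firm1 = [r for r in links if r["j"] == "anna" and r["k"] == "firm1"]
```

Rows sharing the same j (and the same i) are not independent, which is why the text clusters standard errors in both dimensions; the clustering step itself is not shown here.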

Interpretation of the estimated coefficients
As we discussed, the unit of observation for our analysis is the (directed) network relationship and our baseline specifications 1 and 2 control for $X_j$ only, rather than for characteristics of all individuals in the family network. In general, individuals may be influenced by many different network members [...]. Suppose that $X_j \perp X_{-j}$ (that is, in our RD context, the likelihood of being below/above the 10% threshold is uncorrelated in the network) and, counterfactually, that for each i we observe just one randomly selected individual j. In that case, our specification would estimate $\beta_i = E\left[\partial g / \partial X_j \mid Z_i\right]$, the local average treatment effect of exposing an additional network member to tax sheltering opportunities, with equal weights assigned to all individuals. In our application though, we include an observation for each network relationship (i, j) so that, instead, we weigh equally relationships rather than individuals.
This strategy makes it straightforward to pursue estimation using relationship data and, as long as the assumption $X_j \perp X_{-j}$ holds, it remains an unbiased estimator of the effect of treating an additional relationship (not an individual!) in the network.
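The difference between weighting individuals and weighting relationships can be seen in a small numerical illustration. The effect sizes and link counts below are invented for the example and carry no empirical content.

```python
# Hypothetical individual-level effects and number of relationships each
# individual contributes to the link-level sample.
effects = {"i1": 0.02, "i2": 0.06}
n_links = {"i1": 1, "i2": 3}

# Equal weight per individual.
by_individual = sum(effects.values()) / len(effects)

# Equal weight per relationship: individuals with more links count more.
by_relationship = (
    sum(effects[i] * n_links[i] for i in effects) / sum(n_links.values())
)
```

Here the individual-weighted average is about 0.04, while the relationship-weighted average is about 0.05, because the individual with three links receives three times the weight.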

Regression discontinuity evidence
Our main identification strategy exploits differences in eligibility for setting up an E-firm. As discussed before, the newly created e-firm has to hold at least 10% of shares of the original firm.
Hence, taxpayers who own at least that much can set up an e-firm without further complications, while taxpayers who own less than 10% of shares have to either buy more or set up an e-firm in cooperation with others. Recall that Figure 2 shows individual ownership share in 2004 (i.e., half a year before the 10% eligibility criterion was introduced) and the fraction of individuals setting up e-firms by 1% bins (starting at round percentage values, inclusive, e.g. [0.10, 0.11)) for a subsample of individuals that we describe below. As discussed before, the unit of observation for this figure is a shareholding: an individual who owns shares in multiple firms corresponds to multiple shareholdings and hence multiple observations. The adoption of the transition E firm here is defined as having set up an E-firm corresponding to any shareholding (rather than for the one associated with the observation). Hence, the figure suggests that individuals who happen to have a shareholding that inches just above the 10% mark are more likely to set up an E-firm (overall, not just or solely for this particular shareholding). The figure illustrates a number of points that will be important below. First, there is an appearance of discontinuity at the threshold but there is also enough variation in the data overall that careful statistical testing is necessary to establish its presence. Second, it is a "fuzzy" RD design: E-firms are created by some individuals below the threshold (by coordinating with others, through additional purchases of shares during 2005 or because of imprecision in the running variable if there is corporate ownership) and take up is far from universal above the threshold. Imperfect assignment implies that the effects are very likely to be heterogeneous across different groups and we will investigate such heterogeneity.
Third, the pattern of adoption is nonlinear over the whole support but reasonably linear in the neighborhood of 0.1; adoption increases significantly with shareholding until it reaches a plateau at around 0.2 share above which around 20% of the population adopts (and the data is considerably noisier).
There is also some evidence that adoption may be declining at higher ownership levels, far from the threshold. Consequently, we will restrict analysis to a reasonably narrow neighborhood of the discontinuity point (in most cases, subsets of the interval (0.05, 0.15)) where nonlinearity is not an important issue. The full distribution is "lumpy" in many places including the threshold itself, and in order to convincingly employ the regression discontinuity approach, the distribution of individual characteristics should be smooth around the threshold. A closer inspection reveals that bunching is very systematic: it occurs at points that correspond to splitting shares of the firm as exact fractions.
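The 1% bins used to summarize adoption by ownership share (left-inclusive, e.g. [0.10, 0.11)) can be computed directly from integer share counts, which avoids rounding problems at bin edges. This is a sketch with invented observations; `bin_index` and `adoption_by_bin` are hypothetical helper names.

```python
from collections import defaultdict


def bin_index(shares_held, shares_total):
    """Index b of the bin [b/100, (b+1)/100) containing held/total."""
    return (100 * shares_held) // shares_total


def adoption_by_bin(observations):
    """observations: iterable of (shares_held, shares_total, adopted 0/1)."""
    counts = defaultdict(lambda: [0, 0])  # bin -> [adopters, observations]
    for held, total, adopted in observations:
        b = bin_index(held, total)
        counts[b][0] += adopted
        counts[b][1] += 1
    return {b: a / n for b, (a, n) in sorted(counts.items())}


# Invented shareholding-level observations.
sample = [
    (99, 1000, 0),   # 9.9% falls in bin 9
    (95, 1000, 1),   # 9.5% falls in bin 9
    (100, 1000, 1),  # exactly 10% falls in bin 10 (left-inclusive)
    (109, 1000, 0),  # 10.9% falls in bin 10
    (29, 100, 1),    # 29% falls in bin 29
]
rates = adoption_by_bin(sample)
```

Integer arithmetic is used because a floating-point product such as `0.29 * 100` can fall slightly below 29 and land the observation in the wrong bin.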

Smoothness of the distribution
Thus, for one, non-randomly distributed observations at bunching points differ from others because they correspond to firms that choose to split ownership in such a regular way and it is possible that observations that are bunched at these selected points are not similar to the neighboring ones -splitting shares equally is likely to be correlated with many characteristics of individuals and firms. 11 Hence, we proceed by eliminating exact fractions from the sample as explained in the appendix.
The outcome of this trimming procedure in terms of the number of observations is shown in Figure 3. The procedure is necessary to apply the regression discontinuity approach. It introduces a natural limitation for the interpretation of our results: we are focusing on a subsample, so that the estimated effects are for the corresponding population only. We want to re-emphasize though that the procedure relies on a systematic selection rule based on a pre-existing variable, so that it does not depend on the effect of any reform. The procedure does of course change the composition of the sample -that is precisely its objective -but we expect that the resulting subsample satisfies the necessary conditions for the RD design.
We proceed with the subsample defined in that way in what follows. Figure 2 that we have discussed before is based on this sample. Appendix Figure A4 shows the same on the shareholding level. In this sample, a number of characteristics -age, number of owners in a firm, gender and log of the initial capital -all change fairly continuously around the 10% mark (see Appendix Figures A5, A6, A7 and A8). The conditional mean of age and gender is noisy but this is so mostly away from the threshold. An inspection of the density demonstrates that it is still not completely smooth around the 0.25 share -our rules for eliminating fractions do not seem sufficient for dealing with that bunching. Tax rules pre-reform also provided an incentive to have active ownership below 2/3 in order not to be subject to the so-called split model that taxed part of profits at labor income tax rates -as a consequence, there are many examples of firms that assigned just over a 1/3 stake to passive owners, in particular often dividing it further in half (e.g., between two children) and hence resulting in shareholdings of just over 1/6th -some of the irregularities are likely associated with that. We draw two conclusions. First, the data around the 10% threshold appear reasonably smooth and we will limit the window around the threshold to at most 0.05 on each side, where the case for smoothness of the distribution is strongest. Second, we will test robustness of the results by controlling for demographic characteristics.
Estimates of the effect of the 10% rule on individual adoption indicate that the discontinuity is present and highly statistically significant both when adoption is defined for a shareholding and when it is defined for an individual. 12 In particular, our preferred estimates (on the shareholder level, using larger windows around the threshold) indicate that individuals just above the threshold are 4 percentage points more likely to adopt the E-firm, relative to a base of approximately 10.5 percent -a nearly 40% increase.
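The discontinuity estimate described above can be illustrated with a minimal sketch. It assumes a local linear specification with separate slopes on each side of the threshold, fitted inside the (0.05, 0.15) window; the function names `linear_fit` and `rd_jump` are hypothetical, and the exact specifications behind Table 2 may differ.

```python
def linear_fit(xs, ys):
    """Closed-form simple OLS of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_jump(shares, adopted, cutoff=0.10, window=0.05):
    """Jump in adoption at the cutoff from two local linear fits.

    shares  : ownership shares (the running variable)
    adopted : adoption outcomes (0/1 dummies or adoption rates)
    The running variable is centered at the cutoff, so each intercept is
    the fitted adoption level at the threshold on that side.
    """
    left = [(s - cutoff, a) for s, a in zip(shares, adopted)
            if cutoff - window < s < cutoff]
    right = [(s - cutoff, a) for s, a in zip(shares, adopted)
             if cutoff <= s < cutoff + window]
    level_left, _ = linear_fit(*zip(*left))
    level_right, _ = linear_fit(*zip(*right))
    return level_right - level_left
```

With the magnitudes reported in the text, a jump of 0.04 on a base of about 0.105 corresponds to a relative increase of roughly 38%.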
In the following panel we pursue basic robustness checks by including a set of individual controls that we investigated before -age, gender, number of individual owners and log capital. Inclusion of these additional controls has a small impact on both estimates and standard errors, providing some comfort that composition differences are not driving the results.
While the evidence that the 10% ownership share matters for the decision to adopt the transition E rule is robust, we are primarily interested in it in order to use it as "first stage" and trace its implications in the network. We are more likely to be able to statistically trace such responses if the first stage effect is strong. Hence, we further investigate subsamples in order to zoom in on a group, if any, that is particularly strongly affected.
Since the benefit of setting up an E-firm is due to reduction in taxation of capital gains or dividends, individuals and firms that generate capital income should be more likely to adopt. Hence, if we further restrict the sample to those shareholdings of individuals who received dividends in 2004 (i.e., pre-reform), results are noisier but arguably more pronounced (see also Appendix Figure A14), and there is no discernible effect for the remainder of subsample (Appendix Figure A15). The third panel of Table 2 tests formally that the robust effect is there for those with dividends in 2004 and that the magnitude of the effect is much larger than for the full sample so that despite this group including only about 1/3 of the original sample the t-statistics are of comparable magnitude (consistently with the Appendix Figure A15, there is no robust evidence of an effect for those with no dividends; the results are not reported).
The following panel imposes an additional restriction on the sample by limiting it to those individuals who own firms that have over 1000 shares -a group for which the abstraction of "continuous" variation in ownership shares is more realistic. Figure 5 shows the likelihood of adoption on an individual level for that sample, and we see a jump at the threshold. The last panel of Table 2 shows that the effects are large and robust. 13

Our final piece of evidence on the individual level relates to how e-firms are set up. Individuals can set up an e-firm either on their own or with others and the 10% rule makes it easier to pursue the latter. Figures 6 and 7 show that the effect is very clearly driven by setting up E-firms on one's own, with little evidence that there is any decline in setting up e-firms with others.
Overall, the results in this section clearly demonstrate that the 10% discontinuity played an important role in determining take up of E-firms. Those with just over 10% share are much more likely to do so than those below and the difference is both economically and statistically large. The effect is heterogeneous. It is there for those who are most likely to benefit from it -individuals who have the history of receiving dividends. While this is intuitive, it also indicates that either alternative means of setting up an e-firm (coordinating with others or purchasing additional shares) are costly enough or that the information about availability of the shelter is not there, so that those below 10% share who are otherwise similar do not end up taking up an e-firm. The e-firms stimulated by the 10% rule are single owner ones, with no evidence of crowdout of multiple-owner e-firms, suggesting that setting up an e-firm with others was not the alternative entertained by the population complying with the treatment. Hence, those that take up e-firms as the result of the treatment would have either been uninformed about this option or found coordination too costly in the absence of the treatment.

13 Appendix Figure A16 shows no jump at the threshold for the remaining individuals owning firms with less than 1000 shares. Restricting the sample to just those with over 1000 shares, with no dividends-in-the-past restriction, also strengthens results relative to the original sample.

Network effects
We now turn to the network level analysis by analyzing take up of E-firms of an individual (i) as a function of ownership of a network member (j). As discussed before, for this analysis, we focus on the data on the shareholding level so that each shareholding of a family network member is a separate observation affecting the impacted individual. We limit attention to network members who fall into subsamples in which we showed evidence of a discontinuity in adoption: we exclude network members with fractional shares, and further zoom in on those receiving capital income and in firms with large number of shares. We do not impose any additional restrictions on individuals (i) themselves -the running variable (ownership share) is the property of the network member and she may affect family members regardless of their characteristics (though we will investigate heterogeneity).
Before proceeding further, we want to make sure that when we compare individuals with network members on either side of the 10% threshold, this is the only difference between those groups. Figure A17 shows though that as the network member's share is crossing 10%, the share owned by the individual itself is more likely to be above 10% as well. It turns out that this is driven by family members owning identical number of shares in the same firm. Hence, in what follows, we restrict attention to network links between individuals who do not own shares in the same firm (this is our $X_i \perp X_j$ orthogonality assumption). As Figure 8 shows, in that subsample the likelihood of having a share above 10% sails smoothly through the threshold. We restrict attention to this subsample in what follows. Beyond the necessity of imposing this restriction to exploit discontinuity for identification purposes, it also has economic content: the interaction between "treating" and "treated" individuals is guaranteed not to take place in the context of the firm, but rather has to flow through other channels. 14

Figure 9 shows the discontinuity-based evidence of adoption elsewhere in the network on individual adoption, and top panel of Table 3 shows the corresponding estimates. The estimates of the discontinuity are generally significant and reasonably stable as the window around 10% is adjusted.
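The sample restriction to network links between individuals with no firm in common can be sketched as a simple filter. This is an illustration only: the data layout (pairs of identifiers and a mapping from person to the set of firms held) and the name `links_without_common_firm` are assumptions, not the structure of the administrative data.

```python
def links_without_common_firm(links, holdings):
    """Keep (i, j) network links where i and j hold shares in no common firm.

    links    : iterable of (individual_id, network_member_id) pairs
    holdings : dict mapping a person id to the set of firm ids
               in which that person owns shares
    """
    kept = []
    for i, j in links:
        firms_i = holdings.get(i, set())
        firms_j = holdings.get(j, set())
        # A link survives only if the two portfolios do not overlap.
        if firms_i.isdisjoint(firms_j):
            kept.append((i, j))
    return kept
```

A link is dropped as soon as the two portfolios share a single firm, which is stricter than dropping only pairs with identical holdings in that firm.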
Zooming in on individuals who own at least 10% ownership share in any firm strengthens the results.
While the network effect may be present regardless of one's own ownership, individuals who already own at least 10% are already eligible for setting up an E-firm without any additional arrangements and hence may be more strongly affected. At the same time, by virtue of their eligibility, they are more likely to set up an E-firm regardless so that the additional network incentive might be expected to be weaker for that reason. The effects for this group are larger, suggesting that the first effect dominates.
The second panel shows robustness of the results to inclusion of demographic controls -they are essentially unaffected.
Following up on our previous discussion, we further split the sample by whether the network member received dividends in 2004. Figure 10 shows suggestive evidence of discontinuity when the family member received dividends -this is the group for which the first stage was strong. At the same time, Figure 11 shows no evidence of a discontinuity for the rest of the sample. The bottom two panels of Table 3 show the corresponding estimates that confirm these impressions. That the results for those with family members who have not received dividends are generally insignificant is consistent with the interpretation of take up by a family member reflecting the presence of the treatment: since the direct effect on take up for that group was not detectable, one should not expect that their family members are affected. 16 In Table 4 we split the sample in additional ways. First, we look at those with family members with dividends in 2004 and firms with over 1000 shares. In this group, the first stage was strong and the corresponding results are strong here as well. Then, we split the sample by whether the treated individual itself received dividends in 2004. We find much stronger statistical evidence for those who did not receive dividends themselves than for those who did. The coefficients for those without dividends are larger in absolute value despite the lower base and hence are also economically very significant -for example, the estimated effect of 0.04 for the flexible specification corresponds to roughly doubling the take up. A rough taxonomy of the results is that individuals with most to gain (those with dividends) are most responsive to the 10% threshold incentive, but they stimulate take up by individuals who have less potential to gain (those without past dividends) and so perhaps least informed otherwise.
15 Appendix Table A1 shows the results from the specification that pools network links with and without dividends, but includes a dummy for the network member having received dividends, its interaction with crossing the threshold and ownership share controls restricted to be the same across groups -this restriction strengthens the results. 16 As we discussed in the context of the model in Appendix A, in principle the treatment may have an effect on family members even when it does not affect the decision of the treated individual itself. In particular, those without dividends may choose not to take up the shelter but having been given an opportunity to do so may now be in a position to inform others. Although the network results for that subsample are for the most part insignificant, they are consistently positive and fairly stable as the window around the discontinuity point widens.
In order to further substantiate the presence of network effects, we investigate a different dimension -timing. We will return below to regression discontinuity based evidence but begin by providing suggestive evidence of a strong association between timing of adoption in the network and individual adoption. Figures 12-15 illustrate the dynamics of setting up tax shelters in the data. Figure 12 shows the adoption of the tax shelter by individuals with ("exposed") and without ("not exposed") a family member setting up a tax shelter prior to June 17, 2005 (i.e., those who were able to meet the tight eligibility criteria: one needed a 90% ownership stake to be transferred to the e-firm). Exposed individuals end up approximately 6 percentage points more likely to eventually set up a tax shelter. Figure 13 represents the same information as the CDF of the timing of adoption conditional on ultimately setting up the tax shelter and it shows that even conditional on ultimate adoption there are differences in timing -those who have exposed family members adopt earlier than others.
There are few adoptions during the early period: as also visible in Figure 1, the timing of adoption is heavily concentrated toward the end of the period. Figures 14 and 15 show an analogous exercise but this time splitting the sample according to having a family network adopter prior to December 1st, 2005 (the date is arbitrarily selected for illustrative purposes; close to half of all of the ultimate E-firms have been set up by that point). As before, individuals in exposed networks are more likely to ultimately set up a holding company and they do so earlier than those who have no family members who already set up an E-firm. Perhaps because of the relatively short period of time left before the deadline, it is a bit harder to make the claim that the potential network effect wears off over time, although Figure 15 appears consistent with the two series converging a few days before the end of December. These patterns cannot be interpreted as causal but they do suggest that there is correlation between adoption by network members in the past and the individual's own adoption of the tax shelter. They also suggest that there may be an effect on timing: individuals in networks with early adopters are not just more likely to adopt in general, they also tend to adopt earlier than others. Table 5 shows the result of regressing the E-firm adoption dummy on the indicator for having somebody in the family network adopting by a particular date, with various sets of controls. Only the coefficient on the network dummy is reported and each cell corresponds to a different regression.
The first panels show the results of regressing the dummy for ever setting up an e-firm on the dummies for having somebody in the network setting up by June 17, November 1 and December 1.
Consistently with the graphs discussed before, the results of baseline regressions with no controls show a strong effect in each case. In the second column, we control for a number of demographic characteristics: gender, immigrant dummy, urban dummy, self-employment status, education dummies, business/law education dummy, number of children and age dummies (decades). Including these controls does not have a strong effect on the estimated coefficient although many of them are individually very significant (not reported). The final column shows the effect of including economic controls: logarithms of total income, net worth, capital income and 2004 dividends. Inclusion of these variables reduces the estimated network coefficients but they do retain statistical significance.
This indicates that early take up in one's network correlates with individual economic characteristics that are relevant to take up decisions, but that it works beyond them.
These results suggest, perhaps unsurprisingly, that early adoption in the network is correlated with individual economic circumstances. At the same time, expecting that adoption by a family member before June or even November makes the difference for ultimate adoption may be somewhat of a stretch: the effect may be on timing rather than ultimate adoption, especially if members of networks adopting early are likely to adopt in general. To pursue this further in a rudimentary way, we note in the following two panels that adoption before December 1st is more robustly explained by family network adoption before November 1st and adoption before November 1st appears correlated with family network adoption pre-June 17. Especially in the latter case, the effect of economic controls on the estimated coefficient is weakened. This is consistent with the coefficient on early adoption picking up the effect of inducement over the short horizons, but at least partially reflecting the effect of correlation of early adoption in networks with economic characteristics that ultimately matter over a longer horizon. At the same time, it is interesting to note that demographic characteristics (while individually significant) do not seem to be correlated with early adoption. Overall, these results suggest that while adoption of an e-firm is also correlated with many demographic characteristics, it does not seem that correlation of early adoption in the family networks is related to these factors.
At the same time, it appears that the link between adoption and the network is less sensitive to the inclusion of controls as the horizon is reduced. This is intuitive: the impact of having someone in the network adopting should be on timing first of all and while the effect may persist in the longer term, it's possible that it's hard to distinguish from the effect of other characteristics correlated with early adoption.
This motivates our subsequent strategy that is more careful about timing. We regress the dummy for taking up an e-firm in a particular week on having somebody in the network take up a week before. Under this strategy, the interest is in the timing of adoption rather than the longer term effect. It's possible that taxpayers in the network are exposed to the same shocks (for example news) at the same time, but it is harder to make the case that individuals would happen to make similar decisions at similar time based purely on correlation in characteristics that are constant over time absent common shocks or interactions. Figure 16 shows the results for family network based on simple OLS regressions. This is again a linear probability model and this is a hazard-like context. Week 1 corresponds to the last week before 1/1/2006 (and the right-hand side variable is adoption a week before that) and higher numbers correspond to earlier adoption. The figure shows the baseline effect (constant) that represents adoption of the tax shelter by individuals with no exposure in the family network in the preceding week and the effect of those who were exposed last week (the sum of the constant and the coefficient on the exposure dummy), together with the 95% confidence interval for the latter. There is a significant effect for the last eight weeks of the year and some weeks before that. At the longer horizon, the effect is gone. It is possible that a week is in the right ballpark of the timing of inducement effect late in the game, but is too short of a period earlier when there is no reason to rush. Figure 17 compares the baseline estimated coefficients to the coefficients based on a specification with the set of demographic and economic controls as before. Contrary to the prior analysis that used wide range of adoption, the controls have very little impact on the magnitude of the effect.
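The week-by-week regression without controls can be sketched as follows. With a single 0/1 regressor, OLS in a linear probability model reduces to a comparison of group means: the constant is the adoption rate among the unexposed and the coefficient is the exposed-minus-unexposed difference. The function name and input layout are assumptions for illustration, and the choice of who enters the sample in a given week is left to the caller.

```python
def weekly_exposure_effect(adopted_this_week, exposed_last_week):
    """OLS of a 0/1 adoption dummy on a 0/1 exposure dummy for one week.

    Returns (constant, exposure_coefficient). Both groups must be non-empty.
    """
    pairs = list(zip(adopted_this_week, exposed_last_week))
    exposed = [a for a, e in pairs if e]
    unexposed = [a for a, e in pairs if not e]
    # Constant: adoption rate with no exposure in the preceding week.
    constant = sum(unexposed) / len(unexposed)
    # Coefficient: difference in adoption rates between the two groups.
    coefficient = sum(exposed) / len(exposed) - constant
    return constant, coefficient
```

The quantity plotted for exposed individuals in the text is the sum of the two returned values.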
This strengthens the possibility that these estimates have causal interpretation and that they don't simply reflect correlation in characteristics. Finally, Figure 18 shows the effect of adoption in the last 3 weeks rather than 1 week. These estimates are smoother and more robustly extending further out, but a bit smaller, suggesting that the recent take up has stronger effect than take up further out. These results are suggestive, but cannot be interpreted as causal. We use our regression discontinuity approach to further corroborate the presence of interesting timing effects. In all the following specifications we focus on the 0.05 window around the discontinuity point.
In Table 6 we look at the effect on the number of days before 1/1/2006 when the tax shelter was established, with individuals not establishing the shelter assigned a zero value. The OLS specification results in positive, but for the most part insignificant coefficients for the full and dividend samples. Since timing is a censored variable, these results are biased downward. As an alternative, in the following panel we make the normality assumption and estimate the effect on the date of setting up a shelter via a Tobit specification. The Tobit estimates have the same sign as the OLS ones but, consistently with the expected OLS bias, are much larger and statistically significant. The results indicate that having a family member exposed to the 10% rule accelerated take up of the tax shelter by as much as 20 days; the results are robustly significant for the sample with dividends, smaller for the full sample, and possibly zero (with large standard errors) for those with family members who did not have dividends in 2004.
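For reference, the censored-regression setup behind the Tobit estimates can be written out as a sketch. It assumes the standard Tobit model with censoring from below at zero, where $y_i$ is the number of days before 1/1/2006 on which the shelter was set up (zero for non-adopters) and $x_i$ collects the threshold indicator and ownership-share controls; the text does not spell out the exact specification, so the notation here is an assumption.

```latex
\begin{align*}
y_i^{*} &= x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2), \\
y_i &= \max(0,\, y_i^{*}), \\
\log L(\beta, \sigma) &= \sum_{i:\, y_i = 0} \log \Phi\!\left(-\frac{x_i'\beta}{\sigma}\right)
  + \sum_{i:\, y_i > 0} \left[ \log \phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right) - \log \sigma \right].
\end{align*}
```

Here $\Phi$ and $\phi$ are the standard normal distribution and density functions. The first sum accounts for the mass of non-adopters at zero, which is the feature that OLS ignores.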
In Appendix Tables A2-A5 we focus on results for particular periods for the full sample and those with at least 10% ownership. Focusing on the results for everyone, in Appendix Table A2 we report results from probit specifications. The table contains an estimate of the effect on the probability of adopting at the threshold and the effects on log probability to allow for more meaningful comparison across different periods. 17 The effect is strongest in the second month before the deadline and it appears to be there for both those with and without network members who received dividends.
The evidence of the effect in the last month is weaker and inconsistent across specifications. The results for three or four months prior to the reform do not indicate an effect though they are noisy and sometimes counterintuitive, reflecting a small number of individuals taking up in this period.
Appendix Table A3 shows the cumulative effect -impact on adoption by the time of the reform (same as our main specification) and by 30, 60 or 90 days pre-reform. For the full sample, the results indicate that the bulk of the effect is already there 30 days before the reform. For those with family members who have received dividends, the effect appears to be continuing until the deadline.

Conclusions
We considered adoption of a legal tax sheltering strategy in Norwegian family networks. Relying on a regression discontinuity design in the incentives to adopt, we showed that family members of individuals who had a strong incentive to pursue tax sheltering (and who, in fact, responded accordingly) are more likely to pursue tax avoidance themselves. This is further corroborated by the evidence from the timing of responses. These patterns are not uniform across different groups of individuals. The propensity to adopt at the discontinuity is strongest among individuals who are most likely to benefit (as measured by history of capital income) and it is their family networks that are affected. At the same time, it is those members of family networks who themselves do not have a strong reason to pursue tax avoidance that respond most strongly. This is consistent with two possibilities: either these are uninformed individuals, or they face a high cost of adoption relative to benefits and this cost is reduced by having a family member familiar with the process. (2015) highlights that network incentives matter in the VAT context; in our case, however, there is no compliance spillover that may explain our findings -the strategy is legal and networks are not linked by business interests that could explain correlated behavior. Instead, it is knowledge, reduced costs of planning or norms that need to be transmitted within a network. Our evidence of heterogeneous patterns of response points to knowledge and cost as likely channels.
[Figures omitted. Recoverable axis labels: "Family member's ownership share (individuals not owning the same firm)", "% with ownership share above 10%", "Effect on take up". Recoverable legend entries: "Exposed", "Not exposed", "With controls", "Without controls", "95% CI with controls".]

A Theoretical framework
We are interested in understanding how adoption of tax avoidance strategies within a network affects the individual shareholder's uptake of such strategies. Consider individuals j and i who are linked in the family network. We are interested in the determinants of the decision to adopt an e-firm, $E_i$, that we presume is determined by a latent variable $\tilde{E}_i$, $E_i = I(\tilde{E}_i > 0)$ (where $I$ is an indicator function). We assume that $\tilde{E}_i = f_i(K_i, X_i) + \varepsilon_i$ where $K_i$ is the set of endogenous variables that may play a role in individual interactions (though, to simplify notation, we will be explicit about it only where it matters), $X_i$ is the set of individual characteristics and $\varepsilon_i$ is the error term orthogonal to $f_i(\cdot)$. To fix attention, we will refer to $K_i$ as taxpayer's awareness of tax shelter possibility. In general, we assume that $K_i(K_j, K^i_{-j}, X_i)$, where $K_j$ is awareness of taxpayer j and $K^i_{-j}$ is awareness of individuals in the network other than j, but we will mostly consider just two individuals, so that $K_i(K_j, X_i)$ and $K_j(K_i, X_j)$. We also assume that derivatives of $f$ and $K$ are non-negative.
The framework assumes that individuals affect each other through variables $K$ and allows for reciprocal reactions. 18 For our purposes, we do not need to assume symmetry so that functions $K_i(\cdot)$ and $K_j(\cdot)$ may be different. We focus on the interaction between i and j and are agnostic about the role of $K^i_{-j}$ at this point, but we will return to it below. Shocks to the environment are represented by the effect on $X_i$. Common shocks could affect both $X_i$ and $X_j$ simultaneously, but in what follows we will focus on tracing out implications of an idiosyncratic shock to individual j, $\Delta X_j$, that corresponds to the source of identification that we explore in our empirical work.
Shocks may potentially have four different qualitative effects. First, they affect awareness of the recipient directly (when it is the shock to individual own environment) when $\partial K_j / \partial X_j \neq 0$. Second, they may affect awareness of others when $\partial K_i / \partial K_j \neq 0$ (with the feedback to the original recipient when $\partial K_j / \partial K_i \neq 0$). Third, the overall impact on awareness matters for sheltering behavior when $\partial f_j / \partial K_j \neq 0$. Fourth, sheltering may be affected by the shock directly without altering interactions with others ($\partial E / \partial X_j \neq 0$). 19 The total impact of the shock to individual j on the individual itself may be traced out as follows:

Remark 1. Suppose that $\Delta \tilde{E}_i / \Delta X_j \neq 0$. It implies then that $\Delta K_i / \Delta X_j \neq 0$, thereby providing evidence of social interactions between individuals.
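The decomposition that the text traces out before Remark 1 can be sketched from the definitions of this appendix. What follows is a reconstruction under the two-person version of the model, $\tilde{E}_i = f_i(K_i, X_i) + \varepsilon_i$ with $K_i(K_j, X_i)$ and $K_j(K_i, X_j)$, and is not a reproduction of the original displayed equations; the ordering of the lines is an assumption.

```latex
\begin{align*}
\frac{\Delta K_j}{\Delta X_j} &= \frac{\partial K_j}{\partial X_j}
  + \frac{\partial K_j}{\partial K_i}\,\frac{\Delta K_i}{\Delta X_j}, \\
\frac{\Delta K_i}{\Delta X_j} &= \frac{\partial K_i}{\partial K_j}\,\frac{\Delta K_j}{\Delta X_j}, \\
\frac{\Delta \tilde{E}_j}{\Delta X_j} &= \frac{\partial f_j}{\partial K_j}\,\frac{\Delta K_j}{\Delta X_j}
  + \frac{\partial f_j}{\partial X_j}, \\
\frac{\Delta \tilde{E}_i}{\Delta X_j} &= \frac{\partial f_i}{\partial K_i}\,\frac{\Delta K_i}{\Delta X_j}.
\end{align*}
```

The last line makes the logic of Remark 1 explicit: because $X_j$ does not enter $f_i$ directly, a non-zero response of $\tilde{E}_i$ to $\Delta X_j$ can only arise through $\Delta K_i / \Delta X_j \neq 0$.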
This observation underlies our basic test. We will attempt to estimate the effect on $\Delta \tilde{E}_i / \Delta X_j$ by using variation in $X_j$ that is (assumed) independent of $X_i$. The independence assumption is natural in the presence of explicit randomization, for example as in Duflo and Saez (2003); we are going instead to rely on a regression discontinuity design that also makes such an assumption plausible and appealing.

18 This framework is general enough to accommodate many economically interesting special cases. For example, suppose that $X_j$ represents information of individual j. A shock to $X_j$ may affect individual i when person i and j interact, but there need not be a feedback effect on individual j since person i has no additional information over person j as the result of that shock. This case fits in this framework by allowing $K$ to be two-dimensional, ∂K 2i = 0, for example when $K_{1i}(K_{1j}, K_{2j}, X_i) = g(X_i)$ and $K_{2i}(K_{1j}, K_{2j}, X_i) = h(K_{1j})$ for all i, j so that a signal affects own awareness $K_1$ and, through this channel, network member's outside knowledge $K_2$ but there is no feedback from $K_2$ on others.

19 Note that separating $K$ from adoption $\tilde{E}$ allows shock to individual j to affect take up of individual i without affecting propensity of an individual j to take up -this would be the case whenever $\frac{\partial f_j}{\partial K_j} \frac{\Delta K_j}{\Delta X_j} + \frac{\partial f_j}{\partial X_j} = 0$ (in particular, when both derivatives of $f_j$ are zero) but $\Delta K_i / \Delta X_j \neq 0$. For example, an individual j may (exogenously) learn about sheltering opportunity that is not of interest to her (given all her characteristics $X_j$) but still pass such information to others.
Remark 2. $\Delta \tilde{E}_j / \Delta X_j \neq 0$ is neither necessary nor sufficient for the presence of social interactions. Observing that an individual responds to his own incentives is not sufficient to establish presence of interactions. It is also strictly speaking not a necessary condition (see an example in footnote 18 -a taxpayer who is exposed to information about a shelter may transmit information to others but not act on it) although it is arguably unlikely.
The formulae for $\Delta K_j / \Delta x_j$ and $\Delta K_i / \Delta x_j$ reflect interactions between those terms but can be combined to obtain:

Individual j responds to a change in $X_j$ due to the direct effect it has on sheltering and due to the effect it has on own awareness $K_j$. The latter effect is magnified due to the presence of interaction reflected by term $S$. Sheltering of person i is not affected by $X_j$ directly, but it is affected through the awareness channel. The exogenous shift in awareness is due to the impact it has on awareness of person j (magnified by the presence of spillover effect $S$). This shift affects person i's awareness to the extent that interactions are present -$\partial K_i / \partial K_j$ -and affects the ultimate decision to the extent that awareness matters for sheltering, $\partial f_i / \partial K_i \neq 0$. In general, separately identifying the direct impact on sheltering $\partial f_j / \partial X_j$, the impact through increased awareness $\frac{\partial f_j}{\partial K_j} \frac{\Delta K_j}{\Delta X_j}$ and the interaction effect $\partial K_i / \partial K_j$ based on estimates of $\Delta \tilde{E}_j / \Delta X_j$ and $\Delta \tilde{E}_i / \Delta X_j$ is not possible. Note though the following.

Remark 3. Assuming that $\partial f_j / \partial X_j = 0$, the ratio of the coefficients contains information about the strength of the social interactions.
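The combined expressions and the ratio in Remark 3 can be sketched as below. This is a reconstruction under the two-person model, not the original display. In particular, the closed form given for the magnification term $S$ is an assumption: it is what results from solving the two awareness relations $\Delta K_j / \Delta X_j = \partial K_j / \partial X_j + (\partial K_j / \partial K_i)(\Delta K_i / \Delta X_j)$ and $\Delta K_i / \Delta X_j = (\partial K_i / \partial K_j)(\Delta K_j / \Delta X_j)$, and it requires the product of the two cross-effects to be below one.

```latex
\begin{align*}
S &\equiv \left(1 - \frac{\partial K_j}{\partial K_i}\,\frac{\partial K_i}{\partial K_j}\right)^{-1}, \\
\frac{\Delta \tilde{E}_j}{\Delta X_j} &= \frac{\partial f_j}{\partial X_j}
  + \frac{\partial f_j}{\partial K_j}\, S\, \frac{\partial K_j}{\partial X_j}, \\
\frac{\Delta \tilde{E}_i}{\Delta X_j} &= \frac{\partial f_i}{\partial K_i}\,
  \frac{\partial K_i}{\partial K_j}\, S\, \frac{\partial K_j}{\partial X_j}, \\
\left.\frac{\Delta \tilde{E}_i / \Delta X_j}{\Delta \tilde{E}_j / \Delta X_j}\right|_{\partial f_j / \partial X_j = 0}
  &= \frac{\partial f_i / \partial K_i}{\partial f_j / \partial K_j}\;\frac{\partial K_i}{\partial K_j}.
\end{align*}
```

In the ratio, $S$ and $\partial K_j / \partial X_j$ cancel, which is why the two estimated effects together are informative about $\partial K_i / \partial K_j$ once the relative sensitivity of sheltering to awareness is pinned down.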
≈ 0 is restrictive but allows for a major simplification of formulas 3 and 4. That assumption does not rule out effects on others -indeed, other members of the network of person j can be still affected and influence person i -but it rules out feedback effects of higher than second-order: a shock to person j affecting person i who in turn affects some other person k and recognizing the feedback from person k back to i and j. Imposing additional structure on the model may allow for incorporating these types of effects. The complication in our context is that networks are not disjoint so that modeling equilibrium is considerably less tractable than in the case of, for example, peer effects within a school or neighborhood effects. We leave addressing this issue for future work. The second aspect of restrictiveness of this assumption is that individuals i and j may (and, indeed, usually will) have common network members so that even if one was willing to rule out higher order feedback effects, some members of $K^i_{-j}$ may not be ignored. This is a conceptually separate issue that may be explicitly addressed by enumerating those individuals in arguments of $K_i$ and $K_j$. Denoting a set of common network members by $M$,

To make progress, assume indeed that $\partial f_j / \partial X_j = 0$. 21 Then, making an additional assumption that $\partial f_j / \partial K_j = \partial f_i / \partial K_i$ (or that the ratio of the two terms is some other known constant), the ratio $\beta_i / \beta_j$ would identify the strength of social interactions as the ratio of the two effects.
The assumption, $\partial f_j/\partial K_j = \partial f_i/\partial K_i$, is restrictive. When the level of awareness for the individuals varies, $K_i \neq K_j$. In fact, in our case, we will contend that individuals who are affected by the shock that we rely on to identify the effect are relatively well informed anyway, while those in their networks are not, so that $K_j > K_i$. In that case, we would expect

Our objective is to apply a procedure that eliminates such bunching in a way that is systematic but at the same time allows us to keep as many observations as possible (after all, every shareholding is a fraction with the denominator equal to the total number of shares in a firm). We eliminated shareholdings that are exact fractions with denominators between 1 and 20, as well as 25, 30, 40, 50, 100 and 200. In particular, this removes all shareholdings that are multiples of 0.005 and, of course, the discontinuity point 0.1 itself. We additionally remove points that are within 1 share of a fraction with a denominator of 3, 6, 9 or 10: there is evidence of bunching of that kind as well, which is particularly strong for these values and occurs very close to the discontinuity point when the denominator is 9 or 10. The resulting histogram is shown in Figure 3, and it no longer shows evidence of significant bunching when aggregated to 0.001 intervals. The procedure eliminates a large part of the sample: 24,294 out of 47,682 observations in the (0.05, 0.15) interval (with almost 7,600 removed observations at 0.1 and another 5,600 at other 1/100th multiples in the interval). Results are robust to adjustments that reasonably expand or limit the set of denominators accounted for, as long as major discontinuity points are eliminated.
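The exclusion rule described above can be sketched in a few lines of Python. This is an illustration under our own assumptions, not the code used for the paper: the function and constant names are hypothetical, a shareholding is represented as an integer number of shares held out of an integer total, and "within 1 share" is read as a distance of at most one share (inclusive).

```python
# Hypothetical sketch of the bunching filter; names are illustrative only.

# Denominators 1..20 plus 25, 30, 40, 50, 100 and 200 (exact-fraction rule).
EXACT_DENOMINATORS = tuple(range(1, 21)) + (25, 30, 40, 50, 100, 200)
# Denominators for which points within one share of k/d are also removed.
NEAR_DENOMINATORS = (3, 6, 9, 10)


def is_exact_fraction(held: int, total: int) -> bool:
    """True if held/total equals k/d for some integer k and listed d."""
    # held/total = k/d  <=>  held * d is an integer multiple of total.
    return any((held * d) % total == 0 for d in EXACT_DENOMINATORS)


def is_near_fraction(held: int, total: int, tolerance_shares: int = 1) -> bool:
    """True if held is within tolerance_shares of k*total/d for a listed d."""
    for d in NEAR_DENOMINATORS:
        # |held - k*total/d| <= tol  <=>  |held*d - k*total| <= tol*d,
        # so compare the distance of held*d to the nearest multiple of total.
        remainder = (held * d) % total
        gap_scaled = min(remainder, total - remainder)
        if gap_scaled <= tolerance_shares * d:
            return True
    return False


def is_bunched(held: int, total: int) -> bool:
    """True if the observation would be dropped under either rule."""
    return is_exact_fraction(held, total) or is_near_fraction(held, total)
```

For example, with a hypothetical firm of 1,237 shares, a holding of 124 shares lies within one share of 1/10 of the firm (123.7 shares) and would be dropped, while holdings of 100 or 125 shares would be kept. Integer arithmetic is used throughout so that the exact-fraction test does not depend on floating-point rounding.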
The bracketed terms in the last expressions are symmetric so that it is natural to assume that they are the same on average. Then, these two terms can be written as and M is the number of common network members.
Suppose that we were able to observe $\Delta K_j/\Delta x_j$ and $\Delta K_i/\Delta x_j$. Then, by controlling for $M$, we could identify $x$ and combine $\Delta K_j/\Delta x_j$ and $\Delta K_i/\Delta x_j$ to recover $\partial K_i/\partial K_j$ as before. Since $K$ is not directly observable, pursuing the same exercise using observable sheltering decisions $E$ again requires the assumption that $\partial f/\partial x_j = 0$.
21 The assumption of $\partial f_j/\partial X_j = 0$ effectively eliminates the distinction between $K$ and $\tilde{E}$ by ruling out the possibility that $X_j$ may have an impact on sheltering that does not interact with the behavior of others. In particular, it eliminates the natural kind of heterogeneity where individuals interact through some variables $K$ but the strength of their response is determined by the value of $X_j$. In our context, $X_j$ is likely to have an independent effect because it reflects eligibility for setting up a tax shelter, which reduces the cost of acting for a particular individual. This effect is conceptually separate from, for example, increased awareness of the shelter and may influence the behavior of a taxpayer without affecting others. If, on the other hand, a taxpayer affects others via the decision to shelter only, the assumption of $\partial f_j/\partial X_j = 0$ would hold.

Notes: Estimates of the effect on the probability of adopting in the given timing window before the reform of having a network member with over 10% ownership (regression discontinuity, using the (0.05, 0.15) range).

Online appendix
[Figure; axis labels: Family member's ownership share; % with ownership share above 10%]