1 Introduction

Neighbourhood investment programmes target government transfers toward particular geographic areas rather than individuals (e.g., Glaeser and Gottlieb 2008). These investment programmes have been evaluated using several different econometric techniques. A series of recent studies in this area have used regression discontinuity (RD) designs to estimate treatment effects. For example, Busso et al. (2013) evaluate the employment effects of the U.S. federal urban Empowerment Zone programme; Freedman (2015) studies the labour-market effects of the New Markets Tax Credit programme in the United States; and Horn (2015) investigates the relationship between school quality and capital investments in the housing stock using a boundary discontinuity identification strategy.

RD designs are increasingly used by economists to estimate treatment effects in a nonexperimental situation where treatment is determined by whether an observed forcing variable exceeds a cut-off value.Footnote 1 One of the main reasons for this increased popularity is that variation around the cut-off value, which determines assignment to the treatment, can be considered as good as random because those who take part in the programme have no control over the assignment (e.g., Lee 2008). This inability to control or influence the assignment to the treatment suggests that the identifying assumptions required for a valid design are relatively weak (e.g., Hahn et al. 2001). It is very important to check whether the identifying assumption is valid because (public or private) knowledge about the assignment rule might influence the assignment to the treatment. Influencing the assignment to the treatment invalidates the key assumption that individuals on either side of the discontinuity threshold are similar. Recent studies have considered the possibility of such “endogenous sorting” around the discontinuity threshold and have developed tools to examine its presence and consequences (e.g., Lee 2008; McCrary 2008). In addition, a number of studies offer examples of sorting around the discontinuity threshold. It seems to be the case that sorting is driven by incentives for potential receivers of the treatment to select themselves into the treatment, such as home owners, parents/schools, tax payers or traders on financial markets (e.g., Bayer et al. 2007; Urquiola and Verhoogen 2009; Saez 2010; Bubb and Kaufman 2014; Vogl 2014).

This research adds a novel case to this relatively new literature on the application of RD designs when there are opportunities to influence the discontinuity threshold. We carefully describe a case of sorting disadvantaged areas into a large-scale neighbourhood investment programme. In this case, sorting did not result from subjects being able to game the threshold or to select into and out of the treatment group. Rather, the designers of the programme selected units in such a way that the design became invalid. To be specific, policymakers at the national level, who designed and implemented the assignment rules for the investment programme in disadvantaged neighbourhoods, sorted areas into and out of the programme in such a way that there exists a large discontinuity in the share of non-Western immigrants at the discontinuity threshold. In Sect. 2 we reconstruct how the selection took place that eventually led to the discontinuity.

The neighbourhood investment programme was implemented in 2008 and consisted of large-scale neighbourhood investments in social and physical infrastructure aimed at improving the living conditions in disadvantaged neighbourhoods in the Netherlands. Approximately 4000 postal code areas (PCAs)Footnote 2 were ranked based on a neighbourhood ‘quality’ index (e.g., Tables 1, 2, which we discuss in more detail below). This index was constructed by making use of eighteen different items (see Table 9 in the “Appendix”). PCAs with the worst outcomes on the ‘quality’ index were merged into 40 neighbourhoods, which were selected into the programme and received additional funds. In the end, 83 PCAs, which together form these 40 neighbourhoods, received funding from the programme. In the period 2008–2011 the Dutch government invested 216 million Euros in these 40 neighbourhoods, while an additional amount of one billion Euros was invested by housing corporations.

Fig. 1 Control (blue), treatment (red) and non-compliance (green) at the PCA level. (Color figure online)

The assignment of PCAs to the programme based on the ‘quality’ index score is a textbook example for the application of a RD design for estimating the causal effect of the programme. The reason why this is, at first sight, a good opportunity for applying a RD design is that PCAs, which are statistical units without any direct institutional status, should have no influence on being treated and on being selected into a neighbourhood of adjacent PCAs. However, despite this expectation we observe at the threshold a surprisingly large and statistically significant gap in the proportion of non-Western immigrants. This gap is between 11 and 21% points depending on the specification we use. In addition to this unexpected discontinuity at the threshold, there appears to be non-compliance with the assignment rule because twelve eligible PCAs have been excluded from the programme by the decision makers, whereas two others have been added to the treatment group. The observed pattern of non-compliance with the assignment rule shows a similar difference in the share of non-Western immigrants. These differences cannot be explained by sorting induced by local authorities at the municipality level: these authorities had no control over the assignment to the treatment, no ability to influence the score of individual PCAs on the ‘quality’ index, and no ability to influence the composition of neighbourhoods based on adjacent PCAs. Finally, it seems unlikely that a random threshold produces such large differences in the proportion of non-Western immigrants at the discontinuity threshold.

Fig. 2 Example of constructing neighbourhoods. Schilderswijk, The Hague, neighbourhood boundary according to Statistics Netherlands (in bold) versus boundaries of PCAs selected into the neighbourhood programme

The violation of a continuous distribution around the discontinuity threshold of such an important baseline characteristic could be due to the way the selection process of neighbourhoods has been carried out. Politicians at the national level demanded that there had to be a list of 40 eligible neighbourhoods. To determine the 40 neighbourhoods, a two-step procedure was used. In the first step, a preliminary list of 40 neighbourhoods was created based on the most disadvantaged PCAs according to the PCA ‘quality’ index. Because neighbourhoods can consist of multiple adjacent PCAs, policymakers at the national level sometimes merged PCAs with different rank numbers to create a neighbourhood. This opens possibilities of adding lower-ranked PCAs to an already identified neighbourhood based on a PCA ranked higher. When we move down the list of PCAs, it is possible to add more PCAs beyond the point at which 40 geographical PCAs have been identified as neighbourhoods. This process continues until a PCA from a different geographical area is next on the list and would become neighbourhood number 41. We show that the PCA that defines neighbourhood 41 is indeed in another city and that the last PCA that has been added is part of one of the previously defined neighbourhoods. We illustrate the selection of PCAs into neighbourhoods in Figs. 1 and 2 and explain the selection process in more detail in Sect. 2. In the second step, a number of PCAs were removed from and added to this list to obtain a final list of 40 eligible neighbourhoods. The added neighbourhoods are not close to the discontinuity threshold, as we will describe below.

We illustrate the bias of the RD estimates when using the official cut-off. We find that the estimates from RD models that do not take account of sorting differ from the estimates from RD models that do account for sorting. We also show that a different selection process of 40 neighbourhoods does not lead to a discontinuity in the share of non-Western immigrants. Finally, we cannot rule out that the result of selecting 40 neighbourhoods in this way is a case of bad luck: using the same procedure to select 30 neighbourhoods does not yield the same discontinuities. Nevertheless, this set of estimates and our investigation of the selection process provide a new case of sorting around a discontinuity threshold in a situation where the units that might receive treatment have no control over their assignment to treatment. We view our findings as another cautionary note regarding the use of RD designs. This conclusion applies not only to the area of urban economics but more generally to situations in which policymakers have control over the assignment to the treatment.

This paper is structured as follows. In the next section we provide a description of how the neighbourhood programme was developed and implemented. Sections 3 and 4 document the most salient details of the data and our empirical strategy. Section 5 presents the estimation results. In Sect. 6 we show what happens when we use the invalid design to evaluate the outcomes of the investment programme. Section 7 concludes.

2 Background of the Neighbourhood Investment Programme

In 2008 the Dutch government introduced a programme to improve the quality of life in disadvantaged neighbourhoods. Until 2011 the national government invested 216 million Euros in these neighbourhoods, while housing corporations added about one billion Euros to the programme. The programme was targeted towards investing these resources in the most disadvantaged neighbourhoods in the country. The programme was an important part of the agenda of the newly appointed government and was instigated by the Labour Party (Partij van de Arbeid). When the programme was announced in 2007, it received a great deal of media attention as it was one of the main spearheads of the newly established political coalition. A new ministry, the Ministry of Housing, Neighbourhoods and Integration, was established to manage and monitor this programme, among other tasks. Statistics Netherlands was asked to deliver a range of statistics on the outcomes of treated neighbourhoods in an annual outcome monitor. In addition, government research organisations were asked to evaluate the effects of the policy and the Court of Audit monitored whether the funds were appropriately invested in the targeted areas.

Table 1 40 Neighbourhoods consisting of 83 PCAs (in alphabetical order)

2.1 Defining and Ranking Neighbourhoods

The neighbourhoods were created from PCAs that were ranked according to a ‘quality’ index. For each of the selected neighbourhoods a tailor-made investment plan was developed. Some neighbourhoods invested in physical infrastructure, others spent more on reducing social problems. The Dutch government’s Court of Audit made an elaborate overview and has assessed the expenditures (e.g., Court of Audit 2008).

The PCA ‘quality’ index was constructed by making use of eighteen different items. These items cover socioeconomic disadvantages, physical disadvantages, and a range of social problems, such as nuisance, vandalism or insecurity, but also social problems in terms of poor housing, environmental pollution, heavy traffic, noise pollution and a lack of safety. The items were based both on measured socioeconomic variables and information about housing quality and on surveys about nuisance and feelings of insecurity among residents (see Table 9 in the “Appendix”). The scores on this index were collected at the four-digit PCA level. The ranking of PCAs was used to construct and thereafter select the most disadvantaged neighbourhoods. There are approximately 4000 PCAs in the Netherlands.

The area of a single PCA is not always considered to define a neighbourhood. In many cases multiple, geographically adjacent PCAs form neighbourhoods. Together the selected PCAs formed 40 constructed neighbourhoods that consist of 83 PCAs. This number of 40 was—according to the responsible politicians at the Ministry of Housing, Neighbourhoods and Integration—a sound number of neighbourhoods to be able to guarantee a sufficiently large monetary investment, to carefully monitor progress and to pay regular visits.

Table 1 shows the list of the 40 disadvantaged neighbourhoods and the 83 PCAs they consist of. Figure 1 shows a map of the Netherlands in which the 83 treated PCAs are highlighted in red. In most cases, disadvantaged neighbourhoods (PCAs) are located in the largest cities of the country. The vast majority of the neighbourhoods is concentrated in the four largest cities in the Randstad (i.e., Amsterdam, Rotterdam, The Hague and Utrecht). The PCAs in blue and green are control and non-compliance areas, respectively. We explain them below in more detail.

2.2 The Process of Selecting Neighbourhoods

The consequence of the political decision at the national level to merge 83 PCAs to arrive at a number of 40 neighbourhoods is that PCAs with consecutive rank numbers (on the ‘quality’ index) are not necessarily geographically adjacent to each other. In most cases a neighbourhood consists of multiple PCAs with different rank numbers. Moreover, the geographical boundaries of (a collection of) PCAs yield neighbourhoods that often do not correspond to the official classification of neighbourhoods as defined by Statistics Netherlands (CBS). Figure 2 shows an example. It displays the neighbourhood Schilderswijk in The Hague, which, according to Table 1, consists of PCAs 2525 and 2526. The thick solid line depicts the geographical boundary of the neighbourhood according to the official classification of CBS. The thin solid lines depict the boundaries of the PCAs. As can be seen, the areas do not coincide. Moreover, the neighbourhood consists not only of PCAs 2525 and 2526, but also of a number of other PCAs. Also, parts of PCAs 2525 and 2526 do not lie in the Schilderswijk.

The process to construct 40 neighbourhoods involved two steps. First, 40 neighbourhoods were constructed by moving down the list of PCAs. Since these neighbourhoods do not necessarily coincide with the official classifications of Statistics Netherlands but consist of adjacent PCAs, it is difficult to precisely reconstruct the exact scope of these initial 40 neighbourhoods. In the second step, policymakers removed and added PCAs to the list to arrive at a final list of 40 neighbourhoods.

Table 2 Ranking of postal code areas (PCAs) and the neighbourhoods they belong to

Table 2 shows the results of both steps. The table documents the worst 187 PCAs in the Netherlands according to the ‘quality’ index (we discuss the most salient details of the index in Sect. 3). The first two columns display the rank number and PCA (the higher the rank, the worse the score on the ‘quality’ index). The third column shows the number of the neighbourhood the PCA has been assigned to. The fourth column displays the neighbourhood’s name. The typeface of the neighbourhood ranks indicates whether or not a neighbourhood is part of the treatment group. Neighbourhood ranks displayed in italics only are part of the treatment group, neighbourhood ranks in italics and bold have been removed from the treatment by policymakers, and neighbourhood ranks in bold only are part of the control group. We link these PCAs to a neighbourhood just as the policymakers linked the non-removed PCAs to neighbourhoods. That is, we reconstruct the preliminary list from the first step. If we move down Table 2, at least four observations stand out.

First, and consistent with Fig. 1, a number of PCAs have been put together to form one neighbourhood. For instance, PCAs 3086 (rank 2) and 3085 (rank 31) in Rotterdam form one neighbourhood (Zuidelijke Tuinsteden). Under this selection rule, PCAs continue to be grouped into neighbourhoods until the 41st neighbourhood needs to be defined.

Second, the official cut-off is set at rank 93. Policymakers at the national level arrived at this point after removing 12 PCAs from and adding 2 PCAs to the list in the second step of the selection process. The 12 removed PCAs are printed in bold italics in Table 2. These areas are mostly touristic centres in which there is nuisance in terms of traffic and environmental pollution. We linked these PCAs to a neighbourhood. PCAs 7533 and 1024 have been added to the list.Footnote 3 As can be seen, the cut-off lies at the point where 39 neighbourhoods have been identified. Including 7533 (Enschede Velve-Lindenhof) yields the 40th neighbourhood (this PCA is ranked 210th according to the ‘quality’ index). PCA 1024 belongs to Amsterdam Noord, which was already defined. This shows the tendency of policymakers at the national level to add PCAs to already existing neighbourhoods until a 41st neighbourhood would be created.

Third, if the selection rule to define neighbourhoods was such that each single PCA would have been considered a neighbourhood, the point at which we can identify 40 ‘neighbourhoods’, would have been at rank 40 (just after 2533 Den Haag Zuid-West).

Fourth, if we allow for the combination of adjacent PCAs into a single neighbourhood, and do not remove the twelve PCAs as the policymakers did in the second step, we arrive for the first time at 40 neighbourhoods at rank 80 (just after including 4827 Breda Geeren-Noord). Both ‘reconstructed’ cut-offs are different from the official cut-off. We analyse the consequences of using different selection rules in Sect. 5.
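The counting rule behind these reconstructed cut-offs can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the policymakers' procedure as implemented: the toy list `ranked`, its PCA codes and neighbourhood labels, and both function names are hypothetical, and the neighbourhood to which each PCA belongs is taken as given.

```python
def rank_when_kth_neighbourhood_appears(ranked, k):
    """Return the rank of the PCA that opens the k-th distinct neighbourhood."""
    seen = set()
    for rank, _pca, neighbourhood in ranked:
        seen.add(neighbourhood)
        if len(seen) == k:
            return rank
    return None  # fewer than k neighbourhoods in the list


def last_rank_before_next_neighbourhood(ranked, k):
    """Return the last rank that can be added without opening neighbourhood k + 1."""
    seen = set()
    last = None
    for rank, _pca, neighbourhood in ranked:
        if neighbourhood not in seen and len(seen) == k:
            return last  # this PCA would start neighbourhood k + 1
        seen.add(neighbourhood)
        last = rank
    return last


# Hypothetical ranked list: (rank, PCA code, neighbourhood label).
ranked = [
    (1, "A1", "A"),
    (2, "B1", "B"),
    (3, "A2", "A"),  # adjacent to A1, so it joins neighbourhood A
    (4, "C1", "C"),
    (5, "B2", "B"),  # added to an already defined neighbourhood
    (6, "D1", "D"),  # would open a fourth neighbourhood
]

first_cutoff = rank_when_kth_neighbourhood_appears(ranked, 3)
extended_cutoff = last_rank_before_next_neighbourhood(ranked, 3)

# If every PCA counted as its own neighbourhood, the k-th one appears at rank k.
single_pca = [(rank, pca, pca) for rank, pca, _ in ranked]
single_cutoff = rank_when_kth_neighbourhood_appears(single_pca, 3)
```

In this toy list the third neighbourhood first appears at rank 4, while PCAs can still be added to existing neighbourhoods up to rank 5 before a fourth neighbourhood would have to be opened. The same logic separates the point at which 40 neighbourhoods are first reached from the point at which a 41st neighbourhood would be created.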

Finally, Fig. 3 shows the relationship between the (scaled) ‘quality’ index of PCAs and the actual participation in the programme using the official cut-off (at rank 93). PCAs with scores above 0 are eligible to participate in the neighbourhood investment programme, while PCAs with scores below 0 are not (as shown on the horizontal axis of Fig. 3). Compliance and non-compliance with this assignment rule can be observed from the vertical axis of Fig. 3. The 12 PCAs with a score on the ‘quality’ index that would justify treatment, but that have not been selected into the treatment, are shown at the bottom of the figure with scores above 0. PCA 1024 Amsterdam, with a negative score on the ‘quality’ index that would not justify treatment, lies at the top of the figure to the left of the cut-off. PCA 7533 has also been added to the treatment, but is not displayed in this figure because it has a very low score on the assignment variable \((-2.3)\) and ranks 210th. It lies far to the left of the cut-off.
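The four groups that Fig. 3 distinguishes follow mechanically from the sign of the rescaled score and actual participation. The sketch below is only an illustration: the function name `classify`, the group labels and the example PCAs are hypothetical, and only the score of \(-2.3\) for PCA 7533 is taken from the text.

```python
def classify(score, treated):
    """Label a PCA by eligibility (rescaled score > 0) and actual participation."""
    eligible = score > 0
    if eligible and treated:
        return "treatment"
    if not eligible and not treated:
        return "control"
    # Eligibility and participation disagree: non-compliance with the rule.
    return "eligible but excluded" if eligible else "ineligible but added"


# Hypothetical PCAs: (label, rescaled 'quality' index score, treated?).
# Only the score of -2.3 for PCA 7533 comes from the text; the rest is made up.
pcas = [
    ("PCA A", 1.4, True),
    ("PCA B", 0.3, False),
    ("PCA C", -0.6, False),
    ("7533", -2.3, True),
]

status = {name: classify(score, treated) for name, score, treated in pcas}
```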

Fig. 3 Assignment of PCAs to treatment by ‘quality’ index score

3 Data

The data for our empirical analysis are obtained from various sources. First, the ranking of PCAs and the score on the ‘quality’ index were obtained from ABF Research, the organisation that was asked by the government to construct the index. The ‘quality’ index will be used as the forcing variable for the assignment of PCAs to the programme in the RD model. We rescaled this variable in such a way that neighbourhoods with scores above 0 are eligible, while neighbourhoods with scores below 0 are not.

Second, we obtained information on seven outcome measures from the Ministry of Housing, Spatial Planning and the Environment: an index for the quality of life; the quality of the public space; social cohesion; safety; quality of public services; quality of the composition of the population and quality of the housing stock. The first measure varies between 1 and 7, and is based on the other six measures. These vary between \(-50\) and 50, with 0 corresponding to the national average. The numbers do not have a clear interpretation, except that lower numbers refer to lower quality. We obtained these measures for 2006, one year before the start of the programme, and for 2012, four years after the start of the programme.

Third, we obtained information from Statistics Netherlands on the size and composition of the population within PCAs: population size and the percentages of immigrants, Western immigrants and non-Western immigrants. Fourth, we obtained national election outcomes at the ballot box level for 2010 and 2012.Footnote 4

Table 3 compares the means of the outcomes and covariates for all 93 eligible PCAs to the right of the cut-off and the same number of ineligible PCAs to the left of the cut-off.Footnote 5 We observe that in 2006, a year before the start of the programme, the eligible PCAs on average do worse on nearly all outcome measures. Moreover, these PCAs have much higher proportions of (non-Western) immigrants. In 2012, four years after the start of the programme, we observe a similar pattern for the differences in the outcome variables.

Table 3 Descriptive statistics of estimation sample

4 Empirical Strategy

The selection of PCAs based on the ‘quality’ index is at first sight an opportunity for applying a RD design to evaluate the effects of the programme. The cut-off for assignment to the treatment generates variation that is expected to be exogenous because it is beyond the control of the treatment and control PCAs. As the central government decided on the construction of the ‘quality’ index and because this index was not announced or available in advance, it can be expected that PCAs at both sides of the cut-off will be very similar. A comparison of the outcomes of PCAs close to the cut-off will then yield the causal effect of the neighbourhood programme. The basic assumption in this model is that the potential outcomes and characteristics of the PCAs are smooth around the cut-off.

This basic assumption can be investigated by performing balancing tests for the similarity of covariates or outcome variables before the start of the programme across the cut-off. These tests can be carried out by using a reduced form model as specified in Eq. (1):

$$\begin{aligned} Y_i =\delta _0 +\delta _1 Z_i +f(I_i )+\vartheta _i, \end{aligned}$$
(1)

where \(Y_i \) is an outcome or covariate of PCA \(i\) measured before the start of the programme, \(Z_i \) is a dummy variable that equals 1 if the ‘quality’ index score \(I_i \) is \(>0\) and 0 if it is \(<0\), and \(\vartheta _i \) captures unobserved factors. \(f(\cdot )\) is a smooth function of the ‘quality’ index, which is allowed to be different at either side of the cut-off (\(f_l \) and \(f_r \)), as suggested by Lee and Lemieux (2010), i.e. \(f(I_i )=f_l (I_i )+P_i [f_r (I_i )-f_l (I_i )]\). The parameter \(\delta _1 \) reveals whether or not the outcomes and covariates before the start of the programme are balanced across the cut-off. Statistically insignificant estimates of this parameter can be considered as support for the main assumption of the RD model.
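A balancing test of this kind can be sketched as an ordinary least squares regression of a pre-programme variable on the eligibility dummy and a polynomial in the forcing variable. The code below is a minimal pure-Python illustration on made-up data, not the estimation code used for the tables: the helper names `solve`, `ols` and `design_row` are hypothetical, the quadratic terms are interacted with \(Z_i \) as one assumed way of letting the polynomial differ on each side of the cut-off, and standard errors are left out.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def design_row(score):
    """Intercept, Z, linear and squared score, and their interactions with Z."""
    z = 1.0 if score > 0 else 0.0
    return [1.0, z, score, score ** 2, z * score, z * score ** 2]


# Made-up rescaled scores on both sides of the cut-off at 0.
scores = [i / 10 for i in range(-10, 11) if i != 0]
rows = [design_row(s) for s in scores]

# A covariate that is smooth at the cut-off, and one that jumps by 15 points.
smooth = [30 + 5 * s for s in scores]
jump = [30 + 5 * s + (15 if s > 0 else 0) for s in scores]

delta1_smooth = ols(rows, smooth)[1]  # coefficient on Z: close to zero
delta1_jump = ols(rows, jump)[1]      # coefficient on Z: recovers the jump
```

The coefficient on the eligibility dummy plays the role of \(\delta _1 \): it is close to zero for the smooth covariate and picks up the built-in gap for the covariate that jumps at the cut-off.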

If this main assumption holds, the causal effect of the programme can be estimated by making use of specifications that are very similar to Eq. (1). In case of full compliance with the assignment rule, which means that all PCAs with a ‘quality’ index score above (below) the cut-off (don’t) enrol into the programme, the effect of the programme can be estimated using the following specification:

$$\begin{aligned} Y_i =\alpha _0 +\alpha _1 P_i +f(I_i )+\alpha _2 X_i +\varepsilon _i, \end{aligned}$$
(2)

where \(Y_i \) is the outcome of PCA i, \(P_i \) is a dummy variable for treatment, \(X_i \) is a vector of control variables and \(\varepsilon _i \) are unobserved factors. The main parameter for estimation is \(\alpha _1 \), which can be interpreted as the causal effect of the treatment on the outcomes. Identification of \(\alpha _1 \) is based on the non-linear relationship between the ‘quality’ index and the allocation of resources around the cut-off.

However, the selection of PCAs into the programme did not fully comply with the assignment rule. This non-compliance can be dealt with in an instrumental variable (IV) approach. The causal effect of the programme can be estimated by using the dummy for the assignment rule \((Z_i )\) as an instrument for participation in the programme \((P_i )\) in a two-stage least squares (2SLS) approach. The first and second stage equations in this approach are

$$\begin{aligned} P_i =\beta _0 +\beta _1 Z_i +f(I_i )+\beta _2 X_i +\eta _i, \end{aligned}$$
(3)

and

$$\begin{aligned} Y_i =\gamma _0 +\gamma _1 \hat{{P}}_i +f(I_i )+\gamma _2 X_i +\theta _i, \end{aligned}$$
(4)

where \(\hat{{P}}_i \) in Eq. (4) is the predicted probability of participation obtained from Eq. (3). Estimates of the parameter \(\gamma _1 \) yield the causal effect of the treatment for PCAs that comply with the assignment rule.
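The two stages can be illustrated with a small pure-Python sketch on made-up data. Everything in it is an assumption for illustration: the helper names `solve` and `ols`, the linear form of \(f\), the absence of controls \(X_i \), the three non-complying observations and the built-in effect of 2. With a single instrument, the 2SLS coefficient equals the ratio of the reduced-form coefficient on \(Z_i \) to the first-stage coefficient on \(Z_i \), which the sketch also computes.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up data: rescaled score s, eligibility Z, participation P, outcome Y.
exog, p, y = [], [], []
for i in range(-10, 11):
    if i == 0:
        continue
    s = i / 10
    z = 1.0 if i > 0 else 0.0
    treated = z
    if i in (5, 9):   # eligible but excluded from the programme
        treated = 0.0
    if i == -7:       # ineligible but added to the programme
        treated = 1.0
    exog.append([1.0, z, s])                 # intercept, instrument, linear f
    p.append(treated)
    y.append(4 + 2 * treated + 1.5 * s)      # built-in treatment effect of 2

beta = ols(exog, p)                                        # first stage, Eq. (3)
p_hat = [sum(b * x for b, x in zip(beta, r)) for r in exog]
# Second stage, Eq. (4). Plugging in p_hat gives the 2SLS point estimate,
# but standard errors from this manual second stage would not be valid.
gamma = ols([[1.0, ph, r[2]] for ph, r in zip(p_hat, exog)], y)
delta = ols(exog, y)                                       # reduced form

gamma1 = gamma[1]
wald_ratio = delta[1] / beta[1]
```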

5 Sorting Around the Threshold

The empirical strategy outlined in the previous section can be applied to estimate the causal effect of the neighbourhood investment programme when the potential outcomes behave smoothly around the cut-off for the assignment of the treatment.

To investigate this assumption we perform balancing tests for seven outcome variables measured a year before the start of the programme and for three covariates. For the balancing tests, we estimate the reduced form model [Eq. (1)]. To estimate the causal effects of the programme, we apply the 2SLS approach outlined in Eqs. (3) and (4). In all our estimations we use the most conservative (i.e., largest) standard errors.Footnote 6

5.1 Balancing Tests

Table 4 and Fig. 4 show the results of the balancing tests for the seven main outcome variables that have been used to build the ‘quality’ index. We use a sample of 187 PCAs that includes all 93 PCAs to the right of the discontinuity threshold and 94 PCAs to the left of the cut-off.

Table 4 Balancing tests: the effect of the assignment to treatment on various outcomes before the start of the programme using a discontinuity sample of 187 PCAs (reduced form estimates)

Fig. 4 Balancing tests for six outcomes

Figure 4 illustrates that measures of social cohesion, the quality of the public space, safety, the quality of public services, the quality of the housing stock and the quality of the composition of the population in the 187 PCAs behave smoothly around the cut-off for participating in the programme. As the estimated relationships and the confidence bounds show, the bivariate relationships are statistically similar for both the treated PCAs and the non-treated PCAs.

For each outcome in Table 4 we use a specification with a linear and square term of the forcing variable. We find that all reduced-form estimates are statistically insignificant. Similar results are found when we focus on a discontinuity sample closer to the cut-off (50 PCAs to the right and 50 PCAs to the left of the cut-off). The results for the seventh outcome variable ‘quality of life’, which is based on the six outcomes used in Table 4, are also statistically insignificant (see last column in Table 4). Figure 5 illustrates this as the estimated relationship is not statistically different for the treated and non-treated PCAs. These findings suggest that the allocation of PCAs around the threshold is random, which supports the possibility and usefulness of applying a RD design.

Fig. 5 Balancing test for seventh outcome ‘quality of life’

In addition to the indicators that should reveal information about the ‘quality’ of the neighbourhood, the composition of the population seems a natural indicator to investigate. Many of the PCAs that are selected into the treatment are located in the larger cities in the Randstad. It is well known that the population composition in these cities is different from that in cities outside this area. This does not have to be a problem if the comparison in the RD framework is between PCAs with similar characteristics, something we expect if the variation around the cut-off is as good as random. However, inspection of indicators of the composition of the population suggests a remarkable difference between the treatment and control PCAs at the cut-off.

Table 5 shows balancing tests for three indicators of the composition of the population, which have somewhat surprisingly not been included in the ‘quality’ index. Depending on the specification, we observe that in 2006 the share of non-Western immigrants is between 11 and 21% points higher in PCAs in the treatment group than in PCAs in the control group.Footnote 7 For the smaller discontinuity sample of 100 PCAs we observe similar differences in the composition of the population.

Table 5 Balancing tests: the effect of the assignment to treatment on various outcomes before the start of the programme using a discontinuity sample of 187 PCAs (reduced form estimates)

This gap in the proportion of non-Western immigrants implies a large increase of this proportion at the cut-off, as shown in Fig. 6. The observed difference in the composition of the population implies that the basic assumption about smoothness around the discontinuity is unlikely to hold. Figure 6 illustrates this by showing that the difference between treated and non-treated PCAs is statistically significant.

Fig. 6 Discontinuity in the proportion of non-Western immigrants

5.2 Non-compliance with the Assignment Rule

We next look at non-compliance of PCAs with the assignment rules. Twelve PCAs were eligible for participation but were excluded; two PCAs were ineligible but did receive the treatment. Table 6 shows descriptive statistics for these two groups. The first row shows that the two PCAs that were ineligible do better on the ‘quality’ index. It should also be noted that one of these two PCAs ranked as PCA number 210 in the original ranking. The second row in Table 6 shows, however, that the ‘quality of the composition of the population’ differs statistically significantly between the PCAs that did receive funds and the PCAs that were eligible but did not receive funds. Two of the other population indicators, ‘percentage immigrants’ and ‘percentage non-Western immigrants’, show the same picture. This pattern of non-compliance is similar to the previous findings from the balancing tests.

Table 6 Descriptive statistics for PCAs that did not comply with the assignment rule

5.3 Balancing Tests with Alternative Neighbourhood Definitions/Cut-Offs

We next look at what happens to our balancing tests for non-Western immigrants when we choose different neighbourhood definitions and different cut-offs. We investigate what happens to the tests if we use (i) our reconstructed cut-off at the point at which we obtain 40 neighbourhoods for the first time (rank 80 in Table 2), (ii) the cut-off at which we obtain 40 PCAs for the first time (rank 40 in Table 2), (iii) the same strategy as the policymakers used, applied to a selection of 30 neighbourhoods (rank 63 in Table 2), (iv) the ‘reconstructed’ cut-off for 30 neighbourhoods (rank 55 in Table 2), and (v) the cut-off at which we obtain 30 PCAs for the first time (rank 30 in Table 2).

Table 7 presents the results of this analysis. We draw two conclusions from the coefficients documented in this table. First, the coefficients of the balancing test of selecting 40 neighbourhoods in a different way show no discontinuity in the percentage non-Western immigrants. This suggests that removing PCAs that were eligible and adding PCAs until the point at which the 41st neighbourhood has to be selected yields a discontinuity. The reason for this is that the PCA which forms the 41st neighbourhood has to be different from the PCAs that together yield the first 40 neighbourhoods. Had it been similar, policymakers would have added the PCA to one of the existing 40 neighbourhoods. Second, when using the same procedure and our alternative procedures to select 30 neighbourhoods, we do not find discontinuities. This also holds for the case in which we keep on adding PCAs to neighbourhoods until we are forced to define neighbourhood 31. This suggests two things. First, we cannot rule out that the discontinuity is the result of a coincidence. Second, the difference between the treatment and control PCAs around the cut-off of 30 neighbourhoods seems to be absent because we are able to compare neighbourhoods from similar cities, mainly in the Randstad (e.g. around the cut-off at rank 55 or 63 a number of PCAs pertain to the largest four cities in the Randstad).Footnote 8 By contrast, with the cut-off set at 40 neighbourhoods, not one of the first six PCAs after the cut-off pertains to the Randstad. This seems to be a major reason for the discontinuity we observe at the cut-off.
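The idea of repeating the balancing test at several candidate cut-offs can be sketched as follows. Note that this sketch swaps in a simpler technique than the polynomial specification of Eq. (1): it compares raw means of a covariate in a small window of ranks on either side of each candidate cut-off. The function name `gap_at_cutoff`, the window width and the covariate values are all hypothetical.

```python
def gap_at_cutoff(values_by_rank, cutoff, window):
    """Mean of the covariate over the `window` ranks up to and including the
    cut-off minus the mean over the `window` ranks just beyond it.
    values_by_rank[0] belongs to rank 1."""
    eligible = values_by_rank[cutoff - window:cutoff]
    ineligible = values_by_rank[cutoff:cutoff + window]
    return sum(eligible) / len(eligible) - sum(ineligible) / len(ineligible)


# Made-up shares of non-Western immigrants (in %) for PCAs ranked 1 to 12.
shares = [45, 42, 44, 41, 43, 40, 22, 20, 21, 19, 18, 20]

# Compare the gap at two hypothetical candidate cut-offs.
gaps = {cutoff: gap_at_cutoff(shares, cutoff, window=3) for cutoff in (3, 6)}
```

In this toy ranking the gap is small at rank 3 and large at rank 6, which illustrates how a discontinuity in a covariate can appear at one cut-off and be absent at another.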

Table 7 Balancing tests for percentage non-Western immigrants using different cut-offs

6 Illustration of ‘Invalid’ RD

Endogenous sorting around the discontinuity threshold invalidates the application of an RD design because the assignment of the treatment to PCAs just below or above the threshold value can no longer be considered (conditionally) independent. We conduct two types of analysis. First, we show the potential bias in outcomes of the RD model when we use the official cut-off and the discontinuity in the share of non-Western immigrants is not taken into account. Second, we examine what happens to the RD estimates when we control for the share of non-Western immigrants.

Table 8 reports the results. The first RD model does not take into account the proportion of non-Western immigrants (columns (1), (3) and (5)); the second model controls for this variable (columns (2), (4) and (6)). We estimate the effect of the programme on three different outcomes: the quality of life, and voting for the Labour Party in the elections of 2010 and in the elections of 2012. The last two outcomes might be relevant as the minister who was responsible for the programme is a member of the Labour Party.

Table 8 IV estimates of the effect of the programme using specifications controlling for percentage non-Western immigrants or not

The estimated effect of the programme on the quality of life is insignificant in both specifications. However, the two estimates differ: the estimated effect in column (1) is negative, whereas in column (2) it turns positive once the share of non-Western immigrants is included as a covariate. In column (3) we observe that not taking account of the difference in non-Western immigrants at the cut-off would attribute 9 percentage points more votes for the Labour Party in the elections of 2010 to the programme. However, non-Western immigrants are more likely to vote for the Labour Party, and we find that the estimated effect shrinks towards zero after taking account of this population difference. In the last two columns we also find a large difference between the two estimates, which range from an increase in Labour Party voters in 2012 of 5.4 percentage points to a decrease of 4.3 percentage points.
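The sensitivity of the estimates to the covariate can be illustrated with a small simulation. The sketch below is not the specification used for Table 8, which we do not reproduce here. It assumes a fuzzy RD estimated by two-stage least squares, with eligibility (rank at or below the cut-off) as the instrument for actual treatment, a linear trend in the rank on each side, and an optional covariate. All names (`fuzzy_rd`, `simulate_programme`) and all numbers are hypothetical. The synthetic programme has a true effect of zero, but the covariate jumps at the cut-off and affects the outcome.

```python
import numpy as np


def fuzzy_rd(outcome, treated, rank, cutoff, bandwidth, covariate=None):
    """2SLS estimate of the treatment effect at the cut-off.

    Assumptions: rank <= cutoff means eligible; eligibility instruments
    actual treatment; the rank enters linearly with side-specific slopes.
    """
    centred = np.asarray(rank, dtype=float) - cutoff
    keep = np.abs(centred) <= bandwidth
    x = centred[keep]
    eligible = (x <= 0).astype(float)
    controls = [np.ones_like(x), x, eligible * x]
    if covariate is not None:
        controls.append(np.asarray(covariate, dtype=float)[keep])
    exog = np.column_stack(controls)
    instruments = np.column_stack([eligible, exog])
    regressors = np.column_stack(
        [np.asarray(treated, dtype=float)[keep], exog]
    )
    # First stage: project all regressors on the instrument set.
    first_stage, *_ = np.linalg.lstsq(instruments, regressors, rcond=None)
    fitted = instruments @ first_stage
    # Second stage: regress the outcome on the fitted regressors.
    y = np.asarray(outcome, dtype=float)[keep]
    beta, *_ = np.linalg.lstsq(fitted, y, rcond=None)
    return beta[0]


def simulate_programme(seed=0, n=400, cutoff=200):
    """Synthetic data (hypothetical): true treatment effect is zero."""
    rng = np.random.default_rng(seed)
    rank = np.arange(1, n + 1, dtype=float)
    eligible = (rank <= cutoff).astype(float)
    # Covariate jumps by 10 points at the cut-off.
    share = 10.0 * eligible + rng.normal(0.0, 2.0, n)
    # About 10% of units do not comply with the assignment rule.
    flip = rng.random(n) < 0.1
    treated = np.where(flip, 1.0 - eligible, eligible)
    # Outcome depends on the covariate only, not on treatment.
    outcome = 0.5 * share + rng.normal(0.0, 0.2, n)
    return rank, treated, share, outcome


rank, treated, share, outcome = simulate_programme()
naive = fuzzy_rd(outcome, treated, rank, cutoff=200, bandwidth=100)
adjusted = fuzzy_rd(outcome, treated, rank, cutoff=200, bandwidth=100,
                    covariate=share)
print(f"without covariate: {naive:.2f}; with covariate: {adjusted:.2f}")
```

In this construction the estimate without the covariate attributes the covariate's jump to the programme, while the estimate with the covariate should be close to the true effect of zero. Controlling for one imbalanced covariate does not, of course, restore validity if other unobserved characteristics also jump at the threshold.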

7 Lessons

This paper documents a case of sorting around the discontinuity threshold for assigning neighbourhoods to a large-scale investment programme. Selection of neighbourhoods into the programme was determined by policymakers at the national level based on a score on a ‘quality’ index. At first sight this seems to be a textbook example for the application of an RD model aimed at estimating the causal effect of the programme.

The forcing variable was constructed from eighteen indicators of socioeconomic or housing disadvantages, social problems and safety issues. PCAs and neighbourhoods had no control over the assignment to the treatment. However, at the cut-off for assignment to the programme, we find a remarkably large difference in the proportion of non-Western immigrants, a variable not taken into account in the ‘quality’ index. We also find that the pattern of non-compliance with the assignment rule seems consistent with investing in neighbourhoods with a high share of non-Western immigrants. These remarkable differences cannot be explained by sorting induced by PCAs themselves, as they had no control over the assignment to the treatment. It also seems highly unlikely that random sorting of neighbourhoods will produce such large differences in the proportion of non-Western immigrants at the cut-off.

We find that this non-random sorting may bias the RD estimates. Despite the differences in the proportion of non-Western immigrants at the discontinuity threshold, both policymakers and researchers have used the cut-off to analyse the effects of the neighbourhood investment programme. The Ministry of Housing, Spatial Planning and the Environment (currently the Ministry of the Interior), under whose supervision the neighbourhood investment programme was launched, has initiated several reviews of the progress of the programme. Several more descriptive reports about improvements in outcomes are also available. These reports aim to inform members of parliament about the progress of the programme (e.g., CBS 2012). None of these reports has noticed or taken into account the difference in the proportion of non-Western immigrants at the discontinuity threshold. Researchers have not taken this difference into account either. For example, Wittebrood and Permentier (2011) conclude that the share of non-Western immigrants is not increasing in treatment PCAs that focussed on the restructuring of housing. Such a finding has been regarded as a positive signal of improvement, but our observation that the share of non-Western immigrants was already higher in the treatment PCAs before the programme started sheds a different light on this perceived success. In addition, a recent study by Permentier et al. (2013) uses the discontinuity threshold in an RD setting to evaluate the effects of the programme. This study also does not take into account the difference in the share of non-Western immigrants, nor does it account for non-compliance with the assignment rule.

Based on our empirical analysis we have to be careful in concluding whether or not policymakers’ preferences or political forces at the national level have contributed to the sorting patterns observed in the data. The simplest explanation for the observed sorting pattern is that it is a coincidence that there is such a large discontinuity in the share of non-Western immigrants at the threshold. Indeed, several indicators have been constructed to make a decision about which PCAs would be eligible for treatment and by coincidence there could be a discontinuity in the share of non-Western immigrants at exactly this threshold. Our analysis of the alternative of selecting 30 neighbourhoods with the same criterion does not rule out this possibility.

However, some observations suggest otherwise. First, the pattern of non-compliance with the assignment rule is consistent with selecting PCAs with more non-Western immigrants into the treatment and PCAs with fewer non-Western immigrants out of it. Second, the size of the difference at the threshold points to the selection of neighbourhoods in the Randstad relative to neighbourhoods in large cities in other parts of the country. Non-Western immigrants are concentrated in the Randstad. This selection seems to be the result of the selection rule to keep on adding PCAs to neighbourhoods until the threshold of 40 neighbourhoods set by the Minister was exhausted.

Overall, our results provide a new case of sorting around a threshold in a situation where the units that might receive treatment have no control over their assignment to the treatment. We view our findings as a cautionary note regarding the use of RD designs in situations in which policymakers are able to influence the assignment to the treatment.