When to use matching and weighting or regression in instrumental variable estimation? Evidence from college proximity and returns to college

Tübbicke, Stefan

doi:10.1007/s00181-023-02441-7

Replication study
Open access
Published: 08 June 2023

Volume 65, pages 2979–2999, (2023)

Empirical Economics

Stefan Tübbicke ORCID: orcid.org/0000-0002-0197-9930¹

Abstract

Standard two-stage least squares (2SLS) regression remains dominant in instrumental variables estimation of causal effects even though the literature has shown that 2SLS may be inconsistent when effects are heterogenous and the instrument is only valid when conditioning on covariates. To show that this is not merely a hypothetical threat, this paper re-estimates the returns to college using college proximity as an instrument based on the data from Card (Aspects of labour market behavior: essays in honour of John Vanderkamp, University of Toronto Press, Toronto, 1995). The results show that 2SLS yields systematically larger estimates of the returns to college than more flexible estimators based on the instrument propensity score. In the full sample, differences amount to about 50 to 100%. This is due to the implicit conditional-variance weighting performed by 2SLS. Moreover, in line with the theoretical prediction by Sloczynski (When should we (not) interpret linear IV estimands as LATE? IZA discussion papers 14349, Institute of Labor Economics (IZA), 2021), findings suggest that the impact of the conditional-variance weighting is larger when instrument groups are not roughly the same size. Thus, it is advised to use 2SLS with caution and use estimators based on the instrument propensity score instead when groups are of different size and covariates are predictive of the instrument.

1 Introduction

The number of available methods for causal inference has seen enormous growth in the last three decades (see Abadie and Cattaneo 2018, for a recent overview). Although progress has been tremendous, applied research does not always keep up. On the one hand, flexible semi- and non-parametric methods based on the propensity score (PS) are widely applied when estimating causal effects under the selection-on-observables assumption (e.g., see Austin and Stuart 2015; Thoemmes and Kim 2011). On the other hand, much of the literature using instrumental variable (IV) estimation to overcome bias due to unobserved factors relies on two-stage least squares (2SLS). This is despite the fact that 2SLS may yield inconsistent estimates of treatment effects when effects are heterogenous and covariates are predictive of the instrument (Abadie 2003).

This paper concentrates on the most common case in which the researcher aims to estimate causal effects of some treatment using a single binary IV. First, the paper reviews available results from the literature on implications of using standard linear-in-covariates 2SLS estimation under effect heterogeneity. Most importantly for this paper, the literature shows that 2SLS yields a ratio of conditional-variance-weighted averages of covariate-specific effects. If effects are indeed heterogenous and related to the PS, then 2SLS yields inconsistent estimates. Estimators based on the PS provide a consistent (Frölich 2007), readily available and intuitive alternative. Hence, the paper briefly describes some basic IV estimators using the PS as well as the novel efficient covariate balancing approach by Heiler (2021). By re-estimating the returns to college using these approaches, with college proximity as an instrument and the data by Card (1995), the paper shows that the threat of obtaining inconsistent estimates when using 2SLS is not merely hypothetical. 2SLS yields systematically larger effect estimates than more flexible estimators based on the PS. Further inspection shows that this difference is mainly due to the implicit conditional-variance weighting performed by 2SLS.

This case study has been widely used to teach Economics students around the world about the use of IV methods to overcome bias due to unobserved confounders as well as the importance of effect heterogeneity. Moreover, the case study has been widely used in a variety of papers, see for example Tan (2006), Huber and Mellace (2015), Kitagawa (2015), Mourifié and Wan (2017), Andresen and Huber (2021), Sloczynski (2021), Sloczynski et al. (2022) and Blandhol et al. (2022). Most of these papers are concerned with instrument validity, an issue that is discussed but not of main interest in this paper. The only study known to the author that compares parametric estimators of the returns to college with more flexible estimators for this case study is Sloczynski et al. (2022). They too find sizable gaps in estimates. However, in contrast to this paper, they do not offer an explanation for this phenomenon.

The remainder of this paper is organized as follows: Sect. 2 reviews identification and estimation using IVs, Sect. 3 applies 2SLS and comparison methods based on the PS to the data. Section 4 concludes.

2 Identification and estimation using Instrumental variables

Assume we have an i.i.d. sample for i = 1,…,N units, where for each unit we observe some exogenous characteristics ${\text{X}}_{\text{i}}$, a binary treatment variable ${\text{D}}_{\text{i}}$, an outcome ${\text{Y}}_{\text{i}}$ and a single binary instrument ${\text{Z}}_{\text{i}}$. Furthermore, assume that there is an unobserved confounder ${\text{U}}_{\text{i}}$ that has an impact on both the treatment variable ${\text{D}}_{\text{i}}$ and the outcome ${\text{Y}}_{\text{i}}$. In the language of classical least squares regression, this creates an omitted variable bias and the selection-on-observables assumption fails (Wooldridge 2010, Chap. 4). To stick to the returns-to-college example used throughout this paper, conditioning on observed characteristics such as labor market experience or region of residence is insufficient to remove bias from standard regression or matching estimates of the effects of college attendance on wages if unobserved ability has an impact on the college decision and labor market earnings (Blackburn and Neumark 1993).

Under such circumstances, one can use IV techniques to estimate causal effects by exploiting variation in the treatment variable ${\text{D}}_{\text{i}}$ through the instrument ${\text{Z}}_{\text{i}}$. When effects are heterogenous, IV methods identify local treatment effects, i.e. average effects for specific sub-populations influenced by the instrument. For this identification result to hold, the instrument needs to be exogenous, i.e. the instrument has to be as good as randomly assigned after conditioning on covariates and there must not be a direct effect of the instrument on the outcome. Moreover, the instrument must influence the treatment decision in a monotonic way. Imbens and Angrist (1994) introduce what Sloczynski (2021) calls “strong monotonicity”, which is the assumption that the instrument weakly increases or decreases the treatment probability for everyone. Under this assumption, IV methods identify the local average treatment effect (LATE, Imbens and Angrist 1994), also called the complier average causal effect (CACE), i.e. the average treatment effect of individuals who act in line with the instrument. If defiers, i.e. individuals who act in the opposite direction of compliers, exist, and one is willing to assume that the sign of the effect of the instrument on treatment is determined solely by covariates (“weak monotonicity”, Sloczynski 2021), the CACE may be recovered by averaging effects for individuals with covariate values estimated to behave in the direction of compliers. Moreover, a more general effect, the mover average causal effect (MACE, Kolesár 2013), i.e. the average treatment effect for compliers and defiers, is identified.

Using the standard potential outcomes framework (Imbens and Angrist 1994; Rubin 1974), define ${\text{D}}_{\text{i}}\left(1\right)$ and ${\text{D}}_{\text{i}}\left(0\right)$ as the potential treatment states if the unit was assigned ${\text{Z}}_{\text{i}}=1$ or ${\text{Z}}_{\text{i}}=0$, respectively. If the instrument indeed has no direct impact on the outcome, one may write potential outcomes as ${\text{Y}}_{\text{i}}\left({d}_{i}\right)$, with ${\text{Y}}_{\text{i}}\left(1\right)$ and ${\text{Y}}_{\text{i}}\left(0\right)$ being the outcomes that would be observed with and without treatment. Assuming the instrument raises the chance of receiving treatment on average, the strong monotonicity assumption implies that for compliers ${D}_{i}\left(1\right)>{D}_{i}\left(0\right)$, i.e. they receive treatment if assigned ${\text{Z}}_{\text{i}}=1$ and they do not if assigned ${\text{Z}}_{\text{i}}=0$. Based on these definitions and assumptions, the standard CACE can be written as

$${{\Delta }}^{CACE}=E\left[{Y}_{i}\left(1\right)-{Y}_{i}\left(0\right)|{D}_{i}\left(1\right)>{D}_{i}(0)\right] .$$

(1)

The MACE is defined as ${{\Delta }}^{MACE}=E\left[{Y}_{i} \left(1\right)-{Y}_{i} \left(0\right)|{D}_{i}\left(1\right) \ne {D}_{i}(0)\right]$ and can be recovered by using a reordered instrument ${\text{Z}}_{\text{i}}^{\text{R}}$, i.e. an adapted instrument which is reversed for defiers, defined as^{Footnote 1}

$${Z}_{i}^{R}={Z}_{i}I\left({\delta }^{D}\left({X}_{i}\right)\ge 0\right)+\left(1-{Z}_{i}\right)I\left({\delta }^{D}\left({X}_{i}\right)<0\right),$$

(2)

where ${\delta }^{D}\left({X}_{i}\right)=E\left[{D}_{i}(1)-{D}_{i}(0)\mid {X}_{i}\right]$ is the covariate-specific average effect of the instrument on the treatment decision and $I(\cdot )$ is the indicator function (Sloczynski 2021).
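As an illustration of Eq. (2), the following minimal Python sketch builds a reordered instrument from cell-wise first-stage estimates. It is not the author's code: the function name `reordered_instrument` and the integer `cell` labels are assumptions made for this example, and the sketch assumes that every covariate cell contains units with both instrument values.

```python
import numpy as np

def reordered_instrument(z, d, cell):
    """Sketch of Eq. (2): reverse the instrument in cells with a negative first stage.

    z, d : binary arrays (instrument, treatment); cell : integer covariate-cell labels.
    Assumes each cell contains both z == 1 and z == 0 observations.
    """
    z, d, cell = np.asarray(z), np.asarray(d), np.asarray(cell)
    z_r = z.copy()
    for c in np.unique(cell):
        m = cell == c
        # Estimated delta^D(x): difference in treatment rates by instrument status
        delta_d = d[m & (z == 1)].mean() - d[m & (z == 0)].mean()
        if delta_d < 0:
            z_r[m] = 1 - z[m]  # cell estimated to contain defiers: flip Z
    return z_r

# Hypothetical toy data: the first stage is positive in cell 0 and negative in cell 1
z = np.array([1, 1, 0, 0, 1, 1, 0, 0])
d = np.array([1, 1, 0, 0, 0, 0, 1, 1])
cell = np.array([0, 0, 0, 0, 1, 1, 1, 1])
z_r = reordered_instrument(z, d, cell)
```

In the toy data the instrument is left unchanged in cell 0 and reversed in cell 1, so the reordered instrument is positively associated with treatment in both cells.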

For the empirical analysis of the paper, the exogeneity assumption is assumed to hold. Moreover, it is assumed that at least the weak monotonicity assumption holds as well. While the failure of the monotonicity assumption makes the interpretation of estimands difficult if not impossible, differences in estimates of these quantities are still interesting to inspect in order to understand the estimators’ behavior under effect heterogeneity.

For the following exposition of estimation methods, assume that strong monotonicity holds.^{Footnote 2} While Frölich (2007) shows that the CACE is non-parametrically identified under exogeneity and strong monotonicity, most applied research still uses 2SLS to estimate effects using an IV. Typically, researchers model the outcome and treatment equations as linear functions of the instrument and covariates. That is, they build regression models that look something like

$$Y_{i} = \alpha _{Y} + \beta _{Y}^{\prime } X_{i} + \gamma ^{Y} Z_{i} + \varepsilon _{i}^{Y}$$

(3)

$$D_{i} = \alpha _{D} + \beta _{D}^{\prime } X_{i} + \gamma ^{D} Z_{i} + \varepsilon _{i}^{D} ,$$

(4)

where it is (implicitly) assumed that slope-coefficients are constant and that $\varepsilon _{i}^{Y}$ and $\varepsilon _{i}^{D}$ are well-behaved error terms. The corresponding 2SLS estimator can be written as ${\widehat{{\Delta }}}_{2SLS}={\widehat{\gamma }}^{Y}/{\widehat{\gamma }}^{D}$, i.e. the ratio of the reduced form (3) OLS coefficient ${\widehat{\gamma }}^{Y}$ and the first stage (4) OLS coefficient ${\widehat{\gamma }}^{D}$ on ${Z}_{i}$. Under effect heterogeneity and standard regularity conditions, Sloczynski (2021) shows that ${\widehat{{\Delta }}}_{2SLS}$ converges to^{Footnote 3}

$$\text{plim}\,{\widehat{{\Delta }}}_{2SLS}=\frac{E\left[{\delta }^{Y}\left({X}_{i}\right){\delta }^{D}\left({X}_{i}\right)Var\left({Z}_{i}\mid {X}_{i}\right)\right]}{E\left[{\delta }^{D}\left({X}_{i}\right)Var\left({Z}_{i}\mid {X}_{i}\right)\right]},$$

(5)

where ${\delta }^{Y}\left({X}_{i}\right)=E\left[{Y}_{i}(1)-{Y}_{i}(0)\mid {X}_{i},{D}_{i}(1)>{D}_{i}(0)\right]$ is the average covariate-specific effect of treatment on the outcome for compliers. Hence, 2SLS yields a conditional-variance weighted average of covariate-specific effects for compliers. As $Var\left({Z}_{i}\mid {X}_{i}\right)=P\left({Z}_{i}=1\mid {X}_{i}\right)\left(1-P\left({Z}_{i}=1\mid {X}_{i}\right)\right)$, weights attain a maximum when the PS $P\left({Z}_{i}=1\mid {X}_{i}\right)=0.5$ (e.g., see Angrist and Pischke 2008). An important but typically under-appreciated consequence of this weighting is that $\text{plim}\,{\widehat{{\Delta }}}_{2SLS} \ne {{\Delta }}^{CACE}$ if ${\delta }^{Y}\left({X}_{i}\right)$ and ${\delta }^{D}\left({X}_{i}\right)$ depend on the PS. Depending on the correlation structure at hand, this may lead to substantial inconsistencies when using 2SLS.
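To make the weighting in Eq. (5) concrete, the sketch below evaluates the CACE and the 2SLS probability limit for a population with discrete covariate cells. The function names and the two-cell numbers are hypothetical and chosen purely for illustration; they are not taken from the paper or its data.

```python
import numpy as np

def cace(p_x, delta_d, delta_y):
    """Complier-weighted average of cell-specific effects (weights p_x * delta_d)."""
    p_x, delta_d, delta_y = (np.asarray(a, dtype=float) for a in (p_x, delta_d, delta_y))
    w = p_x * delta_d
    return float(np.sum(w * delta_y) / np.sum(w))

def plim_2sls(p_x, ps, delta_d, delta_y):
    """Eq. (5): cells are additionally weighted by Var(Z|X) = ps * (1 - ps)."""
    p_x, ps, delta_d, delta_y = (np.asarray(a, dtype=float) for a in (p_x, ps, delta_d, delta_y))
    w = p_x * delta_d * ps * (1.0 - ps)
    return float(np.sum(w * delta_y) / np.sum(w))

# Hypothetical two-cell population (illustrative numbers only)
p_x = [0.5, 0.5]      # cell shares
ps = [0.5, 0.9]       # instrument propensity scores P(Z=1|X)
delta_d = [0.2, 0.2]  # first-stage effects delta^D(X)
delta_y = [0.4, 0.1]  # complier effects delta^Y(X)

target = cace(p_x, delta_d, delta_y)
limit = plim_2sls(p_x, ps, delta_d, delta_y)
```

In this constructed example the cell with the larger effect has a PS of 0.5 and therefore receives the largest variance weight, so the 2SLS limit (about 0.32) exceeds the CACE (0.25). If the PS were the same in both cells, the two quantities would coincide.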

As an alternative, this paper considers IV estimators of ${{\Delta }}^{CACE}$ based on the PS as derived by Frölich (2007) as well as a recent extension by Heiler (2021).^{Footnote 4} These estimators do not restrict effect heterogeneity. As a consequence, they are consistent even when effects are not homogenous (Frölich 2007; Heiler 2021).

The IV-matching estimator based on the PS pairs up each unit from the groups defined by the instrument with one, multiple or weighted averages of units from the opposite group based on the PS in order to infer the missing counterfactuals. For a unit observed with ${Z}_{i}=0$, let $\widehat{{Y}_{i}\left(1\right)}$ and $\widehat{{D}_{i}\left(1\right)}$ denote the matching-based estimates of the outcome and the treatment variable the unit would have exhibited under ${Z}_{i}=1$. Analogously, for a unit observed with ${Z}_{i}=1$, let $\widehat{{Y}_{i}\left(0\right)}$ and $\widehat{{D}_{i}\left(0\right)}$ be the estimated counterfactual outcome and treatment variable under ${Z}_{i}=0$. Note that in these expressions the argument in parentheses refers to the value of the instrument rather than to the treatment state. Based on this definition, the IV-matching estimator can be written as

$${\widehat{\delta }}_{MAT}=\frac{\sum _{i=1}^{N}\left[{Z}_{i}\left({Y}_{i}-\widehat{{Y}_{i}\left(0\right)}\right)+\left(1-{Z}_{i}\right)\left(\widehat{{Y}_{i}\left(1\right)}-{Y}_{i}\right)\right]}{\sum _{i=1}^{N}\left[{Z}_{i}\left({D}_{i}-\widehat{{D}_{i}\left(0\right)}\right)+\left(1-{Z}_{i}\right)\left(\widehat{{D}_{i}\left(1\right)}-{D}_{i}\right)\right]}.$$

(6)

To estimate the PS, a standard logit regression is used. Moreover, kernel matching (KM) is employed as it has been shown to be among the top-performing PS-based matching methods in several simulation studies under the selection-on-observables paradigm (e.g., see Frölich 2004; Busso et al. 2014). More specifically, the matching procedure is implemented using an Epanechnikov kernel with a bandwidth chosen via weighted cross-validation (Galdo et al. 2008). To avoid extrapolation, common support is imposed via the min-max criterion by Dehejia and Wahba (1999) as is standard in the PS-based literature (Caliendo and Kopeinig 2008).

The (un-normalized) inverse probability weighting (IPW) IV-estimator can be written as

$${\widehat{\delta }}_{IPW}=\frac{\sum _{i=1}^{N}\left[\frac{{Z}_{i}{Y}_{i}}{{\widehat{P}}_{i}}-\frac{\left(1-{Z}_{i}\right){Y}_{i}}{1-{\widehat{P}}_{i}}\right]}{\sum _{i=1}^{N}\left[\frac{{Z}_{i}{D}_{i}}{{\widehat{P}}_{i}}-\frac{\left(1-{Z}_{i}\right){D}_{i}}{1-{\widehat{P}}_{i}}\right]},$$

(7)

where ${\widehat{P}}_{i}$ is an estimate of the PS $P\left({Z}_{i}=1|{X}_{i}\right)$. IPW has been shown to be semiparametrically efficient in the IV context (Donald et al. 2014). To estimate the PS, two approaches are used. First, the same logit estimate as for KM is employed. To ensure better performance, weights of this estimator are normalized as un-normalized weights may yield unreliable results (Frölich 2004; Busso et al. 2014). Akin to KM, the min-max criterion is used to ensure common support. In the context of IPW, this sort of trimming may be even more important as IPW with PS close to zero or one may lead to invalid statistical inference when using the non-parametric bootstrap as is done for all estimators considered. See Heiler and Kazak (2021) for derivations and alternative bootstrap approaches or Sasaki and Ura (2022) for trimming based methods.

Second, as IPW methods may be overly sensitive to the specification of the estimated PS (Schafer and Kang 2008), this paper also uses the novel efficient covariate balancing (ECB) procedure by Heiler (2021) to estimate the PS. This approach specifies a loss function tailored to the estimation of treatment effects using IVs and algorithmically minimizes covariate imbalances, leading to improved bias and variance properties in finite samples compared to standard IPW methods (see Heiler 2021 for details). This approach has several advantages. First, akin to IPW, ECB is semiparametrically efficient. Second, ECB is doubly robust if covariates are specified flexibly. Third, the ECB method tends to shrink the PS, which may alleviate the need to implement heuristic trimming approaches such as the min-max criterion.^{Footnote 5} Due to the last property, IPW based on the ECB is implemented without further common support restrictions.
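The logit-based variant of the weighting estimator, i.e. Eq. (7) with normalized weights and min-max common support, can be sketched as follows. This is a minimal illustration rather than the implementation used in the paper: the function names are hypothetical, the estimated PS is taken as given (for instance from a logit fit), and the ECB step of Heiler (2021) is not reproduced.

```python
import numpy as np

def minmax_support(ps, z):
    """Min-max criterion: keep units whose PS lies in the overlap of both groups' PS ranges."""
    lo = max(ps[z == 1].min(), ps[z == 0].min())
    hi = min(ps[z == 1].max(), ps[z == 0].max())
    return (ps >= lo) & (ps <= hi)

def ipw_iv(y, d, z, ps):
    """Normalized IPW IV estimate: weighted reduced form divided by weighted first stage."""
    y, d, z, ps = (np.asarray(a, dtype=float) for a in (y, d, z, ps))
    keep = minmax_support(ps, z)
    y, d, z, ps = y[keep], d[keep], z[keep], ps[keep]
    w1 = z / ps                # weights for units with Z = 1
    w0 = (1 - z) / (1 - ps)    # weights for units with Z = 0
    w1 /= w1.sum()             # normalization: weights sum to one within each group
    w0 /= w0.sum()
    reduced_form = np.sum(w1 * y) - np.sum(w0 * y)
    first_stage = np.sum(w1 * d) - np.sum(w0 * d)
    return reduced_form / first_stage

# Hypothetical toy data with a constant PS of 0.5 and a constant effect of 2
z = np.array([1, 1, 1, 1, 0, 0, 0, 0])
d = np.array([1, 1, 1, 0, 1, 0, 0, 0])
y = 1.0 + 2.0 * d
ps = np.full(8, 0.5)
estimate = ipw_iv(y, d, z, ps)
```

With a constant PS the weights reduce to simple group means, so the toy example recovers the constant effect built into the data. With a covariate-dependent PS, units with extreme PS values outside the overlap region are dropped before weighting.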

Ultimately, choosing an IV estimator involves a trade-off: Standard 2SLS is more easily applied than matching or weighting but 2SLS may be inconsistent under effect heterogeneity. Moreover, recent simulation evidence by Sloczynski et al. (2022) suggests that more flexible estimators may even be competitive in terms of mean squared error compared to standard 2SLS. However, more research on the relative performance of IV estimation methods under realistic data-generating processes is necessary to provide better guidance to researchers.

3 Re-estimating the returns to college exploiting college proximity

This Section provides empirical evidence on the relevance of potential inconsistencies in 2SLS estimates when an instrument is only valid conditional on covariates. This is done by re-estimating the wage returns to college exploiting college proximity as instrument using the data originally analyzed by Card (1995).

3.1 Data and descriptives

The data stem from the National Longitudinal Survey of Young Men, which interviewed men aged 14–24 in 1966 with follow-up surveys until 1981. The dataset contains information on 1976 log-earnings, years of education, and an indicator for growing up in a local labor market with an accredited 4-year college as well as covariates. The latter consist of potential experience, indicators for the 1966 census region, an indicator for being black, and living in the south as well as in an urban area in 1966 and 1976. Following Sloczynski (2021), a subset of the original data is analyzed with at least five observations in each covariate cell given by the interactions of the five indicators for being black, living in the south and in an urban area in 1966 and 1976. This restriction results in a sample of 2988 individuals instead of the 3010 originally analyzed by Card (1995).^{Footnote 6}

The main idea of the instrumental variable set-up is that children who grew up near a college may live with their parents throughout their studies and thus face lower cost of post-secondary education, which should increase the likelihood of going to college independent of their ability. Accordingly, the treatment variable is defined as “some college”, i.e. having strictly more than 12 years of education.^{Footnote 7}

Table 1 provides some select descriptive statistics for the sample, split by whether individuals grew up near a college (${\text{Z}}_{\text{i}}=1$) or not (${\text{Z}}_{\text{i}}=0$).

Table 1 Descriptive statistics

The descriptive statistics reveal quite sizable differences in terms of covariate distributions between groups defined by the binary instrument. The most-striking difference can be seen in the likelihood of living in an urban area: 80% of individuals who grew up near a 4-year college lived in an urban area in 1966, whereas the same is only true for 33% among individuals who grew up without a college nearby. Similarly, individuals who lived in the south in 1966 are under-represented among individuals who grew up near a 4-year college: of those who did (not) grow up near a 4-year college, 33 (60) percent lived in the south. Moreover, differences in racial composition and experience are also non-negligible. All of these differences are highly statistically significant as indicated by the small p values obtained from equality of means tests. As these variables tend to show quite strong associations with the outcome of interest, it is unlikely that the instrument is valid without conditioning on covariates and hence, an unconditional comparison of college attendance rates and log-wages across instrument groups is unlikely to be informative about the true effect of college attendance on earnings.

3.2 Specification and estimation methods

The returns to college will be estimated using two different sets of covariates. First, the main specification of Card (1995)—referred to as the baseline specification—will be used. This specification consists of potential experience in linear and squared form, indicators for the 1966 census region, an indicator for being black, and living in the south in 1976 as well as indicators for living in an urban area in 1966 and 1976. Second, following Sloczynski (2021), a saturated, i.e. fully interacted, specification based on the indicator for being black, living in the south in 1966 and 1976 and living in an urban area in 1966 and 1976 is used. Sloczynski (2021) adopted this flexible specification because Kitagawa (2015) provided evidence in favor of the validity of the instrument after conditioning on these covariates.

To estimate effects of college attendance on wages, the previously discussed methods are used. That is, naïve OLS, standard 2SLS and more flexible estimators based on the PS are applied. The latter consist of KM with an Epanechnikov kernel using a bandwidth chosen via weighted cross-validation (Galdo et al. 2008), IPW based on a logit estimate of the PS as well as ECB (Heiler 2021). When estimating effects based on the logit estimate of the PS, common support is imposed via the min–max criterion by Dehejia and Wahba (1999) as is standard in the PS-based literature (Caliendo and Kopeinig 2008).

As Sloczynski (2021) raises doubts about the validity of the strong monotonicity assumption, all estimators are also applied using the reordered instrument. Following Sloczynski (2021), this adjusted instrument is obtained by estimating first stage effects non-parametrically for each covariate cell of the saturated specification and then reversing the instrument for individuals estimated to be defiers such that ${Z}_{i}^{R}$ encourages treatment for everyone. This changes the target parameter from CACE to MACE. In order to take care of this additional estimation step when performing statistical inference, standard errors are estimated using the non-parametric bootstrap not just for the PS-based estimators but also for 2SLS when using the reordered instrument. Standard errors are obtained using 999 replications, and inference is based on the normal approximation.
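The inference procedure described here can be sketched as a generic non-parametric bootstrap. The code below is an assumption-laden illustration, not the paper's implementation: `bootstrap_se` and the toy `wald` estimator are hypothetical names, and the simulated data are invented. The relevant design point is that any preliminary step, such as re-estimating the reordered instrument, would be placed inside the `estimator` function so that it is repeated in every replication.

```python
import numpy as np

def bootstrap_se(estimator, data, n_boot=999, seed=0):
    """Non-parametric bootstrap SE: resample rows with replacement and re-estimate."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # row indices drawn with replacement
        draws[b] = estimator(data[idx])
    return draws.std(ddof=1)

def wald(data):
    """Toy estimator for illustration: unconditional Wald ratio; columns are (y, d, z)."""
    y, d, z = data[:, 0], data[:, 1], data[:, 2]
    reduced_form = y[z == 1].mean() - y[z == 0].mean()
    first_stage = d[z == 1].mean() - d[z == 0].mean()
    return reduced_form / first_stage

# Simulated data (hypothetical): strong first stage, constant effect of 0.5
rng = np.random.default_rng(42)
n = 500
z = rng.integers(0, 2, size=n)
d = (rng.random(n) < 0.2 + 0.6 * z).astype(float)
y = 1.0 + 0.5 * d + rng.normal(size=n)
data = np.column_stack([y, d, z])

point = wald(data)
se = bootstrap_se(wald, data, n_boot=199, seed=1)
ci = (point - 1.96 * se, point + 1.96 * se)  # normal-approximation interval
```

The example uses 199 replications only to keep the run time short; the paper uses 999.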

3.3 Implementing matching and weighting

Before turning to actual estimates, it is imperative to check overlap and common support in terms of the PS as well as covariate balancing after matching or weighting (Caliendo and Kopeinig 2008). Figure 1 shows histograms of estimated PS with a bin size of 2.5%. Visual inspection suggests sufficient overlap between instrument groups, independent of the specification and estimation procedure used. Moreover, the PS distributions appear to be sufficiently bounded away from zero or one, which is important for the non-parametric bootstrap employed to be valid (Heiler and Kazak 2021; Sasaki and Ura 2022).^{Footnote 8} Applying the min-max criterion for KM and IPW based on the standard specification of the logit PS leads to the exclusion of 36 individuals. This equals roughly 1.2% of the sample and thus, one should not be overly concerned that estimated effects are no longer representative of the target estimand. Regarding covariate balance, Table 2 shows the pseudo-${R}^{2}$ from a logit regression before and after matching or weighting for each specification used. All balancing approaches yield a substantial reduction in imbalance from around 20% to less than 1%. As intended, ECB delivers exact balance, independent of the specification used. Moreover, p values of likelihood-ratio tests suggest that after matching or weighting, covariates are no longer statistically associated with the instrument. Hence, these statistics suggest adequate covariate balance in order to move on to the outcome analysis.

Table 2 Balancing

3.4 Comparing parametric and more flexible estimates of the returns to college

Focusing on the standard specification in the first two columns of Table 3, one can see that the 2SLS estimate of roughly 0.6 log-points is more than twice as large as the OLS estimate of about 0.24 log-points. This is in line with the findings of Card (1995) using multi-valued years of education as the treatment variable instead of a binary variable as in this case. Similar results are found using the saturated specification: 2SLS yields a point estimate of 0.57 log-points and the naïve OLS estimate is even smaller than when using the standard specification. Card (1995) attributes the sizable gap in estimates between 2SLS and OLS to possibly higher returns to education among individuals with a relatively poor background as they are the most likely to be induced to receive additional education by the instrument. This may explain why effects are expected to be larger, but estimates appear to be unreasonably large. Sloczynski (2021) argues that the large estimate may be caused by the existence of defiers. Indeed, his results—which are replicated in Table 3—show that when accounting for the existence of defiers by using the reordered instrument, the 2SLS estimate drops substantially to around 0.29 log-points. Nonetheless, the estimated effect is still considerably larger than the effect of roughly 20% suggested by other research on the returns to college (see for example Hoekstra 2009; Smith et al. 2020; Zimmerman 2014).

Table 3 Main Results

Turning to the more flexible estimates based on the PS in columns three to five of Table 3, one can see that matching and weighting estimators yield substantially smaller point estimates of the returns to college than 2SLS.^{Footnote 9} Estimates range from 0.28 to 0.32 log-points for the baseline specification. Estimates using the saturated specification are essentially identical due to their non-parametric nature, independent of whether KM or IPW with a logit or ECB PS is used. These estimates suggest roughly a 0.27 log-point gain in wages from college attendance. If one uses the reordered instrument instead, matching and weighting estimates drop to roughly 0.2 log-points, which is fairly close to the estimates suggested by the literature. Furthermore, Table 3 shows that these smaller point estimates of returns to college are due both to smaller reduced form estimates and to larger first stage effects when using PS-based estimators compared to 2SLS. Overall, the results suggest that the implicit conditional-variance weighting of 2SLS may have a substantial impact on resulting effect estimates when estimating the returns to college using college proximity. 2SLS estimates are somewhere between 50 and 100% larger than more flexible PS-based estimates.^{Footnote 10} These differences are rather sizeable, underscoring the potential value in using more robust PS-based estimators when estimating effects using an IV set-up.

3.5 Inspecting effect heterogeneity

To further illustrate the impact of the conditional-variance weighting by 2SLS, Table 4 compares 2SLS estimates with effect estimates using PS-based estimators as well as the estimates one would obtain if one weighted PS-based estimators with an estimate of the conditional variance of the instrument, i.e. mimicking the asymptotic behavior of 2SLS. This is done for the full sample as well as for two subsamples. For the sake of brevity, results are shown only for the saturated specification with the reordered instrument. Results for the other specifications—which are similar to the ones presented here—can be found in Tables 5 and 6 in the “Appendix”.

Panel A of Table 4 first replicates estimates for 2SLS and the PS-based estimators for the full sample found in Table 3: 2SLS yields an estimate of college returns of 0.289 log-points, whereas PS-based estimators suggest returns of 0.192 log-points. Conditional-variance weighted KM and the other PS-based approaches yield an estimate of 0.289 log-points, which is identical to the 2SLS estimate. Thus, in the fully saturated specification, the difference between 2SLS and more flexible estimators can be entirely attributed to the conditional-variance weighting performed by 2SLS. When using a non-saturated specification, this property breaks down. However, results still clearly show that variance weighting has a major impact on resulting estimates (see Table 5).
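The exact equivalence in the saturated case can be checked numerically. The sketch below, written under stated assumptions rather than taken from the paper, computes 2SLS as the ratio of two OLS coefficients on the instrument (with a full set of cell dummies) and compares it with cell-level mean differences aggregated either by cell size or by cell size times the within-cell variance of the instrument. All function names and the toy data are hypothetical.

```python
import numpy as np

def z_coef_saturated(v, z, cell):
    """OLS coefficient on z from regressing v on z and a full set of cell dummies."""
    cells = np.unique(cell)
    dummies = (cell[:, None] == cells[None, :]).astype(float)
    X = np.column_stack([z.astype(float), dummies])
    beta, *_ = np.linalg.lstsq(X, v.astype(float), rcond=None)
    return beta[0]

def cell_weighted_wald(y, d, z, cell, variance_weighted):
    """Ratio of aggregated cell-level differences, weights n_x or n_x * e_x * (1 - e_x)."""
    num = den = 0.0
    for c in np.unique(cell):
        m = cell == c
        e = z[m].mean()  # within-cell share of units with Z = 1
        w = m.sum() * (e * (1 - e) if variance_weighted else 1.0)
        num += w * (y[m & (z == 1)].mean() - y[m & (z == 0)].mean())
        den += w * (d[m & (z == 1)].mean() - d[m & (z == 0)].mean())
    return num / den

# Hypothetical toy data: two cells with different PS (0.5 and 0.9) and different effects
cell = np.array([0] * 8 + [1] * 10)
z = np.array([1, 1, 1, 1, 0, 0, 0, 0] + [1] * 9 + [0])
d = np.array([1, 1, 0, 0, 0, 0, 0, 0] + [1] * 6 + [0] * 4)
y = np.where(cell == 0, 0.4, 0.1) * d + 1.0 * cell

tsls = z_coef_saturated(y, z, cell) / z_coef_saturated(d, z, cell)
ps_based = cell_weighted_wald(y, d, z, cell, variance_weighted=False)
var_weighted = cell_weighted_wald(y, d, z, cell, variance_weighted=True)
```

In this toy example `tsls` and `var_weighted` agree up to floating-point error, while `ps_based` is noticeably smaller, which mirrors the pattern reported for Panel A.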

Table 4 Effect heterogeneity—saturated specification with reordered instrument

As pointed out by one of the reviewers, results by Sloczynski (2021) imply that 2SLS is expected to yield similar estimates to more flexible estimators when the (reordered) instrument groups are roughly of the same size, i.e. when $\text{P}({\text{Z}}_{\text{i}}=1)\approx 0.5$ or $\text{P}({\text{Z}}_{\text{i}}^{\text{R}}=1)\approx 0.5$. To inspect this implication, Panels B and C of Table 4 estimate effects for individuals who grew up in an urban environment with $\text{P}\left({\text{Z}}_{\text{i}}^{\text{R}}=1\mid \text{urban}\right)=0.79$ or in a more rural area with $\text{P}\left({\text{Z}}_{\text{i}}^{\text{R}}=1\mid \text{rural}\right)=0.44$. Indeed, 2SLS estimates are much more similar to PS-based estimates in the rural sample (0.342 and 0.347 log-points) than in the urban sample (0.249 and 0.137 log-points). Again, these differences are completely accounted for by the conditional-variance weighting. Hence, it appears that 2SLS is expected to yield estimates close to more flexible estimators when instrument groups are of roughly equal size because the conditional-variance weighting plays less of a role in that case.

4 Conclusion

By re-examining the Card (1995) data on college proximity and the returns to college, this paper shows that potential inconsistencies in 2SLS estimates of local treatment effects documented in the theoretical literature are not merely a hypothetical threat when effects are heterogenous. For the data at hand, 2SLS yields systematically larger effects than more flexible estimators based on the PS with differences amounting to roughly 50 to 100%. It is shown that this is because standard linear-in-covariates 2SLS yields a conditional variance-weighted average effect, putting more weight on units with a PS close to a coin flip. In line with theoretical predictions by Sloczynski (2021), the results suggest that 2SLS estimates can be expected to be more trustworthy when sample shares of instrument groups are roughly of equal size. Moreover, the paper shows that this is because the effects of conditional-variance weighting tend to be less severe when group sizes are similar. Overall, the results show that the presumption that 2SLS yields point estimates close to more flexible estimators based on the PS as argued by Angrist and Pischke (2008) does not apply in general and that one should be suspicious of 2SLS estimates when group sizes differ substantially and covariates are predictive of the instrument. In that case it may be best to use semi- or non-parametric estimation techniques instead. At the very least, one should use these methods to assess the sensitivity of estimates regarding implicit parametric assumptions made when using linear-in-covariates 2SLS.

Notes

As noted by Sloczynski (2021), this requires the estimation of $\delta^{D}(X_{i})$.
If ${Z}_{i}^{R}$ were known, all results presented would also hold for the estimation of the MACE. However, as it is unknown how the estimation of ${Z}_{i}^{R}$ affects the behavior of estimators and deriving such results is beyond the scope of this paper, the author does not further discuss estimators based on the reordered instrument in this part. Nonetheless, Sect. 3 applies this methodology in order to provide evidence that a failure of strong monotonicity does not drive differences between 2SLS and PS-based estimators.
This formula also follows immediately by combining results from Angrist (1998) on probability limits for OLS regressions and the continuous mapping theorem, as 2SLS is simply a ratio of two OLS coefficients.
Other flexible estimators are available. See Abadie (2003), Tan (2006), MaCurdy et al. (2011), Donald et al. (2014), Sant’Anna et al. (2022) and Sloczynski et al. (2022). The latter two are promising extensions of the so-called “kappa weighting” by Abadie (2003).
Note that ECB weights are normalized by construction if an intercept is included in the model, which is done for all analyses performed using ECB in this paper.
None of the results presented are sensitive to this restriction.
Note that this definition of the treatment variable is a binarized variable based on an underlying multi-valued treatment variable (i.e. years of education). Such binarization may lead to a violation of the exclusion restriction (Andresen and Huber 2021). While such violations may affect the resulting estimates, the impact of this issue should be of minor importance as this paper compares differences between estimators which are all affected by such an issue.
The minimum and maximum PS values obtained via logit regression are 0.172 and 0.950 (baseline specification), 0.250 and 0.931 (saturated specification) and 0.086 and 0.931 (saturated specification, reordered instrument). The minima and maxima obtained via ECB are very similar to the logit estimates for the standard specification and identical for the others.
This result is also supported by contemporaneous findings by Sloczynski et al. (2022) using different versions of the kappa-approach by Abadie (2003).
Note that most matching or weighting estimates of the returns to college are insignificant at common levels and that the differences discussed are also not statistically different from zero due to large standard errors. The significance of differences across estimators was tested using random sample splitting as well as a bootstrap procedure. Both procedures lead to highly insignificant differences (results not shown, available from the author upon request). While differences may be insignificant in the sample at hand, simply focusing on statistical significance may falsely discourage the use of more robust estimation methods in favor of 2SLS in applied work.

References

Abadie A (2003) Semiparametric instrumental variable estimation of treatment response models. J Econ 113(2):231–263
Abadie A, Cattaneo MD (2018) Econometric methods for program evaluation. Annual Rev Econ 10:465–503
Andresen M, Huber M (2021) Instrument-based estimation with binarised treatments: issues and tests for the exclusion restriction. Econom J 24(3):536–558
Angrist JD (1998) Estimating the labor market impact of voluntary military service using social security data on military applicants. Econometrica 66(2):249–288
Angrist JD, Pischke JS (2008) Mostly harmless econometrics. Princeton University Press, Princeton
Austin PC, Stuart EA (2015) Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Stat Med 34(28):3661–3679
Blackburn ML, Neumark D (1993) Omitted-ability bias and the increase in the return to schooling. J Labor Econ 11(3):521–544
Blandhol C, Bonney J, Mogstad M, Torgovitsky A (2022) When is TSLS actually late? NBER working paper no. 29709
Busso M, DiNardo J, McCrary J (2014) New evidence on the finite sample properties of propensity score reweighting and matching estimators. Rev Econ Stat 96(5):885–897
Caliendo M, Kopeinig S (2008) Some practical guidance for the implementation of propensity score matching. J Econ Surv 22(1):31–72
Card D (1995) Using geographic variation in college proximity to estimate the return to schooling. In: Christophides LN, Grant EK, Swidinsky R (eds) Aspects of labour market behavior: essays in Honour of John Vanderkamp. University of Toronto Press, Toronto, pp 201–222
Dehejia RH, Wahba S (1999) Causal effects in nonexperimental studies: reevaluating the evaluation of training programs. J Am Stat Assoc 94(448):1053–1062
Donald SG, Hsu Y, Lieli RP (2014) Inverse probability weighted estimation of local average treatment effects: a higher order MSE expansion. Stat Probab Lett 95:132–138
Frölich M (2004) Finite-sample properties of propensity-score matching and weighting estimators. Rev Econ Stat 86(1):77–90
Frölich M (2007) Nonparametric IV estimation of local average treatment effects with covariates. J Econ 139(1):35–75
Galdo JC, Smith J, Black D (2008) Bandwidth selection and the estimation of treatment effects with unbalanced data. Ann d’Économie et de Statistique 91/92:189–216
Heiler P, Kazak E (2021) Valid inference for treatment effect parameters under irregular identification and many extreme propensity scores. J Econ 222(2):1083–1108
Heiler P (2021) Efficient covariate balancing for the local average treatment effect. J Bus Econ Stat (forthcoming)
Hoekstra M (2009) The effect of attending the flagship state university on earnings: a discontinuity-based approach. Rev Econ Stat 91:717–724
Huber M, Mellace G (2015) Testing instrument validity for LATE identification based on inequality moment constraints. Rev Econ Stat 97(2):398–411
Imbens GW, Angrist JD (1994) Identification and estimation of local average treatment effects. Econometrica 62:467–476
Kitagawa T (2015) A test for instrument validity. Econometrica 83(5):2043–2063
Kolesár M (2013) Estimation in an instrumental variables model with treatment effect heterogeneity. Unpublished manuscript
MaCurdy T, Chen X, Hong H (2011) Flexible estimation of treatment effect parameters. Am Econ Rev 101(3):544–551
Mourifié I, Wan Y (2017) Testing local average treatment effect assumptions. Rev Econ Stat 99(2):305–313
Rubin DB (1974) Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol 66(5):688
Sant’Anna PH, Song X, Xu Q (2022) Covariate distribution balance via propensity scores. J Appl Econ 37(6):1093–1120
Sasaki Y, Ura T (2022) Estimation and inference for moments of ratios with robustness against large trimming bias. Econ Theory 38(1):66–112
Schafer JL, Kang J (2008) Average causal effects from nonrandomized studies: a practical guide and simulated example. Psychol Methods 13(4):279
Sloczynski T (2021) When should we (not) interpret linear IV estimands as LATE? IZA discussion papers 14349, Institute of Labor Economics (IZA)
Sloczynski T, Uysal SD, Wooldridge JM (2022) Abadie’s kappa and weighting estimators of the local average treatment effect. CESifo working paper no. 9715
Smith J, Goodman J, Hurwitz M (2020) The economic impact of access to public four-year colleges. NBER working paper no. 27177
Tan Z (2006) Regression and weighting methods for causal inference using instrumental variables. J Am Stat Assoc 101(476):1607–1618
Thoemmes FJ, Kim ES (2011) A systematic review of propensity score methods in the social sciences. Multivar Behav Res 46(1):90–118
Wooldridge JM (2010) Econometric analysis of cross section and panel data, vol 1, 2nd edn. MIT Press Books, The MIT Press, Cambridge
Zimmerman SD (2014) The returns to college admission for academically marginal students. J Labor Econ 32:711–754

Acknowledgements

The author would like to thank the reviewers for their very helpful comments, which led to a substantial improvement of the paper during the revision.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Institute for Employment Research (IAB), Regensburger Str. 104, 90478, Nuremberg, Germany
Stefan Tübbicke

Corresponding author

Correspondence to Stefan Tübbicke.

Ethics declarations

Conflict of interest

The author declares that he has no conflict of interest.

Human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Material 1

Supplementary Material 2

Supplementary Material 3

Appendix

See Tables 5 and 6.

Table 5 Effect heterogeneity – baseline specification

Table 6 Effect heterogeneity – saturated specification

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Tübbicke, S. When to use matching and weighting or regression in instrumental variable estimation? Evidence from college proximity and returns to college. Empir Econ 65, 2979–2999 (2023). https://doi.org/10.1007/s00181-023-02441-7

Received: 23 May 2022
Accepted: 18 May 2023
Published: 08 June 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s00181-023-02441-7
