Abstract
Standard sample selection models with nonrandomly censored outcomes assume (i) an exclusion restriction (i.e., a variable affecting selection, but not the outcome) and (ii) additive separability of the errors in the selection process. This paper proposes tests for the joint satisfaction of these assumptions by applying the approach of Huber and Mellace (2011), originally developed for testing instrument validity under treatment endogeneity, to the sample selection framework. We show that the exclusion restriction and additive separability imply two testable inequality constraints that come from both point identifying and bounding the outcome distribution of the subpopulation that is always selected/observed. We apply the tests to two variables for which the exclusion restriction is frequently invoked in female wage regressions: non-wife/husband’s income and the number of (young) children. Considering eight empirical applications, our results suggest that the identifying assumptions are likely violated for the former variable, but cannot be refuted for the latter on statistical grounds.
Notes
It has already been noticed by Manski (2003) that the exclusion restriction is violated if the identification region defined by the bounds is empty.
In contrast, Mealli and Pacini (2008) consider identification (for binary treatment variables) when conditioning on a binary instrument directly rather than using \(\Pr (S=1|X,Z)\) as a control function. In this case, point identification is not obtained in general, but requires additional assumptions.
This issue does not arise in the endogenous treatment framework of Huber and Mellace (2011), where all outcomes are observed.
For a similar result in the context of selection models see Lee (2009), who in contrast to this paper considers monotonicity of selection in a binary treatment.
Note that the instrument \(Z\) and the type \(T\) uniquely determine the value of the selection indicator \(S\) such that conditioning on the latter is redundant.
As discussed in Chen and Szroeter (2012), a sufficient condition for correct asymptotic size in the uniform sense is that the first four moments exist for each of the i.i.d. data points used to estimate the constraints.
Which number and definition of the subsets \(A\) is optimal for testing is an unsolved issue. We therefore also considered more or fewer subsets, but the results did not differ in an important way and are for this reason not reported here.
References
Ahn H, Powell J (1993) Semiparametric estimation of censored selection models with a nonparametric selection mechanism. J Econ 58:3–29
Angrist J, Bettinger E, Kremer M (2006) Long-term educational consequences of secondary school vouchers: evidence from administrative records in Colombia. Am Econ Rev 96:847–862
Angrist J, Evans W (1998) Children and their parents’ labor supply: evidence from exogenous variation in family size. Am Econ Rev 88:450–477
Angrist J, Imbens G, Rubin D (1996) Identification of causal effects using instrumental variables. J Am Stat Assoc 91:444–472 (with discussion)
Angrist J, Lang D, Oreopoulos P (2009) Incentives and services for college achievement: evidence from a randomized trial. Am Econ J Appl Econ 1:136–163
Becker G (1981) A treatise on the family. Harvard University Press, Cambridge
Blundell R, Gosling A, Ichimura H, Meghir C (2007) Changes in the distribution of male and female wages accounting for employment composition using bounds. Econometrica 75:323–363
Chang SK (2011) Simulation estimation of two-tiered dynamic panel Tobit models with an application to the labor supply of married women. J Appl Econ 26:854–871
Chen LY, Szroeter J (2012) Testing multiple inequality hypotheses: a smoothed indicator approach, CeMMAP working paper 16/12
Cosslett S (1991) Distribution-free estimator of a regression model with sample selectivity. In: Barnett W, Powell J, Tauchen G (eds) Nonparametric and semiparametric methods in econometrics and statistics. Cambridge University Press, Cambridge, pp 175–198
Crépon B (2006) Testing exclusion restrictions at infinity in the semiparametric selection model. IZA Discussion Paper no. 2035
Das M, Newey WK, Vella F (2003) Nonparametric estimation of sample selection models. Rev Econ Stud 70:33–58
Fleisher BM, Rhodes J (1979) Fertility, women’s wage rates, and labor supply. Am Econ Rev 69:14–24
Frangakis CE, Rubin DB (2002) Principal stratification in causal inference. Biometrics 58:21–29
Gallant A, Nychka D (1987) Semi-nonparametric maximum likelihood estimation. Econometrica 55:363–390
Gronau R (1974) Wage comparisons—a selectivity bias. J Political Econ 82:1119–1143
Heckman JJ (1974) Shadow prices, market wages, and labor supply. Econometrica 42:679–694
Heckman JJ (1976) The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Ann Econ Soc Meas 5:475–492
Heckman JJ (1979) Sample selection bias as a specification error. Econometrica 47:153–161
Horowitz JL (1992) A smoothed maximum score estimator for the binary response model. Econometrica 60:505–531
Horowitz JL, Manski CF (1995) Identification and robustness with contaminated and corrupted data. Econometrica 63:281–302
Huber M, Mellace G (2011) Testing instrument validity for LATE identification based on inequality moment constraints, University of St Gallen, Dept. of Economics Discussion Paper no. 2011–43
Imbens GW, Rubin D (1997) Estimating outcome distributions for compliers in instrumental variables models. Rev Econ Stud 64:555–574
Kitagawa T (2010) Testing for instrument independence in the selection model. University College London (unpublished manuscript)
Lee DS (2009) Training, wages, and sample selection: estimating sharp bounds on treatment effects. Rev Econ Stud 76:1071–1102
Manski CF (2003) Partial identification of probability distributions. Springer, New York
Martins M (2001) Parametric and semiparametric estimation of sample selection models: an empirical application to the female labour force in Portugal. J Appl Econ 16:23–39
Mealli F, Pacini B (2008) Exploiting instrumental variables in causal inference with nonignorable outcome nonresponse using principal stratification, mimeo
Mroz T (1987) The sensitivity of an empirical model of married women’s hours of work to economic and statistical assumptions. Econometrica 55:765–799
Mulligan CB, Rubinstein Y (2008) Selection, investment, and women’s relative wages over time. Q J Econ 123:1061–1110
Nakosteen RA, Westerlund O, Zimmer MA (2004) Marital matching and earnings: evidence from the unmarried population in Sweden. J Hum Resour 39:1033–1044
Newey WK (2007) Nonparametric continuous/discrete choice models. Int Econ Rev 48:1429–1439
Newey WK (2009) Two-step series estimation of sample selection models. Econ J 12:S217–S229
Powell JL (1987) Semiparametric estimation of bivariate latent variable models. Unpublished manuscript, University of Wisconsin-Madison
Rubin DB (1974) Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol 66:688–701
Schafgans MMA (1998) Ethnic wage differences in Malaysia: parametric and semiparametric estimation of the Chinese-Malay wage gap. J Appl Econ 13:481–504
Schochet PZ, Burghardt J, Glazerman S (2001) National job corps study: the impacts of job corps on participants’ employment and related outcomes, report. Mathematica Policy Research, Inc., Washington, DC
Vytlacil E (2002) Independence, monotonicity, and latent index models: an equivalence result. Econometrica 70:331–341
Zabel JE (1993) The relationship between hours of work and labor force participation in four models of labor supply behavior. J Labor Econ 11:387–416
Acknowledgments
We have benefited from comments by Alberto Abadie, Joshua Angrist, Guido Imbens, Toru Kitagawa, Alexa Tiemann, seminar participants at Harvard (seminar in econometrics, September 2011), and an anonymous associate editor. Martin Huber gratefully acknowledges financial support from the Swiss National Science Foundation Grant PBSGP1_138770.
Additional information
An earlier version of this paper was circulated under the title “Testing instrument validity in sample selection models”.
Appendix
1.1 Link to Kitagawa (2010)
The subsequent discussion links the testable implications of Sect. 3 to Kitagawa (2010), who derives a testable implication based on comparable model assumptions. Considering only positive monotonicity, Kitagawa (2010) shows in his Proposition 2.3 that under Assumptions 1 and 2,
\(f(y,S=1|Z=1)\ge f(y,S=1|Z=0) \hbox { for all }y \hbox { in the support of }Y, \quad (13)\)
i.e., the joint density of \(Y\) and \(S=1\) given \(Z=1\) must nest the joint density of \(Y\) and \(S=1\) given \(Z=0\) for any value of \(Y.\) Rearranging terms such that \(f(y,S=1|Z=1)-f(y,S=1|Z=0) \ge 0\) gives the intuitive interpretation that the pdf of the compliers’ outcome cannot be smaller than zero, as densities must not be negative.
Note that (7) in Sect. 3 is equivalent to
for all \(A\) in the support of \(Y,\) because
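by using basic probability theory. (14) in turn implies that \(\hbox { for all }A \hbox { in the support of }Y,\)
\(\Pr (Y\in A,S=1|Z=1)-(P_{1|1}-P_{1|0})\le \Pr (Y\in A,S=1|Z=0)\le \Pr (Y\in A,S=1|Z=1), \quad (15)\)
and when applied to the pdf, that \( \hbox {for all }y \hbox { in the support of }Y\)
\(f(y,S=1|Z=1)-(P_{1|1}-P_{1|0})\le f(y,S=1|Z=0)\le f(y,S=1|Z=1), \quad (16)\)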
i.e., (16) yields one additional testable implication compared to (13). If we rearrange the first part in (15) \(\Pr (Y\in A,S=1|Z=1)-(P_{1|1}-P_{1|0})\le \Pr (Y\in A,S=1|Z=0)\) to be \(\Pr (Y\in A,S=1|Z=1)-\Pr (Y\in A,S=1|Z=0)\le (P_{1|1}-P_{1|0}),\) our additional implication gets an intuitive interpretation: The joint probability of being a complier and having a particular value of the outcome (and any sum of joint probabilities defined by non-overlapping subsets \(A\)) must not be larger than the unconditional probability of being a complier, because
It is worth noting that if testing is based on subsets \(A\) that are non-overlapping and jointly cover the entire support of \(Y,\) then our additional testable implication in (16) is already taken into account by (13) and thus redundant. The occurrence of some \(\Pr (Y\in A,S=1|Z=1)-\Pr (Y\in A,S=1|Z=0)>(P_{1|1}-P_{1|0})\) then necessarily implies the existence of at least one distinct \(A'\) for which \(\Pr (Y\in A',S=1|Z=1)-\Pr (Y\in A',S=1|Z=0)<0\) so that (13) is violated, too. Therefore, power gains from the additional testable implication can only be realized when using subsets \(A\) that overlap (so that violations may be averaged out) and/or do not cover the entire support of \(Y;\) see also the discussion in Huber and Mellace (2011).
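The redundancy argument above can be checked numerically. The following is a minimal sketch, not the authors' code: it computes sample analogs of the two inequalities for a set of intervals \(A\) and shows that, for a partition of the support of \(Y,\) the differences \(\Pr (Y\in A,S=1|Z=1)-\Pr (Y\in A,S=1|Z=0)\) sum to the complier share. The function names, the half-open intervals, and the simulated data-generating process are all assumptions made for illustration.

```python
import numpy as np

def joint_prob(y, s, z, z_val, lo, hi):
    """Sample analog of Pr(Y in [lo, hi), S=1 | Z=z_val)."""
    group = z == z_val
    hit = (s == 1) & (y >= lo) & (y < hi)
    return hit[group].mean()

def constraint_estimates(y, s, z, edges):
    """Return one row (theta1, theta2) per interval and the complier share.

    theta1 <= 0 is the sample analog of the nesting condition,
    theta2 <= 0 is the sample analog of the additional implication.
    """
    share = s[z == 1].mean() - s[z == 0].mean()  # estimate of P_{1|1} - P_{1|0}
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        d = joint_prob(y, s, z, 1, lo, hi) - joint_prob(y, s, z, 0, lo, hi)
        rows.append((-d, d - share))
    return np.array(rows), share

# Hypothetical simulated data in which the instrument shifts selection only.
rng = np.random.default_rng(0)
n = 5000
z = rng.integers(0, 2, size=n)
v = rng.normal(size=n)
s = (0.5 * z + v > 0).astype(int)
# Outcomes of unselected units are unobserved; 0.0 is a placeholder that the
# S=1 mask always filters out.
y = np.where(s == 1, 1.0 + v + rng.normal(size=n), 0.0)

# Non-overlapping intervals that jointly cover the whole real line.
edges = np.array([-np.inf, 0.0, 1.0, 2.0, np.inf])
est, share = constraint_estimates(y, s, z, edges)
print(est)
print(share, -est[:, 0].sum())
```

Because the intervals form a partition, the differences add up to the estimated complier share, so one difference can exceed that share only if another is negative. With overlapping or non-exhaustive intervals this adding-up identity no longer holds.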
1.2 Chen and Szroeter’s test algorithm
This section provides the algorithm of the Chen and Szroeter (2012) test when testing the constraints on the mean outcome given in (12), but testing the probability constraints in (8) is analogous. Let \(\hat{\theta }\) denote the sample analog of \(\theta =(\theta ^m_{1},\theta ^m_{2})'.\) The algorithm can be sketched as follows:

1. Estimate the vector of parameters \(\hat{\theta }\) and the asymptotic variance \(\hat{J}\) of \(\sqrt{n}\cdot (\hat{\theta }-\theta ).\)

2. Let \(\hat{\eta }_i=1/\sqrt{\hat{J}_{i}}, \ i=1,2,\) where \(\hat{J}_{i}\) is the \(i\)th element of the main diagonal of \(\hat{J},\) and compute the smoothing function \(\hat{\Psi }_i(\delta _n^{-1}\cdot \hat{\eta }_i\cdot \hat{\theta }_i)=\Phi (\delta _n^{-1}\cdot \hat{\eta }_i\cdot \hat{\theta }_i),\) where \(\Phi \) is the standard normal cdf and the tuning parameter \(\delta _n\) is a sequence satisfying \(\delta _n\rightarrow 0\) and \(\sqrt{n}\cdot \delta _n\rightarrow \infty \) as \(n\rightarrow \infty .\) In the applications, we choose \(\delta _n=\sqrt{\frac{2\cdot \ln (\ln (n))}{n}}\cdot \hat{\sigma }_{\theta _i},\) where \( \hat{\sigma }_{\theta _i}\) is the estimated standard deviation of the \(i\)th inequality constraint.

3. Compute the approximation term \(\hat{\Lambda }_i=\phi (\delta _n^{-1}\cdot \hat{\eta }_i\cdot \hat{\theta }_i)\cdot \frac{1}{\delta _n\cdot \sqrt{n}}, \quad i=1,2,\) with \(\phi \) being the standard normal pdf.

4. Define the vectors \(\hat{\Psi }=\left( \hat{\Psi }_1(\delta _n^{-1}\cdot \hat{\eta }_1\cdot \hat{\theta }_1), \hat{\Psi }_2(\delta _n^{-1}\cdot \hat{\eta }_2\cdot \hat{\theta }_2)\right) ^T,\,\hat{\Lambda }=\left( \hat{\Lambda }_1, \hat{\Lambda }_2\right) ^T,\,\iota _2=(1,1)^T,\) and the matrix \(\hat{\Delta }=diag(\hat{J}_1, \hat{J}_2).\)

5. Let \(\hat{Q}_1=\sqrt{n}\cdot \hat{\Psi }^T\hat{\Delta }\hat{\theta }-\iota _2^T\hat{\Lambda }\) and \(\hat{Q}_2=\sqrt{\hat{\Psi }^T\hat{\Delta }\hat{J}\hat{\Delta }\hat{\Psi }}.\)

6. Compute the p-value as \(\hat{p}=\left\{ \begin{array}{cc} 1-\Phi \left( \frac{\hat{Q}_1}{\hat{Q}_2}\right) &{} \hbox { if }\hat{Q}_2>0\\ 1 &{}\hbox { if }\hat{Q}_2=0. \end{array}\right. \)
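The six steps above can be sketched in a few lines of Python. This is an illustrative transcription of the steps as listed here, not the authors' implementation or a reference implementation of Chen and Szroeter (2012). The function name `cs_pvalue` is hypothetical. As a further assumption, \(\hat{\sigma }_{\theta _i}\) defaults to the square root of the \(i\)th diagonal element of \(\hat{J}\) unless the caller supplies it, and the constraints are oriented so that the null hypothesis is \(\theta _i\le 0.\)

```python
import math
import numpy as np

def _norm_cdf(x):
    """Standard normal cdf, computed via erfc for accurate tails."""
    return np.array([0.5 * math.erfc(-v / math.sqrt(2.0)) for v in x])

def _norm_pdf(x):
    """Standard normal pdf."""
    return np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)

def cs_pvalue(theta_hat, J_hat, n, sigma=None):
    """p-value from steps 1-6; theta_hat and J_hat come from step 1."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    J_hat = np.asarray(J_hat, dtype=float)
    J_diag = np.diag(J_hat)

    # Step 2: scaling factors, tuning parameter, smoothed indicator.
    eta = 1.0 / np.sqrt(J_diag)
    if sigma is None:
        sigma = np.sqrt(J_diag)  # assumption, see the note above the code
    delta = np.sqrt(2.0 * math.log(math.log(n)) / n) * np.asarray(sigma, dtype=float)
    x = eta * theta_hat / delta
    psi = _norm_cdf(x)

    # Step 3: approximation term.
    lam = _norm_pdf(x) / (delta * math.sqrt(n))

    # Step 4: diagonal matrix as defined in the text.
    Delta = np.diag(J_diag)

    # Step 5: test statistic components.
    q1 = math.sqrt(n) * psi @ Delta @ theta_hat - lam.sum()
    q2 = math.sqrt(psi @ Delta @ J_hat @ Delta @ psi)

    # Step 6: p-value, using the upper tail 1 - Phi(q1 / q2).
    if q2 > 0:
        return 0.5 * math.erfc((q1 / q2) / math.sqrt(2.0))
    return 1.0

# Hypothetical inputs: two constraints, identity variance, n = 1000.
J = np.eye(2)
print(cs_pvalue([-1.0, -1.0], J, 1000))  # constraints clearly satisfied
print(cs_pvalue([0.5, 0.5], J, 1000))    # constraints clearly violated
```

Constraints that are far from binding receive a weight \(\hat{\Psi }_i\) close to zero and therefore contribute little to \(\hat{Q}_1\) and \(\hat{Q}_2,\) which is the purpose of the smoothed indicator.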
Huber, M., Mellace, G. Testing exclusion restrictions and additive separability in sample selection models. Empir Econ 47, 75–92 (2014). https://doi.org/10.1007/s0018101307421
DOI: https://doi.org/10.1007/s0018101307421
Keywords
 Sample selection
 Exclusion restriction
 Additive separability
 Monotonicity
 Test
JEL Classification
 C12
 C15
 C24
 C26