
Testing exclusion restrictions and additive separability in sample selection models


Standard sample selection models with non-randomly censored outcomes assume (i) an exclusion restriction (i.e., a variable affecting selection, but not the outcome) and (ii) additive separability of the errors in the selection process. This paper proposes tests for the joint satisfaction of these assumptions. It applies the approach of Huber and Mellace (Testing instrument validity for LATE identification based on inequality moment constraints, 2011), which was developed for testing instrument validity under treatment endogeneity, to the sample selection framework. We show that the exclusion restriction and additive separability imply two testable inequality constraints, which arise from both point identifying and bounding the outcome distribution of the subpopulation that is always selected/observed. We apply the tests to two variables for which the exclusion restriction is frequently invoked in female wage regressions: non-wife/husband’s income and the number of (young) children. Across eight empirical applications, our results suggest that the identifying assumptions are likely violated for the former variable but cannot be refuted on statistical grounds for the latter.



  1. Manski (2003) already noted that the exclusion restriction is violated if the identification region defined by the bounds is empty.

  2. In contrast, Mealli and Pacini (2008) consider identification (for binary treatment variables) when conditioning on a binary instrument directly rather than using \(\Pr (S=1|X,Z)\) as a control function. In this case, point identification is generally not obtained without additional assumptions.

  3. This issue does not arise in the endogenous treatment framework of Huber and Mellace (2011), where all outcomes are observed.

  4. For a similar result in the context of selection models, see Lee (2009), who, in contrast to this paper, considers monotonicity of selection in a binary treatment.

  5. Note that the instrument \(Z\) and the type \(T\) uniquely determine the value of the selection indicator \(S\), so that conditioning on the latter is redundant.

  6. In the appendix section “Link to Kitagawa (2010)”, we show how this result compares to Kitagawa (2010), who derives a related testable implication based on comparable model assumptions.

  7. As discussed in Chen and Szroeter (2012), a sufficient condition for correct asymptotic size in the uniform sense is that the first four moments exist for each of the i.i.d. data points used to estimate the constraints.

  8. Which number and definition of the subsets \(A\) are optimal for testing is an open question. We therefore also considered more and fewer subsets, but the results did not differ in an important way and are not reported here.


  • Ahn H, Powell J (1993) Semiparametric estimation of censored selection models with a nonparametric selection mechanism. J Econom 58:3–29

  • Angrist J, Bettinger E, Kremer M (2006) Long-term educational consequences of secondary school vouchers: evidence from administrative records in Colombia. Am Econ Rev 96:847–862

  • Angrist J, Evans W (1998) Children and their parents’ labor supply: evidence from exogenous variation in family size. Am Econ Rev 88:450–477

  • Angrist J, Imbens G, Rubin D (1996) Identification of causal effects using instrumental variables. J Am Stat Assoc 91:444–472 (with discussion)

  • Angrist J, Lang D, Oreopoulos P (2009) Incentives and services for college achievement: evidence from a randomized trial. Am Econ J Appl Econ 1:136–163

  • Becker G (1981) A treatise on the family. Harvard University Press, Cambridge

  • Blundell R, Gosling A, Ichimura H, Meghir C (2007) Changes in the distribution of male and female wages accounting for employment composition using bounds. Econometrica 75:323–363

  • Chang S-K (2011) Simulation estimation of two-tiered dynamic panel Tobit models with an application to the labor supply of married women. J Appl Econ 26:854–871

  • Chen L-Y, Szroeter J (2012) Testing multiple inequality hypotheses: a smoothed indicator approach. CeMMAP working paper 16/12

  • Cosslett S (1991) Distribution-free estimator of a regression model with sample selectivity. In: Barnett W, Powell J, Tauchen G (eds) Nonparametric and semiparametric methods in econometrics and statistics. Cambridge University Press, Cambridge, pp 175–198

  • Crépon B (2006) Testing exclusion restrictions at infinity in the semiparametric selection model. IZA Discussion Paper no. 2035

  • Das M, Newey WK, Vella F (2003) Nonparametric estimation of sample selection models. Rev Econ Stud 70:33–58

  • Fleisher BM, Rhodes J (1979) Fertility, women’s wage rates, and labor supply. Am Econ Rev 69:14–24

  • Frangakis CE, Rubin DB (2002) Principal stratification in causal inference. Biometrics 58:21–29

  • Gallant A, Nychka D (1987) Semi-nonparametric maximum likelihood estimation. Econometrica 55:363–390

  • Gronau R (1974) Wage comparisons—a selectivity bias. J Political Econ 82:1119–1143

  • Heckman JJ (1974) Shadow prices, market wages, and labor supply. Econometrica 42:679–694

  • Heckman JJ (1976) The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Ann Econ Soc Meas 5:475–492

  • Heckman JJ (1979) Sample selection bias as a specification error. Econometrica 47:153–161

  • Horowitz JL (1992) A smoothed maximum score estimator for the binary response model. Econometrica 60:505–531

  • Horowitz JL, Manski CF (1995) Identification and robustness with contaminated and corrupted data. Econometrica 63:281–302

  • Huber M, Mellace G (2011) Testing instrument validity for LATE identification based on inequality moment constraints. University of St Gallen, Dept. of Economics Discussion Paper no. 2011–43

  • Imbens GW, Rubin D (1997) Estimating outcome distributions for compliers in instrumental variables models. Rev Econ Stud 64:555–574

  • Kitagawa T (2010) Testing for instrument independence in the selection model. University College London (unpublished manuscript)

  • Lee DS (2009) Training, wages, and sample selection: estimating sharp bounds on treatment effects. Rev Econ Stud 76:1071–1102

  • Manski CF (2003) Partial identification of probability distributions. Springer, New York

  • Martins M (2001) Parametric and semiparametric estimation of sample selection models: an empirical application to the female labour force in Portugal. J Appl Econ 16:23–39

  • Mealli F, Pacini B (2008) Exploiting instrumental variables in causal inference with nonignorable outcome nonresponse using principal stratification. mimeo

  • Mroz T (1987) The sensitivity of an empirical model of married women’s hours of work to economic and statistical assumptions. Econometrica 55:765–799

  • Mulligan CB, Rubinstein Y (2008) Selection, investment, and women’s relative wages over time. Q J Econ 123:1061–1110

  • Nakosteen RA, Westerlund O, Zimmer MA (2004) Marital matching and earnings: evidence from the unmarried population in Sweden. J Hum Resour 39:1033–1044

  • Newey WK (2007) Nonparametric continuous/discrete choice models. Int Econ Rev 48:1429–1439

  • Newey WK (2009) Two-step series estimation of sample selection models. Econom J 12:S217–S229

  • Powell JL (1987) Semiparametric estimation of bivariate latent variable models. Unpublished manuscript, University of Wisconsin-Madison

  • Rubin DB (1974) Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol 66:688–701

  • Schafgans MMA (1998) Ethnic wage differences in Malaysia: parametric and semiparametric estimation of the Chinese-Malay wage gap. J Appl Econ 13:481–504

  • Schochet PZ, Burghardt J, Glazerman S (2001) National job corps study: the impacts of job corps on participants’ employment and related outcomes, report. Mathematica Policy Research, Inc., Washington, DC

  • Vytlacil E (2002) Independence, monotonicity, and latent index models: an equivalence result. Econometrica 70:331–341

  • Zabel JE (1993) The relationship between hours of work and labor force participation in four models of labor supply behavior. J Labor Econ 11:387–416



We have benefited from comments by Alberto Abadie, Joshua Angrist, Guido Imbens, Toru Kitagawa, Alexa Tiemann, seminar participants at Harvard (seminar in econometrics, September 2011), and an anonymous associate editor. Martin Huber gratefully acknowledges financial support from the Swiss National Science Foundation Grant PBSGP1_138770.

Author information



Corresponding authors

Correspondence to Martin Huber or Giovanni Mellace.

Additional information

An earlier version of this paper was circulated under the title “Testing instrument validity in sample selection models”.

Appendix

1.1 Link to Kitagawa (2010)

The subsequent discussion links the testable implications of Sect. 3 to Kitagawa (2010), who derives a testable implication based on comparable model assumptions. Considering only positive monotonicity, Kitagawa (2010) shows in his Proposition 2.3 that under Assumptions 1 and 2,

$$f(y,S=1|Z=0) \le f(y,S=1|Z=1) \quad \text{for all } y \text{ in the support of } Y, \tag{13}$$

i.e., the joint density of \(Y\) and \(S=1\) given \(Z=1\) must nest the joint density of \(Y\) and \(S=1\) given \(Z=0\) at every value of \(Y.\) Rearranged as \(f(y,S=1|Z=1)-f(y,S=1|Z=0) \ge 0,\) the condition has an intuitive interpretation: the left-hand side is the compliers’ outcome density scaled by their share, and a density cannot be negative.
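A minimal sketch of the empirical analog of this nesting condition, assuming a binary instrument `z` coded 0/1, a selection indicator `s`, and outcomes `y` that are `None` whenever `s == 0`. The histogram bins and the function names `joint_density_by_bin` and `kitagawa_violations` are hypothetical illustrative choices, not Kitagawa’s (2010) procedure, and the sketch reports only per-bin point estimates; a formal test must also account for sampling noise.

```python
def joint_density_by_bin(y, s, z, z_value, edges):
    """Histogram estimate of f(y, S=1 | Z=z_value) on the bins [edges[k], edges[k+1]).

    Non-selected units (s == 0) contribute no outcome but still count in the
    denominator, so the bin masses sum to Pr(S=1 | Z=z_value) whenever every
    observed outcome falls inside the bins.
    """
    n_group = sum(1 for zi in z if zi == z_value)
    densities = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # `si == 1` is checked before the comparison, so None outcomes are never compared.
        count = sum(1 for yi, si, zi in zip(y, s, z)
                    if zi == z_value and si == 1 and lo <= yi < hi)
        densities.append(count / (n_group * (hi - lo)))
    return densities


def kitagawa_violations(y, s, z, edges):
    """Per-bin f(y,S=1|Z=0) - f(y,S=1|Z=1); positive values contradict (13)."""
    f0 = joint_density_by_bin(y, s, z, 0, edges)
    f1 = joint_density_by_bin(y, s, z, 1, edges)
    return [a - b for a, b in zip(f0, f1)]


# Hypothetical toy data; outcomes are None when s == 0.
y = [0.5, 1.5, 0.5, 1.5, 0.5, None, None, None]
s = [1, 1, 1, 1, 1, 0, 0, 0]
z = [1, 1, 1, 1, 0, 0, 0, 0]
gaps = kitagawa_violations(y, s, z, edges=[0.0, 1.0, 2.0])
```

In this toy example every bin difference is non-positive, so the point estimates do not contradict (13).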

Note that (7) in Sect. 3 is equivalent to

$$\begin{aligned} \frac{\Pr (Y\in A,S=1|Z=1)}{P_{1|0}}-\frac{P_{1|1}-P_{1|0}}{P_{1|0}}&\le \frac{\Pr (Y\in A,S=1|Z=0)}{P_{1|0}} \\&\le \frac{\Pr (Y\in A,S=1|Z=1)}{P_{1|0}} \end{aligned}\tag{14}$$

for all \(A\) in the support of \(Y,\) because

$$\begin{aligned} \frac{\Pr (Y\in A|Z=1,S=1)-(1-q)}{q}&= \frac{\Pr (Y\in A,S=1|Z=1)}{q\cdot \Pr (S=1|Z=1)}-\frac{1-q}{q} \\&= \frac{\Pr (Y\in A,S=1|Z=1)}{P_{1|0}}-\frac{P_{1|1}-P_{1|0}}{P_{1|0}},\\ \frac{\Pr (Y\in A|Z=1,S=1)}{q}&= \frac{\Pr (Y\in A,S=1|Z=1)}{q\cdot \Pr (S=1|Z=1)}\\&= \frac{\Pr (Y\in A,S=1|Z=1)}{P_{1|0}},\\ \Pr (Y\in A|Z=0,S=1)&= \frac{\Pr (Y\in A,S=1|Z=0)}{P_{1|0}}, \end{aligned}$$

by the definition of conditional probability and \(q\cdot P_{1|1}=P_{1|0}\). Multiplying (14) by \(P_{1|0}>0\), in turn, implies that for all \(A\) in the support of \(Y,\)

$$\begin{aligned} \Pr (Y\in A,S=1|Z=1)-(P_{1|1}-P_{1|0})&\le \Pr (Y\in A,S=1|Z=0) \\&\le \Pr (Y\in A,S=1|Z=1), \end{aligned}\tag{15}$$

and when applied to the pdf, that \( \hbox {for all }y \hbox { in the support of }Y\)

$$\begin{aligned} f(y,S=1|Z=1)-(P_{1|1}-P_{1|0})&\le f(y,S=1|Z=0) \\&\le f(y,S=1|Z=1), \end{aligned}\tag{16}$$

i.e., (16) yields one testable implication in addition to (13). Rearranging the first inequality in (15), \(\Pr (Y\in A,S=1|Z=1)-(P_{1|1}-P_{1|0})\le \Pr (Y\in A,S=1|Z=0),\) as \(\Pr (Y\in A,S=1|Z=1)-\Pr (Y\in A,S=1|Z=0)\le P_{1|1}-P_{1|0}\) gives this additional implication an intuitive interpretation: the joint probability of being a complier and having an outcome in \(A\) (and likewise any sum of such joint probabilities over non-overlapping subsets \(A\)) cannot exceed the unconditional probability of being a complier, because

$$\begin{aligned} \int [f(y,S=1|Z=1)-f(y,S=1|Z=0)] dy = P_{1|1}-P_{1|0}. \end{aligned}$$

It is worth noting that if testing is based on subsets \(A\) that are non-overlapping and jointly cover the entire support of \(Y,\) then the additional testable implication in (16) is already taken into account by (13) and is thus redundant. In that case, the existence of some \(A\) with \(\Pr (Y\in A,S=1|Z=1)-\Pr (Y\in A,S=1|Z=0)>(P_{1|1}-P_{1|0})\) necessarily implies the existence of at least one distinct \(A'\) for which \(\Pr (Y\in A',S=1|Z=1)-\Pr (Y\in A',S=1|Z=0)<0,\) because the differences over all subsets sum to \(P_{1|1}-P_{1|0}\); hence (13) is violated, too. Power gains from the additional testable implication are therefore only possible when using subsets \(A\) that overlap (so that violations may be averaged out) and/or do not cover the entire support of \(Y;\) see also the discussion in Huber and Mellace (2011).
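The redundancy argument can be checked with a few hypothetical numbers. In the sketch below, the three cells stand for non-overlapping subsets \(A\) that jointly cover the support of \(Y\), and the joint probabilities are made up for illustration (\(P_{1|1}=0.8\), \(P_{1|0}=0.6\)). The function name `partition_check` is hypothetical. Because the cell differences sum to \(P_{1|1}-P_{1|0}\), a cell whose difference exceeds that total forces another cell below zero.

```python
def partition_check(joint1, joint0):
    """Compare cell-wise complier masses on a partition of the support of Y.

    joint1[k], joint0[k]: Pr(Y in A_k, S=1 | Z=1) and Pr(Y in A_k, S=1 | Z=0),
    where the cells A_k are non-overlapping and cover the whole support.
    """
    diffs = [a - b for a, b in zip(joint1, joint0)]
    total = sum(diffs)                       # equals P_{1|1} - P_{1|0} on a partition
    too_large = [d > total for d in diffs]   # violates the first inequality in (15)
    negative = [d < 0 for d in diffs]        # violates (13)
    return total, too_large, negative


# Hypothetical joint probabilities on three cells: P_{1|1} = 0.80, P_{1|0} = 0.60.
joint1 = [0.30, 0.25, 0.25]
joint0 = [0.05, 0.30, 0.25]
total, too_large, negative = partition_check(joint1, joint0)
# The first cell exceeds the complier share, which forces the second cell below zero.
```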

1.2 Chen and Szroeter’s test algorithm

This section outlines the Chen and Szroeter (2012) test for the constraints on the mean outcome given in (12); testing the probability constraints in (8) is analogous. Let \(\hat{\theta }\) denote the sample analog of \(\theta =(\theta ^m_{1},\theta ^m_{2})'.\) The algorithm can be sketched as follows:

  1. Estimate the vector of parameters \(\hat{\theta }\) and the asymptotic variance \(\hat{J}\) of \(\sqrt{n}\cdot (\hat{\theta }-\theta ).\)

  2. Let \(\hat{\eta }_i=1/\sqrt{\hat{J}_{i}}, \ i=1,2,\) where \(\hat{J}_{i}\) is the \(i\)th element of the main diagonal of \(\hat{J},\) and compute the smoothing function \(\hat{\Psi }_i(\delta _n^{-1}\cdot \hat{\eta }_i\cdot \hat{\theta }_i)=\Phi (\delta _n^{-1}\cdot \hat{\eta }_i\cdot \hat{\theta }_i),\) where \(\Phi \) is the standard normal cdf and the tuning parameter \(\delta _n\) is a sequence satisfying \(\delta _n\rightarrow 0\) and \(\sqrt{n}\cdot \delta _n\rightarrow \infty \) as \(n\rightarrow \infty .\) In the applications, we choose \(\delta _n=\sqrt{\frac{2\cdot \ln (\ln (n))}{n}}\cdot \hat{\sigma }_{\theta _i},\) where \( \hat{\sigma }_{\theta _i}\) is the estimated standard deviation of the \(i\)th inequality constraint.

  3. Compute the approximation term \(\hat{\Lambda }_i=\phi (\delta _n^{-1}\cdot \hat{\eta }_i\cdot \hat{\theta }_i)\cdot \frac{1}{\delta _n\cdot \sqrt{n}}, \ i=1,2,\) with \(\phi \) being the standard normal pdf.

  4. Define the vectors \(\hat{\Psi }=\left( \hat{\Psi }_1(\delta _n^{-1}\cdot \hat{\eta }_1\cdot \hat{\theta }_1), \hat{\Psi }_2(\delta _n^{-1}\cdot \hat{\eta }_2\cdot \hat{\theta }_2)\right) ^T,\) \(\hat{\Lambda }=\left( \hat{\Lambda }_1, \hat{\Lambda }_2\right) ^T,\) and \(\iota _2=(1,1)^T,\) and the diagonal matrix \(\hat{\Delta }=\operatorname{diag}(\hat{J}_1, \hat{J}_2).\)

  5. Let \(\hat{Q}_1=\sqrt{n}\cdot \hat{\Psi }^T\hat{\Delta }\hat{\theta }-\iota _2^T\hat{\Lambda }\) and \(\hat{Q}_2=\sqrt{\hat{\Psi }^T\hat{\Delta }\hat{J}\hat{\Delta }\hat{\Psi }}.\)

  6. Compute the p-value as \(\hat{p}=\begin{cases} 1-\Phi \left( \hat{Q}_1/\hat{Q}_2\right) & \text{if } \hat{Q}_2>0,\\ 1 & \text{if } \hat{Q}_2=0. \end{cases}\)
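Steps 2 to 6 can be sketched in a few lines of Python. This is a minimal illustration, not Chen and Szroeter’s (2012) reference code. It takes \(\hat{\theta}\), \(\hat{J}\), and \(n\) from step 1 as given and assumes the constraints are written so that the null hypothesis is \(\theta \le 0\) elementwise, which is the direction in which step 6 rejects for large \(\hat{Q}_1\). It also reads \(\hat{\sigma}_{\theta_i}\) in the tuning parameter as \(\sqrt{\hat{J}_i}\). That reading is a hypothetical choice that keeps both rate conditions of step 2. The function name `chen_szroeter_pvalue` is hypothetical; the standard normal cdf and pdf come from Python’s `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: Phi = .cdf, phi = .pdf


def chen_szroeter_pvalue(theta_hat, J, n, delta=None):
    """Smoothed-indicator p-value following steps 2-6 (sketch).

    theta_hat : list of k estimated constraints, written so that H0 is theta <= 0
    J         : k x k estimate of the asymptotic variance of sqrt(n)*(theta_hat - theta)
    n         : sample size (n >= 3, so that ln(ln(n)) > 0)
    delta     : optional list of k tuning values; by default
                sqrt(2*ln(ln(n))/n) * sqrt(J[i][i])  (hypothetical reading of sigma_hat)
    """
    k = len(theta_hat)
    J_diag = [J[i][i] for i in range(k)]
    if delta is None:
        rate = math.sqrt(2.0 * math.log(math.log(n)) / n)
        delta = [rate * math.sqrt(v) for v in J_diag]

    # Step 2: scale factors eta_i and smoothed indicators Psi_i = Phi(eta_i * theta_i / delta_i)
    eta = [1.0 / math.sqrt(v) for v in J_diag]
    args = [eta[i] * theta_hat[i] / delta[i] for i in range(k)]
    psi = [_STD_NORMAL.cdf(a) for a in args]

    # Step 3: approximation terms Lambda_i = phi(arg_i) / (delta_i * sqrt(n))
    lam = [_STD_NORMAL.pdf(args[i]) / (delta[i] * math.sqrt(n)) for i in range(k)]

    # Steps 4-5: with Delta = diag(J_11, ..., J_kk), w = Delta @ Psi
    w = [J_diag[i] * psi[i] for i in range(k)]
    q1 = math.sqrt(n) * sum(w[i] * theta_hat[i] for i in range(k)) - sum(lam)
    q2_sq = sum(w[i] * J[i][j] * w[j] for i in range(k) for j in range(k))
    q2 = math.sqrt(max(q2_sq, 0.0))  # guard against tiny negative rounding

    # Step 6: p-value
    return 1.0 - _STD_NORMAL.cdf(q1 / q2) if q2 > 0 else 1.0


# Hypothetical inputs: two constraints, identity variance matrix, n = 1000.
identity = [[1.0, 0.0], [0.0, 1.0]]
p_ok = chen_szroeter_pvalue([-0.5, -0.5], identity, 1000)
p_bad = chen_szroeter_pvalue([0.3, -0.5], identity, 1000)
```

With these made-up inputs, the first call (both constraints clearly negative) gives a p-value close to one, and the second (one clearly positive constraint) gives a p-value close to zero. The loops run over however many constraints are supplied, so the same sketch applies unchanged when more than two inequalities are tested.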

About this article

Cite this article

Huber, M., Mellace, G. Testing exclusion restrictions and additive separability in sample selection models. Empir Econ 47, 75–92 (2014).


Keywords
  • Sample selection
  • Exclusion restriction
  • Additive separability
  • Monotonicity
  • Test

JEL Classification

  • C12
  • C15
  • C24
  • C26