Sample selection models for discrete and other non-Gaussian response variables

  • Adelchi Azzalini
  • Hyoung-Moon Kim
  • Hea-Jung Kim
Original Paper


Consider the observation of a phenomenon of interest subject to selective sampling due to a censoring mechanism regulated by some other variable. An extensive literature on this setting is linked to the so-called Heckman selection model. Much of this work has been developed under a Gaussian assumption on the underlying probability distributions; considerably less has dealt with other distributions. We examine a general construction that encompasses a variety of distributions and allows various options for the selection mechanism, focusing especially on the case of a discrete response. Inferential methods based on the pertaining likelihood function are developed.
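The effect of selective sampling on a discrete response can be illustrated with a small simulation. The sketch below is only an assumption-laden toy example, not the construction examined in the paper: it pairs a Gaussian latent selection variable with a Poisson count whose log-mean carries a correlated Gaussian error. The function names, the parameter values (`rho`, the intercept 0.5 and the scale 0.5) and the sample size are all hypothetical choices made for illustration.

```python
import math
import random


def draw_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_selection(n=20000, rho=0.7, seed=0):
    """Return (mean of all counts, mean of the counts that pass selection).

    Illustrative assumptions: the selection error u and the response error v
    are standard normal with correlation rho, and the count is Poisson with
    mean exp(0.5 + 0.5 * v).
    """
    rng = random.Random(seed)
    total_all = total_sel = n_sel = 0
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # error of the latent selection variable
        e = rng.gauss(0.0, 1.0)  # independent noise used to build v
        v = rho * u + math.sqrt(1.0 - rho ** 2) * e  # corr(u, v) = rho
        y = draw_poisson(rng, math.exp(0.5 + 0.5 * v))
        total_all += y
        if u > 0:  # the count is observed only when selection is positive
            total_sel += y
            n_sel += 1
    return total_all / n, total_sel / n_sel


if __name__ == "__main__":
    full_mean, selected_mean = simulate_selection()
    print("mean over all units:     ", round(full_mean, 3))
    print("mean over selected units:", round(selected_mean, 3))
```

With a positive `rho`, the mean of the selected counts exceeds the mean over all units, which is the bias that a selection model is meant to correct. Setting `rho=0` makes selection independent of the response, and the two means then agree up to sampling noise.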


Keywords: Sample selection; Selection bias; Heckman model; Binary variables; Skew-normal distribution; Count data; Symmetry-modulated distributions; Skew-symmetric distributions



We are grateful to two reviewers for insightful comments that led to an appreciable improvement in presentation over an earlier version of the paper. Hyoung-Moon Kim’s research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2015R1D1A1A01059161). Hea-Jung Kim’s research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2015R1D1A1A01057106).


  1. Ahn H, Powell JL (1993) Semiparametric estimation of censored selection models with a nonparametric selection mechanism. J Econom 58:3–29
  2. Azzalini A, Capitanio A (2014) The skew-normal and related families. In: IMS monographs series. Cambridge University Press, Cambridge
  3. Copas JB, Li HG (1997) Inference for non-random samples (with discussion). J R Stat Soc Ser B 59:55–95
  4. Greene W (1998) Sample selection in credit-scoring models. Jpn World Econ 10:299–316
  5. Greene WH (2012) Econometric analysis, 7th edn. Pearson Education Ltd, Harlow
  6. Heckman JJ (1976) The common structure of statistical models of truncation, sample selection and limited dependent variables, and a simple estimator for such models. Ann Econ Soc Meas 5:475–492
  7. Heckman JJ (1979) Sample selection bias as a specification error. Econometrica 47:153–161
  8. Marchenko YV, Genton MG (2012) A Heckman selection-\(t\) model. J Am Stat Assoc 107:304–317
  9. Marra G, Radice R (2017) GJRM: generalised joint regression models with binary/continuous/discrete/survival margins. R package version 0.1-4
  10. Marra G, Wyszynski K (2016) Semi-parametric copula sample selection models for count responses. Comput Stat Data Anal 104:110–129
  11. McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman & Hall/CRC, London
  12. Newey WK (2009) Two-step estimation of sample selection models. Econom J 12:S217–S229
  13. Prieger JE (2002) A flexible parametric selection model for non-normal data with application to health care usage. J Appl Econom 17:367–392
  14. Riphahn RR, Wambach A, Million A (2003) Incentive effects in the demand for health care: a bivariate panel count data estimation. J Appl Econom 18:387–405
  15. Terza JV (1998) Estimating count data models with endogenous switching: sample selection and endogenous treatment effects. J Econom 84:129–154
  16. Van de Ven WPMM, Van Praag BMS (1981) The demand for deductibles in private health insurance: a probit model with sample selection. J Econom 17(2):229–252 (Corrigendum in 22(3):395, 1983)
  17. Wyszynski K, Marra G (2017) Sample selection models for count data in R. Comput Stat.
  18. Wooldridge J (2010) Econometric analysis of cross section and panel data, 2nd edn. The MIT Press, Cambridge
  19. Zhelonkin M, Genton MG, Ronchetti E (2016) Robust inference in sample selection models. J R Stat Soc Ser B 78:805–827

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Statistical Sciences, University of Padua, Padua, Italy
  2. Department of Applied Statistics, Konkuk University, Seoul, Korea
  3. Department of Statistics, Dongguk University, Seoul, Korea
