Statistics and Computing, Volume 26, Issue 5, pp 981–995

Copula regression spline models for binary outcomes

  • Rosalba Radice
  • Giampiero Marra
  • Małgorzata Wojtyś


We introduce a framework for estimating the effect that a binary treatment has on a binary outcome in the presence of unobserved confounding. The methodology is applied to a case study that uses data from the Medical Expenditure Panel Survey and aims to estimate the effect of private health insurance on health care utilization. Unobserved confounding arises when variables that are associated with both treatment and outcome are not available (in economics this issue is known as endogeneity). In addition, treatment and outcome may exhibit a dependence that cannot be modeled using a linear measure of association, and observed confounders may have a non-linear impact on the treatment and outcome variables. The problem of unobserved confounding is addressed using a two-equation structural latent variable framework, where one equation describes a binary outcome as a function of a binary treatment and the other equation determines whether the treatment is received. Non-linear dependence between treatment and outcome is handled using copula functions, whereas covariate-response relationships are flexibly modeled using a spline approach. Related model fitting and inferential procedures are developed, and asymptotic arguments are presented.
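The two-equation latent variable structure described above can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the authors' estimator: it uses a Gaussian copula (correlated standard normal errors) for the dependence between the two equations, linear covariate effects instead of splines, and made-up coefficients. The function names `norm_cdf` and `simulate` and all parameter values are hypothetical. It shows how ignoring the unobserved dependence distorts a naive treated-versus-untreated comparison.

```python
import math
import numpy as np


def norm_cdf(x):
    """Standard normal CDF, vectorised via math.erf."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(x) / math.sqrt(2.0)))


def simulate(n=100_000, rho=0.6, beta_treat=0.5, seed=0):
    """Simulate a recursive two-equation binary model (illustrative values only).

    Returns the true average treatment effect and the naive difference
    in outcome proportions between treated and untreated units.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)  # observed confounder, enters both equations
    z = rng.normal(size=n)  # covariate that enters the treatment equation only

    # Correlated latent errors: the shared component plays the role of the
    # unobserved confounder (Gaussian copula, assumed here for simplicity).
    e1 = rng.normal(size=n)
    e2 = rho * e1 + math.sqrt(1.0 - rho**2) * rng.normal(size=n)

    # Treatment equation: decides whether the treatment is received.
    treat = (0.3 * x + 0.8 * z + e1 > 0).astype(int)

    # Outcome equation: binary outcome as a function of the binary treatment.
    eta = -0.2 + 0.4 * x
    y = (eta + beta_treat * treat + e2 > 0).astype(int)

    # True effect: average change in P(y = 1) when treatment is switched on.
    true_ate = float(np.mean(norm_cdf(eta + beta_treat) - norm_cdf(eta)))
    # Naive contrast: ignores both observed and unobserved confounding.
    naive = float(y[treat == 1].mean() - y[treat == 0].mean())
    return true_ate, naive


if __name__ == "__main__":
    true_ate, naive = simulate()
    print(f"true ATE: {true_ate:.3f}, naive difference: {naive:.3f}")
```

With a positive error correlation, as assumed here, the naive difference overstates the true effect, which is the kind of bias a joint model of the two equations is meant to remove.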


Bivariate binary outcomes; Copula; Endogeneity; Penalized regression spline; Simultaneous equation estimation; Unobserved confounding



We would like to thank two anonymous reviewers and the Associate Editor for many suggestions that helped to clarify the contribution of the paper and considerably improved the presentation of the article.

Supplementary material

Supplementary material 1: 11222_2015_9581_MOESM1_ESM.pdf (PDF, 910 KB)



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Rosalba Radice (1)
  • Giampiero Marra (2)
  • Małgorzata Wojtyś (3, 4)

  1. Department of Economics, Mathematics and Statistics, Birkbeck, London, UK
  2. Department of Statistical Science, University College London, London, UK
  3. Centre for Mathematical Sciences, Plymouth University, Plymouth, UK
  4. Faculty of Mathematics and Information Science, Warsaw University of Technology, Warszawa, Poland
