Statistics and Computing, Volume 27, Issue 1, pp 283–299

Semi-parametric bivariate polychotomous ordinal regression

  • Francesco Donat
  • Giampiero Marra


A pair of polychotomous random variables \((Y_1,Y_2)^\top =:{\varvec{Y}}\), where each \(Y_j\) has a totally ordered support, is studied within a penalized generalized linear model framework. We deal with a triangular generating process for \({\varvec{Y}}\), a structure that has been employed in the literature to control for the presence of residual confounding. Unlike previous work, however, the proposed model allows for semi-parametric estimation of the covariate-response relationships. This also reduces the risk of model mis-specification that stems from imposing fixed-order polynomial functional forms. Finally, the proposed estimation methods and related inferential results are applied to study the effect of education on alcohol consumption among young adults in the UK.


Alcohol (mis)use; Bivariate systems of equations; Ordinal responses; Penalized GLM; Regression splines



We are indebted to the Associate Editor and two anonymous reviewers, whose many precise comments considerably improved the presentation of the article. We are grateful to the Centre for Longitudinal Studies (CLS), UCL Institute of Education, for allowing us to use the BCS70 data, and to the UK Data Service for making them available. However, neither CLS nor the UK Data Service bears any responsibility for the analysis or interpretation of these data. This paper was completed while the first author was at the Economic Governance Support Unit of the European Parliament under a Robert Schuman traineeship.

Supplementary material

Supplementary material 1: 11222_2015_9622_MOESM1_ESM.pdf (PDF, 2782 KB)



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Department of Statistical Science, University College London, London, UK
