Statistics and Computing, Volume 26, Issue 1–2, pp 1–14

A unified framework of constrained regression

Abstract

Generalized additive models (GAMs) play an important role in modeling and understanding complex relationships in modern applied statistics. They allow for flexible, data-driven estimation of covariate effects. Yet researchers often have a priori knowledge of certain effects, which might be monotonic or periodic (cyclic) or should fulfill boundary conditions. We propose a unified framework to incorporate these constraints for both univariate and bivariate effect estimates and for varying coefficients. As the framework is based on component-wise boosting methods, variables can be selected intrinsically, and effects can be estimated for a wide range of different distributional assumptions. Bootstrap confidence intervals for the effect estimates are derived to assess the models. We present three case studies from environmental sciences to illustrate the proposed seamless modeling framework. All discussed constrained effect estimates are implemented in the comprehensive R package mboost for model-based boosting.

Keywords

Bivariate constraints, Cyclic constraints, Functional gradient descent boosting, Generalized additive models, Monotonic constraints, Periodic effects

Supplementary material

Supplementary material 1: 11222_2014_9520_MOESM1_ESM.pdf (PDF, about 300 KB)
Supplementary material 2: 11222_2014_9520_MOESM2_ESM.zip (ZIP, 47 KB)

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Benjamin Hofner (1)
  • Thomas Kneib (2)
  • Torsten Hothorn (3)

  1. Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
  2. Lehrstuhl für Statistik, Georg-August-Universität Göttingen, Göttingen, Germany
  3. Institut für Epidemiologie, Biostatistik und Prävention, Abteilung Biostatistik, Universität Zürich, Zürich, Switzerland
