Computational Statistics

Volume 28, Issue 3, pp 1079–1101

Variable selection and model choice in structured survival models

Original Paper


We aim to model the survival time of intensive care patients suffering from severe sepsis. The nature of the problem requires a flexible model that extends the classical Cox model by including time-varying and nonparametric effects. These structured survival models are very flexible, but additional difficulties arise when model choice and variable selection are desired. In particular, it has to be decided which covariates should be assigned time-varying effects or whether linear modeling is sufficient for a given covariate. Component-wise boosting provides a means of likelihood-based model fitting that enables simultaneous variable selection and model choice. We introduce a component-wise, likelihood-based boosting algorithm for survival data that permits the inclusion of both parametric and nonparametric time-varying effects as well as nonparametric effects of continuous covariates, utilizing penalized splines as the main modeling technique. An empirical evaluation of the methodology precedes the model building for the severe sepsis data. A software implementation is available to the interested reader.


Hazard regression, Likelihood-based boosting, Model choice, P-splines, Smooth effects, Time-varying effects



Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  • Benjamin Hofner (1)
  • Torsten Hothorn (2)
  • Thomas Kneib (3)

  1. Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
  2. Institut für Statistik, Ludwig-Maximilians-Universität, München, Germany
  3. Institut für Statistik und Ökonometrie, Georg-August-Universität, Göttingen, Germany
