# Variable selection and model choice in structured survival models

## Abstract

We aim to model the survival time of intensive care patients suffering from severe sepsis. The problem calls for a flexible model that extends the classical Cox model by including time-varying and nonparametric effects. Such structured survival models are very flexible, but additional difficulties arise when model choice and variable selection are desired. In particular, one has to decide which covariates should be assigned time-varying effects and whether linear modeling is sufficient for a given covariate. Component-wise boosting provides a means of likelihood-based model fitting that enables simultaneous variable selection and model choice. We introduce a component-wise, likelihood-based boosting algorithm for survival data that permits the inclusion of both parametric and nonparametric time-varying effects as well as nonparametric effects of continuous covariates, with penalized splines as the main modeling technique. An empirical evaluation of the methodology precedes the model building for the severe sepsis data. A software implementation is available to the interested reader.
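To illustrate how component-wise boosting can select variables while fitting, the following is a minimal sketch and not the algorithm introduced in this paper. It deliberately swaps in a much simpler setting: squared-error loss with one linear base-learner per covariate, instead of likelihood-based boosting for survival data with penalized splines. The function names `componentwise_l2_boost` and `simulate`, the step length `nu`, and the toy data are all assumptions made for illustration.

```python
import random


def componentwise_l2_boost(columns, y, n_steps=200, nu=0.1):
    """Component-wise boosting with squared-error loss and linear base-learners.

    columns: list of p covariate columns, each a list of n floats.
    Returns (intercept, coefficients of the centered covariates, selection counts).
    """
    n, p = len(y), len(columns)
    # Center each covariate so the intercept can stay fixed at the mean of y.
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    intercept = sum(y) / n
    resid = [yi - intercept for yi in y]
    coef = [0.0] * p
    picked = [0] * p
    for _ in range(n_steps):
        best = None  # (rss, index, slope) of the best-fitting base-learner
        for j, xj in enumerate(centered):
            sxx = sum(v * v for v in xj)
            if sxx == 0.0:
                continue  # constant covariate: nothing to fit
            slope = sum(v * r for v, r in zip(xj, resid)) / sxx
            rss = sum((r - slope * v) ** 2 for v, r in zip(xj, resid))
            if best is None or rss < best[0]:
                best = (rss, j, slope)
        if best is None:
            break
        _, j, slope = best
        # Update only the selected component, shrunken by the step length nu.
        coef[j] += nu * slope
        picked[j] += 1
        resid = [r - nu * slope * v for r, v in zip(resid, centered[j])]
    return intercept, coef, picked


def simulate(n=200, p=5, seed=1):
    """Toy data (hypothetical): only covariates 0 and 2 affect the response."""
    rng = random.Random(seed)
    columns = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    y = [2.0 * columns[0][i] - 1.0 * columns[2][i] + rng.gauss(0.0, 0.1)
         for i in range(n)]
    return columns, y


columns, y = simulate()
intercept, coef, picked = componentwise_l2_boost(columns, y)
print("coefficients:", [round(c, 2) for c in coef])
print("selection counts:", picked)
```

In each iteration every base-learner is fitted to the current residuals, but only the best-fitting one is updated. Covariates that are never or rarely chosen keep coefficients at or near zero, which is where the variable selection comes from. In practice the number of iterations would be tuned rather than fixed at 200 as in this sketch.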

## Keywords

Hazard regression, Likelihood-based boosting, Model choice, P-splines, Smooth effects, Time-varying effects

## Notes

### Acknowledgments

The authors thank the associate editor and the anonymous referees for their helpful comments, W. H. Hartl from the Department of Surgery, Klinikum Großhadern, for the data set and stimulating problems, and D. Inthorn and H. Schneeberger for initiation and maintenance of the database of the surgical intensive care unit. B. Hofner and T. Hothorn were supported by Deutsche Forschungsgemeinschaft, grant HO 3242/1-3.
