A General Machine Learning Framework for Survival Analysis

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2020)

Abstract

The modeling of time-to-event data, also known as survival analysis, requires specialized methods that can deal with censoring and truncation and with time-varying features and effects, and that extend to settings with multiple competing events. However, many machine learning methods for survival analysis only consider the standard setting with right-censored data and the proportional hazards assumption. The methods that do provide extensions usually address at most a subset of these challenges and often require specialized software that cannot be integrated directly into standard machine learning workflows. In this work, we present a very general machine learning framework for time-to-event analysis that uses a data augmentation strategy to reduce complex survival tasks to standard Poisson regression tasks. This reformulation is based on well-developed statistical theory. With the proposed approach, any algorithm that can optimize a Poisson (log-)likelihood, such as gradient boosted trees, deep neural networks, model-based boosting, and many more, can be used in the context of time-to-event analysis. The proposed technique does not require any assumptions with respect to the distribution of event times or the functional shapes of feature and interaction effects. Based on the proposed framework, we develop new methods that are competitive with specialized state-of-the-art approaches in terms of accuracy and versatility, but that require comparatively little programming effort and specialized methodological know-how.
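The data augmentation idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the reduction takes a piecewise exponential form: follow-up time is split at fixed cut points, each subject contributes one row per interval in which they were at risk, and the interval-specific event indicator is treated as a Poisson response with the log of the time at risk as offset. The function names `to_piecewise_rows` and `piecewise_hazards`, the cut points, and the toy data are hypothetical and are not taken from the paper. Without features, the Poisson maximum likelihood estimate of each interval's hazard is simply events divided by exposure, which the sketch uses in place of a full learner.

```python
from typing import Dict, List, Sequence


def to_piecewise_rows(
    times: Sequence[float], events: Sequence[int], cuts: Sequence[float]
) -> List[Dict[str, float]]:
    """Split each subject's follow-up at `cuts` into one row per interval at risk.

    Assumption: intervals are (0, cuts[0]], (cuts[0], cuts[1]], ... and the
    last cut point covers the longest observed time.
    """
    if max(times) > cuts[-1]:
        raise ValueError("last cut point must cover the longest follow-up time")
    rows = []
    for subject, (t, d) in enumerate(zip(times, events)):
        start = 0.0
        for j, end in enumerate(cuts):
            if t <= start:
                break  # subject left the risk set before this interval
            rows.append({
                "id": subject,
                "interval": j,
                # time at risk in interval j; log(exposure) would be the offset
                "exposure": min(t, end) - start,
                # 1 only in the interval where an observed event occurred
                "status": int(d == 1 and t <= end),
            })
            start = end
    return rows


def piecewise_hazards(rows: List[Dict[str, float]], n_intervals: int) -> List[float]:
    """Feature-free Poisson MLE per interval: events divided by total exposure."""
    n_events = [0] * n_intervals
    exposure = [0.0] * n_intervals
    for r in rows:
        n_events[r["interval"]] += r["status"]
        exposure[r["interval"]] += r["exposure"]
    return [e / x if x > 0 else float("nan") for e, x in zip(n_events, exposure)]


# Toy data (hypothetical): follow-up times and event indicators (0 = censored).
times = [2.0, 3.5, 1.0, 4.0]
events = [1, 0, 1, 1]
cuts = [1.0, 2.0, 4.0]

rows = to_piecewise_rows(times, events, cuts)
hazards = piecewise_hazards(rows, n_intervals=len(cuts))
```

In a setting with features, the same rows (with the interval index and the features as inputs, `status` as the response, and the log exposure as offset) could be passed to any learner that optimizes a Poisson log-likelihood, which is the property the abstract relies on.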

Acknowledgements

This work has been funded by the German Federal Ministry of Education and Research (BMBF) under Grant No. 01IS18036A. The authors of this work take full responsibility for its content.

Author information

Corresponding author

Correspondence to Andreas Bender.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Bender, A., Rügamer, D., Scheipl, F., Bischl, B. (2021). A General Machine Learning Framework for Survival Analysis. In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science, vol 12459. Springer, Cham. https://doi.org/10.1007/978-3-030-67664-3_10

  • DOI: https://doi.org/10.1007/978-3-030-67664-3_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67663-6

  • Online ISBN: 978-3-030-67664-3

  • eBook Packages: Computer Science, Computer Science (R0)
