Optimizing C-Index via Gradient Boosting in Medical Survival Analysis

Part of the Studies in Computational Intelligence book series (SCI, volume 880)

Abstract

In medical databases, data represent the results of various medical procedures and analyses, often performed at non-uniform time intervals. Therefore, survival analysis must deal with data sets that contain missing values and change over time. Such data are difficult to use as a basis for predicting patient survival, as they are complex and scarce. Survival analysis methods usually maximize the partial log-likelihood, following the idea Cox used in his regression. This approach is also the one most commonly adopted in non-linear survival analysis methods. On the other hand, the predictive performance of survival analysis is measured by the concordance index (C-index). In our work, we investigated whether directly optimizing the C-index via gradient boosting yields better results, and compared it with other survival analysis methods on several medical datasets. The results indicate that in the majority of cases gradient boosting tends to give the best predictive results, and that choosing the C-index as the optimized loss function leads to further improved performance.
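To make the evaluation metric named in the abstract concrete, the following is a minimal sketch of a Harrell-style C-index for right-censored data. It is an illustration only, not the chapter's implementation: the function name `concordance_index`, its argument names, and the choice to skip pairs with tied observed times are assumptions made here for simplicity.

```python
from itertools import combinations


def concordance_index(times, events, scores):
    """Fraction of comparable pairs that the risk scores order correctly.

    times  -- observed time (event or censoring) for each subject
    events -- 1 if the event was observed, 0 if the subject was censored
    scores -- predicted risk; a higher score means shorter expected survival

    Assumption (simplification): pairs with tied observed times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject a has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the earlier
        # time is an observed event rather than a censoring time.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if scores[a] > scores[b]:
            concordant += 1.0   # higher risk for the earlier event: correct
        elif scores[a] == scores[b]:
            concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to uninformative scores, and 0.0 to a fully reversed ranking. Because this quantity is a step function of the scores, gradient-based methods that optimize it generally work with a smooth approximation rather than with the index itself.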

Acknowledgements

This work was partially supported by the Research Center of the Polish-Japanese Academy of Information Technology, supported by the Ministry of Science and Higher Education in Poland.

Author information

Corresponding author

Correspondence to Wojciech Jarmulski.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Wieczorkowska, A., Jarmulski, W. (2020). Optimizing C-Index via Gradient Boosting in Medical Survival Analysis. In: Appice, A., Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) Complex Pattern Mining. Studies in Computational Intelligence, vol 880. Springer, Cham. https://doi.org/10.1007/978-3-030-36617-9_3