Optimizing C-Index via Gradient Boosting in Medical Survival Analysis

  • Alicja Wieczorkowska
  • Wojciech Jarmulski
Part of the Studies in Computational Intelligence book series (SCI, volume 880)


In medical databases, data represent the results of various medical procedures and analyses, often performed at non-uniform time intervals. Therefore, when performing survival analysis, we deal with a data set that has missing values and changes over time. Such data are difficult to use as a basis for predicting patient survival, as they are complex and scarce. Survival analysis methods usually maximize the partial log-likelihood, following the idea Cox used in his regression. This approach is also the one most commonly adopted in non-linear survival analysis methods. On the other hand, the predictive performance of survival analysis is measured by the concordance index (C-index). In our work, we investigated whether directly optimizing the C-index via gradient boosting yields better results, and compared it with other survival analysis methods on several medical datasets. The results indicate that in the majority of cases gradient boosting tends to give the best predictive results, and that choosing the C-index as the optimized loss function leads to further improved performance.
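To make the evaluation measure concrete, the following is a minimal sketch of how a C-index can be computed on right-censored data, together with one possible differentiable surrogate of the kind a gradient boosting procedure could optimize. It is an illustration under stated assumptions, not the chapter's implementation: the function names, the toy data, the Harrell-style pair definition, and the sigmoid smoothing with bandwidth `sigma` are all choices made here, and the abstract does not say which estimator or smoothing the authors used.

```python
import math


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _comparable_pairs(times, events):
    # Yield (i, j) where subject i had an observed event strictly before
    # subject j's observed time; only such pairs can be ordered reliably.
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the "earlier" one
        for j in range(n):
            if times[i] < times[j]:
                yield i, j


def harrell_c_index(times, events, risk_scores):
    # Fraction of comparable pairs in which the subject who failed
    # earlier has the higher predicted risk; score ties count 0.5.
    concordant, comparable = 0.0, 0
    for i, j in _comparable_pairs(times, events):
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def smoothed_c_index(times, events, risk_scores, sigma=0.1):
    # Assumed surrogate: the 0/1 indicator is replaced by a sigmoid of
    # the score difference, so the objective has usable gradients.
    total, comparable = 0.0, 0
    for i, j in _comparable_pairs(times, events):
        comparable += 1
        total += _sigmoid((risk_scores[i] - risk_scores[j]) / sigma)
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return total / comparable


# Toy data (hypothetical): observed times, event flags (1 = event,
# 0 = censored), and model risk scores (higher = earlier expected event).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.3, 0.5, 0.1]

print(harrell_c_index(times, events, risk))  # 0.8 (4 of 5 comparable pairs)
```

The hard C-index is a step function of the scores, so its gradient is zero almost everywhere. That is the reason a smooth approximation is needed before it can serve as a boosting loss; as `sigma` shrinks, the surrogate approaches the hard value.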



This work was partially supported by the Research Center of the Polish-Japanese Academy of Information Technology, supported by the Ministry of Science and Higher Education in Poland.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Polish-Japanese Academy of Information Technology, Warsaw, Poland
