Abstract
In medical databases, data represent the results of various medical procedures and analyses, often performed at non-uniform time intervals. Survival analysis therefore has to deal with data sets that contain missing values and change over time. Such data are complex and scarce, which makes them a difficult basis for predicting patient survival. Survival analysis methods usually maximize the partial log-likelihood, following the idea Cox used in his regression model, and this approach is also the one most commonly adopted in non-linear survival analysis methods. The predictive performance of survival analysis, on the other hand, is measured by the concordance index (C-index). In our work we investigated whether directly optimizing the C-index via gradient boosting yields better results, and compared it with other survival analysis methods on several medical datasets. The results indicate that in the majority of cases gradient boosting tends to give the best predictive results, and that choosing the C-index as the optimized loss function further improves performance.
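To make the contrast between the two objectives concrete, the following is a minimal sketch in plain Python. It is not the implementation used in the chapter. It assumes a risk score for which a higher value means a shorter expected survival, uses Harrell's pairwise definition of the C-index with score ties counted as one half, and ignores tied event times in the partial likelihood. The function names and the toy data are hypothetical.

```python
import math


def concordance_index(times, events, scores):
    """Harrell's C-index: share of comparable pairs that are ranked correctly.

    times  : observed time (event or censoring) for each patient
    events : 1 if the event was observed, 0 if the patient was censored
    scores : predicted risk; assumed here that higher means earlier event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def neg_partial_log_likelihood(times, events, scores):
    """Negative Cox partial log-likelihood, without a correction for tied times."""
    total = 0.0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under observation at the event time of i.
        risk_sum = sum(math.exp(scores[j]) for j in range(n) if times[j] >= times[i])
        total -= scores[i] - math.log(risk_sum)
    return total


# Toy data (hypothetical): four patients, the second one censored.
times = [2, 4, 6, 8]
events = [1, 0, 1, 1]
perfect = concordance_index(times, events, [2.0, 1.5, 1.0, 0.2])
reversed_order = concordance_index(times, events, [0.2, 1.0, 1.5, 2.0])
baseline_nll = neg_partial_log_likelihood(times, events, [0.0, 0.0, 0.0, 0.0])
```

The C-index depends only on the ordering of the scores, so it is piecewise constant and has no useful gradient. Approaches that boost it directly therefore typically replace the pairwise comparison with a smooth surrogate, whereas the partial log-likelihood is differentiable as written.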
References
Blair, A.L., Hadden, D.R., Weaver, J.A., Archer, D.B., Johnston, P.B., Maguire, C.J.: The 5-year prognosis for vision in diabetes. Am. J. Ophthalmol. 81, 383–396 (1976)
Bou-Hamad, I., Larocque, D., Ben-Ameur, H.: A review of survival trees. Stat. Surv. 5, 44–71 (2011)
Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
Chen, Y., Jia, Z., Mercola, D., Xie, X.: A gradient boosting algorithm for survival analysis via direct optimization of concordance index. Comput. Math. Methods Med. Article ID 873595 (2013)
Cox, D.R.: Partial likelihood. Biometrika 62(2), 269–276 (1975)
Dekker, F.W., de Mutsert, R., van Dijk, P.C., Zoccali, C., Jager, K.J.: Survival analysis: time-dependent effects and time-varying risk factors. Kidney Int. 74, 994–997 (2008)
van Dijk, P.C., Jager, K.J., Zwinderman, A.H., Zoccali, C., Dekker, F.W.: The analysis of survival data in nephrology: basic concepts and methods of Cox regression. Kidney Int. 74, 705–709 (2008)
Duivesteijn, W., Feelders, A.J., Knobbe, A.: Exceptional model mining: supervised descriptive local pattern mining with complex target concepts. Data Min. Knowl. Disc. 30, 47–98 (2016)
Fleming, T.R., Harrington, D.P.: Counting Processes and Survival Analysis. Wiley, New York (1991)
Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 29(5), 1189–1232 (2001)
Friedman, J.H.: Stochastic gradient boosting. Comput. Stat. Data Anal. 38(4), 367–378 (2002)
Garcia, A.L., Wagner, K., Hothorn, T., Koebnick, C., Zunft, H.-J.F., Trippo, U.: Improved prediction of body fat by measuring skinfold thickness, circumferences, and bone breadths. Obes. Res. 13(3), 626–634 (2005)
Harrell, F.E. Jr., Lee, K.L., Mark, D.B.: Tutorial in Biostatistics. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 15(4), 361–387 (1996)
Hothorn, T., Buehlmann, P., Kneib, T., Schmid, M., Hofner, B., Sobotka, F., Scheipl, F., Mayr, A.: Model-based boosting (2018). https://cran.r-project.org/web/packages/mboost/mboost.pdf. Accessed 5 July 2018
Huster, W.J., Brookmeyer, R., Self, S.G.: Modelling paired survival data with covariates. Biometrics 45, 145–156 (1989)
Ishwaran, H., Kogalur, U.B.: Random survival forests for R. R News 7(2), 25–31 (2007)
Ishwaran, H., Kogalur, U.B.: Random forests for survival, regression, and classification (RSF-SRC) (2018). https://cran.r-project.org/web/packages/randomForestSRC/randomForestSRC.pdf. Accessed 5 July 2018
Ishwaran, H., Kogalur, U.B., Blackstone, E.H., Lauer, M.S.: Random survival forests. Ann. Appl. Stat. 2(3), 841–860 (2008)
Jager, K.J., van Dijk, P.C., Zoccali, C., Dekker, F.W.: The analysis of survival data: the Kaplan-Meier method. Kidney Int. 74, 560–565 (2008)
Jarmulski, W., Wieczorkowska, A., Trzaska, M., Ciszek, M., Paczek, L.: Machine learning models for predicting patients survival after liver transplantation. Comput. Sci. 19(2). https://doi.org/10.7494/csci.2018.19.2.2746
Kartsonaki, C.: Survival analysis. Diagn. Histopathol. 22(7), 263–270 (2016)
Klein, J.P., Moeschberger, M.L.: Survival Analysis Techniques for Censored and Truncated Data. Springer, Berlin (1997)
Lacny, S., Wilson, T., Clement, F., Roberts, D.J., Faris, P., Ghali, W.A., Marshall, D.A.: Kaplan-Meier survival analysis overestimates cumulative incidence of health-related events in competing risk settings: a meta-analysis. J. Clin. Epidemiol. 93, 25–35 (2018)
Loprinzi, C.L., Laurie, J.A., Wieand, H.S., Krook, J.E., Novotny, P.J., Kugler, J.W., Bartel, J., Law, M., Bateman, M., Klatt, N.E.: Prospective evaluation of prognostic variables from patient-completed questionnaires. North Central Cancer Treatment Group. J. Clin. Oncol. 12(3), 601–607 (1994)
Malinchoc, M., Kamath, P.S., Gordon, F.D., Peine, C.J., Rank, J., ter Borg, P.C.J.: A model to predict poor survival in patients undergoing transjugular intrahepatic portosystemic shunts. Hepatology 31(4), 864–871 (2000)
Mayr, A., Schmid, M.: Boosting the concordance index for survival data - A unified framework to derive and evaluate biomarker combinations. PLoS ONE 9(1), e84483 (2014)
Raykar, V.C., Steck, H., Krishnapuram, B., Dehing-Oberije, C., Lambin, P.: On ranking in survival analysis: bounds on the concordance index. Adv. Neural Inf. Process. Syst. 20, 1209–1216 (2008)
Ridgeway, G.: Generalized boosted regression models (2018). https://cran.r-project.org/web/packages/gbm/gbm.pdf. Accessed 5 July 2018
The R project for statistical computing (2018). https://www.r-project.org/. Accessed 5 July 2018
Therneau, T.M.: Survival analysis (2018). https://cran.r-project.org/web/packages/survival/survival.pdf. Accessed 5 July 2018
Uno, H., Cai, T., Pencina, M.J., D’Agostino, R.B., Wei, L.J.: On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat. Med. 30(10), 1105–1117 (2011)
Zheng, L.-Y., Chang, Y.-T.: Risk assessment model of bottlenecks for urban expressways using survival analysis approach. Transp. Res. Procedia 25, 1544–1555 (2017)
Acknowledgements
This work was partially supported by the Research Center of the Polish-Japanese Academy of Information Technology, supported by the Ministry of Science and Higher Education in Poland.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Wieczorkowska, A., Jarmulski, W. (2020). Optimizing C-Index via Gradient Boosting in Medical Survival Analysis. In: Appice, A., Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) Complex Pattern Mining. Studies in Computational Intelligence, vol 880. Springer, Cham. https://doi.org/10.1007/978-3-030-36617-9_3
DOI: https://doi.org/10.1007/978-3-030-36617-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36616-2
Online ISBN: 978-3-030-36617-9
eBook Packages: Intelligent Technologies and Robotics (R0)