Optimizing C-Index via Gradient Boosting in Medical Survival Analysis
In medical databases, data represent the results of various medical procedures and analyses, often recorded at non-uniform time intervals. Survival analysis on such data therefore has to deal with missing values and with measurements that change over time. Because the data are both complex and scarce, they are difficult to use as a basis for predicting patient survival. Survival analysis methods usually maximize the partial log-likelihood, following the idea Cox used in his regression model, and this approach is also the one most commonly adopted in non-linear survival analysis methods. The predictive performance of survival models, on the other hand, is measured by the concordance index (C-index). In our work we investigated whether directly optimizing the C-index via gradient boosting yields better results, and compared this approach with other survival analysis methods on several medical datasets. The results indicate that in the majority of cases gradient boosting gives the best predictive results, and that choosing the C-index as the optimized loss function improves performance further.
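To make the mismatch between the optimized objective and the evaluation metric concrete, the following minimal Python sketch computes the C-index for right-censored data, together with a sigmoid-smoothed version that is differentiable and could therefore serve as a boosting objective. It is an illustration only: the function names, the `sigma` smoothing parameter, the convention that a higher score means higher predicted risk, and the choice of sigmoid smoothing are assumptions made here, not details taken from the paper.

```python
import math


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def c_index(times, events, scores):
    """Fraction of comparable pairs that the scores order correctly.

    A pair (i, j) is comparable when subject i has an observed event
    and times[i] < times[j]. Assumption: higher score = higher risk.
    Tied scores count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot be the earlier member
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable


def smoothed_c_index(times, events, scores, sigma=0.1):
    """Sigmoid-smoothed C-index and its gradient w.r.t. each score.

    The indicator [score_i > score_j] is replaced by
    sigmoid((score_i - score_j) / sigma); sigma is a hypothetical
    smoothing parameter chosen for illustration.
    """
    n = len(times)
    total, comparable = 0.0, 0
    grad = [0.0] * n
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                s = _sigmoid((scores[i] - scores[j]) / sigma)
                total += s
                d = s * (1.0 - s) / sigma  # derivative of the sigmoid term
                grad[i] += d
                grad[j] -= d
    return total / comparable, [g / comparable for g in grad]


# Toy data: four patients, the third one is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
scores = [0.9, 0.3, 0.5, 0.1]

exact = c_index(times, events, scores)
smooth, grad = smoothed_c_index(times, events, scores)
print(exact, smooth, grad)
```

In a gradient boosting setting, the per-subject gradient returned by `smoothed_c_index` would play the role of the pseudo-residuals fitted by each new base learner. The quadratic loop over pairs is kept for readability and would need a more efficient formulation on larger datasets.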
This work was partially supported by the Research Center of the Polish-Japanese Academy of Information Technology, supported by the Ministry of Science and Higher Education in Poland.