An Algorithm for Creating Prognostic Systems for Cancer
The TNM staging system is widely used for the classification of cancer. Its main limitation is that it uses only three factors (tumor size, extent of spread to lymph nodes, and status of distant metastasis) to generate stage groups. To describe cancer more accurately and thus improve patient care, additional factors or variables should be used in classification. In this paper we propose a hierarchical clustering algorithm for developing prognostic systems that classify cancer according to multiple prognostic factors. By allowing more prognostic factors to be incorporated, the algorithm can augment the data currently captured by a staging system. The algorithm clusters combinations of prognostic factors, where each combination is formed from the categories of the individual factors. The dissimilarity between two combinations is the area between their corresponding survival curves. Cutting the resulting dendrogram yields groups of combinations; these groups, together with their survival curves, define a prognostic system that classifies patients by survival outcome. We demonstrate the algorithm on patients with breast cancer from the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute.
Keywords: TNM, Survival, Breast cancer, Hierarchical clustering, Area between curves, Dendrogram, Prognostic system
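The procedure summarized in the abstract (estimate a survival curve for each combination of factor categories, measure dissimilarity as the area between curves, cluster hierarchically, then cut into groups) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: the survival curves are Kaplan-Meier estimates, the area is integrated up to a fixed `horizon`, clusters are merged by average linkage over the pairwise areas, and the dendrogram is cut by asking for a fixed number of groups. The function names and the toy data are hypothetical and are not SEER records.

```python
from itertools import combinations

def kaplan_meier(times, events):
    """Kaplan-Meier curve as a list of (time, survival) steps.

    times: follow-up times; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    steps = [(0.0, 1.0)]
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # handle tied times together
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            steps.append((t, surv))
        at_risk -= removed
    return steps

def survival_at(steps, t):
    """Value of the right-continuous step curve at time t."""
    value = 1.0
    for time, surv in steps:
        if time > t:
            break
        value = surv
    return value

def area_between(curve_a, curve_b, horizon):
    """Integrate |S_a(t) - S_b(t)| over [0, horizon] for two step curves."""
    knots = sorted({t for t, _ in curve_a + curve_b if t < horizon} | {0.0, horizon})
    area = 0.0
    for left, right in zip(knots, knots[1:]):
        gap = abs(survival_at(curve_a, left) - survival_at(curve_b, left))
        area += gap * (right - left)  # both curves are constant on [left, right)
    return area

def cluster_combinations(samples, n_groups, horizon):
    """Agglomerative clustering of factor combinations.

    samples maps a combination label to (times, events). Average linkage
    is an assumption of this sketch, not a detail taken from the paper.
    """
    curves = {name: kaplan_meier(*obs) for name, obs in samples.items()}
    dist = {frozenset(pair): area_between(curves[pair[0]], curves[pair[1]], horizon)
            for pair in combinations(curves, 2)}

    def linkage(g, h):
        return sum(dist[frozenset((a, b))] for a in g for b in h) / (len(g) * len(h))

    groups = [[name] for name in curves]
    while len(groups) > n_groups:  # stopping here plays the role of the cut
        i, j = min(combinations(range(len(groups)), 2),
                   key=lambda ij: linkage(groups[ij[0]], groups[ij[1]]))
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups

# Hypothetical toy data: four combinations of T and N categories.
toy = {
    "T1,N0": ([5, 8, 10, 10], [0, 0, 1, 0]),
    "T1,N1": ([4, 9, 10, 10], [0, 1, 0, 0]),
    "T3,N1": ([1, 2, 3, 5], [1, 1, 1, 0]),
    "T3,N2": ([1, 2, 2, 4], [1, 1, 1, 1]),
}
groups = cluster_combinations(toy, n_groups=2, horizon=10.0)
print(groups)
```

Each returned group is a set of combinations with similar survival. Pooling the patients within a group and re-estimating one survival curve per group would then give the kind of prognostic system the abstract describes.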