An Algorithm for Creating Prognostic Systems for Cancer

  • Dechang Chen
  • Huan Wang
  • Li Sheng
  • Matthew T. Hueman
  • Donald E. Henson
  • Arnold M. Schwartz
  • Jigar A. Patel
Mobile Systems
Part of the following topical collections:
  1. Advances in Big-Data based mHealth Theories and Applications


The TNM staging system is universally used to classify cancer. The system is limited because it uses only three factors (tumor size, extent of spread to lymph nodes, and status of distant metastasis) to generate stage groups. To describe cancer more accurately, and thus improve patient care, additional factors or variables should be used in classification. In this paper we propose a hierarchical clustering algorithm for developing prognostic systems that classify cancer according to multiple prognostic factors. By allowing more prognostic factors to be incorporated, the algorithm has many potential applications in augmenting the data currently captured by a staging system. The objects being clustered are combinations of prognostic factors, where each combination is formed from one category of each factor. The dissimilarity between two combinations is the area between their corresponding survival curves. Cutting the resulting dendrogram yields groups of combinations; these groups, together with the survival curve of each group, define a prognostic system that classifies patients by survival outcome. The algorithm is demonstrated on breast cancer patients from the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute.
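The pipeline described in the abstract (survival curve per combination, area between curves as dissimilarity, hierarchical clustering, cut into groups) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: the Kaplan-Meier estimator for the survival curves, the integrated absolute difference up to a fixed horizon as the "area between curves", average linkage, and cutting the dendrogram by stopping at a chosen number of groups. The combination labels and the survival data are invented toy values, not SEER data.

```python
from itertools import combinations

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(t, S(t))] at each distinct event time.
    events[i] is 1 for an observed death and 0 for a censored time."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:   # handle tied times together
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

def survival_at(curve, t):
    """Value of the right-continuous step function at time t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s

def area_between(curve_a, curve_b, horizon):
    """Integral of |S_a(t) - S_b(t)| over [0, horizon] (assumed dissimilarity)."""
    knots = sorted({0.0, horizon} | {t for t, _ in curve_a + curve_b if t < horizon})
    return sum(abs(survival_at(curve_a, lo) - survival_at(curve_b, lo)) * (hi - lo)
               for lo, hi in zip(knots, knots[1:]))

def cluster_combinations(curves, n_groups, horizon):
    """Average-linkage agglomerative clustering, stopped at n_groups clusters,
    which corresponds to cutting the dendrogram at that level."""
    names = list(curves)
    dist = {frozenset(p): area_between(curves[p[0]], curves[p[1]], horizon)
            for p in combinations(names, 2)}

    def linkage(c1, c2):
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[n] for n in names]
    while len(clusters) > n_groups:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair (i < j)
        del clusters[j]
    return clusters

# Hypothetical toy data: combination label -> (survival months, event flags).
toy = {
    "T1,N0": ([60, 72, 80, 90], [0, 0, 1, 0]),
    "T1,N1": ([55, 70, 75, 88], [0, 1, 0, 0]),
    "T3,N1": ([5, 10, 14, 20], [1, 1, 1, 0]),
    "T3,N2": ([4, 9, 12, 25], [1, 1, 1, 1]),
}
curves = {name: kaplan_meier(t, e) for name, (t, e) in toy.items()}
groups = cluster_combinations(curves, n_groups=2, horizon=60)
print(groups)
```

On this toy input the two T1 combinations, whose curves coincide over the 60-month horizon, fall into one group and the two T3 combinations into the other. With real data, the horizon, the linkage rule, and the number of groups would each need to be justified rather than fixed as they are here.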


Keywords: TNM, Survival, Breast cancer, Hierarchical clustering, Area between curves, Dendrogram, Prognostic system



Copyright information

© Springer Science+Business Media New York (outside the USA) 2016

Authors and Affiliations

  • Dechang Chen (1)
  • Huan Wang (2)
  • Li Sheng (3)
  • Matthew T. Hueman (4)
  • Donald E. Henson (1, 5)
  • Arnold M. Schwartz (6, 7)
  • Jigar A. Patel (8)

  1. Department of Preventive Medicine and Biostatistics, The Uniformed Services University of the Health Sciences, Bethesda, USA
  2. Department of Statistics, The George Washington University, Washington, USA
  3. Department of Mathematics, Drexel University, Philadelphia, USA
  4. Surgical Oncology, John P. Murtha Cancer Center, Walter Reed National Military Medical Center, Bethesda, USA
  5. Department of Surgery, The Uniformed Services University of the Health Sciences, Bethesda, USA
  6. Department of Pathology, The George Washington University Medical Center, Washington, USA
  7. Department of Surgery, The George Washington University Medical Center, Washington, USA
  8. Department of Surgery, Walter Reed National Military Medical Center, Bethesda, USA
