Deep Learning and Prediction of Survival Period for Breast Cancer Patients

  • Conference paper
AI and Analytics for Smart Cities and Service Systems (ICSS 2021)

Part of the book series: Lecture Notes in Operations Research ((LNOR))

Abstract

With the rise of deep learning, cancer-specific survival prediction has become a research topic of high interest. Both patients and caregivers benefit when a patient’s expected survival period and the key factors affecting it can be identified early in the cancer journey. In this study, we develop survival period prediction models and conduct factor analysis on breast cancer patient data from the Surveillance, Epidemiology, and End Results (SEER) registry. Three deep learning architectures, Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), are selected for modeling, and their performances are compared. In both the classification and regression approaches, the deep learning models significantly outperformed traditional machine learning models. The classification approach achieved an accuracy of 87.5%, and the regression approach achieved a Root Mean Squared Error of 13.62% and an \(R^{2}\) value of 0.76. Furthermore, we interpret our deep learning models by investigating feature importance and identifying the most important features. This approach is promising and can be used to build a baseline model from early diagnosis information. Over time, the predictions can be continuously improved by including temporal data gathered throughout the patient’s treatment and care.
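The three evaluation metrics named in the abstract (classification accuracy, Root Mean Squared Error, and \(R^{2}\)) can be illustrated with a minimal sketch. The function names and the sample values below are hypothetical and are not taken from the paper or from SEER. The abstract reports RMSE as a percentage without stating how it was normalized, so this sketch computes RMSE in the raw units of the target.

```python
import math


def accuracy(labels_true, labels_pred):
    """Fraction of predicted class labels that match the true labels."""
    matches = sum(t == p for t, p in zip(labels_true, labels_pred))
    return matches / len(labels_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical survival periods in months (illustrative values only).
observed = [12.0, 30.0, 45.0, 60.0]
predicted = [15.0, 27.0, 48.0, 57.0]

# Hypothetical survival-period class labels for the classification view.
true_classes = ["short", "medium", "medium", "long"]
pred_classes = ["short", "medium", "long", "long"]

print("accuracy:", accuracy(true_classes, pred_classes))
print("RMSE:", rmse(observed, predicted))
print("R^2:", r_squared(observed, predicted))
```

Libraries such as scikit-learn provide equivalent metric functions; the pure-Python versions here only make the definitions explicit.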

Data Availability

The SEER cancer registry is made available through the NCI and the process to access the data along with the documentation is provided at https://seer.cancer.gov/data/.

Acknowledgement

This research was funded by the NSF I/UCRC Center for Healthcare Organization Transformation (CHOT), NSF I/UCRC award #1624727, and in part by the Susan G. Komen Foundation. Any opinions, findings, or conclusions in this paper are those of the authors and do not necessarily reflect the views of the sponsors.

Author information

Corresponding author

Correspondence to Robin G. Qiu.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Doppalapudi, S., Yang, H., Jourquin, J., Qiu, R.G. (2021). Deep Learning and Prediction of Survival Period for Breast Cancer Patients. In: Qiu, R., Lyons, K., Chen, W. (eds) AI and Analytics for Smart Cities and Service Systems. ICSS 2021. Lecture Notes in Operations Research. Springer, Cham. https://doi.org/10.1007/978-3-030-90275-9_1
