Abstract
With the rise of deep learning, cancer-specific survival prediction has become a research topic of high interest. Both patients and caregivers benefit if a patient's survival period and the key factors affecting their survival can be estimated early in their cancer journey. In this study, we develop survival period prediction models and conduct factor analysis on data from breast cancer patients in the Surveillance, Epidemiology, and End Results (SEER) registry. Three deep learning architectures, Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), are selected for modeling, and their performances are compared. Across both the classification and regression approaches, deep learning models significantly outperformed traditional machine learning models. For the classification approach, we obtained an accuracy of 87.5%; for the regression approach, we obtained a Root Mean Squared Error of 13.62% and an \({R}^{2}\) value of 0.76. Furthermore, we provide an interpretation of our deep learning models by investigating feature importance and identifying the features with high importance. This approach is promising and can be used to build a baseline model from early diagnosis information. Over time, the predictions can be continuously enhanced by including temporal data collected throughout the patient's treatment and care.
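To make the reported metrics concrete, the following is a minimal sketch of how accuracy, Root Mean Squared Error, and \({R}^{2}\) are commonly computed. It is not the authors' evaluation code: the function names and the toy labels and survival periods are hypothetical, chosen only for illustration, and the abstract does not state how the percentage form of the RMSE was derived, so the sketch reports RMSE in the units of the target.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predicted class labels that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data (not from SEER): survival-period classes
# for the classification view, survival months for the regression view.
class_true = [0, 1, 2, 1]
class_pred = [0, 1, 1, 1]
months_true = [12, 24, 36, 48]
months_pred = [10, 26, 33, 50]

print("accuracy:", accuracy(class_true, class_pred))
print("RMSE:", rmse(months_true, months_pred))
print("R^2:", r_squared(months_true, months_pred))
```

In this framing, accuracy applies when the survival period is binned into classes, while RMSE and \({R}^{2}\) apply when the survival period is predicted as a continuous value.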
Data Availability
The SEER cancer registry is made available through the NCI and the process to access the data along with the documentation is provided at https://seer.cancer.gov/data/.
Acknowledgement
The authors would like to acknowledge the NSF I/UCRC Center for Healthcare Organization Transformation (CHOT), NSF I/UCRC award #1624727, and, in part, the Susan G. Komen Foundation for funding this research. Any opinions, findings, or conclusions found in this paper are those of the authors and do not necessarily reflect the views of the sponsors.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Doppalapudi, S., Yang, H., Jourquin, J., Qiu, R.G. (2021). Deep Learning and Prediction of Survival Period for Breast Cancer Patients. In: Qiu, R., Lyons, K., Chen, W. (eds) AI and Analytics for Smart Cities and Service Systems. ICSS 2021. Lecture Notes in Operations Research. Springer, Cham. https://doi.org/10.1007/978-3-030-90275-9_1
DOI: https://doi.org/10.1007/978-3-030-90275-9_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-90274-2
Online ISBN: 978-3-030-90275-9
eBook Packages: Business and Management (R0)