Breast Cancer Survival Prediction Using Machine Learning

Computational Intelligence in Oncology

Part of the book series: Studies in Computational Intelligence (SCI, volume 1016)

Abstract

Breast cancer is one of the most prevalent cancers in women, and recent advances in data mining have provided more insight into the disease and its prognosis. In this chapter, we present a set of machine learning models for predicting breast cancer survival. The original data, obtained from the data world website, contain 272 instances (195 not survived, 77 survived), 1564 features, and one target variable. From the 1564 features, the top ten were selected using the extreme gradient boosting (XGB) method. Results were averaged over 5-fold and 10-fold stratified cross-validation, and the accuracy obtained under the two schemes was compared. Six machine learning classifiers were compared and ranked using several performance metrics (accuracy, precision, true positive rate, true negative rate, F1-score, and ROC–AUC score). After evaluating the models, we propose random forest (RFC) and XGB as the top classifiers, with overall testing accuracy of 78% and 77.2%, respectively. However, all the classifiers predicted label 0 better (high true negative rate) than label 1 (low true positive rate).
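The class balance helps when reading the closing observation about true negative and true positive rates. The sketch below is a minimal illustration in plain Python, not the chapter's code. It assumes that label 0 means not survived (195 cases) and label 1 means survived (77 cases), which the abstract does not state explicitly, and the helper names stratified_folds and classification_metrics are hypothetical. It shows how stratified folds keep the 195:77 ratio roughly constant, and how a trivial classifier that always predicts label 0 scores on the threshold-based metrics (ROC–AUC is left out because it needs predicted scores).

```python
def stratified_folds(labels, k):
    """Assign sample indices to k folds so each fold keeps a similar class ratio."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        # Deal the indices of this class round-robin across the folds.
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


def _ratio(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, TPR, TNR and F1 for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = _ratio(tp, tp + fp)
    tpr = _ratio(tp, tp + fn)  # true positive rate (recall on label 1)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "precision": precision,
        "tpr": tpr,
        "tnr": _ratio(tn, tn + fp),  # true negative rate (recall on label 0)
        "f1": _ratio(2 * precision * tpr, precision + tpr),
    }


# Assumed encoding: 0 = not survived (195 cases), 1 = survived (77 cases).
labels = [0] * 195 + [1] * 77

# (fold size, number of label-1 cases) for each of the 5 folds.
folds = stratified_folds(labels, 5)
fold_counts = [(len(f), sum(labels[i] for i in f)) for f in folds]
print(fold_counts)

# Baseline that always predicts the majority label 0.
baseline = classification_metrics(labels, [0] * len(labels))
print(round(baseline["accuracy"], 3))  # 0.717
```

Under these assumptions, the always-0 baseline already reaches about 71.7% accuracy with a true negative rate of 1 and a true positive rate of 0, so the reported accuracies of 78% and 77.2% are best read alongside the per-class rates.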

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Tiwari, P., Bhardwaj, P., Keprate, A., Tyagi, A. (2022). Breast Cancer Survival Prediction Using Machine Learning. In: Raza, K. (ed.) Computational Intelligence in Oncology. Studies in Computational Intelligence, vol 1016. Springer, Singapore. https://doi.org/10.1007/978-981-16-9221-5_8
