
Supervised Machine Learning for Breast Cancer Risk Factors Analysis and Survival Prediction

  • Conference paper
Proceedings of the 6th International Conference on Big Data and Internet of Things (BDIoT 2022)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 625)

Abstract

Breast cancer survival prediction can help guide the choice of the most effective treatment. A variety of statistical, machine learning, and deep learning models have been used to predict a patient's chances of survival. In this study, we use 1,904 patient records from the METABRIC dataset to predict 5-year breast cancer survival with a machine learning approach. We compare seven classification models using the following evaluation measures: recall, AUC, confusion matrix, accuracy, precision, false positive rate, and true positive rate. The findings show that the Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Extremely Randomized Trees (ET), K-Nearest Neighbor (KNN), and Adaptive Boosting (AdaBoost) classifiers can predict the survival of the tested samples, with reported accuracies of 75.4%, 74.7%, 71.5%, 75.5%, 70.3%, and 78%.
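The abstract lists the evaluation measures without defining them. The following is a minimal, standard-library-only sketch of one common way to compute them for a binary 5-year survival label. It assumes that 1 is the positive class (for example, "survived five years") and that scores are thresholded at 0.5. The labels and scores are made-up toy values, not METABRIC data or the paper's results. AUC is computed with the rank-based (Mann-Whitney) formulation, which counts ties as 1/2 and gives the same value as the area under the empirical ROC curve.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Count TP, FP, TN, FN for binary labels, assuming 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
    }


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Metrics that depend on hard 0/1 predictions."""
    c = confusion_counts(y_true, y_pred)
    tp, fp, tn, fn = c["tp"], c["fp"], c["tn"], c["fn"]
    total = tp + fp + tn + fn
    tpr = tp / (tp + fn) if tp + fn else 0.0  # recall is the true positive rate
    fpr = fp / (fp + tn) if fp + tn else 0.0  # share of negatives wrongly flagged
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {
        # Rows = true class (0, 1), columns = predicted class (0, 1)
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": tpr,
        "tpr": tpr,
        "fpr": fpr,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: true labels and a classifier's predicted probabilities
    y_true = [1, 1, 1, 0, 0, 1, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.35, 0.1, 0.7, 0.6, 0.55]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(threshold_metrics(y_true, y_pred))
    print("AUC:", roc_auc(y_true, scores))
```

On this toy data, the predictions give 4 true positives, 1 false negative, 2 true negatives and 1 false positive. That yields an accuracy of 0.75, precision and recall of 0.8, a false positive rate of 1/3, and an AUC of 13/15. In practice, scikit-learn's `sklearn.metrics` module provides equivalent functions, including `confusion_matrix`, `precision_score`, `recall_score` and `roc_auc_score`.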



Author information

Corresponding author

Correspondence to Khaoula Chtouki.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chtouki, K., Rhanoui, M., Mikram, M., Yousfi, S., Amazian, K. (2023). Supervised Machine Learning for Breast Cancer Risk Factors Analysis and Survival Prediction. In: Lazaar, M., En-Naimi, E.M., Zouhair, A., Al Achhab, M., Mahboub, O. (eds) Proceedings of the 6th International Conference on Big Data and Internet of Things. BDIoT 2022. Lecture Notes in Networks and Systems, vol 625. Springer, Cham. https://doi.org/10.1007/978-3-031-28387-1_6