Supervised Machine Learning for Breast Cancer Risk Factors Analysis and Survival Prediction

Chtouki, Khaoula; Rhanoui, Maryem; Mikram, Mounia; Yousfi, Siham; Amazian, Kamelia

doi:10.1007/978-3-031-28387-1_6

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 625)

Included in the following conference series:

International Conference On Big Data and Internet of Things

Abstract

Breast cancer survival prediction may influence the choice of the most effective treatment. A variety of techniques have been used to predict a patient's chances of survival, including statistical, machine learning, and deep learning models. In this study, 1904 patient records from the METABRIC dataset were used to predict 5-year breast cancer survival with a machine learning approach. We compare seven classification models using the following metrics: recall, AUC, confusion matrix, accuracy, precision, false positive rate, and true positive rate. The findings show that the Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Extremely Randomized Trees (ET), K-Nearest Neighbor (KNN), and Adaptive Boosting (AdaBoost) classifiers can accurately predict the survival of the tested samples, with reported rates of 75.4%, 74.7%, 71.5%, 75.5%, 70.3%, and 78%.
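The evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names (`confusion_counts`, `binary_metrics`, `roc_auc`), the 0.5 decision threshold, the encoding of label 1 as "survived five years", and the eight toy samples are all assumptions made for illustration, and none of the numbers come from METABRIC.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count the four confusion-matrix cells for binary labels (1 = positive)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["fn" if truth == 1 else "tn"] += 1
    return counts


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Derive accuracy, precision, recall (TPR) and FPR from the confusion matrix."""
    c = confusion_counts(y_true, y_pred)
    tp, fp, tn, fn = c["tp"], c["fp"], c["tn"], c["fn"]

    def ratio(num: int, den: int) -> float:
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "recall": recall,
        "tpr": recall,  # true positive rate is another name for recall
        "fpr": ratio(fp, fp + tn),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration): 1 = survived five years, 0 = did not.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]  # hypothetical classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

print(confusion_counts(y_true, y_pred))  # {'tp': 3, 'fp': 1, 'tn': 3, 'fn': 1}
print(binary_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))           # 0.9375
```

On this toy input, accuracy, precision and recall all come out at 0.75 and the false positive rate at 0.25. In practice a library such as scikit-learn would usually supply these metrics; the hand-written version is only meant to make the definitions explicit.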

Author information

Authors and Affiliations

Meridian Team, LYRICA Laboratory, School of Information Sciences, Rabat, Morocco
Khaoula Chtouki, Maryem Rhanoui, Mounia Mikram & Siham Yousfi
Faculty of Medicine and Pharmacy, Laboratoire Pathologie Humaine, Biomédecine et Environnement, Fez, Morocco
Kamelia Amazian

Corresponding author

Correspondence to Khaoula Chtouki.

Editor information

Editors and Affiliations

ENSIAS, Mohammed V University, Rabat, Morocco
Mohamed Lazaar
FST, Abdelmalek Essaâdi University, Tangier, Morocco
El Mokhtar En-Naimi
FST, Abdelmalek Essaâdi University, Tangier, Morocco
Abdelhamid Zouhair
ENSA, Abdelmalek Essaâdi University, Tetuan, Morocco
Mohammed Al Achhab
ENSA, Abdelmalek Essaadi University, Tetouan, Morocco
Oussama Mahboub

About this paper

Cite this paper

Chtouki, K., Rhanoui, M., Mikram, M., Yousfi, S., Amazian, K. (2023). Supervised Machine Learning for Breast Cancer Risk Factors Analysis and Survival Prediction. In: Lazaar, M., En-Naimi, E.M., Zouhair, A., Al Achhab, M., Mahboub, O. (eds) Proceedings of the 6th International Conference on Big Data and Internet of Things. BDIoT 2022. Lecture Notes in Networks and Systems, vol 625. Springer, Cham. https://doi.org/10.1007/978-3-031-28387-1_6

DOI: https://doi.org/10.1007/978-3-031-28387-1_6
Published: 29 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28386-4
Online ISBN: 978-3-031-28387-1
eBook Packages: Intelligent Technologies and Robotics (R0)
