Abstract
Breast cancer survival prediction can ultimately inform the choice of the most effective treatment. A variety of techniques have been employed to predict a patient's chances of survival, including statistical, machine learning, and deep learning models. In the current study, 1904 patient records from the METABRIC dataset were used to predict 5-year breast cancer survival with a machine learning approach. We compare the outcomes of seven classification models and evaluate how well they perform using the following metrics: recall, AUC, confusion matrix, accuracy, precision, false positive rate, and true positive rate. The findings demonstrate that the Logistic Regression (LR), Support Vector Machines (SVM), Decision Tree (DT), Random Forest (RF), Extremely Randomized Trees (ET), K-Nearest Neighbor (KNN), and Adaptive Boosting (AdaBoost) classifiers can accurately predict the survival of the tested samples, with reported scores of 75.4%, 74.7%, 71.5%, 75.5%, 70.3%, and 78%.
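The evaluation metrics named in the abstract can all be derived from a binary confusion matrix plus the classifier scores. The following is a minimal sketch in plain Python, not the authors' evaluation code: the label convention (1 = survived five years, 0 = did not), the 0.5 decision threshold, the function names, and the toy data are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # predicted 1, actual 1
    fp: int  # predicted 1, actual 0
    tn: int  # predicted 0, actual 0
    fn: int  # predicted 0, actual 1


def confusion(y_true, y_pred):
    """Count the four cells of a binary confusion matrix."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return Confusion(tp, fp, tn, fn)


def metrics(c):
    """Accuracy, precision, recall (true positive rate), false positive rate."""
    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "accuracy": ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": ratio(c.tp, c.tp + c.fp),
        "recall": ratio(c.tp, c.tp + c.fn),  # identical to the true positive rate
        "fpr": ratio(c.fp, c.fp + c.tn),
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. Quadratic in the sample size, which is fine
    for a sketch but not for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from METABRIC).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

c = confusion(y_true, y_pred)
m = metrics(c)
a = auc(y_true, scores)
print(c, m, round(a, 3))
```

Accuracy, precision, recall, and the false positive rate depend on the chosen threshold, whereas AUC is computed from the raw scores and summarizes performance across all thresholds. That is one reason the two kinds of metric are usually reported together when comparing classifiers.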
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chtouki, K., Rhanoui, M., Mikram, M., Yousfi, S., Amazian, K. (2023). Supervised Machine Learning for Breast Cancer Risk Factors Analysis and Survival Prediction. In: Lazaar, M., En-Naimi, E.M., Zouhair, A., Al Achhab, M., Mahboub, O. (eds) Proceedings of the 6th International Conference on Big Data and Internet of Things. BDIoT 2022. Lecture Notes in Networks and Systems, vol 625. Springer, Cham. https://doi.org/10.1007/978-3-031-28387-1_6
DOI: https://doi.org/10.1007/978-3-031-28387-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28386-4
Online ISBN: 978-3-031-28387-1
eBook Packages: Intelligent Technologies and Robotics (R0)