Abstract
Classification algorithms are valuable in healthcare because they can help detect a disease at an early stage, allowing patients to receive the required treatment in a timely manner. Machine learning techniques can be used to develop classification models that predict disease. In this paper, we explore classification algorithms on the Framingham Heart Disease Dataset. The dataset contains 15 features that help predict the ten-year risk of coronary heart disease (CHD). We found that the dataset is imbalanced, i.e., one class has more instances than the other, so we used the Synthetic Minority Oversampling Technique (SMOTE) to balance it. Balancing the dataset with SMOTE drastically improved the F1 score of the positive class for all classifiers. In our experimental analysis, the accuracy of the support vector machine (SVM) increased after applying SMOTE, and SVM achieved the highest AUC value among all the classifiers.
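As an illustration of the balancing step described in the abstract, the following is a minimal sketch of SMOTE-style oversampling together with the F1 score of the positive class. It is not the authors' implementation: the function names `smote` and `f1_score`, the toy data, and the choice of `k` are assumptions made for this example, and the base point is drawn at random rather than iterating over every minority sample as the original SMOTE procedure does. In practice, oversampling is normally applied to the training split only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Simplified sketch: each synthetic point lies on the segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def f1_score(y_true, y_pred, positive=1):
    """F1 of the positive class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


# Hypothetical toy data: 12 majority samples, 4 minority samples.
majority = [(float(i), float(i % 3)) for i in range(12)]
minority = [(20.0, 20.0), (21.0, 19.0), (22.0, 21.0), (20.5, 22.0)]

# Generate enough synthetic minority points to match the majority count.
new_points = smote(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay within the region already occupied by the minority class instead of duplicating existing rows.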
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sharma, S., Singhal, A. (2023). A Comprehensive Investigation of Machine Learning Algorithms with SMOTE Integration to Maximize F1 Score. In: Sharma, H., Shrivastava, V., Bharti, K.K., Wang, L. (eds) Communication and Intelligent Systems. ICCIS 2022. Lecture Notes in Networks and Systems, vol 686. Springer, Singapore. https://doi.org/10.1007/978-981-99-2100-3_16
DOI: https://doi.org/10.1007/978-981-99-2100-3_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-2099-0
Online ISBN: 978-981-99-2100-3
eBook Packages: Intelligent Technologies and Robotics (R0)