
A Comprehensive Investigation of Machine Learning Algorithms with SMOTE Integration to Maximize F1 Score

  • Conference paper
Communication and Intelligent Systems (ICCIS 2022)

Abstract

Classification algorithms are valuable in healthcare because they can detect a disease at an early stage, allowing patients to receive the required treatment in a timely manner. Machine learning techniques can be used to build a classification model that predicts a disease. In this paper, we explore classification algorithms on the Framingham Heart Disease Dataset, which contains 15 features that help predict the ten-year risk of coronary heart disease (CHD). We found that the dataset is imbalanced, i.e., one class has more instances than the other, so we used the Synthetic Minority Oversampling Technique (SMOTE) to balance it. Balancing the dataset with SMOTE drastically improved the F1 score of the positive class for all classifiers. In our experimental analysis, the accuracy of SVM increased after applying SMOTE, and SVM achieved the highest AUC value among all the classifiers.
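To make the two central ideas of the abstract concrete, the following is a minimal sketch of SMOTE-style oversampling and of the F1 score for the positive class. It is not the authors' implementation: the function names `smote` and `f1_score`, the toy two-feature data, and the parameter defaults are all assumptions chosen for illustration, and a real study would more likely rely on an established library. The sketch assumes the core SMOTE idea of interpolating between a minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Illustrative SMOTE sketch (hypothetical helper, not a library API).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy imbalanced data (made up for illustration): 8 majority vs 3 minority.
majority = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.4, 0.2),
            (0.3, 0.5), (0.5, 0.1), (0.2, 0.4), (0.6, 0.3)]
minority = [(2.0, 2.1), (2.4, 1.9), (2.2, 2.5)]

# Generate enough synthetic minority points to equalize the class counts.
synthetic = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + synthetic
```

Because accuracy can look high on an imbalanced dataset even when the minority class is rarely detected, the F1 score of the positive class is a more informative target. In practice, oversampling of this kind is normally applied only to the training split, so that synthetic points do not leak into evaluation.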

Author information

Corresponding author

Correspondence to Surbhi Sharma.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sharma, S., Singhal, A. (2023). A Comprehensive Investigation of Machine Learning Algorithms with SMOTE Integration to Maximize F1 Score. In: Sharma, H., Shrivastava, V., Bharti, K.K., Wang, L. (eds) Communication and Intelligent Systems. ICCIS 2022. Lecture Notes in Networks and Systems, vol 686. Springer, Singapore. https://doi.org/10.1007/978-981-99-2100-3_16