Abstract
This paper aims to promote the understanding and use of machine learning in the medical sector. It presents the comparative performance of six classification models applied to the University of California, Irvine (UCI) Cleveland Heart Disease records to predict coronary artery disease (CAD). First, all 13 provided independent features were used to build the models. Comparing the accuracy of these models showed that K-nearest neighbors (KNN), support vector machine (SVM), and Naive Bayes performed as expected and better than the other models. Feature selection was then applied to improve prediction accuracy: the backward elimination method and a filter method based on the Pearson correlation coefficient were used to choose the major predictive features. Comparing the accuracy of the models built on all features with that of the models built on the selected features showed that feature selection significantly enhanced the performance of Naive Bayes and random forest, while the other models did not perform as expected. With the selected features, Naive Bayes achieved an accuracy of 88.16% on the test set.
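The two feature-selection steps named above can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' code: the correlation threshold of 0.5, the feature names f1 to f3, the toy data, and the `toy_score` function are all hypothetical. The abstract does not state the backward-elimination criterion; a common variant refits an ordinary least squares model and drops the feature with the largest p-value above 0.05, whereas this sketch assumes a generic wrapper criterion in which `score` would, in practice, be the held-out accuracy of the classifier being tuned (for example Naive Bayes).

```python
import math
from typing import Callable, Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no linear relationship with the target
    return cov / (sx * sy)


def correlation_filter(columns: Dict[str, Sequence[float]],
                       target: Sequence[float],
                       threshold: float) -> List[str]:
    """Filter method: keep features whose |r| with the target is >= threshold."""
    return [name for name, col in columns.items()
            if abs(pearson(col, target)) >= threshold]


def backward_elimination(features: List[str],
                         score: Callable[[List[str]], float]) -> List[str]:
    """Greedy backward elimination with a wrapper-style score (an assumption).

    Each round tries removing every remaining feature, keeps the removal with
    the highest score, and stops once every removal lowers the score.
    """
    current = list(features)
    best = score(current)
    while len(current) > 1:
        candidates = [[f for f in current if f != drop] for drop in current]
        scores = [score(c) for c in candidates]
        i = max(range(len(candidates)), key=lambda k: scores[k])
        if scores[i] < best:
            break  # every single removal makes the score worse
        best, current = scores[i], candidates[i]
    return current


# --- Hypothetical toy example (made-up data, not from the UCI dataset) ---
target = [0, 0, 1, 1]                      # toy class labels
columns = {"f1": [1, 2, 3, 4],             # r is about +0.894 with target
           "f2": [4, 3, 2, 1],             # r is about -0.894
           "f3": [1, 2, 2, 1]}             # r = 0
print(correlation_filter(columns, target, threshold=0.5))  # ['f1', 'f2']

TOY_WEIGHTS = {"f1": 0.3, "f2": 0.2}       # hypothetical per-feature usefulness


def toy_score(subset: List[str]) -> float:
    """Stand-in for validation accuracy: reward useful features, penalise size."""
    return sum(TOY_WEIGHTS.get(f, 0.0) for f in subset) - 0.05 * len(subset)


print(backward_elimination(["f1", "f2", "f3"], toy_score))  # ['f1', 'f2']
```

Ties are resolved in favour of removal, so the loop prefers the smaller subset when dropping a feature leaves the score unchanged; it always terminates because each round removes one feature.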
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Gupta, A., Kumar, L., Jain, R., Nagrath, P. (2020). Heart Disease Prediction Using Classification (Naive Bayes). In: Singh, P., Pawłowski, W., Tanwar, S., Kumar, N., Rodrigues, J., Obaidat, M. (eds) Proceedings of First International Conference on Computing, Communications, and Cyber-Security (IC4S 2019). Lecture Notes in Networks and Systems, vol 121. Springer, Singapore. https://doi.org/10.1007/978-981-15-3369-3_42
DOI: https://doi.org/10.1007/978-981-15-3369-3_42
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-3368-6
Online ISBN: 978-981-15-3369-3
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)