Abstract
Assessing large volumes of data is a difficult task in the health care industry, which creates a need for data mining to identify relationships between data attributes. In this work, an assessment model for health care analysis is developed. Preprocessing consists of data cleaning by normalization, followed by outlier detection using k-means clustering. The preprocessed data then undergo dimensionality reduction through feature selection: features are selected by a wrapper model, SVM-based improved recursive feature selection, and the resulting accuracy is evaluated and compared with that of traditional classifiers such as Naïve Bayes. The analysis shows that the proposed model achieves an average accuracy of 98.79% on a health care dataset such as Pima Indians diabetes, indicating that the proposed technique yields improved results.
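The preprocessing stage described in the abstract (normalization followed by k-means outlier detection) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not state the outlier criterion, so the rule used here (flag a point whose distance to its nearest centroid exceeds `factor` times the mean such distance), the function names, the first-k-points initialization, and the toy data are all assumptions.

```python
import math


def min_max_normalize(rows):
    """Scale every column to [0, 1]; a constant column maps to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm.

    Assumption: the first k points seed the centroids, which keeps the
    sketch deterministic; real code would use a better initialization.
    """
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid as its cluster mean; keep the old
        # centroid if a cluster ended up empty.
        updated = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def flag_outliers(points, centroids, factor=3.0):
    """Assumed criterion: nearest-centroid distance > factor * mean distance."""
    dists = [min(math.dist(p, c) for c in centroids) for p in points]
    mean_dist = sum(dists) / len(dists)
    return [d > factor * mean_dist for d in dists]


# Made-up toy records (two features each): two tight groups plus one
# record that lies far away from both.
rows = [
    [1.0, 1.0], [9.0, 9.0], [1.2, 0.8], [8.8, 9.2],
    [0.8, 1.2], [9.2, 8.8], [1.0, 1.2], [9.0, 8.8],
    [30.0, 1.0],
]

normalized = min_max_normalize(rows)
centroids = kmeans(normalized, k=2)
flags = flag_outliers(normalized, centroids)
cleaned = [row for row, is_outlier in zip(rows, flags) if not is_outlier]
print(flags)
print(len(cleaned), "records kept")
```

Note that an extreme value also stretches the min-max range, compressing the remaining values toward zero in that column. A distance threshold is only one possible criterion; treating very small clusters as outliers is a common alternative.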
Cite this article
Suresh, A., Kumar, R. & Varatharajan, R. Health care data analysis using evolutionary algorithm. J Supercomput 76, 4262–4271 (2020). https://doi.org/10.1007/s11227-018-2302-0
DOI: https://doi.org/10.1007/s11227-018-2302-0