
Health care data analysis using evolutionary algorithm


Assessing large volumes of data is a difficult task in the health care industry, which creates a strong need for data mining to identify relationships between data attributes. In this work, an assessment model for health care data analysis is developed. Preprocessing consists of data cleaning through normalization, together with outlier detection using k-means clustering. The preprocessed data then undergo dimensionality reduction through feature selection. The selected features are analyzed with a wrapper model, SVM-based improved recursive feature selection, and its accuracy is evaluated and compared with traditional classifiers such as Naïve Bayes. The analysis shows that the proposed model achieves an average accuracy of 98.79% on health care data such as the Pima Indians Diabetes dataset, indicating that the proposed technique yields improved results over these traditional classifiers.
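The preprocessing stage described in the abstract (normalization, then outlier detection with k-means clustering) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Min-max scaling to [0, 1] is assumed as the normalization. A point is flagged as an outlier when its distance to its assigned centroid exceeds the mean distance plus `z` standard deviations. The function names, this threshold rule, and the defaults for `k`, `z`, and `iters` are all hypothetical choices made for illustration.

```python
import math
import random


def min_max_normalize(rows):
    """Scale each column to [0, 1]; constant columns map to 0.0.

    Assumption: min-max scaling stands in for the paper's normalization step.
    """
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-means with random initial centroids; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (ties -> lowest index).
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels


def kmeans_outliers(points, k, z=2.0, seed=0):
    """Return indices of points that lie far from their cluster centroid.

    Hypothetical rule: distance > mean + z * std over all point-to-centroid
    distances. The abstract does not state the paper's exact criterion.
    """
    centroids, labels = kmeans(points, k, seed=seed)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return [i for i, d in enumerate(dists) if d > mean + z * std]


if __name__ == "__main__":
    # Toy data (not from any real dataset): ten identical rows plus one extreme row.
    toy = [[1.0, 2.0]] * 10 + [[9.0, 30.0]]
    scaled = min_max_normalize(toy)
    print(kmeans_outliers(scaled, k=1))  # prints [10]
```

Normalizing before clustering keeps one large-valued attribute from dominating the Euclidean distances. In practice, `k` and `z` would have to be tuned on the data, and flagged rows would be removed or reviewed before feature selection.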
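The wrapper step named in the abstract can be illustrated with the baseline SVM recursive feature elimination (SVM-RFE) scheme. The procedure trains a linear SVM, drops the feature whose weight has the smallest absolute value, and repeats on the surviving features. The abstract does not say what makes the authors' variant "improved", so this sketch shows only the standard procedure. Here the linear SVM is trained with plain stochastic subgradient descent on the L2-regularized hinge loss. The function names and the hyperparameters `lr`, `lam`, and `epochs` are hypothetical.

```python
def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=200):
    """Linear SVM via stochastic subgradient descent on the hinge loss.

    Approximately minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))),
    visiting samples in a fixed order so results are deterministic.
    X: list of feature vectors; y: labels in {-1, +1}. Returns (w, b).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, x_i)) + b
            violated = y_i * score < 1  # inside the margin or misclassified
            for j in range(len(w)):
                # Subgradient of the regularizer plus (if active) the hinge term.
                grad = lam * w[j] - (y_i * x_i[j] if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * y_i  # the bias is left unregularized
    return w, b


def svm_rfe(X, y, n_keep, **svm_params):
    """Baseline SVM-RFE: repeatedly drop the feature with the smallest |w_j|.

    Returns (kept feature indices, eliminated indices in removal order).
    """
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_keep:
        # Retrain on the surviving columns only, then rank them by |weight|.
        sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(sub, y, **svm_params)
        weakest = min(range(len(remaining)), key=lambda k: abs(w[k]))
        eliminated.append(remaining.pop(weakest))
    return remaining, eliminated


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is small noise,
    # and feature 2 is constant, so it carries no class information.
    X = [[1.0, 0.2, 0.5], [2.0, -0.1, 0.5], [1.5, 0.1, 0.5],
         [-1.0, -0.2, 0.5], [-2.0, 0.1, 0.5], [-1.5, -0.1, 0.5]]
    y = [1, 1, 1, -1, -1, -1]
    kept, dropped = svm_rfe(X, y, n_keep=1)
    print(kept, dropped)
```

Retraining after every elimination is what makes this a wrapper method. Each feature's weight is re-estimated in the context of the features that remain, rather than being scored once in isolation. On a real dataset, the accuracy of the kept subset would then be compared against a baseline classifier such as Naïve Bayes.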


Fig. 1
Fig. 2
Fig. 3
Fig. 4



  1. Zhou J (2007) Feature selection in data mining—approaches based on information theory. VDM Verlag, Saarbrücken

    Google Scholar 

  2. Bu F, Chen Z, Zhang Q, Yang LT (2016) Incomplete high-dimensional data imputation algorithm using feature selection and clustering analysis on cloud. J Supercomput 72(8):2977

    Google Scholar 

  3. Han J, Kamber M (2000) Data mining: concepts and techniques, 1st edn. Morgan Kaufmann Publishers, Burlington

    MATH  Google Scholar 

  4. Rahm E, Do HH (2000) Data cleaning: problems and current approaches. IEEE Bull Tech Comm Data Eng 23(4):3–13

    Google Scholar 

  5. Lemke F, Mueller J-A (2003) Medical data analysis using self-organizing data mining technologies. Syst Anal Model Simul 43(10):1399–1408

    Google Scholar 

  6. Matheny ME, Ohno-Machado L, Resnic FS (2005) Discrimination and calibration of mortality risk prediction models in interventional cardiology. J Biomed Inform 38(5):367–375

    Google Scholar 

  7. Quinlan J (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Mateo

    Google Scholar 

  8. Pang-Ning T, Steinbach M, Kumar V (2006) Introduction to data mining. Library of Congress, Washington

    MATH  Google Scholar 

  9. Koh HC, Tan G (2005) Data mining applications in healthcare. J Health Care Inf Manag 19(2):64–72

    Google Scholar 

  10. Ordonez C (2004) Improving Heart Disease Prediction Using Constrained Association Rules. Seminar presentation at University of Tokyo

  11. Leskovec J, Rajaraman A, Ullman JD (2014) Mining massive datasets. Cambridge University Press, Cambridge

    Google Scholar 

  12. Lin K-C, Zhang K-Y, Huang Y-H, Hung JC, Yen N (2016) Feature selection based on an improved cat swarm optimization algorithm for big data classification. J Supercomput 72(8):3210

    Google Scholar 

  13. Carlsson G, Mémoli F (2010) Characterization, stability and convergence of hierarchical clustering methods. J Mach Learn Res 11:1425–1470

    MathSciNet  MATH  Google Scholar 

  14. Osl M, Dreiseit S, Cerqueira F, Netzer M, Pfeifer B, Baumgartner C (2009) Demoting redundant features to improve the discriminatory ability in cancer data. J Biomed Inform 42(4):721–725

    Google Scholar 

  15. Cios KJ, William Moore G (2002) Uniqueness of medical data mining. Artif Intell Med 26(1):1–24

    Google Scholar 

  16. Sufi F (2011) Diagnosis of cardiovascular abnormalities from compressed ECG: a data mining-based approach. IEEE Trans Inf Technol Biomed 15(1):3–39

    Google Scholar 

  17. Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines. Cambridge University Press, Cambridge

    MATH  Google Scholar 

Download references

Author information



Corresponding author

Correspondence to A. Suresh.


About this article


Cite this article

Suresh, A., Kumar, R. & Varatharajan, R. Health care data analysis using evolutionary algorithm. J Supercomput 76, 4262–4271 (2020).

