Ensemble method based predictive model for analyzing disease datasets: a predictive analysis approach

  • Dharavath Ramesh
  • Yogendra Singh Katheria
Original Paper


Medical datasets have attracted the attention of the research community because analysing them enables predictions that help people take precautions against future diseases. Data mining techniques have been widely used to develop decision support systems that predict disease from a set of medical datasets. This work proposes a new predictive model for disease prediction that applies pre-processing techniques to various disease datasets. The proposed model not only analyses the datasets but also improves performance by using ensemble methods. To process the datasets, pre-processing techniques such as discretization, resampling, principal component analysis, and decision tree have been used. To classify the datasets, classification techniques such as Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Naïve Bayes (NB), Decision Tree (DT), and Random Forest (RF) have been used. The algorithms are applied with 10-fold cross-validation. The predictive analysis covers the following disease datasets: Chronic Kidney Disease (CKD), Cardiovascular Disease (CVD, or heart disease), Diabetes, Hepatitis, Cancer, and the Indian Liver Patient Dataset (ILPD). Every dataset shows a significant improvement across various performance measures. Experimental results show that the proposed predictive model achieves improved accuracy on these datasets.
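
The evaluation protocol described above (several classifiers combined by an ensemble and scored with 10-fold cross-validation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the base learners are simple stand-ins (two K-Nearest Neighbors variants and a nearest-centroid rule) rather than the SVM, NB, DT, and RF models named in the abstract, the ensemble rule is assumed to be plain majority voting, the folds are not stratified, and the data are synthetic. All function names are hypothetical.

```python
import random
from collections import Counter

def dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def majority_vote(votes):
    """Return the most frequent label among the votes."""
    return Counter(votes).most_common(1)[0][0]

def k_fold_indices(n, k=10, seed=0):
    """Shuffle range(n) and return k (train, test) index pairs with disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(f)), sorted(f)) for f in folds]

def knn(k):
    """Build a fit-and-predict function for k-nearest neighbours (stand-in learner)."""
    def fit_predict(train_X, train_y, test_X):
        preds = []
        for x in test_X:
            order = sorted(range(len(train_X)), key=lambda i: dist2(x, train_X[i]))
            preds.append(majority_vote([train_y[i] for i in order[:k]]))
        return preds
    return fit_predict

def nearest_centroid(train_X, train_y, test_X):
    """Predict the class whose mean feature vector is closest (stand-in learner)."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return [min(centroids, key=lambda c: dist2(x, centroids[c])) for x in test_X]

def cross_validate(learners, X, y, k=10, seed=0):
    """Mean accuracy of the majority-vote ensemble over k folds."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        train_X, train_y = [X[i] for i in train], [y[i] for i in train]
        test_X, test_y = [X[i] for i in test], [y[i] for i in test]
        # Each learner predicts every test row; then vote row by row.
        per_learner = [fit(train_X, train_y, test_X) for fit in learners]
        voted = [majority_vote(votes) for votes in zip(*per_learner)]
        scores.append(sum(p == t for p, t in zip(voted, test_y)) / len(test_y))
    return sum(scores) / len(scores)

# Synthetic two-class data (an assumption; not one of the disease datasets).
rng = random.Random(42)
X = [[rng.gauss(c, 1.0), rng.gauss(c, 1.0)] for c in (0.0, 5.0) for _ in range(50)]
y = [label for label in (0, 1) for _ in range(50)]

accuracy = cross_validate([knn(1), knn(3), nearest_centroid], X, y)
print(f"mean 10-fold accuracy: {accuracy:.3f}")
```

An odd number of base learners avoids tied votes in the two-class case. For real disease datasets, the pre-processing steps (discretization, resampling, principal component analysis) would need to be fitted on each training fold only, so that no information from the test fold leaks into the model.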


Disease prediction, Ensemble methods, Machine learning


Compliance with ethical standards

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.



Copyright information

© IUPESM and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, Indian Institute of Technology (ISM), Dhanbad, India
