Tree-based classifier ensembles for early detection method of diabetes: an exploratory study

Abstract

Diabetes is a lifestyle-driven disease that has become a critical health issue worldwide. In this paper, we conduct an exploratory study of early detection of diabetes mellitus using several ensemble learning techniques. Eight tree-based machine learning algorithms, i.e., classification and regression tree, decision tree (C4.5), reduced error pruning tree, random tree, naive Bayes tree, functional tree, best-first decision tree, and logistic model tree, are employed as base classifiers in five different ensembles, i.e., bagging, boosting, random subspace, DECORATE, and rotation forest. The performance of the ensembles and base classifiers is benchmarked on three real-world datasets in terms of the area under the receiver operating characteristic curve (AUC). Finally, we assess the performance differences among the classifiers using several statistical significance tests. Our contribution to the existing literature is an extensive benchmark of tree-based classifier ensembles for the early detection of diabetes.
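To make the evaluation protocol in the abstract concrete, the following is a minimal sketch of its two measurement steps: scoring a classifier by the area under the ROC curve, and comparing several classifiers across datasets with a rank-based test. It is an illustration under stated assumptions, not the authors' implementation. The function names (`auc`, `average_ranks`, `friedman_statistic`) are hypothetical, the Friedman test is assumed here as one plausible choice of significance test, and no tie correction is applied to the statistic.

```python
from itertools import product


def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive example is scored above a random negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def average_ranks(row):
    """Rank the values in one row, 1 = highest; tied values share the mean rank."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square for a table of N datasets (rows) by k classifiers
    (columns), without tie correction. Returns (statistic, mean rank per column)."""
    n, k = len(table), len(table[0])
    rank_rows = [average_ranks(row) for row in table]
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return chi2, mean_ranks


if __name__ == "__main__":
    # Made-up AUC values for illustration only; these are NOT results from the paper.
    # Rows are datasets, columns are three hypothetical classifiers.
    illustrative_auc = [
        [0.80, 0.85, 0.90],
        [0.70, 0.75, 0.78],
        [0.60, 0.66, 0.71],
    ]
    statistic, ranks = friedman_statistic(illustrative_auc)
    print("mean ranks:", ranks)
    print("Friedman chi-square:", statistic)
```

The statistic would then be compared against a chi-square distribution with k - 1 degrees of freedom. With only a few datasets that approximation is rough, so a benchmark of this size would typically report it alongside post-hoc pairwise comparisons.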

Keywords

Diabetes mellitus; Classifier ensembles; Benchmark; Early detection method

Notes

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2014R1A2A1A11052981), and partially supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2017-2015-0-00403) supervised by the IITP (Institute for Information & communications Technology Promotion).

Compliance with ethical standards

Conflict of interest

Authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Copyright information

© Springer Science+Business Media Dordrecht 2017

Authors and Affiliations

  1. IT Convergence and Application Engineering, Pukyong National University, Busan, Korea
  2. Faculty of Computer Science, University of Sriwijaya, Jln Raya Palembang-Prabumulih Km., Sumatera Selatan, Indonesia
