Tree-based classifier ensembles for early detection method of diabetes: an exploratory study
Diabetes is a lifestyle-driven disease that has become a critical health issue worldwide. In this paper, we conduct an exploratory study of early detection of diabetes mellitus using various ensemble learning techniques. Eight tree-based machine learning algorithms, i.e., classification and regression tree, decision tree (C4.5), reduced error pruning tree, random tree, naive Bayes tree, functional tree, best-first decision tree, and logistic model tree, are employed as base classifiers in five different ensembles, i.e., bagging, boosting, random subspace, DECORATE, and rotation forest. The performance of the ensembles and base classifiers is thoroughly benchmarked on three real-world datasets in terms of the area under the receiver operating characteristic curve (AUC). Finally, we assess the performance differences among the classifiers using several statistical significance tests. We contribute to the existing literature an extensive benchmark of tree-based classifier ensembles for the early detection of diabetes.
Keywords: Diabetes mellitus; Classifier ensembles; Benchmark; Early detection method
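The abstract names the area under the receiver operating characteristic curve as the benchmark metric. As a minimal sketch of what that metric measures, and not the authors' evaluation code, the function below computes it by pairwise comparison: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration only.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (e.g. diabetic), 0 for the negative class.
    scores: classifier scores, where higher means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four patients, two of them positive.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 pairs are ordered correctly
```

This pairwise form is quadratic in the number of examples, which is fine for small datasets; a rank-based formulation gives the same value more efficiently on larger ones.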
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2014R1A2A1A11052981), and partially supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2017-2015-0-00403) supervised by the IITP (Institute for Information & communications Technology Promotion).
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent was obtained from all individual participants included in the study.