Comparison of various classification algorithms in the diagnosis of type 2 diabetes in Iran

  • Original Article
International Journal of Diabetes in Developing Countries

Abstract

In today’s medical world, data on the symptoms of patients with various diseases are so extensive that a single person (a doctor) cannot analyze and weigh all the factors. There is therefore a clear need for an intelligent system that considers these factors and identifies a suitable model relating the different parameters. Data mining, as the foundation of such systems, has played a vital role in the advancement of medical science, especially in the diagnosis of disease. Type 2 diabetes is one such disease: its prevalence has increased in recent years, and late diagnosis can lead to serious complications. In this paper, several data mining methods and algorithms are applied to a set of screening data for type 2 diabetes from Tabriz, Iran. The performance of support vector machine, artificial neural network, decision tree, nearest neighbors, and Bayesian network classifiers is compared in an effort to find the best algorithm for diagnosing this disease. The artificial neural network, with an accuracy of 97.44 %, performs best on the chosen dataset. The accuracies of the support vector machine, decision tree, 5-nearest neighbor, and Bayesian network are 81.19, 95.03, 90.85, and 91.60 %, respectively. The simulation results show that the effectiveness of a classification technique depends on the application as well as on the nature and complexity of the dataset used. Moreover, no single classification technique can be said to always perform best. Therefore, when data mining is used for the diagnosis or prediction of disease, consultation with specialists is essential for selecting the number and type of dataset parameters in order to obtain the best possible results.
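Each classifier in the comparison is summarized by a single accuracy figure. As a minimal sketch of how such a figure might be obtained for one of the methods, the example below implements a 5-nearest-neighbor classifier and scores it on a hold-out split. Everything in it is an assumption made for illustration: the data are synthetic (not the Tabriz screening data), the two features, the Euclidean distance, and the 150/50 split are hypothetical, and the abstract does not state the evaluation protocol the authors used.

```python
import random
from collections import Counter
from math import dist


def knn_predict(train, query, k=5):
    """Majority vote among the k training records closest to `query`."""
    neighbors = sorted(train, key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Fraction of test records whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


def make_synthetic(n, seed=0):
    """Hypothetical stand-in data: two numeric features, two separated classes."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label == 0 else 4.0  # illustrative class means only
        features = (rng.gauss(center, 0.5), rng.gauss(center, 0.5))
        data.append((features, label))
    return data


data = make_synthetic(200)
train, test = data[:150], data[150:]  # simple hold-out split (an assumption)
acc = accuracy(train, test, k=5)
print(f"5-NN hold-out accuracy: {acc:.2%}")
```

Scoring every classifier with the same `accuracy` function on the same split is what makes the reported figures comparable. On real screening data the result would depend heavily on which parameters are chosen as features, which is the point the abstract makes about consulting specialists.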


Authors’ contribution

M.H. performed the literature search and data analysis. He also prepared the manuscript. M.T. is the corresponding author. He designed the study and supervised data analysis. He also edited and reviewed the manuscript. Z.H. assisted in data analysis. She also edited and reviewed the manuscript. S.M.A. assisted in data acquisition and data analysis. He also reviewed the manuscript. All authors read and approved the final manuscript.

Author information

Corresponding author

Correspondence to Mehdi Teimouri.

Cite this article

Heydari, M., Teimouri, M., Heshmati, Z. et al. Comparison of various classification algorithms in the diagnosis of type 2 diabetes in Iran. Int J Diabetes Dev Ctries 36, 167–173 (2016). https://doi.org/10.1007/s13410-015-0374-4

  • DOI: https://doi.org/10.1007/s13410-015-0374-4
