Comparison of various classification algorithms in the diagnosis of type 2 diabetes in Iran

  • Mahmoud Heydari
  • Mehdi Teimouri
  • Zainabolhoda Heshmati
  • Seyed Mohammad Alavinia
Original Article

Abstract

In modern medicine, data on the symptoms of patients with various diseases are so extensive that no single person (doctor) can analyze and weigh all the factors. There is therefore a clear need for an intelligent system that considers the various factors and identifies a suitable model relating the different parameters. Data mining, as the foundation of such systems, has played a vital role in the advancement of medical science, especially in the diagnosis of disease. Type 2 diabetes is one such disease: its prevalence has increased in recent years, and late diagnosis can lead to serious complications. In this paper, several data mining methods and algorithms are applied to a set of screening data for type 2 diabetes from Tabriz, Iran. The performance of support vector machine, artificial neural network, decision tree, nearest neighbors, and Bayesian network classifiers is compared in an effort to find the best algorithm for diagnosing this disease. The artificial neural network, with an accuracy of 97.44%, performs best on the chosen dataset. The accuracy rates for support vector machine, decision tree, 5-nearest neighbor, and Bayesian network are 81.19, 95.03, 90.85, and 91.60%, respectively. The simulation results show that the effectiveness of a classification technique depends on the application as well as on the nature and complexity of the dataset used. Moreover, no single classification technique can be said to always perform best. Therefore, when data mining is used to diagnose or predict disease, consultation with specialists is essential for selecting the number and type of dataset parameters in order to obtain the best possible results.
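As a rough illustration of the kind of comparison the abstract describes, the Python sketch below computes classification accuracy, runs a simple 5-nearest-neighbor vote, and ranks the accuracy rates reported above. It is not the authors' implementation: the function names `accuracy` and `knn_predict`, the Euclidean distance metric, and the two-feature toy records are all assumptions made for the example, and the toy data have no connection to the Tabriz screening dataset.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training points closest to x.

    Euclidean distance is an assumption; the abstract does not name a metric.
    """
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy records (two made-up numeric features, label 0 or 1).
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.3), (0.8, 0.9),
           (3.0, 3.0), (3.2, 2.9), (2.8, 3.1), (3.1, 3.3), (2.9, 2.7)]
train_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
test_X = [(1.0, 1.2), (3.0, 2.8), (0.7, 1.0), (3.3, 3.1)]
test_y = [0, 1, 0, 1]

predictions = [knn_predict(train_X, train_y, x, k=5) for x in test_X]
toy_accuracy = accuracy(test_y, predictions)

# Accuracy rates (percent) exactly as reported in the abstract.
reported = {
    "artificial neural network": 97.44,
    "support vector machine": 81.19,
    "decision tree": 95.03,
    "5-nearest neighbor": 90.85,
    "Bayesian network": 91.60,
}
ranking = sorted(reported, key=reported.get, reverse=True)

print("toy 5-NN accuracy:", toy_accuracy)
for name in ranking:
    print(f"{name}: {reported[name]:.2f}%")
```

Accuracy alone can hide differences between false positives and false negatives, so a fuller comparison would normally also report measures such as sensitivity and specificity.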

Keywords

Type 2 diabetes; Support vector machine; Decision tree; Artificial neural network; Nearest neighbors; Bayesian network

Copyright information

© Research Society for Study of Diabetes in India 2015

Authors and Affiliations

  • Mahmoud Heydari (1)
  • Mehdi Teimouri (1)
  • Zainabolhoda Heshmati (1)
  • Seyed Mohammad Alavinia (2)

  1. Faculty of New Sciences and Technologies, University of Tehran, Tehran, Iran
  2. Vector-borne Diseases Research Center, North Khorasan University of Medical Sciences, Bojnurd, Iran
