Abstract
In today’s medical world, data on the symptoms of patients with various diseases are so extensive that a single person (a physician) cannot analyze and weigh all the relevant factors. An intelligent system that can consider these factors and identify suitable relationships among the different parameters is therefore needed. Data mining, as the foundation of such systems, has played a vital role in the advancement of medical science, especially in the diagnosis of various diseases. Type 2 diabetes is one such disease: its prevalence has increased in recent years, and late diagnosis can lead to serious complications. In this paper, several data mining methods and algorithms were applied to a set of type 2 diabetes screening data from Tabriz, Iran. The performance of support vector machine, artificial neural network, decision tree, nearest neighbor, and Bayesian network methods was compared in an effort to find the best algorithm for diagnosing this disease. The artificial neural network performed best on the chosen dataset, with an accuracy of 97.44 %. The accuracy rates for the support vector machine, decision tree, 5-nearest neighbor, and Bayesian network were 81.19, 95.03, 90.85, and 91.60 %, respectively. The simulation results show that the effectiveness of a classification technique on a dataset depends on the application as well as on the nature and complexity of the dataset. Moreover, no single classification technique can be expected to perform best in every case. Therefore, when data mining is used for the diagnosis or prediction of diseases, consultation with specialists is essential for selecting the number and type of dataset parameters needed to obtain the best possible results.
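To make the 5-nearest-neighbor classifier and the accuracy metric mentioned above concrete, the following is a minimal sketch, not the authors' implementation: it assumes Euclidean distance, an unweighted majority vote, and accuracy defined as the percentage of correctly labeled test cases. The two features (fasting glucose and BMI), their values, and all function names are hypothetical and unrelated to the Tabriz screening dataset.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=5):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Sort training indices by distance to the query and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # most_common(1) returns [(label, count)]; on a tie, the label seen first
    # among the nearest neighbors (closest first) wins.
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


# Hypothetical toy data: (fasting glucose in mg/dL, BMI); 1 = diabetic, 0 = not diabetic.
train_X = [
    (85, 22.0), (90, 24.5), (95, 23.0), (88, 21.5), (92, 25.0),
    (150, 31.0), (160, 29.5), (140, 33.0), (170, 30.0), (155, 32.5),
]
train_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
test_X = [(87, 23.0), (158, 31.0), (100, 26.0), (145, 28.0)]
test_y = [0, 1, 0, 1]

preds = [knn_predict(train_X, train_y, q, k=5) for q in test_X]
print(preds, accuracy(test_y, preds))  # → [0, 1, 0, 1] 100.0
```

In practice the features would usually be normalized before computing distances, since in this toy example the unscaled glucose values dominate the Euclidean distance.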
Authors’ contributions
M.H. performed the literature search and data analysis and prepared the manuscript. M.T., the corresponding author, designed the study, supervised the data analysis, and edited and reviewed the manuscript. Z.H. assisted in data analysis and edited and reviewed the manuscript. S.M.A. assisted in data acquisition and data analysis and reviewed the manuscript. All authors read and approved the final manuscript.
Cite this article
Heydari, M., Teimouri, M., Heshmati, Z. et al. Comparison of various classification algorithms in the diagnosis of type 2 diabetes in Iran. Int J Diabetes Dev Ctries 36, 167–173 (2016). https://doi.org/10.1007/s13410-015-0374-4