Comparison of various classification algorithms in the diagnosis of type 2 diabetes in Iran
In today’s medical world, data on the symptoms of patients with various diseases are so extensive that it is simply not possible for one person (a physician) to analyze and consider all the factors involved. There is therefore a clear need for an intelligent system that can take these factors into account and identify a suitable model relating the different parameters. Data mining, as the foundation of such systems, has played a vital role in the advancement of the medical sciences, especially in the diagnosis of various diseases. Type 2 diabetes is one such disease: its prevalence has increased in recent years, and late diagnosis can lead to serious complications. In this paper, several data mining methods and algorithms have been applied to a type 2 diabetes screening dataset from Tabriz, Iran. The performance of support vector machine, artificial neural network, decision tree, nearest neighbors, and Bayesian network classifiers has been compared in order to find the best algorithm for diagnosing this disease. The artificial neural network performed best on the chosen dataset, with an accuracy of 97.44 %. The accuracy rates for the support vector machine, decision tree, 5-nearest neighbor, and Bayesian network classifiers are 81.19, 95.03, 90.85, and 91.60 %, respectively. The simulation results show that the effectiveness of a classification technique on a dataset depends on the application as well as on the nature and complexity of the dataset. Moreover, no single classification technique can be assumed to perform best in every case. Therefore, when data mining is used to diagnose or predict diseases, consultation with specialists is essential for selecting the number and type of dataset parameters so as to obtain the best possible results.
Keywords: Type 2 diabetes; Support vector machine; Decision tree; Artificial neural network; Nearest neighbors; Bayesian network
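One of the compared classifiers is 5-nearest neighbor, and all methods are ranked by accuracy (the fraction of correctly classified cases). As a rough illustration of how these two ideas fit together, the sketch below implements a plain k-nearest-neighbor majority vote and an accuracy function using only the Python standard library. It is not the authors' implementation: the two features (fasting glucose and BMI), the toy values, the labels, and the names `knn_predict` and `accuracy` are hypothetical and chosen only for demonstration. Features are left unscaled here for simplicity, although in practice they would normally be normalized before computing distances.

```python
"""Minimal k-nearest-neighbor classifier with an accuracy metric.

Illustrative sketch only: the feature vectors (hypothetical fasting glucose
in mg/dL and BMI) and labels are toy values, not the Tabriz screening data.
"""
from collections import Counter
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=5):
    """Predict a label for `query` by majority vote of its k nearest training samples."""
    # Sort training indices by distance to the query and keep the k closest.
    nearest = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # With an odd k and two classes, a tie in the vote cannot occur.
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical toy data: label 0 = non-diabetic, 1 = diabetic (assumption).
TRAIN_X = [(85, 22.0), (90, 24.5), (95, 23.0), (88, 21.5), (100, 25.0),
           (140, 31.0), (155, 29.5), (160, 33.0), (135, 30.5), (170, 32.0)]
TRAIN_Y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
TEST_X = [(92, 23.5), (150, 30.0), (130, 28.0), (98, 26.0), (120, 27.0)]
TEST_Y = [0, 1, 1, 0, 1]

PREDICTIONS = [knn_predict(TRAIN_X, TRAIN_Y, q, k=5) for q in TEST_X]
print(PREDICTIONS)                     # [0, 1, 1, 0, 0]
print(accuracy(TEST_Y, PREDICTIONS))   # 0.8
```

The last toy test case, (120, 27.0), is deliberately borderline: three of its five nearest neighbors are labeled 0, so it is misclassified and accuracy drops to 4/5. Comparing classifiers in the way the abstract describes amounts to computing this same accuracy figure for each model on the same held-out cases.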
M.H. performed the literature search and data analysis. He also prepared the manuscript. M.T. is the corresponding author. He designed the study and supervised data analysis. He also edited and reviewed the manuscript. Z.H. assisted in data analysis. She also edited and reviewed the manuscript. S.M.A. assisted in data acquisition and data analysis. He also reviewed the manuscript. All authors read and approved the final manuscript.