Abstract
Diabetes has become one of the most common diseases in middle- and low-income countries, and medical professionals need a dependable method for predicting its diagnosis. Machine learning (ML) and data mining techniques have recently been used to predict diabetes with a high success rate. In machine learning, feature selection can be treated as a global combinatorial optimization problem. It reduces the number of features and removes irrelevant, noisy, and redundant data while keeping classification accuracy acceptable. This work uses particle swarm optimization (PSO) for feature selection and then compares the performance of several machine learning methods on three medical datasets. Standard approaches are used to determine the best technique for each of the three datasets, and the best results on each dataset are reported for each scheme. The primary goal is to assess the efficiency and effectiveness of each algorithm's classification in terms of accuracy, sensitivity, and specificity. According to the experimental findings, Decision Tree, Random Forest, and Naïve Bayes deliver the highest accuracy with the lowest error rate. Machine learning can classify instances with high accuracy and determine which of them should be referred for further medical evaluation and treatment. Using such an algorithm on a global scale could help reduce the number of people diagnosed with diabetes.
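The abstract states that PSO performs the feature selection and that the classifiers are compared by accuracy, sensitivity, and specificity. It does not, however, give the formulation. The sketch below is one possible setup under stated assumptions, not the authors' implementation. It uses the binary PSO variant, in which each particle is a 0/1 feature mask and a sigmoid of the velocity gives the probability that a bit is 1. The wrapper fitness is validation accuracy minus a small penalty on the fraction of selected features. A nearest-centroid classifier serves as a hypothetical stand-in for the Decision Tree, Random Forest, and Naïve Bayes models. All function names, parameter values (`w`, `c1`, `c2`, `v_max`, `alpha`), and the toy data are illustrative assumptions.

```python
import math
import random

# Sketch under assumptions: binary PSO wrapper feature selection with a
# hypothetical nearest-centroid classifier standing in for the study's models.


def confusion_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity), treating label 1 as 'diabetic'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    return accuracy, sensitivity, specificity


def nearest_centroid_predict(X_train, y_train, X_test, mask):
    """Stand-in classifier (assumption): assign each test row to the closest
    class mean, using only the features whose mask bit is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    centroids = {}
    for label in sorted(set(y_train)):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    preds = []
    for x in X_test:
        dist = {label: sum((x[j] - c[k]) ** 2 for k, j in enumerate(cols))
                for label, c in centroids.items()}
        preds.append(min(dist, key=dist.get))
    return preds


def make_fitness(X_train, y_train, X_val, y_val, alpha=0.01):
    """Wrapper fitness: validation accuracy minus a penalty on subset size."""
    def fitness(mask):
        if not any(mask):
            return 0.0  # an empty feature subset is never useful
        preds = nearest_centroid_predict(X_train, y_train, X_val, mask)
        accuracy, _, _ = confusion_metrics(y_val, preds)
        return accuracy - alpha * sum(mask) / len(mask)
    return fitness


def binary_pso(fitness, n_features, n_particles=10, n_iter=30,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO: positions are 0/1 masks; real-valued velocities are mapped
    to bit probabilities with a sigmoid transfer function."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])  # pull toward personal best
                     + c2 * r2 * (gbest[d] - pos[i][d]))    # pull toward swarm best
                vel[i][d] = max(-v_max, min(v_max, v))     # clamp velocity
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical toy data: feature 0 carries the class signal; features 1-3
# follow the same pattern in both classes, so they carry none.
X = [[5.0 * label, float(i % 3), float(i % 2), 1.0] for label in (0, 1) for i in range(6)]
y = [label for label in (0, 1) for _ in range(6)]
fit = make_fitness(X[::2], y[::2], X[1::2], y[1::2])
best_mask, best_score = binary_pso(fit, n_features=4)
print("selected features:", [j for j, bit in enumerate(best_mask) if bit],
      "fitness:", best_score)
```

On the toy data, only feature 0 separates the classes. Any mask containing it therefore reaches a validation accuracy of 1.0, and the size penalty favors leaving the other features out. A fuller setup would likely swap in the study's classifiers and score each mask with cross-validation rather than a single split.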
Funding
None.
Author information
Contributions
JA designed and performed the experiments and analyzed the data. SA supervised the findings of this work and co-wrote the paper. All authors discussed the results and contributed to the final manuscript.
Ethics declarations
Conflict of Interest
None declared.
Ethical Approval
Not required.
Additional information
This article is part of the topical collection “Pattern Recognition and Machine Learning” guest edited by Ashish Ghosh, Monidipa Das and Anwesha Law.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Abdollahi, J., Aref, S. Early Prediction of Diabetes Using Feature Selection and Machine Learning Algorithms. SN COMPUT. SCI. 5, 217 (2024). https://doi.org/10.1007/s42979-023-02545-y
DOI: https://doi.org/10.1007/s42979-023-02545-y