Early Prediction of Diabetes Using Feature Selection and Machine Learning Algorithms

  • Original Research
  • Published in: SN Computer Science

Abstract

Diabetes has become one of the most common diseases in middle- and low-income countries. Machine learning (ML) and data mining techniques have recently been used to predict diabetes with a high success rate, and medical professionals seek a dependable method for predicting the diagnosis. Feature selection can be treated as a global combinatorial optimization problem in machine learning: the number of features is reduced and irrelevant, noisy, and redundant data are removed, while classification accuracy remains acceptable. This work uses particle swarm optimization (PSO) to implement feature selection. Three medical datasets are then used to compare the performance of several machine learning methods. Standard approaches are used to determine the best technique for the three datasets, and the best results on the three datasets are reported for each scheme. The primary goal is to assess how well each algorithm classifies the data, in terms of efficiency and of effectiveness as measured by accuracy, sensitivity, and specificity. According to the experimental findings, Decision Tree, Random Forest, and Naïve Bayes deliver the highest accuracy with the lowest error rate. Machine learning can determine with high accuracy which cases should be referred to medical professionals for further evaluation and treatment. Using such an algorithm on a global scale could help minimize the number of people diagnosed with diabetes.
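
The abstract does not state which PSO variant, fitness function, or wrapper classifier the authors used, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a binary PSO with a sigmoid transfer function, a nearest-centroid classifier as a stand-in for the wrapper model, a hypothetical per-feature penalty in the fitness, and synthetic toy data in place of the three medical datasets. All function names and parameter values (`binary_pso`, `penalty`, `w`, `c1`, `c2`) are illustrative assumptions. The `confusion_metrics` helper computes the three measures named in the abstract: accuracy, sensitivity, and specificity.

```python
import math
import random


def confusion_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    return accuracy, sensitivity, specificity


def centroid_predict(X_train, y_train, X_test, mask):
    """Nearest-centroid classifier using only features where mask is 1.

    Assumption: a stand-in for the wrapper classifier, chosen for brevity.
    """
    cols = [j for j, keep in enumerate(mask) if keep]
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    preds = []
    for x in X_test:
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c]))
                for c in (0, 1)}
        preds.append(0 if dist[0] <= dist[1] else 1)
    return preds


def fitness(mask, data, penalty=0.01):
    """Validation accuracy minus a small (assumed) penalty per kept feature."""
    if not any(mask):
        return 0.0  # an empty feature subset cannot classify anything
    X_train, y_train, X_val, y_val = data
    preds = centroid_predict(X_train, y_train, X_val, mask)
    return confusion_metrics(y_val, preds)[0] - penalty * sum(mask)


def binary_pso(data, n_particles=10, n_iter=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search over 0/1 feature masks with a binary particle swarm."""
    rng = random.Random(seed)
    n_features = len(data[0][0])
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, data) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_features):
                vel[i][j] = (w * vel[i][j]
                             + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                             + c2 * rng.random() * (gbest[j] - pos[i][j]))
                # Sigmoid transfer: velocity gives the probability that bit = 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(pos[i], data)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


def make_toy_data(n, rng):
    """Synthetic data: feature 0 is informative, features 1-4 are pure noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(3.0 * label, 1.0)]
                 + [rng.gauss(0.0, 1.0) for _ in range(4)])
        y.append(label)
    return X, y


data_rng = random.Random(42)
X_train, y_train = make_toy_data(200, data_rng)
X_val, y_val = make_toy_data(100, data_rng)
best_mask, best_fit = binary_pso((X_train, y_train, X_val, y_val))
val_preds = centroid_predict(X_train, y_train, X_val, best_mask)
print("selected feature mask:", best_mask)
print("accuracy, sensitivity, specificity:", confusion_metrics(y_val, val_preds))
```

In this sketch the fitness is measured on the same validation split that is later used for reporting, which tends to give optimistic numbers; a real study would normally evaluate the selected features on a separate held-out test set or with cross-validation.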

Data availability

https://archive.ics.uci.edu/dataset/34/diabetes.

Funding

None.

Author information

Contributions

JA designed and performed the experiments and analyzed the data. SA supervised the findings of this work and co-wrote the paper. All authors discussed the results and contributed to the final manuscript.

Corresponding author

Correspondence to Jafar Abdollahi.

Ethics declarations

Conflict of Interest

None declared.

Ethical Approval

Not required.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Pattern Recognition and Machine Learning” guest edited by Ashish Ghosh, Monidipa Das and Anwesha Law.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Abdollahi, J., Aref, S. Early Prediction of Diabetes Using Feature Selection and Machine Learning Algorithms. SN COMPUT. SCI. 5, 217 (2024). https://doi.org/10.1007/s42979-023-02545-y

  • DOI: https://doi.org/10.1007/s42979-023-02545-y
