Diabetes prediction model using machine learning techniques

Multimedia Tools and Applications

Abstract

Diabetes has emerged as a significant global health concern, contributing to severe complications such as kidney disease, vision loss, and coronary issues. Machine learning algorithms have shown promise in accurate disease diagnosis and treatment, which can ease the burden on healthcare professionals. Diabetes forecasting has evolved rapidly and offers the potential for early intervention and patient empowerment. Our study presents a diabetes prediction model built on a range of machine learning techniques, including Logistic Regression, SVM, Naïve Bayes, and Random Forest. Beyond these foundational techniques, we apply ensemble learning to improve prediction accuracy and robustness. Specifically, we explore XGBoost, LightGBM, CatBoost, AdaBoost, and Bagging. These methods combine predictions from multiple base learners to yield a more precise and resilient final prediction. The proposed framework is developed and trained in Python on a real-world dataset sourced from Kaggle. We evaluate it using the confusion matrix, sensitivity, and accuracy. Among the ensemble techniques tested, CatBoost is the most effective, with an accuracy of 95.4% compared to 94.3% for XGBoost. CatBoost also achieves a higher AUC-ROC score (0.99) than XGBoost (0.98), which further supports its advantage.
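The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the three hypothetical base learners, and their probabilities are invented for illustration, and the 0.5 decision threshold is an assumption. The sketch combines base learners by plain probability averaging (soft voting), which is the simplest form of amalgamating predictions; boosting methods such as XGBoost and CatBoost build their learners sequentially rather than averaging them.

```python
from typing import Dict, List, Sequence


def average_probabilities(per_model: Sequence[Sequence[float]]) -> List[float]:
    """Soft voting: average the positive-class probability across base learners."""
    n_models = len(per_model)
    return [sum(column) / n_models for column in zip(*per_model)]


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count the four confusion-matrix cells for binary labels (1 = diabetic)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def sensitivity(counts: Dict[str, int]) -> float:
    """Sensitivity (recall): share of true positives among all actual positives."""
    return counts["tp"] / (counts["tp"] + counts["fn"])


def accuracy(counts: Dict[str, int]) -> float:
    """Accuracy: share of correct predictions among all predictions."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive outscores a random
    negative, with ties counted as one half (O(n^2), fine for small samples)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and probabilities from three hypothetical base learners.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
model_a = [0.9, 0.6, 0.4, 0.2, 0.8, 0.1, 0.8, 0.3]
model_b = [0.8, 0.7, 0.3, 0.4, 0.4, 0.2, 0.9, 0.1]
model_c = [0.7, 0.5, 0.5, 0.3, 0.4, 0.3, 0.7, 0.2]

ensemble_scores = average_probabilities([model_a, model_b, model_c])
y_pred = [1 if score >= 0.5 else 0 for score in ensemble_scores]  # assumed threshold

counts = confusion_counts(y_true, y_pred)
print("confusion counts:", counts)
print("sensitivity:", sensitivity(counts))
print("accuracy:", accuracy(counts))
print("AUC-ROC:", auc_roc(y_true, ensemble_scores))
```

Sensitivity and accuracy depend on the chosen threshold, while AUC-ROC is computed from the raw scores and summarizes ranking quality across all thresholds. That is why the two kinds of metric are usually reported together.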

Data availability

https://github.com/MicrosoftLearning/DP100/blob/master/data/diabetes.csv

Author information

Corresponding author

Correspondence to Sandip Kumar Singh Modak.

Ethics declarations

Conflict of interest

There is no conflict of interest in the current research.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Modak, S.K.S., Jha, V.K. Diabetes prediction model using machine learning techniques. Multimed Tools Appl 83, 38523–38549 (2024). https://doi.org/10.1007/s11042-023-16745-4

  • DOI: https://doi.org/10.1007/s11042-023-16745-4
