Abstract
Diabetes has emerged as a significant global health concern, contributing to severe complications such as kidney disease, vision loss, and coronary issues. Machine learning algorithms have shown promise in supporting accurate disease diagnosis and treatment, thereby easing the burden on healthcare professionals. Diabetes forecasting in particular has evolved rapidly and offers the potential for early intervention and patient empowerment. To this end, our study presents a diabetes prediction model that employs a range of machine learning techniques, including Logistic Regression, SVM, Naïve Bayes, and Random Forest. Beyond these foundational techniques, we use ensemble learning to further improve prediction accuracy and robustness. Specifically, we explore XGBoost, LightGBM, CatBoost, AdaBoost, and Bagging. These methods combine predictions from multiple base learners to yield a more precise and resilient final prediction. The proposed framework is developed and trained in Python on a real-world dataset sourced from Kaggle, and it is evaluated using the confusion matrix, sensitivity, and accuracy. Among the ensemble techniques tested, CatBoost is the most effective, with an accuracy of 95.4% compared to XGBoost's 94.3%. CatBoost's AUC-ROC score of 0.99, against 0.98 for XGBoost, further supports its potential superiority.
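As a rough illustration of the evaluation metrics named in the abstract, the sketch below derives accuracy and sensitivity from a binary confusion matrix and computes AUC-ROC as the fraction of positive/negative pairs that a model ranks correctly. It is not the authors' code: the function names, the 0.5 decision threshold, and the eight toy labels and scores are assumptions made purely for illustration, and it uses only the Python standard library rather than any of the libraries evaluated in the study.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 means diabetic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp: int, fn: int) -> float:
    """Share of truly diabetic cases that the model flagged (recall)."""
    return tp / (tp + fn)


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): true labels and predicted probabilities of diabetes.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, tn, fn)
sens = sensitivity(tp, fn)
auc = auc_roc(y_true, scores)

print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} AUC-ROC={auc:.4f}")
```

Accuracy and sensitivity depend on the chosen threshold, whereas AUC-ROC is computed from the raw scores and summarises ranking quality across all thresholds, which is one reason the two kinds of metric are often reported side by side when comparing classifiers.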
Ethics declarations
Conflict of interest
There is no conflict of interest in the current research.
About this article
Cite this article
Modak, S.K.S., Jha, V.K. Diabetes prediction model using machine learning techniques. Multimed Tools Appl 83, 38523–38549 (2024). https://doi.org/10.1007/s11042-023-16745-4
DOI: https://doi.org/10.1007/s11042-023-16745-4