A stacked ensemble machine learning approach for the prediction of diabetes

  • Research article
Journal of Diabetes & Metabolic Disorders

Abstract

Objectives

Diabetes has become a leading cause of mortality in both developed and developing countries, affecting a growing number of people worldwide. As the prevalence of the disease continues to rise, researchers have worked to develop accurate diabetes prediction models. The primary aim of this study is to use a diverse set of machine learning algorithms to detect diabetes at an early stage, particularly in women. The goal is to give physicians tools for identifying the disease early, enabling timely intervention and improving patient outcomes.

Methods

In this study, several state-of-the-art machine learning techniques were employed: a random forest classifier tuned with GridSearchCV, and XGBoost, NGBoost, Bagging, LightGBM, and AdaBoost classifiers. These models were chosen as the base layer of the proposed stacked ensemble model because of their high accuracy. Before being fed into the models, the dataset was preprocessed to improve performance.

Results

The proposed approach achieved an accuracy of 92.91%, which is competitive with existing approaches. In addition, SHapley Additive exPlanations (SHAP) were used to interpret the machine learning models.
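SHAP attributes a single prediction to the input features using Shapley values from cooperative game theory. To show what is being computed, the sketch below evaluates exact Shapley values by enumerating every feature coalition, with "absent" features set to one baseline vector. This brute-force version is only practical for a handful of features; the `shap` library relies on faster estimators (for example `TreeExplainer` for tree ensembles). The toy model, input, and baseline are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of f at x, with absent features taken from baseline.

    phi_i = sum over subsets S of the other features of
            |S|! (n - |S| - 1)! / n! * [v(S + {i}) - v(S)]
    """
    n = len(x)

    def value(coalition: set) -> float:
        # Features in the coalition take their values from x; the rest come from the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z: Sequence[float]) -> float:
    # Hypothetical model: two additive effects plus an interaction between features 0 and 2.
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
print([round(v, 6) for v in phi])  # → [3.5, 6.0, 1.5]
```

The values add up to f(x) − f(baseline) = 11, the efficiency property that SHAP summary plots depend on, and the interaction term z0·z2 is split equally between features 0 and 2.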

Conclusion

We anticipate that these findings will be beneficial to healthcare providers, stakeholders, students, and researchers involved in diabetes prediction research and development.

[Figs. 1–11 and Algorithm 1]

Data Availability

The data used to support the findings of the study are available at https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database.
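The abstract states only that the data were preprocessed. One step commonly applied to this dataset, shown here as an assumption rather than as the authors' documented procedure, is to treat zeros in `Glucose`, `BloodPressure`, `SkinThickness`, `Insulin`, and `BMI` as missing values and impute them with column medians. Column names follow the Kaggle CSV; the function name is hypothetical.

```python
import numpy as np
import pandas as pd

# Columns where a recorded 0 is physiologically implausible and is treated as missing (assumption).
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def impute_implausible_zeros(df: pd.DataFrame, medians: pd.Series = None):
    """Replace implausible zeros with column medians; returns (cleaned copy, medians used).

    Call with medians=None on the training split, then pass the returned medians
    when transforming the test split so test rows do not leak into the imputation.
    """
    out = df.copy()  # leave the caller's frame untouched
    for col in ZERO_AS_MISSING:
        out[col] = out[col].replace(0, np.nan)  # mark implausible zeros as missing
    if medians is None:
        medians = out[ZERO_AS_MISSING].median()  # per-column median, NaNs skipped
    for col in ZERO_AS_MISSING:
        out[col] = out[col].fillna(medians[col])
    return out, medians
```

Typical use is to load the CSV with `pd.read_csv`, split it into training and test sets, call the function on the training frame, and reuse the returned medians for the test frame. `Pregnancies` is deliberately excluded because zero is a valid value there.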

Acknowledgements

We would like to thank the Bangladesh University of Business and Technology, and the Queensland University of Technology for providing the necessary facilities.

Funding

This work received no external funding.

Author information

Contributions

Conceptualization and methodology, K.O. and M.H.R.; software, K.O., M.H.R. and M.M.I.; validation, K.O., M.H.R. and M.R.I.; formal analysis, M.W.; investigation, K.O. and M.R.I.; resources, M.H.R. and M.M.I.; writing (original draft preparation), K.O. and M.H.R.; writing (review and editing), M.W. and A.H.W.; visualization, K.O. and A.H.W.; supervision, M.W. and A.H.W. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Khondokar Oliullah.

Ethics declarations

Conflicts of Interest

The authors declare no conflicts of interest.

Financial Disclosure

No financial interests related to the material of this manuscript have been declared.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Supplementary data

Algorithm 2: Extended Feature Engineering (Dataset).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Oliullah, K., Rasel, M.H., Islam, M.M. et al. A stacked ensemble machine learning approach for the prediction of diabetes. J Diabetes Metab Disord (2023). https://doi.org/10.1007/s40200-023-01321-2

  • DOI: https://doi.org/10.1007/s40200-023-01321-2
