
Implementation of Logistic Regression on Diabetic Dataset using Train-Test-Split, K-Fold and Stratified K-Fold Approach

  • Short Communication
  • Journal: National Academy Science Letters

Abstract

Diabetes is a chronic metabolic disorder that causes high blood sugar levels, which can in turn severely affect organs and tissues such as the heart, liver, kidneys, lungs, eyes, nerves, and blood vessels. There are three main types of diabetes: Type 1, Type 2, and gestational diabetes. In Type 1 diabetes, the patient's body fails to produce insulin. In Type 2 diabetes, the body's cells fail to respond to insulin effectively. Gestational diabetes occurs during pregnancy. Many approaches have been used to analyse this disease; we use a machine learning approach. We use 768 records from the Pima Indians Diabetes dataset. In this paper, we apply logistic regression and evaluate it with a train-test split, K-fold cross-validation, and stratified K-fold cross-validation.
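As a rough illustration of the first evaluation scheme, the sketch below fits a logistic regression by batch gradient descent on the mean log-loss and scores it on a held-out test split. It is not the authors' code: it uses only the Python standard library, replaces the Pima records with a hypothetical synthetic two-feature dataset (`make_synthetic`), and the helper names (`sigmoid`, `fit_logistic`, `predict`, `split_train_test`), the learning rate of 0.5, the 300 epochs, and the 80/20 split are all illustrative assumptions. A practical implementation would more likely rely on scikit-learn's `LogisticRegression` and `train_test_split`.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, X, threshold=0.5):
    """Label a row 1 when its predicted probability reaches the threshold."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
        for xi in X
    ]


def split_train_test(X, y, test_size=0.2, seed=0):
    """Shuffle row indices once and hold out test_size of them for testing."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(idx) * test_size))
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])


def make_synthetic(n_pos=70, n_neg=130, seed=42):
    """Hypothetical stand-in data: two Gaussian features per row."""
    rng = random.Random(seed)
    X = [[rng.gauss(2.5, 1.0), rng.gauss(2.5, 1.0)] for _ in range(n_pos)]
    X += [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n_neg)]
    return X, [1] * n_pos + [0] * n_neg


if __name__ == "__main__":
    X, y = make_synthetic()
    X_train, X_test, y_train, y_test = split_train_test(X, y)
    w, b = fit_logistic(X_train, y_train)
    preds = predict(w, b, X_test)
    acc = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
    print(f"test accuracy: {acc:.3f}")
```

Running it prints a single held-out accuracy. Because that number depends on which rows happen to land in the test split, changing `seed` can shift it; reducing that dependence is the motivation for K-fold and stratified K-fold cross-validation.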

Fig. 1
Fig. 2
Fig. 3


Author information

Corresponding author

Correspondence to Meenu Bhagat.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Significance Statement: This paper analyses the use of the train-test split, K-fold, and stratified K-fold cross-validation techniques to evaluate a logistic regression model on a diabetes dataset.
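The difference between plain and stratified K-fold cross-validation lies in how rows are dealt into folds. The sketch below, again standard-library Python with hypothetical helper names (`kfold`, `stratified_kfold`, `splits`, `positive_share`) rather than the authors' implementation, builds both kinds of folds for a label vector of 768 entries with 268 positives (the class split of the public Pima Indians Diabetes data; feature values are omitted because only labels matter here) and reports the positive-class share of each test fold. Plain K-fold shuffles and slices the indices, so the share can drift from fold to fold; the stratified version deals each class round-robin across the folds, so every test fold stays close to the overall share of about 0.349. Libraries such as scikit-learn provide `KFold` and `StratifiedKFold` for the same purpose.

```python
import random


def kfold(n, k=5, seed=0):
    """Plain K-fold: shuffle all indices, then slice them into k test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def stratified_kfold(labels, k=5, seed=0):
    """Stratified K-fold: deal each class's indices round-robin over k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


def splits(folds):
    """Turn a list of test folds into (train, test) index pairs."""
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def positive_share(fold, labels):
    """Fraction of positive labels among the given indices."""
    return sum(labels[i] for i in fold) / len(fold)


if __name__ == "__main__":
    labels = [1] * 268 + [0] * 500  # class counts only; no features needed here
    for name, folds in (("K-fold", kfold(len(labels))),
                        ("Stratified K-fold", stratified_kfold(labels))):
        shares = [round(positive_share(test, labels), 3) for _, test in splits(folds)]
        print(f"{name:18s} positive share per test fold: {shares}")
```

In a full cross-validation run, the logistic regression would be refit on each `train` index list and scored on the matching `test` list, and the k scores averaged.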

About this article

Cite this article

Bhagat, M., Bakariya, B. Implementation of Logistic Regression on Diabetic Dataset using Train-Test-Split, K-Fold and Stratified K-Fold Approach. Natl. Acad. Sci. Lett. 45, 401–404 (2022). https://doi.org/10.1007/s40009-022-01131-9

  • DOI: https://doi.org/10.1007/s40009-022-01131-9
