Abstract
Diabetes is a chronic metabolic disorder that causes high blood sugar levels, which can in turn severely affect organs and tissues such as the heart, liver, kidneys, lungs, eyes, nerves and blood vessels. There are three main types of diabetes: Type-1 diabetes, Type-2 diabetes and gestational diabetes. In Type-1 diabetes, the patient's body fails to produce insulin. In Type-2 diabetes, the body's cells fail to respond effectively to insulin. Gestational diabetes occurs during pregnancy. Many approaches have been used to analyse this disease; we have used a machine learning approach. We used 768 records from the Pima Indians diabetes dataset and applied logistic regression, evaluated with a train-test split, K-Fold cross-validation and Stratified K-Fold cross-validation.
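The evaluation set-up described in the abstract can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code: synthetic data from `make_classification` (768 rows, 8 features, roughly 35% positives) stands in for the Pima records, and the 80/20 hold-out split, 10 folds, feature scaling and `max_iter=1000` are illustrative choices rather than settings reported in the paper.

```python
"""Sketch: logistic regression evaluated with train-test split, K-Fold and
Stratified K-Fold cross-validation (scikit-learn).

Assumption: synthetic data stands in for the 768-record Pima dataset.
"""
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    KFold,
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def load_data(seed=0):
    # Stand-in for the Pima data: 768 rows, 8 features, imbalanced labels.
    # To use the real CSV instead (file and column names are assumptions):
    #   df = pandas.read_csv("diabetes.csv")
    #   X, y = df.drop(columns="Outcome").values, df["Outcome"].values
    X, y = make_classification(
        n_samples=768, n_features=8, n_informative=5,
        weights=[0.65, 0.35], random_state=seed,
    )
    return X, y


def make_model():
    # Scaling the features before logistic regression helps the solver converge.
    return make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))


def evaluate(X, y, n_splits=10, seed=0):
    results = {}

    # 1. Single hold-out split (assumed 80/20): one accuracy estimate.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    results["train_test_split"] = make_model().fit(X_tr, y_tr).score(X_te, y_te)

    # 2. K-Fold: folds ignore the labels, so class ratios can vary per fold.
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results["k_fold"] = cross_val_score(make_model(), X, y, cv=kf).mean()

    # 3. Stratified K-Fold: each fold keeps roughly the overall class ratio.
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results["stratified_k_fold"] = cross_val_score(make_model(), X, y, cv=skf).mean()

    return results


if __name__ == "__main__":
    X, y = load_data()
    for name, acc in evaluate(X, y).items():
        print(f"{name:>18}: accuracy = {acc:.3f}")
```

Running the script prints one accuracy figure per strategy. The hold-out score depends on which rows happen to land in the test set, whereas the two cross-validation scores average over every fold, which is why they are usually the more stable estimates.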
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Significance Statement: This paper analyses the use of a train-test split, K-Fold cross-validation and Stratified K-Fold cross-validation for evaluating a logistic regression model on a diabetes dataset.
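The difference between plain and stratified K-Fold comes down to how the test folds are composed. The sketch below is an illustration rather than the paper's code: it uses a hypothetical label vector of 500 negatives and 268 positives (the class split usually reported for the 768 Pima records), sorted by class, and reports the positive rate of each test fold for an unshuffled `KFold` and for `StratifiedKFold`.

```python
"""Sketch: positive-class rate per test fold, K-Fold vs Stratified K-Fold.

Assumption: hypothetical sorted labels (500 negatives, 268 positives);
only the labels matter for fold composition, so features are dummies.
"""
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold


def positive_rate_per_fold(splitter, X, y):
    # Fraction of positive labels in each test fold produced by the splitter.
    return [float(y[test].mean()) for _, test in splitter.split(X, y)]


def make_labels():
    # Labels sorted by class: the worst case for an unshuffled K-Fold split.
    y = np.array([0] * 500 + [1] * 268)
    X = np.zeros((len(y), 1))  # dummy features, irrelevant to the split
    return X, y


if __name__ == "__main__":
    X, y = make_labels()
    print("overall positive rate:", round(float(y.mean()), 3))
    for name, splitter in [
        ("KFold (no shuffle)", KFold(n_splits=5)),
        ("StratifiedKFold", StratifiedKFold(n_splits=5)),
    ]:
        rates = positive_rate_per_fold(splitter, X, y)
        print(f"{name:>20}:", [round(r, 2) for r in rates])
```

With sorted labels, the unshuffled K-Fold yields test folds that are entirely negative or mostly positive, while every stratified fold stays close to the overall positive rate of about 0.35. Setting `shuffle=True` on `KFold` softens the problem but does not guarantee balanced folds on small or imbalanced data.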
About this article
Cite this article
Bhagat, M., Bakariya, B. Implementation of Logistic Regression on Diabetic Dataset using Train-Test-Split, K-Fold and Stratified K-Fold Approach. Natl. Acad. Sci. Lett. 45, 401–404 (2022). https://doi.org/10.1007/s40009-022-01131-9
DOI: https://doi.org/10.1007/s40009-022-01131-9