Abstract
Diabetes is a disease that impairs the body's ability to regulate blood glucose, commonly referred to as blood sugar. At the end of 2019 a new public health threat, COVID-19, emerged, and it has been particularly harmful to people with diabetes. We therefore apply data mining algorithms to predict diabetes, with the aim of helping to prevent deaths and improve quality of life. In this paper, four algorithms are used to analyze the diabetes dataset from DAT260x Lab01: Logistic Regression, Decision Tree Classifier, XGBoost and Support Vector Classifier (SVC). The models are compared to determine which algorithm is most effective. The paper first gives a brief overview of the dataset and of related work on the subject. Next, the dataset and its features are discussed. The paper then explains the four algorithms and the virtual environments used, and identifies the variables that have the largest impact on the raw data. The findings are obtained by evaluating the confusion matrix of each selected algorithm. Finally, the paper presents the observations and conclusions drawn from the results.
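The abstract compares the four classifiers through their confusion matrices. As a minimal sketch (not the authors' code), the confusion-matrix counts and the usual derived metrics for a binary diabetic/non-diabetic label can be computed as below. The function names, the 0/1 label encoding (1 = diabetic) and the ranking helper are assumptions for illustration; the predictions in any usage are hypothetical, not the paper's results.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, FN, TN for binary labels (assumed encoding: 1 = diabetic, 0 = not)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        elif t == 1 and p == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def summarize(cm: Dict[str, int]) -> Dict[str, float]:
    """Derive accuracy, precision, recall and F1 from the confusion-matrix counts."""
    total = cm["tp"] + cm["fp"] + cm["fn"] + cm["tn"]
    accuracy = (cm["tp"] + cm["tn"]) / total if total else 0.0
    precision = cm["tp"] / (cm["tp"] + cm["fp"]) if cm["tp"] + cm["fp"] else 0.0
    recall = cm["tp"] / (cm["tp"] + cm["fn"]) if cm["tp"] + cm["fn"] else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def rank_models(
    y_true: Sequence[int],
    predictions: Dict[str, Sequence[int]],
    metric: str = "accuracy",
) -> List[Tuple[str, float]]:
    """Score each model's predictions on one metric and sort best-first."""
    scores = {
        name: summarize(confusion_matrix(y_true, y_pred))[metric]
        for name, y_pred in predictions.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical labels and predictions, for illustration only
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    cm = confusion_matrix(y_true, y_pred)
    print(cm, summarize(cm))
```

Reporting precision and recall alongside accuracy matters in a screening setting, because a model that misses diabetic cases (false negatives) can still show high accuracy when the classes are imbalanced.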
Acknowledgements
This work is partly supported by VC Research (VCR 0000156).
Appendix
The program is written in Python in a Jupyter notebook. The necessary resources are referenced throughout the code as required by each algorithm.
Logistic Regression:
Applying Logistic
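The logistic regression listing itself is not reproduced in this extract. Purely as an illustration, and not the authors' notebook code, the following is a minimal, dependency-free sketch of binary logistic regression trained by batch gradient descent on the mean log-loss. The function names, learning rate, epoch count, 0.5 decision threshold and the one-feature toy data are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.1, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. z is (predicted probability - label)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradients over the batch and take one descent step
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(
    X: Sequence[Sequence[float]], w: Sequence[float], b: float, threshold: float = 0.5
) -> List[int]:
    """Label a sample 1 when its predicted probability reaches the threshold."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
        for xi in X
    ]


if __name__ == "__main__":
    # Hypothetical single standardized feature (e.g. a glucose-like measurement)
    X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    w, b = fit_logistic(X, y)
    print(w, b, predict(X, w, b))
```

In practice a library implementation with regularization and a proper optimizer would normally be preferred; the loop above only makes the update rule explicit.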
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Chang, V., Javvaji, S., Xu, Q.A., Hall, K., Guan, S. (2022). Diabetes Analysis with a Dataset Using Machine Learning. In: Chang, V., Kaur, H., Fong, S.J. (eds) Artificial Intelligence and Machine Learning Methods in COVID-19 and Related Health Diseases. Studies in Computational Intelligence, vol 1023. Springer, Cham. https://doi.org/10.1007/978-3-031-04597-4_8
DOI: https://doi.org/10.1007/978-3-031-04597-4_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04596-7
Online ISBN: 978-3-031-04597-4
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)