Abstract
Diagnosing diabetes at an early stage is a major challenge. This is a serious health problem: untreated diabetes is a severe cause of death, and it can trigger many secondary diseases that affect the patient's well-being. In this paper, we present a new method to predict this disease accurately using data mining, deep learning, and ensemble algorithms. Here, data mining covers preprocessing the data to make it more comprehensible and extracting insights from the dataset. The architecture is divided into seven steps. First, the dataset is loaded. Second, the variables are analyzed to assess their value for predicting diabetes. Third, noise is removed from the dataset by deleting empty records. Fourth, the variables are transformed and scaled. Fifth, an exploratory analysis examines the correlations between the variables. Sixth, three predictive methods are applied: random forest, an artificial neural network, and AdaBoost. Finally, the results are presented and explained. To implement this method, we used the public "Diabetes data set" from Kaggle. The method achieved high accuracy, precision, and recall, which supports its effectiveness. This work could serve as a basis for further research on this disease, such as predicting which type of diabetes a patient has, and it can be applied to other health problems. Additional predictive methods should also be explored to try to achieve higher accuracy.
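The seven-step pipeline described above can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' implementation. The CSV text is a small hypothetical sample, not the Kaggle dataset. The column names Glucose, BMI, Age, and Outcome are assumptions. The sketch also assumes z-score standardization for the scaling step and Pearson correlation for the exploratory analysis. Of the three predictive methods, only AdaBoost is shown, written from scratch with one-feature threshold rules (decision stumps) as weak learners. The random forest and the neural network are omitted for brevity; libraries such as scikit-learn provide them as RandomForestClassifier and MLPClassifier, alongside AdaBoostClassifier.

```python
import csv
import io
import math

# Hypothetical toy sample shaped like a diabetes CSV (column names are assumptions).
# The empty BMI field simulates the missing values deleted in step three.
RAW_CSV = """Glucose,BMI,Age,Outcome
148,33.6,50,1
85,26.6,31,0
183,23.3,32,1
89,28.1,21,0
137,43.1,33,1
116,,30,0
78,31.0,26,1
115,35.3,29,0
197,30.5,53,1
110,37.6,30,0
"""

def load_rows(text):
    """Step 1: load the CSV into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def drop_incomplete(rows):
    """Step 3: remove noise by deleting every row that has an empty field."""
    return [r for r in rows if all((v or "").strip() for v in r.values())]

def standardize(column):
    """Step 4 (assumed z-score scaling): (x - mean) / population std."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
    return [(x - mean) / std if std else 0.0 for x in column]

def pearson(xs, ys):
    """Step 5: Pearson correlation between one feature and the outcome."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def fit_stump(X, y, w):
    """Return (weighted error, feature, threshold, sign) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] >= t else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best

def adaboost(X, y, rounds=5):
    """Step 6 (AdaBoost only): discrete AdaBoost; labels must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, j, t, sign = fit_stump(X, y, w)
        err = max(err, 1e-10)  # avoid division by zero when a stump is perfect
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, t, sign))
        pred = [sign if row[j] >= t else -sign for row in X]
        # Increase the weight of misclassified rows, then renormalize.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, pred)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (s if row[j] >= t else -s) for a, j, t, s in ensemble)
    return 1 if score >= 0 else -1

def metrics(y_true, y_pred):
    """Step 7: accuracy, precision, and recall with +1 as the positive class."""
    tp = sum(1 for a, b in zip(y_true, y_pred) if a == 1 and b == 1)
    fp = sum(1 for a, b in zip(y_true, y_pred) if a != 1 and b == 1)
    fn = sum(1 for a, b in zip(y_true, y_pred) if a == 1 and b != 1)
    acc = sum(1 for a, b in zip(y_true, y_pred) if a == b) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return acc, prec, rec

rows = drop_incomplete(load_rows(RAW_CSV))
features = ["Glucose", "BMI", "Age"]
cols = {f: standardize([float(r[f]) for r in rows]) for f in features}
X = [[cols[f][i] for f in features] for i in range(len(rows))]
y = [1 if r["Outcome"] == "1" else -1 for r in rows]

for f in features:
    print(f, round(pearson(cols[f], y), 3))
model = adaboost(X, y, rounds=5)
preds = [predict(model, row) for row in X]
print(metrics(y, preds))
```

The scores printed at the end are computed on the same rows used for training, so they overstate real performance. Results comparable to those reported in a study like this one require a held-out test split or cross-validation.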
References
Abiodun OI, Jantan A, Omolara AE, Dada KV, Umar AM, Linus OU, Arshad H, Kazaure AA, Gana U, Kiru MU (2019) Comprehensive review of artificial neural network applications to pattern recognition. IEEE Access 7:158820–158846
Ahmed TM (2016) Developing a predicted model for diabetes type 2 treatment plans by using data mining. J Theor Appl Inf Technol 90(2):181
Cano JR, Gutiérrez PA, Krawczyk B, Woźniak M, García S (2019) Monotonic classification: an overview on algorithms, performance measures and data sets. Neurocomputing 341:168–182
Chen CC, Li ST (2014) Credit rating with a monotonicity-constrained support vector machine model. Expert Syst Appl 41(16):7235–7247
Hasan MK, Alam MA, Das D, Hossain E, Hasan M (2020) Diabetes prediction using ensembling of different machine learning classifiers. IEEE Access 8:76516–76531
Jahani M, Mahdavi M (2016) Comparison of predictive models for the early diagnosis of diabetes. Healthcare Inform Res 22(2):95–100
Jayanthi N, Babu BV, Rao NS (2017) Survey on clinical prediction models for diabetes prediction. J Big Data 4(1):1–15
Kumar K, Kishore P, Kumar DA, Kumar EK (2018) Indian classical dance action identification using AdaBoost multiclass classifier on multifeature fusion. In: 2018 conference on signal processing and communication engineering systems (SPACES). IEEE, pp 167–170
Ma J (2020) Machine learning in predicting diabetes in the early stage. In: 2020 2nd international conference on machine learning, big data and business intelligence (MLBDBI), pp 167–172
Mujumdar A, Vaidehi V (2019) Diabetes prediction using machine learning algorithms. Procedia Comput Sci 165:292–299; 2nd international conference on recent trends in advanced computing ICRTAC-DISRUP-TIV INNOVATION, 11–12 Nov 2019. https://www.sciencedirect.com/science/article/pii/S1877050920300557
PAHO/WHO (OPS/OMS): Diabetes. https://www.paho.org/es/temas/diabetes
Saeed N, Nam H, Al-Naffouri TY, Alouini MS (2019) A state-of-the-art survey on multidimensional scaling-based localization techniques. IEEE Commun Surv Tutor 21(4):3565–3583
Saeed N, Nam H, Haq MIU, Muhammad Saqib DB (2018) A survey on multidimensional scaling. ACM Comput Surv 51(3). https://doi.org/10.1145/3178155
Saito T, Rehmsmeier M (2015) The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10(3):1–21
Saxena P, Saha S, Devi SK (2022) Analysis and prediction of diabetes using machine models. In: 2022 international mobile and embedded technology conference (MECON), pp 315–319
Tang D, Tang L, Dai R, Chen J, Li X, Rodrigues JJ (2020) MF-AdaBoost: LDoS attack detection based on multi-features and improved AdaBoost. Future Gener Comput Syst 106:347–359. https://www.sciencedirect.com/science/article/pii/S0167739X19310544
Tuan Hoang A, Nižetić S, Chyuan Ong H, Tarelko W, Viet Pham V, Hieu Le T, Quang Chau M, Phuong Nguyen X (2021) A review on application of artificial neural network (ANN) for performance and emission characteristics of diesel engine fueled with biodiesel-based fuels. Sustain Energy Technol Assess 47:101416. https://www.sciencedirect.com/science/article/pii/S2213138821004264
Tyralis H, Papacharalampous G, Langousis A (2019) A brief review of random forests for water scientists and practitioners and their recent history in water resources. Water 11(5). https://www.mdpi.com/2073-4441/11/5/910
Ukani V (2020) Diabetes data set. https://www.kaggle.com/datasets/vikasukani/diabetes-data-set/metadata?datasetId=821698
WHO: Diabetes (Nov 2021). https://www.who.int/news-room/fact-sheets/detail/diabetes
Yap FY, Varghese BA, Cen SY, Hwang DH, Lei X, Desai B, Lau C, Yang LL, Fullenkamp AJ, Hajian S et al (2021) Shape and texture-based radiomics signature on CT effectively discriminates benign from malignant renal masses. Eur Radiol 31(2):1011–1021
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Jara-Gavilanes, A., Ávila-Faicán, R., Hurtado Ortiz, R. (2024). A New Architecture for Diabetes Prediction Using Data Mining, Deep Learning, and Ensemble Algorithms. In: Yang, XS., Sherratt, R.S., Dey, N., Joshi, A. (eds) Proceedings of Eighth International Congress on Information and Communication Technology. ICICT 2023. Lecture Notes in Networks and Systems, vol 695. Springer, Singapore. https://doi.org/10.1007/978-981-99-3043-2_17
DOI: https://doi.org/10.1007/978-981-99-3043-2_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-3042-5
Online ISBN: 978-981-99-3043-2
eBook Packages: Engineering, Engineering (R0)