A New Architecture for Diabetes Prediction Using Data Mining, Deep Learning, and Ensemble Algorithms

  • Conference paper
Proceedings of Eighth International Congress on Information and Communication Technology (ICICT 2023)

Abstract

Diagnosing diabetes at an early stage is a major challenge. Left untreated, the disease is a serious cause of death and can trigger many secondary diseases that affect the patient's well-being. In this paper, we present a new method to predict diabetes accurately using data mining, deep learning, and ensemble algorithms. Data mining here covers preprocessing the data to make it more comprehensible and extracting insights from the dataset. The architecture is divided into seven steps. First, the dataset is loaded. Second, the variables are analyzed to understand their value for predicting diabetes. Third, noise is removed from the dataset by deleting empty data. Fourth, the variables are transformed and scaled. Fifth, an exploratory analysis is carried out to examine the correlations between the variables. Sixth, three predictive methods are applied: random forest, artificial neural network, and AdaBoost. Finally, the results are presented and explained. To implement the method, we used a public dataset from Kaggle called the diabetes dataset. The method achieved high accuracy, precision, and recall, which helps demonstrate its effectiveness. This work could serve as a basis for further research on the disease, such as predicting the type of diabetes a patient has, and the approach can be applied to other health problems. Future work should also apply more predictive methods to try to achieve higher accuracy.
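The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the rows are invented toy values rather than records from the Kaggle dataset, the column names (glucose, BMI, outcome) and min-max scaling are assumptions, and a simple threshold rule stands in for the random forest, neural network, and AdaBoost models of step six. All function names are hypothetical.

```python
from math import sqrt

# Toy rows invented for illustration only (NOT taken from the Kaggle dataset).
# Assumed columns: glucose, bmi, outcome (1 = diabetic). None marks an empty cell.
ROWS = [
    (148.0, 33.6, 1),
    (85.0, 26.6, 0),
    (None, 23.3, 1),
    (89.0, 28.1, 0),
    (137.0, 43.1, 1),
    (116.0, None, 0),
]


def drop_incomplete(rows):
    """Step 3: delete every row that contains an empty value."""
    return [r for r in rows if all(v is not None for v in r)]


def min_max_scale(column):
    """Step 4 (assumed scaler): rescale a numeric column to [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def pearson(xs, ys):
    """Step 5: Pearson correlation between two equally long columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def classification_metrics(y_true, y_pred):
    """Step 7: accuracy, precision, and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


clean = drop_incomplete(ROWS)
glucose = min_max_scale([r[0] for r in clean])
bmi = min_max_scale([r[1] for r in clean])
labels = [r[2] for r in clean]
corr = pearson(glucose, labels)

# Stand-in for step 6: a fixed threshold on scaled glucose, not a trained model.
predictions = [1 if g > 0.5 else 0 for g in glucose]
accuracy, precision, recall = classification_metrics(labels, predictions)
```

In practice, the stand-in rule would be replaced by trained classifiers from a machine learning library, and the metrics would be computed on held-out data rather than on the rows used to pick the rule.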



Corresponding author

Correspondence to Adolfo Jara-Gavilanes.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Jara-Gavilanes, A., Ávila-Faicán, R., Hurtado Ortiz, R. (2024). A New Architecture for Diabetes Prediction Using Data Mining, Deep Learning, and Ensemble Algorithms. In: Yang, XS., Sherratt, R.S., Dey, N., Joshi, A. (eds) Proceedings of Eighth International Congress on Information and Communication Technology. ICICT 2023. Lecture Notes in Networks and Systems, vol 695. Springer, Singapore. https://doi.org/10.1007/978-981-99-3043-2_17

  • DOI: https://doi.org/10.1007/978-981-99-3043-2_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-3042-5

  • Online ISBN: 978-981-99-3043-2

  • eBook Packages: Engineering, Engineering (R0)
