Multivariable prediction model of complications derived from diabetes mellitus using machine learning on scarce highly unbalanced data

  • Original Article
International Journal of Diabetes in Developing Countries

Abstract

Background

Diabetes mellitus (DM) increases the risk of complications as well as mortality. Quantifying the risk of complications using artificial intelligence could help in designing comprehensive patient healthcare programs.

Objective

To predict the probability of macrovascular and microvascular complications in patients with DM using machine learning.

Methods

Retrospective cohort study. From an outpatient follow-up program for diabetic patients, 64,081 records and 287 variables were identified, with highly unbalanced data. Predictive models were developed for chronic kidney disease (CKD), lower extremity amputation (LEA), coronary heart disease (CHD), and early mortality (MOR). An exhaustive computational search was conducted to find the best combination of machine learning (ML) algorithm and sampling method.
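
The exhaustive search over algorithm and sampling combinations described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synthetic one-feature data, the two toy classifiers (`fit_centroid`, `fit_stump`), the three samplers, and the rule of ranking by F1 with accuracy as a tiebreak. The abstract does not name the algorithms, sampling methods, or selection rule actually used in the study.

```python
import itertools
import random
from statistics import mean

def make_data(n=1000, pos_rate=0.05, seed=0):
    """Synthetic unbalanced data (hypothetical): positives are shifted upward."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = 1 if rng.random() < pos_rate else 0
        rows.append((rng.gauss(2.0 if y else 0.0, 1.0), y))
    return rows

def split_classes(rows):
    return [r for r in rows if r[1] == 1], [r for r in rows if r[1] == 0]

# Sampling methods (illustrative stand-ins, not the study's methods).
def no_sampling(rows, rng):
    return list(rows)

def oversample(rows, rng):
    pos, neg = split_classes(rows)
    if not pos or len(pos) >= len(neg):
        return list(rows)
    # Duplicate random positives until both classes have equal size.
    return list(rows) + [rng.choice(pos) for _ in range(len(neg) - len(pos))]

def undersample(rows, rng):
    pos, neg = split_classes(rows)
    if not pos or len(pos) >= len(neg):
        return list(rows)
    # Keep all positives and an equally sized random subset of negatives.
    return pos + rng.sample(neg, len(pos))

# Classifiers (illustrative stand-ins): each returns a predict function.
def fit_centroid(rows):
    pos, neg = split_classes(rows)
    if not pos or not neg:
        label = 1 if pos else 0
        return lambda x: label
    # Cut halfway between class means; assumes positives lie above negatives.
    cut = (mean(x for x, _ in pos) + mean(x for x, _ in neg)) / 2
    return lambda x: 1 if x > cut else 0

def fit_stump(rows):
    # Choose the threshold with the highest training accuracy.
    best_cut, best_hits = None, -1
    for cut in sorted({x for x, _ in rows}):
        hits = sum((x > cut) == (y == 1) for x, y in rows)
        if hits > best_hits:
            best_cut, best_hits = cut, hits
    return lambda x: 1 if x > best_cut else 0

SAMPLERS = {"none": no_sampling, "over": oversample, "under": undersample}
CLASSIFIERS = {"centroid": fit_centroid, "stump": fit_stump}

def accuracy_f1(model, rows):
    tp = sum(1 for x, y in rows if model(x) == 1 and y == 1)
    fp = sum(1 for x, y in rows if model(x) == 1 and y == 0)
    fn = sum(1 for x, y in rows if model(x) == 0 and y == 1)
    correct = sum(1 for x, y in rows if model(x) == y)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return correct / len(rows), f1

def exhaustive_search(train, test, seed=0):
    """Evaluate every (sampler, classifier) pair on the held-out rows."""
    results = []
    for (s_name, sampler), (c_name, fit) in itertools.product(
            SAMPLERS.items(), CLASSIFIERS.items()):
        rng = random.Random(seed)  # same seed per pair keeps runs repeatable
        model = fit(sampler(train, rng))  # resample the training rows only
        acc, f1 = accuracy_f1(model, test)
        results.append({"sampler": s_name, "classifier": c_name,
                        "accuracy": acc, "f1": f1})
    return results

data = make_data()
train, test = data[:700], data[700:]
results = exhaustive_search(train, test)
# Assumed selection rule: highest F1, accuracy as tiebreak.
best = max(results, key=lambda r: (r["f1"], r["accuracy"]))
for r in results:
    print(r)
print("best:", best["sampler"], best["classifier"])
```

Resampling is applied to the training rows only, so the held-out rows keep the original class balance and the reported metrics are not inflated by duplicated positives.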

Results

The best model was selected using heuristics derived from a comprehensive analysis of the accuracy and F1 values across ML algorithms, sampling methods, and datasets. By complication, accuracy was 99.9% for LEA, 94.3% for CHD, 97.4% for MOR, and 98.8% for CKD. F1 was assessed to identify false positives, with values of 84.5% for CKD, 63.6% for MOR, 46.2% for LEA, and 44.8% for CHD.
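
Why accuracy and F1 can diverge this much on unbalanced data is easier to see with a small worked example. The confusion-matrix counts below are hypothetical, chosen only to show the effect for a rare outcome; they are not taken from the study.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; ignores true negatives."""
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical cohort: 1000 patients, 10 of whom develop the complication.
tp, fn, fp, tn = 6, 4, 10, 980

acc = accuracy(tp, fp, fn, tn)   # (6 + 980) / 1000
f1 = f1_score(tp, fp, fn)        # 12 / 26

# A model that predicts "no complication" for everyone.
baseline_acc = accuracy(0, 0, 10, 990)
baseline_f1 = f1_score(0, 0, 10)

print(f"model:    accuracy={acc:.3f}  F1={f1:.3f}")
print(f"baseline: accuracy={baseline_acc:.3f}  F1={baseline_f1:.3f}")
```

Because true negatives dominate the accuracy numerator, the trivial baseline reaches 99% accuracy while its F1 is zero, which is why the two metrics need to be read together for rare outcomes.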

Conclusions

This ML model can be applied to predict CHD, CKD, and MOR. The success of ML predictions lies in the clinical definition of the initial variables and their simplification into variables from which the algorithms can identify patients who are likely to develop a complication. For clinical application of this system, it is necessary to assess the metrics jointly, as done here (accuracy higher than 95% and F1-Score higher than 80%).
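
As a rough illustration of assessing both metrics jointly, the sketch below applies the two thresholds quoted above to the accuracy and F1 values reported in the Results. Treating the thresholds as strict inequalities, and the names `REPORTED` and `meets_thresholds`, are assumptions of this sketch rather than part of the study.

```python
# (accuracy %, F1 %) per complication, as reported in the Results.
REPORTED = {
    "CKD": (98.8, 84.5),
    "MOR": (97.4, 63.6),
    "LEA": (99.9, 46.2),
    "CHD": (94.3, 44.8),
}

MIN_ACCURACY = 95.0  # "accuracy higher than 95%"
MIN_F1 = 80.0        # "F1-Score higher than 80%"

def meets_thresholds(acc, f1, min_acc=MIN_ACCURACY, min_f1=MIN_F1):
    """True only when both metrics exceed their thresholds (assumed strict)."""
    return acc > min_acc and f1 > min_f1

passing = [name for name, (acc, f1) in REPORTED.items()
           if meets_thresholds(acc, f1)]

for name, (acc, f1) in REPORTED.items():
    print(f"{name}: accuracy={acc}%  F1={f1}%  "
          f"both thresholds met={meets_thresholds(acc, f1)}")
```

On the reported figures, this check leaves CKD in `passing`.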

Data Availability

Data is available through the corresponding author upon justified request.

Acknowledgment

The authors want to thank the funding institutions: MINCIENCIAS, EPS SANITAS Colombia, as well as University of Santander (UDES) and University Foundation SANITAS for all the support in this process.

Funding

This study was funded by the Ministry of Science, Technology and Innovation of the Republic of Colombia (MINCIENCIAS) through call 811 of 2018, under code C160I000000011758-19, contract number 433 of 2019, and contingent recovery number 80740-433-2019.

Author information

Corresponding author

Correspondence to Claudia C. Colmenares-Mejía.

Ethics declarations

Competing Interests

The authors have no financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOC 989 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Colmenares-Mejía, C.C., Rincón-Acuña, J.C., Cely, A. et al. Multivariable prediction model of complications derived from diabetes mellitus using machine learning on scarce highly unbalanced data. Int J Diabetes Dev Ctries (2023). https://doi.org/10.1007/s13410-023-01264-7

  • DOI: https://doi.org/10.1007/s13410-023-01264-7
