Abstract
Background
Diabetes mellitus (DM) increases the risk of complications as well as mortality. Quantifying the risk of complications with artificial intelligence could help in designing comprehensive patient healthcare programs.
Objective
To predict the probability of macro- and microvascular complications in patients with DM using machine learning.
Methods
Retrospective cohort study. From an outpatient follow-up program for diabetic patients, 64,081 records with 287 variables were identified; the data were highly unbalanced. Predictive models were developed for chronic kidney disease (CKD), lower extremity amputation (LEA), coronary heart disease (CHD), and early mortality (MOR). An exhaustive computational search was conducted to find the best combination of machine learning (ML) algorithm and sampling method.
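The abstract does not name the algorithms or sampling methods that were compared, so the following is only a minimal sketch of what an exhaustive search over (algorithm, sampling method) pairs on unbalanced binary data can look like. Every name in it is a hypothetical stand-in, not the authors' pipeline: two toy classifiers (`MajorityClass`, `NearestCentroid`), random oversampling and undersampling written with the standard library, and an `exhaustive_search` helper that scores each pair by accuracy and F1. Resampling is applied only to the training split, so the evaluation data keep their original class imbalance.

```python
import itertools
import random


def _split_indices(y):
    """Return (minority_indices, majority_indices) for binary 0/1 labels."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)


def no_sampling(X, y, seed=0):
    """Leave the training data unchanged."""
    return list(X), list(y)


def random_oversample(X, y, seed=0):
    """Duplicate random minority rows until both classes have the same size."""
    rng = random.Random(seed)
    minority, majority = _split_indices(y)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = majority + minority + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def random_undersample(X, y, seed=0):
    """Keep a random subset of majority rows the size of the minority class."""
    rng = random.Random(seed)
    minority, majority = _split_indices(y)
    idx = rng.sample(majority, len(minority)) + minority
    return [X[i] for i in idx], [y[i] for i in idx]


class MajorityClass:
    """Baseline that always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestCentroid:
    """Predicts the class whose mean feature vector is closest (squared Euclidean)."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c])) for x in X]


def exhaustive_search(X_train, y_train, X_test, y_test, models, samplers):
    """Fit every (model, sampler) pair and record accuracy and F1 on the test set."""
    results = []
    for (m_name, make_model), (s_name, sample) in itertools.product(
        models.items(), samplers.items()
    ):
        Xs, ys = sample(X_train, y_train)  # resample the training split only
        preds = make_model().fit(Xs, ys).predict(X_test)
        tp = sum(1 for p, t in zip(preds, y_test) if p == 1 and t == 1)
        fp = sum(1 for p, t in zip(preds, y_test) if p == 1 and t == 0)
        fn = sum(1 for p, t in zip(preds, y_test) if p == 0 and t == 1)
        acc = sum(1 for p, t in zip(preds, y_test) if p == t) / len(y_test)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
        results.append({"model": m_name, "sampler": s_name, "accuracy": acc, "f1": f1})
    return results


if __name__ == "__main__":
    # Toy one-feature data: 20 negatives, 4 positives (hypothetical).
    X = [[float(i % 3)] for i in range(20)] + [[5.0 + (i % 2)] for i in range(4)]
    y = [0] * 20 + [1] * 4
    models = {"majority": MajorityClass, "nearest_centroid": NearestCentroid}
    samplers = {"none": no_sampling, "oversample": random_oversample,
                "undersample": random_undersample}
    for row in exhaustive_search(X, y, X, y, models, samplers):
        print(row)
```

In a real study the toy classes would be replaced by library estimators (for example from scikit-learn) and samplers (for example SMOTE from imbalanced-learn), and scoring would use a held-out or cross-validated split; the nested loop over every pair is the part that makes the search exhaustive.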
Results
The best model was selected using heuristics derived from a comprehensive analysis of accuracy and F1 values across combinations of ML algorithm, sampling method, and dataset. Accuracy was 99.9% for LEA, 94.3% for CHD, 97.4% for MOR, and 98.8% for CKD. The F1 score, which accounts for false positives as well as false negatives, was 84.5% for CKD, 63.6% for MOR, 46.2% for LEA, and 44.8% for CHD.
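Why both metrics are reported can be seen with a small hypothetical example (the cohort below is invented, not the study data). F1 is the harmonic mean of precision and recall, F1 = 2TP / (2TP + FP + FN), so on highly unbalanced data a model can reach high accuracy while having a low F1. The helper names `confusion_counts`, `accuracy`, and `f1_score` are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives and true negatives."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical cohort: 1,000 patients, 10 of whom develop the complication.
y_true = [1] * 10 + [0] * 990

# A model that never predicts the complication.
always_negative = [0] * 1000

# A model that finds 5 of the 10 cases but also raises 5 false alarms.
partial = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 985

print(accuracy(y_true, always_negative), f1_score(y_true, always_negative))  # 0.99 0.0
print(accuracy(y_true, partial), f1_score(y_true, partial))  # 0.99 0.5
```

Both models score 99% accuracy, yet the first never detects a single case (F1 = 0) and the second detects half of them (F1 = 0.5), which is the gap between accuracy and F1 that the reported values reflect.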
Conclusions
This ML model can be applied to predict CHD, CKD, and MOR. The success of ML predictions lies in the clinical definition of the initial variables and in their simplification into variables from which the algorithms can identify patients who are likely to develop a complication. For clinical application of this system, the metrics must be assessed in combination, as done here: accuracy higher than 95% together with an F1 score higher than 80%.
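The combined criterion in the Conclusions can be written as a simple filter. The sketch below applies it to the accuracy and F1 values reported in the Results; the names `ACCURACY_MIN`, `F1_MIN`, and `meets_criterion` are illustrative, and "higher than" is read as a strict inequality.

```python
ACCURACY_MIN = 0.95  # "accuracy higher than 95%"
F1_MIN = 0.80        # "F1 score higher than 80%"

# (accuracy, F1) per complication, as reported in the Results.
reported = {
    "CKD": (0.988, 0.845),
    "LEA": (0.999, 0.462),
    "CHD": (0.943, 0.448),
    "MOR": (0.974, 0.636),
}


def meets_criterion(accuracy, f1, acc_min=ACCURACY_MIN, f1_min=F1_MIN):
    """Both metrics must exceed their thresholds at the same time."""
    return accuracy > acc_min and f1 > f1_min


passing = [name for name, (acc, f1) in reported.items() if meets_criterion(acc, f1)]
print(passing)  # ['CKD']
```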
Data Availability
Data is available through the corresponding author upon justified request.
Acknowledgment
The authors thank the funding institutions, MINCIENCIAS and EPS SANITAS Colombia, as well as the University of Santander (UDES) and the University Foundation SANITAS, for all their support in this process.
Funding
This study was funded by the Ministry of Science, Technology and Innovation of the Republic of Colombia (MINCIENCIAS) through call 811 of 2018, under code C160I000000011758-19, contract number 433 of 2019, and contingent recovery number 80740-433-2019.
Ethics declarations
Competing Interests
The authors have no financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Electronic supplementary material is available with the online version of this article.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Colmenares-Mejía, C.C., Rincón-Acuña, J.C., Cely, A. et al. Multivariable prediction model of complications derived from diabetes mellitus using machine learning on scarce highly unbalanced data. Int J Diabetes Dev Ctries (2023). https://doi.org/10.1007/s13410-023-01264-7