Integrated machine learning and deep learning for predicting diabetic nephropathy: model construction, validation, and interpretability

Ma, Junjie; An, Shaoguang; Cao, Mohan; Zhang, Lei; Lu, Jin

doi:10.1007/s12020-024-03735-1

Original Article
Published: 23 February 2024

Endocrine (2024)

Junjie Ma ORCID: orcid.org/0000-0002-8847-3093¹,
Shaoguang An¹,
Mohan Cao¹,
Lei Zhang² &
…
Jin Lu^3,4

Abstract

Objective

To construct a risk prediction model for assisted diagnosis of Diabetic Nephropathy (DN) using machine learning algorithms, and to validate it internally and externally.

Methods

First, the data were cleaned and enhanced, then divided into training and test sets at a 7:3 ratio. Variables related to DN were then screened by difference analysis and by the Least Absolute Shrinkage and Selection Operator (LASSO), Recursive Feature Elimination (RFE), and Max-Relevance and Min-Redundancy (MRMR) algorithms. Ten machine learning models were constructed from the key variables. The best model was selected using Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves, accuracy, the Matthews Correlation Coefficient (MCC), and the Kappa coefficient, and was then validated internally and externally. An online platform was built on the best model.
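As an illustration of three of the selection metrics named above, the following minimal sketch computes accuracy, MCC, and Kappa from a binary confusion matrix. It is not the authors' code: the function names, the 1/0 label coding, the reading of Kappa as Cohen's kappa, and the toy labels are all assumptions made for this example.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for labels coded 1 (DN) and 0 (no DN).

    The 1/0 coding is an assumption made for this sketch.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def mcc(tp, fp, fn, tn):
    """Matthews Correlation Coefficient; 0.0 when a margin is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if p_expected == 1:
        # Undefined (0/0) case; returning 0.0 is a convention chosen here.
        return 0.0
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Toy labels, invented for illustration only (not study data).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    counts = confusion_counts(y_true, y_pred)
    print(accuracy(*counts), mcc(*counts), cohen_kappa(*counts))
```

Unlike accuracy, MCC and Kappa both account for class imbalance, which may be why they are reported alongside ROC and PR curves when comparing models.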

Results

Fifteen key variables were selected, and among the 10 machine learning models, the Random Forest model achieved the best predictive performance. The area under the ROC curve was 0.912 in the test set and 0.828 and 0.863 in the two external validation cohorts, indicating excellent predictive and generalization abilities.
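For readers unfamiliar with the metric, the area under the ROC curve (AUC) can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The sketch below computes it directly from that definition. The function name `roc_auc`, the 1/0 label coding, and the example scores are assumptions for illustration, not the authors' implementation or data.

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores.

    Counts the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win.
    Labels are assumed to be coded 1 (positive) and 0 (negative).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented scores for four hypothetical patients (not study data).
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In the toy example, three of the four positive-negative pairs are ranked correctly, so the function returns 0.75. The pairwise loop is quadratic in the number of cases; a rank-based formulation would be preferable for large cohorts.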

Conclusion

The model has good predictive value and is expected to aid the early clinical diagnosis and screening of DN.

Data availability

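All data in this article can be found in the following databases: the National Population Health Data Center of China (NPHDC), the National Health and Nutrition Examination Survey of the United States (NHANES), and Taiwan Biobank (TWBB). An online platform has been created and can be accessed at https://dn-prediction.shinyapps.io/DN-PRED-English.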

Acknowledgements

We thank the National Population Health Data Center of China, the National Health and Nutrition Examination Survey of the United States, and Taiwan Biobank for providing data support.

Funding

This study was supported by the College Students’ Innovative Entrepreneurial Training Plan Program (202310367071).

Author information

Authors and Affiliations

Department of Clinical Medicine, Bengbu Medical University, Bengbu, China
Junjie Ma, Shaoguang An & Mohan Cao
Department of Oncology Surgery, the Second Affiliated Hospital of Bengbu Medical University, Bengbu, China
Lei Zhang
Anhui Key Laboratory of Computational Medicine and Intelligent Health, Bengbu Medical University, Bengbu, China
Jin Lu
School of Basic Medicine, Bengbu Medical University, Bengbu, China
Jin Lu

Contributions

All authors contributed to the study’s conception and design. Data collection and analysis were performed by J.J.M. The first draft of the manuscript was written by J.J.M., S.G.A., and M.H.C. The revision of the manuscript was completed by L.Z. and J.L. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jin Lu.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary table

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ma, J., An, S., Cao, M. et al. Integrated machine learning and deep learning for predicting diabetic nephropathy: model construction, validation, and interpretability. Endocrine (2024). https://doi.org/10.1007/s12020-024-03735-1

Received: 13 November 2023
Accepted: 06 February 2024
Published: 23 February 2024
DOI: https://doi.org/10.1007/s12020-024-03735-1
