
Identification of influence factors in overweight population through an interpretable risk model based on machine learning: a large retrospective cohort

  • Original Article
  • Endocrine

Abstract

Background

Identifying the risk factors associated with overweight is crucial for predicting future health risks and guiding behavioral interventions. However, several methodological issues in machine learning, such as how cross-validation should be performed, still lack consensus, and the resulting models may suffer from overfitting or poor interpretability.
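Cross-validation is one of the unsettled issues named above. As a generic illustration only (the abstract does not describe the authors' protocol), stratified k-fold cross-validation holds out each fold exactly once while keeping the overweight/non-overweight ratio similar in every fold. The function name `stratified_k_fold` and its parameters are hypothetical; in practice `sklearn.model_selection.StratifiedKFold` implements the same idea.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Split case indices into k (train, test) pairs with a similar class mix.

    Indices are grouped by label, shuffled within each label, and dealt
    round-robin into k folds; each fold serves as the test set exactly once.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[(offset + j) % k].append(idx)
        offset += len(idxs)  # next label continues where this one stopped

    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train, test))
    return splits


# Toy labels: 1 = overweight, 0 = not overweight (illustrative only).
labels = [0] * 80 + [1] * 20
for train, test in stratified_k_fold(labels, k=5):
    print(len(train), len(test), sum(labels[i] for i in test))  # prints "80 20 4" for each fold
```

Each candidate model would be fitted on `train` and scored on `test` for every split; averaging the fold scores gives a less optimistic estimate than scoring on the training data, which is one guard against overfitting.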

Methods

This study applied nine commonly used machine learning methods to construct overweight risk models. The study targeted the general community population; a total of 10,905 Chinese participants from Ningde City, Fujian Province, southeast China, were included. The best model was selected through verification and validation and was then interpreted.
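The abstract does not state the criterion used to pick the best of the nine models; the ROC AUC on held-out data is a common choice. A minimal sketch, assuming that criterion: `roc_auc` computes the AUC as the Mann-Whitney probability, and `select_best_model` is a hypothetical helper that returns the model with the highest AUC. `sklearn.metrics.roc_auc_score` is the usual library equivalent. The model names and scores are toy values, not study results.

```python
def roc_auc(y_true, scores):
    """ROC AUC computed as the Mann-Whitney probability.

    Returns the probability that a randomly chosen positive case is scored
    higher than a randomly chosen negative case; ties count as one half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_best_model(y_true, predictions):
    """Return (name, auc) of the model with the highest held-out AUC.

    `predictions` maps a model name to its predicted risks for the same
    held-out cases, in the same order as `y_true`.
    """
    aucs = {name: roc_auc(y_true, s) for name, s in predictions.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs[best]


# Toy held-out labels and scores from two hypothetical models.
y_val = [0, 0, 1, 1]
preds = {"model_a": [0.1, 0.4, 0.35, 0.8], "model_b": [0.2, 0.3, 0.6, 0.9]}
print(select_best_model(y_val, preds))  # prints ('model_b', 1.0)
```

The pairwise loop costs one comparison per positive-negative pair, which is fine for a sketch; a rank-based formula scales better to a cohort of this size.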

Results

The machine learning-based overweight risk models performed well. CatBoost may outperform previously used machine learning methods for constructing clinical risk models. Visualizing the Shapley additive explanation (SHAP) values of the model variables accurately represented the influence of each variable on the model.
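SHAP values are Shapley values from cooperative game theory: each variable's attribution is its weighted average marginal contribution to the prediction over all subsets of the other variables, and the attributions sum to the prediction minus the baseline prediction. The brute-force sketch here illustrates that definition on a hypothetical three-variable risk function (`toy_risk`), not on the study's CatBoost model. It assumes an "absent" variable is set to a single baseline value, whereas practical implementations such as the `shap` library estimate absent-variable effects from background data and use fast tree-specific algorithms (`shap.TreeExplainer`) for gradient-boosted trees.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at point x.

    A variable outside a coalition is replaced by its baseline value. Each
    attribution is the weighted average of that variable's marginal
    contribution over every subset of the other variables.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # subsets of the n - 1 other variables
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical risk score with an interaction between variables 0 and 2.
def toy_risk(z):
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


x, base = [1, 2, 3], [0, 0, 0]
phi = shapley_values(toy_risk, x, base)
print([round(v, 6) for v in phi])  # prints [3.5, 6.0, 1.5]
```

The interaction term's contribution (3) is split equally between variables 0 and 2, and the three values sum to `toy_risk(x) - toy_risk(base) = 11`. Because the number of coalitions grows as 2^n, exhaustive enumeration only suits a handful of variables.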

Conclusions

Machine learning may currently be the best approach for constructing overweight risk models, and CatBoost may be the best machine learning method for this purpose. Furthermore, combining Shapley additive explanations (SHAP) with machine learning methods can be effective in identifying disease risk factors for prevention and control.


Fig. 1
Fig. 2
Fig. 3


Data availability

The raw data supporting the conclusions of this article will be made available by the authors without undue reservation.

Abbreviations

ANN/MLP: artificial neural network/multilayer perceptron
AUC: area under the curve
BMI: body mass index
BP: blood pressure
DBP: diastolic blood pressure
DM: diabetes mellitus
FBG: fasting blood glucose
FINS: fasting insulin
GBDT: gradient boosted decision tree
GBM: gradient boosting machine
GNB: Gaussian naive Bayes
HDL-C: high-density lipoprotein cholesterol
HOMA-IR: homeostasis model assessment of insulin resistance
KNN: K-nearest neighbor
LDL-C: low-density lipoprotein cholesterol
PBG: postprandial blood glucose
ROC: receiver operating characteristic
SBP: systolic blood pressure
SHAP: Shapley additive explanation
SVM: support vector machine
TC: total cholesterol
TG: total triglyceride
WC: waist circumference
WHO: World Health Organization
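HOMA-IR is derived from FBG and FINS. A minimal sketch, assuming the widely used original HOMA approximation (FBG in mmol/L multiplied by FINS in μU/mL, divided by 22.5); the excerpt does not state which HOMA variant the study used, and the updated computer-based HOMA2 model is not a simple closed-form formula. The function name `homa_ir` is hypothetical and the inputs are illustrative, not study data.

```python
def homa_ir(fbg_mmol_per_l, fins_uu_per_ml):
    """HOMA-IR from fasting blood glucose (mmol/L) and fasting insulin (uU/mL).

    Uses the original HOMA approximation: FBG * FINS / 22.5.
    """
    if fbg_mmol_per_l <= 0 or fins_uu_per_ml <= 0:
        raise ValueError("FBG and FINS must be positive")
    return fbg_mmol_per_l * fins_uu_per_ml / 22.5


# Illustrative values, not study data.
print(homa_ir(5.0, 9.0))  # prints 2.0
```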


Acknowledgements

The authors would like to thank the participants for providing the information used in this study and for kindly making arrangements for the data collection.

Funding

This study was supported by the Fujian Research and Training Grants for Young and Middle-aged Leaders in Healthcare (Grant No. (2023)417#), the Innovation Project of Fujian Provincial Health Commission (2021CXA003), the Natural Science Foundation of Fujian Province (Grant Nos. 2022J011017 and 2020J011068), the National Key Research and Development Program of China (2018YFC2001100-5), and the National Natural Science Foundation of China (82070878).

Author information

Contributions

W.L. and S.S. performed the formal analysis, devised the methodology, and wrote the original draft. H.L., H.H., and J.W. curated the data and resources. W.L. and G.C. were involved in the conceptualization, formal analysis, writing of the original draft, and project administration. G.C. is the guarantor of this manuscript, had full access to all the data in the study, and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Corresponding authors

Correspondence to Wei Lin or Gang Chen.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Ethical approval

Our study was performed in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of Fujian Provincial Hospital (approval no. K2019-06-032). All participants provided written informed consent prior to enrollment in the study.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Lin, W., Shi, S., Lan, H. et al. Identification of influence factors in overweight population through an interpretable risk model based on machine learning: a large retrospective cohort. Endocrine 83, 604–614 (2024). https://doi.org/10.1007/s12020-023-03536-y


  • DOI: https://doi.org/10.1007/s12020-023-03536-y
