Identification of influence factors in overweight population through an interpretable risk model based on machine learning: a large retrospective cohort

Lin, Wei; Shi, Songchang; Lan, Huiyu; Wang, Nengying; Huang, Huibin; Wen, Junping; Chen, Gang

doi:10.1007/s12020-023-03536-y

Original Article
Published: 30 September 2023

Endocrine, volume 83, pages 604–614 (2024)

Abstract

Background

Identifying the risk factors associated with overweight is crucial for future health risk prediction and behavioral intervention. Machine learning studies still lack consensus on several methodological issues, such as cross-validation, and the resulting models may suffer from overfitting or poor interpretability.

Methods

This study used nine common machine learning methods to construct overweight risk models. The target population was the general community: a total of 10,905 Chinese subjects from Ningde City, Fujian Province, southeast China, participated. The best model was selected through appropriate verification and validation and was then suitably explained.
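As an illustration of the kind of model selection described above, the following is a minimal sketch of comparing candidate models by cross-validated AUC. It is not the authors' pipeline: the two toy models, the synthetic data, and all function names (`auc`, `stratified_folds`, `cross_val_auc`) are assumptions made for this example, and only the Python standard library is used.

```python
import random

def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k, seed=0):
    """Split indices into k folds, keeping the class ratio similar in each."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def fit_centroid(X, y):
    """Toy model: score is the projection onto the difference of class means."""
    d = len(X[0])
    def mean(cls):
        rows = [x for x, t in zip(X, y) if t == cls]
        return [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    w = [a - b for a, b in zip(mean(1), mean(0))]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))

def fit_first_feature(X, y):
    """Toy baseline: use the first feature alone as the risk score."""
    return lambda x: x[0]

def cross_val_auc(fit, X, y, k=5):
    """Mean held-out AUC over k stratified folds."""
    results = []
    for test_idx in stratified_folds(y, k):
        held = set(test_idx)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        results.append(auc([y[i] for i in test_idx],
                           [model(X[i]) for i in test_idx]))
    return sum(results) / k

# Synthetic data: two weakly informative features (stand-ins, not study variables).
rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = [[rng.gauss(t, 1.0), rng.gauss(t, 1.0)] for t in y]

candidates = [("centroid", fit_centroid), ("first_feature", fit_first_feature)]
cv_scores = {name: cross_val_auc(fit, X, y) for name, fit in candidates}
best = max(cv_scores, key=cv_scores.get)
```

Each model is scored only on folds it was not fitted on, which is what guards the comparison against overfitting. In practice a library implementation of stratified cross-validation would likely replace these hand-written helpers.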

Results

The machine learning overweight risk models performed well. CatBoost, when used to construct clinical risk models, may outperform earlier machine learning methods. The visual display of Shapley additive explanation (SHAP) values for the model variables accurately represented the influence of each variable in the model.
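To make the idea of a Shapley additive explanation concrete, here is a small sketch that computes exact Shapley values for a toy risk function. Everything in it is an assumption for illustration: the function `toy_risk`, its coefficients, the feature names, and the single-baseline convention (SHAP implementations typically average over a background dataset instead) are not taken from the study.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to a baseline input.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n, so this is only feasible for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical risk score with an interaction term; names are illustrative only.
def toy_risk(z):
    waist, tg, age = z
    return 0.04 * waist + 0.5 * tg + 0.01 * age + 0.02 * waist * tg

x = [95.0, 2.0, 50.0]          # the individual being explained
baseline = [80.0, 1.0, 45.0]   # reference individual
phi = shapley_values(toy_risk, x, baseline)
```

The values in `phi` sum to `toy_risk(x) - toy_risk(baseline)`, so each entry can be read as that variable's share of the difference in predicted risk. This additivity is what makes per-variable plots of such values interpretable.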

Conclusions

The construction of an overweight risk model using machine learning may currently be the best approach. Moreover, CatBoost may be the best machine learning method. Furthermore, combining Shapley additive explanation and machine learning methods can be effective in identifying disease risk factors for prevention and control.

Data availability

The raw data supporting the conclusions of this article will be made available by the authors without undue reservation.

Abbreviations

ANN/MLP: artificial neural network/multilayer perceptron
AUC: area under the curve
BMI: body mass index
BP: blood pressure
DM: diabetes mellitus
DBP: diastolic blood pressure
FBG: fasting blood glucose
FINS: fasting insulin
GBDT: gradient boosted decision tree
GBM: gradient boosting machine
GNB: Gaussian naive Bayes
HDL-C: high-density lipoprotein cholesterol
HOMA-IR: homeostasis model assessment of insulin resistance
KNN: K-nearest neighbor
LDL-C: low-density lipoprotein cholesterol
PBG: postprandial blood glucose
ROC: receiver operating characteristic
SBP: systolic blood pressure
SHAP: Shapley additive explanation
SVM: support vector machine
TC: total cholesterol
TG: triglyceride
WC: waist circumference
WHO: World Health Organization

Acknowledgements

The authors would like to thank the participants for providing the information used in this study and for kindly making arrangements for the data collection.

Funding

This study was supported by Fujian Research and Training Grants for Young and Middle-aged Leaders in Healthcare (Grant No. (2023)417#), the Innovation Project of Fujian Provincial Health Commission (2021CXA003), Natural Science Foundation of Fujian Province (Grant No. 2022J011017 and Grant No. 2020J011068), National Key Research and Development Program of China (2018YFC2001100-5), and Natural Science Foundation of China (82070878).

Author information

These authors contributed equally: Wei Lin, Songchang Shi

Authors and Affiliations

Department of Endocrinology, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, 350001, PR China
Wei Lin, Huiyu Lan, Nengying Wang, Huibin Huang, Junping Wen & Gang Chen
Department of Critical Care Medicine, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital South Branch, Fujian Provincial Hospital Jinshan Branch, Fujian Provincial Hospital, Fuzhou, 350001, PR China
Songchang Shi

Contributions

W.L. and S.S. performed the formal analysis, devised the methodology, and wrote the original draft. H.L., H.H., and J.W. curated the data and resources. W.L. and G.C. were involved in the conceptualization, formal analysis, writing of the original draft, and project administration. G.C. is the guarantor of this manuscript, had full access to all the data in the study, and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Corresponding authors

Correspondence to Wei Lin or Gang Chen.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Ethical approval

Our study was performed in accordance with the Declaration of Helsinki and approved by The Ethics Committee of Fujian Provincial Hospital (approval no. K2019-06-032). All patients provided written informed consent prior to enrollment in the study.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lin, W., Shi, S., Lan, H. et al. Identification of influence factors in overweight population through an interpretable risk model based on machine learning: a large retrospective cohort. Endocrine 83, 604–614 (2024). https://doi.org/10.1007/s12020-023-03536-y

Received: 06 April 2023
Accepted: 12 September 2023
Published: 30 September 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s12020-023-03536-y
