SynthA1c: Towards Clinically Interpretable Patient Representations for Diabetes Risk Stratification

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNCS, volume 14277)


Early diagnosis of Type 2 Diabetes Mellitus (T2DM) is crucial to enable timely therapeutic interventions and lifestyle modifications. As the time available for clinical office visits shortens and medical imaging data become more widely available, patient image data could be used to opportunistically identify patients for additional T2DM diagnostic workup by physicians. We investigated whether image-derived phenotypic data could be leveraged in tabular learning classifier models to predict T2DM risk in an automated fashion and flag high-risk patients without the need for additional blood laboratory measurements. In contrast to traditional binary classifiers, we leverage neural networks and decision tree models to represent patient data as ‘SynthA1c’ latent variables that mimic empirical blood hemoglobin A1c lab measurements and achieve sensitivities as high as 87.6%. To evaluate how SynthA1c models may generalize to other patient populations, we introduce a novel generalizable metric that uses vanilla data augmentation techniques to predict model performance on out-of-domain input covariates. We show that image-derived phenotypes and physical examination data together can accurately predict diabetes risk as a means of opportunistic risk stratification enabled by artificial intelligence and medical imaging. Our code is available at
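The abstract describes representing each patient as a continuous, A1c-like value rather than a binary label. A minimal sketch of how such a value could be turned into a high-risk flag and scored for sensitivity is given below. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy values, and the 6.5% cutoff (the commonly used HbA1c diagnostic threshold for diabetes) are assumptions, since the abstract does not state which threshold was applied.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: 6.5% is the commonly used HbA1c diagnostic cutoff for diabetes.
# The abstract does not say which threshold the authors applied.
A1C_THRESHOLD = 6.5


def flag_high_risk(synth_a1c: Sequence[float],
                   threshold: float = A1C_THRESHOLD) -> List[bool]:
    """Flag a patient as high risk when the predicted A1c-like value
    meets or exceeds the threshold."""
    return [value >= threshold for value in synth_a1c]


def sensitivity_specificity(flags: Sequence[bool],
                            has_t2dm: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of the flags against true labels.
    A rate is NaN when its denominator is zero."""
    tp = sum(f and t for f, t in zip(flags, has_t2dm))
    fn = sum((not f) and t for f, t in zip(flags, has_t2dm))
    tn = sum((not f) and (not t) for f, t in zip(flags, has_t2dm))
    fp = sum(f and (not t) for f, t in zip(flags, has_t2dm))
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity


# Hypothetical model outputs and ground-truth labels, for illustration only.
predicted_a1c = [5.2, 6.8, 7.4, 6.1, 5.9, 6.6]
true_labels = [False, True, True, True, False, False]

flags = flag_high_risk(predicted_a1c)
sens, spec = sensitivity_specificity(flags, true_labels)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Keeping the model output on the A1c scale means the decision threshold can be adjusted after training, for example lowered to favor sensitivity in a screening setting, without retraining the model.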


  • Disease Prediction
  • Representation Learning
  • Radiomics

M. S. Yao and A. Chae—Equal contribution as co-first authors.

W. R. Witschey and H. Sagreiya—Equal contribution as co-senior authors.


MSY is supported by NIH T32 EB009384. AC is supported by the AΩA Carolyn L. Kuckein Student Research Fellowship and the University of Pennsylvania Diagnostic Radiology Research Fellowship. WRW is supported by NIH R01 HL137984. MTM received funding from the Sarnoff Cardiovascular Research Foundation. HS received funding from the RSNA Scholar Grant.

Corresponding author

Correspondence to Hersh Sagreiya.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Yao, M.S. et al. (2023). SynthA1c: Towards Clinically Interpretable Patient Representations for Diabetes Risk Stratification. In: Rekik, I., Adeli, E., Park, S.H., Cintas, C., Zamzmi, G. (eds) Predictive Intelligence in Medicine. PRIME 2023. Lecture Notes in Computer Science, vol 14277. Springer, Cham.

  • DOI:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-46004-3

  • Online ISBN: 978-3-031-46005-0

  • eBook Packages: Computer Science, Computer Science (R0)