Analysis of Health Screening Records Using Interpretations of Predictive Models

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12721)


Health screening is conducted in many countries to track general health conditions and find asymptomatic patients. In recent years, large-scale data analyses of health screening records have been used to predict patients’ future health conditions. While such predictions are important, medical researchers are also greatly interested in identifying factors that could worsen patients’ medical conditions in the future. For this purpose, we propose to use interpretations of trained predictive models. Specifically, we trained machine learning models to predict future diabetes stages, then applied permutation importance, SHapley Additive exPlanations (SHAP), and a sensitivity analysis to extract features that contribute to aggravation. Among the trained models, XGBoost performed best in terms of the Matthews correlation coefficient. Permutation importance and SHAP showed that the model makes good predictions using a number of attributes conventionally known to be related to diabetes, as well as attributes not commonly used in the diagnosis of diabetes. A sensitivity analysis showed that the changes in the predictions were mostly consistent with our intuition about how daily behavior affects the aggravation of type 2 diabetes.
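The interpretation steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the labels are binary rather than diabetes stages, and `toy_model` is a hypothetical hand-written rule standing in for the trained XGBoost model. The feature names (`hba1c`, `bmi`, an unused third column) and all coefficients are assumptions chosen for illustration, and SHAP is omitted because it needs an external library. The sketch shows only how the Matthews correlation coefficient, permutation importance, and a simple sensitivity analysis relate to one another.

```python
import math
import random

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def toy_model(row):
    """Hypothetical stand-in for a trained classifier (assumed rule, not from the paper)."""
    hba1c, bmi, _unused = row
    score = (hba1c - 6.0) + 0.02 * (bmi - 25.0)
    return 1 if score > 0 else 0

def predict(rows):
    return [toy_model(r) for r in rows]

def replace_column(rows, col, values):
    """Return copies of the rows with column `col` replaced by `values`."""
    return [r[:col] + (v,) + r[col + 1:] for r, v in zip(rows, values)]

def permutation_importance(rows, labels, col, rng, repeats=10):
    """Mean drop in MCC after shuffling one feature column."""
    base = mcc(labels, predict(rows))
    drops = []
    for _ in range(repeats):
        shuffled = [r[col] for r in rows]
        rng.shuffle(shuffled)
        drops.append(base - mcc(labels, predict(replace_column(rows, col, shuffled))))
    return sum(drops) / repeats

def sensitivity(rows, col, delta):
    """Change in the predicted positive rate when one feature is shifted by `delta`."""
    shifted = replace_column(rows, col, [r[col] + delta for r in rows])
    return (sum(predict(shifted)) - sum(predict(rows))) / len(rows)

# Synthetic records: (hba1c, bmi, unused feature). Values are made up.
rng = random.Random(0)
rows = [(rng.uniform(5.0, 7.0), rng.uniform(18.0, 35.0), rng.uniform(0.0, 1.0))
        for _ in range(200)]
labels = predict(rows)  # labels follow the toy rule exactly, so the baseline MCC is 1

baseline = mcc(labels, predict(rows))
importances = [permutation_importance(rows, labels, c, rng) for c in range(3)]
shift_hba1c = sensitivity(rows, 0, 0.5)   # raise hba1c for every record
shift_unused = sensitivity(rows, 2, 0.5)  # shift a feature the model ignores
```

In this toy setting, shuffling a feature the model ignores leaves the score unchanged, while shuffling the dominant feature lowers it sharply. That contrast is the signal permutation importance relies on. The sensitivity function mirrors the idea of asking how predictions move when one input is changed and the others are held fixed.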



This work was supported by JST COI Grant Number JPMJCE1301 and JSPS KAKENHI Grant Numbers JP16K00228 and JP16H02904.



Copyright information

© Springer Nature Switzerland AG 2021

Authors and Affiliations

  1. Graduate School of Comprehensive Human Sciences, University of Tsukuba, Tsukuba, Japan
  2. Faculty of Library, Information and Media Science, University of Tsukuba, Tsukuba, Japan
  3. Faculty of Medicine, University of Tsukuba, Tsukuba, Japan
