EAPR: explainable and augmented patient representation learning for disease prediction

  • Research
  • Health Information Science and Systems


Patient representation learning aims to encode meaningful information from a patient’s Electronic Health Records (EHR) as a mathematical representation. Recent advances in deep learning have given patient representation learning methods greater representational power, and the learned representations significantly improve the performance of disease prediction models. However, the inherent shortcomings of deep learning models, such as the need for massive amounts of labeled data and the lack of explainability, limit further improvement of deep learning-based patient representation learning methods. In particular, learning robust patient representations is challenging when patient data is missing or insufficient. Although data augmentation techniques can address this deficiency, the complex data processing they introduce further weakens the explainability of patient representation learning models. To address these challenges, this paper proposes Explainable and Augmented Patient Representation learning for disease prediction (EAPR). EAPR uses data augmentation controlled by a confidence interval to enhance patient representations when patient data is limited. Moreover, EAPR uses two-stage gradient backpropagation to restore the explainability that would otherwise be lost to the complex data augmentation process. Experimental results on real clinical data validate the effectiveness and explainability of the proposed approach.
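The abstract does not spell out how the confidence interval controls augmentation, so the following is only an illustrative sketch of one possible reading, not the EAPR procedure itself. It assumes that a missing feature value is filled by sampling uniformly inside a normal-approximation confidence interval for the cohort mean of that feature. The function names, the feature names (`glucose`, `sbp`), the toy values, and the choice of z = 1.96 are all hypothetical.

```python
import math
import random
import statistics


def confidence_interval(values, z=1.96):
    """Return (low, high), a normal-approximation CI for the mean of values."""
    mean = statistics.fmean(values)
    if len(values) < 2:
        return mean, mean  # a single observation gives no spread estimate
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean - half_width, mean + half_width


def augment_record(record, cohort, z=1.96, rng=None):
    """Fill missing (None) features by sampling inside the cohort's CI.

    Assumption: this is a stand-in for CI-controlled augmentation; the
    paper's actual rule is not given in the abstract.
    """
    rng = rng or random.Random(0)
    augmented = dict(record)  # leave the input record untouched
    for feature, value in record.items():
        if value is not None:
            continue
        observed = [r[feature] for r in cohort if r.get(feature) is not None]
        if not observed:
            continue  # nothing to estimate from, so the gap stays
        low, high = confidence_interval(observed, z)
        augmented[feature] = rng.uniform(low, high)
    return augmented


# Hypothetical toy cohort; values are made up for illustration only.
cohort = [
    {"glucose": 5.1, "sbp": 120.0},
    {"glucose": 6.3, "sbp": 135.0},
    {"glucose": 5.8, "sbp": None},
    {"glucose": 7.0, "sbp": 142.0},
]
patient = {"glucose": None, "sbp": 128.0}
filled = augment_record(patient, cohort)
low, high = confidence_interval([5.1, 6.3, 5.8, 7.0])
```

Under this reading, the interval bounds how far a synthetic value may drift from what the cohort supports: a narrower interval (more observations or lower variance) yields more conservative augmented values.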



This research is partially supported by the National Key R&D Program of China (No. 2021YFF0900800), NSFC (No. 62202279), the Shandong Provincial Natural Science Foundation (No. ZR2022QF018), the Shandong Provincial Outstanding Youth Science Foundation (No. 2023HWYQ-039), and the Fundamental Research Funds of Shandong University.

Author information


Corresponding authors

Correspondence to Jiancheng Zhang, Yonghui Xu or Yang Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests or potential conflict.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, J., Xu, Y., Ye, B. et al. EAPR: explainable and augmented patient representation learning for disease prediction. Health Inf Sci Syst 11, 53 (2023).
