Data structuring of electronic health records: a systematic review

  • Review Paper
  • Health and Technology

Abstract

The medical field has undergone a series of transformations with the adoption of new technologies. One aspect that has changed significantly is how patient information is stored. Electronic health records have brought many advantages but still present several issues, one of which is the degree to which the information they contain is structured. More structure yields richer information; conversely, when most information is stored as free text (unstructured information), the content is more complex and more challenging to process. Consequently, many studies on structuring the information contained in free text have emerged. This work reviews studies on structuring unstructured health record information, seeking to answer key questions that can guide new studies in the field: the form in which information is structured, the main techniques used, and how data are acquired for development and evaluation. To answer these questions, a broad systematic review was conducted covering the period since the emergence of BERT networks. In addition to answering these questions, the review identifies the main challenges, such as difficulty in data acquisition, problems with natural language processing, and the specific challenges faced by studies that process non-English languages, and concludes with a general view of the state of the art in the field and its future opportunities.

Notes

  1. https://scholar.google.com

  2. https://www.acm.org/dl

  3. https://ieeexplore.ieee.org

  4. https://www.sciencedirect.com

  5. https://www.springerlink.com

  6. https://www.scopus.com

  7. pubmed.ncbi.nlm.nih.gov

  8. arxiv.org

References

  1. Roehrs A, Da Costa CA, da Rosa Righi R, De Oliveira KSF. Personal health records: a systematic literature review. J Med Int Res 2017;19(1):e13.

  2. Castillo VH, Martínez-García AI, Soriano-Equigua L, Maciel-Mendoza FM, Álvarez-Flores JL, Juárez-Ramírez R. An interaction framework for supporting the adoption of ehrs by physicians. Univ Access Inf Soc. 2019;18(2):399–412.

    Article  Google Scholar 

  3. Maximilian Z, J BO, Michael M. Using openehr archetypes for automated extraction of numerical information from clinical narratives. Studies in Health Technology and Informatics 267(German Medical Data Sciences: Shaping Change Creative Solutions for Innovative Medicine) 2019;156-163. https://doi.org/10.3233/SHTI190820

  4. Tognola G, Murri A, Cuda D. Cognitive computing for the automated extraction and meaningful use of health data in narrative medical notes: An application to the clinical management of hearing impaired aged patients. In: 2018 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), 2018 IEEE, https://doi.org/10.1109%2Fbhi.2018.8333428

  5. Kormilitzin A, Vaci N, Liu Q, Nevado-Holgado A. Med7: a transferable clinical natural language processing model for electronic health records. 2020 CoRR.

  6. Kersloot MG, van Putten FJ, Abu-Hanna A, Cornet R, Arts DL. Natural language processing algorithms for mapping clinical text fragments onto ontology concepts: a systematic review and recommendations for future studies. Journal of biomedical semantics. 2020;11(1):1–21.

    Article  Google Scholar 

  7. Kim MC, Nam S, Wang F, Zhu Y. Mapping scientific landscapes in umls research: a scientometric review. J Am Med Inform Assoc. 2020;27(10):1612–24.

    Article  Google Scholar 

  8. Basyal GP, Rimal BP, Zeng D. A systematic review of natural language processing for knowledge management in healthcare. 2020. https://arxiv.org/abs/2007.09134

  9. AlShuweihi M, Salloum SA, Shaalan K. Biomedical corpora and natural language processing on clinical text in languages other than english: A systematic review. Recent Advances in Intelligent Systems and Smart Applications. 2021;491–509.

  10. Luque C, Luna JM, Luque M, Ventura S. An advanced review on text mining in medicine. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2019;9(3).

  11. Sun W, Cai Z, Liu F, Fang S, Wang G. A survey of data mining technology on electronic medical records. In: IEEE 19th International Conference on e-Health Networking. IEEE: Applications and Services (Healthcom); 2017. p. 1–6.

    Google Scholar 

  12. Sun W, Cai Z, Li Y, Liu F, Fang S, Wang G. Data processing and text mining technologies on electronic medical records: a review. Journal of healthcare engineering 2018.

  13. Yadav P, Steinbach M, Kumar V, Simon G. Mining electronic health records (ehrs) a survey. ACM Computing Surveys (CSUR). 2018;50(6):1–40.

    Article  Google Scholar 

  14. Alfattni G, Peek N, Nenadic G. Extraction of temporal relations from clinical free text: A systematic review of current approaches. Journal of Biomedical Informatics 2020;103488.

  15. Fu S, Chen D, He H, Liu S, Moon S, Peterson KJ, Shen F, Wang L, Wang Y, Wen A, et al. Clinical concept extraction: a methodology review. Journal of Biomedical Informatics 2020;103526.

  16. Kaieski N, da Costa CA, da RosaRighi R, Lora PS, Eskofier B. Application of artificial intelligence methods in vital signs analysis of hospitalized patients: A systematic literature review. Appl Soft Comp, 2020;106612.

  17. Koleck TA, Dreisbach C, Bourne PE, Bakken S. Natural language processing of symptoms documented in free-text narratives of electronic health records: a systematic review. J Am Med Inform Assoc. 2019;26(4):364–79.

    Article  Google Scholar 

  18. Sheikhalishahi S, Miotto R, Dudley JT, Lavelli A, Rinaldi F, Osmani V. Natural language processing of clinical notes on chronic diseases: systematic review. JMIR Med Inform. 2019;7(2).

    Article  Google Scholar 

  19. Al-Aiad A, El-shqeirat T. Text mining in radiology reports (methodologies and algorithms), and how it affects on workflow and supports decision making in clinical practice (systematic review). In: 2020 11th International Conference on Information and Communication Systems (ICICS), IEEE, 2020;283–287.

  20. Luo JW, Chong JJ. Review of natural language processing in radiology. Neuroimaging Clinics. 2020;30(4):447–58.

    Article  Google Scholar 

  21. Colmenarejo G. Machine learning models to predict childhood and adolescent obesity: A review. Nutrients. 2020;12(8):2466.

    Article  Google Scholar 

  22. Kreimeyer K, Foster M, Pandey A, Arya N, Halford G, Jones SF, Forshee R, Walderhaug M, Botsis T. Natural language processing systems for capturing and standardizing unstructured clinical information: a systematic review. J Biomed Inform. 2017;73:14–29.

    Article  Google Scholar 

  23. Percha B. Modern clinical text mining: A guide and review. Preprints 2021.

  24. Shickel B, Tighe PJ, Bihorac A, Rashidi P. Deep ehr: a survey of recent advances in deep learning techniques for electronic health record (ehr) analysis. IEEE J Biomed Health Inform. 2017;22(5):1589–604.

    Article  Google Scholar 

  25. Spasic I, Nenadic G. Clinical text data in machine learning: Systematic review. JMIR Med Inform. 2020;8(3).

    Article  Google Scholar 

  26. Gubert LC, da Costa CA, da Rosa Righi R. Context awareness in healthcare: a systematic literature review. Univ Access Inf Soc. 2020;19(2):245–59.

    Article  Google Scholar 

  27. Budgen D, Brereton P. Performing systematic literature reviews in software engineering. In: Proceedings of the 28th International Conference on Software Engineering, Association for Computing Machinery, New York, NY, USA, ICSE ’06, 2006;1051-1052. https://doi.org/10.1145/1134285.1134500

  28. Keele S, et al. Guidelines for performing systematic literature reviews in software engineering. Tech. rep., Technical report, Ver. 2.3 EBSE Tech Rep. EBSE 2007. 

  29. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I. Attention is all you need. 2017, arXiv preprint arXiv:170603762

  30. Devlin J, Chang M, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. CoRR. 2018. http://arxiv.org/abs/1810.04805

  31. Alsentzer E, Murphy JR, Boag W, Weng WH, Jin D, Naumann T, McDermott MBA. Publicly available clinical bert embeddings. 2019. https://arxiv.org/abs/1904.03323.

  32. Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019;4171–4186, https://www.aclweb.org/anthology/N19-1423

  33. Qin X, Liu J, Wang Y, Liu Y, Deng K, Ma Y, Zou K, Li L, Sun X. Natural language processing was effective in assisting rapid title and abstract screening when updating systematic reviews. J Clin Epidemiol. 2021.

  34. El Rifai O, Biotteau M, de Boissezon X, Megdiche I, Ravat F, Teste O. Blockchain-based personal health records for patients’ empowerment. In: Dalpiaz F, Zdravkovic J, Loucopoulos P, editors. Research Challenges in Information Science. Cham: Springer International Publishing, 2020;455–71.

    Chapter  Google Scholar 

  35. Reza F, Prieto JT, Julien SP. Electronic Health Records: Origination, Adoption, and Progression, Springer International Publishing, Cham, 2020;183–201. https://doi.org/10.1007/978-3-030-41215-9_11

  36. Syed L, Jabeen S, Manimala S. Telemammography: A Novel Approach for Early Detection of Breast Cancer Through Wavelets Based Image Processing and Machine Learning Techniques, 2018;149–183. https://doi.org/10.1007/978-3-319-63754-9_8

  37. Feature Extraction Method from Electronic Health Records in Russia, FRUCT Oy. 2020. https://doi.org/10.5281/zenodo.4007408

  38. Amin S, Neumann G, Dunfield K, Vechkaeva A, Chapman KA, Wixted MK. Mlt-dfki at clef ehealth 2019: Multi-label classification of icd-10 codes with bert. In: CLEF (Working Notes). 2019.

  39. Blanco A, Casillas A, Pérez A, deIlarraza AD. Multi-label clinical document classification: Impact of label-density. Expert Systems with Applications 2019;138:112835. https://doi.org/10.1016%2Fj.eswa.2019.112835

  40. Breischneider C, Zillner S, Hammon M, Gass P, Sonntag D. Automatic extraction of breast cancer information from clinical reports. In: 2017 IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS), IEEE. 2017. https://doi.org/10.1109%2Fcbms.2017.138

  41. Cai T, Zhou Y, Zheng H. Cost-quality adaptive active learning for chinese clinical named entity recognition. 2020a. arXiv:200812548

  42. Cai T, Zhou Y, Zheng H. Cost-quality adaptive active learning for chinese clinical named entity recognition. 2020b. arXiv preprint arXiv:200812548

  43. Chen R, Ho JC, Lin JMS. Extracting medication information from unstructured public health data: a demonstration on data from population-based and tertiary-based samples. BMC Medical Research Methodology 2020;20(1). https://doi.org/10.1186/s12874-020-01131-7

  44. Chen Y, Zhou C, Li T, Wu H, Zhao X, Ye K, Liao J. Named entity recognition from chinese adverse drug event reports with lexical feature based bilstm-crf and tri-training. Journal of Biomedical Informatics 2019b;96:103252. http://www.sciencedirect.com/science/article/pii/S1532046419301716

  45. Chowdhury S, Dong X, Qian L, Li X, Guan Y, Yang J, Yu Q. A multitask bi-directional RNN model for named entity recognition on chinese electronic medical records. BMC Bioinformatics. 2018;19(S17). https://doi.org/10.1186%2Fs12859-018-2467-9

  46. Dai HJ, Su CH, Wu CS. Adverse drug event and medication extraction in electronic health records via a cascading architecture with different sequence labeling models and word embeddings. J Am Med Inform Asso 2019;27(1):47–55. https://doi.org/10.1093%2Fjamia%2Focz120

  47. Dong X, Chowdhury S, Qian L, Li X, Guan Y, Yang J, Yu Q. Deep learning for named entity recognition on chinese electronic medical records: Combining deep transfer learning with multitask bi-directional LSTM RNN. PLOS ONE. 2019;14(5):e0216046. https://doi.org/10.1371%2Fjournal.pone.0216046

  48. Du M, Pang M, Xu B. Multi-task learning for attribute extraction from unstructured electronic medical records. In: Wang X, Lisi FA, Xiao G, Botoeva E, editors. Semantic Technology. Singapore: Springer Singapore; 2020. p. 117–28.

    Chapter  Google Scholar 

  49. Huang HL, Hong SH, Tsai YC. Approaches to text mining for analyzing treatment plan of quit smoking with free-text medical records: A prisma-compliant meta-analysis. Medicine 2020;99(29).

  50. Ji J, Chen B, Jiang H (2020) Fully-connected LSTM–CRF on medical concept extraction. International Journal of Machine Learning and Cybernetics 11(9):1971–1979. https://doi.org/10.1007/s13042-020-01087-6

  51. Jouffroy J, Feldman SF, Lerner I, Rance B, Burgun A, Neuraz A. MedExt: combining expert knowledge and deep learning for medication extraction from french clinical texts (preprint). JMIR Med Inform 10.2196/preprints.17934, URL, 2020. https://doi.org/10.2196%2Fpreprints.17934

  52. Kraljevic Z, Searle T, Shek A, Roguski L, Noor K, Bean D, Mascio A, Zhu L, Folarin AA, Roberts A, et al. Multi-domain clinical natural language processing with medcat: the medical concept annotation toolkit. 2020; arXiv preprint arXiv:201001165

  53. Lee W, Kim K, Lee EY, Choi J. Conditional random fields for clinical named entity recognition: A comparative study using korean clinical texts. Computers in Biology and Medicine 2018;101:7–14. http://www.sciencedirect.com/science/article/pii/S0010482518302105

  54. Lerner I, Paris N, Tannier X. Terminologies augmented recurrent neural network model for clinical named entity recognition. J Biomed Inform 2020;102:103356. http://www.sciencedirect.com/science/article/pii/S1532046419302734

  55. Li Y, Du G, Xiang Y, Li S, Ma L, Shao D, Wang X, Chen H. Towards chinese clinical named entity recognition by dynamic embedding using domain-specific knowledge. J Biomed Inform 2020a;106:103435. https://doi.org/10.1016%2Fj.jbi.2020.103435

  56. Li Y, Wang X, Hui L, Zou L, Li H, Xu L, Liu W. Chinese clinical named entity recognition in electronic medical records: Development of a lattice long short-term memory model with contextualized character representations. JMIR Medical Inform 2020b;8(9):e19848. https://doi.org/10.2196%2F19848

  57. Liu K, Hu Q, Liu J, Xing C. Named entity recognition in chinese electronic medical records based on CRF. In: 2017 14th Web Information Systems and Applications Conference (WISA), IEEE, 2017. https://doi.org/10.1109%2Fwisa.2017.8

  58. Lopes F, Teixeira C, Oliveira HG. Named entity recognition in portuguese neurology text using crf. In: EPIA Conference on Artificial Intelligence, Springer, 2019;336–348

  59. Lopes F, Teixeira C, Oliveira HG. Comparing different methods for named entity recognition in portuguese neurology text. J Med Systems 2020;44(4). https://doi.org/10.1007%2Fs10916-020-1542-8

  60. Lu N, Zheng J, Wu W, Yang Y, Chen K, Hu W. Chinese clinical named entity recognition with word-level information incorporating dictionaries. In: 2019 International Joint Conference on Neural Networks (IJCNN), IEEE, 2019. https://doi.org/10.1109%2Fijcnn.2019.8852113

  61. Maguire FB, Morris CR, Parikh-Patel A, Cress RD, Keegan THM, Li CS, Lin PS, Kizer KW. A text-mining approach to obtain detailed treatment information from free-text fields in population-based cancer registries: A study of non-small cell lung cancer in california. PLOS ONE 2019;14(2):e0212454, https://doi.org/10.1371%2Fjournal.pone.0212454

  62. Nuthakki S, Neela S, Gichoya JW, Purkayastha S. Natural language processing of mimic-iii clinical notes for identifying diagnosis and procedures with neural networks, 2019. arXiv preprint arXiv:191212397

  63. Ohno-Machado L, Séroussi B. Automatic methods to extract prescription status quality measures from unstructured health records. In: MEDINFO 2019: Health and Wellbeing e-Networks for All: Proceedings of the 17th World Congress on Medical and Health Informatics, IOS Press, 2019;264(15).

  64. Pérez A, Weegar R, Casillas A, Gojenola K, Oronoz M, Dalianis H. Semi-supervised medical entity recognition: A study on spanish and swedish clinical corpora. J Biomed Info 2017;71:16–30, https://doi.org/10.1016%2Fj.jbi.2017.05.009

  65. Raiskin Y, Eickhoff C, Beeler PE. Categorization of free-text drug orders using character-level recurrent neural networks. Int J Med Info 2019;129:20–28. https://doi.org/10.1016%2Fj.ijmedinf.2019.05.020

  66. Schneider ETR, deSouza JVA, Knafou J, eOliveira LES, Copara J, Gumiel YB, deOliveira LFA, Paraiso EC, Teodoro D, Barra CMCM. Biobertpt-a portuguese neural language model for clinical named entity recognition. In: Proceedings of the 3rd Clin Natural Language Process Workshop, 2020;65–72.

  67. Sen C, Hartvigsen T, Kong X, Rundensteiner E. Patient-level classification on clinical note sequences guided by attributed hierarchical attention. In: 2019 IEEE International Conference on Big Data (Big Data), IEEE, 2019. https://doi.org/10.1109%2Fbigdata47090.2019.9006403

  68. Sharma B, Dligach D, Swope K, Salisbury-Afshar E, Karnik NS, Joyce C, Afshar M. Publicly available machine learning models for identifying opioid misuse from the clinical notes of hospitalized patients. BMC Med Info Dec Making, 2020;20(1). https://doi.org/10.1186/s12911-020-1099-y

  69. Soriano IM, Peña JLC. Automatic medical concept extraction from free text clinical reports, a new named entity recognition approach. Int J Computers, 2017;2.

  70. Spandorfer A, Branch C, Sharma P, Sahbaee P, Schoepf UJ, Ravenel JG, Nance JW. Deep learning to convert unstructured ct pulmonary angiography reports into structured reports. European radiology experimental. 2019;3(1):37.

    Article  Google Scholar 

  71. Steinkamp JM, Bala W, Sharma A, Kantrowitz JJ. Task definition, annotated dataset, and supervised natural language processing models for symptom extraction from unstructured clinical notes. J Biomed Info. 2020;102:103354. https://doi.org/10.1016%2Fj.jbi.2019.103354

  72. Symeonidou A, Sazonau V, Groth P. Transfer learning for biomedical named entity recognition with biobert. In: SEMANTICS Posters & Demos. 2019.

  73. Tarcar AK, Tiwari A, Rao D, Dhaimodker VN, Rebelo P, Desai R. Healthcare ner models using language model pretraining. In: Proceedings of the 13th International Conference on Web Search and Data Mining, WSDM ’20, 2020;12–18.

  74. Wang Q, Zeng L. Chinese symptom component recognition via bidirectional lstm-crf. In: 2018 Tenth International Conference on Advanced Computational Intelligence (ICACI), 2018;45–50. https://doi.org/10.1109/ICACI.2018.8377564

  75. Wang R, Zhao J, Peng L, Yang B, Wang L, Li B. Medical entity recognition of esophageal carcinoma based on word clustering. In: 2018 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), IEEE, 2018a. https://doi.org/10.1109%2Fspac46244.2018.8965515

  76. Wang S, Ma S, Chen M, Wei M, Yu G. A childhood disease database based on word segmentation technology: Research and practice. In: 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC), IEEE, 2018b. https://doi.org/10.1109%2Fcompsac.2018.10269

  77. Wang S, Pang M, Pan C, Yuan J, Xu B, Du M, Zhang H. Information extraction for intestinal cancer electronic medical records. IEEE Access, 2020;8:125923–125934. https://doi.org/10.1109/access.2020.3005684

  78. Weegar R, Perez A, Casillas A, Oronoz M. Deep medical entity recognition for swedish and spanish. In: 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE, 2018. https://doi.org/10.1109%2Fbibm.2018.8621282

  79. Weeks HL, Beck C, McNeer E, Williams ML, Bejan CA, Denny JC, Choi L. medExtractR: A targeted, customizable approach to medication extraction from electronic health records. J Am Med Inform Association, 2020;27(3):407–418, 10.1093/jamia/ocz207. https://doi.org/10.1093%2Fjamia%2Focz207

  80. Wunnava S, Qin X, Kakar T, Sen C, Rundensteiner EA, Kong X. Adverse drug event detection from electronic health records using hierarchical recurrent neural networks with dual-level embedding. Drug Safety, 2019;42(1):113–122. https://doi.org/10.1007/s40264-018-0765-9

  81. Yang T, Jiang D, Shi S, Zhan S, Zhuo L, Yin Y, Liang Z. Chinese data extraction and named entity recognition. In: 2020 5th IEEE International Conference on Big Data Analytics (ICBDA), IEEE, 2020. https://doi.org/10.1109%2Ficbda49040.2020.9101204

  82. Yin M, Mou C, Xiong K, Ren J. Chinese clinical named entity recognition with radical-level feature and self-attention mechanism. J Biomed Inform, 2019;98:103289. https://doi.org/10.1016%2Fj.jbi.2019.103289

  83. Zhang T, Wang Y, Wang X, Yang Y, Ye Y. Constructing fine-grained entity recognition corpora based on clinical records of traditional chinese medicine. BMC Medical Informatics and Decision Making 2020;20(1) https://doi.org/10.1186/s12911-020-1079-2

  84. Zhang Y, Wang X, Hou Z, Li J. Clinical named entity recognition from chinese electronic health records via machine learning methods. JMIR medical informatics, 2018b;6(4):e50.

  85. Zhao B. Clinical data extraction and normalization of cyrillic electronic health records via deep-learning natural language processing. JCO Clin Canc Inform, 2019;(3):1–9. https://doi.org/10.1200%2Fcci.19.00057

  86. Almeida JR, Matos S. Rule-based extraction of family history information from clinical notes. In: Proceedings of the 35th Annual ACM Symposium on Applied Computing, ACM. 2020. https://doi.org/10.1145%2F3341105.3374000

  87. Alodadi MS, Janeja VP. Clinical entities association rules (CLEAR): Untangling clinical notes in electronic health records. In: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE. 2019. https://doi.org/10.1109%2Fbibm47256.2019.8983140

  88. Balabaeva K, Kovalchuk S. Experiencer detection and automated extraction of a family disease tree from medical texts in russian language. In: Krzhizhanovskaya VV, Závodszky G, Lees MH, Dongarra JJ, Sloot PMA, Brissos S, Teixeira J, editors. Computational Science - ICCS 2020. Cham: Springer International Publishing, 2020;603–12.

    Chapter  Google Scholar 

  89. Boytcheva S. Indirect association rules mining in clinical texts. In: International Conference on Artificial Intelligence: Methodology, Systems, and Applications, Springer, 2018;36–47.

  90. Cai T, Zhang L, Yang N, Kumamaru KK, Rybicki FJ, Cai T, Liao KP. EXTraction of EMR numerical data: an efficient and generalizable tool to EXTEND clinical research. BMC Medical Informatics and Decision Making, 2019;19(1). https://doi.org/10.1186/s12911-019-0970-1

  91. Chen L, Song L, Shao Y, Li D, Ding K. Using natural language processing to extract clinically useful information from chinese electronic medical records. Int J Med Inform, 2019a;124:6–12. http://www.sciencedirect.com/science/article/pii/S138650561830594X

  92. Cheng M, Li L, Ren Y, Lou Y, Gao J. A hybrid method to extract clinical information from chinese electronic medical records. IEEE Access, 2019;7:70624–70633. https://doi.org/10.1109%2Faccess.2019.2919121

  93. Dandala B, Joopudi V, Tsou C, Liang JJ, Suryanarayanan P. Extraction of information related to drug safety surveillance from electronic health record notes: Joint modeling of entities and relations using knowledge-aware neural attentive models. JMIR Medical Informatics. 2020;8(7). www.scopus.com

  94. Fonferko-Shadrach B, Lacey AS, Roberts A, Akbari A, Thompson S, Ford DV, Lyons RA, Rees MI, Pickrell WO. Using natural language processing to extract structured epilepsy data from unstructured clinic letters: development and validation of the exect (extraction of epilepsy clinical text) system. BMJ Open, 2019;9(4). https://bmjopen.bmj.com/content/9/4/e023232

  95. Iqbal E, Mallah R, Rhodes D, Wu H, Romero A, Chang N, Dzahini O, Pandey C, Broadbent M, Stewart R, Dobson RJB, Ibrahim ZM. ADEPt, a semantically-enriched pipeline for extracting adverse drug events from free-text electronic health records. PLOS ONE, 2017;12(11):e0187121. https://doi.org/10.1371/journal.pone.0187121

  96. Kersloot MG, Lau F, Abu-Hanna A, Arts DL, Cornet R. Automated SNOMED CT concept and attribute relationship detection through a web-based implementation of cTAKES. J Biomed Semantics, 2019;10(1). https://doi.org/10.1186/s13326-019-0207-3

  97. Lamy M, Pereira R, Ferreira JC, Melo F, Velez I. Extracting clinical knowledge from electronic medical records. Extracting clinical knowledge from electronic medical records, 2018a;(3):488–493.

  98. Leiter RE, Santus E, Jin Z, Lee KC, Yusufov M, Chien I, Ramaswamy A, Moseley ET, Qian Y, Schrag D, Lindvall C. Deep natural language processing to identify symptom documentation in clinical notes for patients with heart failure undergoing cardiac resynchronization therapy. J Pain and Symp Manage, 2020;60(5):948–958.e3. http://www.sciencedirect.com/science/article/pii/S0885392420305248

  99. Li P, Yuan Z, Tu W, Yu K, Lu D. Medical knowledge extraction and analysis from electronic medical records using deep learning. Chinese Med Sci J, 2019;34(2):133–139. http://www.sciencedirect.com/science/article/pii/S1001929419300355

  100. Li Z, Li C, Long Y, Wang X. A system for automatically extracting clinical events with temporal information. BMC Medical Informatics and Decision Making, 2020c;20(1). https://doi.org/10.1186/s12911-020-01208-9

  101. Liu S, Pan X, Chen B, Gao D, Hao T. An automated approach for clinical quantitative information extraction from chinese electronic medical records. In: Health Information Science, Springer International Publishing, 2018;98–109. https://doi.org/10.1007%2F978-3-030-01078-2_9

  102. Luo Y. Recurrent neural networks for classifying relations in clinical notes. J Biomed Inform, 2017;72:85–95, https://doi.org/10.1016%2Fj.jbi.2017.07.006

  103. Luo Y, Cheng Y, Uzuner O, Szolovits P, Starren J. Segment convolutional neural networks (seg-cnns) for classifying relations in clinical notes. J Am Med Inform Assoc. 2018;25(1):93–8.

    Article  Google Scholar 

  104. Munkhdalai T, Liu F, Yu H. Clinical relation extraction toward drug safety surveillance using electronic health record narratives: Classical learning versus deep learning. JMIR Public Health and Surveillance, 2018;4(2):e29. https://doi.org/10.2196%2Fpublichealth.9361

  105. Natarajan S, Bangera V, Khot T, Picado J, Wazalwar A, Costa VS, Page D, Caldwell M. Markov logic networks for adverse drug event extraction from text. Knowl Info Syst, 2016;51(2):435–457. https://doi.org/10.1007/s10115-016-0980-6

  106. Peterson KJ, Liu H. Automating the transformation of free-text clinical problems into snomed ct expressions. AMIA Summits on Translational Science Proceedings. 2020;2020:497.

    Google Scholar 

  107. Sagheb E, Ramazanian T, Tafti AP, Fu S, Kremers WK, Berry DJ, Lewallen DG, Sohn S, Kremers HM. Use of natural language processing algorithms to identify common data elements in operative notes for knee arthroplasty. The J Arthroplasty. 2020. 

  108. Shah S, Luo X, Kanakasabai S, Tuason R, Klopper G. Neural networks for mining the associations between diseases and symptoms in clinical notes. Health Info Sci Syst 2018;7(1). https://doi.org/10.1007%2Fs13755-018-0062-0

  109. Shi X, Jiang D, Huang Y, Wang X, Chen Q, Yan J, Tang B. Family history information extraction via deep joint learning. BMC Med Info Dec Making, 2019;19(S10). https://doi.org/10.1186%2Fs12911-019-0995-5

  110. Singh G, Marshall IJ, Thomas J, Shawe-Taylor J, Wallace BC. A neural candidate-selector architecture for automatic structured clinical text annotation. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Association for Computing Machinery, New York, NY, USA, CIKM ’17, 2017;1519-1528. https://doi.org/10.1145/3132847.3132989

  111. Song B, Feng Y, Li X, Sun Z, Yang Y. Un-apriori: A novel association rule mining algorithm for unstructured emrs. In: 2017 IEEE 19th International Conference on e-Health Networking, Applications and Services (Healthcom), 2017;1–6. https://doi.org/10.1109/HealthCom.2017.8210792

  112. Steinkamp JM, Chambers C, Lalevic D, Zafar HM, Cook TS. Toward complete structured information extraction from radiology reports using machine learning. J Dig Imag 2019;32(4):554–564. https://doi.org/10.1007/s10278-019-00234-y

  113. Su J, Hu J, Jiang J, Xie J, Yang Y, He B, Yang J, Guan Y. Extraction of risk factors for cardiovascular diseases from chinese electronic medical records. Comp Meth Prog Biomed, 2019;172:1–10. https://doi.org/10.1016%2Fj.cmpb.2019.01.007

  114. Viani N, Larizza C, Tibollo V, Napolitano C, Priori SG, Bellazzi R, Sacchi L. Information extraction from italian medical reports: An ontology-driven approach. International J Med Inform, 2018;111:140- 148. http://www.sciencedirect.com/science/article/pii/S1386505617304586

  115. Yadav S, Ramteke P, Ekbal A, Saha S, Bhattacharyya P. Exploring disorder-aware attention for clinical event extraction. ACM Transactions on Multimedia Computing, Communications, and Applications, 2020b;16(1s):1–21. https://doi.org/10.1145%2F3372328

  116. Yang X, Bian J, Gong Y, Hogan WR, Wu Y. MADEx: A system for detecting medications, adverse drug events, and their relations from clinical notes. Drug Safety, 2019;42(1):123–133. https://doi.org/10.1007/s40264-018-0761-0

  117. Yehia E, Boshnak H, AbdelGaber S, Abdo A, Elzanfaly DS. Ontology-based clinical information extraction from physician’s free-text notes. J Biomed Inform. 2019;98.

    Article  Google Scholar 

  118. Zhang Z, Zhou T, Zhang Y, Pang Y. Attention-based deep residual learning network for entity relation extraction in chinese EMRs. BMC Med Inform Decision Making, 2019b;19(S2), https://doi.org/10.1186%2Fs12911-019-0769-0

  119. Kenei JK, Moso JC, Omullo ETO, Oboko R. Deep CNN with residual connections and range normalization for clinical text classification. Comp Sci Inform Tech, 2019;7(4):111–127. https://doi.org/10.13189%2Fcsit.2019.070402

  120. Moen H, Hakala K, Peltonen LM, Suhonen H, Loukasmki P, Salakoski T, Ginter F, Salanter S. Evaluation of a prototype system that automatically assigns subject headings to nursing narratives using recurrent neural network. In: Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis, Association for Computational Linguistics, 2018. https://doi.org/10.18653%2Fv1%2Fw18-5611

  121. Moen H, Hakala K, Peltonen LM, Suhonen H, Ginter F, Salakoski T, Salanter S. Supporting the use of standardized nursing terminologies with automatic subject heading prediction: a comparison of sentence-level text classification methods. J Am Med Inform Assoc. 2019;27(1):81–8. https://doi.org/10.1093/jamia/ocz150.

    Article  Google Scholar 

  122. Moen H, Hakala K, Peltonen LM, Matinolli HM, Suhonen H, Terho K, Danielsson-Ojala R, Valta M, Ginter F, Salakoski T, Salanter S. Assisting nurses in care documentation: from automated sentence classification to coherent document structures with subject headings. J Biomed Seman, 2020;11(1) https://doi.org/10.1186%2Fs13326-020-00229-7

  123. Wu PH, Yu A, Tsai CW, Koh JL, Kuo CC, Chen ALP. Keyword extraction and structuralization of medical reports. Health Information Science and Systems, 2020;8(1). https://doi.org/10.1007/s13755-020-00108-6

  124. Zhang R, Chu F, Chen D, Shang X. A text structuring method for Chinese medical text based on temporal information. Int J Environ Res Pub Health, 2018a;15(3). https://www.mdpi.com/1660-4601/15/3/402

  125. Mansouri A, Affendey LS, Mamat A. Named entity recognition approaches. International Journal of Computer Science and Network Security. 2008;8(2):339–44.

  126. Soriano IM, Castro J. DNER clinical (named entity recognition) from free clinical text to SNOMED-CT concept. WSEAS Trans Comput. 2017;16:83–91.

  127. Han X, Ruonan R. The method of medical named entity recognition based on semantic model and improved SVM-KNN algorithm. In: 2011 Seventh International Conference on Semantics, Knowledge and Grids. IEEE; 2011. p. 21–7.

  128. Kundeti SR, Vijayananda J, Mujjiga S, Kalyan M. Clinical named entity recognition: Challenges and opportunities. In: 2016 IEEE International Conference on Big Data (Big Data), 2016;1937–1945. https://doi.org/10.1109/BigData.2016.7840814

  129. Saripalle R, Sookhak M, Haghparast M. An interoperable UMLS terminology service using FHIR. Future Internet. 2020;12(11):199.

  130. Browne AC, Divita G, Aronson AR, McCray AT. UMLS language and vocabulary tools: AMIA 2003 open source expo. In: AMIA Annual Symposium Proceedings, American Medical Informatics Association, 2003;798.

  131. World Health Organization. The ICD-10 classification of mental and behavioural disorders: diagnostic criteria for research, vol 2. World Health Organization; 1993.

  132. Donnelly K. SNOMED-CT: The advanced terminology and coding system for eHealth. Studies in Health Technology and Informatics. 2006;121:279.

  133. Liu S, Ma W, Moore R, Ganesan V, Nelson S. RxNorm: prescription for electronic drug information exchange. IT Professional. 2005;7(5):17–23.

  134. Zeng Z, Deng Y, Li X, Naumann T, Luo Y. Natural language processing for EHR-based computational phenotyping. IEEE/ACM Trans Comput Biol Bioinf. 2018;16(1):139–53.

  135. Savova GK, Masanz JJ, Ogren PV, Zheng J, Sohn S, Kipper-Schuler KC, Chute CG. Mayo clinical text analysis and knowledge extraction system (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc. 2010;17(5):507–13.

  136. Aronson AR, Lang FM. An overview of MetaMap: historical perspective and recent advances. J Am Med Inform Assoc. 2010;17(3):229–36.

  137. Xu H, Stenner SP, Doan S, Johnson KB, Waitman LR, Denny JC. MedEx: a medication information extraction system for clinical narratives. J Am Med Inform Assoc. 2010;17(1):19–24.

  138. Osborne JD, Gyawali B, Solorio T. Evaluation of YTEX and MetaMap for clinical concept recognition. 2014, arXiv preprint arXiv:1402.1668

  139. Gorrell G, Song X, Roberts A. Bio-YODIE: A named entity linking system for biomedical text. 2018, arXiv preprint arXiv:1811.04860

  140. Soysal E, Wang J, Jiang M, Wu Y, Pakhomov S, Liu H, Xu H. CLAMP - a toolkit for efficiently building customized clinical natural language processing pipelines. J Am Med Inform Assoc. 2018;25(3):331–6.

  141. Friedman C, Shagina L, Lussier Y, Hripcsak G. Automated encoding of clinical documents based on natural language processing. J Am Med Inform Assoc. 2004;11(5):392–402.

  142. Neumann M, King D, Beltagy I, Ammar W. ScispaCy: Fast and robust models for biomedical natural language processing. 2019, arXiv preprint arXiv:1902.07669

  143. Tibbo ME, Wyles CC, Fu S, Sohn S, Lewallen DG, Berry DJ, Kremers HM. Use of natural language processing tools to identify and classify periprosthetic femur fractures. J Arthroplasty. 2019;34(10):2216–9.

  144. Névéol A, Dalianis H, Velupillai S, Savova G, Zweigenbaum P. Clinical natural language processing in languages other than English: opportunities and challenges. Journal of Biomedical Semantics. 2018;9(1):12.

  145. Fu TJ, Li PH, Ma WY. GraphRel: Modeling text as relational graphs for joint entity and relation extraction. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019;1409–1418.

  146. Yadav S, Ramesh S, Saha S, Ekbal A. Relation extraction from biomedical and clinical text: Unified multitask learning framework. IEEE/ACM Transac Comput Biol Bioinform, 2020a. 

  147. Uzuner O, Solti I, Xia F, Cadag E. Community annotation experiment for ground truth generation for the i2b2 medication challenge. J Am Med Inform Assoc. 2010;17(5):519–523. https://doi.org/10.1136/jamia.2010.004200, https://academic.oup.com/jamia/article-pdf/17/5/519/5940619/17-5-519.pdf

  148. Boshnak H, AbdelGaber S, Abdo A, Yehia E. Ontology-based knowledge modelling for clinical data representation in electronic health records. Int J Comp Sci Info Sec (IJCSIS) 2018;16(10).

  149. Tomanek K, Wermter J, Hahn U. Sentence and token splitting based on conditional random fields. In: Proceedings of the 10th Conference of the Pacific Association for Computational Linguistics, Citeseer, 2007;49–57.

  150. Deshpande S, Palshikar GK, Athiappan G. An unsupervised approach to sentence classification. In: COMAD, 2020;88

  151. Cameron S, Turtle-Song I. Learning to write case notes using the SOAP format. Journal of Counseling & Development. 2002;80(3):286–92.

  152. Gallant SI. Neural network learning and expert systems. MIT Press, 1993.

  153. Dash S, Tripathy BK, Rahman A. Handbook of Research on Modeling, Analysis, and Application of Nature-Inspired Metaheuristic Algorithms. 1st ed. USA: IGI Global, 2017.

  154. Lafferty J, McCallum A, Pereira FC. Conditional random fields: Probabilistic models for segmenting and labeling sequence data, 2001.

  155. Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 2019;36(4):1234–1240, https://doi.org/10.1093/bioinformatics/btz682, https://academic.oup.com/bioinformatics/article-pdf/36/4/1234/32527770/btz682.pdf

  156. Zhang H, Candido E, Wilton AS, Duchen R, Jaakkimainen L, Wodchis W, Morris Q. Identifying transitional high cost users from unstructured patient profiles written by primary care physicians. In: Biocomputing 2020, World Sci, 2019a. https://doi.org/10.1142/9789811215636_0012

  157. Bampa M, Dalianis H. Detecting adverse drug events from Swedish electronic health records using text mining. In: Proceedings of the LREC 2020 Workshop on Multilingual Biomedical Text Processing (MultilingualBIO), 2020;1–8.

  158. Lamy M, Pereira R, Ferreira JC, Vasconcelos JB, Melo F, Velez I. Extracting clinical information from electronic medical records. In: Int Symp on Amb Intel, Springer, 2018b;113–120.

  159. Blinov P, Avetisian M, Kokh V, Umerenkov D, Tuzhilin A. Predicting clinical diagnosis from patients electronic health records using BERT-based neural networks. In: Artificial Intelligence in Medicine, Springer International Publishing, 2020;111–121. https://doi.org/10.1007/978-3-030-59137-3_11

  160. Feng J, Shaib C, Rudzicz F. Explainable clinical decision support from text. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020;1478–1489.

  161. Singh AKB, Guntu M, Bhimireddy AR, Gichoya JW, Purkayastha S. Multi-label natural language processing to identify diagnosis and procedure codes from MIMIC-III inpatient notes. 2020, arXiv preprint arXiv:2003.07507

Acknowledgements

The authors would like to thank the Coordination for the Improvement of Higher Education Personnel - CAPES (Financial Code 001), the National Council for Scientific and Technological Development - CNPq (Grant number 309537/2020-7), and the Instituto Federal de Educação, Ciência e Tecnologia do Rio Grande do Sul (IFRS) for their support of this work.

Funding

The article was partially funded by the Coordination for the Improvement of Higher Education Personnel - CAPES (Financial Code 001), the National Council for Scientific and Technological Development - CNPq (Grant number 309537/2020-7), and the Instituto Federal de Educação, Ciência e Tecnologia do Rio Grande do Sul (IFRS).

Author information

Corresponding author

Correspondence to Cristiano André da Costa.

Ethics declarations

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

de Oliveira, J.M., da Costa, C.A. & Antunes, R.S. Data structuring of electronic health records: a systematic review. Health Technol. 11, 1219–1235 (2021). https://doi.org/10.1007/s12553-021-00607-w

  • DOI: https://doi.org/10.1007/s12553-021-00607-w
