
Extracting and structuring information from the electronic medical text: state of the art and trendy directions

Multimedia Tools and Applications

Abstract

In the medical field, doctors must build comprehensive knowledge by reading and writing narrative documents, and they are responsible for every decision they make for their patients. Unfortunately, reading all the necessary information about drugs, diseases, and patients can be time-consuming because the number of documents grows every day. As a result, the risk of medical errors rises, and such errors can be hazardous. Information extraction can address this problem through several tasks that structure the text and extract the relevant, desired information from unstructured text written in natural language. The principal tasks are named entity recognition and relation extraction. To process narrative text, however, natural language processing techniques are needed to extract useful information and features. In this paper, we present and discuss several techniques and useful data resources used for these tasks. Furthermore, we outline the challenges of information extraction from medical documents. To our knowledge, this is the most comprehensive survey in the literature that includes a numerical comparison, and it suggests several directions not yet covered.
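The two principal tasks named above can be illustrated with a deliberately simple sketch. It assumes a small hand-built lexicon: the terms, the `DRUG`/`DISEASE` labels, and the helper names `recognize_entities` and `extract_relations` are all hypothetical and not taken from the survey. Dictionary-based matching stands in for named entity recognition, and sentence-level co-occurrence stands in for relation extraction; practical systems generally rely on learned models (for example CRFs or pretrained transformers) rather than exact string matching.

```python
import re
from itertools import combinations

# Hypothetical toy lexicon for illustration only (not from the article).
LEXICON = {
    "aspirin": "DRUG",
    "metformin": "DRUG",
    "headache": "DISEASE",
    "type 2 diabetes": "DISEASE",
}


def recognize_entities(text, lexicon=LEXICON):
    """Dictionary-based NER: return sorted (start, end, surface, label) spans."""
    found = []
    lowered = text.lower()
    # Try longer terms first so multi-word terms win over overlapping shorter ones.
    for term in sorted(lexicon, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            start, end = m.span()
            # Skip a match that overlaps a span already accepted.
            if any(start < e and s < end for s, e, _, _ in found):
                continue
            found.append((start, end, text[start:end], lexicon[term]))
    return sorted(found)


def extract_relations(text, lexicon=LEXICON):
    """Co-occurrence RE: pair every DRUG with every DISEASE in the same sentence."""
    relations = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        entities = recognize_entities(sentence, lexicon)
        for a, b in combinations(entities, 2):
            if {a[3], b[3]} != {"DRUG", "DISEASE"}:
                continue
            drug, disease = (a, b) if a[3] == "DRUG" else (b, a)
            relations.append((drug[2], "DRUG-DISEASE", disease[2]))
    return relations


if __name__ == "__main__":
    note = ("Patient with type 2 diabetes was started on metformin. "
            "Aspirin was given for headache.")
    print(recognize_entities(note))
    print(extract_relations(note))
```

Sorting the lexicon by length lets a multi-word term such as "type 2 diabetes" take precedence over any shorter overlapping entry. Co-occurrence favours recall over precision: it pairs every drug with every disease in a sentence regardless of what the sentence actually says, which is one reason the learned relation classifiers discussed in this line of work are preferred in practice.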

Availability of data and materials

Not applicable.

References

  1. Akbik A, Bergmann T, Blythe D et al (2019) FLAIR: an easy-to-use framework for state-of-the-art NLP. In: Proceedings of the 2019 Conference of the north american chapter of the association for computational linguistics (Demonstrations), pp 54–59

  2. Alex B, Grover C, Tobin R et al (2019) Text mining brain imaging reports. J Biomed Semant 10(1):1–11

    Google Scholar 

  3. Angeli G, Premkumar MJJ, Manning CD (2015) Leveraging linguistic structure for open domain information extraction. In: Proceedings of the 53rd Annual meeting of the association for computational linguistics and the 7th International joint conference on natural language processing (vol 1: Long Papers), pp 344–354

  4. Apostolova E, Channin DS, Demner-Fushman D et al (2009) Automatic segmentation of clinical texts. In: 2009 Annual international conference of the IEEE engineering in medicine and biology society, IEEE, pp 5905–5908

  5. Arbabi A, Adams DR, Fidler S et al (2019) Identifying clinical terms in medical text using Ontology-Guided machine learning. JMIR Med Inform 7(2):e12,596

    Google Scholar 

  6. Aronson AR, Lang FM (2010) An overview of MetaMap: historical perspective and recent advances. J Am Med Inform Assoc 17(3):229–236

    PubMed  PubMed Central  Google Scholar 

  7. Aydar M, Bozal O, Ozbay F (2020) Neural relation extraction: a survey. arXiv e-prints pp arXiv–2007

  8. Batista DS (2018) Named-Entity evaluation metrics based on entity-level. http://www.davidsbatista.net/blog/2018/05/09/Named_Entity_Evaluation

  9. Beel J, Gipp B, Shaker A et al (2010) SciPlore xtract: extracting titles from scientific PDF documents by analyzing style information (font size). In: International conference on theory and practice of digital libraries, Springer, pp 413–416

  10. Ben Abdessalem Karaa W, Alkhammash EH, Bchir A (2021) Drug disease relation extraction from biomedical literature using NLP and machine learning. Mob Inf Syst, p 2021

  11. Berrazega I (2012) Temporal information processing: a survey. Int J Naturel Lang Comput 1(2):1–14

    Google Scholar 

  12. Bethard S, Savova G, Chen WT et al (2016) Semeval-2016 task 12: Clinical tempeval. In: Proceedings of the 10th International workshop on semantic evaluation (SemEval-2016), pp 1052–1062

  13. Bethard S, Savova G, Palmer M et al (2017) SemEval-2017 task 12: Clinical TempEval. In: Proceedings of the 11th International workshop on semantic evaluation (SemEval-2017). Association for computational linguistics, Vancouver, Canada, pp 565–572. https://doi.org/10.18653/v1/S17-2093

  14. Bhatia P, Celikkaya B, Khalilia M (2019) Joint entity extraction and assertion detection for clinical text. In: Proceedings of the 57th Conference of the association for computational linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, vol 1: Long Papers. Association for computational linguistics, pp 954–959. https://doi.org/10.18653/v1/p19-1091

  15. Bodenreider O (2004) The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Res 32(suppl_1):D267–D270

    CAS  PubMed  PubMed Central  Google Scholar 

  16. Bottou L (1999) On-line learning and stochastic approximations. Cambridge University Press, USA, pp 9–42

    Google Scholar 

  17. Bramsen P, Deshpande P, Lee YK et al (2006) Finding temporal order in discharge summaries. In: AMIA annual symposium proceedings, american medical informatics association, p 81

  18. Carrell D S, Halgrim S, Tran D T et al (2014) Using natural language processing to improve efficiency of manual chart abstraction in research: The case of breast cancer recurrence. Am J Epidemiol 179(6):749–758

    PubMed  PubMed Central  Google Scholar 

  19. Chapman W, Dowling J, Chu D (2007) ConText: an algorithm for identifying contextual features from clinical text. In: Biological, translational, and clinical language processing, pp 81–88

  20. Chapman WW, Savova GK, Zheng J et al (2012) Anaphoric reference in clinical reports: characteristics of an annotated corpus. J Biomed Inform 45(3):507–521

    PubMed  Google Scholar 

  21. Chirila OS, Chirila CB, Stoicu-Tivadar L (2019) Named entity recognition and classification for medical prospectuses. Stud Health Technol Inform 262:284–287

    PubMed  Google Scholar 

  22. Chirila OS, Chirila CB, Stoicu-Tivadar L (2019) Improving the prescription process information support with structured medical prospectuses using neural networks. Stud Health Technol Inform 264:353–357

    PubMed  Google Scholar 

  23. Cohen KB, Lanfranchi A, MJy Choi et al (2017) Coreference annotation and resolution in the colorado richly annotated full text (CRAFT) corpus of biomedical journal articles. BMC Bioinforma 18(1):1–14

    Google Scholar 

  24. Cohen KB, Verspoor K, Fort K et al (2017) The colorado richly annotated full text (craft) corpus: Multi-model annotation in the biomedical domain. In: Handbook of linguistic annotation. Springer, pp 1379–1394

  25. Dai X, Karimi S, Hachey B et al (2020) An effective transition-based model for discontinuous NER. arXiv:200413454

  26. Dai HJ, Syed-Abdul S, Chen C W et al (2015) Recognition and evaluation of clinical section headings in clinical documents using token-based formulation with conditional random fields. BioMed Research International, p 2015

  27. De Bruijn B, Cherry C, Kiritchenko S et al (2011) Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010. J Am Med Inform Assoc 18(5):557–562

    PubMed  PubMed Central  Google Scholar 

  28. Del Corro L, Gemulla R (2013) Clausie: clause-based open information extraction. In: Proceedings of the 22nd international conference on World Wide Web, pp 355–366

  29. Deléger L, Névéol A (2014) Automatic identification of document sections for designing a french clinical corpus (identification automatique de zones dans des documents pour la constitution d’un corpus médical en français) [in french]. In: TALN

  30. Deng N, Fu H, Chen X (2021) Named entity recognition of traditional chinese medicine patents based on BiLSTM-CRF. Wirel Commun Mob Comput, p 2021

  31. Devlin J, Chang M, Lee K et al (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the north american chapter of the association for computational linguistics: Human language technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, vol 1 (Long and Short Papers). Association for computational linguistics, pp 4171–4186. https://doi.org/10.18653/v1/n19-1423

  32. Donnelly K (2006) SNOMED-CT: the advanced terminology and coding system for ehealth. Stud Health Technol Inform 121:279

    PubMed  Google Scholar 

  33. Doġan RI, Leaman R, Lu Z (2014) NCBI disease corpus: a resource for disease name recognition and concept normalization. J Biomed Inform 47:1–10

    PubMed  PubMed Central  Google Scholar 

  34. drissiya El-allaly E, Sarrouti M, En-Nahnahi N et al (2022) An attentive joint model with transformer-based weighted graph convolutional network for extracting adverse drug event relation. J Biomed Inform 125(103):968

    Google Scholar 

  35. Edinger T, Demner-Fushman D, Cohen AM et al (2017) Evaluation of clinical text segmentation to facilitate cohort retrieval. In: AMIA Annual symposium proceedings, american medical informatics association, p 660

  36. Elhadad N, Pradhan S, Gorman S et al (2015) SemEval-2015 task 14: Analysis of clinical text. In: Proceedings of the 9th International workshop on semantic evaluation (SemEval, vol 2015, pp 303–310

  37. Eriksson R, Jensen P B, Frankild S et al (2013) Dictionary construction and identification of possible adverse drug events in danish clinical narrative text. J Am Med Inform Assoc 20(5):947–953

    PubMed  PubMed Central  Google Scholar 

  38. Fader A, Soderland S, Etzioni O (2011) Identifying relations for open information extraction. In: Proceedings of the 2011 conference on empirical methods in natural language processing, pp 1535–1545

  39. Ford E, Carroll JA, Smith HE et al (2016) Extracting information from the text of electronic medical records to improve case detection: a systematic review. J Am Med Inform Assoc 23(5):1007–1015

    PubMed  PubMed Central  Google Scholar 

  40. Fundel K, Küffner R, Zimmer R (2007) RelEx—Relation extraction using dependency parse trees. Bioinformatics 23(3):365–371

    CAS  PubMed  Google Scholar 

  41. Garvin JH, DuVall SL, South BR et al (2012) Automated extraction of ejection fraction for quality measurement using regular expressions in Unstructured Information Management Architecture (UIMA) for heart failure. J Am Med Inform Assoc 19(5):859–866

    PubMed  PubMed Central  Google Scholar 

  42. Ghiasvand O, Kate RJ (2018) Learning for clinical named entity recognition without manual annotations. Inform Med Unlocked 13:122–127

    Google Scholar 

  43. Goenaga I, Lahuerta X, Atutxa A et al (2021) A section identification tool: Towards HL7 CDA/CCR standardization in spanish discharge summaries. J Biomed Inf 121(103):875

    Google Scholar 

  44. Grishman R, Sundheim BM (1996) Message understanding conference-6: A brief history. In: COLING 1996 vol 1: The 16th International conference on computational linguistics

  45. Guo F, He R, Dang J (2019) Implicit discourse relation recognition via a BiLSTM-CNN architecture with dynamic chunk-based max pooling. IEEE Access 7(169):281–169,292

    Google Scholar 

  46. Hafiene N, Karoui W, Romdhane LB (2020) Influential nodes detection in dynamic social networks: A survey. Exp Syst Appl 159(113):642

    Google Scholar 

  47. Hahn U, Oleynik M (2020) Medical information extraction in the age of deep learning. Yearb Med Inform 29(01):208–220

    PubMed  PubMed Central  Google Scholar 

  48. Hallersten A, Fürst W, Mezzasalma R (2016) Physicians prefer greater detail in the biosimilar label (SmPC)–results of a survey across seven european countries. Regul Toxicol Pharmacol 77:275–281

    PubMed  Google Scholar 

  49. Hasan F, Roy A, Pan S (2020) Integrating text embedding with traditional NLP features for clinical relation extraction. In: 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), IEEE, pp 418–425

  50. Haug PJ, Wu X, Ferraro JP et al (2014) Developing a section labeler for clinical documents. In: AMIA Annual symposium proceedings, american medical informatics association, p 636

  51. He S, Sun D, Wang Z (2022) Named entity recognition for chinese marine text with knowledge-based self-attention. Multimed Tool Appl 81 (14):19,135–19,149

    Google Scholar 

  52. Henry S, Buchan K, Filannino M et al (2020) 2018 n2c2 shared task on adverse drug events and medication extraction in electronic health records. J Am Med Inform Assoc 27(1):3–12

    PubMed  Google Scholar 

  53. Hong WS, Haimovich AD, Taylor RA (2018) Predicting hospital admission at emergency department triage using machine learning. PloS one 13 (7):e0201,016

    Google Scholar 

  54. Honnibal M, Montani I (2017) spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing, to appear

  55. Hsu W, Han SX, Arnold CW et al (2015) A data-driven approach for quality assessment of radiologic interpretations. J Am Med Inform Assoc 23(e1):e152–e156

    PubMed  PubMed Central  Google Scholar 

  56. Islamaj R, Leaman R, Kim S et al (2021) NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature. Sci Data 8(1):1–12

    Google Scholar 

  57. Jagannatha A, Liu F, Liu W et al (2019) Overview of the first natural language processing challenge for extracting medication, indication, and adverse drug events from electronic health record notes (MADE 1.0). Drug Saf 42(1):99–111

    PubMed  PubMed Central  Google Scholar 

  58. Jancsary J, Matiasek J, Trost H (2008) Revealing the structure of medical dictations with conditional random fields. In: Proceedings of the 2008 Conference on empirical methods in natural language processing, pp 1–10

  59. Jaouadi M, Romdhane LB (2019) Influence maximization problem in social networks: An overview. In: 2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA), IEEE, pp 1–8

  60. Jelier R, Jenster G, Dorssers LC et al (2005) Co-occurrence based meta-analysis of scientific texts: retrieving biological relationships between genes. Bioinformatics 21(9):2049–2058

    CAS  PubMed  Google Scholar 

  61. Johnson AE, Pollard TJ, Shen L et al (2016) MIMIC-III, a freely accessible critical care database. Sci Data 3(1):1–9

    Google Scholar 

  62. Jonnalagadda SR, Adupa AK, Garg RP et al (2017) Text mining of the electronic health record: an information extraction approach for automated identification and subphenotyping of HFpEF patients for clinical trials. J Cardiovasc Transl Res 10(3):313–321

    PubMed  Google Scholar 

  63. Karlsson I, Boström H (2016) Predicting adverse drug events using heterogeneous event sequences. In: 2016 IEEE International Conference on Healthcare Informatics (ICHI), IEEE, pp 356–362

  64. Kim Y, Heider PM, Lally IR et al (2021) A hybrid model for family history information identification and relation extraction: Development and evaluation of an End-to-End information extraction system. JMIR Med Inform 9 (4):e22,797

    Google Scholar 

  65. Koleck TA, Dreisbach C, Bourne PE et al (2019) Natural language processing of symptoms documented in free-text narratives of electronic health records: a systematic review. J Am Med Inform Assoc 26(4):364–379

    PubMed  PubMed Central  Google Scholar 

  66. Komariah KS, Shin BK (2021) Medical entity recognition in twitter using conditional random fields. In: 2021 International Conference on Electronics, Information, and Communication (ICEIC), IEEE, pp 1–4

  67. Komninos A, Manandhar S (2016) Dependency based embeddings for sentence classification tasks. In: Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies, pp 1490–1500

  68. Kouni IBE, Karoui W, Romdhane LB (2021) WLNI-LPA: detecting overlapping communities in attributed networks based on label propagation process. In: Proceedings of the 16th International conference on software technologies, ICSOFT 2021, Online Streaming, July 6-8, 2021. SCITEPRESS, pp 408–416. https://doi.org/10.5220/0010605904080416

  69. Kreuzthaler M, Schulz S (2015) Detection of sentence boundaries and abbreviations in clinical narratives. BMC Medical Inform Decis Mak 15:S4–S4

    Google Scholar 

  70. Kroll H, Pirklbauer J, Ruthmann J et al (2020) A semantically enriched dataset based on biomedical NER for the COVID19 open research dataset challenge. arXiv:2005.08823

  71. Kropf S, Krücken P, Mueller W et al (2017) Structuring legacy pathology reports by openEHR archetypes to enable semantic querying. Method Inform Med 56(03):230–237

    Google Scholar 

  72. Kumar S (2017) A survey of deep learning methods for relation extraction. arXiv:170503645

  73. Lai KH, Topaz M, Goss FR et al (2015) Automated misspelling detection and correction in clinical free-text records. J Biomed Inform 55:188–195

    PubMed  Google Scholar 

  74. Lan M, Wang J, Wu Y et al (2017) Multi-task attention-based neural networks for implicit discourse relationship representation and identification. In: Proceedings of the 2017 Conference on empirical methods in natural language processing, pp 1299–1308

  75. Landolsi MY, Mohamed HH, Romdhane LB (2021) Image annotation in social networks using graph and multimodal deep learning features. Multimed Tools Appl 034(8):12,009–12

    Google Scholar 

  76. Laparra E, Su X, Zhao Y et al (2021) SemEval-2021 task 10: Source-free domain adaptation for semantic processing. In: Proceedings of the 15th International workshop on semantic evaluation (SemEval-2021). 348–356

  77. Laparra E, Xu D, Elsayed A et al (2018) SemEval 2018 task 6: Parsing time normalizations. In: SemEval@ NAACL-HLT, pp 88–96

  78. Lee W, Choi J (2018) Temporal segmentation for capturing snapshots of patient histories in korean clinical narrative. Healthc Inform Res 24(3):179–186

    PubMed  PubMed Central  Google Scholar 

  79. Lee J, Yoon W, Kim S et al (2020) BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36(4):1234–1240

    CAS  PubMed  Google Scholar 

  80. Lei J, Tang B, Lu X et al (2014) A comprehensive study of named entity recognition in chinese clinical text. J Am Med Inform Assoc 21(5):808–814

    PubMed  Google Scholar 

  81. Leroy G, Chen H (2001) Filling preposition-based templates to capture information from medical abstracts. In: Biocomputing 2002. World Scientific. 350–361

  82. Li F, Lin Z, Zhang M et al (2021) A Span-Based model for joint overlapped and discontinuous named entity recognition. arXiv:2106.14373

  83. Li Y, Lipsky Gorman S, Elhadad N (2010) Section classification in clinical notes using supervised hidden markov model. In: Proceedings of the 1st ACM International health informatics symposium, pp 744–750

  84. Li W, Shi S, Gao Z et al (2018) Improved deep belief network model and its application in named entity recognition of chinese electronic medical records. In: 2018 IEEE 3rd International Conference on Big Data Analysis (ICBDA), IEEE, pp 356–360

  85. Li J, Sun Y, Johnson RJ et al (2016) BioCreative v CDR task corpus: a resource for chemical disease relation extraction. Database, p 2016

  86. Liu F, Chen J, Jagannatha A et al (2016) Learning for biomedical information extraction: Methodological review of recent advances. arXiv:1606.07993

  87. Liu Y, Ott M, Goyal N et al (2019) RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692

  88. Liu Y, Wei L, Yao Z et al (2016) The practice and experience of emergency information system construction. Chin Digit Med 11(5):53–55

    Google Scholar 

  89. Lohr C, Luther S, Matthies F et al (2018) CDA-compliant section annotation of german-language discharge summaries: Guideline development, annotation campaign, section classification. In: AMIA 2018, American medical informatics association annual symposium, San Francisco, CA, November 3-7, 2018. AMIA

  90. Lohr C, Luther S, Matthies F et al (2018) CDA-compliant section annotation of german-language discharge summaries: guideline development, annotation campaign, section classification. In: AMIA Annual symposium proceedings, american medical informatics association, p 770

  91. Luan Y, Wadden D, He L et al (2019) A general framework for information extraction using dynamic span graphs. In: Proceedings of the 2019 Conference of the north american chapter of the association for computational linguistics: Human language technologies, vol 1 (Long and Short Papers). Association for computational linguistics, Minneapolis, Minnesota, pp 3036–3046. https://doi.org/10.18653/v1/N19-1308

  92. Ludwick DA, Doucette J (2009) Adopting electronic medical records in primary care: lessons learned from health information systems implementation experience in seven countries. Int J Med Inform 78(1):22–31

    CAS  PubMed  Google Scholar 

  93. Lupṡe O, Stoicu-Tivadar L (2018) Supporting prescriptions with synonym matching of section names in prospectuses. Stud Health Technol Inform 251:153–156

    PubMed  Google Scholar 

  94. Lupṡe O, Stoicu-Tivadar L (2018) Extracting and structuring drug information to improve e-prescription and streamline medical treatment. Appl Med Inf 40(1-2):7–14

    Google Scholar 

  95. Mabrouk O, Hlaoua L, Omri MN (2021) Exploiting ontology information in fuzzy SVM social media profile classification. Appl Intell 51(6):3757–3774

    Google Scholar 

  96. Mahendran D, McInnes BT (2021) Extracting adverse drug events from clinical notes. In: AMIA Annual symposium proceedings, american medical informatics association, p 420

  97. Mahendran D, Tang C, McInnes B (2022) Graph convolutional networks for chemical relation extraction. In: Proceedings of the semantics-enabled biomedical literature Analytics (SeBiLAn)

  98. Manning CD, Surdeanu M, Bauer J et al (2014) The stanford CoreNLP natural language processing toolkit. In: Proceedings of 52nd annual meeting of the association for computational linguistics: system demonstrations, pp 55–60

  99. Mausam SM, Bart R et al (2012) Open language learning for information extraction. In: Proceedings of the 2012 Joint conference on empirical methods in natural language processing and computational natural language learning. Association for computational linguistics, USA, EMNLP-CoNLL ’12, pp 523–534

  100. Mehrabi S, Krishnan A, Roch A M et al (2015) Identification of patients with family history of pancreatic cancer-investigation of an nlp system portability. Stud Health Technol Inform 216:604

    PubMed  PubMed Central  Google Scholar 

  101. Mercorelli L, Nguyen H, Gartell N et al (2022) A framework for de-identification of free-text data in electronic medical records enabling secondary use. Australian Health Review

  102. Meystre SM, Savova GK, Kipper-Schuler KC et al (2008) Extracting information from textual documents in the electronic health record: a review of recent research. Yearb Med Inf 17(01):128–144

    Google Scholar 

  103. Mnasri W, Azaouzi M, Romdhane LB (2021) Parallel social behavior-based algorithm for identification of influential users in social network. Appl Intell, pp 1–19

  104. Nair N, Narayanan S, Achan P et al (2022) Clinical note section identification using transfer learning. In: Proceedings of 6th International congress on information and communication technology, Springer, pp 533–542

  105. Nasar Z, Jaffry SW, Malik MK (2021) Named entity recognition and relation extraction: State-of-the-art. ACM Comput Surv (CSUR) 54(1):1–39

    Google Scholar 

  106. Nayel HA, ShashrekhaH L (2019) Integrating dictionary feature into a deep learning model for disease named entity recognition. arXiv:1911.01600

  107. Neumann M, King D, Beltagy I et al (2019) ScispaCy: fast and robust models for biomedical natural language processing. In: Proceedings of the 18th BioNLP workshop and shared task, BioNLP@ACL 2019, Florence, Italy, August 1, 2019. Association for computational linguistics, pp 319–327. https://doi.org/10.18653/v1/w19-5034

  108. Ni J, Delaney B, Florian R (2015) Fast model adaptation for automated section classification in electronic medical records. Stud Health Technol Inform 216:35–39

    PubMed  Google Scholar 

  109. Peters ME, Neumann M, Iyyer M et al (2018) Deep contextualized word representations. In: Proceedings of the 2018 Conference of the north american chapter of the association for computational linguistics: Human language technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, vol 1 (Long Papers). Association for computational linguistics, pp 2227–2237. https://doi.org/10.18653/v1/n18-1202

  110. Pomares-Quimbaya A, Kreuzthaler M, Schulz S (2019) Current approaches to identify sections within clinical narratives from electronic health records: a systematic review. BMC Med Res Methodol 19(1):155

    PubMed  PubMed Central  Google Scholar 

  111. Popejoy LL, Khalilia MA, Popescu M et al (2014) Quantifying care coordination using natural language processing and domain-specific ontology. J Am Med Inform Assoc 22(e1):e93–e103

    PubMed  PubMed Central  Google Scholar 

  112. Popovski G, Seljak BK, Eftimov T (2020) A survey of named-entity recognition methods for food information extraction. IEEE Access 8(31):586–31,594

    Google Scholar 

  113. Pradhan S, Elhadad N, Chapman WW et al (2014) SemEval-2014 task 7: Analysis of clinical text. In: SemEval@ COLING, pp 54–62

  114. Qi P, Zhang Y, Zhang Y et al (2020) Stanza: A python natural language processing toolkit for many human languages. In: Proceedings of the 58th Annual meeting of the association for computational linguistics: System Demonstrations, ACL 2020, Online, July 5-10, 2020. Association for computational linguistics, pp 101–108. https://doi.org/10.18653/v1/2020.acl-demos.14

  115. Quimbaya AP, Múnera AS, Rivera RAG et al (2016) Named entity recognition over electronic health records through a combined dictionary-based approach. Procedia Computer Science 100:55–61

    Google Scholar 

  116. Ramshaw LA, Marcus MP (1999) Text chunking using transformation-based learning. In: Natural language processing using very large corpora. Springer, pp 157–176

  117. Rebholz-Schuhman D, Jimeno-Yepes A, Li C et al (2011) Assessment of NER solutions against the first and second CALBC silver standard corpus. J Biomed Semantics 2(5):1–12

    Google Scholar 

  118. Roberts RJ (2001) PubMed central: The GenBank of the published literature

  119. Rochefort CM, Buckeridge DL, Forster AJ (2015) Accuracy of using automated methods for detecting adverse events from electronic health record data: a research protocol. Implement Sci 10(1):1–9

    Google Scholar 

  120. Rosario B, Hearst MA (2004) Classifying semantic relations in bioscience texts. In: Proceedings of the 42nd Annual meeting of the association for computational linguistics (ACL-04), pp 430–437

  121. Rundo L, Pirrone R, Vitabile S et al (2020) Recent advances of HCI in decision-making tasks for optimized clinical workflows and precision medicine. J Biomed Inf 108:103,479

    Google Scholar 

  122. Sadoughi N, Finley GP, Edwards E et al (2018) Detecting section boundaries in medical dictations: toward real-time conversion of medical dictations to clinical reports. In: International conference on speech and computer, Springer, pp 563–573

  123. Sandhya P, Kantesaria ML (2020) Named entity recognition in document summarization. In: Trends and applications of text summarization techniques. IGI Global. 125–149

  124. Shen J, Robertson N (2021) Bbas: Towards large scale effective ensemble adversarial attacks against deep neural network learning. Inf Sci 569:469–478

    Google Scholar 

  125. Shi J, Li W, Yang Y et al (2021) Automated concern exploration in pandemic Situations-COVID-19 as a use case. In: Pacific rim knowledge acquisition workshop, springer, pp 178–185

  126. Shi J, Li W, Yongchareon S et al (2022) Graph-based joint pandemic concern and relation extraction on twitter. Exp Syst Appl 195(116):538. https://doi.org/10.1016/j.eswa.2022.116538

    Google Scholar 

  127. Sohrab MG, Duong K, Miwa M et al (2020) BENNERD: a neural named entity linking system for COVID-19. In: Proceedings of the 2020 Conference on empirical methods in natural language processing: System demonstrations, pp 182–188

  128. Song HJ, Jo BC, Park CY et al (2018) Comparison of named entity recognition methodologies in biomedical documents. Biomed Eng Online 17(2):1–14

    Google Scholar 

  129. Sorgente A, Vettigli G, Mele F (2013) Automatic extraction of cause-effect relations in natural language text. DART@ AI* IA 2013:37–48

    Google Scholar 

  130. Stubbs A, Kotfila C, Uzuner Ö (2015) Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task track 1. J Biomed Inf 58:S11–S19

    Google Scholar 

  131. Stubbs A, Kotfila C, Xu H et al (2015) Identifying risk factors for heart disease over time: Overview of 2014 i2b2/UTHealth shared task track 2. J Biomed Inform 58:S67–S77

    PubMed  PubMed Central  Google Scholar 

  132. Sui Y, Bu F, Hu Y et al (2022) Trigger-GNN: a Trigger-Based graph neural network for nested named entity recognition. 2204.05518

  133. Sun Q, Bhatia P (2021) Neural entity recognition with gazetteer based fusion. In: Findings of the association for computational linguistics: ACL/IJCNLP 2021, Online Event, August 1-6, 2021, Findings of ACL, vol ACL/IJCNLP 2021. Association for computational linguistics, pp 3291–3295. https://doi.org/10.18653/v1/2021.findings-acl.291

  134. Sun W, Cai Z, Li Y et al (2018) Data processing and text mining technologies on electronic medical records: a review. J Healthcare Eng

  135. Sun W, Cai Z, Liu F et al (2017) A survey of data mining technology on electronic medical records. In: 2017 IEEE 19th International conference on e-health networking, applications and services (Healthcom), IEEE, pp 1–6

  136. Suominen HJ, Salakoski TI (2010) Supporting communication and decision making in finnish intensive care with language technology. J Healthcare Eng 1(4):595–614

    Google Scholar 

  137. Tang B, Cao H, Wu Y et al (2013) Recognizing clinical entities in hospital discharge summaries using structural support vector machines with word representation features. In: BMC Medical informatics and decision making, BioMed Central. 1–10

  138. Tchraktchiev D, Angelova G, Boytcheva S et al (2011) Completion of structured patient descriptions by semantic mining. In: Patient safety informatics. IOS Press, pp 260–269

  139. Tepper M, Capurro D, Xia F et al (2012) Statistical section segmentation in free-text clinical records. In: Lrec, pp 2001–2008

  140. Tran T, Kavuluru R (2019) Distant supervision for treatment relation extraction by leveraging MeSH subheadings. Artif Intell Med 98:18–26

    PubMed  PubMed Central  Google Scholar 

  141. Tran V, Tran VH, Nguyen P et al (2021) CovRelex: a COVID-19 retrieval system with relation extraction. In: Proceedings of the 16th Conference of the european chapter of the association for computational linguistics: System demonstrations, pp 24–31

  142. Uzuner Ö, Solti I, Cadag E (2010) Extracting medication information from clinical text. J Am Med Inform Assoc 17(5):514–518

    PubMed  PubMed Central  Google Scholar 

  143. Uzuner Ö, South BR, Shen S et al (2011) 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. J Am Med Inform Assoc 18 (5):552–556

    PubMed  PubMed Central  Google Scholar 

  144. Vunikili R, Supriya H, Marica VG et al (2020) Clinical NER using spanish BERT embeddings. In: IberLEF@ SEPLN, pp 505–511

  145. Wang L, Foer D, MacPhaul E et al (2021) PASCLex: a comprehensive Post-Acute sequelae of COVID-19 (PASC) symptom lexicon derived from electronic health record clinical notes. J Biomed Inf, p 103951

  146. Wang Y, Fu S, Shen F et al (2020) The 2019 n2c2/OHNLP track on clinical semantic textual similarity: overview. JMIR Med Inform 8(11):e23375

  147. Wang P, Hao T, Yan J et al (2017) Large-scale extraction of drug–disease pairs from the medical literature. J Assoc Inform Sci Technol 68(11):2649–2661

  148. Wang X, Hripcsak G, Markatou M et al (2009) Active computerized pharmacovigilance using natural language processing, statistics, and electronic health records: A feasibility study. J Am Med Inform Assoc 16(3):328–337

  149. Wang S, Ren F, Lu H (2018) A review of the application of natural language processing in clinical medicine. In: 2018 13th IEEE Conference on Industrial Electronics and Applications (ICIEA), pp 2725–2730

  150. Wang Y, Wang L, Rastegar-Mojarad M et al (2018) Clinical information extraction applications: a literature review. J Biomed Inform 77:34–49

  151. Wei WQ, Feng Q, Jiang L et al (2014) Characterization of statin dose response in electronic medical records. Clin Pharmacol Ther 95(3):331–338

  152. Wei Q, Ji Z, Si Y et al (2019) Relation extraction from clinical narratives using pre-trained language models. In: AMIA annual symposium proceedings, American medical informatics association, p 1236

  153. Weiskopf NG, Hripcsak G, Swaminathan S et al (2013) Defining and measuring completeness of electronic health records for secondary use. J Biomed Inform 46(5):830–836

  154. Wu Y, Jiang M, Xu J et al (2017) Clinical named entity recognition using deep learning models. In: AMIA Annual symposium proceedings, american medical informatics association, p 1812

  155. Xu J, Gan L, Cheng M et al (2018) Unsupervised medical entity recognition and linking in Chinese online medical text. J Healthcare Eng, p 2018

  156. Yang Z, Dai Z, Yang Y et al (2019) XLNet: Generalized autoregressive pretraining for language understanding. Adv Neural Inf Process Syst 32

  157. Yang J, Han SC, Poon J (2021) A survey on extraction of causal relations from natural language text. arXiv:2101.06426

  158. Yang Z, Lin H, Li Y (2008) Exploiting the performance of dictionary-based bio-entity name recognition in biomedical literature. Comput Biol Chem 32(4):287–291

  159. Yang X, Yu Z, Guo Y et al (2021) Clinical relation extraction using transformer-based models. arXiv:2107.08957

  160. Yang X, Zhang H, He X et al (2020) Extracting family history of patients from clinical narratives: exploring an end-to-end solution with deep learning models. JMIR Med Inform 8(12):e22982

  161. Zhang R, Chu F, Chen D et al (2018) A text structuring method for Chinese medical text based on temporal information. Int J Environ Res Public Health 15(3):402

  162. Zhang S, Elhadad N (2013) Unsupervised biomedical named entity recognition: Experiments with clinical and biological texts. J Biomed Inform 46(6):1088–1098

  163. Zhang T, Huang Z, Wang Y et al (2022) Information extraction from the text data on traditional Chinese medicine: A review on tasks, challenges, and methods from 2010 to 2021. Evidence-Based Complementary and Alternative Medicine

  164. Zhang Y, Yan X, Gao X et al (2016) Demand analysis of decision support system of grass-roots health. Chinese Gen Pract 19:2636–2639. https://doi.org/10.3969/j.issn.1007-9572.2016.22.005

  165. Zhao X, Ding H, Feng Z (2021) GLaRA: graph-based labeling rule augmentation for weakly supervised named entity recognition. In: Proceedings of the 16th Conference of the European chapter of the association for computational linguistics: Main Volume, EACL 2021, Online, April 19 - 23, 2021. Association for computational linguistics, pp 3636–3649. https://doi.org/10.18653/v1/2021.eacl-main.318

  166. Zheng C, Rashid N, Koblick R et al (2015) Medication extraction from electronic clinical notes in an integrated health system: a study on aspirin use in patients with nonvalvular atrial fibrillation. Clin Ther 37(9):2048–2058

  167. Zhou J, Fu Bq (2018) The research on gene-disease association based on text-mining of PubMed. BMC Bioinformatics 19(1):1–8

  168. Zhou Y, Ju C, Caufield JH et al (2021) Clinical named entity recognition using contextualized token representations. arXiv:2106.12608

  169. Zweigenbaum P, Deléger L, Lavergne T et al (2013) A supervised abbreviation resolution system for medical text. In: CLEF (Working Notes)

Author information

Corresponding author

Correspondence to Mohamed Yassine Landolsi.

Ethics declarations

Conflict of Interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Lobna Hlaoua and Lotfi Ben Romdhane contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Landolsi, M.Y., Hlaoua, L. & Romdhane, L.B. Extracting and structuring information from the electronic medical text: state of the art and trendy directions. Multimed Tools Appl 83, 21229–21280 (2024). https://doi.org/10.1007/s11042-023-15080-y

  • DOI: https://doi.org/10.1007/s11042-023-15080-y
