Electronic medical records (EMRs) contain rich biomedical information and have great potential in disease diagnosis and biomedical research. However, EMR information is usually stored as unstructured text, which raises the cost of using it and hinders its application. This work presents an effective named entity recognition (NER) method for information extraction from Chinese EMRs, based on word embedding bootstrapped deep active learning, to promote the acquisition of medical information from Chinese EMRs and to release its value. Deep active learning with a bi-directional long short-term memory network followed by a conditional random field (Bi-LSTM+CRF) captures the characteristics of different kinds of information from a labeled corpus, and the continuous bag-of-words (CBOW) and skip-gram word embedding models are combined with this model to capture the text features of Chinese EMRs from an unlabeled corpus. The method was evaluated on NER tasks over the “medical history” content of Chinese EMRs. Experimental results show that the word embedding bootstrapped deep active learning method, using an unlabeled medical corpus, achieves better performance than the other models compared.
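As a rough illustration of the two ideas named in the abstract, the sketch below shows one way the CBOW and skip-gram vectors of a token could be combined, and one way an active learning step could pick unlabeled sentences for annotation. It is not the authors' implementation. Concatenation as the combination rule, token-averaged least confidence as the query strategy, and the names `combine_embeddings`, `uncertainty`, and `select_for_annotation` are all assumptions made for illustration; the abstract specifies none of them.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def combine_embeddings(cbow: Dict[str, Vector], skipgram: Dict[str, Vector],
                       token: str, dim: int) -> Vector:
    """Concatenate the CBOW and skip-gram vectors of a token.

    Assumption: both tables use vectors of length `dim`, and a token missing
    from a table falls back to a zero vector. Concatenation itself is an
    assumption; the abstract only says the two models are combined.
    """
    zero = [0.0] * dim
    return list(cbow.get(token, zero)) + list(skipgram.get(token, zero))


def uncertainty(tag_probs: Sequence[Sequence[float]]) -> float:
    """Token-averaged least confidence for one sentence.

    `tag_probs[i]` holds the tagger's probabilities over tags for token i.
    The score is 1 minus the mean of the highest probability per token,
    so sentences the tagger is unsure about score higher.
    """
    if not tag_probs:
        return 0.0
    top = [max(p) for p in tag_probs]
    return 1.0 - sum(top) / len(top)


def select_for_annotation(pool: Dict[str, Sequence[Sequence[float]]],
                          k: int) -> List[str]:
    """Return the ids of the k most uncertain sentences in the unlabeled pool."""
    ranked = sorted(pool, key=lambda sid: uncertainty(pool[sid]), reverse=True)
    return ranked[:k]
```

In an active learning loop, the selected sentences would be labeled by annotators, added to the labeled corpus, and the tagger retrained before the next selection round.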
Foundation item: the Artificial Intelligence Innovation and Development Project of Shanghai Municipal Commission of Economy and Information (No. 2019-RGZN-01081)
Cite this article
Ma, Q., Cen, X., Yuan, J. et al. Word Embedding Bootstrapped Deep Active Learning Method to Information Extraction on Chinese Electronic Medical Record. J. Shanghai Jiaotong Univ. (Sci.) 26, 494–502 (2021). https://doi.org/10.1007/s12204-021-2285-5
- deep active learning
- named entity recognition (NER)
- information extraction
- word embedding
- Chinese electronic medical record (EMR)
- CLC number: R 319