Character-Based Deep Learning Approaches for Clinical Named Entity Recognition: A Comparative Study Using Chinese EHR Texts

Wu, Jun; Shao, Dan-rui; Guo, Jia-hang; Cheng, Yao; Huang, Ge

doi:10.1007/978-3-030-34482-5_28

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11924)

Included in the following conference series:

International Conference on Smart Health

Abstract

Previous studies on clinical sequence labeling have required large amounts of task-specific knowledge in the form of handcrafted features. Drawing on recent developments in representation learning, this paper introduces BERT embeddings as a character-based pretrained model and combines them with three competing deep learning models (CNN-LSTM, Bi-LSTM and Bi-LSTM-CRF) to extract clinical entities from electronic health records (EHRs). A comparative evaluation on the CCKS-2017 Task 2 benchmark dataset reveals that: (1) BERT embeddings not only improve performance on clinical NER tasks but are also a good candidate for building an end-to-end NER model that requires no feature engineering for Chinese EHR texts; (2) Bi-LSTM-CRF achieves the highest performance, a 93% F1 score, when it uses BERT embeddings. This paper may enhance our understanding of how to use BERT embeddings in clinical NER research.
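To make the character-based labeling and the F1 figure concrete, the following is a minimal sketch of how an entity-level F1 score could be computed from per-character tags. It is an illustration only, not the authors' evaluation code: the BIO tagging scheme, the exact-match scoring rule, the label names `BODY` and `SYMPTOM`, and the helper names `bio_to_spans` and `entity_f1` are all assumptions that do not come from the abstract.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); hypothetical representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a per-character BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        # An I- tag continues the open entity only if its type matches.
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        # A stray I- tag with no open entity is ignored in this sketch.
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match entity-level F1: a span counts only if boundaries and type agree."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


# Illustrative sentence (not from the dataset): one tag per Chinese character.
chars = list("患者右下腹疼痛")
gold = ["O", "O", "B-BODY", "I-BODY", "I-BODY", "B-SYMPTOM", "I-SYMPTOM"]
pred = ["O", "O", "B-BODY", "I-BODY", "O", "B-SYMPTOM", "I-SYMPTOM"]

print(bio_to_spans(gold))
print(entity_f1(gold, pred))
```

In this example the predicted body-part span stops one character early, so only one of the two gold entities is matched exactly. Tagging at the character level avoids a separate word-segmentation step, which is one reason character-based models are attractive for Chinese text.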

Notes

1. Dataset available at http://www.ccks2017.com/en/index.php/sharedtask/.
2. Accessed at https://openbayes.com/.
3. BERT-Base, Chinese is available at https://github.com/google-research/bert.

Author information

Authors and Affiliations

School of Economics and Management, Beijing University of Posts and Telecommunications, Beijing, China
Jun Wu, Dan-rui Shao, Jia-hang Guo & Yao Cheng
School of Software, Beijing University of Posts and Telecommunications, Beijing, China
Ge Huang

Corresponding author

Correspondence to Jun Wu.

Editor information

Editors and Affiliations

The University of Arizona, Tucson, AZ, USA
Hsinchun Chen
College of Management, The University of Arizona, Tucson, AZ, USA
Daniel Zeng
University of Science and Technology Beijing, Beijing, China
Xiangbin Yan
Tsinghua University, Beijing, China
Chunxiao Xing

About this paper

Cite this paper

Wu, J., Shao, Dr., Guo, Jh., Cheng, Y., Huang, G. (2019). Character-Based Deep Learning Approaches for Clinical Named Entity Recognition: A Comparative Study Using Chinese EHR Texts. In: Chen, H., Zeng, D., Yan, X., Xing, C. (eds) Smart Health. ICSH 2019. Lecture Notes in Computer Science, vol 11924. Springer, Cham. https://doi.org/10.1007/978-3-030-34482-5_28

DOI: https://doi.org/10.1007/978-3-030-34482-5_28
Published: 02 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34481-8
Online ISBN: 978-3-030-34482-5
eBook Packages: Computer Science; Computer Science (R0)