Character-Based Deep Learning Approaches for Clinical Named Entity Recognition: A Comparative Study Using Chinese EHR Texts

  • Conference paper
Smart Health (ICSH 2019)

Abstract

Previous studies on clinical sequence labeling required large amounts of task-specific knowledge in the form of handcrafted features. Drawing on recent developments in representation learning, this paper introduces BERT embeddings as a character-based pretrained model and combines them with three competing deep learning models (CNN-LSTM, Bi-LSTM and Bi-LSTM-CRF) to extract clinical entities from electronic health records (EHRs). A comparative evaluation on the CCKS-2017 Task 2 benchmark dataset shows that: (1) BERT embeddings not only improve performance on clinical NER tasks but also provide a good basis for building end-to-end NER models for Chinese EHRs that require no feature engineering; and (2) Bi-LSTM-CRF performs best, reaching an F1 score of 93% when it uses BERT embeddings. This paper may improve our understanding of how to use BERT embeddings in clinical NER research.
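The abstract does not spell out what the CRF layer adds on top of a plain Bi-LSTM. As a rough illustration only (not the authors' code), the sketch below shows standard Viterbi decoding for a linear-chain CRF over per-character emission scores, such as a Bi-LSTM might produce from BERT character embeddings. The function name `viterbi_decode`, the tag set (`O`, `B-SYM`, `I-SYM`) and all scores are hypothetical; in a trained model the emission and transition scores would be learned.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag path and its score under a linear-chain CRF.

    emissions[t][j]   -- score of tag j for character t (e.g. a Bi-LSTM output)
    transitions[i][j] -- score of moving from tag i to tag j
    start[j]          -- score of beginning the sentence with tag j
    """
    n_tags = len(start)
    # best[j]: score of the best path that ends in tag j at the current character
    best = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        step_ptr: List[int] = []
        new_best: List[float] = []
        for j in range(n_tags):
            # Pick the previous tag i that maximizes best[i] + transitions[i][j].
            i_star = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            step_ptr.append(i_star)
            new_best.append(best[i_star] + transitions[i_star][j] + emissions[t][j])
        best = new_best
        backpointers.append(step_ptr)
    # Trace back from the best final tag to recover the full path.
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for step_ptr in reversed(backpointers):
        path.append(step_ptr[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical tag set and hand-picked scores, for illustration only.
TAGS = ["O", "B-SYM", "I-SYM"]
NEG = -1e4  # large penalty that effectively forbids a transition
start = [0.0, 0.0, NEG]  # a sentence cannot begin with I-SYM
transitions = [
    [0.0, 0.0, NEG],  # from O: O -> I-SYM is invalid under BIO
    [0.0, 0.0, 0.0],  # from B-SYM
    [0.0, 0.0, 0.0],  # from I-SYM
]
chars = "患者咳嗽"  # "the patient coughs"
emissions = [
    [2.0, 0.5, 0.1],  # 患
    [2.0, 0.3, 0.2],  # 者
    [0.5, 1.0, 1.2],  # 咳: the per-character argmax is I-SYM (invalid after O)
    [0.3, 0.2, 1.5],  # 嗽
]

path, score = viterbi_decode(emissions, transitions, start)
print([f"{c}/{TAGS[i]}" for c, i in zip(chars, path)], score)
# prints ['患/O', '者/O', '咳/B-SYM', '嗽/I-SYM'] 6.5
```

Taking the argmax per character, as a model without a CRF layer effectively does, would tag 咳 as I-SYM directly after an O, which is an invalid BIO sequence. The transition scores let the decoder accept a slightly lower emission score for 咳 in exchange for a valid path. In a real Bi-LSTM-CRF, emissions and transitions are trained together end to end.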

Notes

  1. Dataset available at http://www.ccks2017.com/en/index.php/sharedtask/.

  2. Accessed at https://openbayes.com/.

  3. BERT-Base, Chinese is available at https://github.com/google-research/bert.

Author information

Correspondence to Jun Wu.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Wu, J., Shao, Dr., Guo, Jh., Cheng, Y., Huang, G. (2019). Character-Based Deep Learning Approaches for Clinical Named Entity Recognition: A Comparative Study Using Chinese EHR Texts. In: Chen, H., Zeng, D., Yan, X., Xing, C. (eds) Smart Health. ICSH 2019. Lecture Notes in Computer Science, vol 11924. Springer, Cham. https://doi.org/10.1007/978-3-030-34482-5_28

  • DOI: https://doi.org/10.1007/978-3-030-34482-5_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34481-8

  • Online ISBN: 978-3-030-34482-5

  • eBook Packages: Computer Science, Computer Science (R0)
