Research of Clinical Named Entity Recognition Based on Bi-LSTM-CRF

Journal of Shanghai Jiaotong University (Science)

Abstract

Electronic Medical Records (EMRs), with their unstructured sentences and varied conceptual expressions, are a rich source for medical information extraction. However, general-purpose Named Entity Recognition (NER) methods from Natural Language Processing (NLP) are not well suited to clinical NER in EMRs. This study applies neural networks to clinical concept extraction. We integrate a Bidirectional Long Short-Term Memory network (Bi-LSTM) with a Conditional Random Field (CRF) layer to detect three types of clinical named entities. The word representations fed into the network concatenate character-based word embeddings with Continuous Bag of Words (CBOW) embeddings trained on both in-domain and out-of-domain corpora. We test our NER system on the i2b2/VA open datasets and compare its performance with six related works; it achieves the best NER result of the systems compared, with an F1 value of 0.8537. We also point out several specific problems in clinical concept extraction that may guide further study.
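
To illustrate what a CRF layer on top of a Bi-LSTM contributes, the following is a minimal sketch of Viterbi decoding in plain Python. It is not the authors' implementation. The tag names assume a BIO scheme over the three i2b2/VA 2010 concept types (problem, test, treatment), the emission scores are invented stand-ins for Bi-LSTM outputs, and the transition scores are hand-set rather than learned, so every name and number here is an assumption for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO tag set for three clinical concept types (an assumption).
TAGS: List[str] = ["O", "B-problem", "I-problem", "B-test", "I-test",
                   "B-treatment", "I-treatment"]


def transition_score(prev: str, curr: str) -> float:
    """Toy transition scores: I-x may only follow B-x or I-x.

    A trained CRF learns these scores; here they are hand-set.
    """
    if curr.startswith("I-"):
        entity = curr[2:]
        if prev not in (f"B-{entity}", f"I-{entity}"):
            return -1e4  # effectively forbidden
    return 0.0


def viterbi_decode(emissions: List[Dict[str, float]]) -> Tuple[float, List[str]]:
    """Return (best score, best tag path) for per-token emission scores.

    emissions[t][tag] stands in for the Bi-LSTM score of `tag` at token t;
    tags missing from a dict score 0.0.
    """
    if not emissions:
        return 0.0, []
    # Treat the sentence start as if it were preceded by an "O" tag.
    scores = {tag: transition_score("O", tag) + emissions[0].get(tag, 0.0)
              for tag in TAGS}
    backpointers: List[Dict[str, str]] = []
    for emission in emissions[1:]:
        new_scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for curr in TAGS:
            # Best previous tag for ending at `curr` on this token.
            best_prev = max(
                TAGS, key=lambda prev: scores[prev] + transition_score(prev, curr))
            new_scores[curr] = (scores[best_prev]
                                + transition_score(best_prev, curr)
                                + emission.get(curr, 0.0))
            pointers[curr] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag.
    best_last = max(TAGS, key=lambda tag: scores[tag])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return scores[best_last], path


# Invented scores for the tokens "patient denies chest pain".
EXAMPLE_EMISSIONS: List[Dict[str, float]] = [
    {"O": 2.0},
    {"O": 2.0},
    {"O": 1.0, "B-problem": 1.2, "I-problem": 1.5},
    {"O": 0.5, "I-problem": 2.0},
]

if __name__ == "__main__":
    best_score, best_path = viterbi_decode(EXAMPLE_EMISSIONS)
    print(best_path)  # ['O', 'O', 'B-problem', 'I-problem']
```

In this toy input, choosing the highest-scoring tag for each token independently would label "chest" as I-problem directly after an O tag, which is not a valid BIO sequence. Decoding over the whole sentence with transition scores yields B-problem followed by I-problem instead, which is the kind of label consistency a CRF layer is meant to provide.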

Author information

Correspondence to Ying Qin  (秦颖).

Additional information

Foundation item: the National Social Science Foundation of China (No. 17BYY047)

Cite this article

Qin, Y., Zeng, Y. Research of Clinical Named Entity Recognition Based on Bi-LSTM-CRF. J. Shanghai Jiaotong Univ. (Sci.) 23, 392–397 (2018). https://doi.org/10.1007/s12204-018-1954-5

  • DOI: https://doi.org/10.1007/s12204-018-1954-5
