
A parallel computing-based Deep Attention model for named entity recognition

The Journal of Supercomputing

Abstract

Named entity recognition (NER) is an important and widely studied task in natural language processing. In recent years, end-to-end NER with bidirectional long short-term memory (BiLSTM) has received growing attention. However, BiLSTM remains limited in three respects: it cannot be parallelized across time steps, it struggles to capture long-range dependencies, and it maps inputs into a single feature space. We propose a deep neural network model based on a parallelizable self-attention mechanism to address these problems. We use only a small number of BiLSTM layers to capture the temporal structure of texts, and then apply a self-attention mechanism, which computes over all positions in parallel, to capture long-range dependencies. Experiments on two NER datasets show that our model achieves higher quality with less training time. Our model achieves an F1 score of 92.63% on the SIGHAN bakeoff 2006 MSRA portion for Chinese NER, improving over the existing best result by over 1.4%. On the CoNLL2003 shared task portion for English NER, our model achieves an F1 score of 92.17%, outperforming the previous state-of-the-art result by 0.91%.
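As a concrete illustration of the architecture the abstract describes, below is a minimal PyTorch sketch pairing a shallow BiLSTM with multi-head self-attention for per-token tag prediction. The layer sizes, head count, tag count, and the residual-plus-layer-norm wiring are illustrative assumptions, not the authors' reported configuration (the abstract does not specify hyperparameters or the output layer).

    import torch
    import torch.nn as nn

    class BiLSTMSelfAttentionNER(nn.Module):
        """Sketch only: a shallow BiLSTM for local temporal structure,
        followed by self-attention for long-range dependencies, then a
        per-token classifier over NER tags. All sizes are assumptions."""

        def __init__(self, vocab_size, embed_dim=128, hidden_dim=128,
                     num_heads=4, num_tags=9):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim)
            # A single BiLSTM layer captures the time series of the text;
            # forward and backward halves concatenate to hidden_dim.
            self.bilstm = nn.LSTM(embed_dim, hidden_dim // 2,
                                  batch_first=True, bidirectional=True)
            # Multi-head self-attention computes over all positions in
            # parallel, capturing long-range dependencies.
            self.attention = nn.MultiheadAttention(hidden_dim, num_heads,
                                                   batch_first=True)
            self.norm = nn.LayerNorm(hidden_dim)
            self.classifier = nn.Linear(hidden_dim, num_tags)

        def forward(self, token_ids, padding_mask=None):
            x = self.embedding(token_ids)          # (B, T, E)
            h, _ = self.bilstm(x)                  # (B, T, H)
            a, _ = self.attention(h, h, h,
                                  key_padding_mask=padding_mask)
            h = self.norm(h + a)                   # residual + layer norm
            return self.classifier(h)              # (B, T, num_tags)

    # Toy usage: a batch of 2 sentences, 5 tokens each.
    model = BiLSTMSelfAttentionNER(vocab_size=1000)
    tokens = torch.randint(0, 1000, (2, 5))
    print(model(tokens).shape)  # torch.Size([2, 5, 9])

Keeping the recurrent stack shallow and pushing long-range interactions into the attention layer is what allows most of the computation to be parallelized, which is the efficiency claim the abstract makes.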


Notes

  1. The toolkit is developed by the Tsinghua University Natural Language Processing Laboratory. See http://thulac.thunlp.org/ for more details.

  2. https://code.google.com/p/word2vec/.

  3. http://www.sogou.com/labs/resource/ca.php.

  4. https://dumps.wikimedia.org/zhwiki/latest.


Acknowledgements

We thank the reviewers for their thoughtful comments and suggestions. This work was supported by the National Natural Science Foundation of China (Grant Nos. 31771679, 31371533, 31671589), the Special Fund for Key Program of Science and Technology of Anhui Province of China (Grant Nos. 16030701092, kJ2016A836, 18030901034), the Key Laboratory of Agricultural Electronic Commerce (Grant Nos. AEC2018003, AEC2018006) and Hefei Major Research Project of Key Technology (Grant No. J2018G14).

Author information


Corresponding author

Correspondence to Lichuan Gu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Liu, X., Yang, N., Jiang, Y. et al. A parallel computing-based Deep Attention model for named entity recognition. J Supercomput 76, 814–830 (2020). https://doi.org/10.1007/s11227-019-02985-5

