Lexicon enhanced Chinese named entity recognition with pointer network

  • Original Article
  • Neural Computing and Applications

Abstract

Recently, lexicon-based LSTMs have been combined with pre-trained language models for Chinese named entity recognition (NER), achieving state-of-the-art (SOTA) performance on several Chinese benchmark datasets. However, existing lexicon-based models incorporate lexicon features only through shallow, randomly initialized encoding layers and do not integrate them into the bottom layers of the pre-trained language model to mine deep lexicon knowledge. To address this issue, we propose a novel BERT-based Enhanced Lexicon Adapter (BLA) model that deeply fuses external lexicon features into the pre-trained language model BERT. Specifically, a lexicon adapter mechanism integrates the external lexicon knowledge into the deep Transformer layers of BERT. Compared with existing methods, our model achieves a genuinely deep fusion of lexicon knowledge and BERT representations, effectively capturing entity boundaries and word information. In addition, high-level global semantic features help alleviate word ambiguity and segment entity boundaries precisely in Chinese NER, and recasting the sequence labeling task as a sequence generation task offers a new way to extract such features. We therefore explore strategies for fusing local lexicon information and extracting global semantic features for entity category labeling. Specifically, we adopt a sequence-to-sequence (Seq2Seq) framework with a pointer network as the main model architecture, in which the pointing function implements a custom attention mechanism and models the interactions between the source text and the semantic embedding through the generated probability \(p_{point}\). The decoder, equipped with the pointer mechanism, then generates the target sequence autoregressively. Experiments on several Chinese benchmark datasets indicate that the proposed model improves remarkably on current lexicon-based methods and significantly outperforms the current SOTA models.
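The abstract names two mechanisms without giving their equations: a lexicon adapter that fuses matched-word features into character representations, and a pointing function that yields a probability \(p_{point}\) over source positions. The following is a minimal, dependency-free sketch of both ideas under simplifying assumptions. The function names (`fuse_lexicon`, `pointer_distribution`, `greedy_pointer_decode`), the plain dot-product scoring, and the absence of learned projection matrices are all hypothetical choices made for illustration. This is not the authors' formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fuse_lexicon(char_vec, word_vecs):
    """Add an attention-weighted sum of matched-word vectors to a character vector.

    Assumption: dot-product attention with no learned parameters; a real
    adapter would use trained projections inside a Transformer layer.
    """
    if not word_vecs:
        return list(char_vec)  # no lexicon match: character vector unchanged
    weights = softmax([dot(char_vec, w) for w in word_vecs])
    pooled = [
        sum(wt * w[i] for wt, w in zip(weights, word_vecs))
        for i in range(len(char_vec))
    ]
    return [c + p for c, p in zip(char_vec, pooled)]


def pointer_distribution(decoder_state, encoder_states):
    """Probability of pointing at each source position (stand-in for p_point)."""
    return softmax([dot(decoder_state, h) for h in encoder_states])


def greedy_pointer_decode(encoder_states, start_state, max_steps):
    """Autoregressive decoding: each step points at one source index, and the
    pointed-at encoder state becomes the next decoder state (an assumption)."""
    state = list(start_state)
    pointed = []
    for _ in range(max_steps):
        probs = pointer_distribution(state, encoder_states)
        idx = max(range(len(probs)), key=probs.__getitem__)
        pointed.append(idx)
        state = encoder_states[idx]
    return pointed
```

In this sketch, the pointer output is a distribution over input positions rather than over a fixed label vocabulary, which is the property that lets a Seq2Seq decoder emit entity spans by index. A trained model would replace the raw dot products with learned scoring functions.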

Notes

  1. https://catalog.ldc.upenn.edu/LDC2013T19.

  2. In Tables 4, 5 and 6, \(*\) indicates that external labeled data are exploited for semi-supervised learning. \(\dag\) denotes that the model also exploits discrete features.

Author information

Corresponding author

Correspondence to Yi Guo.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the authors.

About this article

Cite this article

Guo, Q., Guo, Y. Lexicon enhanced Chinese named entity recognition with pointer network. Neural Comput & Applic 34, 14535–14555 (2022). https://doi.org/10.1007/s00521-022-07287-1

  • DOI: https://doi.org/10.1007/s00521-022-07287-1
