Abstract
Few-shot named entity recognition aims to recognize novel-class named entities in low-resource scenarios, where the support set contains limited data with sparse labels. Existing methods neglect both the relevance of the support set to the current task and the semantics carried by label names. In this paper, building on contrastive learning, we propose CLINER, a multi-task learning framework for few-shot NER that jointly learns label semantic information and support set information. For support set information, we find the view of the support set that is most relevant to the current task, maximizing the utilization of each support set; a momentum encoder with a dynamic queue keeps track of positive and negative examples learned from previous support sets and keeps them updated. For label semantic information, which is implicit in label names, we derive it explicitly with a pre-trained language encoder. Experiments demonstrate that our model improves overall performance compared with recent baselines and achieves state-of-the-art results on commonly used standard datasets. The source code of CLINER will be available at: https://github.com/yizumi426/CLINER.
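The momentum encoder with a dynamic queue mentioned above follows the general pattern of momentum contrast (He et al., 2020): the key encoder is an exponential moving average of the query encoder, and a fixed-size FIFO queue stores embeddings from previous batches as contrastive examples. The sketch below illustrates only this general mechanism; the class name `MomentumQueue`, the linear-map encoders, and all hyperparameter values are illustrative assumptions, not CLINER's actual implementation.

```python
import numpy as np

class MomentumQueue:
    """Sketch of a momentum-updated key encoder plus a FIFO queue
    of embeddings from previous support sets (MoCo-style)."""

    def __init__(self, dim, queue_size, momentum=0.999):
        self.momentum = momentum
        # Query/key encoders sketched as plain linear maps.
        self.w_q = np.random.randn(dim, dim) * 0.01
        self.w_k = self.w_q.copy()  # key encoder starts as a copy of the query encoder
        self.queue = np.zeros((queue_size, dim))
        self.ptr = 0  # next write position in the circular queue

    def momentum_update(self):
        # theta_k <- m * theta_k + (1 - m) * theta_q
        self.w_k = self.momentum * self.w_k + (1 - self.momentum) * self.w_q

    def enqueue(self, keys):
        # Overwrite the oldest queue entries with the newest key embeddings.
        n = keys.shape[0]
        idx = (self.ptr + np.arange(n)) % self.queue.shape[0]
        self.queue[idx] = keys
        self.ptr = (self.ptr + n) % self.queue.shape[0]
```

In training, each new support set would be encoded by the key encoder, enqueued as positives/negatives for the contrastive objective, and the momentum update applied after every optimizer step, so the queue stays consistent while still reflecting examples learned from earlier support sets.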
Data availability
The datasets analyzed during the current study are available in the Few-NERD repository.
Acknowledgements
This work is jointly supported by the National Natural Science Foundation of China (Grants 61877043 and 61877044).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Li, X., Li, X., Zhao, M. et al. CLINER: exploring task-relevant features and label semantic for few-shot named entity recognition. Neural Comput & Applic 36, 4679–4691 (2024). https://doi.org/10.1007/s00521-023-09285-3