A Framework of Data Augmentation While Active Learning for Chinese Named Entity Recognition

Li, Qingqing; Huang, Zhen; Dou, Yong; Zhang, Ziwen

doi:10.1007/978-3-030-82147-0_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12816)

Included in the following conference series:

International Conference on Knowledge Science, Engineering and Management

Abstract

Named entity recognition (NER) is a basic task in constructing knowledge graphs. Training performance is limited when little labelled data is available. One solution is active learning, which can achieve ideal results by using a multi-round sampling strategy to select unlabelled data for annotation. However, there is very little labelled data in the early rounds, which leads to slow improvement in training performance. We thus propose a framework that performs data augmentation during active learning. To validate our claims, we focus on the Chinese NER task and carry out extensive experiments on two public datasets. Experimental results show that our framework is effective for a series of classical query strategies. On Resume, we reach 99% of the performance of the best deep model trained on the full data while using only 22% of the data, a 63% reduction in labelled data compared with pure active learning (PAL).
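
The abstract does not spell out the framework's internals, so the following is only a minimal sketch of the general idea: in each active learning round, the most uncertain sentences are queried for labels, and the labelled set is augmented before the model is retrained. All names here (`least_confidence`, `select_batch`, `active_learning_with_augmentation`, and the `oracle`, `train`, `score` and `augment` callables) are hypothetical placeholders, not the authors' implementation. Least confidence is used as one example of a classical query strategy.

```python
from typing import Any, Callable, List, Sequence, Tuple

Sentence = Tuple[str, ...]                    # characters of one sentence
Example = Tuple[Sentence, Tuple[str, ...]]    # (characters, one tag per character)


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely output."""
    return 1.0 - max(probs)


def select_batch(pool: List[Sentence],
                 score_fn: Callable[[Sentence], float],
                 batch_size: int) -> List[int]:
    """Return the indices of the batch_size highest-scoring pool sentences."""
    ranked = sorted(range(len(pool)), key=lambda i: score_fn(pool[i]), reverse=True)
    return ranked[:batch_size]


def active_learning_with_augmentation(
    labelled: List[Example],
    pool: List[Sentence],
    oracle: Callable[[Sentence], Tuple[str, ...]],   # assumed: human annotator
    train: Callable[[List[Example]], Any],           # assumed: fits an NER model
    score: Callable[[Any, Sentence], float],         # assumed: model uncertainty
    augment: Callable[[Example], List[Example]],     # assumed: augmentation method
    rounds: int,
    batch_size: int,
) -> Tuple[Any, List[Example], List[Sentence]]:
    """Run a pool-based active learning loop, augmenting the labelled set each round."""
    labelled, pool = list(labelled), list(pool)

    def fit() -> Any:
        # Train on the labelled examples plus their augmented variants.
        augmented = [new for ex in labelled for new in augment(ex)]
        return train(labelled + augmented)

    model = fit()
    for _ in range(rounds):
        if not pool:
            break
        # Query the sentences the current model is least certain about.
        picked = set(select_batch(pool, lambda s: score(model, s), batch_size))
        labelled.extend((pool[i], oracle(pool[i])) for i in sorted(picked))
        pool = [s for i, s in enumerate(pool) if i not in picked]
        model = fit()
    return model, labelled, pool
```

Setting `augment` to a function that returns an empty list reduces this loop to plain pool-based active learning, which gives a baseline comparable in spirit to the pure active learning (PAL) setting mentioned above.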

Acknowledgment

This work is supported by the National Key R&D Program of China under Grant No. 2018YFB0204300.

Author information

Authors and Affiliations

National Key Laboratory of Parallel and Distributed Processing, National University of Defense Technology, Changsha, China
Qingqing Li, Zhen Huang, Yong Dou & Ziwen Zhang

Corresponding author

Correspondence to Zhen Huang.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Han Qiu
Ibaraki University, Hitachi, Japan
Cheng Zhang
University of Kentucky, Lexington, KY, USA
Zongming Fei
Texas A&M University – Commerce, Commerce, TX, USA
Meikang Qiu
Princeton University, Princeton, NJ, USA
Sun-Yuan Kung

About this paper

Cite this paper

Li, Q., Huang, Z., Dou, Y., Zhang, Z. (2021). A Framework of Data Augmentation While Active Learning for Chinese Named Entity Recognition. In: Qiu, H., Zhang, C., Fei, Z., Qiu, M., Kung, SY. (eds) Knowledge Science, Engineering and Management. KSEM 2021. Lecture Notes in Computer Science, vol 12816. Springer, Cham. https://doi.org/10.1007/978-3-030-82147-0_8

DOI: https://doi.org/10.1007/978-3-030-82147-0_8
Published: 07 August 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-82146-3
Online ISBN: 978-3-030-82147-0
eBook Packages: Computer Science, Computer Science (R0)
