Abstract
Making effective use of the accumulated body of forestry science papers is important for understanding the current state of forests and for formulating forest conservation strategies. However, the metadata associated with these papers is sparse, which makes them difficult to exploit fully. Metadata is the foundation for managing and using these documents efficiently, and it plays an essential role in advancing forestry science research. Because constructing a training corpus and extracting distant semantic relationships are inherently challenging, the use of named entity recognition (NER) for metadata entity identification in forestry science papers remains unexplored. To overcome these limitations, this paper creates a specialized training corpus and introduces a novel few-shot NER framework tailored to metadata extraction from forestry science papers. Within this framework, a data augmentation layer employing word replacement (WR) and enhanced mixup (EM) addresses the poor performance caused by scarce training data. The semantic comprehension layer incorporates a multi-granularity dilated convolution neural network (MGDCNN) to capture distant semantic associations. In addition, a meta-learning-based reweighting layer mitigates the adverse effects of low-quality augmented examples on the model. Experimental results demonstrate the effectiveness of the proposed framework, which achieves precision, recall, and F1 of 91.08%, 88.96%, and 90.00%, respectively. Compared with traditional models, precision, recall, and F1 improve by up to 10.69%, 7.48%, and 9.07%, respectively.
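The abstract does not spell out how enhanced mixup (EM) is computed. As a rough illustration of the plain mixup idea that the name points to, the sketch below linearly interpolates the token embeddings and label distributions of two training sentences. The function names `mixup` and `sample_lambda`, the three-tag label set, and the Beta parameter `alpha=0.4` are assumptions chosen for illustration and are not taken from the paper.

```python
import random


def mixup(emb_a, lab_a, emb_b, lab_b, lam):
    """Interpolate two equal-length token sequences (plain mixup, not the paper's EM).

    emb_*: per-token embedding vectors (lists of floats)
    lab_*: per-token label distributions (one-hot or soft)
    lam:   mixing coefficient in [0, 1]; 1.0 returns sentence A unchanged
    """
    if len(emb_a) != len(emb_b) or len(lab_a) != len(lab_b):
        raise ValueError("sequences must be padded to the same length")

    def blend(u, v):
        # Convex combination of two vectors of the same dimension.
        return [lam * x + (1.0 - lam) * y for x, y in zip(u, v)]

    mixed_emb = [blend(u, v) for u, v in zip(emb_a, emb_b)]
    mixed_lab = [blend(u, v) for u, v in zip(lab_a, lab_b)]
    return mixed_emb, mixed_lab


def sample_lambda(alpha=0.4, rng=random):
    """Draw the mixing coefficient from Beta(alpha, alpha); alpha is an assumed value."""
    return rng.betavariate(alpha, alpha)


# Toy example: one-token sentences, 2-dim embeddings, hypothetical tags [O, TITLE, AUTHOR].
emb_a, lab_a = [[1.0, 0.0]], [[0.0, 1.0, 0.0]]  # token tagged TITLE
emb_b, lab_b = [[0.0, 1.0]], [[0.0, 0.0, 1.0]]  # token tagged AUTHOR
mixed_emb, mixed_lab = mixup(emb_a, lab_a, emb_b, lab_b, lam=0.25)
```

The mixed example carries a soft label, so a model trained on it needs a loss that accepts label distributions (for example, cross-entropy against soft targets). Because some interpolated examples may be uninformative or misleading, a step that down-weights low-quality augmented examples, such as the reweighting layer the abstract describes, is a natural companion to this kind of augmentation.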
Acknowledgements
This study was funded by the Guangdong Basic and Applied Basic Research Fund Project (Grant/Award Number 2020B1515120010), the Key Technology Project of Foshan City (Grant/Award Number 1920001001367), the Guangdong Science and Technology Plan Project (Grant/Award Number 2019B010139001), the Guangdong Natural Science Fund Project (Grant/Award Number 2021A1515011243), and the Guangzhou Science and Technology Plan Project (Grant/Award Number 201902020016).
About this article
Cite this article
Fan, Y., Xiao, H., Wang, M. et al. Few-shot named entity recognition framework for forestry science metadata extraction. J Ambient Intell Human Comput 15, 2105–2118 (2024). https://doi.org/10.1007/s12652-023-04740-4
DOI: https://doi.org/10.1007/s12652-023-04740-4