Few-shot named entity recognition framework for forestry science metadata extraction

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Making effective use of the accumulated body of forestry science papers is important for understanding the current state of forests and for formulating forest conservation strategies. However, the metadata associated with these papers is sparse, which makes them difficult to exploit fully. Metadata is the foundation for managing and using these scholarly documents efficiently, and it plays an indispensable role in advancing forestry science research. Because constructing a training corpus and extracting long-distance semantic relationships are inherently challenging, named entity recognition (NER) has not yet been applied to metadata entity identification in forestry science papers. To overcome these limitations, this paper creates a specialized training corpus and introduces a novel few-shot NER framework tailored to metadata extraction from forestry science papers. Within this framework, a data augmentation layer that employs word replacement (WR) and enhanced mixup (EM) addresses the poor performance caused by scarce training data. The semantic comprehension layer incorporates a multi-granularity dilated convolution neural network (MGDCNN) to capture long-distance semantic associations. In addition, a meta-learning-based reweighting layer mitigates the adverse effects of low-quality augmented examples on the model. Experimental results demonstrate the efficacy of the proposed framework, which achieves precision, recall, and F1 of 91.08%, 88.96%, and 90.00%, respectively. Compared with traditional models, precision, recall, and F1 improve by up to 10.69%, 7.48%, and 9.07%, respectively.
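Three of the ideas named in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: the function names `f1_score`, `mixup`, and `receptive_field` are hypothetical, the mixup shown is the standard convex-combination form rather than the paper's enhanced mixup (EM), and the kernel size and dilation rates are assumed values, since the abstract does not state the MGDCNN configuration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mixup(x_a, x_b, y_a, y_b, lam):
    """Standard mixup: blend two feature vectors and their label vectors.

    lam is the mixing weight in [0, 1]; in practice it is often drawn from
    a Beta distribution, but here it is passed in explicitly.
    """
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def receptive_field(kernel_size: int, dilations) -> int:
    """Tokens seen by one output of stacked stride-1 dilated 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# F1 recomputed from the reported precision and recall (in percent).
reported_f1 = f1_score(91.08, 88.96)

# Blend two toy examples with one-hot labels, weighting the first by 0.25.
mixed_x, mixed_y = mixup([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], 0.25)

# Assumed configuration: kernel size 3 with dilation rates 1, 2, and 4.
span = receptive_field(3, [1, 2, 4])
```

The recomputed F1 agrees with the reported 90.00% to within rounding of the inputs. The receptive-field calculation shows why dilation helps with long-distance associations: three layers with the assumed rates already cover 15 tokens, whereas three undilated layers of the same kernel size would cover 7.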

Acknowledgements

This study was funded by the Guangdong Basic and Applied Basic Research Fund Project (Grant/Award Number 2020B1515120010), the Key Technology Project of Foshan City (Grant/Award Number 1920001001367), the Guangdong Science and Technology Plan Project (Grant/Award Number 2019B010139001), the Guangdong Natural Science Fund Project (Grant/Award Number 2021A1515011243), and the Guangzhou Science and Technology Plan Project (Grant/Award Number 201902020016).

Author information

Corresponding author

Correspondence to Wenchao Jiang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fan, Y., Xiao, H., Wang, M. et al. Few-shot named entity recognition framework for forestry science metadata extraction. J Ambient Intell Human Comput 15, 2105–2118 (2024). https://doi.org/10.1007/s12652-023-04740-4

  • DOI: https://doi.org/10.1007/s12652-023-04740-4
