
Cross-domain NER in the data-poor scenarios for human mobility knowledge

  • Research
  • Published in GeoInformatica

Abstract

In recent years, the exploration of knowledge in large-scale human mobility has gained significant attention. Named Entity Recognition (NER) is a crucial technology for achieving a semantic understanding of human behavior and uncovering patterns in large-scale human mobility. Rapid advances in IoT and CPS technologies have led to the collection of massive human mobility data from diverse sources, creating a need for cross-domain NER, which transfers entity information from a source domain to automatically identify and classify entities in texts from different target domains. In data-poor scenarios, how to transfer human mobility knowledge over time and space is particularly significant. This paper therefore proposes an Adaptive Text Sequence Enhancement Module (at-SAM) to help the model strengthen the associations between entities in sentences in data-poor target domains, and a Predicted Label-Guided Dual Sequence Aware Information Module (Dual-SAM) to improve the transferability of label information. Experiments were conducted in domains that contain hidden knowledge about human mobility; the results show that this method can transfer task knowledge between multiple different domains in data-poor scenarios and achieves SOTA performance.
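To make the NER task concrete: a sequence labeler assigns a BIO tag to each token, and entity spans are then decoded from those tags. The sketch below (example sentence, entity types, and function name are all illustrative, not taken from the paper) shows how BIO-tagged output for a mobility-related sentence is decoded into typed entity spans.

```python
def decode_bio(tokens, tags):
    """Collect (entity_type, entity_text) spans from BIO-tagged tokens."""
    spans, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; flush any span in progress.
            if current:
                spans.append(current)
            current = (tag[2:], [tok])
        elif tag.startswith("I-") and current and tag[2:] == current[0]:
            # Continuation of the current entity of the same type.
            current[1].append(tok)
        else:
            # "O" tag (or an inconsistent "I-") ends the current span.
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(etype, " ".join(words)) for etype, words in spans]

tokens = ["Alice", "flew", "to", "Hong", "Kong", "on", "Monday"]
tags = ["B-PER", "O", "O", "B-LOC", "I-LOC", "O", "B-TIME"]
print(decode_bio(tokens, tags))
# → [('PER', 'Alice'), ('LOC', 'Hong Kong'), ('TIME', 'Monday')]
```

In the cross-domain setting the tag set differs between source and target domains, which is why label-aware transfer modules such as the paper's Dual-SAM are needed on top of this basic decoding step.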


Availability of data and materials

The data that support the findings of this study are openly available at https://github.com/jinpeng01/LANER/tree/main/ner_data.


Funding

This study was funded by the National Natural Science Foundation of China (No. 62272045).

Author information

Authors and Affiliations

Authors

Contributions

Yutong Jiang: Conceptualization, Investigation, Methodology, Visualization, Writing, Resources. Fusheng Jin: Funding acquisition, Supervision, Conceptualization, Data curation, Methodology, Writing – review & editing. Mengnan Chen: Data curation, Methodology, Validation, Visualization, Writing. Guoming Liu: Writing, Validation. He Pang: Supervision, Writing – review & editing. Ye Yuan: Writing – review & editing.

Corresponding author

Correspondence to Fusheng Jin.

Ethics declarations

Competing interests

The authors declare that they have no conflict of interest regarding the publication of this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jiang, Y., Jin, F., Chen, M. et al. Cross-domain NER in the data-poor scenarios for human mobility knowledge. GeoInformatica (2024). https://doi.org/10.1007/s10707-024-00513-z

