
Cross-domain NER in the data-poor scenarios for human mobility knowledge

  • Research
  • Published in GeoInformatica

Abstract

In recent years, the exploration of knowledge in large-scale human mobility has gained significant attention. Named Entity Recognition (NER) is a crucial technology for achieving a semantic understanding of human behavior and uncovering patterns in large-scale human mobility. Rapid advances in IoT and CPS technologies have led to the collection of massive human mobility data from diverse sources, creating a need for cross-domain NER, which transfers entity information from a source domain to automatically identify and classify entities in texts from different target domains. In data-poor scenarios, how to transfer human mobility knowledge over time and space is particularly significant. This paper therefore proposes an Adaptive Text Sequence Enhancement Module (at-SAM) to help the model strengthen the associations between entities in sentences in data-poor target domains, and a Predicted Label-Guided Dual Sequence Aware Information Module (Dual-SAM) to improve the transferability of label information. Experiments were conducted in domains that contain hidden knowledge about human mobility; the results show that this method can transfer task knowledge between multiple different domains in data-poor scenarios and achieves SOTA performance.
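To make the NER task concrete: a sequence labeler assigns a BIO tag to each token, and entity spans are then decoded from those tags. The sketch below (example sentence, entity types, and function name are all illustrative, not taken from the paper) shows how BIO-tagged output for a mobility-related sentence is decoded into typed entity spans.

```python
def decode_bio(tokens, tags):
    """Collect (entity_type, entity_text) spans from BIO-tagged tokens."""
    spans, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; flush any span in progress.
            if current:
                spans.append(current)
            current = (tag[2:], [tok])
        elif tag.startswith("I-") and current and tag[2:] == current[0]:
            # Continuation of the current entity of the same type.
            current[1].append(tok)
        else:
            # "O" tag (or an inconsistent "I-") ends the current span.
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(etype, " ".join(words)) for etype, words in spans]

tokens = ["Alice", "flew", "to", "Hong", "Kong", "on", "Monday"]
tags = ["B-PER", "O", "O", "B-LOC", "I-LOC", "O", "B-TIME"]
print(decode_bio(tokens, tags))
# → [('PER', 'Alice'), ('LOC', 'Hong Kong'), ('TIME', 'Monday')]
```

In the cross-domain setting the tag set differs between source and target domains, which is why label-aware transfer modules such as the paper's Dual-SAM are needed on top of this basic decoding step.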


Availability of data and materials

The data that support the findings of this study are openly available at https://github.com/jinpeng01/LANER/tree/main/ner_data.


Funding

This study was funded by the National Natural Science Foundation of China (No. 62272045).

Author information

Authors and Affiliations

Authors

Contributions

Yutong Jiang: Conceptualization, Investigation, Methodology, Visualization, Writing, Resources. Fusheng Jin: Funding acquisition, Supervision, Conceptualization, Data curation, Methodology, Writing – review & editing. Mengnan Chen: Data curation, Methodology, Validation, Visualization, Writing. Guoming Liu: Writing, Validation. He Pang: Supervision, Writing – review & editing. Ye Yuan: Writing – review & editing.

Corresponding author

Correspondence to Fusheng Jin.

Ethics declarations

Competing interests

The authors declare that they have no conflict of interest regarding the publication of this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jiang, Y., Jin, F., Chen, M. et al. Cross-domain NER in the data-poor scenarios for human mobility knowledge. GeoInformatica (2024). https://doi.org/10.1007/s10707-024-00513-z

