Introducing the HIPE 2022 Shared Task: Named Entity Recognition and Linking in Multilingual Historical Documents

Part of the Lecture Notes in Computer Science book series (LNCS, volume 13186)

Abstract

We present the HIPE-2022 shared task on named entity processing in multilingual historical documents. Following the success of the first CLEF-HIPE-2020 evaluation lab, this edition confronts systems with the challenges of dealing with more languages, learning domain-specific entities, and adapting to diverse annotation tag sets. HIPE-2022 is part of the ongoing efforts of the natural language processing and digital humanities communities to adapt and develop appropriate technologies to efficiently retrieve and explore information from historical texts. On such material, however, named entity processing techniques face the challenges of domain heterogeneity, input noisiness, dynamics of language, and lack of resources. In this context, the main objective of the evaluation lab is to gain new insights into the transferability of named entity processing approaches across languages, time periods, document types, and annotation tag sets.

Keywords

  • Named entity processing
  • Information extraction
  • Text understanding
  • Historical documents
  • Digital humanities

Notes

  1. https://impresso.github.io/CLEF-HIPE-2020.

  2. https://hipe-eval.github.io/HIPE-2022/.

  3. Classical commentaries are scholarly publications dedicated to the in-depth analysis and explanation of ancient literary works; their aim is to facilitate the reading and understanding of a given literary text. More information on the HIPE-2022 classical commentaries corpus is given in Sect. 3.2.

  4. https://www.newseye.eu/.

  5. https://sonar.fh-potsdam.de/.

  6. https://livingwithmachines.ac.uk/.

  7. The impresso [4] and SoNAR [12] guidelines were derived from the Quaero guidelines [16], while the NewsEye guidelines correspond to a subset of the impresso guidelines.

  8. https://github.com/impresso/CLEF-HIPE-2020-scorer.

References

  1. Beryozkin, G., Drori, Y., Gilon, O., Hartman, T., Szpektor, I.: A joint named-entity recognizer for heterogeneous tag-sets using a tag hierarchy. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 140–150, Florence, Italy, July 2019. https://aclanthology.org/P19-1014

  2. Coll Ardanuy, M., Beavan, D., Beelen, K., Hosseini, K., Lawrence, J.: Dataset for Toponym Resolution in Nineteenth-Century English Newspapers (2021). https://doi.org/10.23636/b1c4-py78

  3. Ehrmann, M., Colavizza, G., Rochat, Y., Kaplan, F.: Diachronic evaluation of NER systems on old newspapers. In: Proceedings of the 13th Conference on Natural Language Processing (KONVENS 2016), pp. 97–107, Bochum (2016). Bochumer Linguistische Arbeitsberichte. https://infoscience.epfl.ch/record/221391

  4. Ehrmann, M., Romanello, M., Flückiger, A., Clematide, S.: Impresso Named Entity Annotation Guidelines. Annotation guidelines, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Zurich University (UZH), January 2020. https://zenodo.org/record/3585750

  5. Ehrmann, M., Romanello, M., Flückiger, A., Clematide, S.: Extended Overview of CLEF HIPE 2020: named entity processing on historical newspapers. In: Cappellato, L., Eickhoff, C., Ferro, N., Névéol, A. (eds.), Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, vol. 2696, p. 38, Thessaloniki, Greece (2020). CEUR-WS. https://doi.org/10.5281/zenodo.4117566, https://infoscience.epfl.ch/record/281054

  6. Ehrmann, M., Hamdi, A., Pontes, E.L., Romanello, M., Doucet, A.: Named Entity Recognition and Classification on Historical Documents: A Survey. arXiv:2109.11406 [cs], September 2021

  7. Markus, G., Neudecker, C., Isaac, A., Bergel, G., et al.: AI in relation to GLAMs Task Force - Report and Recommendations. Technical report, Europeana Network Association (2021). https://pro.europeana.eu/project/ai-in-relation-to-glams

  8. Hamdi, A., et al.: A multilingual dataset for named entity recognition, entity linking and stance detection in historical newspapers. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2021, pp. 2328–2334, New York, NY, USA, July 2021. Association for Computing Machinery. ISBN 978-1-4503-8037-9. https://doi.org/10.1145/3404835.3463255

  9. Kaplan, F., di Lenardo, I.: Big data of the past. Front. Digit. Hum. 4, 1–21 (2017). ISSN 2297-2668. https://doi.org/10.3389/fdigh.2017.00012

  10. Li, J., Chiu, B., Feng, S., Wang, H.: Few-shot named entity recognition via meta-learning. IEEE Trans. Knowl. Data Eng. 1 (2020)

  11. Li, J., Shang, S., Shao, L.: MetaNER: Named entity recognition with meta-learning. In: Proceedings of The Web Conference 2020, WWW 2020, pp. 429–440, New York, NY, USA (2020). Association for Computing Machinery. ISBN 9781450370233. https://doi.org/10.1145/3366423.3380127

  12. Menzel, S., Zinck, J., Schnaitter, H., Petras, V.: Guidelines for Full Text Annotations in the SoNAR (IDH) Corpus. Technical report, Zenodo, July 2021. https://zenodo.org/record/5115933

  13. Padilla, T.: Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Technical report, OCLC Research, USA, May 2020. https://www.oclc.org/content/research/publications/2019/oclcresearch-responsible-operations-data-science-machine-learning-ai.html

  14. Ridge, M., Colavizza, G., Brake, L., Ehrmann, M., Moreux, J.P., Prescott, A.: The past, present and future of digital scholarship with newspaper collections. In: DH 2019 Book of Abstracts, pp. 1–9, Utrecht, The Netherlands (2019). http://infoscience.epfl.ch/record/271329

  15. Romanello, M., Najem-Meyer, S., Robertson, B.: Optical character recognition of 19th century classical commentaries: the current state of affairs. In: The 6th International Workshop on Historical Document Imaging and Processing (HIP 2021), Lausanne, September 2021. Association for Computing Machinery. https://doi.org/10.1145/3476887.3476911

  16. Rosset, S., Grouin, C., Zweigenbaum, P.: Entités nommées structurées : Guide d’annotation Quaero. Technical Report 2011–04, LIMSI-CNRS, Orsay, France (2011)

  17. Wu, Q.: Enhanced meta-learning for cross-lingual named entity recognition with minimal resources. CoRR, abs/1911.06161 (2019). http://arxiv.org/abs/1911.06161

Acknowledgements

We are grateful to the research project consortia and teams who kindly agreed to withhold the publication of part of their NE-annotated datasets to support HIPE-2022: the NewsEye project (which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 770299); the Living with Machines project, in particular Mariona Coll Ardanuy; and the SoNAR project, in particular Clemens Neudecker. We also thank Sally Chambers, Clemens Neudecker and Frédéric Kaplan for their support and guidance as members of the lab’s advisory board.

Author information

Corresponding author

Correspondence to Maud Ehrmann.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ehrmann, M., Romanello, M., Doucet, A., Clematide, S. (2022). Introducing the HIPE 2022 Shared Task: Named Entity Recognition and Linking in Multilingual Historical Documents. In: , et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13186. Springer, Cham. https://doi.org/10.1007/978-3-030-99739-7_44

  • DOI: https://doi.org/10.1007/978-3-030-99739-7_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-99738-0

  • Online ISBN: 978-3-030-99739-7

  • eBook Packages: Computer Science, Computer Science (R0)