
Knowledge-Based Named Entity Recognition of Archaeological Concepts in Dutch

  • Conference paper
Metadata and Semantic Research (MTSR 2020)

Abstract

Advances in Natural Language Processing (NLP) allow the extraction of information from large volumes of text to be automated, making text-based resources more discoverable and useful. This paper turns to one of the most important but traditionally hard-to-access resources in archaeology: the largely unpublished reports generated by commercial or “rescue” archaeology, commonly known as “grey literature”. It presents the development and evaluation of a Named Entity Recognition (NER) system for Dutch archaeological grey literature, targeted at extracting mentions of artefacts, archaeological features, materials, places and time entities. The role of domain vocabulary is discussed in the development of a Knowledge Organization System (KOS)-driven NLP pipeline, which is evaluated against a human-annotated Gold Standard corpus.
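To make the idea of a vocabulary-driven NER pipeline and its evaluation concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation (their pipeline is built with GATE, see notes 5 and 6). It assumes a simple gazetteer approach: vocabulary labels are matched against tokens with greedy longest match, and the output is scored with strict precision, recall and F1 against gold annotations. The functions `build_gazetteer`, `tag` and `evaluate`, the entity type names, the Dutch terms and the gold spans are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

# A span is (start token, end token exclusive, entity type).
Span = Tuple[int, int, str]


def build_gazetteer(vocab: Dict[str, str]) -> Dict[Tuple[str, ...], str]:
    """Index each vocabulary label (lowercased, whitespace-tokenised) by its entity type."""
    return {tuple(label.lower().split()): etype for label, etype in vocab.items()}


def tag(tokens: List[str], gazetteer: Dict[Tuple[str, ...], str]) -> List[Span]:
    """Greedy, left-to-right, longest-match lookup of vocabulary labels."""
    lowered = [t.lower() for t in tokens]
    max_len = max((len(k) for k in gazetteer), default=0)
    spans: List[Span] = []
    i = 0
    while i < len(lowered):
        found = None
        # Try the longest window first so "late ijzertijd" beats "ijzertijd".
        for n in range(min(max_len, len(lowered) - i), 0, -1):
            etype = gazetteer.get(tuple(lowered[i:i + n]))
            if etype is not None:
                found = (i, i + n, etype)
                break
        if found:
            spans.append(found)
            i = found[1]  # skip past the matched term
        else:
            i += 1
    return spans


def evaluate(predicted: List[Span], gold: List[Span]) -> Tuple[float, float, float]:
    """Strict precision, recall and F1: a hit needs identical boundaries and type."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(ref) if ref else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Hypothetical mini-vocabulary and sentence, for illustration only.
vocab = {
    "greppel": "FEATURE",       # ditch
    "aardewerk": "ARTEFACT",    # pottery
    "ijzertijd": "TIME",        # Iron Age
    "late ijzertijd": "TIME",   # Late Iron Age
}
tokens = "In de greppel werd aardewerk uit de late ijzertijd gevonden".split()
predicted = tag(tokens, build_gazetteer(vocab))
# Hypothetical gold annotation that marks only "ijzertijd" as the time entity.
gold = [(2, 3, "FEATURE"), (4, 5, "ARTEFACT"), (8, 9, "TIME")]
precision, recall, f1 = evaluate(predicted, gold)
print(predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Here the tagger returns `[(2, 3, 'FEATURE'), (4, 5, 'ARTEFACT'), (7, 9, 'TIME')]`. Because the strict scorer treats the boundary mismatch on “late ijzertijd” as both a false positive and a false negative, each score comes out at 0.67. A lenient scorer that credits partial overlaps would rate the same output higher. Whether strict or lenient matching is more appropriate depends on how the Gold Standard was annotated.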


Notes

  1. Simple Knowledge Organization System (SKOS) is a Semantic Web format and a W3C recommendation designed for representing thesauri, classification schemes, taxonomies, subject heading systems, or any other type of structured controlled vocabulary: https://www.w3.org/2004/02/skos/.

  2. http://openskos.org/api/collections/rce:EGT.html.

  3. https://opennlp.apache.org/.

  4. https://snowballstem.org/.

  5. https://gate.ac.uk/sale/tao/splitch13.html#x18-33400013.8.

  6. https://cloud.gate.ac.uk/shopfront/displayItem/archaeology-ner-nl.
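Notes 1 and 2 point to SKOS and to a vocabulary collection hosted on OpenSKOS. The sketch below shows one way a SKOS vocabulary could be turned into a label lookup for gazetteer matching. It assumes the vocabulary is available as SKOS RDF/XML with `skos:Concept` elements and reads only the Dutch (`xml:lang="nl"`) `prefLabel` and `altLabel` values. The function `labels_from_skos` and the sample concepts and URIs are hypothetical and are not taken from the RCE thesaurus or the OpenSKOS API.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Namespace URIs defined by the RDF and SKOS specifications.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def labels_from_skos(rdf_xml: str, lang: str = "nl") -> Dict[str, str]:
    """Map lowercased preferred and alternative labels in `lang` to concept URIs.

    Only handles concepts serialised as <skos:Concept rdf:about="..."> elements.
    """
    root = ET.fromstring(rdf_xml)
    labels: Dict[str, str] = {}
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        uri = concept.get(f"{{{RDF}}}about")
        if uri is None:
            continue  # skip blank-node concepts without a URI
        for kind in ("prefLabel", "altLabel"):
            for label in concept.findall(f"{{{SKOS}}}{kind}"):
                if label.get(XML_LANG) == lang and label.text:
                    labels[label.text.strip().lower()] = uri
    return labels


# Hypothetical two-concept fragment; the URIs are placeholders, not RCE identifiers.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/concept/greppel">
    <skos:prefLabel xml:lang="nl">greppel</skos:prefLabel>
    <skos:altLabel xml:lang="nl">sloot</skos:altLabel>
    <skos:prefLabel xml:lang="en">ditch</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/concept/aardewerk">
    <skos:prefLabel xml:lang="nl">Aardewerk</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""

labels = labels_from_skos(SAMPLE)
print(labels)
```

Mapping alternative labels to the same URI as the preferred label lets synonyms resolve to one concept. Real exports may use other serialisations, such as Turtle or `rdf:Description` nodes typed via `rdf:type`, which this simple parser ignores. A full RDF library would handle those cases more robustly.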


Acknowledgments

This work was supported by the European Commission under the Community’s Seventh Framework Programme, contract no. FP7-INFRASTRUCTURES-2012-1-313193 (the ARIADNE project). Thanks are due to the ARIADNE project partners from Leiden University who helped with the definition of the Gold Standard.

Author information

Corresponding author

Correspondence to Andreas Vlachidis.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Vlachidis, A., Tudhope, D., Wansleeben, M. (2021). Knowledge-Based Named Entity Recognition of Archaeological Concepts in Dutch. In: Garoufallou, E., Ovalle-Perandones, MA. (eds) Metadata and Semantic Research. MTSR 2020. Communications in Computer and Information Science, vol 1355. Springer, Cham. https://doi.org/10.1007/978-3-030-71903-6_6


  • DOI: https://doi.org/10.1007/978-3-030-71903-6_6


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-71902-9

  • Online ISBN: 978-3-030-71903-6

  • eBook Packages: Computer Science, Computer Science (R0)
