Abstract
Advances in Natural Language Processing (NLP) make it possible to automate the extraction of information from large volumes of text, making text-based resources more discoverable and useful. This paper turns its attention to one of the most important, but traditionally hard-to-access, resources in archaeology: the largely unpublished reports generated by commercial or “rescue” archaeology, commonly known as “grey literature”. It presents the development and evaluation of a Named Entity Recognition (NER) system for Dutch archaeological grey literature, targeted at extracting mentions of artefacts, archaeological features, materials, places and time entities. The paper discusses the role of domain vocabulary in developing an NLP pipeline driven by a Knowledge Organization System (KOS), and evaluates that pipeline against a human-annotated Gold Standard corpus.
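The abstract describes the pipeline only at a high level. As a hedged illustration of the general idea behind knowledge-based (vocabulary-driven) NER and Gold Standard evaluation, the following sketch looks up vocabulary labels in tokenised text and scores the output with exact-match precision, recall and F1. Everything here is an assumption: the tiny Dutch vocabulary, the entity-type names, the greedy longest-match strategy and the function names `build_gazetteer`, `tag` and `evaluate` are hypothetical and do not reproduce the authors' pipeline, which would also need to deal with tokenisation, inflection and ambiguity far more carefully.

```python
# A minimal, hypothetical sketch of gazetteer-based NER plus exact-match
# evaluation. It is NOT the authors' pipeline.

# Hypothetical toy vocabulary in the spirit of a SKOS thesaurus: each entry is
# (entity type, preferred label, alternative labels). Illustrative terms only.
VOCABULARY = [
    ("ARTEFACT", "aardewerk", ["keramiek"]),
    ("FEATURE", "paalkuil", ["paalkuilen"]),
    ("MATERIAL", "vuursteen", []),
    ("TIME", "Romeinse tijd", []),
    ("TIME", "ijzertijd", []),
]


def build_gazetteer(vocabulary):
    """Map every label, as a tuple of lowercase tokens, to its entity type."""
    gazetteer = {}
    for entity_type, pref_label, alt_labels in vocabulary:
        for label in [pref_label, *alt_labels]:
            gazetteer[tuple(label.lower().split())] = entity_type
    return gazetteer


def tag(tokens, gazetteer):
    """Greedy left-to-right longest match; returns {(start, end, type)}, end exclusive."""
    max_len = max((len(key) for key in gazetteer), default=0)
    lowered = [token.lower() for token in tokens]
    spans = set()
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            entity_type = gazetteer.get(tuple(lowered[i:i + n]))
            if entity_type is not None:
                spans.add((i, i + n, entity_type))
                i += n
                break
        else:
            i += 1  # no vocabulary label starts at this token
    return spans


def evaluate(predicted, gold):
    """Exact-match precision, recall and F1: span boundaries and type must agree."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy usage: one pre-tokenised sentence and a hypothetical human annotation.
tokens = "In de paalkuil bij Leiden lag aardewerk uit de Romeinse tijd .".split()
gold = {(2, 3, "FEATURE"), (4, 5, "PLACE"), (6, 7, "ARTEFACT"), (9, 11, "TIME")}
predicted = tag(tokens, build_gazetteer(VOCABULARY))
precision, recall, f1 = evaluate(predicted, gold)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")  # P=1.00 R=0.75 F1=0.86
```

In this toy run the place name "Leiden" is missed simply because the vocabulary has no entry for it, so recall drops while precision stays perfect. In miniature, this suggests why vocabulary coverage matters for a KOS-driven approach.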
Notes
- 1.
Simple Knowledge Organization System (SKOS) is a Semantic Web format and a W3C recommendation designed for representing thesauri, classification schemes, taxonomies, subject heading systems and any other type of structured controlled vocabulary: https://www.w3.org/2004/02/skos/.
Acknowledgments
This work was supported by the European Commission under the Community’s Seventh Framework Programme, contract no. FP7-INFRASTRUCTURES-2012-1-313193 (the ARIADNE project). Thanks are due to the ARIADNE project partners from Leiden University who helped with the definition of the Gold Standard.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Vlachidis, A., Tudhope, D., Wansleeben, M. (2021). Knowledge-Based Named Entity Recognition of Archaeological Concepts in Dutch. In: Garoufallou, E., Ovalle-Perandones, MA. (eds) Metadata and Semantic Research. MTSR 2020. Communications in Computer and Information Science, vol 1355. Springer, Cham. https://doi.org/10.1007/978-3-030-71903-6_6
DOI: https://doi.org/10.1007/978-3-030-71903-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71902-9
Online ISBN: 978-3-030-71903-6
eBook Packages: Computer Science, Computer Science (R0)