Who is Mona L.? Identifying Mentions of Artworks in Historical Archives

Part of the Lecture Notes in Computer Science book series (LNISA, volume 11799)

Abstract

Named entity recognition (NER) plays an important role in many information retrieval tasks, including automatic knowledge graph construction. Most NER systems are limited to a few common named entity types, such as person, location, and organization. However, for cultural heritage resources, such as art historical archives, recognizing the titles of artworks as named entities is of high importance. In this work, we focus on identifying mentions of artworks, e.g. paintings and sculptures, in historical archives. Current state-of-the-art NER tools are unable to adequately identify artwork titles because of the particular difficulties this domain presents. The scarcity of NER training data for cultural heritage is a further obstacle. To mitigate this, we propose a semi-supervised approach that creates high-quality training data by leveraging existing cultural heritage resources. Our experimental evaluation shows a significant improvement in NER performance for artwork titles compared to the baseline approach.
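
To illustrate what creating training data from existing resources can look like, the following is a minimal sketch and not the authors' pipeline. It assumes a list of known artwork titles is already available (the `known_titles` list, the `ARTWORK` label, and the `annotate_titles` helper are all hypothetical) and marks their exact occurrences in a text as character spans, the kind of annotation many NER trainers accept.

```python
import re

def annotate_titles(text, titles, label="ARTWORK"):
    """Return non-overlapping (start, end, label) character spans for known titles.

    Hypothetical helper: exact string matching only, no fuzzy matching.
    """
    spans = []
    # Try longer titles first so a long title wins over a shorter one it contains.
    for title in sorted(set(titles), key=len, reverse=True):
        # Require that the match is not glued to other word characters.
        pattern = r"(?<!\w)" + re.escape(title) + r"(?!\w)"
        for match in re.finditer(pattern, text):
            start, end = match.span()
            # Skip matches that overlap a span we already accepted.
            if all(end <= s or start >= e for s, e, _ in spans):
                spans.append((start, end, label))
    return sorted(spans)

# Assumed inputs: a tiny title list and one example sentence.
known_titles = ["Mona Lisa", "The Starry Night"]
sentence = "The catalogue lists the Mona Lisa next to The Starry Night."
annotations = annotate_titles(sentence, known_titles)

for start, end, label in annotations:
    print(label, repr(sentence[start:end]))
```

Exact matching of this kind is deliberately simple. Artwork titles are often abbreviated, translated, or identical to ordinary phrases, so spans produced this way would normally need filtering or manual review before being used for training.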

Keywords

  • Named entity recognition
  • Cultural heritage data

Notes

  1. Linked Open Data: http://www.w3.org/DesignIssues/LinkedData.

  2. OpenGLAM: http://openglam.org.

  3. Europeana: http://europeana.eu.

  4. SpaCy: https://spacy.io/, version 2.1.3.

  5. From the exhibition catalogue “Lukas Cranach: Gemälde, Zeichnungen, Druckgraphik” (https://digi.ub.uni-heidelberg.de/diglit/koepplin1974bd1/0084).

  6. https://query.wikidata.org/.

  7. https://github.com/HPI-Information-Systems/enno.

  8. https://wpi.art.

Acknowledgements

We thank the Wildenstein Plattner Institute (see Note 8) for providing the corpus used in this work.

Author information

Corresponding author

Correspondence to Nitisha Jain.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Jain, N., Krestel, R. (2019). Who is Mona L.? Identifying Mentions of Artworks in Historical Archives. In: Doucet, A., Isaac, A., Golub, K., Aalberg, T., Jatowt, A. (eds) Digital Libraries for Open Knowledge. TPDL 2019. Lecture Notes in Computer Science, vol 11799. Springer, Cham. https://doi.org/10.1007/978-3-030-30760-8_10

  • DOI: https://doi.org/10.1007/978-3-030-30760-8_10

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30759-2

  • Online ISBN: 978-3-030-30760-8

  • eBook Packages: Computer Science, Computer Science (R0)