Linking Named Entities in Dutch Historical Newspapers

  • Theo van Veen
  • Juliette Lonij
  • Willem Jan Faber
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 672)

Abstract

We improved access to the collection of Dutch historical newspapers of the Koninklijke Bibliotheek by linking named entities in the newspaper articles to corresponding Wikidata descriptions by means of machine learning techniques and crowdsourcing. Indexing the Wikidata identifiers for named entities together with the newspaper articles opens up new possibilities for retrieving articles that mention these resources and searching the newspaper collection using semantic relations from Wikidata. In this paper we describe our steps so far in setting up this combination of entity linking, machine learning and crowdsourcing in our research environment as well as our planned activities aimed at improving the quality of the links and extending the semantic search capabilities.
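The idea of indexing Wikidata identifiers together with articles, and then searching through semantic relations, can be illustrated with a minimal in-memory sketch. This is not the authors' infrastructure: the article ids, the entity and property identifiers, the hand-written statements, and all function names are hypothetical placeholders chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical entity-linker output: article id -> linked entity identifiers.
# The identifiers are placeholders and are not checked against Wikidata.
linked_articles = {
    "article-1": {"Q727"},
    "article-2": {"Q36600"},
    "article-3": {"Q727", "Q36600"},
}

# Hand-written stand-in for semantic relations: (subject, property, object).
statements = {
    ("Q727", "P17", "Q55"),
    ("Q36600", "P17", "Q55"),
}


def build_index(linked):
    """Build an inverted index: entity identifier -> set of article ids."""
    index = defaultdict(set)
    for article_id, entity_ids in linked.items():
        for entity_id in entity_ids:
            index[entity_id].add(article_id)
    return index


def articles_mentioning(index, entity_id):
    """Return the sorted article ids linked to one entity identifier."""
    return sorted(index.get(entity_id, set()))


def articles_by_relation(index, triples, prop, obj):
    """Return articles mentioning any entity that has (prop, obj) as a statement."""
    subjects = {s for (s, p, o) in triples if p == prop and o == obj}
    hits = set()
    for subject in subjects:
        hits |= index.get(subject, set())
    return sorted(hits)


index = build_index(linked_articles)
```

Under these assumptions, `articles_mentioning(index, "Q727")` looks up articles by a single identifier, while `articles_by_relation(index, statements, "P17", "Q55")` first expands the relation into matching entities and then collects their articles. A production setup would keep the identifiers in a search index and fetch the relations from Wikidata rather than from a local set.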

Keywords

Named entities; Linked data; Entity linking; Semantic enrichment; Semantic search; Machine learning; Classification; Crowdsourcing

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Theo van Veen (1)
  • Juliette Lonij (1)
  • Willem Jan Faber (1)
  1. Koninklijke Bibliotheek, National Library of the Netherlands, The Hague, The Netherlands
