Named Entity Linking in a Complex Domain: Case Second World War History

  • Erkki Heino
  • Minna Tamper
  • Eetu Mäkelä
  • Petri Leskinen
  • Esko Ikkala
  • Jouni Tuominen
  • Mikko Koho
  • Eero Hyvönen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10318)

Abstract

This paper discusses the challenges of applying named entity linking in a rich, complex domain – specifically, the linking of (1) military units, (2) places and (3) people in the context of interlinked Second World War data. Multiple sub-scenarios are discussed in detail through concrete evaluations, analyzing the problems faced and the solutions developed. A key contribution of this work is to highlight the heterogeneity of problems and approaches needed even inside a single domain, depending on both the source data and the target authority.

Notes

Acknowledgements

Our work is funded by the Open Science and Research Initiative (http://openscience.fi/) of the Finnish Ministry of Education and Culture, the Finnish Cultural Foundation, and the Academy of Finland.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Erkki Heino (1, 2)
  • Minna Tamper (1, 2)
  • Eetu Mäkelä (1, 2)
  • Petri Leskinen (1, 2)
  • Esko Ikkala (1, 2)
  • Jouni Tuominen (1, 2)
  • Mikko Koho (1, 2)
  • Eero Hyvönen (1, 2)

  1. Semantic Computing Research Group (SeCo), Aalto University, Espoo, Finland
  2. HELDIG – Helsinki Centre for Digital Humanities, University of Helsinki, Helsinki, Finland
