VoxEL: A Benchmark Dataset for Multilingual Entity Linking

  • Henry Rosales-Méndez
  • Aidan Hogan
  • Barbara Poblete
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11137)


The Entity Linking (EL) task identifies entity mentions in a text corpus and associates them with the corresponding entities in a given knowledge base. While traditional EL approaches have largely focused on English texts, current trends are towards language-agnostic or otherwise multilingual approaches that can perform EL over texts in many languages. One obstacle to ongoing research on multilingual EL is the scarcity of annotated datasets that contain the same text in different languages. In this work we thus propose VoxEL: a manually-annotated gold standard for multilingual EL featuring the same text expressed in five European languages. We first motivate and describe the VoxEL dataset, and then use it to compare the behaviour of state-of-the-art multilingual EL systems for the five languages, contrasting these results with those obtained by first machine-translating the input text to English. Overall, our results show how five state-of-the-art multilingual EL systems compare across languages and how the results for the different languages compare with one another, and further suggest that machine translation of input text to English is now a competitive alternative to dedicated multilingual EL configurations.
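To make the task and the comparison against a gold standard concrete, here is a minimal sketch, assuming that each annotation is a (start offset, end offset, knowledge-base IRI) triple and that a system output counts as correct only when both its span and its IRI exactly match a gold annotation. The `Annotation` class, the `micro_scores` function and the DBpedia IRIs in the example are hypothetical illustrations; this is not claimed to be the exact evaluation measure or tooling (such as GERBIL) used in the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Annotation:
    """Hypothetical gold/system annotation: a mention span plus its linked entity."""
    start: int  # character offset where the mention begins
    end: int    # character offset one past the end of the mention
    iri: str    # knowledge-base identifier the mention is linked to


def micro_scores(
    documents: Iterable[tuple[set[Annotation], set[Annotation]]],
) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over (gold, predicted) pairs.

    Assumption: a predicted annotation is a true positive only if a gold
    annotation has exactly the same span and the same IRI (exact matching).
    """
    tp = n_pred = n_gold = 0
    for gold, predicted in documents:
        tp += len(gold & predicted)  # frozen dataclasses compare by field values
        n_pred += len(predicted)
        n_gold += len(gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    # If tp > 0 then precision + recall > 0, so the division is safe.
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    text = "Obama visited Paris."
    dbr = "http://dbpedia.org/resource/"
    gold = {Annotation(0, 5, dbr + "Barack_Obama"), Annotation(14, 19, dbr + "Paris")}
    # The system finds both mentions but links "Paris" to the wrong entity.
    predicted = {Annotation(0, 5, dbr + "Barack_Obama"), Annotation(14, 19, dbr + "Paris,_Texas")}
    spans = [text[a.start:a.end] for a in sorted(gold, key=lambda a: a.start)]
    assert spans == ["Obama", "Paris"]
    print(micro_scores([(gold, predicted)]))  # → (0.5, 0.5, 0.5)
```

Since a dataset like VoxEL holds the same text in each language, one could call `micro_scores` once per language version to compare a system's behaviour across languages; a more lenient variant that accepts overlapping spans would only change the matching step.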


Keywords: Multilingual, Entity linking, Information extraction



The work of Henry Rosales-Méndez was supported by CONICYT-PCHA/Doctorado Nacional/2016-21160017. The work was also supported by the Millennium Institute for Foundational Research on Data (IMFD) and by Fondecyt Grant No. 1181896. We also thank Michael Röder for his considerable help with GERBIL.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Henry Rosales-Méndez (1)
  • Aidan Hogan (1)
  • Barbara Poblete (1)
  1. IMFD Chile and Department of Computer Science, University of Chile, Santiago, Chile
