DoSeR - A Knowledge-Base-Agnostic Framework for Entity Disambiguation Using Semantic Embeddings

  • Stefan Zwicklbauer
  • Christin Seifert
  • Michael Granitzer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9678)

Abstract

Entity disambiguation is the task of mapping ambiguous terms in natural-language text to their entities in a knowledge base. It is used to extract structured data in RDF (Resource Description Framework) from textual documents. It also supports artificial intelligence applications such as Semantic Search, Reasoning, and Question Answering. In this work, we propose DoSeR (Disambiguation of Semantic Resources), a (named) entity disambiguation framework that is knowledge-base-agnostic with respect to both RDF knowledge bases (e.g. DBpedia) and entity-annotated document knowledge bases (e.g. Wikipedia). First, our framework automatically generates semantic entity embeddings from one or multiple knowledge bases. DoSeR then accepts documents with a given set of surface forms as input and collectively links them to entities in a knowledge base using a graph-based approach. We evaluate DoSeR on seven different data sets against publicly available, state-of-the-art (named) entity disambiguation frameworks. Our approach outperforms state-of-the-art approaches that use RDF knowledge bases and/or entity-annotated document knowledge bases by up to 10 % F1 measure.
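The abstract describes two stages: learning entity embeddings, then linking all surface forms of a document jointly with a graph-based method. The following is a minimal Python sketch of the second stage only. It rests on several assumptions that the abstract does not state. The embeddings are taken as given, and hand-written toy vectors stand in for learned ones. Candidates of different surface forms are connected by edges weighted with the cosine similarity of their embeddings. A plain weighted PageRank (damping 0.85) then selects the highest-scoring candidate per surface form. The names `cosine` and `collective_link`, the toy entities, and the choice of PageRank are illustrative and hypothetical. DoSeR's actual graph construction and scoring may differ.

```python
import math
from itertools import combinations

# Hypothetical sketch of graph-based collective linking; NOT DoSeR's actual code.
# Assumption: entity embeddings are already learned and given as plain vectors.

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def collective_link(candidates, embeddings, damping=0.85, iterations=50):
    """Pick one entity per surface form via weighted PageRank on a candidate graph.

    candidates: dict mapping surface form -> list of candidate entity ids
    embeddings: dict mapping entity id -> embedding vector
    """
    nodes = [(sf, e) for sf, ents in candidates.items() for e in ents]
    n = len(nodes)
    if n == 0:
        return {}
    # Connect candidates of *different* surface forms; weight = cosine
    # similarity of their embeddings, clipped at 0 to keep weights non-negative.
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if nodes[i][0] != nodes[j][0]:
            w = max(0.0, cosine(embeddings[nodes[i][1]], embeddings[nodes[j][1]]))
            weights[i][j] = weights[j][i] = w
    out = [sum(row) for row in weights]

    # Power iteration for weighted PageRank; nodes without outgoing weight
    # (dangling nodes) spread their score uniformly over all nodes.
    score = [1.0 / n] * n
    for _ in range(iterations):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            if out[i] > 0:
                for j in range(n):
                    new[j] += damping * score[i] * weights[i][j] / out[i]
            else:
                for j in range(n):
                    new[j] += damping * score[i] / n
        score = new

    # Keep the highest-scoring candidate for each surface form.
    best = {}
    for (sf, e), s in zip(nodes, score):
        if sf not in best or s > best[sf][1]:
            best[sf] = (e, s)
    return {sf: e for sf, (e, _) in best.items()}


# Toy 3-dimensional vectors (hypothetical, hand-written for illustration).
embeddings = {
    "Paris_(France)": [0.9, 0.1, 0.0],
    "Paris_Hilton": [0.0, 0.2, 0.9],
    "Seine": [0.8, 0.2, 0.1],
    "Louvre": [0.85, 0.15, 0.05],
}
candidates = {
    "Paris": ["Paris_(France)", "Paris_Hilton"],
    "Seine": ["Seine"],
    "Louvre": ["Louvre"],
}
print(collective_link(candidates, embeddings))
# → {'Paris': 'Paris_(France)', 'Seine': 'Seine', 'Louvre': 'Louvre'}
```

Edges are restricted to candidates of different surface forms. This prevents competing readings of one mention from reinforcing each other, so the candidate most coherent with the rest of the document accumulates the most PageRank mass. The sketch omits candidate generation and any prior or context scores that a full system would probably combine with the graph score.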

Keywords

Entity disambiguation, Neural networks, Linked Data, Semantic web

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Stefan Zwicklbauer (1)
  • Christin Seifert (1)
  • Michael Granitzer (1)
  1. University of Passau, Passau, Germany
