Combining Textual and Graph-Based Features for Named Entity Disambiguation Using Undirected Probabilistic Graphical Models

  • Sherzod Hakimov
  • Hendrik ter Horst
  • Soufian Jebbara
  • Matthias Hartung
  • Philipp Cimiano
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10024)

Abstract

Named Entity Disambiguation (NED) is the task of disambiguating named entities that have already been recognized in a natural language text by linking them to their corresponding entities in a knowledge base such as DBpedia. It is an important step in transforming unstructured text into structured knowledge. Previous work on this task has demonstrated the strong impact of graph-based methods such as PageRank on entity disambiguation. Other approaches rely on distributional similarity between an article and the textual description of a candidate entity. However, the combined impact of these different feature groups has not been explored to a sufficient extent. In this paper, we present a novel approach that exploits an undirected probabilistic model to combine different types of features for named entity disambiguation. Capitalizing on Markov Chain Monte Carlo sampling, our model exploits the complementary strengths of graph-based and textual features. We analyze the impact of these features and their combination on named entity disambiguation. In an evaluation on the GERBIL benchmark, our model compares favourably to the current state of the art on 8 out of 14 data sets.
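
As an illustration only, the idea of scoring joint entity assignments with weighted graph-based and textual features, and exploring them with Markov Chain Monte Carlo sampling, can be sketched as follows. This is a toy Metropolis-Hastings sampler and not the model described in the paper: the mentions, candidate entities, feature values, coherence table, and weights are all hypothetical, and the weights are fixed by hand rather than learned.

```python
import math
import random

# Hypothetical toy data (not from the paper): each mention has candidate
# entities with a graph-based score (PageRank-like) and a textual similarity.
CANDIDATES = {
    "Paris": {
        "dbr:Paris": {"pagerank": 0.9, "text_sim": 0.8},
        "dbr:Paris_Hilton": {"pagerank": 0.4, "text_sim": 0.1},
    },
    "Seine": {
        "dbr:Seine": {"pagerank": 0.6, "text_sim": 0.7},
        "dbr:Seine_(department)": {"pagerank": 0.2, "text_sim": 0.2},
    },
}

# Hypothetical pairwise coherence between entities (symmetric).
RELATED = {frozenset(("dbr:Paris", "dbr:Seine")): 1.0}

# Hand-set weights, assumed for illustration; a real system would learn them.
WEIGHTS = {"pagerank": 1.0, "text_sim": 1.0, "coherence": 2.0}


def score(assignment):
    """Unnormalized log-score of a joint assignment: sum of weighted features."""
    total = 0.0
    for mention, entity in assignment.items():
        feats = CANDIDATES[mention][entity]
        total += WEIGHTS["pagerank"] * feats["pagerank"]
        total += WEIGHTS["text_sim"] * feats["text_sim"]
    entities = list(assignment.values())
    for i in range(len(entities)):
        for j in range(i + 1, len(entities)):
            pair = frozenset((entities[i], entities[j]))
            total += WEIGHTS["coherence"] * RELATED.get(pair, 0.0)
    return total


def sample(n_steps=500, seed=0):
    """Metropolis-Hastings over assignments; returns the best state visited."""
    rng = random.Random(seed)
    mentions = sorted(CANDIDATES)
    state = {m: rng.choice(sorted(CANDIDATES[m])) for m in mentions}
    best, best_score = dict(state), score(state)
    for _ in range(n_steps):
        # Symmetric proposal: re-draw the entity of one randomly chosen mention.
        mention = rng.choice(mentions)
        proposal = dict(state)
        proposal[mention] = rng.choice(sorted(CANDIDATES[mention]))
        delta = score(proposal) - score(state)
        # Accept with probability min(1, exp(delta)).
        if delta >= 0 or rng.random() < math.exp(delta):
            state = proposal
            current = score(state)
            if current > best_score:
                best, best_score = dict(state), current
    return best


if __name__ == "__main__":
    print(sample())
```

In this toy setting the coherence term rewards assignments whose entities are related to each other, so the sampler favours the jointly consistent reading over one chosen per mention in isolation.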

Keywords

Entity disambiguation; Collective entity disambiguation; Named entity disambiguation; Probabilistic graphical models; Factor graphs

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Sherzod Hakimov (1)
  • Hendrik ter Horst (1)
  • Soufian Jebbara (1)
  • Matthias Hartung (1)
  • Philipp Cimiano (1)

  1. Semantic Computing Group, Cognitive Interaction Technology – Center of Excellence (CITEC), Bielefeld University, Bielefeld, Germany
