A Scalable Gibbs Sampler for Probabilistic Entity Linking

  • Neil Houlsby
  • Massimiliano Ciaramita
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8416)


Entity linking involves labeling phrases in text with their referent entities, such as Wikipedia or Freebase entries. The task is challenging because the number of possible entities runs into the millions and mention ambiguity is heavy-tailed. We formulate the problem as probabilistic inference in a topic model in which each topic is associated with a Wikipedia article. To handle the large number of topics, we propose a novel, efficient Gibbs sampling scheme that can also incorporate side information, such as the Wikipedia graph. This conceptually simple probabilistic approach achieves state-of-the-art performance in entity linking on the Aida-CoNLL dataset.
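To make the formulation concrete, the following is a minimal sketch of collapsed Gibbs sampling for a single document, in which each topic is an entity and each mention is assigned to one of its candidate entities. It is an illustration under stated assumptions, not the sampling scheme proposed in the paper: the function name `gibbs_link`, the candidate table `cands`, the fixed mention-given-entity probabilities, the prior `alpha`, and the toy entity names are all hypothetical. Restricting each draw to a mention's candidate set is one simple way to avoid iterating over millions of topics.

```python
import random
from collections import Counter


def gibbs_link(mentions, candidates, alpha=0.1, sweeps=50, seed=0):
    """Assign one entity to each mention in a single document.

    mentions:   list of mention strings.
    candidates: dict mapping mention -> {entity: p(mention | entity)}.
                These probabilities are assumed fixed (hypothetical values,
                for example estimated from anchor-text counts).
    """
    rng = random.Random(seed)
    # Initialise every mention with its highest-probability candidate.
    z = [max(candidates[m], key=candidates[m].get) for m in mentions]
    counts = Counter(z)  # document-level entity counts
    tallies = [Counter() for _ in mentions]

    for sweep in range(sweeps):
        for i, m in enumerate(mentions):
            counts[z[i]] -= 1  # remove the current assignment
            ents = list(candidates[m])
            # Conditional weight: (document count + prior) * p(mention | entity),
            # evaluated only over this mention's candidate entities.
            weights = [(counts[e] + alpha) * candidates[m][e] for e in ents]
            z[i] = rng.choices(ents, weights=weights, k=1)[0]
            counts[z[i]] += 1
            if sweep >= sweeps // 2:  # discard the first half as burn-in
                tallies[i][z[i]] += 1

    # Report the most frequently sampled entity for each mention.
    return [t.most_common(1)[0][0] for t in tallies]


# Toy document: two unambiguous mentions provide context for an ambiguous one.
doc_mentions = ["Jaguar", "Jaguar Cars", "Jag"]
cands = {
    "Jaguar": {"Jaguar_Cars": 0.4, "Jaguar_(animal)": 0.6},
    "Jaguar Cars": {"Jaguar_Cars": 1.0},
    "Jag": {"Jaguar_Cars": 1.0},
}
links = gibbs_link(doc_mentions, cands)
```

In this toy document the two unambiguous mentions raise the document-level count for `Jaguar_Cars`, so the ambiguous mention "Jaguar" is drawn toward that entity even though its mention-level probability favours the animal. Side information such as the Wikipedia graph is not modelled in this sketch.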


Gibbs Sampling, Latent Dirichlet Allocation, Anchor Text, Coherence Score, Topic Assignment





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Neil Houlsby, University of Cambridge, UK
  • Massimiliano Ciaramita, Google Research, Zürich, Switzerland
