Scalable Disambiguation System Capturing Individualities of Mentions

  • Tiep Mai
  • Bichen Shi
  • Patrick K. Nicholson
  • Deepak Ajwani
  • Alessandra Sala
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10318)

Abstract

Entity disambiguation, or mapping a phrase to its canonical representation in a knowledge base, is a fundamental step in many natural language processing applications. Existing techniques based on global ranking models fail to capture the individual peculiarities of words and hence struggle to meet the accuracy and time requirements of many real-world applications. In this paper, we propose a new system that learns specialized features and models for disambiguating each ambiguous phrase in the English language. We train and validate hundreds of thousands of learning models for this purpose using a Wikipedia hyperlink dataset with more than 170 million labelled annotations. The computationally intensive training required for this approach can be distributed over a cluster. In addition, our approach supports fast queries and efficient updates, and its accuracy compares favorably with that of other state-of-the-art disambiguation systems.

Keywords

  • Entity linking
  • Entity disambiguation
  • Wikification
  • Word-sense disambiguation

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Tiep Mai (1)
  • Bichen Shi (2)
  • Patrick K. Nicholson (1)
  • Deepak Ajwani (1)
  • Alessandra Sala (1)

  1. Nokia Bell Labs, Dublin, Ireland
  2. University College Dublin, Dublin, Ireland
