Semantic Relatedness Approach for Named Entity Disambiguation

  • Anna Lisa Gentile
  • Ziqi Zhang
  • Lei Xia
  • José Iria
Part of the Communications in Computer and Information Science book series (CCIS, volume 91)

Abstract

Natural language is a means of expressing and discussing concepts, objects, and events; that is, it carries semantic content. One of the ultimate aims of Natural Language Processing techniques is to identify the meaning of a text, providing effective ways to link textual references to their referents, that is, real-world objects. This work addresses the problem of giving a sense to proper names in a text, that is, automatically associating words representing Named Entities with their referents. The proposed methodology for Named Entity Disambiguation is based on Semantic Relatedness Scores obtained with a graph-based model over Wikipedia. We show that, without building a Bag of Words representation of the text and considering only the named entities within it, the proposed paradigm achieves results competitive with the state of the art on two different datasets.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Anna Lisa Gentile (1)
  • Ziqi Zhang (2)
  • Lei Xia (3)
  • José Iria (4)
  1. Department of Computer Science, University of Bari, Italy
  2. Department of Computer Science, The University of Sheffield, UK
  3. Archaeology Data Service, University of York, UK
  4. IBM Research - Zurich, Switzerland
