Simple Window Selection Strategies for the Simplified Lesk Algorithm for Word Sense Disambiguation

  • Francisco Viveros-Jiménez
  • Alexander Gelbukh
  • Grigori Sidorov
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8265)

Abstract

The Simplified Lesk Algorithm (SLA) is frequently used for word sense disambiguation. It disambiguates by calculating the overlap between each dictionary definition (sense) of the target word and the context words. The algorithm is simple and fast, but it has relatively low accuracy. We propose simple strategies for context window selection that improve the performance of the SLA: (1) constructing the window only from words that overlap with some sense of the target word, (2) excluding the target word itself from matching, and (3) avoiding repetitions in the context window. This paper describes the corresponding experiments. A comparison with other, more complex knowledge-based algorithms is presented.
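The three window selection strategies can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy sense inventory `SENSES`, the function names `select_window` and `simplified_lesk`, the nearest-first scan order, the default window size of 4, and the choice to return `None` when no sense overlaps are all assumptions made for illustration. A real system would take its glosses from a dictionary such as WordNet.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy sense inventory: sense id -> bag of gloss words.
# Both glosses contain "bank" on purpose, to show why strategy (2) matters.
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "bank": {
        "bank#finance": {"institution", "money", "deposit", "loan", "bank"},
        "bank#river": {"sloping", "land", "river", "water", "bank"},
    },
}


def select_window(target: str, tokens: List[str], position: int,
                  senses: Dict[str, Dict[str, Set[str]]], size: int) -> List[str]:
    """Collect up to `size` context words, scanning outward from the target."""
    gloss_vocab = set().union(*senses[target].values())
    # Visit every other position, nearest to the target first (assumed order).
    order = sorted((i for i in range(len(tokens)) if i != position),
                   key=lambda i: (abs(i - position), i))
    window: List[str] = []
    for i in order:
        word = tokens[i]
        if word == target:           # strategy (2): never match the target itself
            continue
        if word in window:           # strategy (3): no repeated window words
            continue
        if word not in gloss_vocab:  # strategy (1): keep only overlapping words
            continue
        window.append(word)
        if len(window) == size:
            break
    return window


def simplified_lesk(target: str, tokens: List[str], position: int,
                    senses: Dict[str, Dict[str, Set[str]]],
                    size: int = 4) -> Optional[str]:
    """Return the sense whose gloss overlaps most with the selected window."""
    window = set(select_window(target, tokens, position, senses, size))
    scores = {sense: len(gloss & window)
              for sense, gloss in senses[target].items()}
    best = max(scores, key=scores.get)
    # Assumption: report no decision when nothing overlaps (no back-off here).
    return best if scores[best] > 0 else None


tokens = "he sat on the bank of the river and watched the water near the bank".split()
print(simplified_lesk("bank", tokens, 4, SENSES))  # bank#river
```

In this example the selected window is `["river", "water"]`. The second occurrence of "bank" is skipped, so it cannot add a uninformative match to both senses, and function words such as "the" never occupy window slots because they overlap with no gloss.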

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Francisco Viveros-Jiménez (1)
  • Alexander Gelbukh (1, 2)
  • Grigori Sidorov (1, 2)

  1. Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico City, Mexico
  2. Institute for Modern Linguistic Research, “Sholokhov” Moscow State University for Humanities, Moscow, Russia
