Journal of Computer Science and Technology, Volume 25, Issue 5, pp 1030–1039

Unsupervised WSD by Finding the Predominant Sense Using Context as a Dynamic Thesaurus

  • Javier Tejada-Cárcamo
  • Hiram Calvo
  • Alexander Gelbukh
  • Kazuo Hara
Regular Paper

Abstract

We present and analyze an unsupervised method for Word Sense Disambiguation (WSD). Our work builds on the method presented by McCarthy et al. in 2004 for finding the predominant sense of each word in an entire corpus. Their maximization algorithm lets weighted terms (similar words) from a distributional thesaurus accumulate a score for each sense of the ambiguous word; the sense with the highest score is chosen, so the decision is in effect a vote by a weighted list of terms related to the ambiguous word. This list is obtained with the distributional similarity method that Dekang Lin proposed for building a thesaurus. In the method of McCarthy et al., every occurrence of the ambiguous word uses the same thesaurus, regardless of the context in which the word occurs. Our method takes that context into account: it builds the list of distributionally similar words from the syntactic context of the ambiguous word. Tested on SemCor, our method reaches a top precision of 77.54%, versus 67.10% for the original method. We also analyze the effect of the number of weighted terms on the tasks of finding the Most Frequent Sense (MFS) and WSD, and experiment with several corpora for building the Word Space Model.
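The voting scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `predominant_sense`, the sense labels, the term weights, and the toy similarity table are all hypothetical, and the score is a plain weighted sum without the normalisation or WordNet-based similarity measures a real system would use. It only shows how the chosen sense can change when the weighted term list comes from the local context instead of a single corpus-wide thesaurus.

```python
def predominant_sense(senses, weighted_terms, sense_term_similarity):
    """Pick the sense with the highest accumulated weighted vote.

    senses: candidate sense labels for the ambiguous word.
    weighted_terms: (term, weight) pairs, e.g. thesaurus neighbours.
    sense_term_similarity: function (sense, term) -> similarity score.
    """
    scores = {sense: 0.0 for sense in senses}
    for term, weight in weighted_terms:
        for sense in senses:
            # Each related term votes for every sense, in proportion to its
            # weight and to how similar it is to that sense.
            scores[sense] += weight * sense_term_similarity(sense, term)
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical similarity table (assumption, for illustration only).
TOY_SIMILARITY = {
    ("bank#finance", "money"): 0.9,
    ("bank#finance", "loan"): 0.8,
    ("bank#river", "water"): 0.9,
    ("bank#river", "shore"): 0.8,
}


def toy_similarity(sense, term):
    # Unlisted pairs get a small default similarity.
    return TOY_SIMILARITY.get((sense, term), 0.1)


senses = ["bank#finance", "bank#river"]

# Static list: the same neighbours for every occurrence of the word.
static_terms = [("money", 0.5), ("loan", 0.4), ("water", 0.2)]

# Dynamic list: neighbours derived from one occurrence's context
# (invented weights standing in for a context-built thesaurus).
context_terms = [("water", 0.6), ("shore", 0.5), ("money", 0.1)]

static_best, static_scores = predominant_sense(senses, static_terms, toy_similarity)
context_best, context_scores = predominant_sense(senses, context_terms, toy_similarity)

print("static list  ->", static_best)
print("context list ->", context_best)
```

With these invented numbers the static list selects the financial sense for every occurrence, while the context-derived list selects the river sense for this particular occurrence.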

Keywords

word sense disambiguation, word space model, semantic similarity, text corpus, thesaurus

References

  1. Schütze H. Dimensions of meaning. In Proc. ACM/IEEE Conference on Supercomputing (Supercomputing 1992), Mannheim, Germany, June, 1992, pp.787–796.
  2. Karlgren J, Sahlgren M. From Words to Understanding. Foundations of Real-World Intelligence, Stanford: CSLI Publications, 2001, pp.294–308.
  3. McCarthy D, Koeling R, Weeds J et al. Finding predominant word senses in untagged text. In Proc. the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, 2004.
  4. Lin D. Automatic retrieval and clustering of similar words. In Proc. the 17th Int. Conf. Computational Linguistics, Montreal, Canada, Aug. 10-14, 1998, pp.768–774.
  5. Kilgarriff A, Rosenzweig J. English SENSEVAL: Report and results. In Proc. LREC, Athens, May-June 2000.
  6. Patwardhan S, Banerjee S, Pedersen T. Using measures of semantic relatedness for word sense disambiguation. In Proc. the Fourth International Conference on Intelligent Text Processing and Computational Linguistics, Mexico City, Mexico, 2003, pp.241–257.
  7. Sahlgren M. The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces [Ph.D. Dissertation]. Department of Linguistics, Stockholm University, 2006.
  8. Lin D. Dependency-based evaluation of MINIPAR. In Proc. Workshop on the Evaluation of Parsing Systems at LREC, Granada, Spain, 1998, pp.317–330.
  9. Hays D. Dependency theory: A formalism and some observations. Language, 1964, 40(4): 511–525.
  10. Mel'čuk I A. Dependency Syntax: Theory and Practice. State University of New York Press, Albany, N.Y., 1987.
  11. Pedersen T, Patwardhan S, Michelizzi J. WordNet::Similarity: Measuring the relatedness of concepts. In Proc. the Nineteenth National Conference on Artificial Intelligence (AAAI-2004), San Jose, CA, 2004, pp.1024–1025.
  12. Miller G. Introduction to WordNet: An On-line Lexical Database. Princeton University, 1993.
  13. Miller G. WordNet: An on-line lexical database. International Journal of Lexicography, 1990, 3(4): 235–244.
  14. Resnik P. Using information content to evaluate semantic similarity in a taxonomy. In Proc. the 14th International Joint Conference on Artificial Intelligence, Montreal, Canada, Aug. 20-25, 1995, pp.448–453.
  15. Jiang J J, Conrath D W. Semantic similarity based on corpus statistics and lexical taxonomy. In Proc. International Conference on Research in Computational Linguistics, Taiwan, China, Sept. 1997, pp.19–33.
  16. Leacock C, Chodorow M. Combining Local Context and WordNet Similarity for Word Sense Identification. WordNet: An Electronic Lexical Database, Fellbaum C (ed.), 1998, pp.265–283.
  17. Tejada J, Gelbukh A, Calvo H. Unsupervised WSD with a dynamic thesaurus. In Proc. the 11th International Conference on Text, Speech and Dialogue (TSD 2008), Brno, Czech Republic, Sept. 8-12, 2008, pp.201–210.
  18. Tejada J, Gelbukh A, Calvo H. An innovative two-stage WSD unsupervised method. SEPLN Journal, March 2008, 40: 99–105.

Copyright information

© Springer 2010

Authors and Affiliations

  • Javier Tejada-Cárcamo (1)
  • Hiram Calvo (2, 3)
  • Alexander Gelbukh (2)
  • Kazuo Hara (3)

  1. San Pablo Catholic University, Arequipa, Peru
  2. Center for Computing Research, National Polytechnic Institute, Mexico City, Mexico
  3. Nara Institute of Science and Technology, Nara, Japan
