Improving Word Sense Disambiguation by Pseudo-samples

  • Xiaojie Wang
  • Yuji Matsumoto
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3248)


Data sparseness is a major problem in word sense disambiguation. Automatic sample acquisition and smoothing are two approaches that have been explored to alleviate it. In this paper, we consider a combination of the two. We first propose a pattern-based method for acquiring pseudo-samples, and then estimate the conditional probabilities of variables by combining the pseudo-sample data set with the sense-tagged data set. This combined estimation strikes an appropriate balance between the two data sets, which is vital for achieving the best performance. Experiments show that our approach brings significant improvement for Chinese word sense disambiguation.
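
The abstract does not give the form of the combined estimate. As a rough illustration only, the sketch below assumes a simple linear interpolation, `lam * P_tagged + (1 - lam) * P_pseudo`, between conditional probabilities estimated separately from each data set. The function names, the weight `lam`, and the toy samples are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def conditional_probs(samples):
    """Maximum-likelihood estimate of P(feature | sense) from (sense, features) pairs."""
    counts = defaultdict(Counter)
    for sense, features in samples:
        counts[sense].update(features)
    probs = {}
    for sense, feature_counts in counts.items():
        total = sum(feature_counts.values())
        probs[sense] = {f: c / total for f, c in feature_counts.items()}
    return probs


def interpolate(tagged, pseudo, lam):
    """Assumed combination rule: lam * P_tagged + (1 - lam) * P_pseudo."""
    combined = {}
    for sense in set(tagged) | set(pseudo):
        t = tagged.get(sense, {})
        p = pseudo.get(sense, {})
        combined[sense] = {
            f: lam * t.get(f, 0.0) + (1 - lam) * p.get(f, 0.0)
            for f in set(t) | set(p)
        }
    return combined


# Toy data invented for illustration: (sense label, context features).
tagged_samples = [("bank/finance", ["money", "loan"]), ("bank/river", ["water"])]
pseudo_samples = [
    ("bank/finance", ["money", "deposit"]),
    ("bank/river", ["water", "shore"]),
]

p_tagged = conditional_probs(tagged_samples)
p_pseudo = conditional_probs(pseudo_samples)
p_combined = interpolate(p_tagged, p_pseudo, lam=0.5)
```

In this sketch, `lam` plays the role of the balance between the two data sets: `lam = 1.0` ignores the pseudo-samples, `lam = 0.0` relies on them entirely. In practice such a weight would normally be tuned on held-out data.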





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Xiaojie Wang (1, 2)
  • Yuji Matsumoto (1)
  1. Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan
  2. School of Information Engineering, Beijing University of Posts and Technology, Beijing, China
