Investigating Problems of Semi-supervised Learning for Word Sense Disambiguation

  • Anh-Cuong Le
  • Akira Shimazu
  • Le-Minh Nguyen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4285)


Word Sense Disambiguation (WSD) is the problem of determining the correct sense of a polysemous word in a given context. In this paper, we investigate the use of unlabeled data for WSD within the framework of semi-supervised learning, in which the original labeled dataset is iteratively extended by exploiting unlabeled data. The paper addresses two problems that arise in this approach: determining the subset of newly labeled data to add at each extension, and generating the final classifier. By proposing solutions to these problems, we derive several variants of bootstrapping algorithms and apply them to word sense disambiguation. Experiments were conducted on datasets for four words (interest, line, hard, and serve) and on the English lexical sample of Senseval-3.
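The iterative extension described in the abstract can be illustrated with a minimal self-training sketch. This is not the authors' algorithm: the naive Bayes base classifier, the fixed confidence threshold used to pick new labeled examples, and all names (`train_nb`, `predict_proba`, `self_train`, `threshold`) are assumptions chosen for illustration. The final classifier here is simply the model retrained on the extended set, which is only one possible answer to the second problem the abstract names.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Train a multinomial naive Bayes model on (context_tokens, sense) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(tokens)
        vocab.update(tokens)
    return sense_counts, word_counts, vocab


def predict_proba(model, tokens):
    """Return a dict mapping each sense to its posterior probability."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    log_scores = {}
    for sense, n in sense_counts.items():
        # Add-one smoothing over the known vocabulary.
        denom = sum(word_counts[sense].values()) + len(vocab)
        score = math.log(n / total)
        for t in tokens:
            if t in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[sense][t] + 1) / denom)
        log_scores[sense] = score
    top = max(log_scores.values())
    exp = {s: math.exp(v - top) for s, v in log_scores.items()}
    z = sum(exp.values())
    return {s: v / z for s, v in exp.items()}


def self_train(labeled, unlabeled, threshold=0.8, max_iter=10):
    """Iteratively extend the labeled set with confidently labeled contexts.

    Assumed selection rule: keep an unlabeled context only if the top
    posterior is at least `threshold`. Stops when nothing new qualifies.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_iter):
        model = train_nb(labeled)
        confident, remaining = [], []
        for tokens in pool:
            probs = predict_proba(model, tokens)
            sense, p = max(probs.items(), key=lambda kv: kv[1])
            if p >= threshold:
                confident.append((tokens, sense))
            else:
                remaining.append(tokens)
        if not confident:
            break
        labeled.extend(confident)
        pool = remaining
    # Assumed final classifier: one model retrained on the extended set.
    return train_nb(labeled), labeled
```

In this sketch a context is a list of tokens surrounding the ambiguous word; calling `self_train(labeled, unlabeled)` returns the final model together with the extended labeled set, so the examples added at each round can be inspected.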


Support Vector Machine, Label Data, Unlabeled Data, Ambiguous Word, Word Sense Disambiguation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Anh-Cuong Le (1)
  • Akira Shimazu (1)
  • Le-Minh Nguyen (1)
  1. School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
