Semi-supervised Word Sense Disambiguation Using the Web as Corpus

  • Rafael Guzmán-Cabrera
  • Paolo Rosso
  • Manuel Montes-y-Gómez
  • Luis Villaseñor-Pineda
  • David Pinto-Avendaño
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5449)


Like any other classification task, Word Sense Disambiguation requires a large number of training examples. Such examples, which are easy to obtain for most tasks, are particularly difficult to obtain in this case. For this reason, this paper investigates the possibility of using a Web-based approach for determining the correct sense of an ambiguous word based only on its surrounding context. In particular, we propose a semi-supervised method that is especially suited to working with just a few training examples. The method automatically extracts unlabeled examples from the Web and iteratively integrates them into the training data set. The experimental results, obtained over a subset of ten nouns from the SemEval lexical sample task, are encouraging: they show that it is possible to improve the baseline accuracy of classifiers such as Naïve Bayes and SVM by using some unlabeled examples extracted from the Web.
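The iterative integration of unlabeled examples described in the abstract can be illustrated with a generic self-training loop. The sketch below is not the authors' procedure: the Web retrieval step is replaced by a hard-coded list of context snippets, the classifier is a minimal add-one-smoothed Naïve Bayes, and the names `train_nb`, `predict_nb`, `self_train`, the confidence threshold of 0.6, and the toy data for the word "bank" are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Fit a multinomial Naive Bayes model on (tokens, sense) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(tokens)
        vocab.update(tokens)
    return sense_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return (most probable sense, its posterior probability)."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    log_scores = {}
    for sense, n in sense_counts.items():
        denom = sum(word_counts[sense].values()) + len(vocab)
        score = math.log(n / total)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][tok] + 1) / denom)
        log_scores[sense] = score
    # Normalize the log scores into posterior probabilities.
    top = max(log_scores.values())
    probs = {s: math.exp(v - top) for s, v in log_scores.items()}
    z = sum(probs.values())
    best = max(probs, key=probs.get)
    return best, probs[best] / z


def self_train(labeled, unlabeled, threshold=0.6, max_iters=5):
    """Iteratively move confidently classified snippets into the training set."""
    train = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_iters):
        model = train_nb(train)
        confident, remaining = [], []
        for tokens in pool:
            sense, conf = predict_nb(model, tokens)
            if conf >= threshold:
                confident.append((tokens, sense))
            else:
                remaining.append(tokens)
        if not confident:  # nothing new to add, so stop early
            break
        train.extend(confident)
        pool = remaining
    return train_nb(train), train


# Toy data (hypothetical): two labeled contexts of the ambiguous word "bank".
labeled = [
    (["money", "deposit", "account"], "finance"),
    (["river", "water", "shore"], "river"),
]
# Stand-in for snippets that would be retrieved from the Web.
unlabeled = [
    ["deposit", "loan", "money"],
    ["water", "fishing", "river"],
    ["loan", "interest"],
]

model, train = self_train(labeled, unlabeled)
sense, confidence = predict_nb(model, ["interest", "loan"])
```

In this toy run the third snippet shares no words with the initial training data, so it can only be labeled after the first snippet has been added in an earlier iteration; this is the effect an iterative scheme relies on. The threshold controls the trade-off between adding more examples and adding wrongly labeled ones.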


Keywords: Unlabeled Data, Ambiguous Word, Word Sense Disambiguation, Relevant Word, Polysemous Word





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Rafael Guzmán-Cabrera (1, 2)
  • Paolo Rosso (2)
  • Manuel Montes-y-Gómez (3)
  • Luis Villaseñor-Pineda (3)
  • David Pinto-Avendaño (4)

  1. FIMEE, Universidad de Guanajuato, Mexico
  2. NLE Lab, DSIC, Universidad Politécnica de Valencia, Spain
  3. LabTL, Instituto Nacional de Astrofísica, Óptica y Electrónica, Mexico
  4. FCC, Benemérita Universidad Autónoma de Puebla, Mexico
