Two Web-Based Approaches for Noun Sense Disambiguation

  • Paolo Rosso
  • Manuel Montes-y-Gómez
  • Davide Buscaldi
  • Aarón Pancardo-Rodríguez
  • Luis Villaseñor Pineda
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3406)


Progress on resolving lexical ambiguity appears to be stalled by the knowledge acquisition bottleneck. It is therefore worthwhile to investigate the Web as a lexical resource. This paper explores two approaches that use Web counts collected through a search engine. The first approach counts the hits of each possible synonym of the noun to be disambiguated together with the nouns of the context. In the second approach, the disambiguation of a noun uses a modifying adjective as supporting evidence. Adjective-noun pairs gave better precision than the baseline, although with low recall. A comprehensive set of weighting formulae for combining Web counts was investigated, to give a complete picture of the available options and of which formulae work best. The comparison across different search engines was also useful: Web counts, and consequently disambiguation results, were almost identical. Moreover, the Web seems to be more effective than the WordNet Domains lexical resource when integrated rather than used stand-alone.
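The first approach can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring rule (joint hits divided by the synonym's own hits, averaged per sense) is only one assumed way of combining counts, not one of the paper's weighting formulae. The names `disambiguate_noun`, `hit_count`, `TOY_COUNTS` and `toy_hits` are hypothetical, and the counts are invented toy values that stand in for a real search engine.

```python
from typing import Callable, Dict, List, Optional


def disambiguate_noun(
    senses: Dict[str, List[str]],
    context_nouns: List[str],
    hit_count: Callable[[str], int],
) -> Optional[str]:
    """Pick the sense whose synonyms co-occur most often with the context.

    `hit_count` is a hypothetical callable returning the number of Web hits
    for a query string; a real system would wrap a search engine here.
    Returns None (abstains) when no sense gathers any evidence.
    """
    scores: Dict[str, float] = {}
    for sense, synonyms in senses.items():
        total = 0.0
        for synonym in synonyms:
            base = hit_count(f'"{synonym}"')
            if base == 0:
                continue  # no evidence for this synonym
            for noun in context_nouns:
                joint = hit_count(f'"{synonym}" "{noun}"')
                # Assumed weighting: joint hits relative to the synonym's hits.
                total += joint / base
        scores[sense] = total / max(len(synonyms), 1)
    if not scores or max(scores.values()) == 0:
        return None
    return max(scores, key=lambda s: scores[s])


# Invented toy counts, used only to make the sketch runnable offline.
TOY_COUNTS = {
    '"depository"': 200,
    '"depository" "money"': 40,
    '"depository" "loan"': 20,
    '"riverbank"': 100,
    '"riverbank" "money"': 1,
}


def toy_hits(query: str) -> int:
    """Stand-in for a search engine: look the query up in TOY_COUNTS."""
    return TOY_COUNTS.get(query, 0)


SENSES = {"financial": ["depository"], "river": ["riverbank"]}
best = disambiguate_noun(SENSES, ["money", "loan"], toy_hits)
```

With these toy counts the "financial" sense scores higher, so `best` is `"financial"`. Returning `None` when no query yields hits is a design choice that lets the procedure abstain rather than guess, which trades recall for precision.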


Keywords: Search Engine; Word Sense Disambiguation; Lexical Ambiguity; Lexical Resource; Anaphora Resolution
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Paolo Rosso (1)
  • Manuel Montes-y-Gómez (1, 2)
  • Davide Buscaldi (3)
  • Aarón Pancardo-Rodríguez (2)
  • Luis Villaseñor Pineda (2)

  1. Dpto. de Sistemas Informáticos y Computación (DSIC), Universidad Politécnica de Valencia, Spain
  2. Lab. de Tecnologías del Lenguaje, Instituto Nacional de Astrofísica, Óptica y Electrónica, Mexico
  3. Dipartimento di Informatica e Scienze dell’Informazione (DISI), Università di Genova, Italy
