Five Languages Are Better Than One: An Attempt to Bypass the Data Acquisition Bottleneck for WSD

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7816)

Abstract

This paper presents a multilingual classification-based approach to Word Sense Disambiguation (WSD) that directly incorporates translational evidence from four other languages. The need for a large predefined monolingual sense inventory (such as WordNet) is avoided by taking a language-independent approach in which the word senses are derived automatically from word alignments on a parallel corpus. As a consequence, the task becomes a cross-lingual WSD task, which consists of selecting the contextually correct translation of an ambiguous target word.

To evaluate the viability of cross-lingual Word Sense Disambiguation, we built five classifiers with English as the input language and translations in the five supported languages (viz. French, Dutch, Italian, Spanish and German) as classification output. The feature vectors incorporate both local context features and translation features that are extracted from the aligned translations. The experimental results confirm the validity of our approach: the classifiers that employ translational evidence outperform the classifiers that only exploit local context information. Furthermore, a comparison with state-of-the-art systems for the same task revealed that our system outperforms all other systems for all five target languages.
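As an illustration of how local context features and translation features might be combined in one feature vector, the following minimal Python sketch builds binary features for an English target word and picks a French translation by feature overlap. Everything in it is an assumption made for illustration: the function names, the two-word context window, the bag-of-words encoding of the aligned sentences, the toy training sentences, and the simple nearest-neighbour vote are not taken from the paper, and the translations of the query sentence are assumed to be given.

```python
from collections import Counter

def local_context_features(tokens, idx, window=2):
    """Binary features for the words around the target, keyed by relative position."""
    feats = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        pos = idx + offset
        word = tokens[pos].lower() if 0 <= pos < len(tokens) else "<pad>"
        feats[f"ctx[{offset:+d}]={word}"] = 1
    return feats

def translation_features(aligned):
    """Binary bag-of-words features from the sentence's translations in other languages."""
    feats = {}
    for lang, sentence in aligned.items():
        for word in sentence.lower().split():
            feats[f"bow[{lang}]={word}"] = 1
    return feats

def build_features(sentence, idx, aligned):
    """Combine local context and translation features into one feature dictionary."""
    feats = local_context_features(sentence.split(), idx)
    feats.update(translation_features(aligned))
    return feats

def classify(train, query, k=1):
    """Majority vote among the k training examples sharing the most features with the query."""
    ranked = sorted(train, key=lambda ex: len(ex[0].keys() & query.keys()), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): English "bank" with French translations as class labels.
train = [
    (build_features("He deposited money at the bank yesterday", 5,
                    {"nl": "hij stortte gisteren geld op de bank",
                     "de": "er zahlte gestern geld bei der bank ein"}), "banque"),
    (build_features("They walked along the bank of the river", 4,
                    {"nl": "ze liepen langs de oever van de rivier",
                     "de": "sie gingen am ufer des flusses entlang"}), "rive"),
]

query = build_features("She sat on the bank of the stream", 4,
                       {"nl": "ze zat op de oever van de beek",
                        "de": "sie saß am ufer des baches"})

prediction = classify(train, query)
print(prediction)
```

In this toy setting the Dutch and German translations ("oever", "ufer") add overlap with the river-bank example beyond what the English context alone provides, which is the intuition behind adding translational evidence to the feature vector.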

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lefever, E., Hoste, V., De Cock, M. (2013). Five Languages Are Better Than One: An Attempt to Bypass the Data Acquisition Bottleneck for WSD. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37247-6_28

  • DOI: https://doi.org/10.1007/978-3-642-37247-6_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37246-9

  • Online ISBN: 978-3-642-37247-6

  • eBook Packages: Computer Science, Computer Science (R0)
