Abstract
This paper presents a multilingual classification-based approach to Word Sense Disambiguation (WSD) that directly incorporates translational evidence from four other languages. The need for a large predefined monolingual sense inventory (such as WordNet) is avoided by taking a language-independent approach in which the word senses are derived automatically from word alignments on a parallel corpus. As a consequence, the task becomes a cross-lingual WSD task, which consists of selecting the contextually correct translation of an ambiguous target word.
In order to evaluate the viability of cross-lingual Word Sense Disambiguation, we built five classifiers with English as the input language and translations in the five supported languages (viz. French, Dutch, Italian, Spanish and German) as classification output. The feature vectors incorporate both local context features and translation features that are extracted from the aligned translations. The experimental results confirm the validity of our approach: the classifiers that employ translational evidence outperform the classifiers that only exploit local context information. Furthermore, a comparison with state-of-the-art systems for the same task revealed that our system outperforms all other systems for all five target languages.
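To make the feature design concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy setting with French as the target language, a window of two words for the local context, a binary bag-of-words over the aligned sentences in the four other languages, and a simple overlap-based nearest-neighbour classifier. All function names, the example sentences, and the availability of translations for the query sentence are assumptions made for illustration.

```python
from collections import Counter

def local_context(tokens, idx, window=2):
    """Lowercased words in a +/- window around the ambiguous word (assumed feature set)."""
    feats = {}
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = idx + off
        feats[f"w{off:+d}"] = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
    return feats

def translation_features(aligned):
    """Binary bag-of-words over the aligned sentences in the other languages."""
    return {f"{lang}:{w.lower()}": 1
            for lang, sent in aligned.items() for w in sent.split()}

def build_vector(tokens, idx, aligned):
    """Combine local context features and translation features in one vector."""
    feats = local_context(tokens, idx)
    feats.update(translation_features(aligned))
    return feats

def overlap(a, b):
    """Number of features with identical values in both vectors."""
    return sum(1 for k, v in a.items() if b.get(k) == v)

def classify(train, query, k=1):
    """Return the majority label among the k most similar training vectors."""
    ranked = sorted(train, key=lambda ex: overlap(ex[0], query), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

# Toy training data for the English word "bank"; the label is the French
# translation that a word alignment would provide (hypothetical examples).
train = [
    (build_vector("He deposited money at the bank yesterday".split(), 5, {
        "nl": "Hij stortte gisteren geld bij de bank",
        "it": "Ha depositato denaro in banca ieri",
        "es": "Depositó dinero en el banco ayer",
        "de": "Er zahlte gestern Geld bei der Bank ein",
    }), "banque"),
    (build_vector("They walked along the bank of the river".split(), 4, {
        "nl": "Ze liepen langs de oever van de rivier",
        "it": "Camminavano lungo la riva del fiume",
        "es": "Caminaron por la orilla del río",
        "de": "Sie gingen am Ufer des Flusses entlang",
    }), "rive"),
]

query = build_vector("She opened an account at the bank today".split(), 6, {
    "nl": "Ze opende vandaag een rekening bij de bank",
    "it": "Ha aperto un conto in banca oggi",
    "es": "Abrió una cuenta en el banco hoy",
    "de": "Sie eröffnete heute ein Konto bei der Bank",
})

prediction = classify(train, query)
print(prediction)
```

In this toy case the translation features contribute most of the overlap with the financial training example, which illustrates why evidence from the other languages can help separate senses that the local English context alone distinguishes only weakly.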
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lefever, E., Hoste, V., De Cock, M. (2013). Five Languages Are Better Than One: An Attempt to Bypass the Data Acquisition Bottleneck for WSD. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37247-6_28
DOI: https://doi.org/10.1007/978-3-642-37247-6_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37246-9
Online ISBN: 978-3-642-37247-6
eBook Packages: Computer Science, Computer Science (R0)