Crossing Parallel Corpora and Multilingual Lexical Databases for WSD
Word Sense Disambiguation (WSD) is the task of selecting the correct sense of a word in context from a sense repository. State-of-the-art performance is typically obtained by treating WSD as a supervised classification task (e.g. ). Under the word-expert approach, in which a separate classifier is trained for each word, this requires a large number of sense-tagged examples for every sense of every word. That requirement makes the supervised approach infeasible for “all-words” tasks, which consist of disambiguating all the words in a text. This problem has been called the Knowledge Acquisition Bottleneck, and many solutions have been proposed for it (see for example ).
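As an illustration of why the word-expert approach is so demanding of sense-tagged data, the following is a minimal sketch, not the method of this paper. It assumes a bag-of-words Naive Bayes classifier with add-one smoothing; the class name `WordExpert`, the sense labels, and the toy examples are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class WordExpert:
    """Naive Bayes classifier over bag-of-words contexts for ONE target word."""

    def __init__(self):
        self.sense_counts = Counter()                # number of examples per sense
        self.feature_counts = defaultdict(Counter)   # sense -> context token counts
        self.vocabulary = set()

    def train(self, tagged_examples):
        # tagged_examples: iterable of (context_tokens, sense_label) pairs
        for context_tokens, sense in tagged_examples:
            self.sense_counts[sense] += 1
            for token in context_tokens:
                self.feature_counts[sense][token] += 1
                self.vocabulary.add(token)

    def disambiguate(self, context_tokens):
        total = sum(self.sense_counts.values())
        vocab_size = len(self.vocabulary)
        best_sense, best_score = None, float("-inf")
        for sense, count in self.sense_counts.items():
            score = math.log(count / total)          # log prior of the sense
            sense_total = sum(self.feature_counts[sense].values())
            for token in context_tokens:
                # add-one smoothing so unseen tokens do not zero out a sense
                numerator = self.feature_counts[sense][token] + 1
                score += math.log(numerator / (sense_total + vocab_size))
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Hypothetical sense-tagged examples for the single word "bank".
bank_examples = [
    (["deposit", "money", "account"], "bank%finance"),
    (["loan", "interest", "money"], "bank%finance"),
    (["river", "water", "fishing"], "bank%river"),
    (["muddy", "river", "shore"], "bank%river"),
]

# One expert per target word: every further word needs its own tagged data.
experts = {"bank": WordExpert()}
experts["bank"].train(bank_examples)

finance_guess = experts["bank"].disambiguate(["money", "loan"])
river_guess = experts["bank"].disambiguate(["river", "water"])
```

Because `experts` holds one classifier per word, covering every word of a text means collecting tagged examples for each entry separately, which is the bottleneck described above.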
- 1. Strapparava, C., Gliozzo, A., Giuliano, C.: Pattern abstraction and term similarity for word sense disambiguation: IRST at SENSEVAL-3. In: Proc. of SENSEVAL-3, Barcelona, Spain (2004)
- 2. Mihalcea, R., Moldovan, D.: An automatic method for generating sense tagged corpora. In: Proc. of AAAI 1999, Orlando, FL (1999)
- 3. Magnini, B., Strapparava, C.: Experiments in word domain disambiguation for parallel texts. In: Proc. of Word Senses and Multi-Linguality, Hong Kong, Workshop held in conjunction with ACL 2000 (2000)
- 4. Diab, M., Resnik, P.: An unsupervised method for word sense tagging using parallel texts. In: Proc. of ACL 2002, Philadelphia (2002)
- 5. Bentivogli, L., Pianta, E.: Exploiting parallel texts in the creation of multilingual semantically annotated resources: the MultiSemCor corpus. Journal of Natural Language Engineering (NLE), Special Issue on Parallel Texts (to appear)
- 6. Koehn, P.: EuroParl: A multilingual corpus for evaluation of machine translation (Unpublished), http://people.csail.mit.edu/~koehn/publications/europarl.ps