n-Best Reranking for the Efficient Integration of Word Sense Disambiguation and Statistical Machine Translation

  • Lucia Specia
  • Baskaran Sankaran
  • Maria das Graças Volpe Nunes
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4919)

Abstract

Although it has always been thought that Word Sense Disambiguation (WSD) can be useful for Machine Translation, only recently have efforts been made to integrate the two tasks and show that this assumption is valid, particularly for Statistical Machine Translation (SMT). While different approaches have been proposed and results have started to converge in a positive direction, it is not yet clear how these applications should be integrated so that the strengths of both can be exploited. This paper contributes to the recent investigation of the usefulness of WSD for SMT by using n-best reranking to integrate WSD with SMT efficiently. This allows the use of rich contextual WSD features, which current SMT systems do not otherwise exploit. Experiments on English-Portuguese translation with a syntactically motivated phrase-based SMT system and both symbolic and probabilistic WSD models showed significant improvements in BLEU scores.
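
The general idea of n-best reranking with an added WSD feature can be illustrated with a minimal Python sketch. It is not the system described in the paper: the names `Hypothesis`, `wsd_feature` and `rerank`, the toy sentences, the probabilities, and the hand-set weights are all assumptions made for illustration. In practice the feature weights would be tuned on held-out data rather than fixed by hand.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Hypothesis:
    text: str                   # candidate translation from the n-best list
    features: Dict[str, float]  # base SMT feature scores (log domain)


def wsd_feature(tokens: Set[str], wsd_probs: Dict[str, Dict[str, float]],
                floor: float = 1e-3) -> float:
    """Sum, over ambiguous source words, of the log-probability that the WSD
    model assigns to whichever candidate translation appears in the hypothesis.
    A small floor (an assumption) is used when none of them appears."""
    score = 0.0
    for translations in wsd_probs.values():
        best = max((p for t, p in translations.items() if t in tokens),
                   default=floor)
        score += math.log(best)
    return score


def rerank(nbest: List[Hypothesis], weights: Dict[str, float],
           wsd_probs: Dict[str, Dict[str, float]]) -> List[Hypothesis]:
    """Re-score each hypothesis with a weighted sum of its base features plus
    the WSD feature, and return the list sorted best-first."""
    def total(h: Hypothesis) -> float:
        feats = dict(h.features, wsd=wsd_feature(set(h.text.split()), wsd_probs))
        return sum(weights.get(name, 0.0) * value for name, value in feats.items())
    return sorted(nbest, key=total, reverse=True)


# Toy data (hypothetical): source sentence "she took the train".
wsd_probs = {"took": {"pegou": 0.7, "levou": 0.2, "tomou": 0.1}}
nbest = [
    Hypothesis("ela levou o trem", {"smt": -2.0}),  # ranked first by the decoder
    Hypothesis("ela pegou o trem", {"smt": -2.3}),
]
weights = {"smt": 1.0, "wsd": 1.0}  # hand-set for the example

best = rerank(nbest, weights, wsd_probs)[0]
print(best.text)
```

Because the WSD model sees only the n-best list and not the decoder's search space, it can use arbitrarily rich context features without changing the decoder, which is the efficiency argument the abstract makes.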

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Lucia Specia (1, 2)
  • Baskaran Sankaran (2)
  • Maria das Graças Volpe Nunes (1)

  1. NILC/ICMC, Universidade de São Paulo, São Carlos, Brazil
  2. Microsoft Research India, Bangalore, India
