Czech-English Phrase-Based Machine Translation

  • Ondřej Bojar
  • Evgeny Matusov
  • Hermann Ney
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4139)

Abstract

We describe experiments with Czech-to-English phrase-based machine translation. We evaluate several techniques for improving translation quality, measured with the well-established BLEU metric. In total, we achieve BLEU scores of 0.36 to 0.41 on the examined corpus of Wall Street Journal texts, outperforming all other systems evaluated on this language pair.

Keywords

Machine Translation, Statistical Machine Translation, Language Pair, Translation Quality, Word Alignment
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ondřej Bojar (1)
  • Evgeny Matusov (2)
  • Hermann Ney (2)
  1. Institute of Formal and Applied Linguistics, ÚFAL MFF UK, Praha, Czech Republic
  2. Lehrstuhl für Informatik 6, Computer Science Department, RWTH Aachen University, Aachen, Germany
