SyMGiza++: Symmetrized Word Alignment Models for Statistical Machine Translation

  • Marcin Junczys-Dowmunt
  • Arkadiusz Szał
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7053)

Abstract

We present SyMGiza++, a tool that computes symmetric word alignment models and can take advantage of multi-processor systems. A series of fairly simple modifications to the original IBM/Giza++ word alignment models makes it possible to update the symmetrized models between chosen iterations of the original training algorithms. We achieve a relative alignment quality improvement of more than 17% compared to Giza++ and MGiza++ on the standard Canadian Hansards task, while maintaining the speed improvements provided by the parallel computation capability of MGiza++.

Furthermore, the alignment models are evaluated in the context of phrase-based statistical machine translation, where a consistent improvement measured in BLEU scores can be observed when SyMGiza++ is used instead of Giza++ or MGiza++.
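
As a rough illustration of what "updating symmetrized models between iterations" can mean, the toy sketch below trains two IBM Model 1 lexicons (one per translation direction) and merges them after every EM iteration. It is not the SyMGiza++ implementation: the averaging-and-renormalizing rule in `symmetrize`, the function names, the three-sentence corpus, and the omission of the NULL word are all assumptions made for the example.

```python
from collections import defaultdict


def model1_iteration(pairs, t):
    """One EM iteration of IBM Model 1 (no NULL word).

    pairs: list of (src_words, tgt_words); t[(f, e)] approximates p(f | e)
    for target word f and source word e.
    """
    counts = defaultdict(float)
    totals = defaultdict(float)
    for src, tgt in pairs:
        for f in tgt:
            norm = sum(t[(f, e)] for e in src)
            for e in src:
                c = t[(f, e)] / norm  # posterior that f is aligned to e
                counts[(f, e)] += c
                totals[e] += c
    return {(f, e): c / totals[e] for (f, e), c in counts.items()}


def symmetrize(t_fe, t_ef):
    """Hypothetical symmetrization: average both directions, renormalize."""
    avg = {(f, e): 0.5 * (p + t_ef[(e, f)]) for (f, e), p in t_fe.items()}
    sum_per_e = defaultdict(float)
    sum_per_f = defaultdict(float)
    for (f, e), p in avg.items():
        sum_per_e[e] += p
        sum_per_f[f] += p
    new_fe = {(f, e): p / sum_per_e[e] for (f, e), p in avg.items()}
    new_ef = {(e, f): p / sum_per_f[f] for (f, e), p in avg.items()}
    return new_fe, new_ef


def train(pairs, iterations=5):
    """Train both directions, symmetrizing after every iteration."""
    reversed_pairs = [(tgt, src) for src, tgt in pairs]
    # Uniform start over all co-occurring word pairs.
    t_fe = {(f, e): 1.0 for src, tgt in pairs for e in src for f in tgt}
    t_ef = {(e, f): 1.0 for (f, e) in t_fe}
    for _ in range(iterations):
        t_fe = model1_iteration(pairs, t_fe)
        t_ef = model1_iteration(reversed_pairs, t_ef)
        t_fe, t_ef = symmetrize(t_fe, t_ef)
    return t_fe, t_ef


# Toy English-German corpus (illustrative only).
corpus = [
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a book".split(), "ein buch".split()),
]
t_fe, t_ef = train(corpus, iterations=5)
```

In this sketch the two directional models share information after every iteration; the abstract describes updating the symmetrized models only between chosen iterations, which would correspond to calling `symmetrize` selectively.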

Keywords

Statistical Machine Translation; Alignment Quality; Sentence Pair; Parallel Corpus; Translation Quality

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Marcin Junczys-Dowmunt (1)
  • Arkadiusz Szał (1)
  1. Faculty of Mathematics and Computer Science, Adam Mickiewicz University, Poznań, Poland
