A Syntactic Transformation Model for Statistical Machine Translation

  • Thai Phuong Nguyen
  • Akira Shimazu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4285)


We present a phrase-based SMT approach in which the word-order problem is solved by syntactic transformation in the preprocessing phase; no reordering is performed in the decoding phase. We describe a syntactic transformation model based on a probabilistic context-free grammar. The model is trained using a bilingual corpus and a broad-coverage parser of the source language. Because a parser is needed only for the source language, this phrase-based SMT approach is applicable to language pairs in which the target language is poor in resources. We consider translation from English to Vietnamese and from English to French. Our experiments show significant BLEU score improvements over Pharaoh, a state-of-the-art phrase-based SMT system.
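The abstract does not give the details of the transformation model, so the following is only a rough sketch under stated assumptions, not the authors' method. It assumes that a transformational rule maps a CFG production (a parent label plus its child labels) to a permutation of the children, and that P(permutation | production) is estimated by relative frequency, the usual maximum-likelihood estimate for PCFG rule probabilities. It also assumes the source parse tree is rewritten bottom-up with the most probable permutation before decoding. The names `Tree`, `train`, `transform`, and `leaves`, and the toy observations, are hypothetical. Extracting observations from a word-aligned bilingual corpus is not shown.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # hypothetical: (parent label, child labels)
Perm = Tuple[int, ...]              # hypothetical: source child indices in target order
Model = Dict[Rule, Dict[Perm, float]]


@dataclass
class Tree:
    label: str
    children: List["Tree"] = field(default_factory=list)
    word: Optional[str] = None  # set only on pre-terminal (POS) nodes


def train(observations: List[Tuple[Rule, Perm]]) -> Model:
    """Relative-frequency estimate of P(perm | rule), as in PCFG training."""
    counts: Dict[Rule, Counter] = defaultdict(Counter)
    for rule, perm in observations:
        counts[rule][perm] += 1
    model: Model = {}
    for rule, perm_counts in counts.items():
        total = sum(perm_counts.values())
        model[rule] = {p: c / total for p, c in perm_counts.items()}
    return model


def transform(tree: Tree, model: Model) -> Tree:
    """Bottom-up: reorder each node's children with its most probable permutation."""
    if tree.word is not None:
        return tree
    kids = [transform(c, model) for c in tree.children]
    rule = (tree.label, tuple(c.label for c in tree.children))
    dist = model.get(rule)
    if dist:  # unseen productions keep the source order
        best = max(dist, key=dist.get)
        kids = [kids[i] for i in best]
    return Tree(tree.label, kids)


def leaves(tree: Tree) -> List[str]:
    """Read the (possibly reordered) sentence off the tree."""
    if tree.word is not None:
        return [tree.word]
    return [w for c in tree.children for w in leaves(c)]


if __name__ == "__main__":
    np_rule = ("NP", ("DT", "JJ", "NN"))
    model = train([(np_rule, (0, 2, 1)), (np_rule, (0, 2, 1)), (np_rule, (0, 1, 2))])
    sent = Tree("NP", [Tree("DT", word="a"), Tree("JJ", word="red"), Tree("NN", word="car")])
    print(" ".join(leaves(transform(sent, model))))  # a car red
```

Two of the three toy observations place the noun before the adjective, as in French *une voiture rouge*. The English NP *a red car* is therefore rewritten as *a car red*, so that a decoder performing no reordering sees the source in something close to target order.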






References

  1. Bikel, D.M.: Intricacies of Collins’ Parsing Model. Computational Linguistics 30(4), 479–511 (2004)
  2. Brown, P.F., Della Pietra, S.A., Della Pietra, V.J., Mercer, R.L.: The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics 19(2), 263–311 (1993)
  3. Charniak, E.: A maximum entropy inspired parser. In: Proceedings of HLT-NAACL (2000)
  4. Charniak, E., Knight, K., Yamada, K.: Syntax-based language models for statistical machine translation. In: Proceedings of the MT Summit IX (2003)
  5. Collins, M.: Head-Driven Statistical Models for Natural Language Parsing. PhD Thesis, University of Pennsylvania (1999)
  6. Collins, M., Koehn, P., Kucerova, I.: Clause restructuring for statistical machine translation. In: Proceedings of ACL 2005 (2005)
  7. Goldwater, S., McClosky, D.: Improving statistical MT through morphological analysis. In: Proceedings of EMNLP 2005 (2005)
  8. Klein, D., Manning, C.D.: Accurate unlexicalized parsing. In: Proceedings of ACL 2003 (2003)
  9. Koehn, P., Och, F.J., Marcu, D.: Statistical phrase-based translation. In: Proceedings of HLT-NAACL 2003 (2003)
  10. Koehn, P.: Pharaoh: a beam search decoder for phrase-based statistical machine translation models. In: Frederking, R.E., Taylor, K.B. (eds.) AMTA 2004. LNCS (LNAI), vol. 3265, pp. 115–124. Springer, Heidelberg (2004)
  11. Lehmann, E.L.: Testing Statistical Hypotheses, 2nd edn. Springer, Heidelberg (1986)
  12. Marcu, D., Wong, W.: A phrase-based, joint probability model for statistical machine translation. In: Proceedings of EMNLP 2002 (2002)
  13. Marcus, M.P., Santorini, B., Marcinkiewicz, M.A.: Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics 19(2), 313–330 (1993)
  14. Melamed, I.D.: Statistical machine translation by parsing. In: Proceedings of ACL 2004 (2004)
  15. Niessen, S., Ney, H.: Statistical machine translation with scarce resources using morpho-syntactic information. Computational Linguistics 30(2), 181–204 (2004)
  16. Och, F.J., Ney, H.: Improved statistical alignment models. In: Proceedings of ACL 2000 (2000)
  17. Och, F.J., Ney, H.: The alignment template approach to statistical machine translation. Computational Linguistics 30(4), 417–449 (2004)
  18. Och, F.J., Gildea, D., Khudanpur, S., Sarkar, A., Yamada, K., Fraser, A., Kumar, S., Shen, L., Smith, D., Eng, K., Jain, V., Jin, Z., Radev, D.: A smorgasbord of features for statistical machine translation. In: Proceedings of HLT-NAACL 2004 (2004)
  19. Papineni, K., Roukos, S., Ward, T., Zhu, W.-J.: BLEU: a method for automatic evaluation of machine translation. Technical Report RC22176 (W0109-022), IBM Research Report (2001)
  20. Shen, L., Sarkar, A., Och, F.J.: Discriminative reranking for machine translation. In: Proceedings of HLT-NAACL 2004 (2004)
  21. Stolcke, A.: SRILM - An Extensible Language Modeling Toolkit. In: Proc. Intl. Conf. Spoken Language Processing, Denver, Colorado (September 2002)
  22. Nguyen, T.P., Nguyen, V.V., Le, A.C.: Vietnamese Word Segmentation Using Hidden Markov Model. In: International Workshop for Computer, Information, and Communication Technologies in Korea and Vietnam (2003)
  23. Nguyen, T.P., Shimazu, A.: Improving Phrase-Based SMT with Morpho-Syntactic Analysis and Transformation. In: Proceedings of AMTA 2006 (2006)
  24. Xia, F., McCord, M.: Improving a statistical MT system with automatically learned rewrite patterns. In: Proceedings of COLING 2004 (2004)
  25. Yamada, K., Knight, K.: A syntax-based statistical translation model. In: Proceedings of ACL 2001 (2001)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Thai Phuong Nguyen (1)
  • Akira Shimazu (1)

  1. School of Information Science, Japan Advanced Institute of Science and Technology
