Learning local word reorderings for hierarchical phrase-based statistical machine translation

Machine Translation

Abstract

Statistical models for reordering source words have been used to enhance hierarchical phrase-based statistical machine translation. Existing word-reordering models learn reorderings either for any two source words in a sentence or only for two contiguous words. This paper proposes a series of separate sub-models that learn reorderings for word pairs at different distances. Our experiments demonstrate that reordering sub-models for word pairs whose distance is below a specific threshold improve translation quality. Compared with previous work, our method exploits helpful word-reordering information more effectively and efficiently: it improves a basic hierarchical phrase-based system by 2.4–3.1 BLEU points while keeping the average time to translate one sentence under 10 s.
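
As a rough illustration of the idea of one sub-model per word-pair distance, the sketch below groups the source word pairs of a sentence by their distance, so that each group could serve as input to a separate sub-model. The function name `pairs_by_distance`, the `max_distance` parameter and the example sentence are illustrative assumptions; the features, classifiers and threshold used in the paper are not given in this abstract and are not reproduced here.

```python
from collections import defaultdict


def pairs_by_distance(words, max_distance):
    """Group source word pairs (words[i], words[j]), i < j, by n = j - i.

    Only pairs with 1 <= n <= max_distance are kept; max_distance stands in
    for the distance threshold mentioned in the abstract (hypothetical name).
    """
    groups = defaultdict(list)
    for i in range(len(words)):
        for n in range(1, max_distance + 1):
            j = i + n
            if j >= len(words):
                break  # no partner word at this distance or beyond
            groups[n].append((words[i], words[j]))
    return dict(groups)


# Distance 1 corresponds to contiguous words; larger n covers wider pairs.
example = pairs_by_distance(["the", "light", "source", "emits"], max_distance=2)
```

Here `example[1]` holds the contiguous pairs and `example[2]` the pairs separated by one intervening word; pairs farther apart than `max_distance` are not collected at all.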

Notes

  1. In translation experiments, we also tried adding a new penalty feature (how many source words in the input sentence are unaligned) to penalize unaligned words. However, this feature did not influence translation performance significantly.

  2. Note that these scores are correspondingly calculated for different sub-models \(M_n\) and the sub-model weights are tuned separately.

  3. In the original Hiero paper (Chiang 2005), only two nonterminals are allowed. However, it is theoretically possible to create rules with more than two nonterminals, hence our use of K here.

  4. As we are using a cache, memory usage is a concern, but the size of the cache for each sentence is negligible compared to the size of the translation and language models, and thus the memory footprint is not increased significantly.

  5. http://sourceforge.net/projects/mecab/files/.

  6. http://hlt.fbk.eu/en/irstlm.

  7. Note that “4” and “5” in the source and target sentences are original source and target words. This sentence pair is from a patent-translation corpus and there is a figure in the article, where the light source is labeled as 4 and the optical fiber is labeled as 5.

  8. The cache was used in all experiments.

References

  • Bisazza A, Federico M (2013) Dynamically shaping the reordering search space of phrase-based statistical machine translation. Trans Assoc Comput Linguist 1:327–340

  • Cao H, Zhang D, Li M, Zhou M, Zhao T (2014) A lexicalized reordering model for hierarchical phrase-based translation. In: Coling 2014: proceedings of 25th international conference on computational linguistics. Dublin, pp 1144–1153

  • Chen S, Goodman J (1999) An empirical study of smoothing techniques for language modeling. Comput Speech Lang 13(4):359–393

  • Chiang D (2005) A hierarchical phrase-based model for statistical machine translation. ACL-05: 43rd annual meeting of the association for computational linguistics. Ann Arbor, Michigan, pp 263–270

  • Chiang D (2012) Hope and fear for discriminative training of statistical translation models. J Mach Learn Res 13(1):1159–1187

  • Cui L, Zhang D, Li M, Zhou M, Zhao T (2010) A joint rule selection model for hierarchical phrase-based translation. ACL 2010: the 48th annual meeting of the association for computational linguistics. Uppsala, pp 6–11

  • Feng M, Peter JT, Ney H (2013) Advancements in reordering models for statistical machine translation. Proceedings of the 51st annual meeting of the association for computational linguistics, vol 1, Long Papers, Sofia, pp 322–332

  • Gao Y, Koehn P, Birch A (2011) Soft dependency constraints for reordering in hierarchical phrase-based translation. In: Proceedings of EMNLP 2011, conference on empirical methods in natural language processing. Edinburgh, pp 857–868

  • Goto I, Lu B, Chow KP, Sumita E, Tsou BK (2011) Overview of the patent machine translation task at the NTCIR-9 workshop. In: Proceedings of the 9th NII test collection for IR systems workshop meeting. Tokyo, pp 559–578

  • Hayashi K, Tsukada H, Sudoh K, Duh K, Yamamoto S (2010) Hierarchical phrase-based machine translation with word-based reordering model. Coling 2010: proceedings of 23rd international conference on computational linguistics. Beijing, pp 439–446

  • He Z, Liu Q, Lin S (2008) Improving statistical machine translation using lexicalized rule selection. Coling 2008: proceedings of 22nd international conference on computational linguistics. Manchester, pp 321–328

  • Hopkins M, May J (2011) Tuning as ranking. In: Proceedings of EMNLP 2011, conference on empirical methods in natural language processing, Edinburgh, pp 1352–1362

  • Huck M, Wuebker J, Rietig F, Ney H (2013) A phrase orientation model for hierarchical machine translation. WMT 2013: Proceedings of 8th workshop on statistical machine translation. Sofia, pp 452–463

  • Kazemi A, Toral A, Way A, Monadjemi A, Nematbakhsh M (2015) Dependency-based reordering model for constituent pairs in hierarchical SMT. EAMT-2015: proceedings of the eighteenth annual conference of the European association for machine translation. Antalya, pp 43–50

  • Koehn P (2004) Statistical significance tests for machine translation evaluation. In: Proceedings of the 2004 conference on empirical methods in natural language processing. Barcelona, pp 388–395

  • Koehn P, Och FJ, Marcu D (2003) Statistical phrase-based translation. HLT-NAACL 2003: conference combining human language technology conference series and the North American chapter of the association for computational linguistics conference series. Edmonton, pp 48–54

  • Koehn P, Axelrod A, Birch A, Callison-Burch C, Osborne M, Talbot D, White M (2005) Edinburgh system description for the 2005 IWSLT speech translation evaluation. In: International workshop on spoken language translation: evaluation campaign on spoken language translation. Pittsburgh, pp 68–75

  • Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. The 45th annual meeting of the association for computational linguistics: demo and poster sessions. Prague, pp 177–180

  • Li P, Liu Y, Sun M, Izuha T, Zhang D (2014) A neural reordering model for phrase-based translation. Coling 2014: proceedings of 25th international conference on computational linguistics. Dublin, pp 1897–1907

  • Liu Q, He Z, Liu Y, Lin S (2008) Maximum entropy based rule selection model for syntax-based statistical machine translation. In: EMNLP 2008: proceedings of 2008 conference on empirical methods in natural language processing, Honolulu, pp 89–97

  • Marton Y, Resnik P (2008) Soft syntactic constraints for hierarchical phrased-based translation. In: ACL-08: HLT, proceedings of 46th annual meeting of the association for computational linguistics: human language technologies. Columbus, pp 1003–1011

  • Nguyen T, Vogel S (2013) Integrating phrase-based reordering features into a chart-based decoder for machine translation. In: ACL 2013, Proceedings of 51st annual meeting of the association for computational linguistics. Sofia, pp 1587–1596

  • Ni Y, Saunders C, Szedmak S, Niranjan M (2009) Handling phrase reorderings for machine translation. In: Proceedings of ACL-IJCNLP 2009, joint conference of the 47th annual meeting of the association for computational linguistics and 4th international joint conference on natural language processing of the AFNLP. Suntec, pp 241–244

  • Och FJ (2003) Minimum error rate training in statistical machine translation. In: Proceedings of 41st annual meeting of the association for computational linguistics. Sapporo, pp 160–167

  • Och FJ, Ney H (2002) Discriminative training and maximum entropy models for statistical machine translation. In: Proceedings of the conference on 40th annual meeting of the association for computational linguistics, Philadelphia, pp 295–302

  • Och FJ, Ney H (2003) A systematic comparison of various statistical alignment models. Comput Linguist 29(1):19–51

  • Rumelhart D, Hinton G, Williams R (1986) Learning representations by back-propagating errors. Nature 323(6088):533–536

  • Tromble R, Eisner J (2009) Learning linear ordering problems for better translation. In: EMNLP 2009, proceedings of the 2009 conference on empirical methods in natural language processing, Singapore, pp 1007–1016

  • Vaswani A, Zhao Y, Fossum V, Chiang D (2013) Decoding with large-scale neural language models improves translation. In: Proceedings of the 2013 conference on empirical methods in natural language processing. Seattle, pp 1387–1392

  • Wang X, Xiong D, Zhang M (2015) Learning semantic representations for nonterminals in hierarchical phrase-based translation. In: Proceedings of the 2015 conference on empirical methods in natural language processing. Lisbon, pp 1391–1400

  • Zens R, Ney H (2006) Discriminative reordering models for statistical machine translation. In: Proceedings of the workshop HLT-NAACL 06, statistical machine translation. New York City, pp 55–63

  • Zhang J, Utiyama M, Sumita E, Zhao H (2015) Learning word reorderings for hierarchical phrase-based statistical machine translation. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing, vol 2. Short Papers. Beijing, pp 542–548

  • Zhao H, Huang CN, Li M (2006) An improved Chinese word segmentation system with conditional random field. In: Proceedings of the fifth SIGHAN workshop on Chinese language processing. Sydney, pp 162–165

Acknowledgments

Hai Zhao was partially supported by the National Natural Science Foundation of China (Grant No. 61170114, and Grant No. 61272248), the National Basic Research Program of China (Grant No. 2013CB329401), the Science and Technology Commission of Shanghai Municipality (Grant No. 13511500200), the European Union Seventh Framework Program (Grant No. 247619), the Cai Yuanpei Program (CSC fund 201304490199, 201304490171), and the art and science interdisciplinary funds of Shanghai Jiao Tong University, No. 14X190040031, and the Key Project of National Society Science Foundation of China, No. 15-ZDA041.

Author information

Corresponding authors

Correspondence to Jingyi Zhang, Masao Utiyama or Hai Zhao.

About this article

Cite this article

Zhang, J., Utiyama, M., Sumita, E. et al. Learning local word reorderings for hierarchical phrase-based statistical machine translation. Machine Translation 30, 1–18 (2016). https://doi.org/10.1007/s10590-016-9178-7

  • DOI: https://doi.org/10.1007/s10590-016-9178-7
