Improving neural machine translation through phrase-based soft forced decoding

Published in: Machine Translation

Abstract

Compared to traditional statistical machine translation (SMT), such as phrase-based machine translation (PBMT), neural machine translation (NMT) often sacrifices adequacy for the sake of fluency. We propose a method to combine the advantages of traditional SMT and NMT by exploiting an existing phrase-based SMT model to compute the phrase-based decoding cost for an NMT output and then using this cost to rerank the n-best NMT outputs. The main challenge in implementing this approach is that NMT outputs may not be in the search space of the standard phrase-based decoding algorithm, because the search space of PBMT is limited by the phrase-based translation rule table. We propose a phrase-based soft forced decoding algorithm, which can always successfully find a decoding path for any NMT output. We show that using the phrase-based decoding cost to rerank the NMT outputs can successfully improve translation quality on four different language pairs.
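
The reranking step described in the abstract reduces to rescoring an n-best list with a combined score. The sketch below is a minimal illustration, not the authors' implementation: the names rerank_nbest and forced_decoding_cost and the fixed interpolation weight alpha are assumptions made purely for exposition; how the NMT score and the phrase-based decoding cost are actually weighted is left open here.

def rerank_nbest(nbest, forced_decoding_cost, alpha=0.5):
    """Rerank an NMT n-best list with a phrase-based decoding cost.

    nbest: list of (translation, nmt_log_prob) pairs.
    forced_decoding_cost: callable mapping a translation to a non-negative
        cost (lower is better); both names are illustrative assumptions.
    """
    rescored = [((1.0 - alpha) * nmt_score - alpha * forced_decoding_cost(hyp), hyp)
                for hyp, nmt_score in nbest]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [hyp for _, hyp in rescored]

The phrase-based cost thus acts as an additional adequacy-oriented feature on top of the NMT model score.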

Notes

  1. In fact, our method can take the output of any upstream system as input, but we experiment exclusively with using it to rerank NMT outputs.

  2. g, f and a in Eqs. (1), (2) and (4) are nonlinear, potentially multi-layered, functions.

  3. In actual phrase-based decoding, it is common to integrate reordering probabilities into the forced decoding score defined in Eq. (9). However, because NMT generally produces more properly ordered sentences than traditional SMT, in this work we do not consider reordering probabilities in our forced decoding algorithm (a simplified monotone sketch of this score follows these notes).

  4. The original rule table contains only translation rules, without the newly introduced word insertion/deletion rules (these fallback rules are also illustrated in the sketch following these notes).

  5. In our previous work (Zhang et al. 2017b), we only used the sampled outputs and the 1-best output from beam search for reranking. However, in this paper, we also include the 100-best outputs from beam search for reranking; 100 is the maximum beam size that we can set due to memory limitations. In addition, we add a comparison using beam search outputs and sampled outputs for reranking in Table 10. We also add results for different sampling strategies in Fig. 2.

  6. The NMT outputs used for reranking in this paper are different from those used in our previous IJCNLP paper. We test the influence of different NMT outputs for reranking in Sect. 5.3.

  7. Note that NTCIR-9 only contained a Chinese-to-English translation task, whereas we used English as the source language in our experiments. In NTCIR-9, the development and test sets were both provided for the zh-en task while only the test set was provided for the en-ja task. We used the sentences from the NTCIR-8 en-ja and ja-en test sets as the development set in our experiments.

  8. http://sourceforge.net/projects/mecab/files/.

  9. https://github.com/neubig/lamtram.

  10. We used the default Moses settings for phrase-based SMT.

  11. The best NMT system and the systems that have no significant difference from the best NMT system at the \(p < 0.05\) level using bootstrap resampling (Koehn 2004) are shown in bold font (a minimal sketch of the paired bootstrap test follows these notes).

  12. The results of \(\hbox{NMT}_L\) in Table 4 were obtained with beam size 10. We found \(\hbox{NMT}_L\) BLEU scores decreased with beam size 100 because \(P_n\) prefers shorter translations and the \(\hbox{NMT}_L\) outputs became much shorter with beam size 100.
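
Notes 3 and 4 concern the forced decoding score and the word insertion/deletion rules that guarantee a decoding path for every NMT output. The following is a heavily simplified sketch of that idea, not the authors' algorithm: it scores only the target side, monotonically (consistent with note 3, reordering is ignored), using a hypothetical table of phrase costs, and it backs off to a fixed per-word insertion penalty whenever no phrase covers a span, so a finite cost always exists. Source-side coverage, deletion rules, and the exact probabilities of Eq. (9) are omitted.

import math

def soft_forced_decoding_cost(target_words, phrase_costs, insert_penalty=10.0,
                              max_phrase_len=7):
    """Cheapest monotone segmentation of target_words into known phrases,
    with a fallback word-insertion rule so that a path always exists.

    phrase_costs: dict mapping a target phrase (tuple of words) to a
        negative-log-probability cost.  All names and values here are
        illustrative assumptions.
    """
    n = len(target_words)
    best = [math.inf] * (n + 1)   # best[i] = cheapest cost covering words[:i]
    best[0] = 0.0
    for i in range(n):
        if best[i] == math.inf:
            continue
        # Fallback rule: "insert" the next word at a fixed penalty.
        best[i + 1] = min(best[i + 1], best[i] + insert_penalty)
        # Extend with any known phrase starting at position i.
        for j in range(i + 1, min(n, i + max_phrase_len) + 1):
            phrase = tuple(target_words[i:j])
            if phrase in phrase_costs:
                best[j] = min(best[j], best[i] + phrase_costs[phrase])
    return best[n]

For example, soft_forced_decoding_cost("this is a test".split(), {("this", "is"): 1.2, ("a", "test"): 0.8}) returns 2.0, while any word not covered by the table simply incurs the insertion penalty instead of making the output unreachable.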

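Note 11 refers to the significance test of Koehn (2004). The following is a minimal sketch of paired bootstrap resampling under a simplifying assumption: it averages per-sentence scores, whereas corpus BLEU should strictly be recomputed from the resampled n-gram statistics in each replicate. The function and argument names are illustrative, not from the paper.

import random

def paired_bootstrap(score_a, score_b, num_samples=1000, seed=0):
    """Paired bootstrap resampling in the spirit of Koehn (2004).

    score_a, score_b: per-sentence scores of two systems on the same test
        set.  Returns the fraction of resamples in which system A's mean
        score exceeds system B's.
    """
    assert len(score_a) == len(score_b)
    rng = random.Random(seed)
    n = len(score_a)
    wins = 0
    for _ in range(num_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(score_a[i] for i in idx) / n
        mean_b = sum(score_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins += 1
    return wins / num_samples

A returned fraction above 0.95 corresponds to a significant difference at the \(p < 0.05\) level used in the result tables.
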
References

  • Alkhouli T, Bretschner G, Peter JT, Hethnawi M, Guta A, Ney H (2016) Alignment-based neural machine translation. In: Proceedings of the first conference on machine translation, Berlin, Germany, pp 54–65

  • Arthur P, Neubig G, Nakamura S (2016) Incorporating discrete translation lexicons into neural machine translation. In: Proceedings of the 2016 conference on empirical methods in natural language processing, Austin, TX, pp 1557–1567

  • Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint. arXiv:1409.0473

  • Brown PF, Pietra VJD, Pietra SAD, Mercer RL (1993) The mathematics of statistical machine translation: parameter estimation. Comput Linguist 19(2):263–311

  • Cohn T, Hoang CDV, Vymolova E, Yao K, Dyer C, Haffari G (2016) Incorporating structural alignment biases into an attentional neural translation model. In: Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies, San Diego, CA, pp 876–885

  • Denkowski M, Lavie A (2014) Meteor universal: language specific translation evaluation for any target language. In: Proceedings of the ninth workshop on statistical machine translation, Baltimore, MD, USA, pp 376–380

  • Goto I, Lu B, Chow KP, Sumita E, Tsou BK (2011) Overview of the patent machine translation task at the NTCIR-9 workshop. In: Proceedings of the 9th NTCIR workshop, Tokyo, Japan, pp 559–578

  • He W, He Z, Wu H, Wang H (2016) Improved neural machine translation with SMT features. In: Thirtieth AAAI conference on artificial intelligence, Phoenix, AZ, USA, pp 151–157

  • Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint. arXiv:1412.6980

  • Koehn P (2004) Statistical significance tests for machine translation evaluation. In: Proceedings of the 2004 conference on empirical methods in natural language processing, Barcelona, Spain, pp 388–395

  • Koehn P (2005) Europarl: a parallel corpus for statistical machine translation. In: MT Summit X, conference proceedings: the tenth machine translation summit, Phuket, Thailand, pp 79–86

  • Koehn P, Knowles R (2017) Six challenges for neural machine translation. In: Proceedings of the first workshop on neural machine translation, Vancouver, Canada, pp 28–39

  • Koehn P, Och FJ, Marcu D (2003) Statistical phrase-based translation. In: Proceedings of the 2003 conference of the North American chapter of the association for computational linguistics on human language technology, vol 1, Edmonton, Canada, pp 48–54

  • Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: Proceedings of the 45th annual meeting of the association for computational linguistics companion volume proceedings of the demo and poster sessions, Prague, Czech Republic, pp 177–180

  • Liu L, Utiyama M, Finch A, Sumita E (2016) Agreement on target-bidirectional neural machine translation. In: Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies, San Diego, CA, pp 411–416

  • Meng F, Lu Z, Li H, Liu Q (2016) Interactive attention for neural machine translation. In: Proceedings of COLING 2016, the 26th international conference on computational linguistics: technical papers, Osaka, Japan, pp 2174–2185

  • Mi H, Sankaran B, Wang Z, Ittycheriah A (2016) Coverage embedding models for neural machine translation. In: Proceedings of the 2016 conference on empirical methods in natural language processing, Austin, TX, pp 955–960

  • Neubig G, Morishita M, Nakamura S (2015) Neural reranking improves subjective quality of machine translation: NAIST at WAT2015. In: Proceedings of the 2nd workshop on Asian Translation (WAT2015), Kyoto, Japan, pp 35–41

  • Och FJ, Ney H (2003) A systematic comparison of various statistical alignment models. Comput Linguist 29(1):19–51

  • Popović M (2015) chrF: character n-gram F-score for automatic MT evaluation. In: Proceedings of the tenth workshop on statistical machine translation, Lisbon, Portugal, pp 392–395

  • Sennrich R, Haddow B, Birch A (2016) Edinburgh neural machine translation systems for WMT 16. In: Proceedings of the first conference on machine translation: vol 2, Shared Task Papers, Berlin, Germany, pp 371–376

  • Shen S, Cheng Y, He Z, He W, Wu H, Sun M, Liu Y (2016) Minimum risk training for neural machine translation. In: Proceedings of the 54th annual meeting of the Association for Computational Linguistics, vol 1: long papers, Berlin, Germany, pp 1683–1692

  • Stahlberg F, Hasler E, Waite A, Byrne B (2016) Syntactically guided neural machine translation. In: Proceedings of the 54th annual meeting of the Association for Computational Linguistics, vol 2: Short papers, Berlin, Germany, pp 299–305

  • Stahlberg F, de Gispert A, Hasler E, Byrne B (2017) Neural machine translation by minimising the Bayes-risk with respect to syntactic translation lattices. In: Proceedings of the 15th conference of the European chapter of the Association for Computational Linguistics: vol 2, Short Papers, Valencia, Spain, pp 362–368

  • Tang Y, Meng F, Lu Z, Li H, Yu PL (2016) Neural machine translation with external phrase memory. arXiv preprint. arXiv:1606.01792

  • Tu Z, Lu Z, Liu Y, Liu X, Li H (2016) Modeling coverage for neural machine translation. In: Proceedings of the 54th annual meeting of the Association for Computational Linguistics (vol 1: long papers), Berlin, Germany, pp 76–85

  • Wuebker J, Mauser A, Ney H (2010) Training phrase translation models with leaving-one-out. In: Proceedings of the 48th annual meeting of the Association for Computational Linguistics, Uppsala, Sweden, pp 475–484

  • Wuebker J, Hwang MY, Quirk C (2012) Leave-one-out phrase model training for large-scale deployment. In: Proceedings of the seventh workshop on statistical machine translation, Montréal, Canada, pp 460–467

  • Xiao T, Wong DF, Zhu J (2016) A loss-augmented approach to training syntactic machine translation systems. IEEE/ACM Trans Audio Speech Lang Process 24(11):2069–2083

  • Yu H, Huang L, Mi H, Zhao K (2013) Max-violation perceptron and forced decoding for scalable MT training. In: Proceedings of the 2013 conference on empirical methods in natural language processing, Seattle, WA, USA, pp 1112–1123

  • Zhang J, Liu Y, Luan H, Xu J, Sun M (2017a) Prior knowledge integration for neural machine translation using posterior regularization. In: Proceedings of the 55th annual meeting of the Association for Computational Linguistics (vol 1: Long Papers), Vancouver, Canada, pp 1514–1523

  • Zhang J, Utiyama M, Sumita E, Neubig G, Nakamura S (2017b) Improving neural machine translation through phrase-based forced decoding. In: Proceedings of the eighth international joint conference on natural language processing (vol 1: Long Papers), Taipei, Taiwan, pp 152–162

  • Zhao H, Huang CN, Li M, et al (2006) An improved Chinese word segmentation system with conditional random field. In: Proceedings of the fifth SIGHAN workshop on Chinese language processing, Sydney, Australia, pp 162–165

Author information

Corresponding author

Correspondence to Jingyi Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

An earlier version of this paper (Zhang et al. 2017b) was published as a long paper at IJCNLP 2017. This article extends that work with a comparison against target-bidirectional NMT models (Liu et al. 2016) and with results of using different n-best lists (obtained from beam search and from ancestral sampling) for reranking.

About this article

Cite this article

Zhang, J., Utiyama, M., Sumita, E. et al. Improving neural machine translation through phrase-based soft forced decoding. Machine Translation 34, 21–39 (2020). https://doi.org/10.1007/s10590-020-09244-y
