Improving neural machine translation through phrase-based soft forced decoding

Published in: Machine Translation

Abstract

Compared to traditional statistical machine translation (SMT), such as phrase-based machine translation (PBMT), neural machine translation (NMT) often sacrifices adequacy for the sake of fluency. We propose a method to combine the advantages of traditional SMT and NMT by exploiting an existing phrase-based SMT model to compute the phrase-based decoding cost for an NMT output and then using this cost to rerank the n-best NMT outputs. The main challenge in implementing this approach is that NMT outputs may not be in the search space of the standard phrase-based decoding algorithm, because the search space of PBMT is limited by the phrase-based translation rule table. We propose a phrase-based soft forced decoding algorithm, which can always successfully find a decoding path for any NMT output. We show that using the phrase-based decoding cost to rerank the NMT outputs can successfully improve translation quality on four different language pairs.
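
The reranking step described in the abstract reduces to rescoring an n-best list with a combined score. The sketch below is a minimal illustration, not the authors' implementation: the names rerank_nbest and forced_decoding_cost and the fixed interpolation weight alpha are assumptions made purely for exposition; how the NMT score and the phrase-based decoding cost are actually weighted is left open here.

def rerank_nbest(nbest, forced_decoding_cost, alpha=0.5):
    """Rerank an NMT n-best list with a phrase-based decoding cost.

    nbest: list of (translation, nmt_log_prob) pairs.
    forced_decoding_cost: callable mapping a translation to a non-negative
        cost (lower is better); both names are illustrative assumptions.
    """
    rescored = [((1.0 - alpha) * nmt_score - alpha * forced_decoding_cost(hyp), hyp)
                for hyp, nmt_score in nbest]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [hyp for _, hyp in rescored]

The phrase-based cost thus acts as an additional adequacy-oriented feature on top of the NMT model score.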

Notes

  1. In fact, our method can take the output of any upstream system as input, but we experiment exclusively with using it to rerank NMT outputs.

  2. g, f and a in Eqs. (1), (2) and (4) are nonlinear, potentially multi-layered, functions.

  3. In actual phrase-based decoding, it is common to integrate reordering probabilities into the forced decoding score defined in Eq. (9). However, because NMT generally produces more properly ordered sentences than traditional SMT, in this work we do not consider reordering probabilities in our forced decoding algorithm (a simplified monotone sketch of this score follows these notes).

  4. The original rule table contains only translation rules, without the newly introduced word insertion/deletion rules (these fallback rules are also illustrated in the sketch following these notes).

  5. In our previous work (Zhang et al. 2017b), we only used the sampled outputs and the 1-best output from beam search for reranking. However, in this paper, we also include the 100-best outputs from beam search for reranking; 100 is the maximum beam size that we can set due to memory limitations. In addition, we add a comparison using beam search outputs and sampled outputs for reranking in Table 10. We also add results for different sampling strategies in Fig. 2.

  6. The NMT outputs used for reranking in this paper are different from those used in our previous IJCNLP paper. We test the influence of different NMT outputs for reranking in Sect. 5.3.

  7. Note that NTCIR-9 only contained a Chinese-to-English translation task, whereas we used English as the source language in our experiments. In NTCIR-9, the development and test sets were both provided for the zh-en task while only the test set was provided for the en-ja task. We used the sentences from the NTCIR-8 en-ja and ja-en test sets as the development set in our experiments.

  8. http://sourceforge.net/projects/mecab/files/.

  9. https://github.com/neubig/lamtram.

  10. We used the default Moses settings for phrase-based SMT.

  11. The best NMT system and the systems that have no significant difference from the best NMT system at the \(p < 0.05\) level using bootstrap resampling (Koehn 2004) are shown in bold font (a minimal sketch of the paired bootstrap test follows these notes).

  12. The results of \(\hbox{NMT}_L\) in Table 4 were obtained with beam size 10. We found \(\hbox{NMT}_L\) BLEU scores decreased with beam size 100 because \(P_n\) prefers shorter translations and the \(\hbox{NMT}_L\) outputs became much shorter with beam size 100.
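
Notes 3 and 4 concern the forced decoding score and the word insertion/deletion rules that guarantee a decoding path for every NMT output. The following is a heavily simplified sketch of that idea, not the authors' algorithm: it scores only the target side, monotonically (consistent with note 3, reordering is ignored), using a hypothetical table of phrase costs, and it backs off to a fixed per-word insertion penalty whenever no phrase covers a span, so a finite cost always exists. Source-side coverage, deletion rules, and the exact probabilities of Eq. (9) are omitted.

import math

def soft_forced_decoding_cost(target_words, phrase_costs, insert_penalty=10.0,
                              max_phrase_len=7):
    """Cheapest monotone segmentation of target_words into known phrases,
    with a fallback word-insertion rule so that a path always exists.

    phrase_costs: dict mapping a target phrase (tuple of words) to a
        negative-log-probability cost.  All names and values here are
        illustrative assumptions.
    """
    n = len(target_words)
    best = [math.inf] * (n + 1)   # best[i] = cheapest cost covering words[:i]
    best[0] = 0.0
    for i in range(n):
        if best[i] == math.inf:
            continue
        # Fallback rule: "insert" the next word at a fixed penalty.
        best[i + 1] = min(best[i + 1], best[i] + insert_penalty)
        # Extend with any known phrase starting at position i.
        for j in range(i + 1, min(n, i + max_phrase_len) + 1):
            phrase = tuple(target_words[i:j])
            if phrase in phrase_costs:
                best[j] = min(best[j], best[i] + phrase_costs[phrase])
    return best[n]

For example, soft_forced_decoding_cost("this is a test".split(), {("this", "is"): 1.2, ("a", "test"): 0.8}) returns 2.0, while any word not covered by the table simply incurs the insertion penalty instead of making the output unreachable.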

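Note 11 refers to the significance test of Koehn (2004). The following is a minimal sketch of paired bootstrap resampling under a simplifying assumption: it averages per-sentence scores, whereas corpus BLEU should strictly be recomputed from the resampled n-gram statistics in each replicate. The function and argument names are illustrative, not from the paper.

import random

def paired_bootstrap(score_a, score_b, num_samples=1000, seed=0):
    """Paired bootstrap resampling in the spirit of Koehn (2004).

    score_a, score_b: per-sentence scores of two systems on the same test
        set.  Returns the fraction of resamples in which system A's mean
        score exceeds system B's.
    """
    assert len(score_a) == len(score_b)
    rng = random.Random(seed)
    n = len(score_a)
    wins = 0
    for _ in range(num_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(score_a[i] for i in idx) / n
        mean_b = sum(score_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins += 1
    return wins / num_samples

A returned fraction above 0.95 corresponds to a significant difference at the \(p < 0.05\) level used in the result tables.
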
References

  • Alkhouli T, Bretschner G, Peter JT, Hethnawi M, Guta A, Ney H (2016) Alignment-based neural machine translation. In: Proceedings of the first conference on machine translation, Berlin, Germany, pp 54–65

  • Arthur P, Neubig G, Nakamura S (2016) Incorporating discrete translation lexicons into neural machine translation. In: Proceedings of the 2016 conference on empirical methods in natural language processing, Austin, TX, pp 1557–1567

  • Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint. arXiv:1409.0473

  • Brown PF, Pietra VJD, Pietra SAD, Mercer RL (1993) The mathematics of statistical machine translation: parameter estimation. Comput Linguist 19(2):263–311

  • Cohn T, Hoang CDV, Vymolova E, Yao K, Dyer C, Haffari G (2016) Incorporating structural alignment biases into an attentional neural translation model. In: Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies, San Diego, CA, pp 876–885

  • Denkowski M, Lavie A (2014) Meteor universal: language specific translation evaluation for any target language. In: Proceedings of the ninth workshop on statistical machine translation, Baltimore, MD, USA, pp 376–380

  • Goto I, Lu B, Chow KP, Sumita E, Tsou BK (2011) Overview of the patent machine translation task at the NTCIR-9 workshop. In: Proceedings of the 9th NTCIR workshop, Tokyo, Japan, pp 559–578

  • He W, He Z, Wu H, Wang H (2016) Improved neural machine translation with SMT features. In: Thirtieth AAAI conference on artificial intelligence, Phoenix, AZ, USA, pp 151–157

  • Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint. arXiv:1412.6980

  • Koehn P (2004) Statistical significance tests for machine translation evaluation. In: Proceedings of the 2004 conference on empirical methods in natural language processing, Barcelona, Spain, pp 388–395

  • Koehn P (2005) Europarl: a parallel corpus for statistical machine translation. In: MT Summit X, conference proceedings: the tenth machine translation summit, Phuket, Thailand, pp 79–86

  • Koehn P, Knowles R (2017) Six challenges for neural machine translation. In: Proceedings of the first workshop on neural machine translation, Vancouver, Canada, pp 28–39

  • Koehn P, Och FJ, Marcu D (2003) Statistical phrase-based translation. In: Proceedings of the 2003 conference of the North American chapter of the association for computational linguistics on human language technology, vol 1, Edmonton, Canada, pp 48–54

  • Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: Proceedings of the 45th annual meeting of the association for computational linguistics companion volume proceedings of the demo and poster sessions, Prague, Czech Republic, pp 177–180

  • Liu L, Utiyama M, Finch A, Sumita E (2016) Agreement on target-bidirectional neural machine translation. In: Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies, San Diego, CA, pp 411–416

  • Meng F, Lu Z, Li H, Liu Q (2016) Interactive attention for neural machine translation. In: Proceedings of COLING 2016, the 26th international conference on computational linguistics: technical papers, Osaka, Japan, pp 2174–2185

  • Mi H, Sankaran B, Wang Z, Ittycheriah A (2016) Coverage embedding models for neural machine translation. In: Proceedings of the 2016 conference on empirical methods in natural language processing, Austin, TX, pp 955–960

  • Neubig G, Morishita M, Nakamura S (2015) Neural reranking improves subjective quality of machine translation: NAIST at WAT2015. In: Proceedings of the 2nd workshop on Asian Translation (WAT2015), Kyoto, Japan, pp 35–41

  • Och FJ, Ney H (2003) A systematic comparison of various statistical alignment models. Comput Linguist 29(1):19–51

  • Popović M (2015) chrF: character n-gram F-score for automatic MT evaluation. In: Proceedings of the tenth workshop on statistical machine translation, Lisbon, Portugal, pp 392–395

  • Sennrich R, Haddow B, Birch A (2016) Edinburgh neural machine translation systems for WMT 16. In: Proceedings of the first conference on machine translation: vol 2, Shared Task Papers, Berlin, Germany, pp 371–376

  • Shen S, Cheng Y, He Z, He W, Wu H, Sun M, Liu Y (2016) Minimum risk training for neural machine translation. In: Proceedings of the 54th annual meeting of the Association for Computational Linguistics, vol 1: long papers, Berlin, Germany, pp 1683–1692

  • Stahlberg F, Hasler E, Waite A, Byrne B (2016) Syntactically guided neural machine translation. In: Proceedings of the 54th annual meeting of the Association for Computational Linguistics, vol 2: Short papers, Berlin, Germany, pp 299–305

  • Stahlberg F, de Gispert A, Hasler E, Byrne B (2017) Neural machine translation by minimising the Bayes-risk with respect to syntactic translation lattices. In: Proceedings of the 15th conference of the European chapter of the Association for Computational Linguistics: vol 2, Short Papers, Valencia, Spain, pp 362–368

  • Tang Y, Meng F, Lu Z, Li H, Yu PL (2016) Neural machine translation with external phrase memory. arXiv preprint. arXiv:1606.01792

  • Tu Z, Lu Z, Liu Y, Liu X, Li H (2016) Modeling coverage for neural machine translation. In: Proceedings of the 54th annual meeting of the Association for Computational Linguistics (vol 1: long papers), Berlin, Germany, pp 76–85

  • Wuebker J, Mauser A, Ney H (2010) Training phrase translation models with leaving-one-out. In: Proceedings of the 48th annual meeting of the Association for Computational Linguistics, Uppsala, Sweden, pp 475–484

  • Wuebker J, Hwang MY, Quirk C (2012) Leave-one-out phrase model training for large-scale deployment. In: Proceedings of the seventh workshop on statistical machine translation, Montréal, Canada, pp 460–467

  • Xiao T, Wong DF, Zhu J (2016) A loss-augmented approach to training syntactic machine translation systems. IEEE/ACM Trans Audio Speech Lang Process 24(11):2069–2083

  • Yu H, Huang L, Mi H, Zhao K (2013) Max-violation perceptron and forced decoding for scalable MT training. In: Proceedings of the 2013 conference on empirical methods in natural language processing, Seattle, WA, USA, pp 1112–1123

  • Zhang J, Liu Y, Luan H, Xu J, Sun M (2017a) Prior knowledge integration for neural machine translation using posterior regularization. In: Proceedings of the 55th annual meeting of the Association for Computational Linguistics (vol 1: Long Papers), Vancouver, Canada, pp 1514–1523

  • Zhang J, Utiyama M, Sumita E, Neubig G, Nakamura S (2017b) Improving neural machine translation through phrase-based forced decoding. In: Proceedings of the eighth international joint conference on natural language processing (vol 1: Long Papers), Taipei, Taiwan, pp 152–162

  • Zhao H, Huang CN, Li M, et al (2006) An improved Chinese word segmentation system with conditional random field. In: Proceedings of the fifth SIGHAN workshop on Chinese language processing, Sydney, Australia, pp 162–165

Author information

Corresponding author

Correspondence to Jingyi Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

An earlier version of this paper (Zhang et al. 2017b) was published as a long paper at IJCNLP 2017. This article extends that work with a comparison against target-bidirectional NMT models (Liu et al. 2016) and with results of using different n-best lists (obtained from beam search and from ancestral sampling) for reranking.

About this article

Cite this article

Zhang, J., Utiyama, M., Sumita, E. et al. Improving neural machine translation through phrase-based soft forced decoding. Machine Translation 34, 21–39 (2020). https://doi.org/10.1007/s10590-020-09244-y
