A Comparable Study on Model Averaging, Ensembling and Reranking in NMT

  • Yuchen Liu
  • Long Zhou
  • Yining Wang
  • Yang Zhao
  • Jiajun Zhang
  • Chengqing Zong (email author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11109)

Abstract

Neural machine translation (NMT) has become a benchmark method in machine translation. Many novel structures and methods have been proposed to improve translation quality, but the resulting models remain difficult to train and their parameters hard to tune. In this paper, we focus on decoding techniques that boost translation performance by exploiting existing models. We address the problem at three levels: parameter, word and sentence, corresponding to checkpoint averaging, model ensembling and candidate reranking, none of which requires retraining the model. Experimental results show that the proposed decoding approaches significantly improve performance over the baseline model.
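As a rough illustration of the parameter-level technique (checkpoint averaging), the sketch below averages the parameters of several saved checkpoints element-wise. The dict-of-floats representation is a simplification of real framework state dicts, in which each entry would be a weight tensor; the function name and data layout are illustrative, not taken from the paper.

```python
def average_checkpoints(checkpoints):
    """Element-wise average of parameter dicts from several checkpoints.

    `checkpoints` is a list of {param_name: value} dicts, a simplified
    stand-in for framework state dicts (per-tensor weights in practice).
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    names = checkpoints[0].keys()
    return {
        name: sum(ckpt[name] for ckpt in checkpoints) / len(checkpoints)
        for name in names
    }

# Example: three checkpoints of a toy two-parameter model.
ckpts = [
    {"w": 1.0, "b": 0.0},
    {"w": 2.0, "b": 0.5},
    {"w": 3.0, "b": 1.0},
]
avg = average_checkpoints(ckpts)
```

Word-level ensembling works analogously at decoding time, averaging the per-step output distributions of several independently trained models before picking the next token, while sentence-level reranking rescores the finished n-best list with additional features.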

Notes

Acknowledgments

The research work described in this paper has been supported by the National Key Research and Development Program of China under Grant No. 2016QY02D0303.


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Yuchen Liu (1)
  • Long Zhou (1)
  • Yining Wang (1)
  • Yang Zhao (1)
  • Jiajun Zhang (1)
  • Chengqing Zong (1, 2), email author

  1. National Laboratory of Pattern Recognition, CASIA, University of Chinese Academy of Sciences, Beijing, China
  2. CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai, China
