
Towards Building a Strong Transformer Neural Machine Translation System

  • Qiang Wang
  • Bei Li
  • Jiqiang Liu
  • Bojian Jiang
  • Zheyang Zhang
  • Yinqiao Li
  • Ye Lin
  • Tong Xiao
  • Jingbo Zhu
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 954)

Abstract

The Transformer model based on the self-attention mechanism [17] has achieved state-of-the-art results in recent evaluations. However, it is still unclear how much room there is for improving a translation system built on this model. In this paper we further explore how to build a stronger neural machine translation system from four aspects: architectural improvements, diverse ensemble decoding, reranking, and post-processing. Experimental results on the CWMT-18 Chinese↔English tasks show that our approach consistently improves translation performance by 2.3–3.8 BLEU points over the strong baseline. In particular, we find that ensemble decoding with a large number of diverse models is crucial for significant improvement.
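The abstract identifies ensemble decoding over many diverse models as the main driver of improvement. The sketch below is a rough illustration only (not the authors' implementation): it combines several independently trained models by averaging their per-step output distributions during greedy decoding. The `step_log_probs` interface and all names here are hypothetical.

```python
import numpy as np

def ensemble_greedy_decode(models, src, bos_id, eos_id, max_len=100):
    """Greedy decoding with a uniform ensemble of NMT models.

    Each model is assumed to expose a (hypothetical) method
    step_log_probs(src, prefix) that returns a 1-D numpy array of
    log-probabilities over the target vocabulary for the next token.
    """
    prefix = [bos_id]
    for _ in range(max_len):
        # Average the models' probabilities (not log-probabilities),
        # the usual way to combine an NMT ensemble at each step.
        avg_probs = np.mean(
            [np.exp(m.step_log_probs(src, prefix)) for m in models], axis=0
        )
        next_token = int(np.argmax(avg_probs))
        prefix.append(next_token)
        if next_token == eos_id:
            break
    return prefix[1:]  # drop the BOS symbol
```

In practice beam search would replace the greedy loop, and, as the abstract notes, the diversity of the ensembled models (different architectures, seeds, or training data) matters as much as their number.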


Acknowledgments

This work was supported in part by the National Science Foundation of China (Nos. 61672138 and 61432013) and the Fundamental Research Funds for the Central Universities.

References

  1. Ba, J.L., Kiros, R., Hinton, G.E.: Layer normalization. CoRR abs/1607.06450 (2016). http://arxiv.org/abs/1607.06450
  2. Chiang, D., Marton, Y., Resnik, P.: Online large-margin training of syntactic and structural translation features, pp. 224–233. Association for Computational Linguistics (2008)
  3. Gehring, J., Auli, M., Grangier, D., Yarats, D., Dauphin, Y.N.: Convolutional sequence to sequence learning. ArXiv e-prints, May 2017
  4. Hassan, H., et al.: Achieving human parity on automatic Chinese to English news translation. arXiv preprint arXiv:1803.05567 (2018)
  5. He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 630–645. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_38
  6. Heafield, K.: KenLM: faster and smaller language model queries. In: Proceedings of the EMNLP 2011 Sixth Workshop on Statistical Machine Translation, Edinburgh, Scotland, United Kingdom, pp. 187–197, July 2011. https://kheafield.com/papers/avenue/kenlm.pdf
  7. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
  8. Ramachandran, P., Zoph, B., Le, Q.V.: Searching for activation functions (2018)
  9. Rousseau, A.: XenC: an open-source tool for data selection in natural language processing. Prague Bull. Math. Linguist. 100, 73–82 (2013)
  10. Sennrich, R., et al.: The University of Edinburgh’s neural MT systems for WMT17. In: WMT 2017, p. 389 (2017)
  11. Sennrich, R., Haddow, B.: Linguistic input features improve neural machine translation. In: Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers, pp. 83–91. Association for Computational Linguistics (2016)
  12. Sennrich, R., Haddow, B., Birch, A.: Improving neural machine translation models with monolingual data. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 86–96 (2016)
  13. Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, Volume 1: Long Papers, 7–12 August 2016, Berlin, Germany (2016). http://aclweb.org/anthology/P/P16/P16-1162.pdf
  14. Sennrich, R., Haddow, B., Birch, A.: Edinburgh neural machine translation systems for WMT 16. In: Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, pp. 371–376. Association for Computational Linguistics (2016)
  15. Shaw, P., Uszkoreit, J., Vaswani, A.: Self-attention with relative position representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pp. 464–468 (2018)
  16. Tu, Z., Liu, Y., Shang, L., Liu, X., Li, H.: Neural machine translation with reconstruction. In: AAAI, pp. 3097–3103 (2017)
  17. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 6000–6010 (2017)
  18. Wang, Y., et al.: Sogou neural machine translation systems for WMT17. In: Proceedings of the Second Conference on Machine Translation, pp. 410–415 (2017)
  19. Wu, Y., et al.: Google’s neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 (2016)

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Qiang Wang (1, 2)
  • Bei Li (1)
  • Jiqiang Liu (1)
  • Bojian Jiang (1)
  • Zheyang Zhang (1)
  • Yinqiao Li (1, 2)
  • Ye Lin (1)
  • Tong Xiao (1, 2, corresponding author)
  • Jingbo Zhu (1, 2)
  1. Natural Language Processing Lab, Northeastern University, Shenyang, China
  2. NiuTrans Inc., Shenyang, China
