Deep Learning in Machine Translation

  • Yang Liu
  • Jiajun Zhang
Chapter

Abstract

Machine translation (MT) is an important natural language processing task that investigates the use of computers to translate human languages automatically. Deep learning-based methods have made significant progress in recent years and have quickly become the de facto paradigm of MT in both academia and industry. This chapter introduces two broad categories of deep learning-based MT methods: (1) component-wise deep learning for machine translation, which leverages deep learning to improve the capacity of the main components of statistical machine translation (SMT), such as translation models, reordering models, and language models; and (2) end-to-end deep learning for machine translation, which uses neural networks to directly map between source and target languages based on the encoder–decoder framework. The chapter closes with a discussion of challenges and future directions for deep learning-based MT.
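The encoder–decoder framework mentioned above can be illustrated with a minimal sketch: an encoder RNN compresses the source sentence into a fixed-length context vector, and a decoder RNN generates target tokens one at a time, feeding each prediction back as the next input. Everything here (vocabulary sizes, dimensions, the randomly initialized weights, and the token ids) is hypothetical, standing in for a trained model; it shows the data flow only, not any specific system from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (all hypothetical): source/target vocab, embedding, hidden dims
SRC_V, TGT_V, EMB, HID = 12, 10, 8, 16

# Randomly initialized parameters stand in for trained ones
E_src = rng.normal(0, 0.1, (SRC_V, EMB))      # source embeddings
E_tgt = rng.normal(0, 0.1, (TGT_V, EMB))      # target embeddings
W_enc = rng.normal(0, 0.1, (HID, EMB + HID))  # encoder RNN weights
W_dec = rng.normal(0, 0.1, (HID, EMB + HID))  # decoder RNN weights
W_out = rng.normal(0, 0.1, (TGT_V, HID))      # output projection

def rnn_step(W, x, h):
    """One tanh-RNN step: combine input embedding x and previous state h."""
    return np.tanh(W @ np.concatenate([x, h]))

def encode(src_ids):
    """Run the encoder over the source; the final state is the context vector."""
    h = np.zeros(HID)
    for i in src_ids:
        h = rnn_step(W_enc, E_src[i], h)
    return h

def decode(context, bos=0, eos=1, max_len=5):
    """Greedy decoding: feed back the argmax token at every step."""
    h, y, out = context, bos, []
    for _ in range(max_len):
        h = rnn_step(W_dec, E_tgt[y], h)
        y = int(np.argmax(W_out @ h))  # pick the most probable target token
        out.append(y)
        if y == eos:
            break
    return out

translation = decode(encode([3, 5, 7]))
print(translation)  # a short list of target token ids
```

With untrained weights the output is meaningless, but the pipeline is the one the abstract describes: the source sentence is mapped to a vector, and the target sentence is generated from that vector. Attention-based models replace the single fixed context with a per-step weighted sum over all encoder states.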

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Tsinghua University, Beijing, China
  2. Institute of Automation, Chinese Academy of Sciences, Beijing, China