Coarse-To-Fine Learning for Neural Machine Translation

  • Zhirui Zhang (corresponding author)
  • Shujie Liu
  • Mu Li
  • Ming Zhou
  • Enhong Chen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11108)


In this paper, we address the problem of learning better word representations for neural machine translation (NMT). We propose a novel approach to NMT model training based on a coarse-to-fine learning paradigm, which infers better model parameters for a wide range of less frequent words in the vocabulary. To this end, our method first groups source and target words into a set of hierarchical clusters, and then learns a sequence of NMT models with growing cluster granularity. Each subsequent model inherits its parameters from the previous one and refines them with a finer-grained word-cluster mapping. Experimental results on public data sets demonstrate that our method significantly outperforms a baseline attention-based NMT model on Chinese-English and English-French translation tasks.
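The core idea of the abstract, inheriting parameters from coarse word clusters and refining them at finer granularity, can be illustrated with a minimal sketch. This is not the paper's implementation: the two-level hierarchy, the cluster names, and the random "trained" cluster embeddings are all illustrative stand-ins for a real clustering and a real NMT training stage.

```python
# Hedged sketch of coarse-to-fine parameter inheritance for word embeddings.
# Assumption: a toy 2-level hierarchy (clusters C0/C1 over four words) and
# random vectors standing in for embeddings learned in the coarse stage.
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Stage 1 (coarse): every word is mapped to a cluster, and a model is
# trained over the cluster vocabulary, yielding one embedding per cluster.
word_to_cluster = {"cat": "C0", "dog": "C0", "run": "C1", "jump": "C1"}
cluster_emb = {c: rng.normal(size=dim) for c in {"C0", "C1"}}

# Stage 2 (fine): each word's embedding is initialized by copying its
# cluster's embedding; subsequent training would refine these copies so
# that less frequent words start from a well-trained shared representation.
word_emb = {w: cluster_emb[c].copy() for w, c in word_to_cluster.items()}

# Before refinement, words in the same cluster share one representation.
assert np.allclose(word_emb["cat"], word_emb["dog"])
```

With a deeper hierarchy, the same copy-then-refine step simply repeats once per level of granularity, each model warm-starting from its predecessor.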


Keywords: Neural machine translation · Coarse-to-fine learning · Hierarchical clustering



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Zhirui Zhang (1, corresponding author)
  • Shujie Liu (2)
  • Mu Li (2)
  • Ming Zhou (2)
  • Enhong Chen (1)
  1. University of Science and Technology of China, Hefei, China
  2. Microsoft Research Asia, Beijing, China
