Deps-SAN: Neural Machine Translation with Dependency-Scaled Self-Attention Network

  • Conference paper
Neural Information Processing (ICONIP 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13625)

Abstract

Syntactic knowledge contributes powerfully to neural machine translation (NMT). Early NMT work assumed that syntactic details could be learned automatically from large amounts of text via attention networks. However, subsequent studies pointed out that, limited by the uncontrolled nature of attention computation, NMT models require external syntax to capture deep syntactic awareness. Although existing syntax-aware NMT methods have borne fruit in incorporating syntax, the additional workload they introduce renders the models heavy and slow. Moreover, these efforts rarely involve Transformer-based NMT or modify its core self-attention network (SAN). To this end, we propose a parameter-free Dependency-scaled Self-Attention Network (Deps-SAN) for syntax-aware Transformer-based NMT. A quantified matrix of dependency closeness between tokens is constructed to impose explicit syntactic constraints on the SAN, helping it learn syntactic details and avoid overly dispersed attention distributions. Two knowledge-sparsing techniques are further integrated to keep the model from overfitting the dependency noise introduced by the external parser. Experiments and analyses on the IWSLT14 German-to-English and WMT16 German-to-English benchmark NMT tasks verify the effectiveness of our approach.
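
The sketch below illustrates, in PyTorch, how a dependency-scaled self-attention of this kind could look. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (dependency_distance_matrix, deps_scaled_attention), the Gaussian form of the closeness weighting, and the sigma parameter are illustrative choices standing in for the paper's quantified dependency-closeness matrix, and the two knowledge-sparsing techniques are omitted.

```python
# A minimal sketch, assuming a PyTorch setting, of dependency-scaled self-attention.
# NOT the authors' implementation: the Gaussian closeness weighting and `sigma`
# are illustrative assumptions; the paper defines the exact closeness matrix
# and the knowledge-sparsing steps.
import math
import torch
import torch.nn.functional as F


def dependency_distance_matrix(heads):
    """Pairwise tree distances between tokens of one sentence.

    heads[i] is the index of token i's head in the dependency parse
    (the root points to itself). Returns an (n, n) tensor of distances.
    """
    n = len(heads)
    ancestors = []                        # per token: {ancestor index: depth}
    for i in range(n):
        path, node, depth = {}, i, 0
        while True:
            path[node] = depth
            if heads[node] == node:       # reached the root
                break
            node, depth = heads[node], depth + 1
        ancestors.append(path)
    dist = torch.zeros(n, n)
    for i in range(n):
        for j in range(n):
            # The shortest tree path runs through the lowest common ancestor.
            dist[i, j] = min(d + ancestors[j][a]
                             for a, d in ancestors[i].items() if a in ancestors[j])
    return dist


def deps_scaled_attention(q, k, v, dep_dist, sigma=1.0):
    """Scaled dot-product attention biased by dependency closeness.

    q, k, v: (n, d) tensors for a single head; dep_dist: (n, n) tree distances.
    Tokens close in the dependency tree are penalised less, so attention
    concentrates on syntactically related positions; the bias is parameter-free.
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)        # standard SAN scores
    scores = scores - dep_dist.pow(2) / (2 * sigma ** 2)   # Gaussian syntactic bias
    return F.softmax(scores, dim=-1) @ v
```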

Notes

  1. Transformer, as the new generation of NMT baseline, abandons recurrence and convolutions and relies solely on SANs, achieving remarkable progress.

  2. To keep the illustration of our model concise, please refer to the original paper for details of Transformer-based NMT.

  3. https://wit3.fbk.eu/archive/2014-01/texts/de/en/de-en.tgz.

  4. https://github.com/pytorch/fairseq/blob/master/examples/translation/prepare-iwslt14.sh.

  5. https://github.com/moses-smt/mosesdecoder/blob/master/scripts/analysis/bootstrap-hypothesis-difference-significance.pl.

  6. These groups, along with their sentence counts, are: ([0–10], 1657), ([10–20], 2637), ([20–30], 1381), ([30–40], 614), ([40–50], 252), ([50–60], 122), ([60–70], 43), ([70–80], 27), and ([80–], 17).

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China under Grants 62071131, 61771149 and 61772146.

Author information

Corresponding author

Correspondence to Junbo Zhao.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Peng, R. et al. (2023). Deps-SAN: Neural Machine Translation with Dependency-Scaled Self-Attention Network. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13625. Springer, Cham. https://doi.org/10.1007/978-3-031-30111-7_3

  • DOI: https://doi.org/10.1007/978-3-031-30111-7_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30110-0

  • Online ISBN: 978-3-031-30111-7

  • eBook Packages: Computer Science, Computer Science (R0)