Abstract
Syntactic knowledge plays a powerful role in neural machine translation (NMT). Early NMT work assumed that syntactic details could be learned automatically from large corpora via attention networks. However, subsequent studies showed that, limited by the unconstrained nature of attention computation, NMT models need external syntactic knowledge to acquire deep syntactic awareness. Although existing syntax-aware NMT methods have been fruitful in incorporating syntax, the additional components they introduce make models heavy and slow. Moreover, these efforts rarely target Transformer-based NMT or modify its core self-attention network (SAN). To this end, we propose a parameter-free, Dependency-scaled Self-Attention Network (Deps-SAN) for syntax-aware Transformer-based NMT. A quantified matrix of dependency closeness between tokens imposes explicit syntactic constraints on the SAN, so that the model learns syntactic details and avoids overly dispersed attention distributions. Two knowledge-sparsing techniques are further integrated to keep the model from overfitting the dependency noise introduced by the external parser. Experiments and analyses on the IWSLT14 German-to-English and WMT16 German-to-English benchmark NMT tasks verify the effectiveness of our approach.
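To make the mechanism concrete, below is a minimal PyTorch sketch of one Deps-SAN-style attention head. It assumes the dependency-closeness matrix is a Gaussian over pairwise dependency-tree distances and approximates knowledge sparsing with dropout on the closeness weights; the paper's exact formulations may differ, and all names here are hypothetical.

```python
# Minimal sketch of a dependency-scaled self-attention head (hypothetical
# names; the paper's exact closeness and sparsing functions may differ).
import torch
import torch.nn.functional as F

def deps_scaled_attention(q, k, v, dep_dist, sigma=1.0, drop_p=0.1, training=True):
    """q, k, v: (batch, len, d_k); dep_dist: (batch, len, len) pairwise
    dependency-tree distances obtained from an external parser."""
    d_k = q.size(-1)
    # Standard scaled dot-product attention logits.
    logits = q @ k.transpose(-2, -1) / d_k ** 0.5          # (batch, len, len)
    # Quantified dependency closeness (assumed Gaussian): tokens close in
    # the dependency tree get weights near 1, distant ones decay toward 0.
    closeness = torch.exp(-dep_dist.float() ** 2 / (2 * sigma ** 2))
    # One plausible knowledge-sparsing variant: randomly drop dependency
    # constraints so the model does not overfit parser noise.
    closeness = F.dropout(closeness, p=drop_p, training=training)
    # Impose the syntactic constraint in log-space, concentrating attention
    # on syntactically close tokens; note no learned parameters are added.
    weights = F.softmax(logits + torch.log(closeness + 1e-9), dim=-1)
    return weights @ v
```

Because the constraint is a fixed function of the parse, the head stays parameter-free, consistent with the abstract's claim of adding no extra weights.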
Notes
- 1.
Transformer, the new generation of NMT baseline, abandons recurrence and convolutions and relies solely on SANs, achieving remarkable progress.
- 2.
To keep the presentation focused on our model, please refer to the original paper for details of Transformer-based NMT.
- 3.
- 4.
- 5.
- 6.
These groups, along with their sentence counts, are listed here: ([0–10], 1657), ([10–20], 2637), ([20–30], 1381), ([30–40], 614), ([40–50], 252), ([50–60], 122), ([60–70], 43), ([70–80], 27) and ([80–], 17); a sketch for reproducing such counts follows this list.
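As a side note, here is a minimal Python sketch of how such length buckets could be computed; the file path, tokenization, and boundary handling are assumptions, since the paper does not specify them.

```python
# Bucket sentences of a test set by length in steps of 10 tokens.
# "test.de" and whitespace tokenization are hypothetical choices.
from collections import Counter

buckets = Counter()
with open("test.de", encoding="utf-8") as f:
    for line in f:
        length = len(line.split())          # whitespace token count
        buckets[min(length // 10, 8)] += 1  # [0-10), [10-20), ..., [80-)

for b in sorted(buckets):
    label = f"[{b * 10}-{(b + 1) * 10}]" if b < 8 else "[80-]"
    print(label, buckets[b])
```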
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China under Grants 62071131, 61771149 and 61772146.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Peng, R. et al. (2023). Deps-SAN: Neural Machine Translation with Dependency-Scaled Self-Attention Network. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13625. Springer, Cham. https://doi.org/10.1007/978-3-031-30111-7_3
DOI: https://doi.org/10.1007/978-3-031-30111-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30110-0
Online ISBN: 978-3-031-30111-7
eBook Packages: Computer Science, Computer Science (R0)