Abstract
Syntactic knowledge plays a powerful role in neural machine translation (NMT). Early NMT work assumed that syntactic details could be learned automatically from large corpora via attention networks. However, subsequent studies showed that, limited by the unconstrained nature of attention computation, NMT models need external syntactic knowledge to acquire deep syntactic awareness. Although existing syntax-aware NMT methods have been fruitful in incorporating syntax, the additional components they introduce make models heavy and slow. Moreover, these efforts rarely target Transformer-based NMT or modify its core self-attention network (SAN). To this end, we propose a parameter-free, Dependency-scaled Self-Attention Network (Deps-SAN) for syntax-aware Transformer-based NMT. A quantified matrix of dependency closeness between tokens imposes explicit syntactic constraints on the SAN, so that the model learns syntactic details and avoids overly dispersed attention distributions. Two knowledge-sparsing techniques are further integrated to keep the model from overfitting the dependency noise introduced by the external parser. Experiments and analyses on the IWSLT14 German-to-English and WMT16 German-to-English benchmark NMT tasks verify the effectiveness of our approach.
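To make the mechanism concrete, below is a minimal PyTorch sketch of one Deps-SAN-style attention head. It assumes the dependency-closeness matrix is a Gaussian over pairwise dependency-tree distances and approximates knowledge sparsing with dropout on the closeness weights; the paper's exact formulations may differ, and all names here are hypothetical.

```python
# Minimal sketch of a dependency-scaled self-attention head (hypothetical
# names; the paper's exact closeness and sparsing functions may differ).
import torch
import torch.nn.functional as F

def deps_scaled_attention(q, k, v, dep_dist, sigma=1.0, drop_p=0.1, training=True):
    """q, k, v: (batch, len, d_k); dep_dist: (batch, len, len) pairwise
    dependency-tree distances obtained from an external parser."""
    d_k = q.size(-1)
    # Standard scaled dot-product attention logits.
    logits = q @ k.transpose(-2, -1) / d_k ** 0.5          # (batch, len, len)
    # Quantified dependency closeness (assumed Gaussian): tokens close in
    # the dependency tree get weights near 1, distant ones decay toward 0.
    closeness = torch.exp(-dep_dist.float() ** 2 / (2 * sigma ** 2))
    # One plausible knowledge-sparsing variant: randomly drop dependency
    # constraints so the model does not overfit parser noise.
    closeness = F.dropout(closeness, p=drop_p, training=training)
    # Impose the syntactic constraint in log-space, concentrating attention
    # on syntactically close tokens; note no learned parameters are added.
    weights = F.softmax(logits + torch.log(closeness + 1e-9), dim=-1)
    return weights @ v
```

Because the constraint is a fixed function of the parse, the head stays parameter-free, consistent with the abstract's claim of adding no extra weights.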
Notes
- 1.
Transformer, the new generation of NMT baseline, abandons recurrence and convolutions and relies solely on SANs, achieving remarkable progress.
- 2.
To keep the presentation focused on our model, please refer to the original paper for details of Transformer-based NMT.
- 3.
- 4.
- 5.
- 6.
These groups, along with their sentence counts, are listed here: ([0–10], 1657), ([10–20], 2637), ([20–30], 1381), ([30–40], 614), ([40–50], 252), ([50–60], 122), ([60–70], 43), ([70–80], 27) and ([80–], 17); a sketch for reproducing such counts follows this list.
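As a side note, here is a minimal Python sketch of how such length buckets could be computed; the file path, tokenization, and boundary handling are assumptions, since the paper does not specify them.

```python
# Bucket sentences of a test set by length in steps of 10 tokens.
# "test.de" and whitespace tokenization are hypothetical choices.
from collections import Counter

buckets = Counter()
with open("test.de", encoding="utf-8") as f:
    for line in f:
        length = len(line.split())          # whitespace token count
        buckets[min(length // 10, 8)] += 1  # [0-10), [10-20), ..., [80-)

for b in sorted(buckets):
    label = f"[{b * 10}-{(b + 1) * 10}]" if b < 8 else "[80-]"
    print(label, buckets[b])
```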
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China under Grants 62071131, 61771149 and 61772146.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Peng, R. et al. (2023). Deps-SAN: Neural Machine Translation with Dependency-Scaled Self-Attention Network. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13625. Springer, Cham. https://doi.org/10.1007/978-3-031-30111-7_3
DOI: https://doi.org/10.1007/978-3-031-30111-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30110-0
Online ISBN: 978-3-031-30111-7
eBook Packages: Computer Science, Computer Science (R0)