Making attention mechanisms more robust and interpretable with virtual adversarial training

Published in: Applied Intelligence

Abstract

Although attention mechanisms have become fundamental components of deep learning models, they are vulnerable to perturbations, which may degrade prediction performance and model interpretability. Adversarial training (AT) for attention mechanisms has successfully mitigated these drawbacks by considering adversarial perturbations. However, this technique requires label information, and thus its use is limited to supervised settings. In this study, we explore the concept of incorporating virtual AT (VAT) into attention mechanisms, by which adversarial perturbations can be computed even from unlabeled data. To realize this approach, we propose two general training techniques, namely VAT for attention mechanisms (Attention VAT) and “interpretable” VAT for attention mechanisms (Attention iVAT), which extend AT for attention mechanisms to a semi-supervised setting. In particular, Attention iVAT focuses on the differences in attention; thus, it can efficiently learn clearer attention and improve model interpretability, even with unlabeled data. Empirical experiments based on six public datasets revealed that our techniques provide better prediction performance than conventional AT-based and VAT-based techniques, as well as stronger agreement with human-provided evidence in detecting important words in sentences. Moreover, our techniques offer these advantages without requiring careful selection of the unlabeled data. That is, even if a model using our VAT-based technique is trained on unlabeled data from a source other than the target task, both prediction performance and model interpretability can be improved.

Notes

  1. spotify/annoy: https://github.com/spotify/annoy

  2. They constructed a simple model in which they first trained the encoder to extract rationales, and then trained the decoder to perform prediction using only rationales based on the pipeline model [37]. The pipeline model adopts BERT for both the encoder and the decoder.

  3. https://github.com/shunk031/attention-meets-perturbation

  4. https://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip

  5. https://s3.amazonaws.com/text-datasets/imdb_full.pkl

  6. https://s3.amazonaws.com/text-datasets/imdb_word_index.json

  7. The dataset can be found on Xiang Zhang’s Google Drive: https://drive.google.com/drive/u/0/folders/0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M.

  8. The dataset can be found on the DeepMind Q&A dataset page: https://drive.google.com/uc?export=download&id=0BwmD_VLjROrfTTljRDVZMFJnVWM.

  9. https://nlp.stanford.edu/projects/snli/snli_1.0.zip

  10. https://www.nyu.edu/projects/bowman/multinli/multinli_1.0.zip

References

  1. Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I, Fergus R (2013) Intriguing properties of neural networks. In: 2nd International conference on learning representations, ICLR, conference track proceedings. arXiv:1312.6199

  2. Goodfellow I J, Shlens J, Szegedy C (2014) Explaining and harnessing adversarial examples. In: 3rd International conference on learning representations, ICLR, conference track proceedings. arXiv:1412.6572

  3. Mudrakarta P K, Taly A, Sundararajan M, Dhamdhere K (2018) Did the model understand the question?. In: Proceedings of the 56th annual meeting of the association for computational linguistics (volume 1: long papers). https://doi.org/10.18653/v1/P18-1176. Association for Computational Linguistics (ACL), pp 1896–1906

  4. Miyato T, Dai A M, Goodfellow I (2016) Adversarial training methods for semi-supervised text classification. In: 5th International conference on learning representations, ICLR, Conference track proceedings. https://openreview.net/forum?id=r1X3g2_xl

  5. Sato M, Suzuki J, Shindo H, Matsumoto Y (2018) Interpretable adversarial perturbation in input embedding space for text. In: Proceedings of the 27th international joint conference on artificial intelligence. https://dl.acm.org/doi/10.5555/3304222.3304371. AAAI Press, pp 4323–4330

  6. Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv:1409.0473

  7. Lin Z, Feng M, dos Santos C N, Yu M, Xiang B, Zhou B, Bengio Y (2017) A structured self-attentive sentence embedding. In: Proceedings of the 5th international conference on learning representations, ICLR, conference track proceedings. https://openreview.net/forum?id=BJC_jUqxe&noteId=BJC_jUqxe

  8. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Proceedings of the 30th international conference on neural information processing systems. https://papers.nips.cc/paper/7181-attention-is-all-you-need, pp 5998–6008

  9. Devlin J, Chang M -W, Lee K, Toutanova K (2019) Bert: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers). https://doi.org/10.18653/v1/N19-1423. Association for Computational Linguistic (ACL), pp 4171–4186

  10. Jain S, Wallace B C (2019) Attention is not explanation. In: Proceedings of the 2019 conference of the north american chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers). https://doi.org/10.18653/v1/N19-1357. Association for Computational Linguistics (ACL), pp 3543–3556

  11. Kitada S, Iyatomi H (2021) Attention meets perturbations: robust and interpretable attention with adversarial training. IEEE Access 9:92974–92985

  12. Miyato T, Maeda S-I, Koyama M, Ishii S (2018) Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE Trans Pattern Anal Mach Intell 41(8):1979–1993

  13. Chen L, Ruan W, Liu X, Lu J (2020) Seqvat: virtual adversarial training for semi-supervised sequence labeling. In: Proceedings of the 58th annual meeting of the association for computational linguistics. https://doi.org/10.18653/v1/2020.acl-main.777. Association for Computational Linguistics (ACL), pp 8801–8811

  14. Simonyan K, Vedaldi A, Zisserman A (2013) Deep inside convolutional networks: Visualising image classification models and saliency maps. In: 2nd International conference on learning representations, ICLR, workshop track proceedings. arXiv:1312.6034

  15. DeYoung J, Jain S, Rajani N F, Lehman E, Xiong C, Socher R, Wallace B C (2019) Eraser: a benchmark to evaluate rationalized nlp models. In: Proceedings of the 58th annual meeting of the association for computational linguistics. https://doi.org/10.18653/v1/2020.acl-main.408. Association for Computational Linguistics (ACL), pp 4443–4458

  16. Wang B, Gao J, Qi Y (2016) A theoretical framework for robustness of (deep) classifiers against adversarial examples. CoRR. arXiv:1612.00334

  17. Li Z, Feng C, Wu M, Yu H, Zheng J, Zhu F (2021) Adversarial robustness via attention transfer. Pattern Recogn Lett 146:172–178. https://doi.org/10.1016/j.patrec.2021.03.011

  18. Yi Z, Yu J, Tan Y, Wu Q (2022) Fine-tuning more stable neural text classifiers for defending word level adversarial attacks. Appl Intell 1–18

  19. Yasunaga M, Kasai J, Radev D (2018) Robust multilingual part-of-speech tagging via adversarial training. In: Proceedings of the 2018 conference of the north american chapter of the association for computational linguistics: human language technologies, volume 1 (long papers). https://doi.org/10.18653/v1/N18-1089. Association for Computational Linguistics (ACL), pp 976–986

  20. Wu Y, Bamman D, Russell S (2017) Adversarial training for relation extraction. In: Proceedings of the 2017 conference on empirical methods in natural language processing. https://doi.org/10.18653/v1/D17-1187. Association for Computational Linguistics (ACL), pp 1778–1783

  21. Wang Y, Huang M, Zhao L (2016) Attention-based LSTM for aspect-level sentiment classification. In: Proceedings of the 2016 conference on empirical methods in natural language processing. https://doi.org/10.18653/v1/D16-1058. Association for Computational Linguistics (ACL), pp 606–615

  22. Chelba C, Mikolov T, Schuster M, Ge Q, Brants T, Koehn P, Robinson T (2014) One billion word benchmark for measuring progress in statistical language modeling. In: Proceedings of the 15th annual conference of the international speech communication association. arXiv:1312.3005. International Speech Communication Association (ISCA), pp 2635–2639

  23. An J, Wang K, Sun H, Cui C, Li W, Ma C (2022) Attention virtual adversarial based semi-supervised question generation. Concurr Comput Pract Exp 34(10):6797

  24. Dai K, Li X, Huang X, Ye Y (2022) Sentatn: learning sentence transferable embeddings for cross-domain sentiment classification. Appl Intell 1–14

  25. He X, Golub D (2016) Character-level question answering with attention. In: Proceedings of the 2016 conference on empirical methods in natural language processing. https://doi.org/10.18653/v1/D16-1166. Association for Computational Linguistics (ACL), pp 1598–1607

  26. Parikh A, Täckström O, Das D, Uszkoreit J (2016) A decomposable attention model for natural language inference. In: Proceedings of the 2016 conference on empirical methods in natural language processing. https://doi.org/10.18653/v1/D16-1244. Association for Computational Linguistics (ACL), pp 2249–2255

  27. Li J, Monroe W, Jurafsky D (2016) Understanding neural networks through representation erasure. CoRR. arXiv:1612.08220

  28. Pruthi D, Gupta M, Dhingra B, Neubig G, Lipton Z C (2019) Learning to deceive with attention-based explanations. In: Proceedings of the 58th annual meeting of the association for computational linguistics. https://doi.org/10.18653/v1/2020.acl-main.432. Association for Computational Linguistics (ACL), pp 4782–4793

  29. Meister C, Lazov S, Augenstein I, Cotterell R (2021) Is sparse attention more interpretable? arXiv:2106.01087

  30. Socher R, Perelygin A, Wu J, Chuang J, Manning C D, Ng A, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 conference on empirical methods in natural language processing. https://www.aclweb.org/anthology/D13-1170/. Association for Computational Linguistics (ACL), pp 1631–1642

  31. Maas A L, Daly R E, Pham P T, Huang D, Ng A Y, Potts C (2011) Learning word vectors for sentiment analysis. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, vol 1. https://www.aclweb.org/anthology/P11-1015/. Association for Computational Linguistics (ACL), pp 142–150

  32. Zhang X, Zhao J, LeCun Y (2015) Character-level convolutional networks for text classification. In: Proceedings of the 28th international conference on neural information processing systems, vol 1. https://dl.acm.org/doi/10.5555/2969239.2969312. MIT Press, pp 649–657

  33. Hermann K M, Kocisky T, Grefenstette E, Espeholt L, Kay W, Suleyman M, Blunsom P (2015) Teaching machines to read and comprehend. In: Proceedings of the 28th international conference on neural information processing systems, vol 1. https://dl.acm.org/doi/10.5555/2969239.2969428. MIT Press, pp 1693–1701

  34. Bowman S, Angeli G, Potts C, Manning C D (2015) A large annotated corpus for learning natural language inference. In: Proceedings of the 2015 conference on empirical methods in natural language processing. https://doi.org/10.18653/v1/D15-1075. Association for Computational Linguistics (ACL), pp 632–642

  35. Williams A, Nangia N, Bowman S R (2017) A broad-coverage challenge corpus for sentence understanding through inference. In: Proceedings of the 2018 conference of the North American chapter of the association for computational linguistics: human language technologies (long papers). https://doi.org/10.18653/v1/N18-1101. Association for Computational Linguistics (ACL)

  36. Mohankumar A K, Nema P, Narasimhan S, Khapra M M, Srinivasan B V, Ravindran B (2020) Towards transparent and explainable attention models. In: Proceedings of the 58th annual meeting of the association for computational linguistics. https://doi.org/10.18653/v1/2020.acl-main.387. Association for Computational Linguistics (ACL), pp 4206–4216

  37. Lehman E, DeYoung J, Barzilay R, Wallace B C (2019) Inferring which medical treatments work from reports of clinical trials. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers). https://doi.org/10.18653/v1/N19-1371, pp 3705–3717

  38. Gardner M, Grus J, Neumann M, Tafjord O, Dasigi P, Liu N, Peters M, Schmitz M, Zettlemoyer L (2018) Allennlp: a deep semantic natural language processing platform. In: Proceedings of workshop for NLP Open Source Software (NLP-OSS). https://doi.org/10.18653/v1/W18-2501. Association for Computational Linguistics (ACL), pp 1–6

  39. Wallace E, Tuyls J, Wang J, Subramanian S, Gardner M, Singh S (2019) allenNLP Interpret: A framework for explaining predictions of NLP models. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP): system demonstrations. https://doi.org/10.18653/v1/D19-3002. Association for Computational Linguistics (ACL), pp 7–12

  40. Bojanowski P, Grave E, Joulin A, Mikolov T (2017) Enriching word vectors with subword information. Trans Assoc Comput Linguist 5:135–146

  41. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

  42. Kingma D P, Ba J (2014) Adam: a method for stochastic optimization. CoRR. arXiv:1412.6980

  43. Dodge J, Gururangan S, Card D, Schwartz R, Smith N A (2019) Show your work: improved reporting of experimental results. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP). https://doi.org/10.18653/v1/D19-1224. Association for Computational Linguistics (ACL), pp 2185–2194

Funding

This work was supported by JSPS KAKENHI under Grant 21J14143.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Shunsuke Kitada.

Ethics declarations

Ethics approval

Not applicable

Conflict of Interests

The authors declare no conflict of interest.

Additional information

Availability of data and materials

The datasets used during the current study are available. Refer to the Appendices for details on the datasets and preprocessing.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Common model architecture

In this appendix, we introduce the common model architecture to which our training techniques were applied. The common model is a practical and widely used RNN-based model, and its performance has been compared in extensive experiments focusing on attention mechanisms [10, 11, 29]. As the settings differ for single- and pair-sequence tasks, we defined a model for each task, as illustrated in Fig. 3, and the details are described below.

Fig. 3: Common models for applying the proposed training technique: (a) single-sequence model for the text classification task, and (b) pair-sequence model for QA and NLI tasks

1.1 A.1 Model for single-sequence tasks

Figure 3a presents the model for single-sequence tasks, such as text classification. The input of the model was a sequence of one-hot encoded words \(X_{S} = (\boldsymbol {x}_{1}, \boldsymbol {x}_{2}, \cdots , \boldsymbol {x}_{T_{S}}) \in \mathbb {R}^{|{V}| \times T_{S}}\), where |V | and TS are the vocabulary size and the number of words in the sequence, respectively. Let \(\boldsymbol {w}_{t} \in \mathbb {R}^{d}\) be a d-dimensional word embedding corresponding to xt. Each word was represented with its word embedding to obtain \((\boldsymbol {w}_{t})_{t=1}^{T_{S}} \in \mathbb {R}^{d \times T_{S}}\). The word embeddings were encoded with a bidirectional RNN (BiRNN)-based encoder Enc to obtain the m-dimensional hidden state:

$$ \boldsymbol{h}_{t} = \mathbf{Enc}(\boldsymbol{w}_{t}, \boldsymbol{h}_{t-1}), $$
(15)

where h0 is the initial hidden state and it is regarded as a zero vector. Following [10] and [11], we used the additive formulation of attention mechanisms [6] to compute the attention score for the t-th word \(\tilde {a}_{t}\), which is defined as:

$$ \tilde{a}_{t} = \boldsymbol{c}^{\top} \tanh(W\boldsymbol{h}_{t} + \boldsymbol{b}), $$
(16)

where \(W \in \mathbb {R}^{d^{\prime } \times m}\) and \(\boldsymbol {b}, \boldsymbol {c} \in \mathbb {R}^{d^{\prime }}\) are the model parameters. Subsequently, the attention weights \(\boldsymbol {a} \in \mathbb {R}^{T}\) for all words were computed from the attention scores \(\tilde {\boldsymbol {a}} = (\tilde {a}_{t})_{t=1}^{T_{S}}\), as follows:

$$ \boldsymbol{a} = (a_{t})_{t=1}^{T_{S}} = \text{softmax}(\tilde{\boldsymbol{a}}). $$
(17)

The weighted instance representation ha was calculated using the attention weights a and hidden state ht, as follows:

$$ \boldsymbol{h}_{\boldsymbol{a}} = \sum\limits_{t=1}^{T_{S}} a_{t} \boldsymbol{h}_{t}. $$
(18)

Finally, ha was fed to a dense layer Dec, and the output activation function σ was used to obtain the following predictions:

$$ \hat{\boldsymbol{y}} = \sigma(\mathbf{Dec}(\boldsymbol{h}_{\boldsymbol{\boldsymbol{a}}})) \in \mathbb{R}^{|{\boldsymbol{y}}|}, $$
(19)

where σ is a sigmoid function and |y| is the class label set size.
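As a concrete illustration, the attention computation in (16)–(18) can be sketched in plain NumPy, given encoder hidden states H. The parameter shapes follow the appendix (\(W \in \mathbb {R}^{d^{\prime } \times m}\), \(\boldsymbol {b}, \boldsymbol {c} \in \mathbb {R}^{d^{\prime }}\)); the function names are illustrative and not taken from the authors’ released implementation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D score vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_pool(H, W, b, c):
    """Additive attention pooling, following Eqs. (16)-(18).

    H: (T, m) hidden states h_t from the BiRNN encoder.
    W: (d_prime, m) projection matrix; b, c: (d_prime,) parameter vectors.
    Returns the weighted representation h_a and attention weights a.
    """
    scores = np.tanh(H @ W.T + b) @ c   # a~_t = c^T tanh(W h_t + b)   (Eq. 16)
    a = softmax(scores)                 # a = softmax(a~)              (Eq. 17)
    h_a = a @ H                         # h_a = sum_t a_t h_t          (Eq. 18)
    return h_a, a
```

The weights `a` sum to one over the T positions, so `h_a` is a convex combination of the hidden states; in the full model, `h_a` would then be passed through the dense layer Dec and the sigmoid output activation of (19).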

1.2 A.2 Model for pair-sequence tasks

Figure 3b presents the model for pair-sequence tasks, such as QA and NLI. The input of the model was \(X_{P} = (\boldsymbol {x}_{t}^{(p)})_{t=1}^{T_{P}} \in \mathbb {R}^{|{V}| \times T_{P}}\) and \(X_{Q} = (\boldsymbol {x}_{t}^{(q)})_{t=1}^{T_{Q}} \in \mathbb {R}^{|{V}| \times T_{Q}}\), where TP and TQ are the number of words in each sentence. Furthermore, XP and XQ represent the paragraph and question in QA tasks and the hypothesis and premise in NLI tasks, respectively. We used two separate BiRNN encoders (EncP and EncQ) to obtain the hidden states \(\boldsymbol {h}_{t}^{(p)} \in \mathbb {R}^{m}\) and \(\boldsymbol {h}_{t}^{(q)} \in \mathbb {R}^{m}\):

$$ \boldsymbol{h}_{t}^{(p)} = \mathbf{Enc}_{P}(\boldsymbol{w}_{t}^{(p)}, \boldsymbol{h}_{t-1}^{(p)}); ~~ \boldsymbol{h}_{t}^{(q)} \!= \mathbf{Enc}_{Q}(\boldsymbol{w}_{t}^{(q)}, \boldsymbol{h}_{t-1}^{(q)}), $$
(20)

where \(\boldsymbol {h}_{0}^{(p)}\) and \(\boldsymbol {h}_{0}^{(q)}\) are the initial hidden states, and they are regarded as zero vectors. Subsequently, we computed the attention score \(\tilde {a}_{t}\) of each word of XP as follows:

$$ \tilde{a}_{t} = \boldsymbol{c}^{\top} \tanh(\boldsymbol{W}_{1}\boldsymbol{h}_{t}^{(p)} + \boldsymbol{W}_{2}\boldsymbol{h}_{T_{Q}}^{(q)} + \boldsymbol{b}), $$
(21)

where \(\boldsymbol {W}_{1} \in \mathbb {R}^{d^{\prime } \times m}\) and \(\boldsymbol {W}_{2} \in \mathbb {R}^{d^{\prime } \times m}\) denote the projection matrices, and \(\boldsymbol {b}, \boldsymbol {c} \in \mathbb {R}^{d^{\prime }}\) are the parameter vectors. Similar to (17), the attention weight at was calculated from \(\tilde {a}_{t}\). The weighted representation was then obtained as the attention-weighted sum over the words in XP:

$$ \boldsymbol{h}_{\boldsymbol{a}} = \sum\limits_{t=1}^{T_{P}} a_{t} \boldsymbol{h}_{t}^{(p)} $$
(22)

The resulting ha was fed to the dense layer Dec, following which a softmax function was used as σ to obtain the prediction (in the same manner as in (19)).
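The only structural change from the single-sequence case is the attention score in (21), which conditions on a summary of the second sequence. A minimal NumPy sketch, with illustrative names not taken from the authors’ code:

```python
import numpy as np

def pair_attention_scores(Hp, hq_last, W1, W2, b, c):
    """Attention scores for pair-sequence tasks, following Eq. (21).

    Hp:      (T_P, m) hidden states h_t^(p) of the paragraph/premise encoder.
    hq_last: (m,) final hidden state h_{T_Q}^(q) of the question/hypothesis encoder.
    W1, W2:  (d_prime, m) projection matrices; b, c: (d_prime,) parameter vectors.
    Returns a (T_P,) vector of unnormalized scores a~_t.
    """
    # a~_t = c^T tanh(W1 h_t^(p) + W2 h_{T_Q}^(q) + b); the fixed question
    # summary hq_last is broadcast across all T_P paragraph positions.
    return np.tanh(Hp @ W1.T + hq_last @ W2.T + b) @ c
```

Softmax normalization and the weighted sum over XP then proceed exactly as in (17) and (22).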

1.3 A.3 Model training with attention mechanisms

Let \(X_{\tilde {\boldsymbol {a}}}\) be an input sequence with an attention score \(\tilde {\boldsymbol {a}}\), where \(\tilde {\boldsymbol {a}}\) is a concatenated attention score for all t. The conditional probability of the class y was modeled as \(p(\boldsymbol {y} \vert X_{\tilde {\boldsymbol {a}}}; \boldsymbol {\theta })\), where 𝜃 represents all model parameters. We minimized the following negative log-likelihood as a loss function for the model parameters to train the model:

$$ \mathcal{L}(X_{\tilde{\boldsymbol{a}}}, \boldsymbol{y}; \boldsymbol{\theta}) = - \log{p(\boldsymbol{y} \vert X_{\tilde{\boldsymbol{a}}}; \boldsymbol{\theta})}. $$
(23)
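For clarity, the loss in (23) is simply the negative log of the probability the model assigns to the true class; a minimal sketch for a single example:

```python
import numpy as np

def nll_loss(probs, y):
    """Eq. (23): L = -log p(y | X_a; theta) for one example.

    probs: predicted class distribution (output of the model);
    y:     index of the true class label.
    """
    return float(-np.log(probs[y]))
```

Averaging this quantity over a labeled mini-batch gives the supervised objective that the VAT-based regularization terms are added to.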

Appendix B: Implementation details

We implemented all of the training techniques using the AllenNLP library with AllenNLP Interpret [38, 39]. We evaluated the test set only once in all experiments. The experiments were conducted on an Ubuntu PC with a GeForce GTX 1080 Ti GPU. Our implementation is based on the one published by [11].Footnote 3 Note that the model used in our experiments had few parameters compared to recent models; therefore, it also executed quickly.

2.1 B.1 Supervised classification task

We used pretrained fastText [40] word embeddings with 300 dimensions and a one-layer bidirectional long short-term memory (LSTM) [41] encoder with a hidden size of 256 for the supervised settings. We used a sigmoid function as the output activation function. All models were regularized using L2 regularization (coefficient 10⁻⁵) applied to all of the parameters. We trained the model using the maximum likelihood loss and the Adam optimizer [42] with a learning rate of 0.001. All of the experiments were conducted at λ = 1.

We searched for the best hyperparameter 𝜖 from [0.01, 30.00] following [11]. The Allentune library [43] was used to adjust 𝜖, and we decided on the value of the hyperparameter 𝜖 based on the validation score.

2.2 B.2 Semi-supervised classification task

We used the same pretrained fastText and encoder for the semi-supervised settings. We again used the Adam optimizer [42] with the same learning rate as that in the supervised classification task. The same hyperparameter search was performed as in the supervised settings. All of the experiments were conducted at λ = 1 in the semi-supervised settings.

We also performed the same preprocessing as Chen et al. [13], using the same unlabeled data. We determined the amount of unlabeled data Nul based on the validation score for each benchmark dataset. We reported the test score of the model with the highest validation score.

Appendix C: Details of tasks and dataset

3.1 C.1 Single-sequence task

SST [30]Footnote 4 was used to ascertain the positive or negative sentiment of a sentence. IMDB Large Movie Reviews (IMDB) [31]Footnote 5,Footnote 6 was used to identify positive or negative sentiments from movie reviews. AG News (AGNews) [32]Footnote 7 was used to identify the topic of news articles as either the world (set as a negative label) or business (set as a positive label).

3.2 C.2 Pair-sequence task

CNN news article corpus (CNN news) [33]Footnote 8 was used to identify the answer entities from a paragraph. SNLI [34]Footnote 9 was used to determine whether a hypothesis sentence entailed, contradicted, or was neutral regarding a given premise sentence. Multi-Genre NLI (MultiNLI) [35]Footnote 10 used the same format as SNLI and was comparable in size, but it included a more diverse range of text, as well as an auxiliary test set for cross-genre transfer evaluation.

Appendix D: Details of evaluation criteria

4.1 Correlation between attention and gradient-based word importance

We computed how well the attention weights obtained through our VAT-based techniques agree with the importance of words calculated by gradients [14]. This evaluation follows Jain and Wallace [10]. The correlation τg is defined from the attention \(\boldsymbol {a} \in \mathbb {R}^{T}\) and the gradient-based word importance \(\boldsymbol {g} \in \mathbb {R}^{T}\) as follows:

$$ \tau_{g} = \text{PearsonCorr}(\boldsymbol{a}, \boldsymbol{g}). $$
(24)

The gradient-based word importance \(\boldsymbol {g} = (g_{t})_{t=1}^{T}\) is calculated as follows:

$$ g_{t} = \left| \sum\limits_{i=1}^{|{V}|} \mathbb{1} [X_{it} = 1] \frac{\partial \boldsymbol{y}}{\partial X_{it}} \right|, \quad \forall t \in [1, T], $$
(25)

where Xit is the t-th one-hot encoded word for the i-th vocabulary entry in \(X \in \mathbb{R}^{|V| \times T}\), and T is the number of words in the sequence.
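Given the attention weights and gradient-based importances for a sentence, the agreement score of (24) reduces to a Pearson correlation between two length-T vectors; a minimal sketch:

```python
import numpy as np

def gradient_importance_correlation(a, g):
    """tau_g = PearsonCorr(a, g), following Eq. (24).

    a: (T,) attention weights; g: (T,) gradient-based word importances.
    """
    a = np.asarray(a, dtype=float)
    g = np.asarray(g, dtype=float)
    # np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
    # entry is the Pearson correlation between the two vectors.
    return float(np.corrcoef(a, g)[0, 1])
```

A value near 1 indicates that the attention mechanism highlights the same words that the gradients deem important.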

Cite this article

Kitada, S., Iyatomi, H. Making attention mechanisms more robust and interpretable with virtual adversarial training. Appl Intell 53, 15802–15817 (2023). https://doi.org/10.1007/s10489-022-04301-w
