
An attention matrix for every decision: faithfulness-based arbitration among multiple attention-based interpretations of transformers in text classification


Abstract

Transformers are widely used in natural language processing, where they consistently achieve state-of-the-art performance. This is mainly due to their attention-based architecture, which allows them to model rich linguistic relations between (sub)words. However, transformers are difficult to interpret. The ability to provide reasoning for its decisions is an important property for a model in domains where human lives are affected. With transformers finding wide use in such fields, the need for interpretability techniques tailored to them arises. We propose a new technique that selects the most faithful attention-based interpretation among the several that can be obtained by combining different head, layer and matrix operations. In addition, we introduce two variations aimed at (i) reducing the computational complexity, making the technique faster and more environmentally friendly, and (ii) improving performance on multi-label data. We further propose a new faithfulness metric that is better suited to transformer models and correlates strongly with the area under the precision-recall curve computed against ground-truth rationales. We validate the utility of our contributions with a series of quantitative and qualitative experiments on seven datasets.
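Although only the abstract is available here, the core idea, per-decision arbitration among many attention-based interpretations using a faithfulness score, can be illustrated with a minimal sketch. The Python snippet below assumes a hypothetical predict_proba callable over token lists and uses a comprehensiveness-style confidence-drop proxy for faithfulness; the head/layer/matrix operators and the metric are placeholders rather than the paper's exact definitions.

```python
# Minimal sketch of faithfulness-based arbitration among attention-based
# interpretations (illustrative only; the operator set, the faithfulness proxy
# and the predict_proba callable are assumptions, not the paper's definitions).
from itertools import product

import numpy as np


def candidate_interpretations(attentions: np.ndarray):
    """Yield (name, token-importance vector) pairs from an attention tensor of
    shape (layers, heads, seq_len, seq_len), combining simple head-, layer-
    and matrix-level operations."""
    head_ops = {"head_mean": lambda a: a.mean(axis=1), "head_max": lambda a: a.max(axis=1)}
    layer_ops = {"layer_mean": lambda a: a.mean(axis=0), "layer_max": lambda a: a.max(axis=0)}
    matrix_ops = {"col_mean": lambda m: m.mean(axis=0), "cls_row": lambda m: m[0]}
    for (hn, h_op), (ln, l_op), (mn, m_op) in product(
        head_ops.items(), layer_ops.items(), matrix_ops.items()
    ):
        yield f"{hn}|{ln}|{mn}", m_op(l_op(h_op(attentions)))


def faithfulness(predict_proba, tokens, importances, label, k=3):
    """Comprehensiveness-style proxy: drop in the predicted class probability
    after removing the k most important tokens (larger drop = more faithful)."""
    original = predict_proba(tokens)[label]
    keep = np.argsort(importances)[:-k]          # indices of all but the top-k tokens
    reduced_tokens = [tokens[i] for i in sorted(keep)]
    return original - predict_proba(reduced_tokens)[label]


def most_faithful_interpretation(predict_proba, tokens, attentions, label):
    """Arbitration step: score every candidate and keep the most faithful one."""
    scored = [
        (faithfulness(predict_proba, tokens, imp, label), name, imp)
        for name, imp in candidate_interpretations(attentions)
    ]
    _, best_name, best_importances = max(scored, key=lambda t: t[0])
    return best_name, best_importances
```

In practice, the attention tensor could be obtained from a Hugging Face transformer called with output_attentions=True, stacked across layers; the arbitration then simply re-scores each candidate explanation for the specific instance and label being explained.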


Availability of Data and Materials

All datasets used in this research are public and freely available.

Code Availability

The code for our experiments is available on GitHub: https://tinyurl.com/bdh3v2nw.

Notes

  1. https://tinyurl.com/bdh3v2nw.

  2. https://tinyurl.com/2u8zeks8.

  3. https://tinyurl.com/4nprsskn.


Acknowledgements

This research was supported by the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the “First Call for H.F.R.I. Research Projects to support Faculty members and Researchers and the procurement of high-cost research equipment grant” (Project Number: 514).

Author information


Corresponding author

Correspondence to Nikolaos Mylonas.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Responsible editor: Charalampos Tsourakakis.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Mylonas, N., Mollas, I. & Tsoumakas, G. An attention matrix for every decision: faithfulness-based arbitration among multiple attention-based interpretations of transformers in text classification. Data Min Knowl Disc 38, 128–153 (2024). https://doi.org/10.1007/s10618-023-00962-4

