Abstract
Interpretability is becoming an expected, and even essential, characteristic of machine learning systems, especially under regulations such as the European Union's GDPR. Most existing work on interpretability in natural language processing (NLP) has focused on producing explanatory responses to questions of the form "Why p?", i.e., identifying the causal attributes that support the prediction of "p". This type of local explainability explains a single prediction made by a model for a single input by quantifying the contribution of each feature to the predicted output class, and most such methods are post-hoc. In this paper, we propose a technique that learns centroid vectors concurrently while training the black-box model, in order to support answers to both "Why p?" and "Why p and not q?", where "q" is another class that is contrastive to "p". Across multiple datasets, our approach achieves better results than traditional post-hoc methods.
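To make the idea concrete, the following is a minimal sketch, not the authors' actual architecture: a fixed random embedding table stands in for the neural encoder, one centroid vector per class is updated jointly with training, classes are scored by distance to their centroid, and a contrastive "Why p and not q?" attribution projects each token's embedding onto the direction separating the two centroids. All names and sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the paper's encoder is a trained neural network,
# replaced here by a fixed random embedding table for illustration.
VOCAB, DIM, CLASSES = 50, 8, 3
embed = rng.normal(size=(VOCAB, DIM))
centroids = rng.normal(size=(CLASSES, DIM))  # learned jointly with the model

def doc_vector(token_ids):
    # Mean-pooled document representation.
    return embed[token_ids].mean(axis=0)

def logits(doc):
    # Score each class by negative squared distance to its centroid.
    return -((doc - centroids) ** 2).sum(axis=1)

def centroid_step(doc, label, lr=0.1):
    # One gradient step on the centroid loss ||doc - c_y||^2,
    # pulling the true-class centroid toward the document
    # (this is the "concurrent" learning, in toy form).
    centroids[label] -= lr * 2.0 * (centroids[label] - doc)

def contrastive_attribution(token_ids, p, q):
    # Token-level contribution to "Why p and not q?": projection of each
    # token embedding onto the direction separating the two centroids.
    direction = centroids[p] - centroids[q]
    return embed[token_ids] @ direction

tokens = rng.integers(0, VOCAB, size=10)
doc = doc_vector(tokens)
before = logits(doc)[1]
centroid_step(doc, label=1)
after = logits(doc)[1]
assert after > before            # centroid moved toward the document
scores = contrastive_attribution(tokens, p=1, q=0)  # one score per token
```

Because the centroids live in the same space as the document vectors, the same learned parameters serve both prediction (distance-based logits) and explanation (per-token projections), which is what distinguishes this from purely post-hoc attribution.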
Acknowledgements
We would like to acknowledge the support of the Alberta Machine Intelligence Institute (Amii), and the Natural Sciences and Engineering Research Council of Canada (NSERC).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Babiker, H.K.B., Kim, M.Y., Goebel, R. (2023). Neural Networks with Feature Attribution and Contrastive Explanations. In: Amini, M.R., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13713. Springer, Cham. https://doi.org/10.1007/978-3-031-26387-3_23
DOI: https://doi.org/10.1007/978-3-031-26387-3_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26386-6
Online ISBN: 978-3-031-26387-3