
A-XAI: adversarial machine learning for trustable explainability

  • Original Research
  • Published in AI and Ethics

Abstract

With the recent advances in the use of Artificial Intelligence (AI)-based systems in the healthcare and medical domain, it has become necessary to monitor whether these systems make predictions based on the correct features. For this purpose, many model interpretability and explainability methods have been proposed in the literature. However, with the rising number of adversarial attacks against AI-based systems, it is also necessary to make these systems more robust to such attacks and to validate the correctness of the generated explanations. In this work, we first demonstrate how an adversarial attack can affect model explainability even after robust training. We then present two attack classifiers: one that detects whether a given input is benign or adversarial, and another that identifies the type of attack. We also use model explainability to identify the regions affected by the adversarial attack. Finally, we demonstrate how the correctness of the generated explanations can be verified using model interpretability methods.
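
To make the approach described above concrete, the following is a minimal, illustrative sketch (not the authors' released code) of its first two stages: crafting adversarial MRI inputs with the open-source Adversarial Robustness Toolbox (ART) using FGSM and PGD, and training a binary attack detector that separates benign from adversarial samples. The network architectures, the 224x224 image shape, the perturbation budgets, and all other hyperparameters are assumptions made for illustration only.

import numpy as np
import torch
import torch.nn as nn
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod, ProjectedGradientDescent

# Hypothetical brain-tumor classifier; the paper's actual architecture is not reproduced here.
tumor_model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 56 * 56, 4),  # 4 tumor classes, assuming 224x224 single-channel slices
)
classifier = PyTorchClassifier(
    model=tumor_model,
    loss=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(tumor_model.parameters(), lr=1e-3),
    input_shape=(1, 224, 224),
    nb_classes=4,
    clip_values=(0.0, 1.0),
)

# Placeholder benign inputs; in practice these would be pre-processed MRI slices scaled to [0, 1].
x_benign = np.random.rand(64, 1, 224, 224).astype(np.float32)

# Stage 1: generate adversarial counterparts with FGSM and PGD.
x_fgsm = FastGradientMethod(estimator=classifier, eps=0.03).generate(x=x_benign)
x_pgd = ProjectedGradientDescent(
    estimator=classifier, eps=0.03, eps_step=0.005, max_iter=10
).generate(x=x_benign)

# Stage 2: attack detector, a binary classifier over benign (label 0) vs adversarial (label 1) inputs.
x_det = np.concatenate([x_benign, x_fgsm, x_pgd])
y_det = np.concatenate(
    [np.zeros(len(x_benign)), np.ones(len(x_fgsm) + len(x_pgd))]
).astype(np.int64)

detector_model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
    nn.Flatten(), nn.Linear(16 * 14 * 14, 2),
)
detector = PyTorchClassifier(
    model=detector_model,
    loss=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(detector_model.parameters(), lr=1e-3),
    input_shape=(1, 224, 224),
    nb_classes=2,
)
detector.fit(x_det, y_det, batch_size=32, nb_epochs=5)

Extending the detector's labels from binary to one class per attack yields the attack-type classifier described in the abstract; Grad-CAM heatmaps of a benign input and its adversarial counterpart can then be compared to localize the image regions affected by the perturbation.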


Data availability

All the datasets used for training the models are publicly available, and their links are provided in the References section. The pre-processed and generated datasets can be accessed via the following link: https://drive.google.com/drive/folders/1PM9m61SHcuRuI-3jMnEZXOj0fpNYxCC0?usp=sharing.

References

  1. Market Data Forecast Ltd.: AI in healthcare market: size, share, growth: 2022–2027. [Online]. Available: https://www.marketdataforecast.com/market-reports/artificial-intelligence-in-healthcare-market. Accessed 10 Nov 2022

  2. Kong, Z., Xue, J., Wang, Y., Huang, L., Niu, Z., Li, F.: A survey on adversarial attack in the age of artificial intelligence. Wirel. Commun. Mob. Comput. (2021). https://doi.org/10.1155/2021/4907754

  3. Gragnaniello, D., Marra, F., Poggi, G., Verdoliva, L.: Analysis of adversarial attacks against CNN-based image forgery detectors. In: 2018 26th European Signal Processing Conference (EUSIPCO), Rome, Italy, pp 967–971 (2018), https://doi.org/10.23919/EUSIPCO.2018.8553560.

  4. Saleh, A., Sukaik, R., Abu-Naser, S.S.: Brain tumor classification using deep learning. In: 2020 International Conference on Assistive and Rehabilitation Technologies (iCareTech), pp. 131–136 (2020), https://doi.org/10.1109/iCareTech49914.2020.00032

  5. Soomro, T.A., et al.: Image segmentation for MR brain tumor detection using machine learning: a review. IEEE Rev. Biomed. Eng. 15, 10 (2022). https://doi.org/10.1109/RBME.2022.3185292

  6. Dhar, T., Dey, N., Borra, S., Sherratt, R.S.: Challenges of Deep Learning in Medical Image Analysis—Improving Explainability and Trust. IEEE Trans. Technol. Soc. 4(1), 68–75 (2023)

  7. van der Velden, B.H.M., Kuijf, H.J., Gilhuijs, K.G.A., Viergever, M.A.: Explainable artificial intelligence (XAI) in deep learning-based medical image analysis. Med. Image Anal. 79, 102470 (2022). https://doi.org/10.1016/j.media.2022.102470

  8. Mahapatra, D.: Cyclic generative adversarial networks with congruent image-report generation for explainable medical image analysis. arXiv preprint arXiv:2211.08424 (2022)

  9. Ma, X., Niu, Y., Gu, L., Wang, Y., Zhao, Y., Bailey, J., Lu, F.: Understanding adversarial attacks on deep learning-based medical image analysis systems. Pattern Recognit. 110, 107332 (2021). https://doi.org/10.1016/j.patcog.2020.107332

  10. Selvaganapathy, S., Sadasivam, S., Raj, N.: SafeXAI: explainable AI to detect adversarial attacks in electronic medical records. In: Intelligent Data Engineering and Analytics: Proceedings of the 9th International Conference on Frontiers in Intelligent Computing: Theory and Applications (FICTA 2021), pp. 501–509. Springer Nature Singapore, Singapore (2022)

  11. Eldrandaly, K.A., Abdel-Basset, M., Ibrahim, M., Abdel-Aziz, N.M.: J. Enterp. Inf. Syst. 15, 10 (2022). https://doi.org/10.1080/17517575.2022.2098537

  12. Ghorbani, A., Abid, A., Zou, J.: Interpretation of neural networks is fragile. AAAI 33(01), 3681–3688 (2019)

  13. Kindermans, P.-J., Hooker, S., Adebayo, J., Alber, M., Schütt, K., Dähne, S., Erhan, D., Kim, B.: The (un)reliability of saliency methods. In: Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. Springer (2019). https://doi.org/10.1007/978-3-030-28954-6_14

  14. Heo, J., Joo, S., Moon, T.: Fooling neural network interpretations via adversarial model manipulation. Adv. Neural Inf. Process. Syst. 32 (2019)

  15. Dombrowski, A.-K., et al.: Explanations can be manipulated and geometry is to blame. Adv. Neural Inf. Process. Syst. 32 (2019)

  16. Zhang, X., et al.: Interpretable deep learning under fire. In: 29th {USENIX} Security Symposium ({USENIX} Security 20). (2020)

  17. Woods, W., Chen, J., Teuscher, C.: Adversarial explanations for understanding image classification decisions and improved neural network robustness. Nat. Mach. Intell. 1(11), 508–516 (2019)

  18. Chen, J., et al.: Robust attribution regularization. Adv. Neural Inf. Process. Syst. 32 (2019)

  19. Rieger, L., Hansen, L.K.: A simple defense against adversarial attacks on heatmap explanations. arXiv preprint arXiv:2007.06381 (2020)

  20. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)

  21. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. CoRR, abs/1706.06083. https://arxiv.org/abs/1706.06083 (2017)

  22. Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: DeepFool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574–2582 (2016)

  23. Hart, S.: Shapley value. In: Eatwell, J., Milgate, M., Newman, P. (eds.) Game Theory, pp. 210–216. Palgrave Macmillan UK (1989)

  24. Sundararajan, M., Najmi, A.: The many Shapley values for model explanation. In: International Conference on Machine Learning, pp. 9269–9278. PMLR (2020)

  25. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626 (2017)

  26. Nickparvar, M.: Brain Tumor MRI Dataset. Kaggle (2021). https://doi.org/10.34740/KAGGLE/DSV/2645886

  27. Nicolae, M.I., Sinn, M., Tran, M.N., Buesser, B., Rawat, A., Wistuba, M., Edwards, B.: Adversarial Robustness Toolbox v1.0.0. arXiv preprint arXiv:1807.01069 (2018)

Acknowledgements

This work was supported by Symbiosis Centre for Applied Artificial Intelligence (SCAAI) and Symbiosis International University (SIU) under its research support fund.

Author information

Contributions

Conceptualization, AN, IP, JR, JS, NA, RW, SP; methodology, AN, IP, JR, JS, NA; software, AN, IP, JR, JS, NA; validation, RW, SP; investigation, RW, SP; data curation, AN, IP, JR, JS, NA; writing—original draft preparation, AN, IP, JR, JS, NA; writing—review and editing, RW, SP, KK; visualization, AN, IP, JR, JS, NA; supervision, RW; project administration, RW; funding acquisition, KK. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Rahee Walambe.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix: Confusion matrices of the brain tumor classifier on benign and adversarial test samples before and after adversarial training

See Tables 20, 21, 22, 23.

Table 20 Confusion matrix of brain tumor classifier on benign test samples before adversarial training
Table 21 Confusion matrix of brain tumor classifier on adversarial test samples before adversarial training
Table 22 Confusion matrix of robust brain tumor classifier on benign test samples after adversarial training
Table 23 Confusion matrix of robust brain tumor classifier on adversarial test samples after adversarial training

Appendix: Confusion matrices of the attack detector and attack classifier

See Tables 24, 25.

Table 24 Confusion matrix (attack detector)
Table 25 Confusion matrix (attack classifier)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Cite this article

Agrawal, N., Pendharkar, I., Shroff, J. et al. A-XAI: adversarial machine learning for trustable explainability. AI Ethics (2024). https://doi.org/10.1007/s43681-023-00368-4
