
A-XAI: adversarial machine learning for trustable explainability

  • Original Research
  • Published in AI and Ethics

Abstract

With the recent advances in the use of Artificial Intelligence (AI)-based systems in the healthcare and medical domain, it has become necessary to monitor whether these systems make predictions based on the correct features. For this purpose, many model interpretability and explainability methods have been proposed in the literature. However, with the rising number of adversarial attacks against AI-based systems, it is also necessary to make these systems more robust to such attacks and to validate the correctness of the generated explanations. In this work, we first demonstrate how an adversarial attack can affect model explainability even after robust training. We then present two attack classifiers: one that detects whether a given input is benign or adversarial, and another that identifies the type of attack. We also use model explainability to identify the regions affected by the adversarial attack. Finally, we demonstrate how the correctness of the generated explanations can be verified using model interpretability methods.
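
To make the approach described above concrete, the following is a minimal, illustrative sketch (not the authors' released code) of its first two stages: crafting adversarial MRI inputs with the open-source Adversarial Robustness Toolbox (ART) using FGSM and PGD, and training a binary attack detector that separates benign from adversarial samples. The network architectures, the 224x224 image shape, the perturbation budgets, and all other hyperparameters are assumptions made for illustration only.

import numpy as np
import torch
import torch.nn as nn
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod, ProjectedGradientDescent

# Hypothetical brain-tumor classifier; the paper's actual architecture is not reproduced here.
tumor_model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 56 * 56, 4),  # 4 tumor classes, assuming 224x224 single-channel slices
)
classifier = PyTorchClassifier(
    model=tumor_model,
    loss=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(tumor_model.parameters(), lr=1e-3),
    input_shape=(1, 224, 224),
    nb_classes=4,
    clip_values=(0.0, 1.0),
)

# Placeholder benign inputs; in practice these would be pre-processed MRI slices scaled to [0, 1].
x_benign = np.random.rand(64, 1, 224, 224).astype(np.float32)

# Stage 1: generate adversarial counterparts with FGSM and PGD.
x_fgsm = FastGradientMethod(estimator=classifier, eps=0.03).generate(x=x_benign)
x_pgd = ProjectedGradientDescent(
    estimator=classifier, eps=0.03, eps_step=0.005, max_iter=10
).generate(x=x_benign)

# Stage 2: attack detector, a binary classifier over benign (label 0) vs adversarial (label 1) inputs.
x_det = np.concatenate([x_benign, x_fgsm, x_pgd])
y_det = np.concatenate(
    [np.zeros(len(x_benign)), np.ones(len(x_fgsm) + len(x_pgd))]
).astype(np.int64)

detector_model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
    nn.Flatten(), nn.Linear(16 * 14 * 14, 2),
)
detector = PyTorchClassifier(
    model=detector_model,
    loss=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(detector_model.parameters(), lr=1e-3),
    input_shape=(1, 224, 224),
    nb_classes=2,
)
detector.fit(x_det, y_det, batch_size=32, nb_epochs=5)

Extending the detector's labels from binary to one class per attack yields the attack-type classifier described in the abstract; Grad-CAM heatmaps of a benign input and its adversarial counterpart can then be compared to localize the image regions affected by the perturbation.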


Data availability

All the datasets used for training the models are publicly available, and their links are provided in the References section. The pre-processed and generated datasets can be accessed via the following link: https://drive.google.com/drive/folders/1PM9m61SHcuRuI-3jMnEZXOj0fpNYxCC0?usp=sharing.

References

  1. Market Data Forecast Ltd.: AI in healthcare market: size, share, growth: 2022–2027. [Online]. Available: https://www.marketdataforecast.com/market-reports/artificial-intelligence-in-healthcare-market. Accessed 10 Nov 2022

  2. Kong, Z., Xue, J., Wang, Y., Huang, L., Niu, Z., Li, F.: A survey on adversarial attack in the age of artificial intelligence. Wirel. Commun. Mob. Comput. (2021). https://doi.org/10.1155/2021/4907754

  3. Gragnaniello, D., Marra, F., Poggi, G., Verdoliva, L.: Analysis of adversarial attacks against CNN-based image forgery detectors. In: 2018 26th European Signal Processing Conference (EUSIPCO), Rome, Italy, pp 967–971 (2018), https://doi.org/10.23919/EUSIPCO.2018.8553560.

  4. Saleh, A., Sukaik, R., Abu-Naser, S.S.: Brain tumor classification using deep learning. In: 2020 International Conference on Assistive and Rehabilitation Technologies (iCareTech), pp. 131–136 (2020), https://doi.org/10.1109/iCareTech49914.2020.00032

  5. Soomro, T.A., et al.: Image segmentation for MR brain tumor detection using machine learning: a review. IEEE Rev. Biomed. Eng. 15, 10 (2022). https://doi.org/10.1109/RBME.2022.3185292

  6. Dhar, T., Dey, N., Borra, S., Sherratt, R.S.: Challenges of Deep Learning in Medical Image Analysis—Improving Explainability and Trust. IEEE Trans. Technol. Soc. 4(1), 68–75 (2023)

  7. van der Velden, B.H.M., Kuijf, H.J., Gilhuijs, K.G.A., Viergever, M.A.: Explainable artificial intelligence (XAI) in deep learning-based medical image analysis. Med. Image Anal. 79, 102470 (2022). https://doi.org/10.1016/j.media.2022.102470

  8. Mahapatra, D.: Cyclic generative adversarial networks with congruent image-report generation for explainable medical image analysis. arXiv preprint arXiv:2211.08424 (2022)

  9. Ma, X., Niu, Y., Gu, L., Wang, Y., Zhao, Y., Bailey, J., Lu, F.: Understanding adversarial attacks on deep learning-based medical image analysis systems. Pattern Recognit. 110, 107332 (2021). https://doi.org/10.1016/j.patcog.2020.107332

  10. Selvaganapathy, S., Sadasivam, S., Raj, N.: SafeXAI: explainable AI to detect adversarial attacks in electronic medical records. In: Intelligent Data Engineering and Analytics: Proceedings of the 9th International Conference on Frontiers in Intelligent Computing: Theory and Applications (FICTA 2021), pp. 501–509. Springer Nature Singapore, Singapore (2022)

  11. Eldrandaly, K.A., Abdel-Basset, M., Ibrahim, M., Abdel-Aziz, N.M.: J. Enterp. Inf. Syst. 15, 10 (2022). https://doi.org/10.1080/17517575.2022.2098537

  12. Ghorbani, A., Abid, A., Zou, J.: Interpretation of neural networks is fragile. AAAI 33(01), 3681–3688 (2019)

  13. Kindermans, P.-J., Hooker, S., Adebayo, J., Alber, M., Schütt, K., Dähne, S., Erhan, D., Kim, B.: The (un)reliability of saliency methods. In: Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. Springer (2019). https://doi.org/10.1007/978-3-030-28954-6_14

  14. Heo, J., Joo, S., Moon, T.: Fooling neural network interpretations via adversarial model manipulation. Adv. Neural Inf. Process. Syst. 32 (2019)

  15. Dombrowski, A.-K., et al.: Explanations can be manipulated and geometry is to blame. Adv. Neural Inf. Process. Syst. 32 (2019)

  16. Zhang, X., et al.: Interpretable deep learning under fire. In: 29th {USENIX} Security Symposium ({USENIX} Security 20). (2020)

  17. Woods, W., Chen, J., Teuscher, C.: Adversarial explanations for understanding image classification decisions and improved neural network robustness. Nat. Mach. Intell. 1(11), 508–516 (2019)

  18. Chen, J., et al.: Robust attribution regularization. Adv. Neural Inf. Process. Syst. 32 (2019)

  19. Rieger, L., Hansen, L.K.: A simple defense against adversarial attacks on heatmap explanations. arXiv preprint arXiv:2007.06381 (2020)

  20. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)

  21. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. CoRR, abs/1706.06083. https://arxiv.org/abs/1706.06083 (2017)

  22. Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: DeepFool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574–2582 (2016)

  23. Hart, S.: Shapley value. In: Eatwell, J., Milgate, M., Newman, P. (eds.) Game Theory, pp. 210–216. Palgrave Macmillan UK (1989)

  24. Sundararajan, M., Najmi, A.: The many Shapley values for model explanation. In: International Conference on Machine Learning, pp. 9269–9278. PMLR (2020)

  25. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626 (2017)

  26. Nickparvar, M.: Brain Tumor MRI Dataset. Kaggle (2021). https://doi.org/10.34740/KAGGLE/DSV/2645886

  27. Nicolae, M.I., Sinn, M., Tran, M.N., Buesser, B., Rawat, A., Wistuba, M., Edwards, B.: Adversarial Robustness Toolbox v1.0.0. arXiv preprint arXiv:1807.01069 (2018)

Acknowledgements

This work was supported by Symbiosis Centre for Applied Artificial Intelligence (SCAAI) and Symbiosis International University (SIU) under its research support fund.

Author information

Contributions

Conceptualization, AN, IP, JR, JS, NA, RW, SP; methodology, AN, IP, JR, JS, NA; software, AN, IP, JR, JS, NA; validation, RW, SP; investigation, RW, SP; data curation, AN, IP, JR, JS, NA; writing—original draft preparation, AN, IP, JR, JS, NA; writing—review and editing, RW, SP, KK; visualization, AN, IP, JR, JS, NA; supervision, RW; project administration, RW; funding acquisition, KK. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Rahee Walambe.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix: Confusion matrices of the brain tumor classifier on benign and adversarial test samples before and after adversarial training

See Tables 20, 21, 22, 23.

Table 20 Confusion matrix of brain tumor classifier on benign test samples before adversarial training
Table 21 Confusion matrix of brain tumor classifier on adversarial test samples before adversarial training
Table 22 Confusion matrix of robust brain tumor classifier on benign test samples after adversarial training
Table 23 Confusion matrix of robust brain tumor classifier on adversarial test samples after adversarial training

Appendix: Confusion matrices of the attack detector and attack classifier

See Tables 24, 25.

Table 24 Confusion matrix (attack detector)
Table 25 Confusion matrix (attack classifier)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Cite this article

Agrawal, N., Pendharkar, I., Shroff, J. et al. A-XAI: adversarial machine learning for trustable explainability. AI Ethics (2024). https://doi.org/10.1007/s43681-023-00368-4
