Abstract
With recent advances in the use of Artificial Intelligence (AI)-based systems in the healthcare and medical domain, it has become necessary to monitor whether these systems make predictions using the correct features. For this purpose, many model interpretability and explainability methods have been proposed in the literature. However, with the rising number of adversarial attacks against AI-based systems, it has also become necessary to make those systems more robust to adversarial attacks and to validate the correctness of the generated explanations. In this work, we first demonstrate how an adversarial attack can affect model explainability even after robust training. We then present two attack classifiers: one that detects whether a given input is benign or adversarial, and another that identifies the type of attack. We also identify the regions affected by an adversarial attack using model explainability. Finally, we demonstrate how the correctness of the generated explanations can be verified using model interpretability methods.
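As a minimal illustration of the attack setting studied in this work, the sketch below applies a one-step Fast Gradient Sign Method (FGSM) perturbation, following Goodfellow et al. [26], to a toy logistic classifier. The model, weights, and epsilon here are illustrative placeholders, not the networks or parameters used in this work; the paper's experiments use deep models and the Adversarial Robustness Toolbox [33].

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, w, b, y, eps):
    """One-step FGSM: x_adv = clip(x + eps * sign(dL/dx)) for a logistic model."""
    p = sigmoid(x @ w + b)        # predicted probability of class 1
    grad_x = (p - y) * w          # gradient of binary cross-entropy w.r.t. x
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

# Toy example: a small perturbation within an eps-ball flips the prediction.
w, b = np.array([2.0, -1.0]), 0.0
x, y = np.array([0.6, 0.4]), 1.0
x_adv = fgsm_perturb(x, w, b, y, eps=0.3)
print(sigmoid(x @ w + b) > 0.5, sigmoid(x_adv @ w + b) > 0.5)  # True False
```

Because the perturbed input stays visually close to the original, an attack-detection classifier (or a shift in the Grad-CAM heatmap) is needed to flag it, which is the role of the two classifiers described above.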
Data availability
All the datasets used for training the models are publicly available and their links are provided in the reference section. The pre-processed and the generated datasets can be accessed via the link provided below: https://drive.google.com/drive/folders/1PM9m61SHcuRuI-3jMnEZXOj0fpNYxCC0?usp=sharing.
References
Market Data Forecast Ltd.: AI in healthcare market: size, share, growth: 2022–2027. [Online]. https://www.marketdataforecast.com/market-reports/artificial-intelligence-in-healthcare-market. Accessed 10 Nov 2022
Kong, Z., Xue, J., Wang, Y., Huang, L., Niu, Z., Li, F.: A survey on adversarial attack in the age of artificial intelligence. Wirel. Commun. Mob. Comput. (2021). https://doi.org/10.1155/2021/4907754
Gragnaniello, D., Marra, F., Poggi, G., Verdoliva, L.: Analysis of adversarial attacks against CNN-based image forgery detectors. In: 2018 26th European Signal Processing Conference (EUSIPCO), Rome, Italy, pp. 967–971 (2018). https://doi.org/10.23919/EUSIPCO.2018.8553560
Saleh, A., Sukaik, R., Abu-Naser, S.S.: Brain tumor classification using deep learning. In: 2020 International Conference on Assistive and Rehabilitation Technologies (iCareTech), pp. 131–136 (2020), https://doi.org/10.1109/iCareTech49914.2020.00032
Soomro, T.A., et al.: Image segmentation for MR brain tumor detection using machine learning: a review. IEEE Rev. Biomed. Eng. 15, 10 (2022). https://doi.org/10.1109/RBME.2022.3185292
Dhar, T., Dey, N., Borra, S., Sherratt, R.S.: Challenges of deep learning in medical image analysis—improving explainability and trust. IEEE Trans. Technol. Soc. 4(1), 68–75 (2023)
van der Velden, B.H.M., Kuijf, H.J., Gilhuijs, K.G.A., Viergever, M.A.: Explainable artificial intelligence (XAI) in deep learning-based medical image analysis. Med. Image Anal. 79, 102470 (2022). https://doi.org/10.1016/j.media.2022.102470
Mahapatra, D.: Cyclic generative adversarial networks with congruent image-report generation for explainable medical image analysis. arXiv preprint arXiv:2211.08424 (2022)
Ma, X., Niu, Y., Gu, L., Wang, Y., Zhao, Y., Bailey, J., Lu, F.: Understanding adversarial attacks on deep learning-based medical image analysis systems. Pattern Recognit. 110, 107332 (2021). https://doi.org/10.1016/j.patcog.2020.107332
Selvaganapathy, S., Sadasivam, S., Raj, N.: SafeXAI: explainable AI to detect adversarial attacks in electronic medical records. In: Intelligent Data Engineering and Analytics: Proceedings of the 9th International Conference on Frontiers in Intelligent Computing: Theory and Applications (FICTA 2021), pp. 501–509. Springer Nature Singapore, Singapore (2022)
Eldrandaly, K.A., Abdel-Basset, M., Ibrahim, M., Abdel-Aziz, N.M.: J. Enterp. Inf. Syst. 15, 10 (2022). https://doi.org/10.1080/17517575.2022.2098537
Ghorbani, A., Abid, A., Zou, J.: Interpretation of neural networks is fragile. AAAI 33(01), 3681–3688 (2019)
Kindermans, P.-J., Hooker, S., Adebayo, J., Alber, M., Schütt, K., Dähne, S., Erhan, D., Kim, B.: The (un)reliability of saliency methods (2019). https://doi.org/10.1007/978-3-030-28954-6_14
Heo, J., Joo, S., Moon, T.: Fooling neural network interpretations via adversarial model manipulation. Adv. Neural Inf. Process. Syst. 32 (2019)
Dombrowski, A.-K., et al.: Explanations can be manipulated and geometry is to blame. Adv. Neural Inf. Process. Syst. 32 (2019)
Zhang, X., et al.: Interpretable deep learning under fire. In: 29th {USENIX} Security Symposium ({USENIX} Security 20). (2020)
Woods, W., Chen, J., Teuscher, C.: Adversarial explanations for understanding image classification decisions and improved neural network robustness. Nat. Mach. Intell. 1(11), 508–516 (2019)
Chen, J., et al.: Robust attribution regularization. Adv. Neural Inf. Process. Syst. 32 (2019)
Rieger L., Hansen, L.K.: A simple defense against adversarial attacks on heatmap explanations. arXiv preprint arXiv:2007.06381 (2020)
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)
Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: Deepfool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574–2582, (2016)
Hart, S.: Shapley value. In: Eatwell, J., Milgate, M., Newman, P. (eds.) Game Theory, pp. 210–216. Palgrave Macmillan UK (1989)
Sundararajan, M., Najmi, A.: The many Shapley values for model explanation. In: International Conference on Machine Learning, pp. 9269–9278. PMLR (2020)
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626 (2017)
Nickparvar, M.: Brain Tumor MRI Dataset. Kaggle (2021). https://doi.org/10.34740/KAGGLE/DSV/2645886
Nicolae, M.I., Sinn, M., Tran, M.N., Buesser, B., Rawat, A., Wistuba, M., Edwards, B.: Adversarial Robustness Toolbox v1.0.0. arXiv preprint arXiv:1807.01069 (2018)
Acknowledgements
This work was supported by Symbiosis Centre for Applied Artificial Intelligence (SCAAI) and Symbiosis International University (SIU) under its research support fund.
Author information
Authors and Affiliations
Contributions
Conceptualization, AN, IP, JR, JS, NA, RW, SP; methodology, AN, IP, JR, JS, NA; software, AN, IP, JR, JS, NA; validation, RW, SP; investigation, RW, SP; data curation, AN, IP, JR, JS, NA; writing—original draft preparation, AN, IP, JR, JS, NA; writing—review and editing, RW, SP, KK; visualization, AN, IP, JR, JS, NA; supervision, RW; project administration, RW; funding acquisition, KK. All authors have read and agreed to the published version of the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Agrawal, N., Pendharkar, I., Shroff, J. et al. A-XAI: adversarial machine learning for trustable explainability. AI Ethics (2024). https://doi.org/10.1007/s43681-023-00368-4