Abstract
For image classification with deep learning, visual explanations let end-users better understand the basis of model decisions during inference. Our method optimizes the black-box visual explanation method Randomized Input Sampling for Explanation (RISE) by proposing the concept of the Decisive Saliency Map (DSM) and a corresponding quantitative metric. DSM makes the discriminative salient regions more prominent and easier to understand at negligible extra cost. Moreover, DSM efficiently links robustness assessment to the visual explanation via the saliency value distribution, providing a reference indicator for the reliability and robustness of model predictions that complements the commonly used Softmax confidence score. Experiments demonstrate that DSM and its quantitative metric improve the visualization of mainstream CNN models and differentiate the concrete importance of confusingly similar salient regions. By quantitatively assessing the robustness of the inference process, DSM accurately identifies the potential misclassification risk of high-performance CNN models.
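The RISE procedure that the paper builds on can be sketched as follows. This is a minimal illustration, not the authors' implementation: `model` is a hypothetical black-box callable returning class probabilities, and the mask count, grid size, and nearest-neighbor upsampling are simplifying assumptions (the original RISE uses bilinearly upsampled, randomly shifted masks).

```python
import numpy as np

def rise_saliency(model, image, target_class, n_masks=1000, grid=7, p_keep=0.5, rng=None):
    """Black-box saliency in the spirit of RISE: probe the model with randomly
    masked inputs and average the masks, weighted by the target-class score."""
    rng = rng or np.random.default_rng(0)
    h, w = image.shape[:2]
    saliency = np.zeros((h, w))
    total = 0.0
    for _ in range(n_masks):
        # Coarse random binary grid, upsampled to image size (nearest neighbor here
        # for simplicity; RISE proper uses shifted bilinear upsampling).
        coarse = (rng.random((grid, grid)) < p_keep).astype(float)
        mask = np.kron(coarse, np.ones((h // grid + 1, w // grid + 1)))[:h, :w]
        m = mask if image.ndim == 2 else mask[..., None]
        score = model(image * m)[target_class]
        saliency += score * mask   # pixels kept in high-scoring masks gain saliency
        total += score
    return saliency / (total + 1e-12)
```

Because only forward passes of `model` are needed, the sketch works for any classifier without access to gradients or internal activations, which is what makes RISE (and DSM on top of it) a black-box method.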
[Figures 1–10 of the article: images not reproduced here]
Appendix: Metrics to evaluate class sensitivity
The (dis)similarity metrics of saliency maps used to evaluate Class Sensitivity are listed below.

The saliency maps and related explanations of the classes with the highest and lowest scores are defined as

$$ {SM}_{max}={SM}\left(I,c_{max}\right),\qquad {SM}_{min}={SM}\left(I,c_{min}\right), $$

where \(c_{max}\) and \(c_{min}\) denote the classes with the highest and lowest prediction scores for the input image \(I\). The saliency maps \({SM}_{max}, {SM}_{min}\) are normalized as required in the SIM, KL, and NSS calculations, and the top class is set as ground truth:

$$ \mathrm{SIM}=\sum \nolimits _{i}\min \left({{SM}_{max}}_{i},{{SM}_{min}}_{i}\right), $$

$$ \mathrm{KL}=\sum \nolimits _{i}{{SM}_{max}}_{i}\log \left(\epsilon +\frac{{{SM}_{max}}_{i}}{{{SM}_{min}}_{i}+\epsilon }\right), $$

$$ \mathrm{NSS}=\frac{1}{N}\sum \nolimits _{i}\overline{{SM}_{min}}_{i}\times {{SM}_{max}}_{i}^{B}, \quad \text{where } N=\sum \nolimits _{i}{{SM}_{max}}_{i}^{B}. $$

In the KL computation, \(\epsilon \) is a regularization constant, usually set to 2.2204e-16. In the NSS computation, the top-class saliency map is binarized as \({{SM}_{max}}_{i}^{B}\) with its mean saliency value as the threshold, and \(\overline{{SM}_{min}}\) is \({SM}_{min}\) normalized to zero mean and unit standard deviation.
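The three class-sensitivity metrics can be sketched in NumPy. This is a minimal illustration assuming the standard SIM/KL/NSS definitions from the saliency-evaluation literature, with the top-class map taken as ground truth as described above; it is not the authors' code.

```python
import numpy as np

EPS = 2.2204e-16  # regularization constant, as in the KL computation

def sim(sm_top, sm_low):
    """SIM: normalize both maps to sum to 1, then sum the element-wise minimum."""
    p = sm_top / (sm_top.sum() + EPS)
    q = sm_low / (sm_low.sum() + EPS)
    return float(np.minimum(p, q).sum())

def kl(sm_top, sm_low):
    """KL divergence, with the normalized top-class map as ground truth."""
    p = sm_top / (sm_top.sum() + EPS)
    q = sm_low / (sm_low.sum() + EPS)
    return float(np.sum(p * np.log(EPS + p / (q + EPS))))

def nss(sm_top, sm_low):
    """NSS: standardize the lowest-class map and average it over the
    top-class map binarized with its mean saliency value as threshold."""
    b = (sm_top > sm_top.mean()).astype(float)
    z = (sm_low - sm_low.mean()) / (sm_low.std() + EPS)
    return float((z * b).sum() / (b.sum() + EPS))
```

Identical maps give SIM near 1, KL near 0, and a positive NSS, while dissimilar maps drive SIM toward 0 and KL upward, which is the behavior the Class Sensitivity evaluation relies on.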
Cite this article
Xu, X., Mo, J.: Visual explanation and robustness assessment optimization of saliency maps for image classification. Vis Comput 39, 6097–6113 (2023). https://doi.org/10.1007/s00371-022-02715-8