
Visual explanation and robustness assessment optimization of saliency maps for image classification

Original article · The Visual Computer

Abstract

For image classification with deep learning, visual explanations allow end users to better understand the basis of model decisions during inference. Our method optimizes the black-box visual explanation technique Randomized Input Sampling for Explanation (RISE) by proposing the concept of a Decisive Saliency Map (DSM) and a corresponding quantitative metric. DSM makes the discriminative salient regions more prominent and easier to understand at negligible extra cost. Moreover, DSM efficiently correlates robustness assessment with the visual explanation via the distribution of saliency values, providing a reference indicator for assessing the reliability and robustness of model predictions that complements the commonly used Softmax confidence score. Experiments demonstrate that DSM and the related quantitative metric improve the visualization of mainstream CNN models and differentiate the concrete importance of confusingly similar salient regions. By quantitatively assessing the robustness of the inference process, DSM accurately identifies the potential misclassification risk of high-performance CNN models.
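As context, the RISE baseline that DSM builds on produces a saliency map by multiplying the image with random binary masks, scoring each masked copy with the model, and averaging the masks weighted by those scores. The sketch below is a minimal illustration of that published baseline, not the authors' DSM optimization; the `model` callable, mask count, and grid size are illustrative assumptions, and the nearest-neighbour mask upsampling is a simplification of RISE's smooth, randomly shifted masks.

```python
import numpy as np

def rise_saliency(model, image, n_masks=2000, grid=7, p_keep=0.5, seed=None):
    """Minimal RISE-style saliency sketch.

    model : callable mapping a batch (N, H, W, C) to class probabilities (N, K)
    image : float array of shape (H, W, C)
    Returns an (H, W) saliency map for the top-scoring class.
    """
    rng = np.random.default_rng(seed)
    H, W = image.shape[:2]

    # Low-resolution random binary grids, upsampled to the image size.
    coarse = (rng.random((n_masks, grid, grid)) < p_keep).astype(np.float32)
    masks = np.stack([
        np.kron(m, np.ones((H // grid + 1, W // grid + 1)))[:H, :W]
        for m in coarse
    ])

    # Score each masked image for the class predicted on the full image.
    top_class = int(np.argmax(model(image[None])[0]))
    scores = np.array([model((image * m[..., None])[None])[0, top_class]
                       for m in masks])

    # Saliency = score-weighted average of the masks, normalized to [0, 1].
    sal = (scores[:, None, None] * masks).sum(0) / (masks.sum(0) + 1e-8)
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)
```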




References

  1. Yang, T., Zhang, T., Huang, L.: Detection of defects in voltage-dependent resistors using stacked-block-based convolutional neural networks. Vis. Comput. 37, 1559–1567 (2021). https://doi.org/10.1007/s00371-020-01901-w


  2. Patel, N., Mukherjee, S., Ying, L.: EREL-Net: A remedy for industrial bottle defect detection. In: Smart Multimedia. ICSM 2018. Lecture Notes in Computer Science, vol. 11010. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-04375-9_39

  3. Paleyes, A., Urma, R.G., Lawrence, N.D.: Challenges in deploying machine learning: a survey of case studies. NeurIPS: ML Retrospectives, Surveys & Meta-Analyses (2020). https://doi.org/10.1145/3533378

  4. Gunning, D., Aha, D.: DARPA’s explainable artificial intelligence (XAI) program. AI Magazine, vol. 40, no. 2 (2019). https://doi.org/10.1609/aimag.v40i2.2850

  5. Adadi, A., Berrada, M.: Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, 52138–52160 (2018). https://doi.org/10.1109/ACCESS.2018.2870052


  6. Artificial Intelligence (AI) - Assessment of the robustness of neural networks. ISO/IEC Technical Report 24029–1:2021 (2021)

  7. Martin, D., Heinzel, S., Kunze von Bischhoffshausen, J., Kühl, N.: Deep learning strategies for industrial surface defect detection systems. In: Hawaii International Conference on System Sciences (HICSS) (2022). https://doi.org/10.24251/hicss.2022.146

  8. Vermeire, T., Laugel, T., Renard, X., Martens, D., Detyniecki, M.: How to choose an explainability method? Towards a methodical implementation of XAI in practice. Communications in Computer and Information Science, (2021). https://doi.org/10.1007/978-3-030-93736-2_39

  9. Brundage, M., et al.: Toward trustworthy AI development: mechanisms for supporting verifiable claims. arXiv preprint arXiv:2004.07213v2 (2020)

  10. Wagner, J., Köhler, J. M., Gindele, T., Hetzel, L., Wiedemer, J. T., Behnke, S.: Interpretable and fine-grained visual explanations for convolutional neural networks. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9089–9099 (2019). https://doi.org/10.1109/CVPR.2019.00931

  11. Ji, S., Li, J., Du, T., Li, B.: A survey on techniques, applications and security of machine learning interpretability. J. Comput. Res. Develop. 56(10), 2071–2096 (2019)


  12. Khorram, S., Lawson, T., Li, F.: iGOS++: integrated gradient optimized saliency by bilateral perturbations. In: CHIL '21: Proceedings of the Conference on Health, Inference, and Learning, pp. 174–182 (2021). https://doi.org/10.1145/3450439.3451865

  13. Doshi-Velez, F., Kim, B.: Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608v2 (2017)

  14. Springenberg, J., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net. arXiv preprint arXiv:1412.6806 (2014)

  15. Zeiler, M. D., Fergus, R.: Visualizing and understanding convolutional networks. In European conference on computer vision, pp. 818–833. Springer (2014)

  16. Simonyan, K., Vedaldi, A., Zisserman A.: Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034 (2013)

  17. Yosinski, J., Clune, J., Nguyen, A., Fuchs, T., Lipson, H.: Understanding neural networks through deep visualization. arXiv preprint arXiv:1506.06579 (2015)

  18. Ribeiro, M.T., Singh, S., Guestrin, C.: "Why should I trust you?": Explaining the predictions of any classifier. In: Proc. of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144 (2016)

  19. Fong, R. C., Vedaldi, A.: Interpretable explanations of black boxes by meaningful perturbation. 2017 IEEE International Conference on Computer Vision (ICCV), pp. 3449–3457 (2017). https://doi.org/10.1109/ICCV.2017.371

  20. Petsiuk, V., Das, A., Saenko, K.: RISE: Randomized input sampling for explanation of black-box models. In: British Machine Vision Conference (2018)

  21. Ribeiro, M. T., Singh, S., Guestrin, C.: Anchors: high-precision model-agnostic explanations. In: AAAI Conference on Artificial Intelligence, pp 1527–1535 (2018)

  22. Barredo Arrieta, A., et al.: Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inform. Fus. 58, 82–115 (2020). https://doi.org/10.1016/j.inffus.2019.12.012


  23. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2921–2929 (2016). https://doi.org/10.1109/CVPR.2016.319

  24. Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: Visual explanations from deep networks via gradient-based localization. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 618–626 (2017). https://doi.org/10.1109/ICCV.2017.74

  25. Wang, H., et al.: Score-CAM: score-weighted visual explanations for convolutional neural networks. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 111–119 (2020). https://doi.org/10.1109/CVPRW50498.2020.00020

  26. Cheng, K., Wang, N., Shi, W., Zhan, Y.: Research advances in the interpretability of deep learning. J. Comput. Res. Develop. 57, 1208 (2020). https://doi.org/10.7544/ISSN1000-1239.2020.20190485


  27. Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., Pedreschi, D.: A survey of methods for explaining black box models. ACM Comput. Surv. 51(5), 1–42 (2018). https://doi.org/10.1145/3236009


  28. Fong, R., Patrick, M., Vedaldi A.: Understanding deep networks via extremal perturbations and smooth masks. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 2950–2958 (2019). https://doi.org/10.1109/ICCV.2019.00304

  29. Li, X., Shi, Y., Li, H., Bai, W., Song, Y., Cao, C., Chen, L.: An experimental study of quantitative evaluations on saliency methods. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Association for Computing Machinery, New York, NY, USA, 3200–3208 (2021). https://doi.org/10.1145/3447548.3467148

  30. Keller, P.R., Keller, M.M.: Visual cues: practical data visualization. IEEE Computer Society Press, Los Alamitos (1993)


  31. Chen, W., Zhang, S., Lu, A., Zhao, Y.: Guide for Data Visualization (in Chinese). Higher Education Press (2020)

  32. Johnson, N.L.: Systems of frequency curves generated by methods of translation. Biometrika 36(1/2), 149 (1949). https://doi.org/10.2307/2332539


  33. Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3D object representations for fine-grained categorization. In: 2013 IEEE International Conference on Computer Vision Workshops, pp. 554–561 (2013). https://doi.org/10.1109/ICCVW.2013.77

  34. Tan, M., Le, Q.V.: EfficientNet: rethinking model scaling for convolutional neural networks. In: Proc. of International Conference on Machine Learning, pp. 6105–6114 (2019)

  35. Morales, D.A., Talavera, E., Remeseiro, B.: Playing to distraction: towards a robust training of cnn classifiers through visual explanation techniques. Neural Comput. Appl. (2020). https://doi.org/10.1007/s00521-021-06282-2


  36. Koffka, K.: Principles of Gestalt psychology. Routledge, Taylor & Francis Group, London (2013)


  37. Guo, C., Pleiss, G., Sun, Y., Weinberger, K. Q.: On calibration of modern neural networks. In: Proceedings of the 34th International Conference on Machine Learning, 70:1321–1330 (2017)

  38. Naseer, M., Ranasinghe, K., et al.: Intriguing properties of vision transformers. Neural Inform. Process. Syst. (NeurIPS 2021) 34, 23296–23308 (2021)


  39. Zhong, Z., Zheng, L., Kang, G., Li, S., Yang, Y.: Random erasing data augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 07, pp. 13001–13008 (2020). https://doi.org/10.1609/aaai.v34i07.7000

  40. Yun, S., Han, D., Chun, S., Oh, S. J., Yoo, Y., Choe, J.: CutMix: regularization strategy to train strong classifiers with localizable features. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6022–6031 (2019). https://doi.org/10.1109/ICCV.2019.00612

  41. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y


  42. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90

  43. Bargal, S.A., et al.: Guided zoom: zooming into network evidence to refine fine-grained model decisions. IEEE Trans. Pattern Anal. Mach. Intell. 43(11), 4196–4202 (2021). https://doi.org/10.1109/TPAMI.2021.3054303


  44. Du, R. et al.: Fine-grained visual classification via progressive multi-granularity training of jigsaw patches. Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol 12365. Springer, Cham. (2020). https://doi.org/10.1007/978-3-030-58565-5_10

  45. Pei, H., Guo, R., Tan, Z., et al.: Fine-grained classification of automobile front face modeling based on Gestalt psychology. Vis. Comput. (2022). https://doi.org/10.1007/s00371-022-02506-1


  46. Bylinskii, Z., Judd, T., Oliva, A., Torralba, A., Durand, F.: What do different evaluation metrics tell us about saliency models? IEEE Trans. Pattern Anal. Mach. Intell. 41(3), 740–757 (2019). https://doi.org/10.1109/TPAMI.2018.2815601


  47. Riche, N., Duvinage, M., Mancas, M., Gosselin, B., Dutoit, T.: Saliency and human fixations: state-of-the-art and study of comparison metrics. In: IEEE International Conference on Computer Vision, pp. 1153–1160 (2013). https://doi.org/10.1109/ICCV.2013.147

  48. Emami, M., Hoberock, L.L.: Selection of a best metric and evaluation of bottom-up visual saliency models. Image Vis. Comput. 31(10), 796–808 (2013). https://doi.org/10.1016/j.imavis.2013.08.004



Author information


Correspondence to Xiaoshun Xu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Metrics to evaluate class sensitivity

The (dis)similarity metrics between saliency maps used to evaluate class sensitivity are listed below.

The saliency maps, i.e., the explanations for the classes with the highest and lowest prediction scores, are defined as:

$$c_{max}, c_{min} = \arg\max f(I),\ \arg\min f(I)$$
$$SM_{max}, SM_{min} = E(I,f)_{c_{max}},\ E(I,f)_{c_{min}}$$
(11)

The saliency maps \(SM_{max}, SM_{min}\) are normalized as required for the SIM, KL, and NSS calculations, and the top-scoring class is treated as the ground truth in these calculations.

In the KL computation, \(\epsilon\) is a regularization constant, typically set to 2.2204e-16. For NSS, we binarize the top-class saliency map into \({SM_{max}}_{i}^{B}\), using its mean saliency value as the threshold.

$$SIM=\sum_{x=1}^{X} \min\left(SM_{min}, SM_{max}\right)$$
(12)
$$KL=\sum_{x=1}^{X} SM_{min}\,\log\left(\frac{SM_{min}}{SM_{max}+\epsilon}+\epsilon\right),\qquad NKL=1-KL$$
(13)
$$CC=\frac{\mathrm{cov}\left(SM_{min},SM_{max}\right)}{\sigma_{SM_{min}}\,\sigma_{SM_{max}}}$$
(14)
$$NSS\left(SM_{min},{SM_{max}}^{B}\right)=\frac{1}{N}\sum_{i} {SM_{min}}_{i}\times {SM_{max}}_{i}^{B},\qquad \text{where}\quad N=\sum_{i} {SM_{max}}_{i}^{B}$$
(15)
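For reference, a minimal NumPy sketch of these class-sensitivity metrics, following Eqs. (12)–(15), is given below. The sum-to-one normalization for SIM and KL and the z-scoring of the evaluated map in NSS are assumptions based on common practice for these metrics; only the binarization threshold (the mean of the top-class map) is taken directly from the text above.

```python
import numpy as np

def class_sensitivity_metrics(sm_min, sm_max, eps=2.2204e-16):
    """(Dis)similarity metrics between the lowest- and highest-scoring
    class saliency maps, following Eqs. (12)-(15).

    sm_min, sm_max : non-negative 2-D saliency maps of identical shape.
    """
    a = np.asarray(sm_min, dtype=np.float64)
    b = np.asarray(sm_max, dtype=np.float64)

    # SIM and KL treat the maps as probability distributions (sum to 1).
    p = a / (a.sum() + eps)
    q = b / (b.sum() + eps)
    sim = np.minimum(p, q).sum()                           # Eq. (12)
    kl = (p * np.log(p / (q + eps) + eps)).sum()           # Eq. (13)
    nkl = 1.0 - kl

    # CC: Pearson correlation between the two maps.          Eq. (14)
    cc = np.corrcoef(a.ravel(), b.ravel())[0, 1]

    # NSS: mean of the (z-scored) bottom-class map over the top-class
    # map binarized at its mean value.                       Eq. (15)
    binary = (b > b.mean()).astype(np.float64)
    z = (a - a.mean()) / (a.std() + eps)
    nss = (z * binary).sum() / (binary.sum() + eps)

    return {"SIM": sim, "NKL": nkl, "CC": cc, "NSS": nss}
```

Intuitively, lower similarity scores between the top- and bottom-class maps suggest a more class-sensitive explanation.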

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Xu, X., Mo, J. Visual explanation and robustness assessment optimization of saliency maps for image classification. Vis Comput 39, 6097–6113 (2023). https://doi.org/10.1007/s00371-022-02715-8

