
Black Box Explanation by Learning Image Exemplars in the Latent Feature Space

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11906)

Abstract

We present an approach to explain the decisions of black box models for image classification. While using the black box to label images, our explanation method exploits the latent feature space learned by an adversarial autoencoder. The proposed method first generates exemplar images in the latent feature space and learns a decision tree classifier. Then, it selects and decodes the exemplars respecting local decision rules. Finally, it visualizes them in a manner that shows the user how the exemplars can be modified to either stay within their class or to become counter-factuals by “morphing” into another class. Since we focus on black box decision systems for image classification, the explanation obtained from the exemplars also provides a saliency map highlighting the areas of the image that contribute to its classification and the areas that push it into another class. We present the results of an experimental evaluation on three datasets and two black box models. We show that, besides providing the most useful and interpretable explanations, the proposed method outperforms existing explainers in terms of fidelity, relevance, coherence, and stability.
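
The sketch below illustrates, in Python, the pipeline summarized above: sample a latent neighbourhood of the image, label the decoded points with the black box, and fit a local decision tree surrogate that separates exemplars from counter-exemplars. It is only a minimal illustration under stated assumptions, not the authors' ABELE implementation (linked in the notes below); encoder, decoder, black_box, and all sampling parameters are hypothetical placeholders.

# Minimal sketch of the latent-exemplar idea; NOT the authors' ABELE code.
# Assumptions: encoder(images) -> (n, k) latent codes, decoder(codes) -> images
# (NumPy arrays), black_box(images) -> class labels.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def latent_surrogate(image, encoder, decoder, black_box,
                     n_samples=1000, scale=0.5, seed=0):
    """Fit a local decision-tree surrogate in the latent space (illustrative only)."""
    rng = np.random.default_rng(seed)
    z = encoder(image[np.newaxis])[0]                      # latent code of the instance
    Z = z + scale * rng.normal(size=(n_samples, z.size))   # latent neighbourhood
    X = decoder(Z)                                         # decoded candidate exemplars
    y = black_box(X)                                       # black-box labels
    tree = DecisionTreeClassifier(max_depth=4).fit(Z, y)   # interpretable local surrogate
    y0 = black_box(decoder(z[np.newaxis]))[0]              # label of the original image
    exemplars = X[y == y0]           # images that stay within the instance's class
    counter_exemplars = X[y != y0]   # images that "morph" into another class
    return tree, exemplars, counter_exemplars

In the paper, exemplar selection is guided by the decision rules of the surrogate tree rather than by raw black-box labels alone, and the saliency map is derived by comparing the image with its exemplars; the sketch only conveys the overall structure.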


Notes

  1. https://ec.europa.eu/justice/smedataprotect/.

  2. In the experiments we use the default validity threshold of 0.5 for the discriminator to distinguish between real and fake exemplars. This value can be increased to admit only more reliable exemplars, or decreased to speed up the generation process (see the first sketch after these notes).

  3. Datasets: http://yann.lecun.com/exdb/mnist/, https://www.cs.toronto.edu/~kriz/cifar.html, https://www.kaggle.com/zalando-research/.

  4. Black box models: https://scikit-learn.org/, https://keras.io/examples/.

  5. The encoding distribution of the AAE is defined as a Gaussian distribution whose mean and variance are predicted by the encoder itself [20]. We adopted the following number of latent features k for the various datasets: mnist \(k{=}4\), fashion \(k{=}8\), cifar10 \(k{=}16\).

  6. GitHub code links: https://github.com/riccotti/ABELE, https://github.com/marcotcr/lime, https://github.com/marcoancona/DeepExplain.

  7. Criticisms are images not well explained by prototypes, selected with a regularized kernel function [18].

  8. Best viewed in color. Black lines are not part of the explanation; they only highlight borders. We do not report explanations for cifar10 and for RF for the sake of space.

  9. This effect is probably due to the image segmentation performed by lime.

  10. A decision tree for abele and a linear lasso model for lime.

  11. These results confirm the experiments reported in [11].

  12. The abele method achieves similar results for RF, not reported due to lack of space.

  13. The abele method achieves similar results for RF, not reported due to lack of space.

  14. As in [21], in our experiments we use \(\epsilon{=}0.1\) for \(\mathcal{N}\) and we add salt and pepper noise (see the second sketch after these notes).
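
The first sketch below illustrates the validity filter mentioned in note 2: candidate latent points are kept only if the adversarial autoencoder's discriminator rates them above the threshold. It is an illustration under stated assumptions, not the authors' code; discriminator is assumed to map latent codes of dimension k (4, 8, or 16, see note 5) to a probability of being "real", and the sampling parameters are hypothetical.

# Hypothetical sketch of the discriminator-based validity filter (note 2).
# Assumption: discriminator(Z) returns, for each latent code, the probability
# of being a "real" code as a NumPy array of values in [0, 1].
import numpy as np

def valid_latent_samples(z, discriminator, n_samples=100, scale=0.5,
                         threshold=0.5, seed=0):
    rng = np.random.default_rng(seed)
    Z = z + scale * rng.normal(size=(n_samples, z.size))  # candidate latent points
    scores = discriminator(Z)                             # validity scores in [0, 1]
    # Raise the threshold to admit only more reliable exemplars,
    # lower it to speed up the generation process.
    return Z[scores >= threshold]

The second sketch corresponds to note 14: the perturbation used in the stability experiments, assuming \(\mathcal{N}\) denotes Gaussian noise with standard deviation \(\epsilon{=}0.1\). The salt-and-pepper ratio is a hypothetical parameter, not taken from the paper.

# Hypothetical sketch of the image perturbation described in note 14.
# Images are assumed to be float arrays scaled to [0, 1].
def perturb(images, eps=0.1, sp_ratio=0.02, seed=0):
    rng = np.random.default_rng(seed)
    noisy = images + eps * rng.normal(size=images.shape)  # Gaussian noise, eps = 0.1
    mask = rng.random(images.shape)
    noisy[mask < sp_ratio / 2] = 0.0                       # pepper pixels
    noisy[mask > 1 - sp_ratio / 2] = 1.0                   # salt pixels
    return np.clip(noisy, 0.0, 1.0)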

References

  1. Bach, S., Binder, A., et al.: On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE 10(7), e0130140 (2015)

  2. Bien, J., et al.: Prototype selection for interpretable classification. AOAS (2011)

  3. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

  4. Chen, C., Li, O., Barnett, A., Su, J., Rudin, C.: This looks like that: deep learning for interpretable image recognition. arXiv:1806.10574 (2018)

  5. Doshi-Velez, F., Kim, B.: Towards a rigorous science of interpretable machine learning. arXiv:1702.08608 (2017)

  6. Escalante, H.J., et al. (eds.): Explainable and Interpretable Models in Computer Vision and Machine Learning. TSSCML. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98131-4

  7. Fong, R.C., Vedaldi, A.: Interpretable explanations of black boxes by meaningful perturbation. In: ICCV, pp. 3429–3437 (2017)

  8. Frixione, M., et al.: Prototypes vs exemplars in concept representation. In: KEOD (2012)

  9. Frosst, N., et al.: Distilling a neural network into a soft decision tree. arXiv:1711.09784 (2017)

  10. Goodfellow, I., et al.: Generative adversarial nets. In: NIPS (2014)

  11. Guidotti, R., et al.: Local rule-based explanations of black box decision systems. arXiv:1805.10820 (2018)

  12. Guidotti, R., Monreale, A., Cariaggi, L.: Investigating neighborhood generation for explanations of image classifiers. In: PAKDD (2019)

  13. Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., et al.: A survey of methods for explaining black box models. ACM CSUR 51(5), 93:1–93:42 (2018)

  14. Guidotti, R., Ruggieri, S.: On the stability of interpretable models. In: IJCNN (2019)

  15. Hara, S., et al.: Maximally invariant data perturbation as explanation. arXiv:1806.07004 (2018)

  16. He, K., et al.: Deep residual learning for image recognition. In: CVPR (2016)

  17. Hinton, G., et al.: Distilling the knowledge in a neural network. arXiv:1503.02531 (2015)

  18. Kim, B., et al.: Examples are not enough, learn to criticize! In: NIPS (2016)

  19. Li, O., Liu, H., Chen, C., Rudin, C.: Deep learning for case-based reasoning through prototypes: a neural network that explains its predictions. In: AAAI (2018)

  20. Makhzani, A., Shlens, J., et al.: Adversarial autoencoders. arXiv:1511.05644 (2015)

  21. Melis, D.A., Jaakkola, T.: Towards robust interpretability with self-explaining neural networks. In: NIPS (2018)

  22. Molnar, C.: Interpretable Machine Learning. LeanPub (2018)

  23. Panigutti, C., Guidotti, R., Monreale, A., Pedreschi, D.: Explaining multi-label black-box classifiers for health applications. In: W3PHIAI (2019)

  24. Ribeiro, M.T., Singh, S., Guestrin, C.: "Why should I trust you?": explaining the predictions of any classifier. In: KDD, pp. 1135–1144. ACM (2016)

  25. Shrikumar, A., et al.: Not just a black box: learning important features through propagating activation differences. arXiv:1605.01713 (2016)

  26. Siddharth, N., Paige, B., Desmaison, A., van de Meent, J.W., et al.: Inducing interpretable representations with variational autoencoders. arXiv:1611.07492 (2016)

  27. Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv:1312.6034 (2013)

  28. Spinner, T., et al.: Towards an interpretable latent space: an intuitive comparison of autoencoders with variational autoencoders. In: IEEE VIS (2018)

  29. Sun, K., Zhu, Z., Lin, Z.: Enhancing the robustness of deep neural networks by boundary conditional GAN. arXiv:1902.11029 (2019)

  30. Sundararajan, M., et al.: Axiomatic attribution for deep networks. In: ICML (2017)

  31. van der Waa, J., et al.: Contrastive explanations with local foil trees. arXiv:1806.07470 (2018)

  32. Xie, J., et al.: Image denoising with deep neural networks. In: NIPS (2012)

  33. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10590-1_53


Acknowledgements

This work is partially supported by the EC H2020 programme under the funding schemes: Research Infrastructures G.A. 654024 SoBigData, G.A. 78835 Pro-Res, G.A. 825619 AI4EU and G.A. 780754 Track&Know. The third author acknowledges the support of the Natural Sciences and Engineering Research Council of Canada and of the Ocean Frontiers Institute.

Author information


Corresponding author

Correspondence to Riccardo Guidotti.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Guidotti, R., Monreale, A., Matwin, S., Pedreschi, D. (2020). Black Box Explanation by Learning Image Exemplars in the Latent Feature Space. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science (LNAI), vol. 11906. Springer, Cham. https://doi.org/10.1007/978-3-030-46150-8_12


  • DOI: https://doi.org/10.1007/978-3-030-46150-8_12


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-46149-2

  • Online ISBN: 978-3-030-46150-8

  • eBook Packages: Computer Science, Computer Science (R0)
