
Why Do These Match? Explaining the Behavior of Image Similarity Models

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12356)

Abstract

Explaining a deep learning model can help users understand its behavior and allow researchers to discern its shortcomings. Recent work has primarily focused on explaining models for tasks like image classification or visual question answering. In this paper, we introduce Salient Attributes for Network Explanation (SANE) to explain image similarity models, where a model's output is a score measuring the similarity of two inputs rather than a classification score. In this task, an explanation depends on both input images, so standard methods do not apply. Our SANE explanations pair a saliency map identifying important image regions with an attribute that best explains the match. We find that our explanations provide additional information not typically captured by saliency maps alone, and can also improve performance on the classic task of attribute recognition. Our approach's ability to generalize is demonstrated on two datasets from diverse domains, Polyvore Outfits and Animals with Attributes 2. Code available at: https://github.com/VisionLearningGroup/SANE.
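
Concretely, the saliency half of this problem can be approached with "black-box" masking methods once the similarity score, rather than a class probability, is treated as the output to explain. The following is a minimal sketch in that spirit, not the authors' implementation: the RISE-style random masking, the function name similarity_saliency, and the sim_fn/n_masks/p_keep/cells parameters are illustrative assumptions.

    import numpy as np

    def similarity_saliency(sim_fn, query, match, n_masks=1000, p_keep=0.5, cells=7):
        """Occlusion-based saliency for a two-input similarity model.

        The quantity weighted here is sim_fn(masked_query, match), so the
        resulting map depends on both input images, unlike standard
        classification saliency.
        """
        h, w = query.shape[:2]
        saliency = np.zeros((h, w), dtype=np.float32)
        for _ in range(n_masks):
            # Coarse random binary grid, upsampled to image resolution.
            grid = (np.random.rand(cells, cells) < p_keep).astype(np.float32)
            tile = (h // cells + 1, w // cells + 1)
            mask = np.kron(grid, np.ones(tile, dtype=np.float32))[:h, :w]
            # Pixels kept in masks that preserve a high similarity score
            # accumulate weight, marking regions the match depends on.
            score = sim_fn(query * mask[..., None], match)
            saliency += score * mask
        return saliency / n_masks

Per the abstract, the full method then pairs such a map with the attribute that best explains the match; that attribute-selection step is omitted from this sketch.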

Keywords

Explainable AI · Image similarity models · Fashion compatibility · Image retrieval

Notes

Acknowledgements

This work is funded in part by a DARPA XAI grant, NSF Grant No. 1718221, and ONR MURI Award N00014-16-1-2007.

Supplementary material

504452_1_En_38_MOESM1_ESM.pdf (8.9 MB)
Supplementary material 1 (PDF, 9135 KB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Boston University, Boston, USA
  2. University of Illinois at Urbana-Champaign, Urbana, USA
  3. MIT-IBM Watson AI Lab, Cambridge, USA
