A Generic Visualization Approach for Convolutional Neural Networks

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12362)


Retrieval networks are essential for searching and indexing. Compared to classification networks, attention visualization for retrieval networks has received little study. We formulate attention visualization as a constrained optimization problem. We leverage the unit L2-Norm constraint as an attention filter (L2-CAF) to localize attention in both classification and retrieval networks. Unlike recent literature, our approach requires neither architectural changes nor fine-tuning. Thus, a pre-trained network’s performance is never undermined.
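The constrained formulation can be illustrated with a small sketch. It is not the authors' implementation: it assumes a toy network tail (global average pooling followed by a linear layer), a squared-error objective that asks the filtered features to reproduce the unfiltered output, and projected gradient descent as the solver. The names `head` and `fit_attention_filter`, the step count, and the learning rate are all hypothetical choices made for illustration.

```python
import numpy as np


def head(features, weights):
    """Toy network tail (an assumption): global average pooling + linear layer."""
    pooled = features.mean(axis=(0, 1))           # (C,)
    return pooled @ weights                       # (K,)


def fit_attention_filter(features, weights, steps=300, lr=0.1):
    """Search for a spatial filter f with ||f||_2 = 1 such that the output of
    the filtered features stays close to the output of the original features.

    features: (H, W, C) activations of the last convolutional layer.
    weights:  (C, K) weights of the toy linear layer.
    Returns the best filter found and its squared-error loss.
    """
    h, w, _ = features.shape
    target = head(features, weights)              # unfiltered network output
    # Contribution of each spatial location to the output, shape (H, W, K).
    contrib = features @ weights / (h * w)

    f = np.full((h, w), 1.0 / np.sqrt(h * w))     # uniform start, unit L2 norm
    best_f, best_loss = f, np.inf
    for _ in range(steps):
        # Output of the filtered network minus the target, shape (K,).
        residual = np.tensordot(f, contrib, axes=([0, 1], [0, 1])) - target
        loss = float(residual @ residual)
        if loss < best_loss:
            best_f, best_loss = f, loss
        grad = 2.0 * (contrib @ residual)         # d(loss)/d(f), shape (H, W)
        f = f - lr * grad                         # gradient step
        f = f / np.linalg.norm(f)                 # project back onto the unit sphere
    return best_f, best_loss
```

Under these assumptions, the returned filter can be resized to the input resolution and overlaid on the image as a heatmap. Only the filter is optimized here; the network weights are never updated, in line with the claim that no fine-tuning is required.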

L2-CAF is quantitatively evaluated using weakly supervised object localization. State-of-the-art results are achieved on classification networks. For retrieval networks, significant improvement margins are achieved over a Grad-CAM baseline. Qualitative evaluation demonstrates how the L2-CAF visualizes attention per frame for a recurrent retrieval network. Further ablation studies highlight the computational cost of our approach and compare L2-CAF with other feasible alternatives. Code available at



This work was partially funded by independent grants from the Office of Naval Research (N000141612713) and Facebook AI.




Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. University of Maryland, College Park, USA
