Fixing Localization Errors to Improve Image Classification

Part of the Lecture Notes in Computer Science book series (LNIP, volume 12370)

Abstract

Deep neural networks are generally considered black-box models that offer little insight into their decision process. To address this limitation, the Class Activation Map (CAM) provides an attractive solution by visualizing class-specific discriminative regions in an input image. The remarkable ability of CAMs to locate class-discriminative regions has been exploited in weakly supervised segmentation and localization tasks. In this work, we explore a new direction: using CAMs in the learning process of deep networks. We note that such visualizations lend insight into the workings of deep CNNs and could be leveraged to introduce additional constraints during the learning stage. Specifically, the CAMs for negative classes (negative CAMs) often have false activations even though those classes are absent from the image. We therefore propose a loss function, called the ‘Homogeneous Negative CAM’ loss, that seeks to minimize peaks within the negative CAMs. In this way, in an effort to fix localization errors, our loss provides an extra supervisory signal that helps the model better discriminate between similar classes. The loss function is easy to implement and can be readily integrated into existing DNNs. We evaluate it on a number of classification tasks, including large-scale recognition, multi-label classification, and fine-grained recognition. Our loss performs better than other loss functions across the studied tasks. Additionally, we show that the proposed loss function provides higher robustness against adversarial attacks and noisy labels.

G. Sun and S. Khan—Equal contribution.

Author information

Corresponding author

Correspondence to Guolei Sun.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Sun, G., Khan, S., Li, W., Cholakkal, H., Khan, F.S., Van Gool, L. (2020). Fixing Localization Errors to Improve Image Classification. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12370. Springer, Cham. https://doi.org/10.1007/978-3-030-58595-2_17

  • DOI: https://doi.org/10.1007/978-3-030-58595-2_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58594-5

  • Online ISBN: 978-3-030-58595-2

  • eBook Packages: Computer Science (R0)