Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation

Zhai, Wei; Wu, Pingyu; Zhu, Kai; Cao, Yang; Wu, Feng; Zha, Zheng-Jun

doi:10.1007/s11263-023-01919-2

Published: 17 October 2023

Volume 132, pages 750–775 (2024)

International Journal of Computer Vision

Wei Zhai¹†, Pingyu Wu¹†, Kai Zhu¹, Yang Cao¹,² (ORCID: 0000-0002-2891-4379), Feng Wu¹,² & Zheng-Jun Zha¹
(† equal contribution)


Abstract

Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels. Recently, a new paradigm has emerged that generates a foreground prediction map (FPM) to achieve pixel-level localization. Existing FPM-based methods use cross-entropy to evaluate the foreground prediction map and to guide the learning of the generator. This paper presents two surprising experimental observations on the object localization learning process. For a trained network, as the foreground mask expands, (1) the cross-entropy converges to zero when the foreground mask covers only part of the object region, and (2) the activation value continues to increase until the foreground mask reaches the object boundary. Cross-entropy therefore stops providing a useful signal before the whole object is covered, whereas the activation value does not. To achieve more effective localization, we argue for using the activation value to learn more of the object region. In this paper, we propose a background activation suppression (BAS) method. Specifically, an activation map constraint module is designed to facilitate the learning of the generator by suppressing the background activation value. Meanwhile, by using foreground region guidance and an area constraint, BAS can learn the whole region of the object. In the inference phase, we consider the prediction maps of different categories together to obtain the final localization results. Extensive experiments show that BAS achieves significant and consistent improvement over the baseline methods on the CUB-200-2011 and ILSVRC datasets. In addition, our method achieves state-of-the-art weakly supervised semantic segmentation performance on the PASCAL VOC 2012 and MS COCO 2014 datasets. Code and models are available at https://github.com/wpy1999/BAS-Extension.
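
The interplay between suppressing background activation and constraining the mask area can be illustrated with a toy example. The sketch below is not the authors' implementation (that lives in the linked repository); it is a simplified illustration under stated assumptions. The function names `background_activation_ratio`, `mask_area` and `toy_objective`, the weight `area_weight`, and the 4x4 maps are all hypothetical. It assumes a non-negative activation map and a foreground mask with values in [0, 1], and it measures how much activation is left outside the mask.

```python
# Toy illustration only. All names, formulas and values here are assumptions
# made for this sketch; they are not taken from the paper's code.

def background_activation_ratio(activation, mask):
    """Share of the total activation that lies outside the foreground mask."""
    total = sum(a for row in activation for a in row)
    background = sum(
        a * (1.0 - m)
        for act_row, mask_row in zip(activation, mask)
        for a, m in zip(act_row, mask_row)
    )
    return background / total if total > 0 else 0.0


def mask_area(mask):
    """Mean mask value, used here as a simple area penalty."""
    cells = [m for row in mask for m in row]
    return sum(cells) / len(cells)


def toy_objective(activation, mask, area_weight=0.5):
    """Background activation ratio plus a weighted area penalty."""
    return background_activation_ratio(activation, mask) + area_weight * mask_area(mask)


# The object occupies the central 2x2 block; the background has no activation.
activation = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

# Three candidate masks: half of the object, the whole object, the whole image.
partial = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
full = [row[:] for row in activation]
oversized = [[1.0] * 4 for _ in range(4)]

objectives = {
    name: toy_objective(activation, mask)
    for name, mask in [("partial", partial), ("full", full), ("oversized", oversized)]
}
best = min(objectives, key=objectives.get)
print(objectives, best)
```

In this toy setting the partial mask is penalized because object activation remains outside it, and the oversized mask is penalized by the area term, so the mask that matches the object has the lowest objective. This mirrors, in a very reduced form, why the abstract pairs background suppression with an area constraint.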

Author information

Wei Zhai and Pingyu Wu have contributed equally to this work.

Authors and Affiliations

University of Science and Technology of China, Hefei, China
Wei Zhai, Pingyu Wu, Kai Zhu, Yang Cao, Feng Wu & Zheng-Jun Zha
Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
Yang Cao & Feng Wu

Corresponding author

Correspondence to Yang Cao.

Additional information

Communicated by Suha Kwak.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhai, W., Wu, P., Zhu, K. et al. Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation. Int J Comput Vis 132, 750–775 (2024). https://doi.org/10.1007/s11263-023-01919-2

Received: 02 October 2022
Accepted: 19 September 2023
Published: 17 October 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s11263-023-01919-2
