Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation

International Journal of Computer Vision

Abstract

Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels. Recently, a new paradigm has emerged that generates a foreground prediction map (FPM) to achieve pixel-level localization. While existing FPM-based methods use cross-entropy to evaluate the foreground prediction map and to guide the learning of the generator, this paper presents two surprising experimental observations on the object localization learning process. For a trained network, as the foreground mask expands: (1) the cross-entropy converges to zero when the foreground mask covers only part of the object region; (2) the activation value continuously increases until the foreground mask expands to the object boundary. Therefore, to achieve more effective localization, we argue for using the activation value to learn more of the object region. In this paper, we propose a background activation suppression (BAS) method. Specifically, an activation map constraint module is designed to facilitate the learning of the generator by suppressing the background activation value. Meanwhile, by using foreground region guidance and an area constraint, BAS can learn the whole region of the object. In the inference phase, we consider the prediction maps of different categories together to obtain the final localization results. Extensive experiments show that BAS achieves significant and consistent improvement over the baseline methods on the CUB-200-2011 and ILSVRC datasets. In addition, our method achieves state-of-the-art weakly supervised semantic segmentation performance on the PASCAL VOC 2012 and MS COCO 2014 datasets. Code and models are available at https://github.com/wpy1999/BAS-Extension.
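The idea of suppressing background activation while constraining mask area can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the ratio-based loss, the `area_weight` parameter, and the 4x4 activation map are all hypothetical and chosen only to show why a mask that stops at the object boundary scores best.

```python
# Illustrative sketch only. The names and formulas below are assumptions made
# for this example; they are not taken from the paper or its code release.

def relu(x):
    """Rectify a single activation value."""
    return x if x > 0.0 else 0.0


def total_activation(act_map, weight_map):
    """Sum of rectified activations, weighted per pixel by weight_map."""
    return sum(
        relu(a) * w
        for act_row, w_row in zip(act_map, weight_map)
        for a, w in zip(act_row, w_row)
    )


def background_activation_ratio(act_map, fg_mask, eps=1e-8):
    """Share of the activation that falls outside the predicted foreground."""
    bg_mask = [[1.0 - m for m in row] for row in fg_mask]
    ones = [[1.0 for _ in row] for row in fg_mask]
    return total_activation(act_map, bg_mask) / (total_activation(act_map, ones) + eps)


def area_constraint(fg_mask):
    """Mean mask value; penalising it discourages masks that cover everything."""
    values = [m for row in fg_mask for m in row]
    return sum(values) / len(values)


def bas_style_loss(act_map, fg_mask, area_weight=1.0):
    """Hypothetical combined objective: background ratio plus weighted area."""
    return background_activation_ratio(act_map, fg_mask) + area_weight * area_constraint(fg_mask)


# Toy 4x4 activation map: the "object" is the central 2x2 block.
ACT = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 1.0, 0.0],
    [0.0, 1.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

# Three candidate foreground masks: too small, matching the object, too large.
PARTIAL = [[1.0 if (r, c) == (1, 1) else 0.0 for c in range(4)] for r in range(4)]
FULL = [[1.0 if r in (1, 2) and c in (1, 2) else 0.0 for c in range(4)] for r in range(4)]
OVER = [[1.0 for _ in range(4)] for _ in range(4)]

if __name__ == "__main__":
    for name, mask in (("partial", PARTIAL), ("full", FULL), ("over", OVER)):
        print(name, round(bas_style_loss(ACT, mask), 4))
```

In this toy setting the partial mask leaves most of the activation in the background, the oversized mask is penalised by the area term, and the mask that matches the object has the lowest combined value. The relative weighting of the two terms is a free choice here and would need tuning in any real setup.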



Author information

Corresponding author

Correspondence to Yang Cao.

Additional information

Communicated by Suha Kwak.


About this article


Cite this article

Zhai, W., Wu, P., Zhu, K. et al. Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation. Int J Comput Vis 132, 750–775 (2024). https://doi.org/10.1007/s11263-023-01919-2


  • DOI: https://doi.org/10.1007/s11263-023-01919-2
