R-CCF: region-aware continual contrastive fusion for weakly supervised object detection

Zhang, Yongqiang; Tian, Rui; Zhang, Yin; Zhang, Zian; Bai, Yancheng; Ding, Mingli; Zuo, Wangmeng

doi:10.1007/s10489-024-05403-3

Published: 03 April 2024

Applied Intelligence, Volume 54, pages 4689–4712 (2024)

Yongqiang Zhang¹ (ORCID: orcid.org/0000-0002-0437-7337), Rui Tian¹, Yin Zhang¹, Zian Zhang¹, Yancheng Bai², Mingli Ding¹ & Wangmeng Zuo³

Abstract

Weakly supervised learning has emerged as a compelling approach to object detection because it reduces the need for fully annotated labels during training. Recent works have treated the detection task as a classification task, which results in highlighting only the discriminative parts of objects. Moreover, fully supervised object detectors use dedicated modules (e.g., feature pyramid networks (FPN) and region proposal networks (RPN)) to localize target objects accurately, whereas such well-designed localization modules rarely exist for weakly supervised object detectors. To address these challenges and gaps, we propose a region-aware continual contrastive fusion (R-CCF) module, which can be plugged into any off-the-shelf weak detector to improve detection performance by refining object locations. Specifically, a novel region association (RA) algorithm automatically queries the similarities between the most discriminative regions and their surrounding regions and then forms new rough object locations. Furthermore, we introduce an effective object integration (OI) constraint, comprising a class sub-constraint and a distance sub-constraint, to further refine the rough object locations produced by the RA algorithm and obtain accurate object regions. By integrating the R-CCF module into weakly supervised detector architectures and training end-to-end, we can continually refine object locations by contrastively fusing the discriminative regions with surrounding patches. Extensive experiments demonstrate the effectiveness of the proposed method in weakly supervised object detection: integrating R-CCF into the state-of-the-art MIST [1] achieves 58.3% mAP on the PASCAL VOC 2007 benchmark, an absolute improvement of 0.2% over MIST. Moreover, R-CCF based on OICR [2] and WSDDN [3] achieves 42.5% and 32.5% mAP on PASCAL VOC 2007, which is 1.3% and 2.1% higher than the respective baseline detectors. We also test the robustness of R-CCF on the PASCAL VOC 2012 dataset, where R-CCF clearly outperforms the baseline methods.
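
The region association idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' RA algorithm, whose details are not given on this page. It assumes that each proposal has a feature vector and a detection score, that "surrounding" means overlapping the seed box above a small IoU threshold, and that similarity is cosine similarity. The function names and both thresholds are hypothetical.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def associate_regions(boxes, feats, scores, sim_thresh=0.8, iou_thresh=0.1):
    """Fuse the top-scoring (most discriminative) proposal with similar neighbours.

    Assumed rule, for illustration only: a neighbour is fused when it overlaps
    the seed (IoU > iou_thresh) and its feature has cosine similarity
    >= sim_thresh with the seed feature. The rough object location is the
    box enclosing the seed and all fused neighbours.
    """
    seed = int(np.argmax(scores))
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sims = normed @ normed[seed]              # cosine similarity to the seed
    surrounding = iou(boxes[seed], boxes) > iou_thresh
    keep = surrounding & (sims >= sim_thresh)
    keep[seed] = True                         # the seed is always part of the result
    merged = np.concatenate([boxes[keep, :2].min(axis=0),
                             boxes[keep, 2:].max(axis=0)])
    return merged, np.flatnonzero(keep)

# Toy usage: four proposals with 2-D features.
boxes = np.array([[10, 10, 20, 20], [15, 10, 30, 20],
                  [100, 100, 110, 110], [12, 12, 22, 22]], dtype=float)
feats = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.0], [0.0, 1.0]])
scores = np.array([0.9, 0.3, 0.2, 0.4])
merged_box, fused_idx = associate_regions(boxes, feats, scores)
```

In this toy example the seed is fused only with the overlapping proposal whose feature is similar to it. The distant proposal and the overlapping but dissimilar proposal are left out.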

Data availability and access

The datasets used during the current study are available as follows:
1) PASCAL VOC 2007: http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html
2) PASCAL VOC 2012: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html
3) COCO 2017: https://cocodataset.org/

References

Ren Z, Yu Z, Yang X, Liu MY, Lee YJ, Schwing AG, Kautz J (2020) Instance-aware, context-focused, and memory-efficient weakly supervised object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10598–10607
Tang P, Wang X, Bai X, Liu W (2017) Multiple instance detection network with online instance classifier refinement. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2843–2851
Bilen H, Vedaldi A (2016) Weakly supervised deep detection networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2846–2854
Zhang Y, Bai Y, Ding M, Li Y, Ghanem B (2018) W2f: a weakly-supervised to fully-supervised framework for object detection. In: CVPR. IEEE, pp 928–936
Zhang Y, Bai Y, Ding M, Li Y, Ghanem B (2018) Weakly-supervised object detection via mining pseudo ground truth bounding-boxes. Pattern Recognit 84:68–81
Zhang Y, Ding M, Bai Y, Xu M, Ghanem B (2019) Beyond weakly supervised: pseudo ground truths mining for missing bounding-boxes object detection. IEEE Trans Circuits Syst Video Technol 30(4):983–997
Cheng G, Yang J, Gao D, Guo L, Han J (2020) High-quality proposals for weakly supervised object detection. IEEE Trans Image Process 29:5794–5804
Peng J, Wang H, Yue S, Zhang Z (2022) Context-aware co-supervision for accurate object detection. Pattern Recognit 121:108199
Dai X, Chen Y, Yang J, Zhang P, Yuan L, Zhang L (2021) Dynamic detr: end-to-end object detection with dynamic attention. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 2988–2997
Li F, Zhang H, Liu S, Guo J, Ni LM, Zhang L (2022) Dn-detr: accelerate detr training by introducing query denoising. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13619–13627
Wang Y, Ilic V, Li J, Kisačanin B, Pavlovic V (2023a) Alwod: active learning for weakly-supervised object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 6459–6469
Wang Y, Guerrero R, Pavlovic V (2023b) D2f2wod: learning object proposals for weakly-supervised object detection via progressive domain adaptation. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 22–31
Feng X, Yao X, Shen H, Cheng G, Xiao B, Han J (2023) Learning an invariant and equivariant network for weakly supervised object detection. IEEE Trans Pattern Anal Mach Intell
Sui L, Zhang CL, Wu J (2022) Salvage of supervision in weakly supervised object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14227–14236
Gao W, Wan F, Yue J, Xu S, Ye Q (2022) Discrepant multiple instance learning for weakly supervised object detection. Pattern Recognit 122:108233
Wei Y, Shen Z, Cheng B, Shi H, Xiong J, Feng J, Huang T (2018) Ts2c: tight box mining with surrounding segmentation context for weakly supervised object detection. In: Proceedings of the European conference on computer vision (ECCV), pp 434–450
Choe J, Han D, Yun S, Ha JW, Oh SJ, Shim H (2021) Region-based dropout with attention prior for weakly supervised object localization. Pattern Recognit 116:107949
Murtaza S, Belharbi S, Pedersoli M, Sarraf A, Granger E (2023) Discriminative sampling of proposals in self-supervised transformers for weakly supervised object localization. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 155–165
Shao F, Chen L, Shao J, Ji W, Xiao S, Ye L, Zhuang Y, Xiao J (2022) Deep learning for weakly-supervised object detection and localization: a survey. Neurocomputing 496:192–207
Bai J, Ren J, Xiao Z, Chen Z, Gao C, Ali TAA, Jiao L (2023) Localizing from classification: self-directed weakly supervised object localization for remote sensing images. IEEE Trans Neural Netw Learn Syst
Hui W, Tan C, Gu G, Zhao Y (2022) Gradient-based refined class activation map for weakly supervised object localization. Pattern Recognit 128:108664
Tang P, Wang X, Bai S, Shen W, Bai X, Liu W, Yuille A (2018) Pcl: proposal cluster learning for weakly supervised object detection. IEEE Trans Pattern Anal Mach Intell 42(1):176–191
Zeng Z, Liu B, Fu J, Chao H, Zhang L (2019) Wsod2: learning bottom-up and top-down objectness distillation for weakly-supervised object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 8292–8300
Wang J, Chen Y, Dong Z, Gao M (2023) Improved yolov5 network for real-time multi-scale traffic sign detection. Neural Comput Appl 35(10):7853–7865
Piao Z, Wang J, Tang L, Zhao B, Wang W (2022) Accloc: anchor-free and two-stage detector for accurate object localization. Pattern Recognit 126:108523
Wang J, Zhao C, Huo Z, Qiao Y, Sima H (2022) High quality proposal feature generation for crowded pedestrian detection. Pattern Recognit 128:108605
Shao Z, Su Y, Zhou Y, Meng F, Zhu H, Liu B, Yao R (2023) Ct-net: arbitrary-shaped text detection via contour transformer. IEEE Trans Circuits Syst Video Technol
Chen X, Xie S, He K (2021) An empirical study of training self-supervised vision transformers. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9640–9649
Chen TS, Hung WC, Tseng HY, Chien SY, Yang MH (2021b) Incremental false negative detection for contrastive learning. Preprint arXiv:2106.03719
He K, Fan H, Wu Y, Xie S, Girshick R (2020) Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9729–9738
Li J, Zhou P, Xiong C, Hoi SC (2020) Prototypical contrastive learning of unsupervised representations. arXiv:2005.04966
Lim S, Park J, Lee M, Lee H (2023) Unsupervised object discovery with pseudo label generated using k-means and self-supervised transformer. Neurocomputing 545:126326
Zhuang C, Zhai AL, Yamins D (2019) Local aggregation for unsupervised learning of visual embeddings. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 6002–6012
Caron M, Bojanowski P, Joulin A, Douze M (2018) Deep clustering for unsupervised learning of visual features. In: Proceedings of the European conference on computer vision (ECCV), pp 132–149
Van Gansbeke W, Vandenhende S, Georgoulis S, Proesmans M, Van Gool L (2020) Scan: learning to classify images without labels. In: European conference on computer vision, Springer, pp 268–285
Niu C, Shan H, Wang G (2022) Spice: semantic pseudo-labeling for image clustering. IEEE Trans Image Process 31:7264–7278
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al. (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv:2010.11929
Mao Z, Zhou Y, Sun J, Wu H, Pan F, Ahmad B (2023) Weakly-supervised object localization with gradient-pyramid feature. Appl Intell 53(3):2923–2935
Ramaswamy HG et al (2020) Ablation-cam: visual explanations for deep convolutional network via gradient-free localization. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 983–991
Jia Q, Wei S, Ruan T, Zhao Y, Zhao Y (2021) Gradingnet: towards providing reliable supervisions for weakly supervised object detection by grading the box candidates. In: Proceedings of the AAAI conference on artificial intelligence, vol 35, pp 1682–1690
Ren Z, Tang Y, Zhang W (2023) Ido: instance dual-optimization for weakly supervised object detection. Appl Intell 1–18
Feng X, Han J, Yao X, Cheng G (2020) Tcanet: triple context-aware network for weakly supervised object detection in remote sensing images. IEEE Trans Geosci Remote Sens 59(8):6946–6955
Zhong Y, Wang J, Peng J, Zhang L (2020) Boosting weakly supervised object detection with progressive knowledge transfer. In: European conference on computer vision, Springer, pp 615–631
Hou L, Zhang Y, Fu K, Li J (2021) Informative and consistent correspondence mining for cross-domain weakly supervised object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9929–9938
Cai Y, Tan X, Tan X (2017) Selective weakly supervised human detection under arbitrary poses. Pattern Recognit 65:223–237
Huang Z, Bao Y, Dong B, Zhou E, Zuo W (2022) W2n: switching from weak supervision to noisy supervision for object detection. In: European conference on computer vision, Springer, pp 708–724
Chen T, Kornblith S, Norouzi M, Hinton G (2020a) A simple framework for contrastive learning of visual representations. In: International conference on machine learning, PMLR, pp 1597–1607
Chen T, Kornblith S, Swersky K, Norouzi M, Hinton GE (2020) Big self-supervised models are strong semi-supervised learners. Adv Neural Inf Process Syst 33:22243–22255
Deselaers T, Alexe B, Ferrari V (2012) Weakly supervised localization and learning with generic knowledge. Int J Comput Vis 100(3):275–293
Li X, Kan M, Shan S, Chen X (2019) Weakly supervised object detection with segmentation collaboration. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9735–9744
Wan F, Liu C, Ke W, Ji X, Jiao J, Ye Q (2019) C-mil: continuation multiple instance learning for weakly supervised object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2199–2208
Yang K, Li D, Dou Y (2019) Towards precise end-to-end weakly supervised object detection network. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 8372–8381
Chen Z, Fu Z, Jiang R, Chen Y, Hua XS (2020) Slv: spatial likelihood voting for weakly supervised object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12995–13004
Arun A, Jawahar CV, Kumar MP (2019) Dissimilarity coefficient based weakly supervised object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9432–9441
Gao Y, Liu B, Guo N, Ye X, Wan F, You H, Fan D (2019) C-midn: coupled multiple instance detection network with segmentation guidance for weakly supervised object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9834–9843
Tao Q, Yang H, Cai J (2018) Exploiting web images for weakly supervised object detection. IEEE Trans Multimed 21(5):1135–1146
Dong B, Huang Z, Guo Y, Wang Q, Niu Z, Zuo W (2021) Boosting weakly supervised object detection via learning bounding box adjusters. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 2876–2885
Cao T, Du L, Zhang X, Chen S, Zhang Y, Wang YF (2021) Cat: weakly supervised object detection with category transfer. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 3070–3079
Ren S, He K, Girshick R, Sun J (2017) Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149
Gokberk Cinbis R, Verbeek J, Schmid C (2014) Multi-fold mil training for weakly supervised object localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2409–2416
Bilen H, Pedersoli M, Tuytelaars T (2015) Weakly supervised object detection with convex clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1081–1089
Wang C, Ren W, Huang K, Tan T (2014) Weakly supervised object localization with latent category learning. In: European conference on computer vision, Springer, pp 431–445
Li D, Huang JB, Li Y, Wang S, Yang MH (2016) Weakly supervised object localization with progressive domain adaptation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3512–3520
Teh EW, Rochan M, Wang Y (2016) Attention networks for weakly supervised object localization. In: BMVC, pp 1–11
Kantorov V, Oquab M, Cho M, Laptev I (2016) Contextlocnet: context-aware deep network models for weakly supervised localization. In: European conference on computer vision, Springer, pp 350–365
Jie Z, Wei Y, Jin X, Feng J, Liu W (2017) Deep self-taught learning for weakly supervised object localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1377–1385
Diba A, Sharma V, Pazandeh A, Pirsiavash H, Van Gool L (2017) Weakly supervised cascaded convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 914–922
Wan F, Wei P, Jiao J, Han Z, Ye Q (2018) Min-entropy latent model for weakly supervised object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1297–1306
Tang P, Wang X, Wang A, Yan Y, Liu W, Huang J, Yuille A (2018) Weakly supervised region proposal network and object detection. In: Proceedings of the European conference on computer vision (ECCV), pp 352–368
Shen Y, Ji R, Wang Y, Wu Y, Cao L (2019) Cyclic guidance for weakly supervised joint detection and segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 697–707
Gao M, Li A, Yu R, Morariu VI, Davis LS (2018) C-wsl: count-guided weakly supervised localization. In: Proceedings of the European conference on computer vision (ECCV), pp 152–168
Sun G, Wang W, Dai J, Van Gool L (2020) Mining cross-image semantics for weakly supervised semantic segmentation. In: European conference on computer vision, Springer, pp 347–365
Huang Z, Zou Y, Kumar B, Huang D (2020) Comprehensive attention self-distillation for weakly-supervised object detection. Adv Neural Inf Process Syst 33:16797–16807

Acknowledgements

This work is supported by the China Postdoctoral Science Foundation (Grant No. 259822), the National Postdoctoral Program for Innovative Talents (Grant No. BX20200108), the National Science Foundation of China (Grant Nos. 62206077 and 61976070), and the Science Foundation of Heilongjiang Province (Grant No. LH2021F024).

Author information

Yongqiang Zhang and Rui Tian contributed equally to this work.

Authors and Affiliations

School of Instrumentation Science and Engineering, Harbin Institute of Technology, Harbin, China
Yongqiang Zhang, Rui Tian, Yin Zhang, Zian Zhang & Mingli Ding
Institute of Software, Chinese Academy of Sciences, Beijing, China
Yancheng Bai
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Wangmeng Zuo

Contributions

Yongqiang Zhang and Rui Tian conceived the presented idea. Yongqiang Zhang, Rui Tian, Yin Zhang and Zian Zhang carried out the experiments and wrote the manuscript with support from Wangmeng Zuo and Mingli Ding. Mingli Ding supervised the project. Mingli Ding and Yancheng Bai contributed to the final version of the manuscript.

Corresponding author

Correspondence to Yongqiang Zhang.

Ethics declarations

Conflict of interest

We would like to note that in the manuscript entitled “R-CCF: Region-aware Continual Contrastive Fusion for Weakly Supervised Object Detection”, no conflict of interest exists in the submission of this manuscript, and the manuscript is approved by all authors for publication.

Ethical and informed consent

We confirm that the data used in this study do not involve any ethical consent requirements.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, Y., Tian, R., Zhang, Y. et al. R-CCF: region-aware continual contrastive fusion for weakly supervised object detection. Appl Intell 54, 4689–4712 (2024). https://doi.org/10.1007/s10489-024-05403-3

Accepted: 12 March 2024
Published: 03 April 2024
Issue Date: March 2024
DOI: https://doi.org/10.1007/s10489-024-05403-3
