R-CCF: region-aware continual contrastive fusion for weakly supervised object detection

Applied Intelligence

Abstract

Weakly supervised learning has emerged as a compelling approach to object detection because it reduces the need for fully annotated labels during training. Recently, some works have treated the detection task as a classification task, which results in highlighting only the most discriminative object parts. Moreover, whereas fully supervised object detectors rely on dedicated modules (e.g. feature pyramid networks (FPN) and region proposal networks (RPN)) to accurately localize target objects, well-designed object localization modules for weakly supervised object detectors rarely exist. To address these challenges and gaps, we propose a region-aware continual contrastive fusion (R-CCF) module that can be plugged into any off-the-shelf weak detector to improve detection performance by refining object locations. Specifically, a novel region association (RA) algorithm automatically queries the similarities between the most discriminative regions and their surrounding regions and then forms new rough object locations. Furthermore, we introduce an effective object integration (OI) constraint, consisting of a class sub-constraint and a distance sub-constraint, to further refine the rough object locations produced by the RA algorithm and obtain accurate object regions. By integrating the R-CCF module into weakly supervised detector architectures and training end-to-end, we continually refine object locations by contrastively fusing the discriminative regions with their surrounding patches. Extensive experiments demonstrate the effectiveness of the proposed method for weakly supervised object detection: integrating R-CCF into the state-of-the-art MIST [1] achieves 58.3% mAP on the PASCAL VOC2007 benchmark, an absolute improvement of 0.2% over MIST. Moreover, R-CCF built on OICR [2] and WSDDN [3] achieves 42.5% and 32.5% mAP on PASCAL VOC2007, which are 1.3% and 2.1% higher than the respective baseline detectors.
We also test the robustness of R-CCF on the PASCAL VOC 2012 dataset, where R-CCF clearly outperforms the baseline methods.
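The abstract describes the RA algorithm and the OI constraint only at a high level. The following is a minimal, hypothetical Python sketch of how a region-association step and a class/distance constraint could be chained for a single class. It is not the authors' implementation: the cosine similarity measure, the thresholds `sim_thresh` and `max_dist`, the definition of "surrounding" as "overlapping the seed box", and every function and class name are our assumptions. The contrastive fusion and end-to-end training are omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Region:
    box: Box
    feature: Sequence[float]  # hypothetical region embedding
    scores: Sequence[float]   # hypothetical per-class scores


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def overlaps(a: Box, b: Box) -> bool:
    # Positive-area intersection; our assumed notion of a "surrounding" region.
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def center(box: Box) -> Tuple[float, float]:
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def union_box(boxes: List[Box]) -> Box:
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def region_association(regions: List[Region], cls: int, sim_thresh: float = 0.8):
    """Pick the most discriminative region for `cls` as the seed, associate
    overlapping regions whose features are similar to it, and return the seed,
    the associated regions, and their enclosing box (the rough location)."""
    seed = max(regions, key=lambda r: r.scores[cls])
    associated = [r for r in regions
                  if r is seed or (overlaps(seed.box, r.box)
                                   and cosine(seed.feature, r.feature) >= sim_thresh)]
    return seed, associated, union_box([r.box for r in associated])


def object_integration(seed: Region, associated: List[Region], cls: int,
                       max_dist: float) -> Box:
    """Keep associated regions that pass a class check (top-scoring class == cls)
    and a distance check (center within max_dist of the seed center)."""
    sx, sy = center(seed.box)
    kept = []
    for r in associated:
        top_class = max(range(len(r.scores)), key=lambda k: r.scores[k])
        cx, cy = center(r.box)
        if r is seed or (top_class == cls and math.hypot(cx - sx, cy - sy) <= max_dist):
            kept.append(r)
    return union_box([r.box for r in kept])


# Toy example with two classes and 2-D features (all values made up).
regions = [
    Region((10, 10, 30, 30), [1.0, 0.0], [0.9, 0.1]),    # seed for class 0
    Region((25, 10, 50, 30), [0.9, 0.1], [0.7, 0.3]),    # similar, same class
    Region((45, 10, 70, 30), [0.95, 0.05], [0.6, 0.4]),  # does not overlap seed
    Region((20, 20, 40, 40), [0.0, 1.0], [0.2, 0.8]),    # overlaps but dissimilar
    Region((5, 5, 35, 35), [0.8, 0.2], [0.3, 0.7]),      # similar, other class
]
seed, associated, rough = region_association(regions, cls=0)
refined = object_integration(seed, associated, cls=0, max_dist=25.0)
print("rough:", rough)      # rough: (5, 5, 50, 35)
print("refined:", refined)  # refined: (10, 10, 50, 30)
```

In this toy case the similarity test alone merges the fifth region into the rough box, and the class check then removes it, shrinking the box from (5, 5, 50, 35) to (10, 10, 50, 30). In the actual module the regions, features, and scores would come from the host detector (e.g. WSDDN, OICR, or MIST) and the refinement is trained end-to-end; the sketch only mirrors the order of operations stated in the abstract.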

Data availability and access

The datasets used in the current study are available as follows: 1) PASCAL VOC 2007: http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html; 2) PASCAL VOC 2012: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html; 3) COCO 2017: https://cocodataset.org/.

References

  1. Ren Z, Yu Z, Yang X, Liu MY, Lee YJ, Schwing AG, Kautz J (2020) Instance-aware, context-focused, and memory-efficient weakly supervised object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10598–10607

  2. Tang P, Wang X, Bai X, Liu W (2017) Multiple instance detection network with online instance classifier refinement. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2843–2851

  3. Bilen H, Vedaldi A (2016) Weakly supervised deep detection networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2846–2854

  4. Zhang Y, Bai Y, Ding M, Li Y, Ghanem B (2018) W2F: a weakly-supervised to fully-supervised framework for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 928–936

  5. Zhang Y, Bai Y, Ding M, Li Y, Ghanem B (2018) Weakly-supervised object detection via mining pseudo ground truth bounding-boxes. Pattern Recognit 84:68–81

  6. Zhang Y, Ding M, Bai Y, Xu M, Ghanem B (2019) Beyond weakly supervised: pseudo ground truths mining for missing bounding-boxes object detection. IEEE Trans Circuits Syst Video Technol 30(4):983–997

  7. Cheng G, Yang J, Gao D, Guo L, Han J (2020) High-quality proposals for weakly supervised object detection. IEEE Trans Image Process 29:5794–5804

  8. Peng J, Wang H, Yue S, Zhang Z (2022) Context-aware co-supervision for accurate object detection. Pattern Recognit 121:108199

  9. Dai X, Chen Y, Yang J, Zhang P, Yuan L, Zhang L (2021) Dynamic DETR: end-to-end object detection with dynamic attention. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 2988–2997

  10. Li F, Zhang H, Liu S, Guo J, Ni LM, Zhang L (2022) DN-DETR: accelerate DETR training by introducing query denoising. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13619–13627

  11. Wang Y, Ilic V, Li J, Kisačanin B, Pavlovic V (2023a) ALWOD: active learning for weakly-supervised object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 6459–6469

  12. Wang Y, Guerrero R, Pavlovic V (2023b) D2F2WOD: learning object proposals for weakly-supervised object detection via progressive domain adaptation. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 22–31

  13. Feng X, Yao X, Shen H, Cheng G, Xiao B, Han J (2023) Learning an invariant and equivariant network for weakly supervised object detection. IEEE Trans Pattern Anal Mach Intell

  14. Sui L, Zhang CL, Wu J (2022) Salvage of supervision in weakly supervised object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14227–14236

  15. Gao W, Wan F, Yue J, Xu S, Ye Q (2022) Discrepant multiple instance learning for weakly supervised object detection. Pattern Recognit 122:108233

  16. Wei Y, Shen Z, Cheng B, Shi H, Xiong J, Feng J, Huang T (2018) TS2C: tight box mining with surrounding segmentation context for weakly supervised object detection. In: Proceedings of the European conference on computer vision (ECCV), pp 434–450

  17. Choe J, Han D, Yun S, Ha JW, Oh SJ, Shim H (2021) Region-based dropout with attention prior for weakly supervised object localization. Pattern Recognit 116:107949

  18. Murtaza S, Belharbi S, Pedersoli M, Sarraf A, Granger E (2023) Discriminative sampling of proposals in self-supervised transformers for weakly supervised object localization. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 155–165

  19. Shao F, Chen L, Shao J, Ji W, Xiao S, Ye L, Zhuang Y, Xiao J (2022) Deep learning for weakly-supervised object detection and localization: a survey. Neurocomputing 496:192–207

  20. Bai J, Ren J, Xiao Z, Chen Z, Gao C, Ali TAA, Jiao L (2023) Localizing from classification: self-directed weakly supervised object localization for remote sensing images. IEEE Trans Neural Netw Learn Syst

  21. Hui W, Tan C, Gu G, Zhao Y (2022) Gradient-based refined class activation map for weakly supervised object localization. Pattern Recognit 128:108664

  22. Tang P, Wang X, Bai S, Shen W, Bai X, Liu W, Yuille A (2018) PCL: proposal cluster learning for weakly supervised object detection. IEEE Trans Pattern Anal Mach Intell 42(1):176–191

  23. Zeng Z, Liu B, Fu J, Chao H, Zhang L (2019) WSOD2: learning bottom-up and top-down objectness distillation for weakly-supervised object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 8292–8300

  24. Wang J, Chen Y, Dong Z, Gao M (2023) Improved YOLOv5 network for real-time multi-scale traffic sign detection. Neural Comput Appl 35(10):7853–7865

  25. Piao Z, Wang J, Tanga L, Zhao B, Wang W (2022) AccLoc: anchor-free and two-stage detector for accurate object localization. Pattern Recognit 126:108523

  26. Wang J, Zhao C, Huo Z, Qiao Y, Sima H (2022) High quality proposal feature generation for crowded pedestrian detection. Pattern Recognit 128:108605

  27. Shao Z, Su Y, Zhou Y, Meng F, Zhu H, Liu B, Yao R (2023) CT-Net: arbitrary-shaped text detection via contour transformer. IEEE Trans Circuits Syst Video Technol

  28. Chen X, Xie S, He K (2021) An empirical study of training self-supervised vision transformers. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9640–9649

  29. Chen TS, Hung WC, Tseng HY, Chien SY, Yang MH (2021b) Incremental false negative detection for contrastive learning. Preprint arXiv:2106.03719

  30. He K, Fan H, Wu Y, Xie S, Girshick R (2020) Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9729–9738

  31. Li J, Zhou P, Xiong C, Hoi SC (2020) Prototypical contrastive learning of unsupervised representations. arXiv:2005.04966

  32. Lim S, Park J, Lee M, Lee H (2023) Unsupervised object discovery with pseudo label generated using k-means and self-supervised transformer. Neurocomputing 545:126326

  33. Zhuang C, Zhai AL, Yamins D (2019) Local aggregation for unsupervised learning of visual embeddings. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 6002–6012

  34. Caron M, Bojanowski P, Joulin A, Douze M (2018) Deep clustering for unsupervised learning of visual features. In: Proceedings of the European conference on computer vision (ECCV), pp 132–149

  35. Van Gansbeke W, Vandenhende S, Georgoulis S, Proesmans M, Van Gool L (2020) SCAN: learning to classify images without labels. In: European conference on computer vision, Springer, pp 268–285

  36. Niu C, Shan H, Wang G (2022) SPICE: semantic pseudo-labeling for image clustering. IEEE Trans Image Process 31:7264–7278

  37. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al. (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv:2010.11929

  38. Mao Z, Zhou Y, Sun J, Wu H, Pan F, Ahmad B (2023) Weakly-supervised object localization with gradient-pyramid feature. Appl Intell 53(3):2923–2935

  39. Ramaswamy HG et al (2020) Ablation-CAM: visual explanations for deep convolutional network via gradient-free localization. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 983–991

  40. Jia Q, Wei S, Ruan T, Zhao Y, Zhao Y (2021) GradingNet: towards providing reliable supervisions for weakly supervised object detection by grading the box candidates. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 35, pp 1682–1690

  41. Ren Z, Tang Y, Zhang W (2023) IDO: instance dual-optimization for weakly supervised object detection. Appl Intell 1–18

  42. Feng X, Han J, Yao X, Cheng G (2020) TCANet: triple context-aware network for weakly supervised object detection in remote sensing images. IEEE Trans Geosci Remote Sens 59(8):6946–6955

  43. Zhong Y, Wang J, Peng J, Zhang L (2020) Boosting weakly supervised object detection with progressive knowledge transfer. In: European conference on computer vision, Springer, pp 615–631

  44. Hou L, Zhang Y, Fu K, Li J (2021) Informative and consistent correspondence mining for cross-domain weakly supervised object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9929–9938

  45. Cai Y, Tan X, Tan X (2017) Selective weakly supervised human detection under arbitrary poses. Pattern Recognit 65:223–237

  46. Huang Z, Bao Y, Dong B, Zhou E, Zuo W (2022) W2N: switching from weak supervision to noisy supervision for object detection. In: European conference on computer vision, Springer, pp 708–724

  47. Chen T, Kornblith S, Norouzi M, Hinton G (2020a) A simple framework for contrastive learning of visual representations. In: International conference on machine learning, PMLR, pp 1597–1607

  48. Chen T, Kornblith S, Swersky K, Norouzi M, Hinton GE (2020) Big self-supervised models are strong semi-supervised learners. Adv Neural Inf Process Syst 33:22243–22255

  49. Deselaers T, Alexe B, Ferrari V (2012) Weakly supervised localization and learning with generic knowledge. Int J Comput Vis 100(3):275–293

  50. Deselaers T, Alexe B, Ferrari V (2012) Weakly supervised localization and learning with generic knowledge. Int J Comput Vis 100(3):275–293

  51. Li X, Kan M, Shan S, Chen X (2019) Weakly supervised object detection with segmentation collaboration. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9735–9744

  52. Wan F, Liu C, Ke W, Ji X, Jiao J, Ye Q (2019) C-MIL: continuation multiple instance learning for weakly supervised object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2199–2208

  53. Yang K, Li D, Dou Y (2019) Towards precise end-to-end weakly supervised object detection network. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 8372–8381

  54. Chen Z, Fu Z, Jiang R, Chen Y, Hua XS (2020) SLV: spatial likelihood voting for weakly supervised object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12995–13004

  55. Arun A, Jawahar CV, Kumar MP (2019) Dissimilarity coefficient based weakly supervised object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9432–9441

  56. Gao Y, Liu B, Guo N, Ye X, Wan F, You H, Fan D (2019) C-MIDN: coupled multiple instance detection network with segmentation guidance for weakly supervised object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9834–9843

  57. Tao Q, Yang H, Cai J (2018) Exploiting web images for weakly supervised object detection. IEEE Trans Multimed 21(5):1135–1146

  58. Dong B, Huang Z, Guo Y, Wang Q, Niu Z, Zuo W (2021) Boosting weakly supervised object detection via learning bounding box adjusters. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 2876–2885

  59. Cao T, Du L, Zhang X, Chen S, Zhang Y, Wang YF (2021) CAT: weakly supervised object detection with category transfer. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 3070–3079

  60. Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149

  61. Gokberk Cinbis R, Verbeek J, Schmid C (2014) Multi-fold MIL training for weakly supervised object localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2409–2416

  62. Bilen H, Pedersoli M, Tuytelaars T (2015) Weakly supervised object detection with convex clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1081–1089

  63. Wang C, Ren W, Huang K, Tan T (2014) Weakly supervised object localization with latent category learning. In: European conference on computer vision, Springer, pp 431–445

  64. Li D, Huang JB, Li Y, Wang S, Yang MH (2016) Weakly supervised object localization with progressive domain adaptation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3512–3520

  65. Teh EW, Rochan M, Wang Y (2016) Attention networks for weakly supervised object localization. In: BMVC, pp 1–11

  66. Kantorov V, Oquab M, Cho M, Laptev I (2016) ContextLocNet: context-aware deep network models for weakly supervised localization. In: European conference on computer vision, Springer, pp 350–365

  67. Jie Z, Wei Y, Jin X, Feng J, Liu W (2017) Deep self-taught learning for weakly supervised object localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1377–1385

  68. Diba A, Sharma V, Pazandeh A, Pirsiavash H, Van Gool L (2017) Weakly supervised cascaded convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 914–922

  69. Wan F, Wei P, Jiao J, Han Z, Ye Q (2018) Min-entropy latent model for weakly supervised object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1297–1306

  70. Tang P, Wang X, Wang A, Yan Y, Liu W, Huang J, Yuille A (2018) Weakly supervised region proposal network and object detection. In: Proceedings of the European conference on computer vision (ECCV), pp 352–368

  71. Shen Y, Ji R, Wang Y, Wu Y, Cao L (2019) Cyclic guidance for weakly supervised joint detection and segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 697–707

  72. Gao M, Li A, Yu R, Morariu VI, Davis LS (2018) C-WSL: count-guided weakly supervised localization. In: Proceedings of the European conference on computer vision (ECCV), pp 152–168

  73. Sun G, Wang W, Dai J, Van Gool L (2020) Mining cross-image semantics for weakly supervised semantic segmentation. In: European conference on computer vision, Springer, pp 347–365

  74. Huang Z, Zou Y, Kumar B, Huang D (2020) Comprehensive attention self-distillation for weakly-supervised object detection. Adv Neural Inf Process Syst 33:16797–16807

Acknowledgements

This work is supported by the China Postdoctoral Science Foundation (Grant No. 259822), the National Postdoctoral program for Innovative Talents (Grant No. BX20200108), the National Science Foundation of China (Grant No. 62206077 and No. 61976070), and the Science Foundation of Heilongjiang Province (LH2021F024).

Author information

Contributions

Yongqiang Zhang and Rui Tian conceived the presented idea. Yongqiang Zhang, Rui Tian, Yin Zhang and Zian Zhang carried out the experiments and wrote the manuscript with support from Wangmeng Zuo and Mingli Ding. Mingli Ding supervised the project. Mingli Ding and Yancheng Bai contributed to the final version of the manuscript.

Corresponding author

Correspondence to Yongqiang Zhang.

Ethics declarations

Conflict of interest

The authors declare that no conflict of interest exists in the submission of the manuscript entitled “R-CCF: Region-aware Continual Contrastive Fusion for Weakly Supervised Object Detection”, and that the manuscript has been approved by all authors for publication.

Ethical and informed consent

We confirm that the data used in this study do not involve anything requiring ethical or informed consent.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, Y., Tian, R., Zhang, Y. et al. R-CCF: region-aware continual contrastive fusion for weakly supervised object detection. Appl Intell 54, 4689–4712 (2024). https://doi.org/10.1007/s10489-024-05403-3

  • DOI: https://doi.org/10.1007/s10489-024-05403-3
