Abstract
Weakly supervised learning has emerged as a compelling approach to object detection because it reduces the need for fully annotated labels during training. Several recent works treat the detection task as a classification task, which tends to highlight only the most discriminative object parts. Moreover, fully supervised object detectors rely on dedicated modules (e.g., feature pyramid networks (FPN) and region proposal networks (RPN)) to localize target objects accurately, whereas well-designed localization modules for weakly supervised object detectors rarely exist. To address these challenges, we propose a region-aware continual contrastive fusion (R-CCF) module, which can be plugged into any off-the-shelf weakly supervised detector to improve detection performance by refining object locations. Specifically, a novel region association (RA) algorithm automatically queries the similarity between the most discriminative regions and their surrounding regions, and then forms new rough object locations. Furthermore, we introduce an effective object integration (OI) constraint, consisting of a class sub-constraint and a distance sub-constraint, that further refines the rough object locations produced by the RA algorithm to obtain accurate object regions. By integrating the R-CCF module into weakly supervised detector architectures and training end-to-end, object locations are continually refined by contrastively fusing the discriminative regions with surrounding patches. Extensive experiments demonstrate the effectiveness of the proposed method for weakly supervised object detection: integrating R-CCF into the state-of-the-art MIST [1] achieves 58.3% mAP on the PASCAL VOC 2007 benchmark, an absolute improvement of 0.2% over MIST. Moreover, R-CCF built on OICR [2] and WSDDN [3] achieves 42.5% and 32.5% mAP on PASCAL VOC 2007, which is 1.3% and 2.1% higher than the respective baseline detectors. We also test the robustness of R-CCF on the PASCAL VOC 2012 dataset, where it clearly outperforms the baseline methods.
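The abstract describes the refinement only at a high level, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes each region proposal carries a box, a feature vector, and per-class scores. It also assumes that region association can be approximated by cosine similarity to the top-scoring (most discriminative) proposal, that the class sub-constraint means "same predicted class", that the distance sub-constraint means "box centers within a fixed radius", and that fusion takes the enclosing box. The function names, thresholds, and toy data are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def center(box):
    """Center (x, y) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def refine_location(proposals, sim_thresh=0.8, dist_thresh=60.0):
    """Fuse the most discriminative proposal with compatible neighbours.

    Thresholds are assumed values for illustration, not taken from the paper.
    """
    # Seed: the proposal with the highest class score (most discriminative).
    seed = max(proposals, key=lambda p: max(p["scores"]))
    seed_cls = seed["scores"].index(max(seed["scores"]))
    sx, sy = center(seed["box"])

    kept = [seed]
    for p in proposals:
        if p is seed:
            continue
        # Association step (assumed): feature similarity to the seed.
        if cosine(seed["feat"], p["feat"]) < sim_thresh:
            continue
        # Class sub-constraint (assumed): same predicted class as the seed.
        if p["scores"].index(max(p["scores"])) != seed_cls:
            continue
        # Distance sub-constraint (assumed): centers within dist_thresh pixels.
        px, py = center(p["box"])
        if math.hypot(px - sx, py - sy) > dist_thresh:
            continue
        kept.append(p)

    # Fusion (assumed): smallest box enclosing all kept proposals.
    fused = (
        min(p["box"][0] for p in kept),
        min(p["box"][1] for p in kept),
        max(p["box"][2] for p in kept),
        max(p["box"][3] for p in kept),
    )
    return fused, seed_cls


# Toy data: a discriminative part, the rest of the object, and three distractors.
proposals = [
    {"box": (40, 30, 80, 70), "feat": [1.0, 0.1], "scores": [0.1, 0.9]},
    {"box": (30, 60, 110, 140), "feat": [0.9, 0.2], "scores": [0.2, 0.7]},
    {"box": (200, 200, 260, 260), "feat": [0.1, 1.0], "scores": [0.6, 0.1]},
    {"box": (45, 35, 85, 75), "feat": [1.0, 0.1], "scores": [0.8, 0.3]},
    {"box": (300, 30, 340, 70), "feat": [1.0, 0.1], "scores": [0.1, 0.85]},
]
fused_box, cls = refine_location(proposals)
print(fused_box, cls)  # (30, 30, 110, 140) 1
```

In this toy case the seed covers only a small part of the object, and the fused box grows to include the similar, same-class, nearby proposal. The dissimilar, differently classified, and distant proposals are rejected by the three checks in turn.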
Data availability and access
The datasets used during the current study are available as follows: 1) PASCAL VOC 2007: http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html; 2) PASCAL VOC 2012: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html; 3) COCO 2017: https://cocodataset.org/.
References
Ren Z, Yu Z, Yang X, Liu MY, Lee YJ, Schwing AG, Kautz J (2020) Instance-aware, context-focused, and memory-efficient weakly supervised object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10598–10607
Tang P, Wang X, Bai X, Liu W (2017) Multiple instance detection network with online instance classifier refinement. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2843–2851
Bilen H, Vedaldi A (2016) Weakly supervised deep detection networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2846–2854
Zhang Y, Bai Y, Ding M, Li Y, Ghanem B (2018) W2f: a weakly-supervised to fully-supervised framework for object detection. In: CVPR. IEEE, pp 928–936
Zhang Y, Bai Y, Ding M, Li Y, Ghanem B (2018) Weakly-supervised object detection via mining pseudo ground truth bounding-boxes. Pattern Recognit 84:68–81
Zhang Y, Ding M, Bai Y, Xu M, Ghanem B (2019) Beyond weakly supervised: pseudo ground truths mining for missing bounding-boxes object detection. IEEE Trans Circuits Syst Video Technol 30(4):983–997
Cheng G, Yang J, Gao D, Guo L, Han J (2020) High-quality proposals for weakly supervised object detection. IEEE Trans Image Process 29:5794–5804
Peng J, Wang H, Yue S, Zhang Z (2022) Context-aware co-supervision for accurate object detection. Pattern Recognit 121:108199
Dai X, Chen Y, Yang J, Zhang P, Yuan L, Zhang L (2021) Dynamic detr: end-to-end object detection with dynamic attention. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 2988–2997
Li F, Zhang H, Liu S, Guo J, Ni LM, Zhang L (2022) Dn-detr: accelerate detr training by introducing query denoising. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13619–13627
Wang Y, Ilic V, Li J, Kisačanin B, Pavlovic V (2023a) Alwod: active learning for weakly-supervised object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 6459–6469
Wang Y, Guerrero R, Pavlovic V (2023b) D2f2wod: learning object proposals for weakly-supervised object detection via progressive domain adaptation. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 22–31
Feng X, Yao X, Shen H, Cheng G, Xiao B, Han J (2023) Learning an invariant and equivariant network for weakly supervised object detection. IEEE Trans Pattern Anal Mach Intell
Sui L, Zhang CL, Wu J (2022) Salvage of supervision in weakly supervised object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 14227–14236
Gao W, Wan F, Yue J, Xu S, Ye Q (2022) Discrepant multiple instance learning for weakly supervised object detection. Pattern Recognit 122:108233
Wei Y, Shen Z, Cheng B, Shi H, Xiong J, Feng J, Huang T (2018) Ts2c: tight box mining with surrounding segmentation context for weakly supervised object detection. In: Proceedings of the European conference on computer vision (ECCV), pp 434–450
Choe J, Han D, Yun S, Ha JW, Oh SJ, Shim H (2021) Region-based dropout with attention prior for weakly supervised object localization. Pattern Recognit 116:107949
Murtaza S, Belharbi S, Pedersoli M, Sarraf A, Granger E (2023) Discriminative sampling of proposals in self-supervised transformers for weakly supervised object localization. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 155–165
Shao F, Chen L, Shao J, Ji W, Xiao S, Ye L, Zhuang Y, Xiao J (2022) Deep learning for weakly-supervised object detection and localization: a survey. Neurocomputing 496:192–207
Bai J, Ren J, Xiao Z, Chen Z, Gao C, Ali TAA, Jiao L (2023) Localizing from classification: self-directed weakly supervised object localization for remote sensing images. IEEE Trans Neural Netw Learn Syst
Hui W, Tan C, Gu G, Zhao Y (2022) Gradient-based refined class activation map for weakly supervised object localization. Pattern Recognit 128:108664
Tang P, Wang X, Bai S, Shen W, Bai X, Liu W, Yuille A (2018) Pcl: proposal cluster learning for weakly supervised object detection. IEEE Trans Pattern Anal Mach Intell 42(1):176–191
Zeng Z, Liu B, Fu J, Chao H, Zhang L (2019) Wsod2: learning bottom-up and top-down objectness distillation for weakly-supervised object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 8292–8300
Wang J, Chen Y, Dong Z, Gao M (2023) Improved yolov5 network for real-time multi-scale traffic sign detection. Neural Comput Appl 35(10):7853–7865
Piao Z, Wang J, Tanga L, Zhao B, Wang W (2022) Accloc: anchor-free and two-stage detector for accurate object localization. Pattern Recognit 126:108523
Wang J, Zhao C, Huo Z, Qiao Y, Sima H (2022) High quality proposal feature generation for crowded pedestrian detection. Pattern Recognit 128:108605
Shao Z, Su Y, Zhou Y, Meng F, Zhu H, Liu B, Yao R (2023) Ct-net: arbitrary-shaped text detection via contour transformer. IEEE Trans Circuits Syst Video Technol
Chen X, Xie S, He K (2021) An empirical study of training self-supervised vision transformers. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9640–9649
Chen TS, Hung WC, Tseng HY, Chien SY, Yang MH (2021b) Incremental false negative detection for contrastive learning. Preprint arXiv:2106.03719
He K, Fan H, Wu Y, Xie S, Girshick R (2020) Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9729–9738
Li J, Zhou P, Xiong C, Hoi SC (2020) Prototypical contrastive learning of unsupervised representations. arXiv:2005.04966
Lim S, Park J, Lee M, Lee H (2023) Unsupervised object discovery with pseudo label generated using k-means and self-supervised transformer. Neurocomputing 545:126326
Zhuang C, Zhai AL, Yamins D (2019) Local aggregation for unsupervised learning of visual embeddings. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 6002–6012
Caron M, Bojanowski P, Joulin A, Douze M (2018) Deep clustering for unsupervised learning of visual features. In: Proceedings of the European conference on computer vision (ECCV), pp 132–149
Van Gansbeke W, Vandenhende S, Georgoulis S, Proesmans M, Van Gool L (2020) Scan: learning to classify images without labels. In: European conference on computer vision, Springer, pp 268–285
Niu C, Shan H, Wang G (2022) Spice: semantic pseudo-labeling for image clustering. IEEE Trans Image Process 31:7264–7278
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al. (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv:2010.11929
Mao Z, Zhou Y, Sun J, Wu H, Pan F, Ahmad B (2023) Weakly-supervised object localization with gradient-pyramid feature. Appl Intell 53(3):2923–2935
Ramaswamy HG et al (2020) Ablation-cam: visual explanations for deep convolutional network via gradient-free localization. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 983–991
Jia Q, Wei S, Ruan T, Zhao Y, Zhao Y (2021) Gradingnet: towards providing reliable supervisions for weakly supervised object detection by grading the box candidates. Proceedings of the AAAI Conference on Artificial Intelligence, vol 35, pp 1682–1690
Ren Z, Tang Y, Zhang W (2023) Ido: instance dual-optimization for weakly supervised object detection. Appl Intell 1–18
Feng X, Han J, Yao X, Cheng G (2020) Tcanet: triple context-aware network for weakly supervised object detection in remote sensing images. IEEE Trans Geosci Remote Sens 59(8):6946–6955
Zhong Y, Wang J, Peng J, Zhang L (2020) Boosting weakly supervised object detection with progressive knowledge transfer. In: European conference on computer vision, Springer, pp 615–631
Hou L, Zhang Y, Fu K, Li J (2021) Informative and consistent correspondence mining for cross-domain weakly supervised object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9929–9938
Cai Y, Tan X, Tan X (2017) Selective weakly supervised human detection under arbitrary poses. Pattern Recognit 65:223–237
Huang Z, Bao Y, Dong B, Zhou E, Zuo W (2022) W2n: switching from weak supervision to noisy supervision for object detection. In: European conference on computer vision, Springer, pp 708–724
Chen T, Kornblith S, Norouzi M, Hinton G (2020a) A simple framework for contrastive learning of visual representations. In: International conference on machine learning, PMLR, pp 1597–1607
Chen T, Kornblith S, Swersky K, Norouzi M, Hinton GE (2020) Big self-supervised models are strong semi-supervised learners. Adv Neural Inf Process Syst 33:22243–22255
Deselaers T, Alexe B, Ferrari V (2012) Weakly supervised localization and learning with generic knowledge. Int J Comput Vis 100(3):275–293
Deselaers T, Alexe B, Ferrari V (2012) Weakly supervised localization and learning with generic knowledge. Int J Comput Vis 100(3):275–293
Li X, Kan M, Shan S, Chen X (2019) Weakly supervised object detection with segmentation collaboration. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9735–9744
Wan F, Liu C, Ke W, Ji X, Jiao J, Ye Q (2019) C-mil: continuation multiple instance learning for weakly supervised object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2199–2208
Yang K, Li D, Dou Y (2019) Towards precise end-to-end weakly supervised object detection network. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 8372–8381
Chen Z, Fu Z, Jiang R, Chen Y, Hua XS (2020) Slv: spatial likelihood voting for weakly supervised object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12995–13004
Arun A, Jawahar CV, Kumar MP (2019) Dissimilarity coefficient based weakly supervised object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9432–9441
Gao Y, Liu B, Guo N, Ye X, Wan F, You H, Fan D (2019) C-midn: coupled multiple instance detection network with segmentation guidance for weakly supervised object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9834–9843
Tao Q, Yang H, Cai J (2018) Exploiting web images for weakly supervised object detection. IEEE Trans Multimed 21(5):1135–1146
Dong B, Huang Z, Guo Y, Wang Q, Niu Z, Zuo W (2021) Boosting weakly supervised object detection via learning bounding box adjusters. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 2876–2885
Cao T, Du L, Zhang X, Chen S, Zhang Y, Wang YF (2021) Cat: weakly supervised object detection with category transfer. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 3070–3079
Ren S, He K, Girshick R, Sun J (2017) Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149
Gokberk Cinbis R, Verbeek J, Schmid C (2014) Multi-fold mil training for weakly supervised object localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2409–2416
Bilen H, Pedersoli M, Tuytelaars T (2015) Weakly supervised object detection with convex clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1081–1089
Wang C, Ren W, Huang K, Tan T (2014) Weakly supervised object localization with latent category learning. In: European conference on computer vision, Springer, pp 431–445
Li D, Huang JB, Li Y, Wang S, Yang MH (2016) Weakly supervised object localization with progressive domain adaptation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3512–3520
Teh EW, Rochan M, Wang Y (2016) Attention networks for weakly supervised object localization. In: BMVC, pp 1–11
Kantorov V, Oquab M, Cho M, Laptev I (2016) Contextlocnet: context-aware deep network models for weakly supervised localization. In: European conference on computer vision, Springer, pp 350–365
Jie Z, Wei Y, Jin X, Feng J, Liu W (2017) Deep self-taught learning for weakly supervised object localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1377–1385
Diba A, Sharma V, Pazandeh A, Pirsiavash H, Van Gool L (2017) Weakly supervised cascaded convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 914–922
Wan F, Wei P, Jiao J, Han Z, Ye Q (2018) Min-entropy latent model for weakly supervised object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1297–1306
Tang P, Wang X, Wang A, Yan Y, Liu W, Huang J, Yuille A (2018) Weakly supervised region proposal network and object detection. In: Proceedings of the European conference on computer vision (ECCV), pp 352–368
Shen Y, Ji R, Wang Y, Wu Y, Cao L (2019) Cyclic guidance for weakly supervised joint detection and segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 697–707
Gao M, Li A, Yu R, Morariu VI, Davis LS (2018) C-wsl: count-guided weakly supervised localization. In: Proceedings of the European conference on computer vision (ECCV), pp 152–168
Sun G, Wang W, Dai J, Van Gool L (2020) Mining cross-image semantics for weakly supervised semantic segmentation. In: European conference on computer vision, Springer, pp 347–365
Huang Z, Zou Y, Kumar B, Huang D (2020) Comprehensive attention self-distillation for weakly-supervised object detection. Adv Neural Inf Process Syst 33:16797–16807
Acknowledgements
This work is supported by the China Postdoctoral Science Foundation (Grant No. 259822), the National Postdoctoral Program for Innovative Talents (Grant No. BX20200108), the National Science Foundation of China (Grant Nos. 62206077 and 61976070), and the Science Foundation of Heilongjiang Province (Grant No. LH2021F024).
Author information
Contributions
Yongqiang Zhang and Rui Tian conceived the presented idea. Yongqiang Zhang, Rui Tian, Yin Zhang and Zian Zhang carried out the experiments and wrote the manuscript with support from Wangmeng Zuo and Mingli Ding. Mingli Ding supervised the project. Mingli Ding and Yancheng Bai contributed to the final version of the manuscript.
Ethics declarations
Conflict of interest
We declare that no conflict of interest exists in the submission of the manuscript entitled “R-CCF: Region-aware Continual Contrastive Fusion for Weakly Supervised Object Detection”, and that the manuscript is approved by all authors for publication.
Ethical and informed consent
We confirm that the data used in this study do not require ethical consent.
About this article
Cite this article
Zhang, Y., Tian, R., Zhang, Y. et al. R-CCF: region-aware continual contrastive fusion for weakly supervised object detection. Appl Intell 54, 4689–4712 (2024). https://doi.org/10.1007/s10489-024-05403-3
DOI: https://doi.org/10.1007/s10489-024-05403-3