Abstract
Weakly supervised object detection (WSOD) has attracted significant attention in recent years because it trains object detectors from image-level annotations alone, greatly reducing the labor and financial cost of fine-grained labeling. Nevertheless, the absence of instance-level annotations leads to two failure modes: partial regions, where a detection covers only part of an object, and missing instances, where objects are not detected at all. We attribute these mainly to two issues: 1) noisy instances in the training samples confuse the detector; 2) global salient information is missing, so low-confidence regions receive little attention. To address both issues, we propose an instance dual-optimization framework called IDO. First, an instance-wise selection strategy (IWSS) based on curriculum learning denoises instances and improves the robustness of the model. Second, CAM-generated spatial attention (CGSA) optimizes the features of instances. Without introducing additional hyperparameters, CGSA complements low class-confidence regions with more global salient information, which helps the model cover a more complete region of each target and identify more neglected targets. Finally, we empirically demonstrate that our method achieves results comparable to those of other state-of-the-art methods on PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO.
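To make the idea of CAM-generated spatial attention concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard class activation map formulation (a weighted sum of feature channels using classifier weights) and one plausible way of turning the maps into a hyperparameter-free attention mask: take the maximum over the classes present in the image, min-max normalize, and apply the result residually. The function names, the max aggregation, and the residual form `features * (1 + attention)` are all illustrative assumptions.

```python
import numpy as np


def class_activation_maps(features, fc_weights):
    """Standard CAM: weighted sum of feature channels per class.

    features:   (K, H, W) backbone feature map
    fc_weights: (C, K) weights of an image-level classifier
    returns:    (C, H, W) one activation map per class
    """
    return np.einsum("ck,khw->chw", fc_weights, features)


def cam_spatial_attention(features, fc_weights, present_classes):
    """Hypothetical attention mask built from the CAMs of the labeled classes."""
    cams = class_activation_maps(features, fc_weights)[present_classes]
    cams = np.maximum(cams, 0.0)      # keep positive evidence only
    saliency = cams.max(axis=0)       # (H, W), strongest class response
    lo, hi = saliency.min(), saliency.max()
    if hi > lo:
        return (saliency - lo) / (hi - lo)   # min-max normalize to [0, 1]
    return np.zeros_like(saliency)           # flat map: no attention signal


def apply_attention(features, attention):
    """Residual reweighting (an assumption): salient locations are amplified."""
    return features * (1.0 + attention[None, :, :])


# Toy usage with random non-negative features and 3 classes, 2 of them present.
rng = np.random.default_rng(0)
features = rng.random((8, 4, 4))
fc_weights = rng.standard_normal((3, 8))
attention = cam_spatial_attention(features, fc_weights, [0, 2])
refined = apply_attention(features, attention)
print(attention.shape, refined.shape)
```

The residual form leaves features unchanged where the attention is zero, so the mask can only add emphasis; whether the paper's CGSA uses this exact form is not stated in the abstract.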
Data Availability
The data that support the findings of this study are openly available at https://host.robots.ox.ac.uk/pascal/VOC and https://cocodataset.org/.
Acknowledgements
The authors appreciate Gaowei Wu’s constructive comments, which helped to improve the quality of this article. This research is supported by the National Key Research and Development Program of China (2020AAA0109600) and the National Natural Science Foundation of China (62106266, 62173328, 62006139, 61976213).
Ethics declarations
Conflict of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Ren, Z., Tang, Y. & Zhang, W. IDO: Instance dual-optimization for weakly supervised object detection. Appl Intell 53, 26763–26780 (2023). https://doi.org/10.1007/s10489-023-04956-z
DOI: https://doi.org/10.1007/s10489-023-04956-z