IDO: Instance dual-optimization for weakly supervised object detection

Ren, Zhida; Tang, Yongqiang; Zhang, Wensheng

doi:10.1007/s10489-023-04956-z

Published: 29 August 2023

Volume 53, pages 26763–26780, (2023)

Applied Intelligence

Abstract

Weakly supervised object detection (WSOD) has attracted significant attention in recent years because it trains object detectors from image-level annotations alone, greatly reducing the labor and financial cost of fine-grained labeling. However, the absence of instance-level annotations leads to two phenomena: partial regions and missing instances. We attribute these mainly to two issues: 1) noisy instances in the training samples confuse the detector, and 2) global salient information is missing, so low-confidence regions receive little attention. To address both issues, we propose an instance dual-optimization framework called IDO. First, an instance-wise selection strategy (IWSS) based on curriculum learning denoises the instances and improves the robustness of the model. Second, CAM-generated spatial attention (CGSA) is designed to optimize the instance features. Without introducing additional hyperparameters, CGSA supplements regions of low class confidence with more global salient information, which helps the model capture a more complete region of each target and identify more neglected targets. Finally, we empirically demonstrate that our method achieves results comparable to those of other state-of-the-art methods on PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO.
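The abstract does not specify the selection rule used by IWSS, so the following is only a generic curriculum-learning sketch of instance selection, not the authors' method. It assumes each candidate instance has a scalar loss, treats low-loss instances as "easy", and keeps a fraction of them that grows linearly during training. The function names, the linear pacing schedule, and the `start`/`end` fractions are all hypothetical choices made for illustration.

```python
import math


def keep_fraction(step, total_steps, start=0.5, end=1.0):
    """Hypothetical linear pacing: fraction of instances kept at a given step."""
    # Clamp training progress to [0, 1] so out-of-range steps are safe.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress


def select_instances(losses, step, total_steps):
    """Return the indices of the lowest-loss ("easiest") instances.

    Assumption: a low loss marks a cleaner instance. Early in training only
    the easiest instances are used; later, harder ones are admitted.
    """
    k = max(1, math.ceil(keep_fraction(step, total_steps) * len(losses)))
    # Rank instance indices by ascending loss and keep the first k.
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:k])


if __name__ == "__main__":
    losses = [0.9, 0.1, 0.5, 0.3]  # made-up per-instance losses
    for step in (0, 5, 10):
        print(step, select_instances(losses, step, total_steps=10))
```

With the made-up losses above, half of the instances are kept at the first step and all of them at the last, which is the usual easy-to-hard ordering in curriculum learning. A real detector would compute the losses per proposal and could use a different pacing function.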

Data Availability

The data that support the findings of this study are openly available at https://host.robots.ox.ac.uk/pascal/VOC and https://cocodataset.org/.

Acknowledgements

The authors appreciate Gaowei Wu’s constructive comments which helped to improve the quality of this article. This research is supported by the National Key Research and Development Program of China (2020AAA0109600) and the National Natural Science Foundation of China (62106266, 62173328, 62006139, 61976213).

Author information

Authors and Affiliations

State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Zhida Ren, Yongqiang Tang & Wensheng Zhang
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Zhida Ren & Wensheng Zhang

Corresponding author

Correspondence to Yongqiang Tang.

Ethics declarations

Conflict of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ren, Z., Tang, Y. & Zhang, W. IDO: Instance dual-optimization for weakly supervised object detection. Appl Intell 53, 26763–26780 (2023). https://doi.org/10.1007/s10489-023-04956-z

Accepted: 08 August 2023
Published: 29 August 2023
Issue Date: November 2023
DOI: https://doi.org/10.1007/s10489-023-04956-z
