
IDO: Instance dual-optimization for weakly supervised object detection

Applied Intelligence

Abstract

Weakly supervised object detection (WSOD) has attracted significant attention in recent years because it trains object detectors using only image-level annotations, greatly reducing the labor and financial cost of fine-grained labeling. Nevertheless, the absence of instance-level annotations leads to two failure modes: detecting only partial regions of objects and missing instances altogether. We attribute these mainly to two issues: 1) noisy instances in the training samples, which can confuse the detector; and 2) missing global salient information, which causes low-confidence regions to receive little attention. To address these two problems, we propose an instance dual-optimization framework called IDO. First, we propose an instance-wise selection strategy (IWSS), based on curriculum learning, that denoises instances and improves the robustness of the model. Second, we design CAM-generated spatial attention (CGSA) to optimize instance features. Without introducing additional hyperparameters, CGSA supplements low class-confidence regions with more global salient information, which helps the model capture more complete target regions and identify targets that would otherwise be neglected. Finally, we show empirically that our method achieves results comparable to those of other state-of-the-art methods on PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO.
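The abstract names its two components without giving their formulas. As a rough illustration only, the sketch below shows the generic techniques they build on: a class activation map (CAM, Zhou et al., 2016), computed as a classifier-weighted sum of feature maps and min-max normalized into a spatial attention mask, and an easy-to-hard curriculum that keeps only the highest-confidence instances early in training. It is not the authors' IWSS or CGSA: the function names, the linear keep-ratio schedule with `start=0.5`, and the `f * (1 + A)` fusion are hypothetical choices made for this example.

```python
from typing import List

Matrix = List[List[float]]  # an H x W map stored as nested lists


def class_activation_map(features: List[Matrix], weights: List[float]) -> Matrix:
    """Generic CAM for one class: sum_k w_k * f_k(y, x).

    features: K channel maps, each H x W; weights: the K classifier weights
    the class applies after global average pooling.
    """
    h, w = len(features[0]), len(features[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for wk, fmap in zip(weights, features):
        for y in range(h):
            for x in range(w):
                cam[y][x] += wk * fmap[y][x]
    return cam


def normalize(cam: Matrix) -> Matrix:
    """Min-max scale a map to [0, 1] so it can serve as a spatial attention mask."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:  # flat map: no location is more salient than another
        return [[0.0] * len(row) for row in cam]
    return [[(v - lo) / span for v in row] for row in cam]


def apply_spatial_attention(features: List[Matrix], attn: Matrix) -> List[Matrix]:
    """Hypothetical fusion f'_k = f_k * (1 + A): for non-negative (post-ReLU)
    features, salient locations are boosted and none drops below its input."""
    return [
        [[v * (1.0 + a) for v, a in zip(frow, arow)] for frow, arow in zip(fmap, attn)]
        for fmap in features
    ]


def curriculum_keep_ratio(epoch: int, total_epochs: int, start: float = 0.5) -> float:
    """Hypothetical linear easy-to-hard schedule: keep `start` of the instances
    at epoch 0 and all of them at the final epoch."""
    if total_epochs <= 1:
        return 1.0
    progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start + (1.0 - start) * progress


def select_instances(scores: List[float], epoch: int, total_epochs: int) -> List[int]:
    """Return the indices (in original order) of the highest-confidence
    instances that the schedule admits at this epoch."""
    ratio = curriculum_keep_ratio(epoch, total_epochs)
    k = max(1, round(ratio * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])
```

For example, with two 2x2 channels `[[1, 0], [0, 0]]` and `[[0, 0], [0, 2]]` and class weights `[1.0, 0.5]`, the CAM is `[[1, 0], [0, 1]]`, and `select_instances([0.9, 0.2, 0.6, 0.4], epoch=0, total_epochs=5)` keeps indices `[0, 2]`, growing to all four instances by the last epoch. Multiplying by `1 + A` rather than `A` keeps low-attention locations from being zeroed out, which is in the spirit of, though not necessarily identical to, the abstract's aim of complementing low-confidence regions.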




Data Availability

The data that support the findings of this study are openly available at https://host.robots.ox.ac.uk/pascal/VOC and https://cocodataset.org/.


Acknowledgements

The authors appreciate Gaowei Wu’s constructive comments, which helped to improve the quality of this article. This research is supported by the National Key Research and Development Program of China (2020AAA0109600) and the National Natural Science Foundation of China (62106266, 62173328, 62006139, 61976213).

Author information

Corresponding author

Correspondence to Yongqiang Tang.

Ethics declarations

Conflict of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ren, Z., Tang, Y. & Zhang, W. IDO: Instance dual-optimization for weakly supervised object detection. Appl Intell 53, 26763–26780 (2023). https://doi.org/10.1007/s10489-023-04956-z



  • DOI: https://doi.org/10.1007/s10489-023-04956-z
