Skip to main content
Log in

Adaptive Generation of Weakly Supervised Semantic Segmentation for Object Detection

  • Published:
Neural Processing Letters Aims and scope Submit manuscript

Abstract

Object detection and semantic segmentation are the basic tasks of computer vision. Recently, the combination of object detection and semantic segmentation has made great progress. With the box-level weakly supervised semantic segmentation(WSSS) method, we predict segmentation based on feature maps extracted from object detector. Existing methods require both box-level and pixel-level annotations to train the shared backbone network simultaneously to get the bounding boxes and segmentation. However, in the absence of pixel-level annotations and without changing the parameters of network framework, object detectors can’t predict semantic segmentation. We design a compact and plug-and-play object detection to semantic segmentation(O2S) module to enable object detectors to predict semantic masks, making full utilization of the training set and intermediate feature maps of object detection. We also propose a box-level weakly supervised probabilistic gap adaptive(PGA) method, which enables O2S to learn semantic masks from the training set of object detection. We evaluate the proposed approach on Pascal VOC 2007 and Pascal VOC 2012 and show its feasibility. With only 3.5 million parameters, the results of O2S trained with PGA are very close to the results of the whole networks trained with the WSSS methods. Our work has important implications for exploring the commonality of multiple visual tasks.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Similar content being viewed by others

References

  1. Kong T, Sun F, Liu H, Jiang Y, Li L, Shi J (2020) Foveabox: Beyound anchor-based object detection. IEEE Trans Image Process 29:7389–7398

    Article  MATH  Google Scholar 

  2. Tian Z, Shen C, Chen H, He T (2019) Fcos: Fully convolutional one-stage object detection, In: IEEE/CVF international conference on computer vision (ICCV) vol 2019, pp 9626–9635

  3. Kai C, Pang J, Wang J, Yu X, Lin D (2019) Hybrid task cascade for instance segmentation, In: IEEE/CVF conference on computer vision and pattern recognition

  4. Yla B, Gqa B, Msa B, Jq C, Jie Y, Zza B (2021) Semantic and detail collaborative learning network for salient object detection, Neurocomputing, 462(2)

  5. Song C, Huang Y, Ouyang W, Wang L (2019) Box-driven class-wise region masking and filling rate guided loss for weakly supervised semantic segmentation, In: IEEE/CVF conference on computer vision and pattern recognition (CVPR) vol 2019, pp 3131–3140

  6. Dai J, He K, Sun J (dec 2015) “Boxsup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation,” In: 2015 IEEE international conference on computer vision (ICCV). Los Alamitos, CA, USA: IEEE Computer Society, pp

  7. Papandreou G, Chen L, Murphy KP, Yuille AL (2015) Weakly-and semi-supervised learning of a deep convolutional network for semantic image segmentation, In: IEEE international conference on computer vision (ICCV) vol 2015, pp 1742–1750

  8. Khoreva A, Benenson R, Hosang J, Hein M, Schiele B (2017) Simple does it: Weakly supervised instance and semantic segmentation, In: IEEE conference on computer vision and pattern recognition (CVPR) vol 2017, pp 1665–1674

  9. Zhou X, Wang D, Krhenbühl P (2019) Objects as points

  10. Law H, Deng J (2020) Cornernet: Detecting objects as paired keypoints. Int J Comput Vis 128(3):642–656

    Article  Google Scholar 

  11. Yang Z, Liu S, Hu H, Wang L, Lin S (2019) Reppoints: Point set representation for object detection, In: IEEE/CVF international conference on computer vision (ICCV) vol 2019, pp 9656–9665

  12. Zhu C, He Y, Savvides M (2019) Feature selective anchor-free module for single-shot object detection, In: IEEE/CVF conference on computer vision and pattern recognition (CVPR) vol 2019, pp 840–849

  13. Shen Y, Ji R, Wang Y, Wu Y, Cao L (2019) Cyclic guidance for weakly supervised joint detection and segmentation, In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR)

  14. Yu J, Yao J, Zhang J, Yu Z, Tao D (2021) Sprnet: Single-pixel reconstruction for one-stage instance segmentation. IEEE Trans Cybern 51(4):1731–1742

    Article  Google Scholar 

  15. Lin TY, Maire M, Belongie S, Hays J, Zitnick CL (2014) Microsoft coco: Common objects in context, In: European conference on computer vision

  16. Yu J, Tan M, Zhang H, Rui Y, Tao D (2022) Hierarchical deep click feature prediction for fine-grained image recognition. IEEE Trans Pattern Anal Mach Intell 44(2):563–578

    Article  Google Scholar 

  17. Krähenbühl P, Koltun V (Oct. 2012) Efficient inference in fully connected CRFs with Gaussian edge potentials, arXiv e-prints, p. arXiv:1210.5644

  18. Kim G (2006) Pascal visual object classes challenge

  19. Everingham M, Gool LV, Williams CKI, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vis 88(2):303–338

    Article  Google Scholar 

  20. Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement, arXiv e-prints, p. arXiv:1804.02767,

  21. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation, In: CVPR

  22. Girshick R (2015) Fast r-cnn, In: IEEE international conference on computer vision (ICCV) vol 2015, pp 1440–1448

  23. Ren S, He K, Girshick R, Sun J (2017) Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149

    Article  Google Scholar 

  24. He K, Gkioxari G, Dollár P, Girshick R (2020) Mask r-cnn. IEEE Trans Pattern Anal Mach Intell 42(2):386–397

    Article  Google Scholar 

  25. Bochkovskiy A, Wang C-Y, Liao H-YM (2020) YOLOv4: optimal speed and accuracy of object detection, arXiv e-prints, p. arXiv:2004.10934

  26. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) Ssd: Single shot multibox detector. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer Vision - ECCV 2016. Springer International Publishing, Cham, pp 21–37

    Chapter  Google Scholar 

  27. Shen Z, Zhuang L, Li J, Jiang YG, Xue X (2017) Dsod: Learning deeply supervised object detectors from scratch, In: 2017 IEEE international conference on computer vision (ICCV),

  28. Wang J, Chen K, Yang S, Loy CC, Lin D (2019) Region proposal by guided anchoring, In: IEEE/CVF conference on computer vision and pattern recognition (CVPR) vol 2019, pp 2960–2969

  29. Zhou X, Zhuo J, Krähenbühl P (2019) Bottom-up object detection by grouping extreme and center points, arXiv e-prints, p. arXiv:1901.08043,

  30. Jonathan L, Evan S, Trevor D (2017) Fully convolutional networks for semantic segmentation, IEEE Trans Pattern Anal Mach Intell

  31. Badrinarayanan V, Kendall A, Cipolla R (2017) Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495

    Article  Google Scholar 

  32. Zhang H, Dana K, Shi J, Zhang Z, Wang X, Tyagi A, Agrawal A (2018) Context encoding for semantic segmentation, In IEEE/CVF conference on computer vision and pattern recognition vol 2018, pp 7151–7160

  33. Kirillov A, Wu Y, He K, Girshick R (2019) Pointrend: Image segmentation as rendering

  34. Chen LC, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation, In: European conference on computer vision

  35. Yu F, Koltun V (2016) Multi-scale context aggregation by dilated convolutions, In: ICLR,

  36. Yang M, Yu K, Zhang C, Li Z, Yang K (2018) Denseaspp for semantic segmentation in street scenes, In: IEEE/CVF conference on computer vision and pattern recognition vol 2018, pp 3684–3692

  37. Kolesnikov A, Lampert CH (2016) Seed, expand and constrain: Three principles for weakly-supervised image segmentation, In: European conference on computer vision

  38. Fan J, Zhang Z, Song C, Tan T (2020) Learning integral objects with intra-class discriminator for weakly-supervised semantic segmentation, In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR)

  39. Araslanov N, Roth S (2020) Single-stage semantic segmentation from image labels, In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR),

  40. Wang Y, Zhang J, Kan M, Shan S, Chen X (2020) Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation, In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR)

  41. Redondo-Cabrera C, Baptista-Ríos M, López-Sastre RJ (2019) Learning to exploit the prior network knowledge for weakly supervised semantic segmentation. IEEE Trans Image Process 28(7):3649–3661

    Article  MathSciNet  MATH  Google Scholar 

  42. Lee J, Kim E, Lee S, Lee J, Yoon S (2019) Ficklenet: Weakly and semi-supervised semantic image segmentation using stochastic inference, In: IEEE/CVF conference on computer vision and pattern recognition (CVPR) vol 2019, pp 5262–5271

  43. Wei Y, Feng J, Liang X, Cheng M, Zhao Y, Yan S (2017) Object region mining with adversarial erasing: A simple classification to semantic segmentation approach, In: IEEE conference on computer vision and pattern recognition (CVPR) vol 2017, pp 6488–6496

  44. Xu L, Xue H, Bennamoun M, Boussaid F, Sohel F (2021) Atrous convolutional feature network for weakly supervised semantic segmentation. Neurocomputing 421(1):115–126

    Article  Google Scholar 

  45. Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A (2016) Learning deep features for discriminative localization, In: IEEE conference on computer vision and pattern recognition (CVPR) vol 2016, pp 2921–2929

  46. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-cam: visual explanations from deep networks via gradient-based localization, In: IEEE international conference on computer vision (ICCV) vol 2017, pp 618–626

  47. Bearman A, Russakovsky O, Ferrari V, Fei-Fei L (2016) What’s the point: Semantic segmentation with point supervision, ECCV

  48. Lin D, Dai J, Jia J, He K, Sun J (2016) Scribblesup: Scribble-supervised convolutional networks for semantic segmentation, In: IEEE conference on computer vision and pattern recognition (CVPR) vol 2016, pp 3159–3167

  49. Tang M, Djelouah A, Perazzi F, Boykov Y, Schroers C (2018) Normalized cut loss for weakly-supervised cnn segmentation, In: IEEE/CVF conference on computer vision and pattern recognition vol 2018, pp 1818–1827

  50. Vernaza P, Chandraker M (2017) Learning random-walk label propagation for weakly-supervised semantic segmentation, In: CVPR

  51. Arbeláez P, Pont-Tuset J, Barron J, Marques F, Malik J (2014) Multiscale combinatorial grouping, In: IEEE conference on computer vision and pattern recognition vol 2014, pp 328–335

  52. Rother C, Kolmogorov V, Blake A (2004) “grabcut”: Interactive foreground extraction using iterated graph cuts, In: ACM SIGGRAPH, (2004) Papers, ser. SIGGRAPH ’04. New York, NY, USA: association for computing machinery, pp 309–314

  53. Ibrahim MS, Vahdat A, Ranjbar M, Macready WG (2020) Semi-supervised semantic image segmentation with self-correcting networks, In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR)

  54. Paszke A, Gross S, Massa F, Lerer A, Chintala S (2019) Pytorch: An imperative style, high-performance deep learning library

Download references

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No.61972417, 61902431) and the Natural Science Foundation of Shandong Province (No. ZR2020MF005).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Shibao Li.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article. All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript. The authors have no financial or proprietary interests in any material discussed in this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Li, S., Liu, Y., Zhang, Y. et al. Adaptive Generation of Weakly Supervised Semantic Segmentation for Object Detection. Neural Process Lett 55, 657–670 (2023). https://doi.org/10.1007/s11063-022-10902-w

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11063-022-10902-w

Keywords

Navigation