Abstract
The scarcity of fully annotated data is the biggest obstacle preventing many deep learning approaches from being widely applied. Weakly-supervised visual learning, which can utilize inexact annotations, has developed rapidly to remedy this situation. In this paper, we study the weakly-supervised task of achieving pixel-level semantic segmentation with only image-level labels as supervision. Unlike other methods, our approach transforms the weakly-supervised visual learning problem into a semi-supervised visual learning problem and then solves it with semi-supervised learning methods. This transformation allows us to adopt effective semi-supervised methods to perform transductive learning with context information. In the semi-supervised learning module, we propose to use the graph cut algorithm to generate additional supervision labels from the activation seeds produced by a classification network. The generated labels provide the segmentation model with effective supervision; in turn, the graph cut module benefits from the features extracted by the segmentation model. The two components then update and optimize each other iteratively until convergence. Experimental results on the PASCAL VOC and COCO benchmarks demonstrate the effectiveness of the proposed deep graph cut algorithm for weakly-supervised semantic segmentation.
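To make the seed-to-label step concrete, the following is a minimal sketch of a binary graph cut that spreads a few seed labels to unlabeled pixels. It is an illustration under stated assumptions, not the network described in this paper: the pixels form a 1-D chain rather than an image grid, the per-pixel features are toy scalars standing in for segmentation-model features, the Gaussian affinity with parameter `sigma` is an assumed choice, and the function name `min_cut_labels` is hypothetical. The min-cut is found with a plain Edmonds-Karp max-flow.

```python
from collections import deque
import math


def min_cut_labels(features, fg_seeds, bg_seeds, sigma=0.2):
    """Label a 1-D chain of pixels as foreground (1) or background (0).

    Hypothetical helper for illustration only: seeds are hard constraints,
    neighbouring pixels are tied by a Gaussian affinity on toy features.
    """
    n = len(features)
    src, sink = n, n + 1
    cap = [dict() for _ in range(n + 2)]  # residual capacities per node

    def add_edge(u, v, forward, backward):
        cap[u][v] = cap[u].get(v, 0.0) + forward
        cap[v][u] = cap[v].get(u, 0.0) + backward

    # Pairwise terms: similar neighbours are expensive to separate.
    for i in range(n - 1):
        diff = features[i] - features[i + 1]
        w = math.exp(-diff * diff / (2.0 * sigma * sigma))
        add_edge(i, i + 1, w, w)

    # Seed terms: capacity larger than the sum of all pairwise weights
    # (n - 1 edges, each at most 1), so seed links are never cut.
    big = float(n)
    for i in fg_seeds:
        add_edge(src, i, big, 0.0)
    for i in bg_seeds:
        add_edge(i, sink, big, 0.0)

    # Edmonds-Karp: augment along shortest residual paths until none remain.
    while True:
        parent = {src: None}
        queue = deque([src])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        # Find the bottleneck capacity along the path, then push that flow.
        flow = float("inf")
        v = sink
        while parent[v] is not None:
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u

    # Nodes still reachable from the source lie on the foreground side.
    return [1 if i in parent else 0 for i in range(n)]


# One foreground seed (pixel 0) and one background seed (pixel 4).
print(min_cut_labels([0.0, 0.1, 0.2, 0.9, 1.0], {0}, {4}))
# prints [1, 1, 1, 0, 0]
```

The cut falls between pixels 2 and 3, where the feature jump makes the affinity weakest, so the three unlabeled pixels inherit the label of the seed they most resemble. In the iterative scheme described above, the features would come from the segmentation model and the resulting labels would be fed back to it as supervision; that loop is not reproduced in this sketch.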
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61876212, 61733007) and Zhejiang Lab (Grant No. 2019NB0AB02).
Cite this article
Feng, J., Wang, X. & Liu, W. Deep graph cut network for weakly-supervised semantic segmentation. Sci. China Inf. Sci. 64, 130105 (2021). https://doi.org/10.1007/s11432-020-3065-4
DOI: https://doi.org/10.1007/s11432-020-3065-4