Weakly-Supervised Semantic Segmentation by Iterative Affinity Learning

Wang, Xiang; Liu, Sifei; Ma, Huimin; Yang, Ming-Hsuan

doi:10.1007/s11263-020-01293-3

Published: 30 January 2020

Volume 128, pages 1736–1749 (2020)

International Journal of Computer Vision

Xiang Wang¹,²,
Sifei Liu³,
Huimin Ma⁴ &
Ming-Hsuan Yang⁵ (ORCID: orcid.org/0000-0003-4848-2304)

Abstract

Weakly-supervised semantic segmentation is a challenging task because no pixel-wise label information is provided for training. Recent methods exploit classification networks to localize objects by selecting regions with strong responses. Such response maps provide only sparse information; however, natural images exhibit strong pairwise relations between pixels, which can be used to propagate the sparse map into a much denser one. In this paper, we propose an iterative algorithm to learn such pairwise relations. It consists of two branches: a unary segmentation network, which learns the label probabilities for each pixel, and a pairwise affinity network, which learns an affinity matrix and refines the probability map generated by the unary network. The results refined by the pairwise network are then used as supervision to train the unary network, and the procedure is repeated iteratively to obtain progressively better segmentation. To learn reliable pixel affinity without accurate annotation, we also propose to mine confident regions. We show that iteratively training this framework is equivalent to optimizing an energy function with convergence to a local minimum. Experimental results on the PASCAL VOC 2012 and COCO datasets demonstrate that the proposed algorithm performs favorably against state-of-the-art methods.
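
The propagation idea in the abstract, spreading a sparse response map through pairwise pixel relations, can be illustrated with a toy example. The following is only a minimal sketch and not the authors' implementation: the Gaussian affinity on a scalar feature is a hand-crafted stand-in for the learned affinity network, the random-walk update is one common way to apply an affinity matrix, and every function name, the six-pixel "image", and the 0.6 confidence threshold are assumptions made for illustration.

```python
import math


def gaussian_affinity(features, sigma=0.1):
    """Hand-crafted stand-in for a learned affinity: similar features get high weight."""
    return [[math.exp(-((fi - fj) ** 2) / (2 * sigma ** 2)) for fj in features]
            for fi in features]


def row_normalize(matrix):
    """Scale each row to sum to 1 so the matrix acts as a random-walk transition."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total for v in row])
    return normalized


def propagate(probs, affinity, steps=10):
    """Repeatedly replace each pixel's class scores by an affinity-weighted average."""
    transition = row_normalize(affinity)
    n, k = len(probs), len(probs[0])
    for _ in range(steps):
        probs = [[sum(transition[i][j] * probs[j][c] for j in range(n))
                  for c in range(k)] for i in range(n)]
    return probs


def confident_pixels(probs, threshold=0.6):
    """Keep the argmax label only where the top score reaches the threshold."""
    return [row.index(max(row)) if max(row) >= threshold else None
            for row in probs]


# Toy "image": six pixels with one scalar feature each, forming two groups.
features = [0.10, 0.12, 0.11, 0.90, 0.88, 0.91]

# Sparse response map: only the first and last pixel carry a class cue;
# the remaining pixels are undecided between the two classes.
seeds = [[1.0, 0.0], [0.5, 0.5], [0.5, 0.5],
         [0.5, 0.5], [0.5, 0.5], [0.0, 1.0]]

dense = propagate(seeds, gaussian_affinity(features))

sparse_labels = confident_pixels(seeds)   # labels before propagation
dense_labels = confident_pixels(dense)    # labels after propagation
```

In this toy setting only the two seeded pixels are confidently labeled before propagation, while afterwards every pixel inherits the label of the seed in its feature group. In the iterative scheme the abstract describes, such refined and confident labels would then serve as supervision for the next round of training the unary network.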

Notes

We use the code provided by the authors. The authors report results on the original training set (1464 images) of the PASCAL VOC 2012 dataset. Here we present results on the augmented training set (10582 images) as all models are trained on the augmented training set.

Acknowledgements

This work is supported by the National Key Basic Research Program of China (No. 2016YFB0100900), the Beijing Science and Technology Planning Project (No. Z191100007419001), the National Natural Science Foundation of China (No. 61773231), and the National Science Foundation (CAREER No. 1149783).

Author information

Authors and Affiliations

Tencent Research, Beijing, China
Xiang Wang
Tsinghua University, Beijing, China
Xiang Wang
Nvidia, Santa Clara, CA, USA
Sifei Liu
University of Science and Technology Beijing, Beijing, China
Huimin Ma
University of California at Merced, Merced, CA, USA
Ming-Hsuan Yang

Corresponding author

Correspondence to Huimin Ma.

Additional information

Communicated by Kristen Grauman.

About this article

Cite this article

Wang, X., Liu, S., Ma, H. et al. Weakly-Supervised Semantic Segmentation by Iterative Affinity Learning. Int J Comput Vis 128, 1736–1749 (2020). https://doi.org/10.1007/s11263-020-01293-3

Received: 21 April 2019
Accepted: 05 January 2020
Published: 30 January 2020
Issue Date: June 2020
DOI: https://doi.org/10.1007/s11263-020-01293-3
