Semi-supervised Semantic Segmentation via Strong-Weak Dual-Branch Network

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12350)


While existing works have explored a variety of techniques to push the envelope of weakly-supervised semantic segmentation, there is still a significant gap compared to supervised methods. In real-world applications, besides a massive amount of weakly-supervised data, there are usually a few pixel-level annotations available, which makes the semi-supervised track a promising route for semantic segmentation. Current methods simply bundle these two different sets of annotations together to train a segmentation network. However, we discover that such treatment is problematic and achieves even worse results than using strong labels alone, which indicates that the weak labels are misused. To fully explore the potential of the weak labels, we propose to treat strong and weak annotations separately via a strong-weak dual-branch network, which discriminates the massive inaccurate weak supervisions from the strong ones. We design a shared network component to exploit the joint discrimination of strong and weak annotations; meanwhile, the proposed dual branches separately handle fully and weakly supervised learning and effectively eliminate their mutual interference. This simple architecture requires only slight additional computational cost during training yet brings significant improvements over previous methods. Experiments on two standard benchmark datasets show the effectiveness of the proposed method.
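The routing idea in the abstract (one shared component, two supervision-specific branches) can be illustrated with a toy sketch. This is not the authors' implementation: the class `DualBranchToy`, the function `batch_loss`, the layer sizes, and the per-pixel feature vectors are all hypothetical, and the abstract does not say which branch is used at inference, so the sketch leaves that choice to the caller.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


class DualBranchToy:
    """Toy per-pixel classifier: one shared layer feeding two heads (hypothetical)."""

    def __init__(self, in_dim, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.shared = init(hidden_dim, in_dim)            # sees every sample
        self.strong_head = init(num_classes, hidden_dim)  # pixel-level labels only
        self.weak_head = init(num_classes, hidden_dim)    # weak labels only

    def features(self, pixel):
        # Shared component with a ReLU non-linearity.
        return [max(0.0, h) for h in linear(self.shared, pixel)]

    def predict(self, pixel, branch):
        head = self.strong_head if branch == "strong" else self.weak_head
        return softmax(linear(head, self.features(pixel)))


def batch_loss(model, batch):
    """Mean cross-entropy; each sample is scored only by the branch
    matching its annotation type, so the two heads never share a label."""
    total = 0.0
    for pixel, label, is_strong in batch:
        branch = "strong" if is_strong else "weak"
        probs = model.predict(pixel, branch)
        total -= math.log(probs[label])
    return total / len(batch)


model = DualBranchToy(in_dim=3, hidden_dim=4, num_classes=2)
batch = [
    ([0.2, 0.1, 0.9], 1, True),   # strongly annotated sample
    ([0.7, 0.3, 0.1], 0, False),  # weakly annotated sample
]
loss = batch_loss(model, batch)
```

In this sketch, inaccurate weak labels can influence the strong head only indirectly, through the shared layer, which is one way to read the abstract's claim about eliminating mutual interference.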


Keywords: Semi-supervised, Strong-weak, Semantic segmentation



This work is partially supported by the National Natural Science Foundation of China (Grant no. 61772568) and the Natural Science Foundation of Guangdong Province, China (Grant no. 2019A1515012029).




Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
  2. Key Laboratory of Machine Intelligence and Advanced Computing (SYSU), Ministry of Education, Beijing, China
