Abstract
Fully-supervised semantic segmentation methods are difficult to generalize to novel objects, and their fine-tuning often requires a sufficient number of fully-labeled images. Few-shot semantic segmentation (FSS) has recently attracted lots of attention due to its excellent capability for segmenting the novel object with only a few labeled images. Most of recent approaches follow the prototype learning paradigm and have made a significant improvement in segmentation performance. However, there exist two critical bottleneck problems to be solved. (1) Previous methods mainly focus on mining the foreground information of the target object, and class-specific prototypes are generated by solely leveraging average operation on the whole support image, which may lead to information loss, underutilization, or semantic confusion of the object. (2) Most existing methods unilaterally guide the object segmentation in the query image with support images, which may result in semantic misalignment due to the diversity of objects in the support and query sets. To alleviate the above challenging problems, we propose a simple yet effective joint guidance learning architecture to generate and align more compact and robust prototypes from two aspects. (1) We propose a coarse-to-fine prototype generation module to generate coarse-grained foreground prototypes and fine-grained background prototypes. (2) We design a joint guidance learning module for the prototype evaluation and optimization on both support and query images. Extensive experiments show that the proposed method can achieve superior segmentation results on PASCAL-5\(^{i}\) and COCO-20\(^{i}\) datasets.
Similar content being viewed by others
Data Availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request
References
Yu H, Yang Z, Tan L, Wang Y, Sun W, Sun M, Tang Y (2018) Methods and datasets on semantic segmentation: A review. Neurocomputing 304:82–103
Kim S, An S, Chikontwe P, Park SH (2021) Bidirectional rnn-based few shot learning for 3d medical image segmentation. Proceedings of the AAAI Conference on Artificial Intelligence 35:1808–1816
Zhao N, Chua T-S, Lee GH (2021) Few-shot 3d point cloud semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p 8873–8882
Kalluri T, Chandraker M (2022) Cluster-to-adapt: Few shot domain adaptation for semantic segmentation across disjoint labels. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p 4121–4131
Wang Y, Zhang J, Kan M, Shan S, Chen X (2020) Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p 12275–12284
Lin D, Dai J, Jia J, He K, Sun J (2016) Scribblesup: Scribble-supervised convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, p 3159–3167
Pathak D, Krahenbuhl P, Darrell T (2015) Constrained convolutional neural networks for weakly supervised segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, p 1796–1804
Dai J, He K, Sun J (2015) Boxsup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, p 1635–1643
Snell J, Swersky K, Zemel R (2017) Prototypical networks for few-shot learning. Advances in neural information processing systems 30
Li G, Jampani V, Sevilla-Lara L, Sun D, Kim J, Kim J (2021) Adaptive prototype learning and allocation for few-shot segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p 8334–8343
Zhang C, Lin G, Liu F, Yao R, Shen C (2019) Canet: Class-agnostic segmentation networks with iterative refinement and attentive few-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p 5217–5226
Tian Z, Zhao H, Shu M, Yang Z, Li R, Jia J (2020) Prior guided feature enrichment network for few-shot segmentation. IEEE transactions on pattern analysis and machine intelligence 44(2):1050–1065
Zhang C, Lin G, Liu F, Guo J, Wu Q, Yao R (2019) Pyramid graph networks with connection attentions for region-based one-shot semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, p 9587–9595
Wang K, Liew JH, Zou Y, Zhou D, Feng J (2019) Panet: Few-shot image semantic segmentation with prototype alignment. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, p 9197–9206
Rao X, Lu T, Wang Z, Zhang Y (2022) Few-shot semantic segmentation via frequency guided neural network. IEEE Signal Processing Letters 29:1092–1096
Chang Z, Lu Y, Wang X, Ran X (2022) Mgnet: Mutual-guidance network for few-shot semantic segmentation. Eng Appl Artif Intell 116:105431
Fan Q, Pei W, Tai Y-W, Tang C-K (2022) Self-support few-shot semantic segmentation. In: Proceedings of the European Conference on Computer Vision, p 701–719
Chen J, Gao B-B, Lu Z, Xue J-H, Wang C, Liao Q (2021) Scnet: Enhancing few-shot semantic segmentation by self-contrastive background prototypes. arXiv preprint. arXiv:2104.09216
Yang L, Zhuo W, Qi L, Shi Y, Gao Y (2021) Mining latent classes for few-shot segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, p 8721–8730
Zhang J-W, Sun Y, Yang Y, Chen W (2022) Feature-proxy transformer for few-shot segmentation. arXiv preprint. arXiv:2210.06908
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al. (2020) An image is worth 16x16 words: Transformers for image recognition at scale. In: Proceedings of the International Conference on Learning Representations. https://doi.org/10.48550/arXiv.2010.11929
Sun G, Liu Y, Liang J, Van Gool L (2021) Boosting few-shot semantic segmentation with transformers. arXiv preprint. arXiv:2108.02266
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, p 3431–3440
Dai J, Qi H, Xiong Y, Li Y, Zhang G, Hu H, Wei Y (2017) Deformable convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, p 764–773
Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, p 2881–2890
Wei Y, Xiao H, Shi H, Jie Z, Feng J, Huang TS (2018) Revisiting dilated convolution: A simple approach for weakly-and semi-supervised semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, p 7268–7277
Liu J, Bao Y, Xie G-S, Xiong H, Sonke J-J, Gavves E (2022) Dynamic prototype convolution network for few-shot semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p 11553–11562
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252
Wu T, Huang J, Gao G, Wei X, Wei X, Luo X, Liu CH (2021) Embedded discriminative attention mechanism for weakly supervised semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p 16765–16774
Jiang P-T, Han L-H, Hou Q, Cheng M-M, Wei Y (2021) Online attention accumulation for weakly supervised semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(10):7062–7077. https://doi.org/10.1109/TPAMI.2021.3092573
Huang Y, Kang D, Jia W, Liu L, He X (2022) Channelized axial attention-considering channel relation within spatial attention for semantic segmentation. Proceedings of the AAAI Conference on Artificial Intelligence 36:1016–1025
Shaban A, Bansal S, Liu Z, Essa I, Boots B (2017) One-shot learning for semantic segmentation. In: Proceedings of the European Conference on Computer Vision, p 1–17
Wu Z, Shi X, Lin G, Cai J (2021) Learning meta-class memory for few-shot semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, p 517–526
Lu Z, He S, Zhu X, Zhang L, Song Y-Z, Xiang T (2021) Simpler is better: Few-shot semantic segmentation with classifier weight transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, p 8741–8750
Wang W, Duan L, Wang Y, En Q, Fan J, Zhang Z (2022) Remember the difference: Cross-domain few-shot semantic segmentation via meta-memory transfer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p 7065–7074
Mao B, Wang L, Xiang S, Pan C (2022) Task-aware adaptive attention learning for few-shot semantic segmentation. Neurocomputing 494:104–115
Yang B, Wan F, Liu C, Li B, Ji X, Ye Q (2021) Part-based semantic transform for few-shot semantic segmentation. IEEE Transactions on Neural Networks and Learning Systems. https://doi.org/10.1109/TNNLS.2021.3084252
Dong N, Xing EP (2018) Few-shot semantic segmentation with prototype learning. In: Proceedings of the British Machine Vision Conference, p 79–91
Zhang X, Wei Y, Yang Y, Huang TS (2020) Sg-one: Similarity guidance network for one-shot semantic segmentation. IEEE Ttransactions on Cybernetics 50(9):3855–3865. https://doi.org/10.1109/TCYB.2020.2992433
Ding H, Zhang H, Jiang X (2023) Self-regularized prototypical network for few-shot semantic segmentation. Pattern Recogn 133:109018
Yang B, Liu C, Li B, Jiao J, Ye Q (2020) Prototype mixture models for few-shot semantic segmentation. In: European Conference on Computer Vision, p 763–778
Vinyals O, Blundell C, Lillicrap T, Kavukcuoglu K, Wierstra D, (2016) Matching networks for one shot learning. Advances in Neural Information Processing Systems 29:3630–3648
Yao X, Cao Q, Feng X, Cheng G, Han J (2021) Scale-aware detailed matching for few-shot aerial image semantic segmentation. IEEE Trans Geosci Remote Sens 60:1–11
Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: Common objects in context. In: Proceedings of the European Conference on Computer Vision, p 740–755
Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vis 88(2):303–338. https://doi.org/10.1007/s11263-009-0275-4
Hariharan B, Arbeláez P, Bourdev L, Maji S, Malik J (2011) Semantic contours from inverse detectors. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, p 991–998. https://doi.org/10.1109/ICCV.2011.6126343
Boudiaf M, Kervadec H, Masud ZI, Piantanida P, Ben Ayed I, Dolz J (2021) Few-shot segmentation without meta-learning: A good transductive inference is all you need? In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p 13979–13988
Liu W, Zhang C, Lin G, Liu F (2020) Crnet: Cross-reference networks for few-shot segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p 4165–4173
Nguyen K, Todorovic S (2019) Feature weighting and boosting for few-shot segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, p 622–631
Siam M, Oreshkin B, Jagersand M (2019) Adaptive masked proxies for few-shot segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, p 5249–5258
Zhang X, Wei Y, Li Z, Yan C, Yang Y (2021) Rich embedding features for one-shot semantic segmentation. IEEE Transactions on Neural Networks and Learning Systems 33(11):6484–6493
Gao G, Fang Z, Han C, Wei Y, Liu CH, Yan S (2022) Drnet: Double recalibration network for few-shot semantic segmentation. IEEE Trans Image Process 31:6733–6746. https://doi.org/10.1109/TIP.2022.3215905
Liu Y, Zhang X, Zhang S, He X (2020) Part-aware prototype network for few-shot semantic segmentation. In: Proceedings of the European Conference on Computer Vision, p 142–158
Wang H, Yang Y, Jiang X, Cao X, Zhen X (2020) You only need the image: Unsupervised few-shot semantic segmentation with co-guidance network. In: 2020 IEEE International Conference on Image Processing (ICIP), p 1496–1500. https://doi.org/10.1109/ICIP40778.2020.9190849
Pambala AK, Dutta T, Biswas S (2021) Sml: Semantic meta-learning for few-shot semantic segmentation. Pattern Recogn Lett 147:93–99
Liu B, Ding Y, Jiao J, Ji X, Ye Q (2021) Anti-aliasing semantic reconstruction for few-shot semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p 9747–9756
Wang H, Zhang X, Hu Y, Yang Y, Cao X, Zhen X (2020) Few-shot semantic segmentation with democratic attention networks. In: Proceedings of the European Conference on Computer Vision, p 730–746
Zhang G, Kang G, Yang Y, Wei Y (2021) Few-shot segmentation via cycle-consistent transformer. Advances in Neural Information Processing Systems 34:21984–21996
Wang H, Yang Y, Cao X, Zhen X, Snoek C, Shao L (2021) Variational prototype inference for few-shot semantic segmentation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, p 525–534
Lang C, Cheng G, Tu B, Han J (2022) Learning what not to segment: A new perspective on few-shot segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p 8057–8067
Zhang B, Xiao J, Qin T (2021) Self-guided and cross-guided learning for few-shot segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p 8312–8321
Liu Y, Liu N, Cao Q, Yao X, Han J, Shao L (2022) Learning non-target knowledge for few-shot semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p 11573–11582
Tang Y, Yu Y (2022) Query-guided prototype learning with decoder alignment and dynamic fusion in few-shot segmentation. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM). https://doi.org/10.1145/3555314
Xie G-S, Liu J, Xiong H, Shao L (2021) Scale-aware graph neural network for few-shot semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p 5475–5484
Liu B, Jiao J, Ye Q (2021) Harmonic feature activation for few-shot semantic segmentation. IEEE Trans Image Process 30:3142–3153
Acknowledgements
This work was funded by the National Key Research and Development Program of China (Grant No. 2017YFE0111900), the National Natural Science Foundation of China (Grant No. 62166025), the Science and Technology Program of Gansu Province (Grant No. 23JRRA1133), and the Gansu Haizhi Characteristic Demonstration Project (Grant No. GSHZTS 2022-2).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflicts of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chang, Z., Lu, Y., Ran, X. et al. Simple yet effective joint guidance learning for few-shot semantic segmentation. Appl Intell 53, 26603–26621 (2023). https://doi.org/10.1007/s10489-023-04937-2
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-023-04937-2