Abstract
Few-shot segmentation aims to learn a segmentation model that generalizes to novel classes from only a few training images. In this paper, we propose Cross-Reference and Local–Global Conditional Networks (CRCNet) for few-shot segmentation. Unlike previous works that predict only the query image's mask, our model concurrently makes predictions for both the support image and the query image. With a cross-reference mechanism, the network can better locate the co-occurring objects in the two images, which benefits the few-shot segmentation task. To further improve feature comparison, we develop a local–global conditional module that captures both global and local relations. We also develop a mask refinement module to recurrently refine the prediction of the foreground regions. Experiments on the PASCAL VOC 2012, MS COCO, and FSS-1000 datasets show that our network achieves new state-of-the-art performance.
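As a rough illustration of the cross-reference idea, the minimal PyTorch sketch below gates each branch with the other branch's globally pooled channel descriptor, so channels that respond in both the support and the query image are emphasized. This is our own simplification under assumed shapes and module names (e.g., `CrossReference`, a 256-channel backbone feature), not the authors' released implementation.

```python
import torch
import torch.nn as nn


class CrossReference(nn.Module):
    """Toy cross-reference block: each branch's globally pooled descriptor
    gates the other branch, emphasizing channels active in both images."""

    def __init__(self, channels: int):
        super().__init__()
        self.fc_support = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())
        self.fc_query = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, feat_s: torch.Tensor, feat_q: torch.Tensor):
        # feat_s, feat_q: (B, C, H, W) support / query feature maps
        vec_s = self.fc_support(feat_s.mean(dim=(2, 3)))   # (B, C) support channel importance
        vec_q = self.fc_query(feat_q.mean(dim=(2, 3)))     # (B, C) query channel importance
        fused = (vec_s * vec_q)[:, :, None, None]          # (B, C, 1, 1) channels shared by both
        return feat_s * fused, feat_q * fused               # reinforced features for both branches


# Hypothetical usage: reweight backbone features of a support/query pair.
# feat_s = torch.randn(2, 256, 53, 53); feat_q = torch.randn(2, 256, 53, 53)
# out_s, out_q = CrossReference(256)(feat_s, feat_q)
```

Because the fused vector multiplies both feature maps, the two branches condition on each other symmetrically, which is what allows the model to make predictions for the support and query images at the same time.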
Acknowledgements
This research is supported by the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No: AISG-RP-2018-003), the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (MOE-T2EP20220-0007) and Tier 1 (RG95/20). This research is also partly supported by the Agency for Science, Technology and Research (A*STAR) under its AME Programmatic Funds (Grant No. A20H6b0151).
Additional information
Communicated by Christoph H. Lampert.
About this article
Cite this article
Liu, W., Zhang, C., Lin, G. et al. CRCNet: Few-Shot Segmentation with Cross-Reference and Region–Global Conditional Networks. Int J Comput Vis 130, 3140–3157 (2022). https://doi.org/10.1007/s11263-022-01677-7