Prototype Mixture Models for Few-Shot Semantic Segmentation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12353)


Few-shot segmentation is challenging because objects within the support and query images can differ significantly in appearance and pose. Using a single prototype acquired directly from the support image to segment the query image causes semantic ambiguity. In this paper, we propose prototype mixture models (PMMs), which correlate diverse image regions with multiple prototypes to strengthen the prototype-based semantic representation. Estimated by an Expectation-Maximization algorithm, PMMs incorporate rich channel-wise and spatial semantics from limited support images. Used both as representations and as classifiers, PMMs fully leverage these semantics to activate objects in the query image while suppressing background regions in a duplex manner. Extensive experiments on the Pascal VOC and MS-COCO datasets show that PMMs significantly improve upon the state of the art. In particular, PMMs improve 5-shot segmentation performance on MS-COCO by up to 5.82% with only a moderate cost in model size and inference speed. (Code is available at
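The core idea — estimating multiple prototypes from support-image features with an EM-style procedure — can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes L2-normalised foreground feature vectors and uses a cosine-similarity (von Mises-Fisher-like) soft clustering with a hypothetical temperature parameter `tau`; the function name `em_prototypes` is illustrative.

```python
import numpy as np

def em_prototypes(features, n_prototypes=3, n_iter=10, tau=20.0, seed=0):
    """Sketch: estimate K prototype vectors from (N, C) support-image
    foreground features via EM-style soft clustering on the unit sphere.
    Returns a (K, C) array of unit-norm prototypes."""
    rng = np.random.default_rng(seed)
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    # Initialise prototypes from randomly chosen feature vectors.
    protos = feats[rng.choice(len(feats), n_prototypes, replace=False)]
    for _ in range(n_iter):
        # E-step: soft-assign each feature to prototypes by cosine similarity.
        sim = feats @ protos.T                                   # (N, K)
        resp = np.exp(tau * (sim - sim.max(axis=1, keepdims=True)))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted mean, re-projected to unit norm.
        protos = resp.T @ feats                                  # (K, C)
        protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    return protos
```

The resulting prototypes could then be matched against query feature maps (again by cosine similarity) to produce per-prototype activation maps, in the spirit of using prototypes both as representations and as classifiers.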


Semantic segmentation · Few-shot segmentation · Few-shot learning · Mixture models



This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants 61836012, 61671427, and 61771447.

Supplementary material

504445_1_En_45_MOESM1_ESM.pdf — Supplementary material 1 (PDF, 1.9 MB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. University of Chinese Academy of Sciences, Beijing, China
