
Semantic Contrastive Embedding for Generalized Zero-Shot Learning

Published in: International Journal of Computer Vision

Abstract

Generalized zero-shot learning (GZSL) aims to recognize objects from both seen and unseen classes when labeled examples are available only for the seen classes. Recent feature generation methods learn a generative model that synthesizes the missing visual features of unseen classes to mitigate the data-imbalance problem in GZSL. However, the original visual feature space is suboptimal for GZSL recognition because it lacks the semantic information that is vital for recognizing unseen classes. To tackle this issue, we propose to integrate the feature generation model with an embedding model. Our GZSL framework maps both the real samples and the synthetic samples produced by the generation model into an embedding space, where the final GZSL classification is performed. Specifically, we propose a semantic contrastive embedding (SCE) for our framework. Our SCE consists of an attribute-level contrastive embedding and a class-level contrastive embedding, which capture transferable and discriminative information, respectively, in the embedding space. We evaluate our method, named SCE-GZSL, on four benchmark datasets; it achieves state-of-the-art or second-best results on all of them.
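The class-level contrastive embedding described above is in the spirit of a supervised InfoNCE objective: in the embedding space, samples of the same class are pulled together while samples of other classes are pushed apart. The following is a minimal NumPy sketch of such a loss; the function name, temperature value, and batch-wise formulation are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def class_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised (class-level) contrastive loss over a batch of embeddings.

    For each anchor, other embeddings sharing its class label act as
    positives; all remaining embeddings in the batch act as negatives.
    """
    # L2-normalize so dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature          # temperature-scaled similarity matrix
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)    # exclude each anchor from its own loss
    losses = []
    for i in range(n):
        logits = sim[i][not_self[i]]
        pos = (labels == labels[i])[not_self[i]]   # same-class positives
        if not pos.any():
            continue                                # anchor with no positive pair
        log_prob = logits - np.log(np.exp(logits).sum())
        losses.append(-log_prob[pos].mean())
    return float(np.mean(losses))
```

A well-clustered batch (embeddings of the same class close together) yields a lower loss than the same batch with shuffled labels, which is the behavior the class-level embedding exploits for discriminative GZSL classification.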


(Figures 1–8 are omitted from this preview.)




Acknowledgements

We sincerely thank the editors and reviewers for their constructive comments on our paper. This work was supported by the National Science Foundation of China (Grant Nos. U1713208 and 61876085) and the China Postdoctoral Science Foundation (Grant Nos. 2017M621748, 2020M681606 and 2019T120430). This work was also supported by the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX21_0302).

Author information

Correspondence to Zhenyong Fu or Jian Yang.

Additional information

Communicated by Judy Hoffman.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Han, Z., Fu, Z., Chen, S. et al. Semantic Contrastive Embedding for Generalized Zero-Shot Learning. Int J Comput Vis 130, 2606–2622 (2022). https://doi.org/10.1007/s11263-022-01656-y

