Abstract
Our paper addresses the challenge of generalized zero-shot learning, in which the label of a target image may belong to either a seen or an unseen category. Previous methods for this task typically learn a joint embedding space in which image features and their corresponding class prototypes are directly aligned. However, such alignment is difficult to achieve because of the inherent gap between the visual and semantic spaces. To overcome this challenge, we propose a novel learning framework that relaxes the alignment requirement. Our approach employs a metric learning-based loss function to optimize the visual embedding model, applying different penalty strengths to within-class and between-class similarities. By avoiding pair-wise comparisons between image and class embeddings, our approach gains flexibility in learning discriminative and generalized visual features. Extensive experiments demonstrate the effectiveness of our method, which performs on par with the state of the art on five benchmarks.
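The kind of objective the abstract describes, with separate penalty strengths on within-class and between-class similarities, can be sketched along the lines of the multi-similarity loss. This is an illustrative NumPy sketch, not the authors' implementation; `alpha`, `beta`, and `margin` are assumed hyperparameters (`alpha` scales the penalty on within-class pairs, `beta` on between-class pairs):

```python
import numpy as np

def multi_similarity_loss(embeddings, labels, alpha=2.0, beta=40.0, margin=0.5):
    """Metric-learning loss with separate penalty strengths:
    alpha weights within-class (positive) similarities,
    beta weights between-class (negative) similarities."""
    # L2-normalize so dot products are cosine similarities
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T
    n = len(labels)
    losses = []
    for i in range(n):
        same = labels == labels[i]
        pos = sim[i][same & (np.arange(n) != i)]  # within-class similarities
        neg = sim[i][~same]                       # between-class similarities
        # soft penalty on positives that fall below the margin
        lp = np.log1p(np.sum(np.exp(-alpha * (pos - margin)))) / alpha if pos.size else 0.0
        # soft penalty on negatives that rise above the margin
        ln = np.log1p(np.sum(np.exp(beta * (neg - margin)))) / beta if neg.size else 0.0
        losses.append(lp + ln)
    return float(np.mean(losses))
```

Because the loss is computed only over similarities among visual embeddings (plus a margin), no pair-wise comparison between image and class embeddings is required, which is the flexibility the abstract refers to.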
Data availability
Datasets used/analysed during the study are publicly available in the following repositories: CUB https://www.vision.caltech.edu/datasets/cub_200_2011/; FLO https://www.robots.ox.ac.uk/~vgg/data/flowers/102/; SUN https://cs.brown.edu/~gmpatter/sunattributes.html; AWA2 https://cvml.ista.ac.at/AwA2/; aPY https://vision.cs.uiuc.edu/attributes/. Data splits and features: http://www.mpi-inf.mpg.de/zsl-benchmark.
References
Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The caltech-ucsd birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology (2011)
Farhadi, A., Endres, I., Hoiem, D., Forsyth, D.: Describing objects by their attributes. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1778–1785 (2009)
Kim, H., Lee, J., Byun, H.: Discriminative deep attributes for generalized zero-shot learning. Pattern Recogn. 124, 108435 (2022). https://doi.org/10.1016/j.patcog.2021.108435
Xian, Y., Lampert, C.H., Schiele, B., Akata, Z.: Zero-shot learning-a comprehensive evaluation of the good, the bad and the ugly. IEEE Trans. Pattern Anal. Mach. Intell. 41(9), 2251–2265 (2019). https://doi.org/10.1109/TPAMI.2018.2857768
Wang, X., Han, X., Huang, W., Dong, D., Scott, M.R.: Multi-similarity loss with general pair weighting for deep metric learning. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 5022–5030 (2019)
Sun, Y., Cheng, C., Zhang, Y., Zhang, C., Zheng, L., Wang, Z., Wei, Y.: Circle loss: A unified perspective of pair similarity optimization. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6398–6407 (2020)
Bucher, M., Herbin, S., Jurie, F.: Improving semantic embedding consistency by metric learning for zero-shot classification. In: European Conference on Computer Vision, pp. 730–746 (2016)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Hu, P., Sclaroff, S., Saenko, K.: Uncertainty-aware learning for zero-shot semantic segmentation. In: Advances in Neural Information Processing Systems, pp. 21713–21724 (2020)
Liu, J., Shi, C., Tu, D., Shi, Z., Liu, Y.: Zero-shot image classification based on a learnable deep metric. Sensors 21(9) (2021) https://doi.org/10.3390/s21093241
Chen, S., Wang, W., Xia, B., Peng, Q., You, X., Zheng, F., Shao, L.: Free: Feature refinement for generalized zero-shot learning. In: IEEE International Conference on Computer Vision, pp. 122–131 (2021)
Chandhok, S., Balasubramanian, V.N.: Two-level adversarial visual-semantic coupling for generalized zero-shot learning. In: IEEE Winter Conference on Applications of Computer Vision, pp. 3100–3108 (2021)
Cheng, D., Wang, G., Wang, N., Zhang, D., Zhang, Q., Gao, X.: Discriminative and robust attribute alignment for zero-shot learning. IEEE Trans. Circuits Syst. Video Technol. 33(8), 4244–4256 (2023). https://doi.org/10.1109/TCSVT.2023.3243205
Nilsback, M.-E., Zisserman, A.: Automated flower classification over a large number of classes. In: Indian Conference on Computer Vision, Graphics and Image Processing, pp. 722–729 (2008)
Patterson, G., Hays, J.: Sun attribute database: Discovering, annotating, and recognizing scene attributes. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2751–2758 (2012)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30, pp. 6000–6010 (2017)
Sun, H., Li, B., Dan, Z., Hu, W., Du, B., Yang, W., Wan, J.: Multi-level feature interaction and efficient non-local information enhanced channel attention for image dehazing. Neural Netw. 163, 10–27 (2023). https://doi.org/10.1016/j.neunet.2023.03.017
Wan, J., Lai, Z., Liu, J., Zhou, J., Gao, C.: Robust face alignment by multi-order high-precision hourglass network. IEEE Trans. Image Process. 30, 121–133 (2021). https://doi.org/10.1109/TIP.2020.3032029
Huang, Y., Huang, H.: Stacked attention hourglass network based robust facial landmark detection. Neural Netw. 157, 323–335 (2023). https://doi.org/10.1016/j.neunet.2022.10.021
Zhu, Y., Xie, J., Tang, Z., Peng, X., Elgammal, A.: Semantic-guided multi-attention localization for zero-shot learning. Adv. Neural. Inf. Process. Syst. (2019). https://doi.org/10.48550/arXiv.1903.00502
Ji, Z., Fu, Y., Guo, J., Pang, Y., Zhang, Z.M., et al.: Stacked semantics-guided attention model for fine-grained zero-shot learning. Adv. Neural. Inf. Process. Syst. 31, 5998–6007 (2018)
Huynh, D., Elhamifar, E.: Fine-grained generalized zero-shot learning via dense attribute-based attention. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4483–4493 (2020)
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. Adv. Neural. Inf. Process. Syst. 2, 2672–2680 (2014)
Kingma, D.P., Welling, M.: Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013)
Xian, Y., Lorenz, T., Schiele, B., Akata, Z.: Feature generating networks for zero-shot learning. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 5542–5551 (2018)
Xian, Y., Sharma, S., Schiele, B., Akata, Z.: f-vaegan-d2: A feature generating framework for any-shot learning. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 10275–10284 (2019)
Li, J., Jing, M., Lu, K., Ding, Z., Zhu, L., Huang, Z.: Leveraging the invariant side of generative zero-shot learning. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 7402–7411 (2019)
Narayan, S., Gupta, A., Khan, F.S., Snoek, C.G., Shao, L.: Latent embedding feedback and discriminative features for zero-shot classification. In: European Conference on Computer Vision, pp. 479–495 (2020)
Pambala, A., Dutta, T., Biswas, S.: Generative model with semantic embedding and integrated classifier for generalized zero-shot learning. In: IEEE Winter Conference on Applications of Computer Vision, pp. 1237–1246 (2020)
Niu, C., Shang, J., Huang, J., Yang, J., Song, Y., Zhou, Z., Zhou, G.: Unbiased feature generating for generalized zero-shot learning. J. Vis. Commun. Image Represent. 89, 103657 (2022). https://doi.org/10.1016/j.jvcir.2022.103657
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of wasserstein gans. Adv. Neural. Inf. Process. Syst. 30, 5769–5779 (2017)
Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. Int. Conf. Mach. Learn. 70, 214–223 (2017)
Felix, R., Reid, I., Carneiro, G., et al.: Multi-modal cycle-consistent generalized zero-shot learning. In: European Conference on Computer Vision, pp. 21–37 (2018)
Keshari, R., Singh, R., Vatsa, M.: Generalized zero-shot learning via over-complete distribution. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 13300–13308 (2020)
Han, Z., Fu, Z., Chen, S., Yang, J.: Contrastive embedding for generalized zero-shot learning. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2371–2381 (2021)
Jiang, H., Wang, R., Shan, S., Chen, X.: Transferable contrastive network for generalized zero-shot learning. In: IEEE International Conference on Computer Vision, pp. 9765–9774 (2019)
Pu, S., Zhao, K., Zheng, M.: Alignment-uniformity aware representation learning for zero-shot video classification. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 19936–19945 (2022)
Xie, J., Zheng, S.: Zero-shot object detection through vision-language embedding alignment. In: IEEE International Conference on Data Mining Workshops, pp. 1–15 (2022)
Yue, Z., Wang, T., Sun, Q., Hua, X.-S., Zhang, H.: Counterfactual zero-shot and open-set visual recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 15404–15414 (2021)
Reed, S., Akata, Z., Lee, H., Schiele, B.: Learning deep representations of fine-grained visual descriptions. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 49–58 (2016)
Min, S., Yao, H., Xie, H., Wang, C., Zha, Z.-J., Zhang, Y.: Domain-aware visual bias eliminating for generalized zero-shot learning. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 12664–12673 (2020)
Geng, C., Tao, L., Chen, S.: Guided cnn for generalized zero-shot and open-set recognition using visual and semantic prototypes. Pattern Recogn. 102, 107263 (2020). https://doi.org/10.1016/j.patcog.2020.107263
Li, X., Xu, Z., Wei, K., Deng, C.: Generalized zero-shot learning via disentangled representation. AAAI Conf. Artif. Intell. 35, 1966–1974 (2021)
Naeem, M.F., Khan, M.G.Z.A., Xian, Y., Afzal, M.Z., Stricker, D., Gool, L.V., Tombari, F.: I2mvformer: Large language model generated multi-view document supervision for zero-shot image classification. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 15169–15179 (2023)
Christensen, A., Mancini, M., Koepke, A.S., Winther, O., Akata, Z.: Image-free classifier injection for zero-shot classification. In: IEEE International Conference on Computer Vision, pp. 19072–19081 (2023)
Van der Maaten, L., Hinton, G.: Visualizing data using t-sne. J. Mach. Learn. Res. 9(11), 2579–2605 (2008)
Su, Y., Zhu, H., Tan, Y., An, S., Xing, M.: Prime: privacy-preserving video anomaly detection via motion exemplar guidance. Knowl.-Based Syst. 278, 110872 (2023). https://doi.org/10.1016/j.knosys.2023.110872
Acknowledgements
The authors would like to thank the reviewers for their helpful comments. The authors would also like to thank Yi-Hua Chao for her help in conducting additional experiments. This work was supported by the National Science and Technology Council of Taiwan (MOST 111-2221-E-003-016-MY2, MOST 110-2634-F-002-050).
Author information
Authors and Affiliations
Contributions
Yan-He Chen prepared all materials and Mei-Chen Yeh wrote the manuscript text. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Communicated by J. Gao.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, YH., Yeh, MC. Indirect visual–semantic alignment for generalized zero-shot recognition. Multimedia Systems 30, 111 (2024). https://doi.org/10.1007/s00530-024-01313-z