Abstract
Zero-shot learning strives to classify unseen categories for which no data is available during training. In the generalized variant, the test samples can further belong to seen or unseen categories. The state-of-the-art relies on Generative Adversarial Networks that synthesize unseen-class features by leveraging class-specific semantic embeddings. During training, they generate semantically consistent features, but discard this constraint during feature synthesis and classification. We propose to enforce semantic consistency at all stages of (generalized) zero-shot learning: training, feature synthesis and classification. We first introduce a feedback loop, from a semantic embedding decoder, that iteratively refines the generated features during both the training and feature synthesis stages. The synthesized features together with their corresponding latent embeddings from the decoder are then transformed into discriminative features and utilized during classification to reduce ambiguities among categories. Experiments on (generalized) zero-shot object and action classification reveal the benefit of semantic consistency and iterative feedback, outperforming existing methods on six zero-shot learning benchmarks. Source code is available at https://github.com/akshitac8/tfvaegan.
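As a rough illustration of the pipeline described above, the following is a minimal PyTorch sketch of the feedback loop and of the discriminative features used for classification. All module names, layer sizes, the L1 reconstruction loss, and the single refinement step are illustrative assumptions, not the authors' exact architecture; see the linked source code for the reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, ATTR_DIM, NOISE_DIM, HID_DIM = 2048, 85, 85, 4096

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(NOISE_DIM + ATTR_DIM, HID_DIM)
        self.fc2 = nn.Linear(HID_DIM, FEAT_DIM)

    def forward(self, z, a, feedback=None):
        h = torch.relu(self.fc1(torch.cat([z, a], dim=1)))
        if feedback is not None:
            h = h + feedback                  # inject latent feedback
        return torch.relu(self.fc2(h))

class SemanticDecoder(nn.Module):
    """Maps a (real or synthesized) feature back to its class embedding."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(FEAT_DIM, HID_DIM)
        self.fc2 = nn.Linear(HID_DIM, ATTR_DIM)

    def forward(self, x):
        latent = torch.relu(self.fc1(x))      # latent embedding, reused below
        return self.fc2(latent), latent

class Feedback(nn.Module):
    """Transforms the decoder latent into a correction for the generator."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(HID_DIM, HID_DIM)

    def forward(self, latent):
        return self.fc(latent)

G, Dec, Fb = Generator(), SemanticDecoder(), Feedback()

# One refinement iteration for a batch of class embeddings `a`.
a = torch.randn(8, ATTR_DIM)
z = torch.randn(8, NOISE_DIM)
x0 = G(z, a)                                  # initial feature synthesis
a0, latent0 = Dec(x0)                         # decode back to semantics
x1 = G(z, a, feedback=Fb(latent0))            # semantically refined features

# Training-time semantic consistency: the decoder should reconstruct the
# class embedding (an L1 loss is used here purely for illustration).
recon_loss = F.l1_loss(a0, a)

# Classification input: the refined feature concatenated with the decoder's
# latent embedding, yielding a more discriminative representation.
a1, latent1 = Dec(x1)
clf_input = torch.cat([x1, latent1], dim=1)   # shape (8, FEAT_DIM + HID_DIM)
```

The key design point the abstract describes is that the decoder's latent embedding serves double duty: it drives the feedback that refines the generator's output, and it is concatenated with the refined feature to form the classifier input.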
Keywords
- Generalized zero-shot classification
- Feature synthesis
S. Narayan and A. Gupta contributed equally.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Narayan, S., Gupta, A., Khan, F.S., Snoek, C.G.M., Shao, L. (2020). Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol. 12367. Springer, Cham. https://doi.org/10.1007/978-3-030-58542-6_29