Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification

Part of the Lecture Notes in Computer Science book series (LNIP, volume 12367)

Abstract

Zero-shot learning strives to classify unseen categories for which no data is available during training. In the generalized variant, the test samples can further belong to seen or unseen categories. The state-of-the-art relies on Generative Adversarial Networks that synthesize unseen class features by leveraging class-specific semantic embeddings. During training, they generate semantically consistent features, but discard this constraint during feature synthesis and classification. We propose to enforce semantic consistency at all stages of (generalized) zero-shot learning: training, feature synthesis, and classification. We first introduce a feedback loop, from a semantic embedding decoder, that iteratively refines the generated features during both the training and feature synthesis stages. The synthesized features, together with their corresponding latent embeddings from the decoder, are then transformed into discriminative features and utilized during classification to reduce ambiguities among categories. Experiments on (generalized) zero-shot object and action classification reveal the benefit of semantic consistency and iterative feedback, outperforming existing methods on six zero-shot learning benchmarks. Source code is available at https://github.com/akshitac8/tfvaegan.
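The feedback-loop idea described above can be illustrated with a deliberately minimal sketch. The code below is not the authors' implementation: the linear maps stand in for the generator, the semantic embedding decoder, and the feedback module, all dimensions and names (`synthesize`, `W_gen`, `W_dec`, `W_fb`) are hypothetical, and real training losses (VAE-GAN objectives, cycle consistency) are omitted. It only shows the control flow: generate a feature from attributes plus noise, decode it back toward the semantic space, feed the decoder's latent embedding back into the generator, and finally concatenate feature and latent embedding into the discriminative representation used for classification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: semantic attributes (ATT), noise (Z),
# visual features (FEAT), decoder latent embedding (H).
ATT, Z, FEAT, H = 8, 4, 16, 10

# Illustrative linear "networks" (stand-ins for the generator,
# semantic embedding decoder, and feedback transform).
W_gen = rng.standard_normal((ATT + Z + H, FEAT)) * 0.1
W_dec = rng.standard_normal((FEAT, H)) * 0.1   # decoder hidden layer
W_fb = rng.standard_normal((H, H)) * 0.1       # feedback transform

def synthesize(att, z, n_steps=2):
    """Iteratively refine a generated feature via decoder feedback."""
    fb = np.zeros(H)  # initial (zero) feedback signal
    for _ in range(n_steps):
        # Generator conditioned on attributes, noise, and feedback.
        x = np.tanh(np.concatenate([att, z, fb]) @ W_gen)
        # Decoder maps the feature back toward the semantic space;
        # its latent embedding h drives the next refinement step.
        h = np.tanh(x @ W_dec)
        fb = h @ W_fb
    # Discriminative feature: synthesized feature concatenated
    # with its latent embedding, as used at classification time.
    return np.concatenate([x, h])

att = rng.standard_normal(ATT)
z = rng.standard_normal(Z)
feat = synthesize(att, z)
print(feat.shape)
```

The key design point the sketch captures is that refinement happens at synthesis time as well as training time, so the classifier always sees features that have been checked against the semantic space.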

Keywords

  • Generalized zero-shot classification
  • Feature synthesis

S. Narayan and A. Gupta—Equal Contribution.



Author information

Corresponding author: Sanath Narayan.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 13238 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Narayan, S., Gupta, A., Khan, F.S., Snoek, C.G.M., Shao, L. (2020). Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol 12367. Springer, Cham. https://doi.org/10.1007/978-3-030-58542-6_29

  • DOI: https://doi.org/10.1007/978-3-030-58542-6_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58541-9

  • Online ISBN: 978-3-030-58542-6

  • eBook Packages: Computer Science (R0)