Invertible Zero-Shot Recognition Flows

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12361)


Deep generative models have recently been applied successfully to Zero-Shot Learning (ZSL). However, the underlying drawbacks of GANs and VAEs (e.g., the difficulty of training with ZSL-oriented regularizers and the limited generation quality) prevent existing generative ZSL models from fully bypassing the seen-unseen bias. To tackle these limitations, this work incorporates, for the first time, a new family of generative models (i.e., flow-based models) into ZSL. The proposed Invertible Zero-shot Flow (IZF) learns factorized data embeddings (i.e., the semantic factors and the non-semantic ones) with the forward pass of an invertible flow network, while the reverse pass generates data samples. This procedure theoretically extends conventional generative flows to a factorized conditional scheme. To explicitly address the bias problem, our model enlarges the seen-unseen distributional discrepancy based on a negative sample-based distance measurement. Notably, IZF works flexibly with either a naive Bayesian classifier or a held-out trainable one for zero-shot recognition. Experiments on widely adopted ZSL benchmarks demonstrate a significant performance gain of IZF over existing methods in both the classic and generalized settings.
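
As a rough illustration of the forward/reverse idea described in the abstract, the sketch below builds a tiny invertible flow from RealNVP-style affine coupling layers, splits its output into a "semantic" part and a "non-semantic" part, and then runs the reverse pass on a class embedding plus Gaussian noise to produce a feature vector. This is not the authors' IZF implementation: the coupling design, the random untrained weights, the dimensions, and names such as `AffineCoupling`, `SEMANTIC_DIM`, and `class_embedding` are all assumptions chosen for the example.

```python
import math
import random


def make_linear(in_dim, out_dim, rng, scale=0.1):
    """Random fixed weights; a stand-in for a learned subnetwork (assumption)."""
    return [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def linear(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


class AffineCoupling:
    """The first part of the input parametrises an affine map of the rest."""

    def __init__(self, dim, rng):
        self.split = dim // 2
        rest = dim - self.split
        self.w_scale = make_linear(self.split, rest, rng)
        self.w_shift = make_linear(self.split, rest, rng)

    def _params(self, x1):
        log_s = [math.tanh(v) for v in linear(self.w_scale, x1)]  # bounded log-scale
        t = linear(self.w_shift, x1)
        return log_s, t

    def forward(self, x):
        x1, x2 = x[:self.split], x[self.split:]
        log_s, t = self._params(x1)
        y2 = [b * math.exp(s) + ti for b, s, ti in zip(x2, log_s, t)]
        return x1 + y2, sum(log_s)  # list concatenation, log|det Jacobian|

    def inverse(self, y):
        y1, y2 = y[:self.split], y[self.split:]
        log_s, t = self._params(y1)
        x2 = [(b - ti) * math.exp(-s) for b, s, ti in zip(y2, log_s, t)]
        return y1 + x2


def flow_forward(layers, x):
    """Forward pass: data -> latent, accumulating the log-determinant."""
    total_log_det = 0.0
    for layer in layers:
        x, log_det = layer.forward(x)
        x = x[::-1]  # fixed permutation so every dimension gets transformed
        total_log_det += log_det
    return x, total_log_det


def flow_inverse(layers, z):
    """Reverse pass: latent -> data, undoing each step in reverse order."""
    for layer in reversed(layers):
        z = z[::-1]
        z = layer.inverse(z)
    return z


rng = random.Random(0)
FEATURE_DIM, SEMANTIC_DIM = 6, 2  # hypothetical sizes
layers = [AffineCoupling(FEATURE_DIM, rng) for _ in range(3)]

# Forward pass: factorize a feature into semantic and non-semantic parts.
x = [rng.gauss(0, 1) for _ in range(FEATURE_DIM)]
z, log_det = flow_forward(layers, x)
semantic, non_semantic = z[:SEMANTIC_DIM], z[SEMANTIC_DIM:]

# Invertibility check: the reverse pass recovers the input.
x_rec = flow_inverse(layers, semantic + non_semantic)
max_err = max(abs(a - b) for a, b in zip(x, x_rec))

# Reverse pass as a generator: class embedding + sampled non-semantic noise.
class_embedding = [0.5, -1.0]  # hypothetical class-level semantic vector
noise = [rng.gauss(0, 1) for _ in range(FEATURE_DIM - SEMANTIC_DIM)]
generated = flow_inverse(layers, class_embedding + noise)
print("reconstruction error:", max_err)
```

Because every step is exactly invertible, mapping `generated` back through `flow_forward` returns the same class embedding in its semantic slots, which is the property a factorized conditional flow relies on.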


Keywords: Zero-Shot Learning, Generative flows, Invertible networks




Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. eBay, San Jose, USA
  2. Inception Institute of Artificial Intelligence, Abu Dhabi, UAE
  3. Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE
