Visual Structure Constraint for Transductive Zero-Shot Learning in the Wild

Abstract

To recognize objects of unseen classes, most existing Zero-Shot Learning (ZSL) methods first learn a projection function between the common semantic space and the visual space from the data of the source (seen) classes, and then directly apply it to the target (unseen) classes. However, for data in the wild, the source and target distributions may not match well, causing the well-known domain shift problem. Based on the observation that the visual features of test instances can be separated into different clusters, we propose a new visual structure constraint on class centers for transductive ZSL to improve the generality of the projection function (i.e., to alleviate the above domain shift problem). Specifically, three different strategies (symmetric Chamfer distance, bipartite matching distance, and Wasserstein distance) are adopted to align the projected unseen semantic centers with the visual cluster centers of the test instances. We also propose two new training strategies to handle data in the wild, where the test dataset may contain many unrelated images; this realistic setting has not been considered in previous methods. Extensive experiments demonstrate that the proposed visual structure constraint consistently brings substantial performance gains and that the new training strategies generalize well to data in the wild. The source code is available at https://github.com/raywzy/VSC.
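To make the center-alignment idea concrete, the following is a minimal NumPy sketch of one of the three strategies, the symmetric Chamfer distance between the projected unseen semantic centers and the visual cluster centers. The function name and the array layout (one center per row) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def symmetric_chamfer(projected_centers, cluster_centers):
    """Symmetric Chamfer distance between two point sets.

    Each projected semantic center is matched to its nearest visual
    cluster center, and vice versa; the two nearest-neighbor sums are
    added, so the loss penalizes misalignment in both directions.
    """
    # Pairwise squared Euclidean distances, shape (n_proj, n_clus)
    diff = projected_centers[:, None, :] - cluster_centers[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # Nearest-neighbor term in each direction, summed
    return d2.min(axis=1).sum() + d2.min(axis=0).sum()
```

Unlike bipartite matching, the Chamfer term does not enforce a one-to-one assignment: several semantic centers may share one nearest cluster center, which makes it cheap to compute but less strict as an alignment constraint.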

Acknowledgements

The work described in this paper was fully supported by an ECS grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CityU 21209119).

Author information

Corresponding author

Correspondence to Jing Liao.

Additional information

Communicated by Mei Chen.

About this article

Cite this article

Wan, Z., Chen, D. & Liao, J. Visual Structure Constraint for Transductive Zero-Shot Learning in the Wild. Int J Comput Vis (2021). https://doi.org/10.1007/s11263-021-01451-1

Keywords

  • Computer vision
  • Zero-shot learning
  • Visual structure constraint