Improving Semantic Embedding Consistency by Metric Learning for Zero-Shot Classification

  • Maxime Bucher
  • Stéphane Herbin
  • Frédéric Jurie
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9909)

Abstract

This paper addresses the task of zero-shot image classification. The key contribution of the proposed approach is to control the semantic embedding of images – one of the main ingredients of zero-shot learning – by formulating it as a metric learning problem. The optimized empirical criterion associates two types of sub-task constraints: metric discriminating capacity and accurate attribute prediction. This results in a novel expression of zero-shot learning not requiring the notion of class in the training phase: only pairs of image/attributes, augmented with a consistency indicator, are given as ground truth. At test time, the learned model can predict the consistency of a test image with a given set of attributes, allowing flexible ways to produce recognition inferences. Despite its simplicity, the proposed approach gives state-of-the-art results on four challenging datasets used for zero-shot recognition evaluation.
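
The abstract does not spell out the criterion, so the following is only a minimal sketch of how the two sub-task constraints could be combined, under stated assumptions: a linear map (`W_a`, `b_a`) from image features to attribute space, a Mahalanobis-style distance parameterised by `W_m`, a hinge term driven by the consistency indicator `y` (+1 for a consistent image/attribute pair, -1 otherwise), and a quadratic attribute-prediction term weighted by `lam`. All names, the threshold `tau`, and the exact form of the loss are hypothetical and are not taken from the paper.

```python
import numpy as np

def embed(x, W_a, b_a):
    """Map image features to attribute space (assumed linear form)."""
    return W_a @ x + b_a

def consistency_distance(x, a, W_a, b_a, W_m):
    """Squared Mahalanobis-style distance between the embedded image and attributes a."""
    diff = W_m @ (embed(x, W_a, b_a) - a)
    return float(diff @ diff)

def pair_loss(x, a, y, W_a, b_a, W_m, tau=2.0, lam=1.0):
    """Hinge term on the consistency indicator plus attribute-prediction term.

    y = +1 marks a consistent image/attribute pair, y = -1 an inconsistent one.
    tau and lam are illustrative hyperparameters (assumptions).
    """
    d = consistency_distance(x, a, W_a, b_a, W_m)
    # Consistent pairs are pushed below tau - 1, inconsistent ones above tau + 1.
    hinge = max(0.0, 1.0 + y * (d - tau))
    attr = 0.0
    if y == 1:
        # Attribute prediction error is only meaningful for consistent pairs.
        r = embed(x, W_a, b_a) - a
        attr = float(r @ r)
    return hinge + lam * attr

def predict(x, class_attributes, W_a, b_a, W_m):
    """Return the index of the candidate attribute vector most consistent with x."""
    dists = [consistency_distance(x, a, W_a, b_a, W_m) for a in class_attributes]
    return int(np.argmin(dists))

# Toy usage with identity parameters (purely illustrative values).
W_a, b_a, W_m = np.eye(3), np.zeros(3), np.eye(3)
x = np.array([1.0, 0.0, 1.0])
a_pos = np.array([1.0, 0.0, 1.0])
a_neg = np.array([0.0, 1.0, 0.0])
best = predict(x, [a_neg, a_pos], W_a, b_a, W_m)
```

Because the sketch scores an image against any attribute vector, no class labels appear in `pair_loss`; classes only enter at test time, as the list of candidate attribute vectors passed to `predict`.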

Keywords

  • Zero-shot learning
  • Attributes
  • Semantic embedding

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. ONERA - The French Aerospace Lab, Palaiseau, France
  2. Normandie Univ, UNICAEN, ENSICAEN, CNRS, Caen, France
