
Zero-Shot Transfer Learning Based on Visual and Textual Resemblance

  • Conference paper
  • Neural Information Processing (ICONIP 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11955)


Abstract

Existing image search engines, whose ranking functions are built from labeled images or surrounding text, perform poorly on queries containing new or low-frequency keywords. In this paper, we propose zero-shot transfer learning (ZSTL), which transfers networks from given classifiers to new zero-shot classifiers at little cost and helps image search perform better on new or low-frequency words. Content-based queries (i.e., ranking images not only by their visual appearance but also by their content) can also be enhanced by ZSTL. ZSTL was motivated by a resemblance we observed between photographic composition and the description of objects in natural language: both highlight an object by stressing its particularity, so we posit that a resemblance exists between the visual and textual spaces. We provide several ways to transfer visual features into textual ones; applying deep learning and Word2Vec models to Wikipedia yielded impressive results. Our experiments present evidence for the resemblance between composition and description and show the feasibility and effectiveness of transferring zero-shot classifiers. With these transferred zero-shot classifiers, image ranking queries with low-frequency or new words can be handled. The proposed image search engine adopts cosine-distance ranking as its ranking algorithm. Experiments on image search show the superior performance of ZSTL.
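The ranking idea in the abstract can be sketched in a few lines: visual features are projected into the textual (word-embedding) space, and candidate images are then ordered by cosine distance to the query word's embedding. This is a minimal illustration, not the paper's implementation; the linear projection `W`, the feature dimensions, and all variable names are assumptions introduced here for the example.

```python
import numpy as np

def cosine_similarity(query, matrix):
    """Cosine similarity between a query vector and each row of a matrix."""
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return m @ q

def rank_images(query_vec, visual_feats, W):
    """Project visual features into textual space (assumed linear map W),
    then return image indices sorted by cosine similarity, best first."""
    textual_feats = visual_feats @ W        # (n_images, d_text)
    sims = cosine_similarity(query_vec, textual_feats)
    return np.argsort(-sims)

# Toy demonstration with random features (dimensions are arbitrary).
rng = np.random.default_rng(0)
d_vis, d_text, n_images = 16, 8, 5
W = rng.standard_normal((d_vis, d_text))
feats = rng.standard_normal((n_images, d_vis))

# A query embedding close to the projection of image 2 should rank it first.
query = feats[2] @ W + 0.01 * rng.standard_normal(d_text)
order = rank_images(query, feats, W)
print(order[0])
```

In the paper's setting, the query vector would come from a Word2Vec embedding of the search term and the visual features from a deep convolutional network; the sketch above only shows how cosine-distance ranking ties the two spaces together once a transfer mapping exists.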

G. Yang: This work was supported by the Beijing Natural Science Foundation (No. 4192029) and the National Natural Science Foundation of China (61773385, 61672523). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp used for this research.




Author information

Corresponding author: Gang Yang.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Yang, G., Xu, J. (2019). Zero-Shot Transfer Learning Based on Visual and Textual Resemblance. In: Gedeon, T., Wong, K., Lee, M. (eds.) Neural Information Processing. ICONIP 2019. Lecture Notes in Computer Science, vol. 11955. Springer, Cham. https://doi.org/10.1007/978-3-030-36718-3_30

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-36718-3_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-36717-6

  • Online ISBN: 978-3-030-36718-3

  • eBook Packages: Computer Science (R0)
