Image Tagging by Joint Deep Visual-Semantic Propagation

Ma, Yuexin; Zhu, Xinge; Sun, Yujing; Yan, Bingzheng

doi:10.1007/978-3-319-77380-3_3

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10735)

Included in the following conference series:

Pacific Rim Conference on Multimedia

Abstract

Image tagging has attracted much research interest due to its wide applications. Many existing methods have achieved impressive results; however, they have two main limitations: (1) they focus only on tagging images and ignore the tags' influence on visual feature modeling; (2) they model tag correlation without considering the visual content of the image. In this paper, we propose a joint visual-semantic propagation model (JVSP) to address these two issues. First, we leverage joint visual-semantic modeling to obtain integrated features that accurately reflect the relationship between tags and image regions. Second, we introduce a visual-guided LSTM to capture the co-occurrence relations among tags. Third, we design a diversity loss that encourages the model to focus on different regions. Experimental results on three challenging datasets demonstrate that the proposed method yields significant performance gains over existing methods.
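
The abstract does not give the form of the diversity loss, so the following is only an illustrative sketch of one common way such a penalty can be built, not the formulation used in the paper. It assumes each tag has an attention distribution over image regions and penalizes the mean pairwise overlap between those distributions. The function names `softmax` and `diversity_penalty` and the overlap measure (a dot product) are assumptions made for this example.

```python
import math
from itertools import combinations


def softmax(scores):
    """Turn raw region scores into an attention distribution that sums to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def diversity_penalty(attention_maps):
    """Mean pairwise overlap between per-tag attention maps.

    attention_maps: list of attention distributions, one per tag,
    each a list with one weight per image region.
    The overlap of two maps is their dot product (an assumed choice):
    0 when they attend to disjoint regions, larger when they coincide.
    """
    pairs = list(combinations(attention_maps, 2))
    if not pairs:
        return 0.0  # a single tag has nothing to overlap with
    overlap = sum(sum(a * b for a, b in zip(p, q)) for p, q in pairs)
    return overlap / len(pairs)


# Hypothetical attention maps over four regions for two tags.
disjoint = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
identical = [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
print(diversity_penalty(disjoint))
print(diversity_penalty(identical))
```

Under these assumptions, adding the penalty to a tagging loss would push the attention maps of different tags apart, which is the behavior the abstract attributes to the diversity loss.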

Author information

Authors and Affiliations

University of Hong Kong, Hong Kong, China
Yuexin Ma & Yujing Sun
University of Chinese Academy of Sciences, Beijing, China
Xinge Zhu
National University of Defense Technology, Changsha, China
Bingzheng Yan

Corresponding author

Correspondence to Yuexin Ma.

Editor information

Editors and Affiliations

University of Electronic Science and Technology of China, Chengdu, China
Bing Zeng
University of Chinese Academy of Sciences, Beijing, China
Qingming Huang
University of Ottawa, Ottawa, Ontario, Canada
Abdulmotaleb El Saddik
University of Electronic Science and Technology of China, Chengdu, China
Hongliang Li
Chinese Academy of Sciences, Beijing, China
Shuqiang Jiang
Harbin Institute of Technology, Harbin, China
Xiaopeng Fan

Cite this paper

Ma, Y., Zhu, X., Sun, Y., Yan, B. (2018). Image Tagging by Joint Deep Visual-Semantic Propagation. In: Zeng, B., Huang, Q., El Saddik, A., Li, H., Jiang, S., Fan, X. (eds) Advances in Multimedia Information Processing – PCM 2017. PCM 2017. Lecture Notes in Computer Science, vol 10735. Springer, Cham. https://doi.org/10.1007/978-3-319-77380-3_3

DOI: https://doi.org/10.1007/978-3-319-77380-3_3
Published: 10 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77379-7
Online ISBN: 978-3-319-77380-3
eBook Packages: Computer Science, Computer Science (R0)
