Describing Images with Ontology-Aware Dictionary Learning

Zhang, Chengyue; Han, Yahong

doi:10.1007/978-3-319-27671-7_29

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9516)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

In this paper, we focus on generating contextual descriptions for images by learning an ontology-aware dictionary. Ontology deals with questions concerning what entities exist and how such entities can be related within a hierarchy. Thus, if we incorporate the semantic hierarchies of visual concepts into a learned visual dictionary, which consists of visual atoms, we can generate contextual descriptions of test images through reconstruction. This paper proposes to learn the ontology-aware dictionary by integrating hierarchical dictionary learning and multi-task regression into a joint framework. A hierarchical regularization term defined on the multiple semantic categories introduces the hierarchical structures into the multi-task regression. The joint optimization of sparse coding and multi-task regression embeds the semantic hierarchies into the learned dictionary. Experiments on two benchmark datasets show improved performance of the proposed algorithm. Examples of the ontology-aware dictionary and of generated image descriptions demonstrate the effectiveness of the proposed framework.
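
To make the idea of a joint framework concrete, the following is a minimal sketch of one way to combine a sparse-coding reconstruction term, a multi-task regression term, and a tree-structured group penalty in a single objective. The abstract does not state the paper's actual objective, so everything here is an assumption made for illustration: the symbols X (features), Y (category labels), D (dictionary), A (sparse codes), W (regression weights), the weights lam, gamma, and mu, the use of group norms over tree nodes as the hierarchical regularizer, and the ISTA update used for the codes.

```python
import numpy as np

def soft_threshold(Z, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def tree_group_penalty(W, groups):
    """Sum of Frobenius norms of the rows of W selected by each group.

    Assumption: each group lists the category indices under one node of the
    semantic hierarchy, so nested groups encode the tree structure.
    """
    return sum(np.linalg.norm(W[g, :]) for g in groups)

def joint_objective(X, Y, D, A, W, groups, lam, gamma, mu):
    """Hypothetical joint objective (not taken from the paper):

    0.5*||X - D A||_F^2 + lam*||A||_1
      + 0.5*gamma*||Y - W A||_F^2 + mu * sum_g ||W_g||_F
    """
    recon = 0.5 * np.linalg.norm(X - D @ A) ** 2
    regress = 0.5 * gamma * np.linalg.norm(Y - W @ A) ** 2
    sparsity = lam * np.abs(A).sum()
    return recon + sparsity + regress + mu * tree_group_penalty(W, groups)

def ista_codes(X, Y, D, W, A, lam, gamma, n_iter=50):
    """Update the codes A with D and W held fixed, using ISTA.

    The smooth part is 0.5*||X - D A||^2 + 0.5*gamma*||Y - W A||^2, whose
    gradient in A is H A - B with H and B as computed below.
    """
    H = D.T @ D + gamma * (W.T @ W)
    B = D.T @ X + gamma * (W.T @ Y)
    L = np.linalg.norm(H, 2)  # spectral norm: Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = H @ A - B
        A = soft_threshold(A - grad / L, lam / L)
    return A
```

In an alternating scheme, a step like `ista_codes` would be interleaved with updates of D and W. Those updates are omitted here because the abstract gives no detail about how the paper performs them.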

Y. Han was partly supported by the NSFC (under Grants 61202166 and 61472276) and the Major Project of National Social Science Fund of China (under Grant 14ZDB153).

Author information

Authors and Affiliations

School of Computer Science and Technology, Tianjin University, Tianjin, China
Chengyue Zhang & Yahong Han
Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin, China
Chengyue Zhang & Yahong Han

Corresponding author

Correspondence to Yahong Han.

Editor information

Editors and Affiliations

University of Texas at San Antonio, San Antonio, USA
Qi Tian
Dept. of Information Engineering, University of Trento, Povo, Trento, Italy
Nicu Sebe
EECS, University of Central Florida, Orlando, Florida, USA
Guo-Jun Qi
EURECOM, Sophia-Antipolis, France
Benoit Huet
Hefei University of Technology, Hefei, Anhui, China
Richang Hong
School of Computing and Information, Hefei University of Technology, Hefei, Anhui, China
Xueliang Liu

About this paper

Cite this paper

Zhang, C., Han, Y. (2016). Describing Images with Ontology-Aware Dictionary Learning. In: Tian, Q., Sebe, N., Qi, GJ., Huet, B., Hong, R., Liu, X. (eds) MultiMedia Modeling. MMM 2016. Lecture Notes in Computer Science, vol 9516. Springer, Cham. https://doi.org/10.1007/978-3-319-27671-7_29

DOI: https://doi.org/10.1007/978-3-319-27671-7_29
Published: 03 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27670-0
Online ISBN: 978-3-319-27671-7
eBook Packages: Computer Science, Computer Science (R0)
