Abstract
Cross-media retrieval is becoming increasingly important. To address this challenging problem, most existing approaches project heterogeneous features into a unified feature space in which their similarities can be computed. However, this unified feature space usually has no explicit semantic meaning. As a result, it may ignore hints contained in the original media content and cannot fully measure the similarities among different media types. Considering these issues, we propose in this paper a new approach to cross-media retrieval via semantic entity projection (SEP). Our approach consists of three main steps. First, an entity level with fine-grained semantics is constructed between low-level features and high-level concepts, which helps bridge the semantic gap to a certain extent. Then, an entity projection from low-level features to the entity level is learned by minimizing both the cross-media correlation error and the single-media reconstruction error. This projection maps low-level features into a unified feature space with explicit semantic meaning. Finally, logistic regression is used to generate a semantic abstraction in terms of high-level concepts, on which cross-media retrieval is conducted. Experimental results on the Wikipedia dataset show the effectiveness of the proposed approach.
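The three steps above can be illustrated with a small NumPy sketch. It is not the authors' implementation. The loss is an assumption: a single-media reconstruction term (projected features vs. entity-level targets) plus a cross-media correlation term (projected image vs. projected text of the same document) plus a ridge term. The abstract does not give the exact formulation, how the entity level is built, or the classifier settings. The one-hot entity targets, the gradient-descent multinomial logistic regression, the cosine ranking rule, and every function name below (`learn_entity_projection`, `fit_logistic_regression`, `concept_probs`, `rank_gallery`) are hypothetical stand-ins.

```python
import numpy as np


def learn_entity_projection(X, Y, Ex, Ey, lam=1.0, mu=1e-3, n_iter=50):
    """Learn linear maps Px, Py from image features X and text features Y
    to a shared entity space (hypothetical objective, see comments)."""
    # Objective minimized here (an assumption, not the paper's exact loss):
    #   ||X Px - Ex||^2 + ||Y Py - Ey||^2   single-media reconstruction error
    #   + lam * ||X Px - Y Py||^2           cross-media correlation error
    #   + mu * (||Px||^2 + ||Py||^2)        ridge term for stable solves
    # It is jointly convex, so exact alternating updates never increase it.
    Ax = (1 + lam) * X.T @ X + mu * np.eye(X.shape[1])
    Ay = (1 + lam) * Y.T @ Y + mu * np.eye(Y.shape[1])
    Py = np.zeros((Y.shape[1], Ey.shape[1]))
    for _ in range(n_iter):
        # Zero gradient w.r.t. Px: Ax Px = X^T (Ex + lam * Y Py); same for Py.
        Px = np.linalg.solve(Ax, X.T @ (Ex + lam * (Y @ Py)))
        Py = np.linalg.solve(Ay, Y.T @ (Ey + lam * (X @ Px)))
    return Px, Py


def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # shift for numerical stability
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)


def with_bias(F):
    # Append a constant column so the classifier has an intercept.
    return np.hstack([F, np.ones((F.shape[0], 1))])


def fit_logistic_regression(F, labels, n_classes, lr=0.5, reg=1e-3, n_iter=500):
    """Multinomial logistic regression trained by batch gradient descent."""
    Fb = with_bias(F)
    T = np.eye(n_classes)[labels]  # one-hot concept labels
    W = np.zeros((Fb.shape[1], n_classes))
    for _ in range(n_iter):
        P = softmax(Fb @ W)
        W -= lr * (Fb.T @ (P - T) / len(F) + reg * W)
    return W


def concept_probs(F, W):
    # "Semantic abstraction": each row is a probability vector over concepts.
    return softmax(with_bias(F) @ W)


def rank_gallery(query, gallery):
    # Rank gallery items by cosine similarity of concept-probability vectors.
    q = query / np.linalg.norm(query)
    G = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(G @ q))


# Toy demo on synthetic paired data (all sizes and noise levels are made up).
rng = np.random.default_rng(0)
n, n_classes = 60, 3
labels = rng.integers(0, n_classes, size=n)
E = np.eye(n_classes)[labels]  # stand-in entity-level targets
X = E @ rng.normal(size=(n_classes, 10)) + 0.1 * rng.normal(size=(n, 10))
Y = E @ rng.normal(size=(n_classes, 8)) + 0.1 * rng.normal(size=(n, 8))

Px, Py = learn_entity_projection(X, Y, E, E)
W = fit_logistic_regression(np.vstack([X @ Px, Y @ Py]),
                            np.concatenate([labels, labels]), n_classes)
img_probs, txt_probs = concept_probs(X @ Px, W), concept_probs(Y @ Py, W)
acc = np.mean(img_probs.argmax(axis=1) == labels)
# Text-to-image retrieval: does the top-ranked image share the query's label?
p_at_1 = np.mean([labels[rank_gallery(txt_probs[i], img_probs)[0]] == labels[i]
                  for i in range(n)])
print(f"image accuracy {acc:.2f}, text->image P@1 {p_at_1:.2f}")
```

Alternating closed-form updates are used because, with linear projections and a squared loss, each subproblem is a ridge regression with an exact solution. Any solver that minimizes the same convex objective would serve equally well. A single classifier is shared across media here because both media already live in the same entity space; separate per-media classifiers would be an equally plausible reading of the abstract.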
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grants 61371128 and 61532005, and by the National Hi-Tech Research and Development Program of China (863 Program) under Grants 2014AA015102 and 2012AA012503.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Huang, L., Peng, Y. (2016). Cross-Media Retrieval via Semantic Entity Projection. In: Tian, Q., Sebe, N., Qi, GJ., Huet, B., Hong, R., Liu, X. (eds) MultiMedia Modeling. MMM 2016. Lecture Notes in Computer Science, vol 9516. Springer, Cham. https://doi.org/10.1007/978-3-319-27671-7_23
DOI: https://doi.org/10.1007/978-3-319-27671-7_23
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27670-0
Online ISBN: 978-3-319-27671-7
eBook Packages: Computer Science, Computer Science (R0)