Abstract
Automatically understanding and modeling a user’s liking for an image is a challenging problem. This is because the relationship between the images features (even semantic ones extracted by existing tools, viz. faces, objects etc.) and users’ ‘likes’ is non-linear, influenced by several subtle factors. This work presents a deep bi-modal knowledge representation of images based on their visual content and associated tags (text). A mapping step between the different levels of visual and textual representations allows for the transfer of semantic knowledge between the two modalities. It also includes feature selection before learning deep representation to identify the important features for a user to like an image. Then the proposed representation is shown to be effective in learning a model of users image ‘likes’ based on a collection of images ‘liked’ by him. On a collection of images ‘liked’ by users (from Flickr) the proposed deep representation is shown to better state-of-art low-level features used for modeling user ‘likes’ by around 15–20 %.
These authors ‘S.C. Guntuku and J.T. Zhou’ contributed equally.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
Kennedy, L., Naaman, M., Ahern, S., Nair, R., Rattenbury, T.: How flickr helps us make sense of the world: context and content in community-contributed media collections. In: Proceedings of the 15th International Conference on Multimedia. MULTIMEDIA 2007, pp. 631–640. ACM, New York (2007)
Gong, Y., Ke, Q., Isard, M., Lazebnik, S.: A multi-view embedding space for modeling internet images, tags, and their semantics. Int. J. Comput. Vis. 106, 210–233 (2014)
Lampropoulos, A.S., Lampropoulou, P.S., Tsihrintzis, G.A.: A cascade-hybrid music recommender system for mobile services based on musical genre classification and personality diagnosis. Multimedia Tools Appl. 59, 241–258 (2012)
Mairesse, F., Walker, M.A., Mehl, M.R., Moore, R.K.: Using linguistic cues for the automatic recognition of personality in conversation and text. J. Artif. Intell. Res. (JAIR) 30, 457–500 (2007)
Cristani, M., Vinciarelli, A., Segalin, C., Perina, A.: Unveiling the multimedia unconscious: implicit cognitive processes and multimedia content analysis. In: Proceedings of the 21st ACM International Conference on Multimedia. MM 2013, pp. 213–222. ACM, New York (2013)
Guntuku, S.C., Roy, S., Weisi, L.: Personality modeling based image recommendation. In: He, X., Luo, S., Tao, D., Xu, C., Yang, J., Hasan, M.A. (eds.) MMM 2015, Part II. LNCS, vol. 8936, pp. 171–182. Springer, Heidelberg (2015)
Lovato, P., Perina, A., Sebe, N., Zandonà, O., Montagnini, A., Bicego, M., Cristani, M.: Tell me what you like and i’ll tell you what you are: discriminating visual preferences on flickr data. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012, Part I. LNCS, vol. 7724, pp. 45–56. Springer, Heidelberg (2013)
Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113. IEEE (2009)
Eichner, M., Marin-Jimenez, M., Zisserman, A., Ferrari, V.: 2d articulated human pose estimation and retrieval in (almost) unconstrained still images. Int. J. Comput. Vis. 99, 190–214 (2012)
Marin-Jimenez, M., Zisserman, A., Eichner, M., Ferrari, V.: Detecting people looking at each other in videos. Int. J. Comput. Vis. 106, 282–296 (2014)
Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2879–2886 (2012)
Ploderer, B., Howard, S., Thomas, P., Reitberger, W.: “Hey world, take a look at me!”: appreciating the human body on social network sites. In: Oinas-Kukkonen, H., Hasle, P., Harjumaa, M., Segerståhl, K., Øhrstrøm, P. (eds.) PERSUASIVE 2008. LNCS, vol. 5033, pp. 245–248. Springer, Heidelberg (2008)
Xiong, X., De la Torre, F.: Supervised descent method and its applications to face alignment. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2013)
Qi, X., Xiao, R., Guo, J., Zhang, L.: Pairwise rotation invariant co-occurrence local binary pattern. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part VI. LNCS, vol. 7577, pp. 158–171. Springer, Heidelberg (2012)
Flickr, H.: Freecg (2014)
Ng, T.T., Chang, S.F.: Classifying Photographic and Photorealistic Computer Graphic Images using Natural Image Statistics. ADVENT Technical report, No. 220–2006-6, Columbia University (2004)
Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: IEEE International Conference on Computer Vision (ICCV) (2009)
Rowse, D.: Why black and white photography (2014). http://digital-photography-school.com/why-black-and-white-photography/(Retrived)
Machajdik, J., Hanbury, A.: Affective image classification using features inspired by psychology and art theory. In: Proceedings of the International Conference on Multimedia, MM 2010, pp. 83–92. ACM, New York (2010)
Datta, R., Joshi, D., Li, J., Wang, J.Z.: Studying aesthetics in photographic images using a computational approach. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3953, pp. 288–301. Springer, Heidelberg (2006)
Rosenholtz, R., Li, Y., Mansfield, J., Jin, Z.: Feature congestion: a measure of display clutter. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 761–770. ACM (2005)
Tan, M., Wang, L., Tsang, I.W.: Learning sparse SVM for feature selection on very high dimensional datasets. In: ICML, pp. 1047–1054 (2010)
Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Proceedings of the 11th Annual Conference on Computational Learning Theory, pp. 92–100 (1998)
Sindhwani, V., Niyogi, P.: A co-regularized approach to semi-supervised learning with multiple views. In: Proceedings of the ICML Workshop on Learning with Multiple Views (2005)
Zhou, J.T., Pan, S.J., Qi, M., W Tsang, I.: Multi-view positive and unlabeled learning. In: Proceedings of the 4th Asian Conference on Machine Learning, ACML 2012, Singapore, 4–6 November 2012, pp. 555–570 (2012)
Zheng, W., Zhou, X., Zou, C., Zhao, L.: Facial expression recognition using kernel canonical correlation analysis (KCCA). IEEE Trans. Neural Netw. 17, 233–238 (2006)
Srivastava, N., Salakhutdinov, R.: Multimodal learning with deep Boltzmann machines. In: Advances in Neural Information Processing Systems, pp. 2222–2230 (2012)
Zhou, J.T., Pan, S.J., Tsang, I.W., Yan, Y.: Hybrid heterogeneous transfer learning through deep learning. In: Twenty-Eighth AAAI Conference on Artificial Intelligence (2014)
Zhou, J.T., W Tsang, I., Pan, S.J., Tan, M.: Heterogeneous domain adaptation for multiple classes. In: Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, pp. 1095–1103 (2014)
Chen, M., Xu, Z.E., Weinberger, K.Q., Sha, F.: Marginalized denoising autoencoders for domain adaptation. In: ICML (2012)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Guntuku, S.C., Zhou, J.T., Roy, S., Weisi, L., Tsang, I.W. (2015). Deep Representations to Model User ‘Likes’. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision – ACCV 2014. ACCV 2014. Lecture Notes in Computer Science(), vol 9003. Springer, Cham. https://doi.org/10.1007/978-3-319-16865-4_1
Download citation
DOI: https://doi.org/10.1007/978-3-319-16865-4_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16864-7
Online ISBN: 978-3-319-16865-4
eBook Packages: Computer ScienceComputer Science (R0)