Deep Representations to Model User ‘Likes’

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9003)

Abstract

Automatically understanding and modeling a user’s liking for an image is a challenging problem, because the relationship between image features (even semantic ones extracted by existing tools, such as faces and objects) and users’ ‘likes’ is non-linear and influenced by several subtle factors. This work presents a deep bi-modal knowledge representation of images based on their visual content and associated tags (text). A mapping step between the different levels of visual and textual representations allows for the transfer of semantic knowledge between the two modalities. Feature selection is performed before learning the deep representation, to identify the features that matter for a user to like an image. The proposed representation is then shown to be effective in learning a model of a user’s image ‘likes’ from a collection of images ‘liked’ by that user. On a collection of images ‘liked’ by users (from Flickr), the proposed deep representation is shown to outperform state-of-the-art low-level features used for modeling user ‘likes’ by around 15–20%.
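The abstract does not say how the mapping between visual and textual representations is computed. As a rough illustration only, the sketch below fits a linear map between two feature spaces with ridge regression; this is a swapped-in stand-in, not the authors' method. The function name `fit_linear_mapping`, the regularisation strength `reg`, the feature dimensions, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def fit_linear_mapping(visual, textual, reg=1e-2):
    """Ridge-regression map W such that visual @ W approximates textual.

    Hypothetical helper for illustration; not taken from the paper.
    """
    d = visual.shape[1]
    gram = visual.T @ visual + reg * np.eye(d)  # regularised Gram matrix
    return np.linalg.solve(gram, visual.T @ textual)


# Synthetic stand-ins for per-image visual and tag-based features (assumed).
rng = np.random.default_rng(0)
visual = rng.normal(size=(200, 8))
true_map = rng.normal(size=(8, 5))
textual = visual @ true_map + 0.01 * rng.normal(size=(200, 5))

W = fit_linear_mapping(visual, textual)

# Relative error of the textual features predicted from the visual ones.
rel_error = np.linalg.norm(visual @ W - textual) / np.linalg.norm(textual)
```

A linear map is the simplest way to carry information from one modality to the other; the paper's representation is described as deep and multi-level, so the actual mapping is likely richer than this.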

S.C. Guntuku and J.T. Zhou contributed equally.

Author information

Corresponding author

Correspondence to Sharath Chandra Guntuku.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Guntuku, S.C., Zhou, J.T., Roy, S., Weisi, L., Tsang, I.W. (2015). Deep Representations to Model User ‘Likes’. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision – ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9003. Springer, Cham. https://doi.org/10.1007/978-3-319-16865-4_1

  • DOI: https://doi.org/10.1007/978-3-319-16865-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16864-7

  • Online ISBN: 978-3-319-16865-4

  • eBook Packages: Computer Science (R0)
