Multimedia Tools and Applications, Volume 74, Issue 4, pp 1443–1468

Data-driven approaches for social image and video tagging

  • Lamberto Ballan
  • Marco Bertini
  • Tiberio Uricchio
  • Alberto Del Bimbo


The large success of online social platforms for the creation, sharing and tagging of user-generated media has led to strong interest from the multimedia and computer vision communities in methods and techniques for annotating and searching social media. Visual content similarity, geo-tags and tag co-occurrence, together with social connections and comments, can be exploited to perform tag suggestion as well as content classification and clustering, enabling more effective semantic indexing and retrieval of visual data. However, the relatively low quality of these metadata must be overcome: user-produced tags and annotations are known to be ambiguous, imprecise and/or incomplete, excessively personalized and limited. At the same time, methods must cope with the ‘web-scale’ quantity of media and with the fact that social network users continuously add new images and create new terms. We review state-of-the-art approaches to automatic annotation and tag refinement for social images, also considering the temporal patterns of tag usage, and discuss extensions to tag suggestion and localization in web video sequences.
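One common way to exploit visual content similarity against noisy user tags is neighbor voting. The idea is that a tag is likely relevant to an image if the image's visual neighbors carry that tag more often than its overall frequency in the collection would predict. The sketch below is a minimal illustration and is not presented as the method of this article. It assumes that each image is already described by a fixed-length feature vector and that cosine similarity is an adequate visual metric. The function names `cosine_similarity` and `neighbor_vote_relevance` and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical representation: each image is a (feature_vector, tag_set) pair.
Image = Tuple[List[float], Set[str]]


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def neighbor_vote_relevance(query_feature: List[float], query_tags: Set[str],
                            collection: List[Image], k: int) -> Dict[str, float]:
    """Score each user tag of the query image by neighbor voting.

    relevance(t) = (votes for t among the k visual neighbors)
                   - k * (fraction of collection images tagged t)
    A positive score means the neighbors use t more often than chance.
    """
    if not collection:
        raise ValueError("collection must not be empty")
    k = min(k, len(collection))
    # Rank the collection by visual similarity and keep the k nearest images.
    ranked = sorted(collection,
                    key=lambda item: cosine_similarity(query_feature, item[0]),
                    reverse=True)
    votes = Counter(t for _, tags in ranked[:k] for t in set(tags))
    # Collection-wide tag frequency gives the expected vote count (the prior).
    freq = Counter(t for _, tags in collection for t in set(tags))
    n = len(collection)
    return {t: votes[t] - k * freq[t] / n for t in query_tags}


if __name__ == "__main__":
    toy = [
        ([1.0, 0.0], {"beach", "sea"}),
        ([0.9, 0.1], {"beach", "sunset"}),
        ([0.95, 0.05], {"sea", "beach"}),
        ([0.0, 1.0], {"city", "night"}),
        ([0.1, 0.9], {"city", "beach"}),
        ([0.05, 0.95], {"night"}),
    ]
    scores = neighbor_vote_relevance([1.0, 0.02], {"beach", "night"}, toy, k=3)
    for tag, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(tag, score)
    # beach 1.0
    # night -1.0
```

In the toy run, the three nearest neighbors of the "beach-like" query all carry "beach", so the tag scores above its prior. The user tag "night" receives no votes and scores below zero, which marks it as a candidate for removal during tag refinement. Subtracting the prior keeps very frequent tags from winning simply because they are common.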


Keywords: Social media; Image tagging; Video tagging; Temporal analysis



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Lamberto Ballan (1)
  • Marco Bertini (1)
  • Tiberio Uricchio (1)
  • Alberto Del Bimbo (1)

  1. Media Integration and Communication Center (MICC), Università degli Studi di Firenze, Firenze, Italy