Multimedia Systems, Volume 23, Issue 1, pp 29–40

Tag relevance fusion for social image retrieval

  • Xirong Li
Special Issue Paper


Due to the subjective nature of social tagging, measuring the relevance of social tags with respect to the visual content is crucial for retrieving the growing volume of socially networked images. Recognizing the limits of any single measure of tag relevance, this paper introduces tag relevance fusion as an extension to methods for tag relevance estimation. We present a systematic study covering tag relevance fusion in early and late stages, and in supervised and unsupervised settings. Experiments on a large present-day benchmark set show that tag relevance fusion leads to better image retrieval. Moreover, unsupervised tag relevance fusion is found to be practically as effective as supervised tag relevance fusion, but without requiring any training effort. This finding suggests the potential of tag relevance fusion for real-world deployment.
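To make the idea of late, unsupervised fusion concrete, the following is a minimal sketch and not the paper's actual method. It assumes each tag relevance estimator returns one score per image for a given tag, rescales each estimator's scores to [0, 1], and averages them with uniform weights. The function names (`min_max_normalize`, `late_fusion`, `rank`) and the example scores are hypothetical. Passing explicit `weights` stands in for a supervised variant, where the weights would be learned from labeled data.

```python
from typing import Dict, List, Optional


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one estimator's scores to [0, 1] so estimators are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores identical: the estimator carries no ranking information.
        return {img: 0.0 for img in scores}
    return {img: (s - lo) / (hi - lo) for img, s in scores.items()}


def late_fusion(
    score_lists: List[Dict[str, float]],
    weights: Optional[List[float]] = None,
) -> Dict[str, float]:
    """Combine per-estimator scores into one fused score per image.

    With weights=None the fusion is unsupervised (uniform average).
    Explicit weights are an assumed stand-in for a supervised setting.
    """
    if weights is None:
        weights = [1.0 / len(score_lists)] * len(score_lists)
    normalized = [min_max_normalize(s) for s in score_lists]
    images = normalized[0].keys()
    return {
        img: sum(w * n[img] for w, n in zip(weights, normalized))
        for img in images
    }


def rank(fused: Dict[str, float]) -> List[str]:
    """Return image ids sorted by fused tag relevance, best first."""
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical scores from two estimators for a single tag.
estimator_a = {"img1": 10.0, "img2": 4.0, "img3": 0.0}
estimator_b = {"img1": 0.2, "img2": 0.9, "img3": 0.1}
ranking = rank(late_fusion([estimator_a, estimator_b]))
```

Normalizing before averaging matters because different estimators may produce scores on very different scales. Early fusion, by contrast, would combine the underlying features before any relevance score is computed, and is not shown here.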


Keywords: Social image retrieval, Tag relevance estimation, Tag relevance fusion



The author is grateful to Dr. Cees Snoek and Dr. Marcel Worring for their very useful comments on this work. The research was supported by NSFC (No. 61303184), SRFDP (No. 20130004120006), the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China (No. 14XNLQ01), and Shanghai Key Laboratory of Intelligent Information Processing, China (Grant No. IIPL-2014-002).



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, Beijing, China
  2. Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, China
