Abstract
Because social tagging is subjective, measuring the relevance of social tags with respect to visual content is crucial for retrieving the growing volume of social-networked images. Observing that any single measurement of tag relevance has limits, we introduce tag relevance fusion as an extension to existing methods for tag relevance estimation. We present a systematic study covering tag relevance fusion in early and late stages, and in supervised and unsupervised settings. Experiments on a large present-day benchmark set show that tag relevance fusion leads to better image retrieval. Moreover, unsupervised tag relevance fusion is found to be practically as effective as supervised tag relevance fusion, without requiring any training effort. This finding suggests the potential of tag relevance fusion for real-world deployment.
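To make the idea of unsupervised late fusion concrete, the following is a minimal sketch, not the method evaluated in the paper. It assumes that several tag relevance estimators have each already scored the same images for one tag, that scores are made comparable by min-max normalization, and that fusion is a uniform average. The function names, the choice of normalization, and the uniform weights are all illustrative assumptions.

```python
from typing import List


def min_max_normalize(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def late_fusion_uniform(score_lists: List[List[float]]) -> List[float]:
    """Fuse per-estimator tag relevance scores with uniform weights.

    score_lists[k][i] is the (hypothetical) relevance score that
    estimator k assigns to image i for a fixed tag. No training is
    involved: each estimator contributes equally (an assumption).
    """
    if not score_lists:
        raise ValueError("need at least one list of scores")
    n = len(score_lists[0])
    if any(len(s) != n for s in score_lists):
        raise ValueError("all estimators must score the same images")
    normalized = [min_max_normalize(s) for s in score_lists]
    # Average the normalized scores image by image.
    return [sum(col) / len(normalized) for col in zip(*normalized)]


def rank_images(image_ids: List[str], fused: List[float]) -> List[str]:
    """Return image ids sorted by fused score, most relevant first."""
    pairs = sorted(zip(image_ids, fused), key=lambda p: p[1], reverse=True)
    return [img for img, _ in pairs]


if __name__ == "__main__":
    # Two hypothetical estimators scoring three images for one tag.
    fused = late_fusion_uniform([[10.0, 0.0, 5.0], [1.0, 2.0, 5.0]])
    print(rank_images(["a", "b", "c"], fused))
```

A supervised variant would replace the uniform average with weights learned from labeled examples; the sketch deliberately omits that step because it illustrates only the training-free setting.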
Notes
http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm. As some images are no longer available on Flickr, the dataset used in this paper is somewhat smaller than the original release.
Acknowledgments
The author is grateful to Dr. Cees Snoek and Dr. Marcel Worring for their very useful comments on this work. The research was supported by NSFC (No. 61303184), SRFDP (No. 20130004120006), the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China (No. 14XNLQ01), and Shanghai Key Laboratory of Intelligent Information Processing, China (Grant No. IIPL-2014-002).
Cite this article
Li, X. Tag relevance fusion for social image retrieval. Multimedia Systems 23, 29–40 (2017). https://doi.org/10.1007/s00530-014-0430-9