Abstract
The presence of non-relevant tags in image folksonomies hampers the effective organization and retrieval of user-contributed images. In this paper, we propose to learn the relevance of user-supplied tags by means of visually weighted neighbor voting, a variant of the popular baseline neighbor voting algorithm proposed by Li et al. (IEEE Trans Multimedia 11(7):1310–1322, 2009). To gain insight into the effectiveness of baseline and visually weighted neighbor voting, we qualitatively analyze the difference in tag relevance when using a different number of neighbors, for both tags relevant and tags not relevant to the content of a seed image. Our qualitative analysis shows that tag relevance values computed by means of visually weighted neighbor voting are more stable and representative than tag relevance values computed by means of baseline neighbor voting. This is quantitatively confirmed through extensive experimentation with MIRFLICKR-25000, studying the variation of tag relevance values as a function of the number of neighbors used (for both tags relevant and tags not relevant with respect to the content of a seed image), as well as the influence of tag relevance learning on the effectiveness of image tag refinement, tag-based image retrieval, and image tag recommendation.
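The neighbor-voting idea underlying both schemes can be sketched in a few lines. The function below is an illustrative assumption, not the authors' implementation: a tag's relevance to a seed image is the (optionally similarity-weighted) number of votes it receives from the image's k visual neighbors, minus the vote count expected if those k images had been sampled at random from the collection. All names and the toy data are hypothetical.

```python
def tag_relevance(tag, neighbor_tags, collection_tags, weights=None):
    """Score `tag` for a seed image by neighbor voting (illustrative sketch).

    neighbor_tags:   list of tag sets, one per visual neighbor of the seed image
    collection_tags: list of tag sets for every image in the collection
    weights:         optional visual-similarity weight per neighbor;
                     None reduces to the baseline (unweighted) scheme
    """
    k = len(neighbor_tags)
    if weights is None:
        weights = [1.0] * k  # baseline: every neighbor votes with weight 1
    # (weighted) votes from neighbors that carry the tag
    votes = sum(w for tags, w in zip(neighbor_tags, weights) if tag in tags)
    # expected votes under random sampling: total vote mass
    # times the tag's frequency in the whole collection
    tag_freq = sum(1 for tags in collection_tags if tag in tags) / len(collection_tags)
    prior = sum(weights) * tag_freq
    return votes - prior

# toy data (hypothetical): 3 visual neighbors, 4-image collection
neighbors = [{"dog"}, {"dog", "grass"}, {"cat"}]
collection = [{"dog"}, {"cat"}, {"tree"}, {"dog"}]
baseline_dog = tag_relevance("dog", neighbors, collection)
weighted_dog = tag_relevance("dog", neighbors, collection, weights=[1.0, 0.5, 0.25])
```

A positive score means the tag occurs among the visual neighbors more often than chance predicts; the visually weighted variant simply replaces the unit votes with visual-similarity weights.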
Notes
The subsequent qualitative analysis does not assume that visual search is perfect.
References
Agrawal G (2011) Relevancy tag ranking. In: International conference on computer and communication technology, pp 169–173
Ahn L, Dabbish L (2004) Labeling images with a computer game. In: SIGCHI conference on human factors in computing systems, pp 319–326
Chua T, Tang J, Hong R, Li H, Luo Z, Zheng Y (2009) NUS-WIDE: a real-world web image database from National University of Singapore. In: ACM international conference on image and video retrieval (CIVR), pp 1–9
Feng S, Hong B, Lang C, Xu D (2011) Combining visual attention model with multi-instance learning for tag ranking. Neurocomputing 74(17):3619–3627
Ferreira J, Silva A, Delgado J (2004) How to improve retrieval effectiveness on the web. In: IADIS E-society conference, pp 1–9
Flickr’s Photostream (2012) Trend report—summer’12. http://www.flickr.com/photos/flickr/. Accessed 24 Aug 2012
Huiskes MJ, Lew MS (2008) The MIR Flickr retrieval evaluation. In: ACM international conference on multimedia information retrieval, pp 39–43
Jin Y, Khan L, Wang L, Awad M (2005) Image annotation by combining multiple evidence & WordNet. In: 13th ACM international conference on multimedia, pp 706–715
Kennedy L, Slaney M, Weinberger K (2009) Reliable tags using image similarity: mining specificity and expertise from large-scale multimedia databases. In: 17th ACM international conference on multimedia, pp 17–24
Lee S, De Neve W, Ro YM (2010) Tag refinement in an image folksonomy using visual similarity and tag co-occurrence statistics. Signal Process Image Commun 25(10):761–773
Li X, Snoek CGM, Worring M (2009) Learning social tag relevance by neighbor voting. IEEE Trans Multimedia 11(7):1310–1322
Li X, Snoek CGM, Worring M (2010) Unsupervised multi-feature tag relevance learning for social image retrieval. In: ACM international conference on image and video retrieval (CIVR), pp 10–17
Lindstaedt S, Morzinger R, Sorschag R, Pammer V, Thallinger G (2009) Automatic image annotation using visual content and folksonomies. Multimed Tools Appl 42(1):97–113
Liu D, Hua XS, Yan L, Wang M, Zhang HJ (2009) Tag ranking. In: 18th international conference on world wide web (WWW), pp 351–360
Liu D, Wang M, Yang L, Hua XS, Zhang HJ (2009) Tag quality improvement for social images. In: IEEE international conference on multimedia & expo (ICME), pp 350–353
Manjunath B, Salembier P, Sikora T (2003) Introduction to MPEG-7: multimedia content description interface. Wiley, New Jersey
OECD (2007) OECD study on the participative web: user generated content. http://www.oecd.org/dataoecd/57/14/38393115.pdf. Accessed 24 Aug 2012
PlanetTech (2012) Facebook reveals staggering new stats. http://www.planettechnews.com/business/item1094. Accessed 24 Aug 2012
Singh K, Ma M, Park D, An S (2005) Image indexing based on MPEG-7 scalable color descriptor. Key Eng Mater 277:375–382
Sun A, Bhowmick SS (2010) Quantifying tag representativeness of visual content of social images. In: 18th ACM international conference on multimedia, pp 471–480
Vander Wal T (2007) Folksonomy coinage and definition. http://www.vanderwal.net/folksonomy.html. Accessed 24 Aug 2012
van de Sande KEA, Gevers T, Snoek CGM (2010) Evaluating color descriptors for object and scene recognition. IEEE Trans Pattern Anal Mach Intell 32(9):1582–1596
Wang X, Yang M, Cour T, Zhu S, Yu K, Han TX (2011) Contextual weighting for vocabulary tree based image retrieval. In: IEEE international conference on computer vision, pp 6–13
Wu L, Yang L, Yu N, Hua XS (2009) Learning to tag. In: 18th international conference on world wide web (WWW), pp 361–370
Zhuang J, Hoi SCH (2011) A two-view learning approach for image tag ranking. In: ACM international conference on web search and data mining, pp 625–634
Acknowledgements
This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2012K2A1A2033054).
Appendix
This appendix details the derivation of the difference in accuracy of visual search over random sampling. To that end, given a seed image I, we make a distinction between a tag \(w_1\) relevant to the content of I and a tag \(w_2\) not relevant to the content of I.
Difference in search accuracy for \(w_1\). We make use of \(V_{I,w_1}(k)\) to represent the number of images relevant to \(w_1\) among the k visual neighbors of I. We assume that the value of \(V_{I,w_1}(k)\) is (1) upper-bounded by the number of images relevant to \(w_1\) when visual search works perfectly and (2) lower-bounded by the number of images relevant to \(w_1\) when making use of random sampling. This is conceptually illustrated by Fig. 6.
When visual search works perfectly, \(V_{I,w_1}(k)\) increases linearly from zero to \(|R_{w_1}|\) as k varies from zero to \(|R_{w_1}|\). Indeed, all images in the set of visual neighbors then belong to \(R_{w_1}\). For \(k>|R_{w_1}|\), \(V_{I,w_1}(k)=|R_{w_1}|\) because Φ only contains \(|R_{w_1}|\) images related to \(w_1\). This is denoted in Fig. 6 by “ideal”. When making use of random sampling, we assume that \(V_{I,w_1}(k)\) increases linearly and that all images of \(R_{w_1}\) can only be found in the set of visual neighbors when this set is identical to Φ (that is, when k is equal to |Φ|). This is denoted in Fig. 6 by “random”. In practice, we also assume that \(V_{I,w_1}(k)\) increases linearly, until its value is equal to \(|R_{w_1}|\). This is denoted in Fig. 6 by “real”. When visual search is effective, the dashed line will be close to “ideal”; otherwise, it will be close to “random”. In Fig. 6, k′ represents the minimal value of k for which all images of \(R_{w_1}\) can be found in the set of visual neighbors of I.
In general, given a tag \(w_1\), the accuracy of visual search \(A_{I,w_{1},k}\) can be written as \(V_{I,w_1}(k)/k\). Given the above observations made for \(V_{I,w_1}(k)\), \(A_{I,w_{1},k}\) can also be expressed as follows:
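Spelling out the “real” case described above — linear growth of \(V_{I,w_1}(k)\) with slope \(|R_{w_1}|/k'\) up to \(k = k'\), saturation at \(|R_{w_1}|\) afterwards — yields the piecewise form (a reconstruction under the stated linearity assumption):

\[
A_{I,w_1,k} = \frac{V_{I,w_1}(k)}{k} =
\begin{cases}
\dfrac{|R_{w_1}|}{k'}, & 0 < k \le k',\\[6pt]
\dfrac{|R_{w_1}|}{k}, & k' < k \le |\Phi|.
\end{cases}
\]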
The difference in accuracy of visual search over random sampling for w 1 can then be expressed as follows:
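With random sampling, the expected accuracy is constant at \(|R_{w_1}|/|\Phi|\), so under the same linearity assumption the difference takes the form (a reconstruction):

\[
A_{I,w_1,k} - \frac{|R_{w_1}|}{|\Phi|} =
\begin{cases}
|R_{w_1}|\left(\dfrac{1}{k'} - \dfrac{1}{|\Phi|}\right), & 0 < k \le k',\\[6pt]
|R_{w_1}|\left(\dfrac{1}{k} - \dfrac{1}{|\Phi|}\right), & k' < k \le |\Phi|.
\end{cases}
\]

Since \(k' \le |\Phi|\), this difference is non-negative: a relevant tag accumulates at least as many neighbor votes as expected under random sampling.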
Difference in search accuracy for \(w_2\). We make use of \(V_{I,w_2}(k)\) to represent the number of images relevant to \(w_2\) among the k visual neighbors of I. Further, we assume that the value of \(V_{I,w_2}(k)\) is (1) lower-bounded by the number of images relevant to \(w_2\) when visual search works perfectly and (2) upper-bounded by the number of images relevant to \(w_2\) when making use of random sampling. This is conceptually illustrated by Fig. 7.
When visual search works perfectly (in this case, when visual search finds all images relevant to I in Φ), the images in \(R_{w_2}\) should not be among the visual neighbors of I when \(k \le |R_I|\), where \(R_I\) represents the set of images relevant to I. Here, we assume that images are relevant to each other when they have semantic concepts in common (for the sake of simplicity, we also assume that images relevant to I are not relevant to \(w_2\)). However, for \(k > |R_I|\), the set of visual neighbors of I will start to contain images belonging to \(R_{w_2}\). This is denoted in Fig. 7 by “ideal”. When making use of random sampling, we assume that the number of images of \(R_{w_2}\) in the set of visual neighbors increases linearly as k varies from zero to |Φ|. This is denoted in Fig. 7 by “random”. In practice, we can find a k′ at which images of \(R_{w_2}\) start to appear in the set of visual neighbors, and we assume that their number increases linearly from that point on. This is denoted in Fig. 7 by “real”. The accuracy of visual search for \(w_2\), \(A_{I,w_2,k}\), is calculated by dividing \(V_{I,w_2}(k)\) by k:
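Under the assumption that \(V_{I,w_2}(k)\) is zero up to \(k = k'\) and then grows linearly, reaching \(|R_{w_2}|\) at \(k = |\Phi|\), this gives (a reconstruction under the stated assumptions):

\[
A_{I,w_2,k} = \frac{V_{I,w_2}(k)}{k} =
\begin{cases}
0, & 0 < k \le k',\\[6pt]
\dfrac{(k - k')\,|R_{w_2}|}{(|\Phi| - k')\,k}, & k' < k \le |\Phi|.
\end{cases}
\]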
The difference in accuracy of visual search over random sampling for w 2 can then be expressed as follows:
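With random sampling, the expected accuracy for \(w_2\) is constant at \(|R_{w_2}|/|\Phi|\); under the same linearity assumption, the difference takes the form (a reconstruction):

\[
A_{I,w_2,k} - \frac{|R_{w_2}|}{|\Phi|} =
\begin{cases}
-\dfrac{|R_{w_2}|}{|\Phi|}, & 0 < k \le k',\\[6pt]
|R_{w_2}|\left(\dfrac{k - k'}{(|\Phi| - k')\,k} - \dfrac{1}{|\Phi|}\right), & k' < k \le |\Phi|.
\end{cases}
\]

This difference is non-positive (it reaches zero only at \(k = |\Phi|\)), so a non-relevant tag accumulates fewer neighbor votes than expected under random sampling.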
Lee, S., De Neve, W. & Ro, Y.M. Visually weighted neighbor voting for image tag relevance learning. Multimed Tools Appl 72, 1363–1386 (2014). https://doi.org/10.1007/s11042-013-1439-3