Multimedia Tools and Applications, Volume 49, Issue 1, pp 81–99

Automatic tag expansion using visual similarity for photo sharing websites

  • Sare Gul Sevil
  • Onur Kucuktunc
  • Pinar Duygulu
  • Fazli Can


In this paper we present an automatic photo tag expansion method designed for photo sharing websites. The purpose of the method is to suggest tags that are relevant to the visual content of a given photo at upload time. Both textual and visual cues are used in the process of tag expansion. When a photo is to be uploaded, the system asks the user for a couple of initial tags. The initial tags are used to retrieve relevant photos together with their tags. These photos are assumed to be potentially related in content to the uploaded target photo. The tag sets of the relevant photos are used to form the candidate tag list, and visual similarities between the target photo and the relevant photos are used to assign weights to these candidate tags. The tags with the highest weights are suggested to the user. The method is applied to Flickr. Results show that including visual information in the process of photo tagging increases accuracy compared with text-based methods.
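The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how visual similarity is measured or how weights are combined, so the choices here are assumptions made only for illustration: photos are represented by plain feature vectors, similarity is cosine similarity, and a candidate tag's weight is the sum of the similarities of the relevant photos that carry it. The names `cosine_similarity`, `suggest_tags`, and `example_photos` are hypothetical, and the retrieval step is replaced by a hand-written list of already retrieved photos.

```python
from collections import defaultdict
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors.

    Assumption: the abstract does not name a similarity measure;
    cosine similarity is used here only as a stand-in.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_tags(target_features, initial_tags, relevant_photos, top_k=5):
    """Rank candidate tags by summed visual similarity (assumed scheme).

    relevant_photos is an iterable of (features, tags) pairs standing in
    for the photos retrieved with the user's initial tags.
    """
    initial = set(initial_tags)
    weights = defaultdict(float)
    for features, tags in relevant_photos:
        similarity = cosine_similarity(target_features, features)
        # Each photo votes for its tags with its similarity to the target;
        # tags the user already supplied are not suggested again.
        for tag in set(tags) - initial:
            weights[tag] += similarity
    # Highest weight first; ties are broken alphabetically for determinism.
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:top_k]]


# Hypothetical data: three photos retrieved for the initial tag "beach".
example_photos = [
    ([1.0, 0.0], ["beach", "sea", "sunset"]),
    ([0.9, 0.1], ["beach", "sea", "sand"]),
    ([0.0, 1.0], ["beach", "party", "night"]),
]

if __name__ == "__main__":
    print(suggest_tags([1.0, 0.0], ["beach"], example_photos, top_k=2))
```

In this toy example the third photo shares the initial tag but is visually dissimilar to the target, so its tags receive zero weight. This is the effect the abstract attributes to adding visual information on top of text-based retrieval.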


Tagging, Photo-annotation, Visual similarity, Folksonomy, Flickr



We thank Muhammet Bastan for preparing the MPEG-7 visual feature extractor, and all the users who participated in the user study. This research is partially supported by TUBITAK Career grant number 104E065.



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Sare Gul Sevil (1)
  • Onur Kucuktunc (1)
  • Pinar Duygulu (1)
  • Fazli Can (1)

  1. Department of Computer Engineering, Bilkent University, Ankara, Turkey
