A Picture Is Worth a Thousand Tags: Automatic Web Based Image Tag Expansion

  • Andrew Gilbert
  • Richard Bowden
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7725)


We present an approach that automatically expands the annotation of images, using the internet as an additional information source. The novelty of the work lies in expanding image tags by automatically introducing new, previously unseen, complex linguistic labels, which are collected without supervision from associated webpages. Starting from a small subset of existing image tags, a web-based search retrieves additional textual information. A textual bag-of-words model and a visual bag-of-words model are combined and symbolised for data mining. Association rule mining is then used to identify rules that relate words to visual content. Unseen images that fit these rules are re-tagged. This approach allows a large number of additional annotations to be added to unseen images: on average 12.8 new tags per image, with an 87.2% true positive rate. Results are shown on two datasets, including a new 2800-image annotation dataset of landmarks. The results include pictures of buildings being tagged with the architect, the year of construction and even events that have taken place there. This widens the impact of tag annotation and its use in retrieval. The dataset is made available along with the tags, the 1970 webpages and the additional images that form the information corpus. In addition, results on the common state-of-the-art dataset MIRFlickr25000 are presented to compare the learning framework against previous work.
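
The rule-mining and re-tagging steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each training image has already been reduced to a set of symbolised visual words and a set of textual words, and it enumerates candidate rules by brute force rather than with an optimised miner such as Apriori. The item names (`v1`, `cathedral`, ...), the function names `mine_rules` and `retag`, and the threshold values are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_rules(transactions, min_support=0.4, min_confidence=0.8, max_len=2):
    """Mine rules of the form {visual words} -> text tag.

    transactions: list of (visual_words, text_tags) pairs, each a set of str.
    Returns a list of (antecedent, tag, support, confidence) tuples.
    Brute-force enumeration for illustration only (assumption, not Apriori).
    """
    n = len(transactions)
    antecedent_counts = Counter()  # images containing the visual itemset
    joint_counts = Counter()       # images containing itemset and tag together
    for visual, tags in transactions:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(visual), k):
                antecedent = frozenset(combo)
                antecedent_counts[antecedent] += 1
                for tag in tags:
                    joint_counts[(antecedent, tag)] += 1

    rules = []
    for (antecedent, tag), joint in joint_counts.items():
        support = joint / n                              # P(antecedent and tag)
        confidence = joint / antecedent_counts[antecedent]  # P(tag | antecedent)
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, tag, support, confidence))
    return rules


def retag(visual_words, rules):
    """Return the sorted tags of every rule whose antecedent the image satisfies."""
    present = set(visual_words)
    return sorted({tag for antecedent, tag, _, _ in rules if antecedent <= present})


# Hypothetical toy corpus: visual words paired with words from associated webpages.
training = [
    ({"v1", "v2"}, {"cathedral", "1710"}),
    ({"v1", "v2", "v3"}, {"cathedral"}),
    ({"v3", "v4"}, {"bridge"}),
    ({"v3", "v4"}, {"bridge", "1894"}),
    ({"v1", "v4"}, {"bridge"}),
]

rules = mine_rules(training)
new_tags = retag({"v1", "v2", "v9"}, rules)  # an unseen image
```

In this toy corpus the visual word `v2` always co-occurs with `cathedral`, so an unseen image containing `v2` receives that tag, while rarer words such as `1710` fall below the support threshold. Raising `min_confidence` trades the number of added tags against their expected precision.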


Association Rule · Image Annotation · Image Descriptor · SIFT Feature · Photo Annotation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Andrew Gilbert (1)
  • Richard Bowden (1)
  1. Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, UK
