A Study into Annotation Ranking Metrics in Community Contributed Image Corpora

  • Mark Hughes
  • Gareth J. F. Jones
  • Noel E. O’Connor
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8382)


Community contributed datasets are becoming increasingly common in automated image annotation systems. One important issue with community image data is that there is no guarantee that the associated metadata is relevant. A method is required that can accurately rank the semantic relevance of community annotations. This should enable the extraction of relevant subsets from potentially noisy collections of these annotations. Having relevant, non-heterogeneous tags assigned to images should improve community image retrieval systems, such as Flickr, which are based on text retrieval methods. In the literature, the current state-of-the-art approach to ranking the semantic relevance of Flickr tags is based on the widely used tf-idf metric. In the case of datasets containing landmark images, however, this metric is inefficient and can be improved upon. In this paper, we present a landmark recognition framework that provides end-to-end automated recognition and annotation. In our study into automated annotation, we evaluate five alternative approaches to tf-idf for ranking tag relevance in community contributed landmark image corpora. We carry out a thorough evaluation of each of these ranking metrics, and the results demonstrate that four of the proposed techniques outperform the commonly used tf-idf approach for this task. Our best performing approach achieves a significant F-measure increase of 0.19 over tf-idf.
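For readers unfamiliar with the baseline, the following is a minimal sketch of how tf-idf might be used to rank the tags of a cluster of landmark images, together with the F-measure used to compare a selected tag set against a ground truth. It is not the authors' implementation: the abstract does not give their exact formulation, so the function names, the plain `log(N / df)` weighting, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def rank_tags_tfidf(cluster_tags, corpus_tags):
    """Rank the tags of one image cluster by a plain tf-idf score.

    cluster_tags: list of tag sets, one per image in the cluster.
    corpus_tags:  list of tag sets, one per image in the whole corpus.
    Returns a list of (tag, score) pairs, highest score first.
    """
    # Term frequency: number of cluster images carrying each tag.
    tf = Counter(tag for tags in cluster_tags for tag in set(tags))
    # Document frequency: number of corpus images carrying each tag.
    df = Counter(tag for tags in corpus_tags for tag in set(tags))
    n_images = len(corpus_tags)

    scores = {}
    for tag, count in tf.items():
        # max(..., 1) guards against tags absent from the corpus (assumption).
        idf = math.log(n_images / max(df[tag], 1))
        scores[tag] = count * idf
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def f_measure(selected, relevant):
    """Balanced F-measure of a selected tag set against relevant tags."""
    true_positives = len(set(selected) & set(relevant))
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(set(selected))
    recall = true_positives / len(set(relevant))
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy corpus: three images of one landmark, three unrelated ones.
corpus = [
    {"eiffel", "paris", "france"},
    {"eiffel", "paris", "vacation"},
    {"eiffel", "tower", "paris"},
    {"vacation", "beach"},
    {"vacation", "family"},
    {"vacation", "dublin"},
]
cluster = corpus[:3]
ranked = rank_tags_tfidf(cluster, corpus)
```

In this toy corpus the generic tag "vacation" is pushed to the bottom of the ranking because it occurs across many unrelated images, while tags concentrated in the landmark cluster rise to the top.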


Image annotation, Landmark recognition, Tag relevance



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Mark Hughes (1)
  • Gareth J. F. Jones (2)
  • Noel E. O’Connor (1)
  1. CLARITY: Centre for Sensor Web Technologies, Dublin City University, Dublin 9, Ireland
  2. Centre for Next Generation Localisation, Dublin City University, Dublin 9, Ireland
