Automatic Image Annotation at ImageCLEF

  • Josiah Wang
  • Andrew Gilbert
  • Bart Thomee
  • Mauricio Villegas
Chapter
Part of The Information Retrieval Series book series (INRE, volume 41)

Abstract

Automatic image annotation is the task of automatically assigning semantic labels to images, such as words, phrases, or sentences describing the objects, attributes, actions, and scenes depicted in an image. In this chapter, we present an overview of the various automatic image annotation tasks organized as part of the ImageCLEF track at CLEF between 2009 and 2016. Over these eight years, the image annotation tasks evolved from annotating Flickr photos by learning from clean data to annotating web images by learning from large-scale noisy web data. The tasks are divided into three distinct phases, and this chapter discusses each of these phases in turn. We also compare and contrast these tasks with other related benchmarking challenges, and provide some insights into the future of automatic image annotation.

Acknowledgements

The Concept Annotation, Localization and Sentence Generation tasks in ImageCLEF 2015 and 2016 were co-organized by the VisualSense (ViSen) consortium under the ERA-NET CHIST-ERA D2K 2011 Programme, jointly supported by UK EPSRC Grants EP/K019082/1 and EP/K01904X/1, French ANR Grant ANR-12-CHRI-0002-04, and Spanish MINECO Grant PCIN-2013-047.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Josiah Wang (1)
  • Andrew Gilbert (2)
  • Bart Thomee (3)
  • Mauricio Villegas (4)

  1. Department of Computing, Imperial College London, London, UK
  2. CVSSP, University of Surrey, Guildford, UK
  3. Google, San Bruno, USA
  4. omni:us, Berlin, Germany
