
Image annotation: the effects of content, lexicon and annotation method

  • Regular Paper
International Journal of Multimedia Information Retrieval

Abstract

Image annotation is the process of assigning metadata to images, allowing effective retrieval by text-based search techniques. Despite considerable effort in automatic multimedia analysis, automatic semantic annotation of multimedia remains inefficient because of the difficulty of modeling high-level semantic terms. In this paper, we examine the factors affecting the quality of annotations collected through crowdsourcing platforms. An image dataset was manually annotated using: (1) a vocabulary consisting of a preselected set of keywords, (2) a hierarchical vocabulary and (3) free keywords. The results show that annotation quality is affected by the image content itself and by the lexicon used. As expected, annotation using the hierarchical vocabulary is more representative, while the use of free keywords leads to more invalid annotations. Finally, it is shown that images requiring annotations that are not directly related to their content (i.e., annotation using abstract concepts) lead to greater annotator inconsistency. This reveals that the difficulty of annotating such images is not limited to automatic annotation but is a general problem of annotation.

Notes

  1. “The History of Commandaria: Digital Journeys Back to Time”, Project funded by the Cyprus Research Promotion Foundation (CRPF) under the Contract ANTHRO/0308(BE)/04.

Acknowledgements

This work was partly supported by a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No. 739578 (RISE-Call: H2020-WIDE-SPREAD-01-2016-2017-TeamingPhase2) and from the Government of the Republic of Cyprus through the Directorate General for European Programmes, Coordination and Development.

Author information

Corresponding author

Correspondence to Zenonas Theodosiou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Theodosiou, Z., Tsapatsoulis, N. Image annotation: the effects of content, lexicon and annotation method. Int J Multimed Info Retr 9, 191–203 (2020). https://doi.org/10.1007/s13735-020-00193-z

  • DOI: https://doi.org/10.1007/s13735-020-00193-z
