
Evaluation of Image Annotation Using Amazon Mechanical Turk in ImageCLEF

Chapter in: Towards the Internet of Services: The THESEUS Research Program

Part of the book series: Cognitive Technologies (COGTECH)


Abstract

With the increasing amount of digital information on the Web and on personal computers, the need for systems capable of automatically indexing, searching, and organizing multimedia documents is growing steadily. Automated systems have to retrieve information with high performance in order to be accepted by industry and end users. Multimedia retrieval systems are often evaluated on different test collections with different performance measures, which makes a comparison of retrieval performance impossible and limits the benefits of the approaches. Benchmarking campaigns counteract this tendency and establish an objective comparison of the performance of different approaches by posing challenging tasks and by making test collections, topics, and performance measures available. As part of the THESEUS research program, Fraunhofer IDMT organized the “Visual Concept Detection and Annotation Task” (VCDT) of the international benchmark ImageCLEF, with the goal of making the technologies developed within the THESEUS Core Technology Cluster (CTC) comparable to international developments. While the test collection in 2009 was assessed by experts, the relevance assessments for the task have been acquired through crowdsourcing since 2010, using the platform Amazon Mechanical Turk (MTurk). This article explains in detail how THESEUS core technologies were evaluated within ImageCLEF, with a special focus on the acquisition of ground truth data using MTurk. Advantages and disadvantages of this approach are discussed and best practices are shared.
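
The core of the approach named in the abstract, acquiring ground truth by collecting several redundant MTurk judgments per image-concept pair, is commonly implemented as a majority vote with a minimum number of judgments. The following Python sketch is illustrative only and is not taken from the chapter; the data layout and the `aggregate_majority` helper are hypothetical.

```python
from collections import Counter

def aggregate_majority(judgments, min_votes=3):
    """Merge redundant crowd judgments into ground-truth labels.

    judgments maps (image_id, concept) to a list of worker votes,
    True meaning "concept present" and False meaning "absent".
    Pairs with too few votes or a tie are left undecided (None),
    so they can be re-posted to the crowd or given to an expert.
    """
    ground_truth = {}
    for pair, votes in judgments.items():
        if len(votes) < min_votes:
            ground_truth[pair] = None  # not enough judgments yet
            continue
        top_two = Counter(votes).most_common(2)
        if len(top_two) == 1 or top_two[0][1] > top_two[1][1]:
            ground_truth[pair] = top_two[0][0]  # clear majority
        else:
            ground_truth[pair] = None  # tie: leave undecided
    return ground_truth

# Three workers judged whether image 42 shows a beach; only two
# judged "night", so that pair stays undecided.
votes = {
    ("img_42", "beach"): [True, True, False],
    ("img_42", "night"): [False, False],
}
print(aggregate_majority(votes))
# {('img_42', 'beach'): True, ('img_42', 'night'): None}
```

Requiring a minimum number of votes and refusing to break ties arbitrarily are typical quality-control measures in such setups; the best practices the chapter itself recommends may differ.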


Notes

  1. http://theseus.pt-dlr.de/en/about.php

  2. https://www.mturk.com

  3. http://wordnet.princeton.edu/

  4. http://www.thefreedictionary.com/

  5. http://crowdflower.com/blog/2010/02/why-people-participate-on-mechanical-turk-now-as-a-mosaic-plot/

  6. http://www.idmt.fraunhofer.de/de/projects/expired_publicly_financed_research_projects/photo_annotation.html


Author information

Correspondence to Judith Liebetrau.


Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Liebetrau, J., Nowak, S., Schneider, S. (2014). Evaluation of Image Annotation Using Amazon Mechanical Turk in ImageCLEF. In: Wahlster, W., Grallert, H.-J., Wess, S., Friedrich, H., Widenka, T. (eds.) Towards the Internet of Services: The THESEUS Research Program. Cognitive Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-06755-1_20


  • DOI: https://doi.org/10.1007/978-3-319-06755-1_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06754-4

  • Online ISBN: 978-3-319-06755-1

  • eBook Packages: Computer Science (R0)
