Tag-based algorithms can predict human ratings of which objects a picture shows

Pammer, Viktoria; Kump, Barbara; Lindstaedt, Stefanie

doi:10.1007/s11042-011-0761-x

Published: 23 February 2011

Volume 59, pages 441–462, (2012)

Multimedia Tools and Applications

Viktoria Pammer¹,
Barbara Kump² &
Stefanie Lindstaedt¹

Abstract

Collaborative tagging platforms allow users to describe resources with freely chosen keywords, so-called tags. The meaning of a tag, as well as the precise relation between a tag and the tagged resource, is left open to interpretation by the user. Although human users mostly have a fair chance of interpreting this relation, machines do not. In this paper we study the characteristics of the problem of identifying descriptive tags, i.e. tags that relate to visible objects in a picture. We investigate the feasibility of using a tag-based algorithm, i.e. an algorithm that ignores actual picture content, to tackle the problem. Given the theoretical feasibility of a well-performing tag-based algorithm, which we show via an optimal algorithm, we describe the implementation and evaluation of a WordNet-based algorithm as proof-of-concept. These two investigations lead to the conclusion that even relatively simple and fast tag-based algorithms can predict human ratings of which objects a picture shows. Finally, we discuss the inherent difficulty both humans and machines have when deciding whether a tag is descriptive or not. Based on a qualitative analysis, we distinguish between definitional disagreement, difference in knowledge, disambiguation and difference in perception as reasons for disagreement between raters.
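
As a rough illustration of what a tag-based approach might look like, the sketch below labels a tag as descriptive when a walk up a hypernym hierarchy reaches a "physical object" root. It is only a minimal sketch under stated assumptions: the tiny `HYPERNYMS` table, the root name `physical_object`, and the functions `is_descriptive` and `descriptive_tags` are hypothetical stand-ins invented for this example. They are not WordNet data and not the algorithm evaluated in the paper, whose details are not given in this abstract.

```python
# Toy hypernym table (hypothetical, hand-written; NOT real WordNet data).
# Each key maps a term to its more general parent term.
HYPERNYMS = {
    "daisy": "flower",
    "flower": "plant",
    "plant": "physical_object",
    "ferret": "animal",
    "animal": "physical_object",
    "castle": "building",
    "building": "physical_object",
    "erosion": "process",
    "love": "abstraction",
}

# Assumed name for the root under which visible things are grouped.
PHYSICAL_ROOT = "physical_object"


def is_descriptive(tag: str) -> bool:
    """Return True if the tag reaches PHYSICAL_ROOT by following hypernyms."""
    term = tag.strip().lower()
    seen = set()  # guards against accidental cycles in the table
    while term in HYPERNYMS and term not in seen:
        seen.add(term)
        term = HYPERNYMS[term]
    return term == PHYSICAL_ROOT


def descriptive_tags(tags):
    """Keep only the tags predicted to name a visible object."""
    return [t for t in tags if is_descriptive(t)]
```

Because the check looks only at the tag and never at the picture, it returns the same answer for a tag on every picture the tag is attached to; unknown tags are simply treated as not descriptive.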

Notes

www.flickr.com
www.digg.com
www.technorati.com
www.citeulike.com
www.delicious.com
For instance, “castle” is a tag of http://www.flickr.com/photos/katclay/4361062759/. This picture shows a particular castle, apparently in Wales. Many objects are castles.
“Versailles” is a tag of http://www.flickr.com/photos/followingtheequator/2655044746 for instance. This picture shows the castle Versailles. There is only one real-world object that “is” Versailles.
http://www.flickr.com/services/api/
The procedure of downloading the most recent and “most interesting” pictures was chosen in order to avoid querying for specific topics (tags) or users. Getting random samples from Flickr is not really possible since Flickr’s database is only accessible via API, such that specific queries need to be formulated in order to access data.
http://wordnet.princeton.edu/
A concept refers to an idea of something. A concept often refers to something abstract, e.g., “love” or to a group of real world entities, e.g., “flower”.
An instance refers to a specific entity in the real world, e.g., “Big Ben” is an instance of the concept “clock tower”.
http://www.flickr.com/photos/bradi/2631548160/
http://www.flickr.com/photos/95197744@N00/2631322270
The picture shows a rock formation created by the process of erosion: http://www.flickr.com/photos/38381877@N00/2632589512.
This picture shows the ocean, a piece of beach and a bird but not a hotel: http://flickr.com/photos/26079103@N00/2630745505.
The picture shows a ferret: http://flickr.com/photos/77651361@N00/2631585847.
The picture shows a daisy on a sunlit background: http://www.flickr.com/photos/7845858@N05/2631348902.
A tropical maritime tree, see e.g., http://flickr.com/photos/7486128@N03/2631792980
A flower, see e.g., http://www.flickr.com/photos/mbgrigby/2930572161/
A landing wharf, a structure where ships lie alongside in order to load or discharge freight or passengers, see e.g., http://www.flickr.com/photos/71298168@N00/2630153723.
The picture shows a flower with a butterfly, and a barely visible spider: http://flickr.com/photos/18718027@N00/2631263572.
The guitar is barely visible in the grass, and the picture is also very dark: http://www.flickr.com/photos/52752598@N00/2630825364.
http://www.flickr.com/photos/29905372@N00/2631336870

Acknowledgements

The Know-Center is funded within the Austrian COMET Program—Competence Centers for Excellent Technologies—under the auspices of the Austrian Federal Ministry of Transport, Innovation and Technology, the Austrian Federal Ministry of Economy, Family and Youth and by the State of Styria. COMET is managed by the Austrian Research Promotion Agency FFG.

Author information

Authors and Affiliations

Know-Center, Inffeldgasse 21a, 8010, Graz, Austria
Viktoria Pammer & Stefanie Lindstaedt
Knowledge Management Institute, Graz University of Technology, Graz, Austria
Barbara Kump

Corresponding author

Correspondence to Viktoria Pammer.

About this article

Cite this article

Pammer, V., Kump, B. & Lindstaedt, S. Tag-based algorithms can predict human ratings of which objects a picture shows. Multimed Tools Appl 59, 441–462 (2012). https://doi.org/10.1007/s11042-011-0761-x

Published: 23 February 2011
Issue Date: July 2012
DOI: https://doi.org/10.1007/s11042-011-0761-x
