Analysis of Inter-rater Agreement among Human Observers Who Judge Image Similarity
In this paper a problem of inter-rater agreement is discussed in the case of human observers who judge how similar pairs of images are. In such a case significant differences in judgment appear among the group of people. We have observed that for some pairs of images all values of similarity ratings are assigned by various people with approximately the same probability. To investigate this phenomenon in a more thorough mannerwe performed experiments in which inter-rater coefficients were used to measure the level of agreement for each given pair of images and for each pair of human judges. The results obtained in the experiments suggest that the variation of the level of agreement is considerable among pairs of images as well as among pairs of people. We suggest that this effect should be taken into account in design of computer systems using image similarity as a criterion.
KeywordsImage Pair Image Similarity Weighted Kappa Rater Agreement Pure Chance
Unable to display preview. Download preview PDF.
- 1.All images used in the experiments at the AI research group website (2010), http://ai.ii.pwr.wroc.pl/similaris/pic/PWR-SimilarImages/
- 2.Flickr - photo sharing (2010), http://www.flickr.com/
- 11.Geertzen, J., Bunt, H.: Measuring annotator agreement in a complex hierarchical dialogue act annotation scheme. In: Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue, Association for Computational Linguistics, Sydney, Australia, pp. 126–133 (2006)Google Scholar
- 15.Ptaszynski, M., Maciejewski, J., Dybala, P., Rzepka, R., Araki, K.: CAO: A fully automatic emoticon analysis system. In: Fox, M., Poole, D. (eds.) AAAI. AAAI Press, Menlo Park (2010)Google Scholar
- 16.Viera, A.J., Garrett, J.M.: Understanding interobserver agreement: the kappa statistic. Family Medicine 37(5), 360–363 (2005)Google Scholar
- 17.Yoon, T., Chavarra, R., Cole, J., Hasegawa-johnson, M.: Intertranscriber reliability of prosodic labeling on telephone conversation using ToBI. In: Proc. ICSLP (2004)Google Scholar