Benefiting from users’ gaze: selection of image regions from eye tracking information for provided tags
Abstract
Providing image annotations is a tedious task. It becomes even more cumbersome when individual objects in the images must be annotated. Such region-based annotations can be used in various ways, for example in similarity search or as a training set for automatic object detection. We investigate the basic idea of locating objects in images by analyzing the gaze paths of users who view images with an interest in a specific object. We analyzed 799 gaze paths from 30 subjects viewing image-tag pairs with the task of deciding whether the tag could be found in the image or not. We compared 13 different fixation measures for analyzing the gaze paths. The best-performing fixation measure correctly assigns a tag to a region for 63 % of the image-tag pairs and significantly outperforms three baselines. We examine the characteristics of the image regions, such as position and size, for correct and incorrect assignments. We also investigate whether aggregating gaze paths from several subjects improves the precision of identifying the correct regions. In addition, we look into the possibility of discriminating different regions in the same image: here, we are able to correctly identify two regions in the same image from different primings with an accuracy of 38 %.
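The core idea in the abstract can be illustrated with a minimal sketch: given the fixations of one or more gaze paths and a set of candidate regions, a simple fixation measure assigns the tag to the region that accumulates the most fixation time. The class names, fields, and the duration-based measure below are illustrative assumptions, not the paper's actual measures; the paper compares 13 such measures.

```python
from dataclasses import dataclass

@dataclass
class Fixation:
    """A single fixation: screen position and dwell time (hypothetical fields)."""
    x: float
    y: float
    duration_ms: float

@dataclass
class Region:
    """A candidate image region as an axis-aligned rectangle (illustrative)."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, fx: Fixation) -> bool:
        return self.x0 <= fx.x <= self.x1 and self.y0 <= fx.y <= self.y1

def assign_tag(fixations: list[Fixation], regions: list[Region]) -> str:
    """One example fixation measure: total fixation duration per region.

    Fixations from several subjects can simply be concatenated into one
    list to aggregate multiple gaze paths before assignment.
    """
    totals = {r.name: 0.0 for r in regions}
    for fx in fixations:
        for r in regions:
            if r.contains(fx):
                totals[r.name] += fx.duration_ms
    # The tag is assigned to the region with the highest measure value.
    return max(totals, key=totals.get)
```

For example, if most fixation time falls inside a "dog" region while only a brief fixation lands on a "sky" region, the tag is assigned to the "dog" region. Other measures from the paper weight fixations differently (e.g., by count or by recency) but follow the same assign-to-maximum scheme.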
Keywords
Region identification · Region labeling · Gaze analysis · Eye tracking · Tagging
Acknowledgement
We thank the subjects who participated in our experiment. The research leading to this article was partially supported by the EU project SocialSensor (FP7-287975).