Scene semantics involuntarily guide attention during visual search
During scene viewing, is attention primarily guided by low-level image salience or by high-level semantics? Recent evidence suggests that overt attention in scenes is primarily guided by semantic features. Here we examined whether the attentional priority given to meaningful scene regions is involuntary. Participants completed a scene-independent visual search task in which they searched for superimposed letter targets whose locations were orthogonal to both the underlying scene semantics and image salience. Critically, the analyzed scenes contained no targets, and participants were unaware of this manipulation. We then directly compared how well the distribution of semantic features and the distribution of image salience accounted for the overall distribution of overt attention. Even though the task was completely independent of both scene semantics and image salience, semantics explained significantly more variance in attention than image salience did, and more than expected by chance. This suggests that salient image features were effectively suppressed in favor of task goals, whereas semantic features were not. The semantic bias was present from the very first fixation and increased non-monotonically over the course of viewing. These findings suggest that overt attention in scenes is involuntarily guided by scene semantics.
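The variance comparison described above can be sketched as follows. This is a minimal illustration of comparing how much variance two predictor maps (a semantic "meaning" map and a saliency map) explain in a fixation density map, not the authors' actual analysis pipeline; the toy maps and the function name `variance_explained` are hypothetical.

```python
import numpy as np

def variance_explained(predictor_map, attention_map):
    """Squared linear correlation (R^2) between a predictor map
    (e.g., a meaning map or a saliency map) and a fixation density
    map, with both 2-D maps flattened to vectors."""
    p = predictor_map.ravel()
    a = attention_map.ravel()
    r = np.corrcoef(p, a)[0, 1]
    return r ** 2

# Hypothetical toy maps; a real analysis would use per-scene meaning
# maps, saliency-model output, and smoothed fixation density maps.
rng = np.random.default_rng(0)
attention = rng.random((24, 32))
meaning = attention + 0.3 * rng.standard_normal((24, 32))  # correlated predictor
salience = rng.random((24, 32))                            # unrelated predictor

r2_meaning = variance_explained(meaning, attention)
r2_salience = variance_explained(salience, attention)
```

In this toy setup the correlated predictor explains more variance than the unrelated one, mirroring the comparison of semantic versus salience maps against the observed attention distribution.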
Keywords: Scene perception · Attention · Semantics · Salience · Visual search
This research was supported by the National Eye Institute of the National Institutes of Health under award number R01EY027792. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.