Attention, Perception, & Psychophysics, Volume 81, Issue 1, pp 35–46

Mid-level feature contributions to category-specific gaze guidance

  • Claudia Damiano
  • John Wilder
  • Dirk B. Walther


Our research has previously shown that scene categories can be predicted from observers’ eye movements when they view photographs of real-world scenes. The time course of category predictions reveals the differential influences of bottom-up and top-down information. Here we used these known differences to determine to what extent image features at different representational levels contribute toward guiding gaze in a category-specific manner. Participants viewed grayscale photographs and line drawings of real-world scenes while their gaze was tracked. Scene categories could be predicted from fixation density at all times over a 2-s time course in both photographs and line drawings. We replicated the shape of the prediction curve found previously, with an initial steep decrease in prediction accuracy from 300 to 500 ms, representing the contribution of bottom-up information, followed by a steady increase, representing top-down knowledge of category-specific information. We then computed the low-level features (luminance contrasts and orientation statistics), mid-level features (local symmetry and contour junctions), and Deep Gaze II output from the images, and used that information as a reference in our category predictions in order to assess their respective contributions to category-specific guidance of gaze. We observed that, as expected, low-level salience contributes mostly to the initial bottom-up peak of gaze guidance. Conversely, the mid-level features that describe scene structure (i.e., local symmetry and junctions) split their contributions between bottom-up and top-down attentional guidance, with symmetry contributing to both bottom-up and top-down guidance, while junctions play a more prominent role in the top-down guidance of gaze.
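The core analysis described above, predicting a scene's category from where observers fixate, can be illustrated with a minimal sketch. The paper's actual decoding pipeline is not detailed in the abstract, so the nearest-centroid correlation scheme below, along with all function names, map sizes, and smoothing parameters, is an illustrative assumption, not the authors' implementation: fixations are accumulated into a smoothed density map, and a held-out map is assigned the category whose average map it correlates with best.

```python
import numpy as np


def fixation_density(fixations, shape=(32, 32), sigma=1.5):
    """Accumulate (row, col) fixations into a normalized, smoothed density map.

    `fixations` are assumed to be already scaled to `shape`. Smoothing uses a
    small separable Gaussian kernel applied along each axis.
    """
    density = np.zeros(shape)
    for r, c in fixations:
        density[int(r), int(c)] += 1.0
    # Separable Gaussian smoothing with a handwritten 7-tap kernel.
    k = np.exp(-0.5 * (np.arange(-3, 4) / sigma) ** 2)
    k /= k.sum()
    density = np.apply_along_axis(np.convolve, 0, density, k, mode="same")
    density = np.apply_along_axis(np.convolve, 1, density, k, mode="same")
    return density / (density.sum() + 1e-12)


def predict_category(test_map, category_templates):
    """Nearest-centroid decoding: return the category whose average density
    map (template) correlates best with the held-out map."""
    best, best_r = None, -np.inf
    for cat, template in category_templates.items():
        r = np.corrcoef(test_map.ravel(), template.ravel())[0, 1]
        if r > best_r:
            best, best_r = cat, r
    return best
```

In the paper's design, templates would be built per category from other observers' fixations, and prediction accuracy tracked within successive time windows to separate the early bottom-up peak from later top-down guidance.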


Keywords: Eye movements · Visual attention · Scene perception



Copyright information

© The Psychonomic Society, Inc. 2018

Authors and Affiliations

  • Claudia Damiano (1)
  • John Wilder (1)
  • Dirk B. Walther (1)

  1. Department of Psychology, University of Toronto, Toronto, Canada
