
Modelling Task-Dependent Eye Guidance to Objects in Pictures


We introduce a model of attentional eye guidance built on the premise that the deployment of gaze should be considered within a general action-perception loop that relies on two tightly intertwined processes. Sensory processing, which depends on the current gaze position, identifies the sources of information that are most valuable under the given task; motor processing links that information to the oculomotor act by sampling the next gaze position and thus performing the gaze shift. In this framework, the choice of where to look next is task-dependent and oriented to classes of objects embedded within pictures of complex scenes. Task dependence is taken into account by exploiting the value and the payoff of gazing at certain image patches or proto-objects that provide a sparse representation of the scene objects. The different levels of the action-perception loop are represented in probabilistic form and eventually give rise to a stochastic process that generates the gaze sequence. In this way the model also accounts for statistical properties of gaze shifts such as individual scan path variability. Simulation results are compared with experimental data derived both from publicly available datasets and from our own experiments.
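The loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the function name `sample_scanpath`, the heavy-tailed distance kernel, the Gaussian landing jitter, and every parameter are assumptions chosen only to show how a value-weighted sensory step and a sampling-based motor step might combine into a stochastic gaze sequence.

```python
import math
import random


def sample_scanpath(patches, values, start, n_fixations,
                    scale=20.0, jitter=1.0, seed=0):
    """Generate a stochastic gaze sequence over proto-object centres.

    patches : list of (x, y) patch centres (hypothetical proto-objects)
    values  : non-negative task value per patch; at least one must be > 0
    start   : initial gaze position (x, y)
    scale, jitter, seed : illustrative parameters, not taken from the paper
    """
    rng = random.Random(seed)
    gaze = start
    path = [gaze]
    for _ in range(n_fixations):
        # Sensory step (assumed form): weight each patch by its task value,
        # discounted by its distance from the current gaze position.
        weights = [
            v / (1.0 + (math.dist(gaze, p) / scale) ** 2)
            for p, v in zip(patches, values)
        ]
        # Motor step: sample the next target instead of taking the maximum,
        # so repeated runs with different seeds yield different scan paths.
        target = rng.choices(patches, weights=weights, k=1)[0]
        # Land near the target with a small Gaussian error.
        gaze = (target[0] + rng.gauss(0.0, jitter),
                target[1] + rng.gauss(0.0, jitter))
        path.append(gaze)
    return path


if __name__ == "__main__":
    centres = [(10.0, 10.0), (50.0, 20.0), (30.0, 60.0)]
    task_values = [0.2, 1.0, 0.5]  # made-up values for illustration
    for fixation in sample_scanpath(centres, task_values, (0.0, 0.0), 5):
        print(fixation)
```

Sampling the target, rather than always choosing the highest-weight patch, is what lets a sketch of this kind produce variability between simulated scan paths for the same picture and task.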





The authors are grateful to the Referees and the Associate Editor for their enlightening and valuable comments, which greatly improved the quality and clarity of an earlier version of this paper. This work was partially supported by the Spanish projects TIN2011-24631, TIN2009-14633-C03-03, CONSOLIDER INGENIO CSD2007-00018 and the fellowships RYC-2009-05031 and 2009FIB00020. Additional support came from the Commission for Universities and Research of the Department for Innovation, Universities and Enterprise of the Generalitat of Catalonia and from the European Social Fund.

Author information

Correspondence to Giuseppe Boccignone.

About this article

Cite this article

Clavelli, A., Karatzas, D., Lladós, J. et al. Modelling Task-Dependent Eye Guidance to Objects in Pictures. Cogn Comput 6, 558–584 (2014). https://doi.org/10.1007/s12559-014-9262-3


  • Visual attention
  • Gaze guidance
  • Value
  • Payoff
  • Stochastic fixation prediction