Utility function generated saccade strategies for robot active vision: a probabilistic approach


Humans and many animals can selectively sample the necessary parts of the visual scene to carry out daily activities such as foraging and finding prey or mates. Selective attention allows them to use the brain's limited resources efficiently by deploying their sensory apparatus to collect data believed to be pertinent to the organism's current situation. Robots operating in dynamic environments are similarly exposed to a wide variety of stimuli, which they must process with limited sensory and computational resources. Computational saliency models inspired by biological studies have previously been used in robotic applications, but these have limited capacity to deal with dynamic environments and no capacity to reason about uncertainty when planning their sensor placement strategy. This paper generalises the traditional model of saliency by using a Kalman filter estimator to describe an agent’s understanding of the world. The resulting model of uncertainty allows the agent to adopt a richer set of strategies for deploying its sensory apparatus than is possible with the winner-take-all mechanism of the traditional saliency model. The paper demonstrates three utility functions, each of which encapsulates a perceptual state valued by the agent and thereby produces a distinct sensory deployment behaviour.
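The contrast drawn above between winner-take-all selection and uncertainty-aware selection can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a one-dimensional scene of five cells, an independent scalar Kalman filter per cell with identity dynamics, noise-free measurements of the fixated cell only, and two hypothetical utility functions (`winner_take_all` and `most_uncertain`) that are not claimed to be any of the three utility functions the paper demonstrates. All names and noise values are illustrative.

```python
import numpy as np


def kalman_predict(mean, var, process_var):
    """Predict step: identity dynamics, so only the uncertainty grows."""
    return mean, var + process_var


def kalman_update(mean, var, z, meas_var, observed):
    """Update step applied only to cells inside the field of view."""
    gain = np.where(observed, var / (var + meas_var), 0.0)
    new_mean = mean + gain * (z - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var


def winner_take_all(mean, var):
    """Hypothetical utility: fixate the cell with the highest estimated saliency."""
    return int(np.argmax(mean))


def most_uncertain(mean, var):
    """Hypothetical utility: fixate the cell whose estimate is least certain."""
    return int(np.argmax(var))


def simulate(utility, steps=6):
    """Run a fixation sequence and return (visited cells, final mean, final variance)."""
    truth = np.array([0.2, 0.9, 0.1, 0.4, 0.3])  # illustrative true saliency per cell
    mean = np.zeros(truth.size)                  # prior belief: nothing is salient
    var = np.ones(truth.size)                    # prior belief: equally uncertain everywhere
    visited = []
    for _ in range(steps):
        mean, var = kalman_predict(mean, var, process_var=0.05)
        target = utility(mean, var)
        observed = np.zeros(truth.size, dtype=bool)
        observed[target] = True                  # the sensor sees one cell per fixation
        mean, var = kalman_update(mean, var, truth, meas_var=0.01, observed=observed)
        visited.append(target)
    return visited, mean, var


if __name__ == "__main__":
    print("winner-take-all:", simulate(winner_take_all)[0])
    print("most uncertain :", simulate(most_uncertain)[0])
```

Under these assumptions the winner-take-all rule keeps returning to the first cell it happens to measure, because no other cell's estimate ever rises above its prior, whereas the variance-driven rule sweeps the scene and revisits cells as their uncertainty grows again. The sketch omits inhibition of return, which traditional saliency models use to avoid this kind of lock-on.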



  1. An alternative term for the internal belief is the state estimate.

  2. Here \(\varvec{w} \sim \mathcal {N}\left( \varvec{\mu },\varvec{\varSigma }\right) \) means that the random variable \(\varvec{w}\) follows the normal distribution, which is completely defined by its mean \(\varvec{\mu }\) and covariance \(\varvec{\varSigma }\). For an \(n\)-dimensional \(\varvec{w}\) the density is

    $$\begin{aligned} p\left( \varvec{w}\right) = \frac{1}{\left( 2\pi \right) ^{n/2}{|\varvec{\varSigma }|}^{1/2}} e^{-\frac{1}{2}\left( \varvec{w}-\varvec{\mu }\right) ^{\mathsf {T}}\varvec{\varSigma }^{-1}\left( \varvec{w}-\varvec{\mu }\right) }, \end{aligned}$$

    where the normalising factor reduces to \(2\pi {|\varvec{\varSigma }|}^{1/2}\) in the two-dimensional case.
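As a numerical check of the density in footnote 2, the following sketch evaluates the multivariate normal density for a general dimension \(n\), for which the normalising constant is \(\left( 2\pi \right) ^{n/2}{|\varvec{\varSigma }|}^{1/2}\) (that is, \(2\pi {|\varvec{\varSigma }|}^{1/2}\) when \(n=2\)). The function name `gaussian_pdf` and its argument names are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def gaussian_pdf(w, mu, sigma):
    """Evaluate the multivariate normal density N(mu, sigma) at the point w."""
    w = np.asarray(w, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n = mu.size
    diff = w - mu
    # Mahalanobis term (w - mu)^T sigma^{-1} (w - mu), computed without an explicit inverse.
    maha = diff @ np.linalg.solve(sigma, diff)
    # Normalising constant (2*pi)^(n/2) * |sigma|^(1/2).
    norm = np.sqrt((2.0 * np.pi) ** n * np.linalg.det(sigma))
    return float(np.exp(-0.5 * maha) / norm)


if __name__ == "__main__":
    # Standard 2-D Gaussian evaluated at its mean.
    print(gaussian_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2)))
```

For a standard two-dimensional Gaussian evaluated at its mean, the value should equal \(1/(2\pi )\), which gives a quick sanity check on the normalising constant.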



Author information


Corresponding author

Correspondence to Arindam Bhakta.

About this article

Cite this article

Bhakta, A., Hollitt, C., Browne, W.N. et al. Utility function generated saccade strategies for robot active vision: a probabilistic approach. Auton Robot 43, 947–966 (2019).


Keywords

  • Sensor planning
  • Active perception
  • Next-best view planning