Cognitive Computation, Volume 6, Issue 1, pp 125–143

Region-Based Artificial Visual Attention in Space and Time

  • Jan Tünnermann
  • Bärbel Mertsching


Mobile robots have to deal with an enormous amount of visual data containing static and dynamic stimuli. Depending on the task, only small portions of a scene are relevant. Artificial attention systems filter this information at early processing stages. Among the various methods proposed to implement such systems, the region-based approach has proven to be robust and especially suited for integrating top-down influences. This concept was recently transferred to the spatiotemporal domain to obtain motion saliency. A full-featured integration of the spatial and spatiotemporal systems is presented here. We propose a biologically inspired two-stream system, which allows the use of different spatial and temporal resolutions and the extraction of spatiotemporal saliency at early stages. We compare the output to classic models and demonstrate the flexibility of the integrated approach in different experiments. These include online processing of continuous input, a task similar to thumbnail extraction, and a top-down task of selecting specific moving and non-moving objects.


Keywords: Spatiotemporal saliency · Motion saliency · Visual attention · Region-based attention



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

GET Lab, University of Paderborn, Paderborn, Germany
