
A Biologically Inspired Framework for Visual Information Processing and an Application on Modeling Bottom-Up Visual Attention

Published in: Cognitive Computation

Abstract

Background

An emerging trend in visual information processing is to incorporate properties of the primate ventral stream in order to address limitations of current machine learning algorithms. Selective attention and cortical magnification are two such phenomena, and both have been the subject of a large body of research in recent years. In this paper, we focus on designing a new model for visual acquisition that takes these properties into account.

Methods

We propose a new framework for visual information acquisition and representation that emulates the architecture of the primate visual system. It integrates features such as retinal sampling and cortical magnification while avoiding the spatial deformations and other side effects produced by earlier models that implemented these two features. It also explicitly integrates the notion of visual angle, which is rarely taken into account by vision models. We argue that this framework can provide the infrastructure for vision tasks such as object recognition, as well as for computational visual attention algorithms.
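To give a feel for this kind of acquisition scheme, the sketch below shows one way eccentricity-dependent sampling can be emulated without spatial deformation: every pixel keeps its original position, but each is averaged over a neighborhood whose radius grows inversely with a cortical magnification factor. This is a minimal illustrative sketch, not the paper's exact scheme; the inverse-linear form of M(E), the parameter e2, and the function names are our own assumptions.

```python
import numpy as np

def magnification(ecc_deg, e2=2.5):
    # Assumed inverse-linear cortical magnification factor M(E):
    # 1 at the fovea, halved at eccentricity e2 (in degrees).
    return 1.0 / (1.0 + ecc_deg / e2)

def foveated_sample(img, ppd, base_radius=1.0):
    # img: 2-D grayscale array; ppd: pixels per degree of visual angle
    # (see the visual-angle example in the Conclusion).
    # Each output pixel keeps its position but averages a neighborhood
    # whose radius grows as 1 / M(eccentricity): coarse in the
    # periphery, sharp at the center, with no log-polar warping.
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = np.empty((h, w), dtype=float)
    for y in range(h):
        for x in range(w):
            ecc = np.hypot(y - cy, x - cx) / ppd  # eccentricity in degrees
            r = int(round(base_radius / magnification(ecc)))
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            out[y, x] = img[y0:y1, x0:x1].mean()
    return out
```

The important property, matching the framework's stated goal, is that resolution falls with eccentricity while image geometry stays intact; log-polar mappings achieve a similar data reduction but at the cost of the deformations mentioned above.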

Results

To demonstrate the utility of the proposed vision framework, we present a bottom-up saliency prediction algorithm built on top of it. We evaluate this model on the MIT saliency benchmark and show that it attains state-of-the-art performance while offering advantages over other models.
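For context on the evaluation, the MIT saliency benchmark scores models with several fixation-based metrics. As an illustration (not the benchmark's official code; the function and array conventions are ours), here is a minimal sketch of Normalized Scanpath Saliency (NSS), one of the standard metrics: z-score the predicted map and average it at the recorded human fixations.

```python
import numpy as np

def nss(saliency, fixations):
    # Normalized Scanpath Saliency: z-score the predicted saliency map,
    # then average its values at the human fixation locations.
    # saliency: 2-D float array; fixations: (N, 2) integer (row, col).
    s = (saliency - saliency.mean()) / (saliency.std() + 1e-8)
    return s[fixations[:, 0], fixations[:, 1]].mean()
```

Higher NSS means the predicted map is more concentrated on the locations people actually fixated; a chance-level map scores near zero.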

Conclusion

The main contributions of this paper are summarized as follows:

  1. A new bio-inspired framework for visual information acquisition and representation that (a) takes the distance between an image and the viewer into account through a visual angle parameter, which most visual acquisition models ignore (a worked example follows this list), and (b) reduces the amount of visual information acquired by introducing a new scheme for emulating the retinal sampling and cortical magnification effects observed in the ventral stream.

  2. A concrete application of the proposed framework as the substrate for a new saliency-based visual attention model, which is shown to attain state-of-the-art performance on the MIT saliency benchmark.

  3. An online Git repository that implements the introduced framework and is meant to be developed as a scalable, collaborative project.
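As a worked example of the visual angle parameter in contribution 1a, the snippet below converts viewing geometry into degrees of visual angle and a pixels-per-degree factor using the standard relation θ = 2·arctan(size / (2·distance)). The function name and the example numbers are illustrative, not taken from the paper.

```python
import numpy as np

def visual_angle_deg(size, distance):
    # Angle subtended by an object of physical size `size` viewed from
    # `distance` (same units): theta = 2 * arctan(size / (2 * distance)).
    return np.degrees(2.0 * np.arctan2(size / 2.0, distance))

# Example: a 30 cm wide image viewed from 57 cm subtends ~29.5 degrees;
# rendered at 1024 px wide, that is ~34.7 pixels per degree.
theta = visual_angle_deg(30.0, 57.0)
ppd = 1024.0 / theta
print(f"visual angle = {theta:.1f} deg, {ppd:.1f} px/deg")
```

A 57 cm viewing distance is a common psychophysics convention because 1 cm on the screen then subtends almost exactly 1 degree.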

Notes

  1. https://bitbucket.org/ala_aboudib/see.

Funding

This study was funded by the European Research Council under the European Union’s Seventh Framework Program (FP7/2007-2013)/ERC Grant Agreement No. 290901.

Author information

Correspondence to Ala Aboudib.

Ethics declarations

Conflict of interest

Ala Aboudib, Vincent Gripon and Gilles Coppin declare that they have no conflict of interest.

Informed Consent

Informed consent was not required as no human participants or animals were involved.

Human and Animal Rights

This article does not contain any studies with human or animal subjects performed by any of the authors.

About this article

Cite this article

Aboudib, A., Gripon, V. & Coppin, G. A Biologically Inspired Framework for Visual Information Processing and an Application on Modeling Bottom-Up Visual Attention. Cogn Comput 8, 1007–1026 (2016). https://doi.org/10.1007/s12559-016-9430-8
