KI - Künstliche Intelligenz, Volume 29, Issue 1, pp 9–18

What We Can Learn From the Primate’s Visual System

  • Norbert Krüger
  • Michael Zillich (corresponding author)
  • Peter Janssen
  • Anders Glent Buch
Technical Contribution


Abstract

In this review, we discuss the impact (or lack thereof) that biologically motivated vision has had on computer vision over the last decades. We then summarize a number of computer vision and robotics problems for which biological models can indicate how they might be addressed. Finally, we summarize important findings about the primate visual system and draw from them a number of conclusions for the development of algorithms.





Acknowledgments

Norbert Krüger was supported by the EU Cognitive Systems project XPERIENCE (FP7-ICT-270273) and the DSF project patient@home. Michael Zillich was supported by the EU projects SQUIRREL (FP7-ICT-610532) and STRANDS (FP7-ICT-600623) and by the Austrian Science Fund (FWF) grant No. TRP 139-N23 InSitu. Many thanks to Antonio Rodriguez Sanchez for his work on Fig. 1 and to Laurenz Wiskott for his work on Fig. 2. Many thanks also to IEEE for permission to re-use these figures from [48].


References

  1. Agarwal S, Snavely N, Simon I, Seitz SM, Szeliski R (2009) Building Rome in a day. In: International Conference on Computer Vision (ICCV). pp 72–79
  2. Aldoma A, Fäulhammer T, Vincze M (2014) Automation of “ground truth” annotation for multi-view RGB-D object instance recognition datasets. In: International Conference on Robotics and Automation (ICRA). pp 5016–5023
  3. Ambrus R, Bore N, Folkesson J, Jensfelt P (2014) Meta-rooms: building and maintaining long term spatial models in a dynamic world. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp 1854–1861
  4. Andreopoulos A, Tsotsos JK (2013) 50 years of object recognition: directions forward. Comput Vision Image Underst 117(8):827–891
  5. Arterberry ME, Yonas A, Bensen AS (1989) Self-produced locomotion and the development of responsiveness to linear perspective and texture gradients. Dev Psychol 25:976–982
  6. Bay H, Ess A, Tuytelaars T, Van Gool L (2008) SURF: speeded up robust features. Comput Vision Image Underst 110(3):346–359
  7. Bengio S, Deng L, Larochelle H, Lee H, Salakhutdinov R (guest eds) (2013) Special section on learning deep architectures. Pattern Anal Mach Intell IEEE Trans 35(8)
  8. Berkes P, Wiskott L (2005) Slow feature analysis yields a rich repertoire of complex cell properties. J Vision 5(6):579–602
  9. Borji A, Itti L (2013) State-of-the-art in visual attention modeling. Pattern Anal Mach Intell IEEE Trans 35(1):185–207
  10. Borji A, Sihite DN, Itti L (2012) Probabilistic learning of task-specific visual attention. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp 470–477
  11. Boyer KL, Sarkar S (1999) Perceptual organization in computer vision: status, challenges, and potential (guest editorial). Comput Vision Image Underst 76(1):1–5
  12. Canny JF (1986) A computational approach to edge detection. Pattern Anal Mach Intell IEEE Trans 8(6):679–698
  13. Chatfield K, Simonyan K, Vedaldi A, Zisserman A (2014) Return of the devil in the details: delving deep into convolutional nets. In: British Machine Vision Conference (BMVC)
  14. Chuang AT, Margo CE, Greenberg PB (2014) Retinal implants: a systematic review. Br J Ophthalmol 98:852–856
  15. Criminisi A, Blake A, Rother C, Shotton J, Torr P (2007) Efficient dense stereo with occlusions for new view-synthesis by four-state dynamic programming. Int J Comput Vision 71(1):89–110
  16. Cummins M, Newman P (2010) Appearance-only SLAM at large scale with FAB-MAP 2.0. Int J Robot Res
  17. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol 2. pp 886–893
  18. Dame A, Prisacariu VA, Ren CY, Reid I (2013) Dense reconstruction using 3D object shape priors. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp 1288–1295
  19. DeMenthon DF, Davis LS (1995) Model-based object pose in 25 lines of code. Int J Comput Vision 15(1–2):123–141
  20. Dickinson S (2009) The evolution of object categorization and the challenge of image abstraction. In: Dickinson S, Leonardis A, Schiele B, Tarr M (eds) Object categorization: computer and human vision perspectives. Cambridge University Press, pp 1–37
  21. Dickinson S, Levinshtein A, Sala P, Sminchisescu C (2013) The role of mid-level shape priors in perceptual grouping and image abstraction. In: Dickinson S, Pizlo Z (eds) Shape perception in human and computer vision: an interdisciplinary perspective. Springer
  22. Fang F, Boyaci H, Kersten D (2009) Border ownership selectivity in human early visual cortex and its modulation by attention. J Neurosci 29(2):460–465
  23. Faugeras OD (1993) Three-dimensional computer vision. MIT Press, Cambridge
  24. Fidler S, Boben M, Leonardis A (2010) A coarse-to-fine taxonomy of constellations for fast multi-class object detection. In: European Conference on Computer Vision (ECCV)
  25. Freedman DJ, Assad JA (2006) Experience-dependent representation of visual categories in parietal cortex. Nature 443:85–88
  26. Frintrop S, Rome E, Christensen H (2010) Computational visual attention systems and their cognitive foundations: a survey. ACM Trans Appl Percept (TAP) 7(1):1–46
  27. Geman S, Bienenstock E, Doursat R (1992) Neural networks and the bias/variance dilemma. Neural Comput 4:1–58
  28. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  29. Gordon I, Lowe DG (2006) What and where: 3D object recognition with accurate pose. In: Ponce J, Hebert M, Schmid C, Zisserman A (eds) Toward category-level object recognition. Springer, pp 67–82
  30. Hager GD, Wegbreit B (2011) Scene parsing using a prior world model. Int J Robot Res
  31. Hartley RI, Zisserman A (2000) Multiple view geometry in computer vision. Cambridge University Press, Cambridge
  32. Herbst E, Ren X, Fox D (2011) RGB-D object discovery via multi-scene analysis. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
  33. Hinterstoisser S, Cagniart C, Ilic S, Sturm P, Navab N, Fua P, Lepetit V (2012) Gradient response maps for real-time detection of textureless objects. Pattern Anal Mach Intell IEEE Trans 34(5):876–888
  34. Hinton GE, Osindero S, Teh Y-W (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554
  35. Hochberg LR, Bacher D, Jarosiewicz B, Masse NY, Simeral JD, Vogel J, Haddadin JS, Liu J, Cash SS, van der Smagt P, Donoghue JP (2012) Reach and grasp by people with tetraplegia using a neurally controlled robotic arm. Nature 485:372–375
  36. Hubel DH, Wiesel TN (1962) Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J Physiol 160:106–154
  37. Hubel DH, Wiesel TN (1969) Anatomical demonstration of columns in the monkey striate cortex. Nature 221:747–750
  38. Hummel J, Biederman I (1992) Dynamic binding in a neural network for shape recognition. Psychol Rev 99:480–517
  39. Johnson AE, Hebert M (1999) Using spin images for efficient object recognition in cluttered 3D scenes. Pattern Anal Mach Intell IEEE Trans 21(5):433–449
  40. Kayser C, Körding KP, König P (2004) Processing of complex stimuli and natural scenes in the visual cortex. Curr Opin Neurobiol 14(4):468–473
  41. Kellman PJ, Arterberry ME (1998) The cradle of knowledge. MIT Press, Cambridge
  42. Klein G, Murray D (2007) Parallel tracking and mapping for small AR workspaces. In: Sixth IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR). Nara, Japan, pp 225–234
  43. König P, Krüger N (2006) Perspectives: symbols as self-emergent entities in an optimization process of feature extraction and predictions. Biol Cybern 94(4):325–334
  44. Kraft D, Pugeault N, Başeski M, Popović M, Kragic D, Kalkan S, Wörgötter F, Krüger N (2009) Birth of the object: detection of objectness and extraction of object shape through object action complexes. Int J Humanoid Robot 5:247–265
  45. Krainin M, Henry P, Ren X, Fox D (2010) Manipulator and object tracking for in-hand model acquisition. In: IEEE International Conference on Robotics and Automation (ICRA)
  46. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems. pp 1–9
  47. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems 25 (NIPS 2012). Curran Associates Inc, pp 1097–1105
  48. Krüger N, Janssen P, Kalkan S, Lappe M, Leonardis A, Piater J, Rodríguez-Sánchez AJ, Wiskott L (2013) Deep hierarchies in the primate visual cortex: what can we learn for computer vision? Pattern Anal Mach Intell IEEE Trans 35(8):1847–1871
  49. Krüger N, von der Malsburg C (2015) A required paradigm shift in today’s vision research: interview with Prof. Christoph von der Malsburg. Künstliche Intelligenz (special issue on bio-inspired vision systems)
  50. Krüger N, Wörgötter F (2004) Statistical and deterministic regularities: utilisation of motion and grouping in biological and artificial visual systems. Adv Imaging Electron Phys 131:82–147
  51. Leung T, Malik J (1998) Contour continuity in region based image segmentation. In: European Conference on Computer Vision (ECCV). pp 544–559
  52. Lowe DG (1987) Three-dimensional object recognition from single two-dimensional images. Artif Intell 31(3):355–395
  53. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vision 60(2):91–110
  54. Marr D (1982) Vision: a computational investigation into the human representation and processing of visual information. WH Freeman
  55. Mian AS, Bennamoun M, Owens R (2006) Three-dimensional model-based object recognition and segmentation in cluttered scenes. Pattern Anal Mach Intell IEEE Trans 28(10):1584–1601
  56. Navalpakkam V, Itti L (2006) An integrated model of top-down and bottom-up attention for optimal object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New York, pp 2049–2056
  57. Newcombe RA, Davison AJ (2010) Live dense reconstruction with a single moving camera. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp 1498–1505
  58. Newcombe RA, Lovegrove SJ, Davison AJ (2011) DTAM: dense tracking and mapping in real-time. In: IEEE International Conference on Computer Vision (ICCV)
  59. Olshausen BA, Field D (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381:607–609
  60. Osada R, Funkhouser T, Chazelle B, Dobkin D (2002) Matching 3D models with shape distributions. In: International Conference on Shape Modeling and Applications (SMI)
  61. Pizzoli M, Forster C, Scaramuzza D (2014) REMODE: probabilistic, monocular dense reconstruction in real time. In: IEEE International Conference on Robotics and Automation (ICRA)
  62. Pugeault N, Wörgötter F, Krüger N (2010) Visual primitives: local, condensed, and semantically rich visual descriptors and their applications in robotics. Int J Humanoid Robot 7(3):379–405
  63. Richtsfeld A, Mörwald T, Prankl J, Zillich M, Vincze M (2014) Learning of perceptual grouping for object segmentation on RGB-D data. J Vis Commun Image Represent 25(1):64–73
  64. Rosten E, Porter R, Drummond T (2010) Faster and better: a machine learning approach to corner detection. Pattern Anal Mach Intell IEEE Trans 32(1):105–119
  65. Rumelhart D, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323:533–536
  66. Russakovsky O, Deng J, Huang Z, Berg AC, Fei-Fei L (2013) Detecting avocados to zucchinis: what have we done, and where are we going? In: IEEE International Conference on Computer Vision (ICCV). pp 2064–2071
  67. Rusu RB, Bradski G, Thibaux R, Hsu J (2010) Fast 3D recognition and pose using the viewpoint feature histogram. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
  68. Sala P, Dickinson S (2010) Contour grouping and abstraction using simple part models. In: European Conference on Computer Vision (ECCV). pp 603–616
  69. Salti S, Tombari F, Di Stefano L (2014) SHOT: unique signatures of histograms for surface and texture description. Comput Vision Image Underst 125:251–264
  70. Sarkar S, Boyer KL (1993) Perceptual organization in computer vision: a review and a proposal for a classificatory structure. IEEE Trans Syst Man Cybern 23(2):382–399
  71. Silberman N, Hoiem D, Kohli P, Fergus R (2012) Indoor segmentation and support inference from RGBD images. In: European Conference on Computer Vision (ECCV). pp 746–760
  72. Sinha SN, Scharstein D, Szeliski R (2014) Efficient high-resolution stereo matching using local plane sweeps. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp 1582–1589
  73. Stuehmer J, Gumhold S, Cremers D (2010) Real-time dense geometry from a handheld camera. In: Proceedings of the DAGM Symposium on Pattern Recognition
  74. Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I, Fergus R (2014) Intriguing properties of neural networks. In: International Conference on Learning Representations (ICLR)
  75. Tanaka K (1993) Neuronal mechanisms of object recognition. Science 262:685–688
  76. Tola E, Lepetit V, Fua P (2010) DAISY: an efficient dense descriptor applied to wide-baseline stereo. Pattern Anal Mach Intell IEEE Trans 32(5):815–830
  77. Tombari F, Salti S, Di Stefano L (2010) Unique signatures of histograms for local surface description. In: European Conference on Computer Vision (ECCV). Springer, pp 356–369
  78. Tuytelaars T, Mikolajczyk K (2008) Local invariant feature detectors: a survey. Found Trends Comput Graph Vision 3(3):1–104
  79. Ückermann A, Elbrechter C, Haschke R, Ritter H (2014) Real-time hierarchical scene segmentation and classification. In: IEEE-RAS International Conference on Humanoid Robots (Humanoids)
  80. Vapnik VN (1998) Statistical learning theory. Adaptive and learning systems for signal processing. Wiley, New York
  81. Viola P, Jones MJ (2004) Robust real-time face detection. Int J Comput Vision 57(2):137–154
  82. von der Heydt R, Peterhans E, Baumgartner G (1984) Illusory contours and cortical neuron responses. Science 224:1260–1262
  83. Wagemans J, Elder JH, Kubovy M, Palmer SE, Peterson MA, Singh M, von der Heydt R (2012) A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure-ground organization. Psychol Bull 138(6)
  84. Wendel A, Maurer M, Graber G, Pock T, Bischof H (2012) Dense reconstruction on-the-fly. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp 1450–1457

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Norbert Krüger (1)
  • Michael Zillich (2)
  • Peter Janssen (3)
  • Anders Glent Buch (1)

  1. The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
  2. Automation and Control Institute, Vienna University of Technology, Vienna, Austria
  3. Laboratorium voor Neuro- en Psychofysiologie, KU Leuven, Louvain, Belgium
