Approaches and Challenges for Cognitive Vision Systems

  • Julian Eggert
  • Heiko Wersing
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5436)


A cognitive visual system is generally intended to work robustly under varying environmental conditions, adapt to a broad range of unforeseen changes, and even exhibit prospective behavior such as systematically anticipating possible visual events. These properties are unquestionably out of reach of currently available solutions. To analyze the reasons underlying this failure, in this paper we develop the idea of a vision system that flexibly controls the order and the accessibility of visual processes during operation. Vision is here understood as the dynamic process of selectively adapting visual parameters and modules as a function of underlying goals or intentions. This perspective requires a specific architectural organization, since vision is then a continuous balance between sensory stimulation and internally generated information. Furthermore, the consideration of intrinsic resource limitations, and their organization by means of an appropriate control substrate, becomes a centerpiece for the creation of truly cognitive vision systems. We outline the main concepts required for the development of such systems, and discuss modern approaches to a few selected vision subproblems, such as image segmentation, item tracking, and visual object classification, from the perspective of their integration and recruitment into a cognitive vision system.


Keywords (added by machine, not by the authors): online learning, visual memory, humanoid robot, perceptual learning, visual scene





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Julian Eggert (1)
  • Heiko Wersing (1)
  1. Honda Research Institute Europe GmbH, Offenbach, Germany
