Vision, Attention Control, and Goals Creation System

  • Konstantinos Rapantzikos
  • Yannis Avrithis
  • Stefanos Kollias
Part of the Springer Series in Cognitive and Neural Systems book series (SSCNS)


Biological visual attention has long been studied by experts in the field of cognitive psychology. The Holy Grail of this research is an exact model of the interaction between visual sensing and perception. There seems to be informal agreement on four important functions of the attention process: (a) the bottom-up process, which computes the saliency of the input stimuli; (b) the top-down process, which biases attention toward known areas or regions of predefined characteristics; (c) attentional selection, which fuses information derived from the two previous processes and enables focus; and (d) the dynamic evolution of attentional selection over time. In the following, we outline established computational solutions for each of these four functions.
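By way of illustration, the following is a minimal Python sketch of these four functions in the spirit of Koch-Ullman/Itti-style saliency architectures, not the specific model of any work surveyed here; all function names, the rectangular encoding of top-down priors, and the parameters (e.g. `gain`, `ior_radius`) are illustrative assumptions.

```python
import numpy as np

def normalize(m):
    """Scale a feature map to [0, 1]; a map with no contrast becomes all zeros."""
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

def bottom_up_saliency(feature_maps):
    """(a) Bottom-up: average normalized feature maps (e.g. intensity,
    color, orientation, or motion contrast) into one saliency map."""
    return sum(normalize(m) for m in feature_maps) / len(feature_maps)

def top_down_bias(shape, known_regions, gain=2.0):
    """(b) Top-down: multiplicatively boost regions of predefined
    characteristics, given here as (row0, row1, col0, col1) boxes."""
    bias = np.ones(shape)
    for r0, r1, c0, c1 in known_regions:
        bias[r0:r1, c0:c1] *= gain
    return bias

def select_focus(saliency, bias):
    """(c) Attentional selection: fuse both maps and pick the global peak."""
    fused = saliency * bias
    return np.unravel_index(np.argmax(fused), fused.shape), fused

def attend(feature_maps, known_regions, steps=3, ior_radius=5):
    """(d) Dynamic evolution: shift the focus over successive steps by
    suppressing already-attended locations (inhibition of return)."""
    sal = bottom_up_saliency(feature_maps)
    bias = top_down_bias(sal.shape, known_regions)
    fixations = []
    for _ in range(steps):
        (r, c), _ = select_focus(sal, bias)
        fixations.append((int(r), int(c)))
        # Zero out a disc around the current focus so attention moves on.
        rr, cc = np.ogrid[:sal.shape[0], :sal.shape[1]]
        sal[(rr - r) ** 2 + (cc - c) ** 2 <= ior_radius ** 2] = 0.0
    return fixations
```

For example, `attend([np.random.rand(64, 64) for _ in range(3)], [(10, 20, 10, 20)])` returns three fixation points, with the top-down gain pulling early fixations toward the given box and the inhibition-of-return step forcing the scanpath to evolve.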


Keywords: Video Sequence, Visual Attention, Action Recognition, Interest Point, Salient Region



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Konstantinos Rapantzikos
  • Yannis Avrithis
  • Stefanos Kollias

Image, Video and Multimedia Systems Laboratory, Computer Science Division, School of Electrical and Computer Engineering, National Technical University of Athens, Zografou, Greece
