Attention Prediction in Egocentric Video Using Motion and Visual Saliency

  • Kentaro Yamada
  • Yusuke Sugano
  • Takahiro Okabe
  • Yoichi Sato
  • Akihiro Sugimoto
  • Kazuo Hiraki
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7087)


We propose a method of predicting human egocentric visual attention using bottom-up visual saliency and egomotion information. Computational models of visual saliency are often employed to predict human attention; however, their mechanism and effectiveness have not been fully explored in egocentric vision. The purpose of our framework is to compute attention maps from an egocentric video that can be used to infer a person’s visual attention. In addition to a standard visual saliency model, two kinds of attention maps are computed, based on the camera’s rotation velocity and on its direction of movement. These rotation-based and translation-based attention maps are aggregated with a bottom-up saliency map to improve the accuracy of gaze-position prediction. The effectiveness of the proposed framework was evaluated in real environments using a head-mounted gaze tracker, and we found that the egomotion-based attention maps contributed to accurately predicting human visual attention.
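
The abstract names three ingredients (a bottom-up saliency map, a rotation-based map, and a translation-based map) but does not give their formulas. The following is a minimal sketch under stated assumptions, not the authors' formulation: the translation-based map is a Gaussian centred on the focus of expansion (FOE), estimated by least squares from sparse optical flow under a pure-translation assumption; the rotation-based map is a Gaussian displaced from the image centre toward the direction of head rotation by a hypothetical `gain`; and the maps are aggregated by a weighted sum. All names and parameters (`focus_of_expansion`, `rotation_map`, `combine`, `gain`, `sigma`, `weights`) are hypothetical, and the saliency map is taken as a given input (here random noise) rather than computed.

```python
import numpy as np


def focus_of_expansion(points, flows):
    """Least-squares focus of expansion (FOE) for a purely translating camera.

    Under pure translation each flow vector (u, v) at image point (x, y) is
    parallel to (x - fx, y - fy), so the 2-D cross product vanishes:
        v * fx - u * fy = v * x - u * y
    Stacking one equation per tracked point gives an overdetermined
    linear system A @ [fx, fy] = b.
    """
    x, y = points[:, 0], points[:, 1]
    u, v = flows[:, 0], flows[:, 1]
    A = np.stack([v, -u], axis=1)
    b = v * x - u * y
    foe, *_ = np.linalg.lstsq(A, b, rcond=None)
    return foe  # (fx, fy) in pixel coordinates


def gaussian_map(shape, center, sigma):
    """Isotropic 2-D Gaussian over an (h, w) image, centred at (cx, cy)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def rotation_map(shape, yaw_rate, pitch_rate, gain=100.0, sigma=40.0):
    """HYPOTHETICAL rotation-based map: a Gaussian displaced from the image
    centre toward the direction of head rotation (rates in rad/s, gain in
    pixels per rad/s). Positive yaw means turning right and positive pitch
    means turning down, in image coordinates."""
    h, w = shape
    center = (w / 2.0 + gain * yaw_rate, h / 2.0 + gain * pitch_rate)
    return gaussian_map(shape, center, sigma)


def normalise(m):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    m = m - m.min()
    peak = m.max()
    return m / peak if peak > 0 else m


def combine(saliency, rot, trans, weights=(1.0, 1.0, 1.0)):
    """HYPOTHETICAL aggregation: normalised weighted sum of the three maps."""
    maps = (normalise(saliency), normalise(rot), normalise(trans))
    return normalise(sum(wt * m for wt, m in zip(weights, maps)))


# Example on synthetic data: a 160x120 frame whose true FOE is (80, 60).
rng = np.random.default_rng(0)
shape = (120, 160)
points = rng.uniform([0.0, 0.0], [160.0, 120.0], size=(50, 2))
flows = 0.05 * (points - np.array([80.0, 60.0]))  # radial flow away from the FOE

foe = focus_of_expansion(points, flows)
trans = gaussian_map(shape, foe, sigma=40.0)             # translation-based map
rot = rotation_map(shape, yaw_rate=0.3, pitch_rate=0.0)  # rotation-based map
saliency = rng.random(shape)                             # stand-in for bottom-up saliency
combined = combine(saliency, rot, trans)
```

A weighted sum keeps each cue's contribution explicit and lets the weights be tuned against recorded gaze; a pixel-wise product would be an equally plausible aggregation under these assumptions.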


Keywords: visual saliency, visual attention, first-person vision, camera motion estimation



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Kentaro Yamada (1)
  • Yusuke Sugano (1)
  • Takahiro Okabe (1)
  • Yoichi Sato (1)
  • Akihiro Sugimoto (2)
  • Kazuo Hiraki (3)

  1. The University of Tokyo, Tokyo, Japan
  2. National Institute of Informatics, Tokyo, Japan
  3. The University of Tokyo, Tokyo, Japan
