Modelling Spatio-Temporal Saliency to Predict Gaze Direction for Short Videos

  • Sophie Marat
  • Tien Ho Phuoc
  • Lionel Granjon
  • Nathalie Guyader
  • Denis Pellerin
  • Anne Guérin-Dugué
Short Paper


This paper presents a spatio-temporal saliency model that predicts eye movement during video free viewing. This model is inspired by the biology of the first steps of the human visual system. The model extracts two signals from video stream corresponding to the two main outputs of the retina: parvocellular and magnocellular. Then, both signals are split into elementary feature maps by cortical-like filters. These feature maps are used to form two saliency maps: a static and a dynamic one. These maps are then fused into a spatio-temporal saliency map. The model is evaluated by comparing the salient areas of each frame predicted by the spatio-temporal saliency map to the eye positions of different subjects during a free video viewing experiment with a large database (17000 frames). In parallel, the static and the dynamic pathways are analyzed to understand what is more or less salient and for what type of videos our model is a good or a poor predictor of eye movement.


Saliency Spatio-temporal model Gaze prediction Video viewing 


  1. Beaudot, W. H. (1994). The neural information in the vertebra retina: a melting pot of ideas for artificial vision. PHD thesis, Tirf laboratory, Grenoble, France. Google Scholar
  2. Beaudot, W. H. A., Palagi, P., & Hérault, J. (1993). Realistic simulation tool for early visual processing including space, time and colour data. In Lecture notes in computer science : Vol. 686. IWANN (pp. 370–375). Barcelona, June 1993. Berlin: Springer. Google Scholar
  3. Bruno, E., & Pellerin, D. (2002). Robust motion estimation using spatial Gabor-like filters. Signal Processing, 82, 297–309. MATHCrossRefGoogle Scholar
  4. Carmi, R., & Itti, L. (2006). Visual causes versus correlates of attentional selection in dynamic scenes. Vision Research, 46, 4333–4345. CrossRefGoogle Scholar
  5. Daugman, J. G. (1980). Two-dimensional spectral analysis of cortical receptive field profiles. Vision Research, 20, 847–856. CrossRefGoogle Scholar
  6. DeValois, R. L. (1991). Orientation and spatial frequency selectivity: properties and modular organization. In A. Valberg & B. B. Lee (Eds.). From pigment to perception. New York: Plenum. Google Scholar
  7. Egeth, H. E., & Yantis, S. (1997). Visual attention: control representation and time course. Annual Review of Psychology, 48, 269–297. CrossRefGoogle Scholar
  8. Guironnet, M., Pellerin, D., Guyader, N., & Ladret, P. (2007). Video summarization based on camera motion and a subjective evaluation method. EURASIP Journal on Image and Video Processing, 2007, Article ID 60245, 12 pages. Google Scholar
  9. Hansen, T., Sepp, W., & Neumann, H. (2001). Recurrent long-range interactions in early vision. In Lecture notes in computer science/Lecture notes in artificial intelligence : Vol. 2036. Emergent neural computational architectures based on neuroscience (pp. 139–153). Berlin: Springer. Google Scholar
  10. Henderson, J. M. (2003). Human gaze control during real-world scene perception. Trends in Cognitive Sciences, 7, 498–504. CrossRefGoogle Scholar
  11. Hubel, D. H., & Wiesel, T. N. (1977). Functional architecture of macaque visual cortex. Proceedings of the Royal Society of London, B, 198, 1–59. CrossRefGoogle Scholar
  12. Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 1254–1259. CrossRefGoogle Scholar
  13. Koch, C., & Ullman, S. (1985). Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology, 4, 219–227. Google Scholar
  14. Le Meur, O., Le Callet, P., & Barba, D. (2006). A coherent computational approach to model bottom-up visual attention. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 802–817. CrossRefGoogle Scholar
  15. Le Meur, O., Le Callet, P., & Barba, D. (2007). Predicting visual fixations on video based on low-level visual features. Vision Research, 47, 2483–2498. CrossRefGoogle Scholar
  16. Lisberger, S. G., Morris, E. J., & Tychsen, L. (1987). Visual motion processing and sensory-motor integration for smooth pursuit eye movements. Annual Review—Neuroscience, 10, 97–129. CrossRefGoogle Scholar
  17. Ma, Y.-F., Hua, X.-S., Lu, L., & Zhang, H.-J. (2005). A generic framework of user attention model and its application in video summarization. IEEE Transactions on Multimedia, 7. Google Scholar
  18. Marat, S., Ho Phuoc, T., Granjon, L., Guyader, N., Pellerin, D., & Guérin-Dugué, A. (2008). Spatiotemporal saliency model to predict eye movements in video free viewing. In EUSIPCO’08—16th European signal processing conference, Lausanne, Switzerland, 2008. Google Scholar
  19. Massot, C., & Hérault, J. (2008). Model of frequency analysis in the visual cortex and the shape from texture problem. International Journal of Computer Vision, 76, 165–182. CrossRefGoogle Scholar
  20. Milanese, R., Wechsler, H., Gil, S., Bost, J.-M., & Pun, T. (1994). Integration of bottom-up and top-down cues for visual attention using non-linear relaxation. In Proc. CVPR (pp. 781–785) 1994. Google Scholar
  21. Odobez, J.-M., & Bouthemy, P. (1995). Robust multiresolution estimation of parametric motion models. Journal of Visual Communication and Image Representation, 6, 348–365. CrossRefGoogle Scholar
  22. Palmer, S. E. (1999). Vision science: photons to phenomenology (1st edn.). Cambridge: MIT Press. Google Scholar
  23. Parkhurst, D., Law, K., & Niebur, E. (2002). Modeling the role of salience in the allocation of overt visual attention. Vision Research, 42, 107–123. CrossRefGoogle Scholar
  24. Peters, R. J., & Itti, L. (2008). Applying computational tools to predict gaze direction in interactive visual environments. ACM Transactions on Applied Perception, 5. Google Scholar
  25. Peters, R. J., Iyer, A., Itti, L., & Koch, C. (2005). Components of bottom-up gaze allocation in natural images. Vision Research, 45, 2397–2416. CrossRefGoogle Scholar
  26. Rajashekar, U., Cormack, L. K., & Bovik, A. C. (2004). Point of gaze analysis reveals visual search strategies. In Proceedings of SPIE : Vol. 5292. Human vision and electronic imaging IX 2004 (pp. 296–306). Bellingham: SPIE Press. CrossRefGoogle Scholar
  27. Reinagel, P., & Zador, A. (1999). Natural scene statistics at the center of gaze. Network: Computation in Neural Systems, 10, 341–350. MATHCrossRefGoogle Scholar
  28. Schwartz, S. H. (2004). Visual perception: a clinical orientation (3rd edn.). New-York: McGraw-Hill. Google Scholar
  29. Tatler, B. W., Baddeley, R. J., & Gilchrist, I. D. (2005). Visual correlates of fixation selection: effects of scale and time. Vision Research, 45, 643–659. CrossRefGoogle Scholar
  30. Torralba, A., Oliva, A., Castelhano, M. S., & Henderson, J. M. (2006). Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. Psychological Review, 113, 766–786. CrossRefGoogle Scholar
  31. Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136. CrossRefGoogle Scholar
  32. Tsotsos, J. K., Culhane, S. M., Winky, Y. K. W., Lai, Y., Davis, N., & Nuflo, F. (1995). Modeling visual attention via selective tuning. Artificial Intelligence, 78, 507–545. CrossRefGoogle Scholar
  33. Tsotsos, J. K., Rodríguez-Sánchez, A. J., Rothenstein, A. L., & Simine, E. (2008). The different stages of visual recognition need different attentional binding strategies. Brain Research, 1225, 119–132. CrossRefGoogle Scholar
  34. Wolfe, J. M., Alvarez, G. A., & Horowitz, T. S. (2000). Attention is fast but volition is slow. Nature, 406, 691. CrossRefGoogle Scholar
  35. Wolfe, J. M., Cave, K. R., & Franzel, S. L. (2006). Guided search: an alternative to the feature integration model for visual search. Journal of Experimental Psychology: Human Perception and Performance, 15, 419–433. CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Sophie Marat
    • 1
  • Tien Ho Phuoc
    • 1
  • Lionel Granjon
    • 1
  • Nathalie Guyader
    • 1
  • Denis Pellerin
    • 1
  • Anne Guérin-Dugué
    • 1
  1. 1.GIPSA-Lab/Department Images-SignalGrenoble CedexFrance

Personalised recommendations