Abstract
This paper presents a spatio-temporal saliency model that predicts eye movements during video free viewing. The model is inspired by the biology of the early stages of the human visual system. It extracts two signals from the video stream corresponding to the two main outputs of the retina: the parvocellular and the magnocellular signals. Both signals are then split into elementary feature maps by cortical-like filters. These feature maps are used to form two saliency maps, one static and one dynamic, which are finally fused into a single spatio-temporal saliency map. The model is evaluated by comparing the salient areas of each frame, as predicted by the spatio-temporal saliency map, to the eye positions of different subjects during a free-viewing experiment on a large video database (17,000 frames). In parallel, the static and dynamic pathways are analyzed separately to understand which features are more or less salient and for which types of video the model is a good or a poor predictor of eye movements.
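To make the described pipeline concrete, the sketch below implements the same sequence of stages in Python with NumPy and SciPy. It is a minimal illustration under stated assumptions, not the authors' implementation: the retina split is approximated here by Gaussian high-pass filtering and frame differencing, the cortical-like filters by a small bank of oriented Gabor energy maps, and both the map normalization and the equal-weight static/dynamic fusion are placeholder choices.

```python
import numpy as np
from scipy import ndimage

def retina_split(frame, prev_frame, sigma=2.0):
    """Crude retina-like decomposition (illustrative assumption):
    parvocellular ~ fine spatial detail, magnocellular ~ temporal change."""
    low = ndimage.gaussian_filter(frame, sigma)
    parvo = frame - low                   # high-pass spatial structure
    magno = np.abs(frame - prev_frame)    # transient (motion) signal
    return parvo, magno

def gabor_energy_maps(signal, n_orient=4, freq=0.25, size=15):
    """Cortical-like filtering: energy of a small bank of oriented
    Gabor filters, one elementary feature map per orientation."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    envelope = np.exp(-(x**2 + y**2) / (2.0 * (size / 5.0) ** 2))
    maps = []
    for k in range(n_orient):
        theta = k * np.pi / n_orient
        xr = x * np.cos(theta) + y * np.sin(theta)
        kernel = envelope * np.cos(2.0 * np.pi * freq * xr)
        maps.append(np.abs(ndimage.convolve(signal, kernel)))
    return maps

def fuse(feature_maps):
    """Normalize each map to [0, 1] and average (placeholder fusion)."""
    acc = np.zeros_like(feature_maps[0])
    for m in feature_maps:
        span = np.ptp(m)
        if span > 0:
            acc += (m - m.min()) / span
    return acc / len(feature_maps)

def spatio_temporal_saliency(frame, prev_frame):
    """Full pipeline: retina split -> cortical-like maps -> static and
    dynamic saliency maps -> spatio-temporal fusion (equal weights
    assumed here for illustration)."""
    parvo, magno = retina_split(frame, prev_frame)
    static = fuse(gabor_energy_maps(parvo))
    dynamic = fuse(gabor_energy_maps(magno))
    return 0.5 * static + 0.5 * dynamic

# Toy usage on synthetic frames: a patch appearing between frames should
# dominate the dynamic pathway and hence the final saliency map.
rng = np.random.default_rng(0)
prev = rng.random((64, 64))
curr = prev.copy()
curr[20:30, 20:30] += 1.0
smap = spatio_temporal_saliency(curr, prev)
print(smap.shape, np.unravel_index(smap.argmax(), smap.shape))
```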