Decoding What People See from Where They Look: Predicting Visual Stimuli from Scanpaths

Cerf, Moran; Harel, Jonathan; Huth, Alex; Einhäuser, Wolfgang; Koch, Christof

doi:10.1007/978-3-642-00582-4_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5395)

Included in the following conference series:

International Workshop on Attention in Cognitive Systems

Abstract

Saliency algorithms are applied to correlate with the overt attentional shifts, corresponding to eye movements, made by observers viewing an image. In this study, we investigated whether saliency maps could be used to predict which image observers were viewing, given only scanpath data. The results were strong. In an experiment with 441 trials, each trial consisted of 2 images together with scanpath data (pooled over 9 subjects) belonging to one unknown image in the set. The correct image was selected in 304 trials (69%), a fraction significantly above chance but much lower than the 82.4% correctness rate achieved using scanpaths from individual subjects. This leads us to propose a new metric for quantifying the importance of saliency map features, based on discriminability between images, as well as a new method for comparing present saliency map efficacy metrics. The approach has potential applications for other kinds of prediction, e.g., categories of image content, or even subject class.
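
The decoding idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the chapter's implementation: the scoring rule (mean z-scored saliency at the fixated pixels), the function names, and the toy maps are all assumptions made for the example. The last helper checks the reported count of 304 correct trials out of 441 against the 50% chance level of a two-image choice, using an exact one-sided binomial tail.

```python
from math import comb

import numpy as np


def score_scanpath(saliency_map, fixations):
    """Mean z-scored saliency at the fixated (row, col) pixels.

    The z-scoring rule is an assumption for this sketch, not the
    scoring function used in the chapter.
    """
    s = np.asarray(saliency_map, dtype=float)
    z = (s - s.mean()) / (s.std() + 1e-12)  # epsilon guards flat maps
    rows, cols = zip(*fixations)
    return float(z[list(rows), list(cols)].mean())


def decode_image(saliency_maps, fixations):
    """Return the index of the candidate map that best explains the fixations."""
    scores = [score_scanpath(m, fixations) for m in saliency_maps]
    return int(np.argmax(scores))


def binomial_tail(successes, trials, p=0.5):
    """Exact probability of at least `successes` hits in `trials` at chance level p."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Toy example (hypothetical data): two 4x4 maps with salient regions
# in opposite corners, and a scanpath that lands in the top-left corner.
map_a = np.zeros((4, 4))
map_a[0, 0] = map_a[0, 1] = 1.0
map_b = np.zeros((4, 4))
map_b[3, 3] = map_b[3, 2] = 1.0
scanpath = [(0, 0), (0, 1)]

chosen = decode_image([map_a, map_b], scanpath)  # picks map_a (index 0)
accuracy = 304 / 441                             # counts reported in the abstract
p_value = binomial_tail(304, 441)                # chance = 0.5 for two candidates
```

With two candidates per trial, chance performance is 50%, so the binomial tail gives a rough sense of how far 304 of 441 lies from chance; the chapter may have used a different significance test.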

Author information

Authors and Affiliations

California Institute of Technology, Pasadena, CA, USA
Moran Cerf, Jonathan Harel, Alex Huth & Christof Koch
Philipps-University, Marburg, Germany
Wolfgang Einhäuser

Editor information

Editors and Affiliations

Joanneum Research, Graz, Austria
Lucas Paletta
Center for Vision Research (CVR) and Department of Computer Science and Engineering, York University, 4700 Keele St., Toronto, ON M3J 1P3, Canada
John K. Tsotsos

Cite this paper

Cerf, M., Harel, J., Huth, A., Einhäuser, W., Koch, C. (2009). Decoding What People See from Where They Look: Predicting Visual Stimuli from Scanpaths. In: Paletta, L., Tsotsos, J.K. (eds) Attention in Cognitive Systems. WAPCV 2008. Lecture Notes in Computer Science, vol 5395. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00582-4_2

DOI: https://doi.org/10.1007/978-3-642-00582-4_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00581-7
Online ISBN: 978-3-642-00582-4
eBook Packages: Computer Science, Computer Science (R0)
