Saliency Prediction for Action Recognition

Dorr, Michael; Vig, Eleonora

doi:10.1007/978-3-319-57687-9_5

Part of the book series: Multimedia Systems and Applications ((MMSA))

Abstract

Despite all recent progress in computer vision, humans are still far superior to machines when it comes to the high-level understanding of complex dynamic scenes. The apparent ease of human perception and action cannot be explained by sheer neural computing power alone: estimates put the transmission rate of the optic nerve at only about 10 MBit/s. One particularly effective strategy for reducing the computational burden of vision in biological systems is the combination of attention with space-variant processing, in which only subsets of the visual scene are processed in full detail at any one time. Here, we report on experiments that mimic eye movements and attention as a preprocessing step for state-of-the-art computer vision algorithms.
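
The idea of processing only the most relevant subsets of a scene can be illustrated with a minimal sketch. It assumes a precomputed saliency map and a set of local descriptor locations, and keeps only the descriptors that fall on the most salient pixels. The function name `select_salient_descriptors`, the parameter `keep_fraction`, and the Gaussian toy saliency map are hypothetical and chosen for illustration; this is not the authors' pipeline.

```python
import numpy as np

def select_salient_descriptors(saliency, points, keep_fraction=0.5):
    """Return indices of the descriptors whose locations are most salient.

    saliency      : 2-D array (height, width) of non-negative saliency values
    points        : integer array (n, 2) of (row, col) descriptor locations
    keep_fraction : hypothetical parameter, share of descriptors to keep
    """
    points = np.asarray(points)
    # Look up the saliency value under each descriptor location.
    scores = saliency[points[:, 0], points[:, 1]]
    n_keep = max(1, int(round(keep_fraction * len(points))))
    # Stable sort on negated scores: highest saliency first, ties keep input order.
    order = np.argsort(-scores, kind="stable")
    # Report the surviving indices in their original order.
    return np.sort(order[:n_keep])

# Toy example: a Gaussian blob stands in for a predicted saliency map (assumption).
rows, cols = np.mgrid[0:60, 0:80]
saliency = np.exp(-((rows - 30) ** 2 + (cols - 40) ** 2) / (2 * 10.0 ** 2))

# Two locations near the blob centre, two in the image corners.
points = np.array([[30, 40], [5, 5], [28, 45], [55, 75]])
kept = select_salient_descriptors(saliency, points, keep_fraction=0.5)
print(kept)
```

In this toy setup the two descriptors near the centre of the blob survive and the two corner descriptors are discarded, so a downstream recognition stage would only have to process half of the input.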

Notes

1. http://lear.inrialpes.fr/~wang/improved_trajectories

Acknowledgements

Our research was supported by the Elite Network Bavaria, funded by the Bavarian State Ministry for Research and Education.

Author information

Authors and Affiliations

Technical University Munich, Munich, Germany
Michael Dorr
German Aerospace Center, Oberpfaffenhofen, Germany
Eleonora Vig

Corresponding author

Correspondence to Michael Dorr.

Editor information

Editors and Affiliations

LaBRI UMR 5800, Univ. Bordeaux, CNRS, Bordeaux INP, Univ. Bordeaux, Talence, France
Jenny Benois-Pineau
LS2N, UMR CNRS 6004, Université de Nantes, Nantes Cedex 3, France
Patrick Le Callet

About this chapter

Cite this chapter

Dorr, M., Vig, E. (2017). Saliency Prediction for Action Recognition. In: Benois-Pineau, J., Le Callet, P. (eds) Visual Content Indexing and Retrieval with Psycho-Visual Models. Multimedia Systems and Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-57687-9_5

DOI: https://doi.org/10.1007/978-3-319-57687-9_5
Published: 16 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57686-2
Online ISBN: 978-3-319-57687-9
eBook Packages: Computer Science, Computer Science (R0)
