Abstract
In the problem of “human sensing”, videos recorded with wearable cameras give an “egocentric” view of the world, capturing details of human activities. In this paper we continue research on visual saliency for this kind of content, with the goal of “active” object recognition in egocentric videos. In particular, a geometrical cue is considered for the case when the central-bias hypothesis does not hold. The proposed visual saliency models are trained on eye fixations of observers and incorporated into spatio-temporal saliency models. The proposed models have been compared to state-of-the-art visual saliency models using a metric based on target object recognition performance. The results are promising: they highlight the necessity of a non-centered geometric saliency cue.
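For illustration, a geometric saliency cue of the kind discussed here is commonly modelled as a 2-D Gaussian whose centre and spread are fitted to observers' eye fixations rather than pinned to the frame centre. The following minimal sketch (function name, frame size, and parameter values are illustrative assumptions, not taken from the paper) contrasts a central-bias map with a non-centred one:

```python
import numpy as np

def geometric_saliency(h, w, mu, sigma):
    """Anisotropic 2-D Gaussian 'geometric' saliency map of size h x w.

    mu = (mu_x, mu_y) is the map's centre in pixels; in an egocentric
    setting it would be estimated from recorded eye fixations instead
    of being fixed at the frame centre (the central-bias assumption).
    sigma = (sigma_x, sigma_y) controls the spread along each axis.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-(((xs - mu[0]) ** 2) / (2.0 * sigma[0] ** 2)
                 + ((ys - mu[1]) ** 2) / (2.0 * sigma[1] ** 2)))
    return g / g.max()  # normalise the peak to 1

# Central-bias map vs. a non-centred map (e.g. fixations drawn towards
# the lower part of the frame, where manipulated objects tend to appear
# in egocentric video). Frame size 352x288 is an arbitrary example.
centered = geometric_saliency(288, 352, mu=(176, 144), sigma=(60, 45))
lowered = geometric_saliency(288, 352, mu=(176, 210), sigma=(60, 45))
```

Such a map would then be fused (e.g. multiplicatively or by weighted sum) with spatial and temporal saliency cues to form the spatio-temporal model.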
Acknowledgments
This research is supported by the EU FP7 Dem@Care project under grant agreement #288199.
Cite this article
Buso, V., Benois-Pineau, J. & Domenger, JP. Geometrical cues in visual saliency models for active object recognition in egocentric videos. Multimed Tools Appl 74, 10077–10095 (2015). https://doi.org/10.1007/s11042-015-2803-2