Abstract
In the problem of “human sensing”, videos recorded with wearable cameras give an “egocentric” view of the world, capturing details of human activities. In this paper we continue research on visual saliency for this kind of content, with the goal of “active” object recognition in egocentric videos. In particular, a geometrical cue is considered for the case when the central-bias hypothesis does not hold. The proposed visual saliency models are trained on eye fixations of observers and incorporated into spatio-temporal saliency models. The proposed models have been compared to state-of-the-art visual saliency models using a metric based on target object recognition performance. The results are promising: they highlight the necessity of a non-centered geometric saliency cue.
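For illustration, a geometric saliency cue of the kind discussed here is commonly modelled as a 2-D Gaussian whose centre and spread are fitted to observers' eye fixations rather than pinned to the frame centre. The following minimal sketch (function name, frame size, and parameter values are illustrative assumptions, not taken from the paper) contrasts a central-bias map with a non-centred one:

```python
import numpy as np

def geometric_saliency(h, w, mu, sigma):
    """Anisotropic 2-D Gaussian 'geometric' saliency map of size h x w.

    mu = (mu_x, mu_y) is the map's centre in pixels; in an egocentric
    setting it would be estimated from recorded eye fixations instead
    of being fixed at the frame centre (the central-bias assumption).
    sigma = (sigma_x, sigma_y) controls the spread along each axis.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-(((xs - mu[0]) ** 2) / (2.0 * sigma[0] ** 2)
                 + ((ys - mu[1]) ** 2) / (2.0 * sigma[1] ** 2)))
    return g / g.max()  # normalise the peak to 1

# Central-bias map vs. a non-centred map (e.g. fixations drawn towards
# the lower part of the frame, where manipulated objects tend to appear
# in egocentric video). Frame size 352x288 is an arbitrary example.
centered = geometric_saliency(288, 352, mu=(176, 144), sigma=(60, 45))
lowered = geometric_saliency(288, 352, mu=(176, 210), sigma=(60, 45))
```

Such a map would then be fused (e.g. multiplicatively or by weighted sum) with spatial and temporal saliency cues to form the spatio-temporal model.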
Acknowledgments
This research is supported by the EU FP7 Dem@Care project under grant agreement #288199.
Cite this article
Buso, V., Benois-Pineau, J. & Domenger, JP. Geometrical cues in visual saliency models for active object recognition in egocentric videos. Multimed Tools Appl 74, 10077–10095 (2015). https://doi.org/10.1007/s11042-015-2803-2