
Geometrical cues in visual saliency models for active object recognition in egocentric videos

Published in Multimedia Tools and Applications

Abstract

In the problem of “human sensing”, videos recorded with wearable cameras give an “egocentric” view of the world, capturing details of human activities. In this paper we continue research on visual saliency for this kind of content, with the goal of recognizing “active” objects in egocentric videos. In particular, a geometric cue is considered for the case when the central-bias hypothesis does not hold. The proposed visual saliency models are trained on the eye fixations of observers and incorporated into spatio-temporal saliency models. They have been compared against state-of-the-art visual saliency models using a metric based on target object recognition performance. The results are promising: they highlight the need for a non-centered geometric saliency cue.
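The abstract summarizes the approach without implementation detail. As a minimal sketch of the idea, the NumPy code below models a non-centered geometric cue as an anisotropic 2D Gaussian whose mean and spread are fitted to training eye fixations, rather than pinned to the frame center, and fuses it pixel-wise with spatial and temporal cues. The function names, the Gaussian parameterization, and the multiplicative fusion scheme are illustrative assumptions, not the authors' published implementation.

    import numpy as np

    def fit_gaussian_to_fixations(fixations):
        """Estimate the mean and std of a 2D Gaussian from recorded eye
        fixations, given as an (N, 2) array of (row, col) positions.
        (Assumption: the geometric cue is Gaussian-shaped.)"""
        fix = np.asarray(fixations, dtype=float)
        return fix.mean(axis=0), fix.std(axis=0)

    def geometric_saliency(shape, mu, sigma):
        """Non-centered geometric saliency cue: an anisotropic 2D Gaussian
        centered on the fitted fixation mean instead of the frame center
        (the classical central-bias assumption)."""
        rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
        g = np.exp(-0.5 * (((rows - mu[0]) / sigma[0]) ** 2
                           + ((cols - mu[1]) / sigma[1]) ** 2))
        return g / g.max()

    def fuse_cues(spatial, temporal, geometric):
        """Pixel-wise multiplicative fusion of the three cues into one
        spatio-temporal-geometric saliency map (one common fusion scheme;
        the paper's exact fusion may differ)."""
        s = spatial * temporal * geometric
        return s / (s.max() + 1e-12)

    # Usage: synthetic fixations skewed toward the lower half of a
    # 480x640 frame, as when manipulated ("active") objects draw the gaze.
    rng = np.random.default_rng(0)
    fixations = rng.normal(loc=(340, 320), scale=(50, 80), size=(500, 2))
    mu, sigma = fit_gaussian_to_fixations(fixations)
    geo = geometric_saliency((480, 640), mu, sigma)
    # Combine with precomputed spatial/temporal maps of the same shape:
    # final = fuse_cues(spatial_map, temporal_map, geo)

Under this reading, the central-bias baseline is the special case where mu equals the frame center; estimating mu from observers' fixations is what makes the cue "non-centered".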



Acknowledgments

This research is supported by the EU FP7 PI Dem@Care project under grant agreement #288199.

Corresponding author

Correspondence to Vincent Buso.



Cite this article

Buso, V., Benois-Pineau, J. & Domenger, JP. Geometrical cues in visual saliency models for active object recognition in egocentric videos. Multimed Tools Appl 74, 10077–10095 (2015). https://doi.org/10.1007/s11042-015-2803-2

