Abstract
Dynamic cues have traditionally been treated as a simple extension of static saliency, typically in the form of optic flow between two frames; the evolution of stimuli over periods longer than two frames has been largely ignored in saliency research. We argue that considering the temporal evolution of trajectories, even over a relatively short period, can significantly extend the range of meaningful regions that can be extracted from videos, without resorting to higher-level processes. Our work is a systematic and principled investigation of the temporal aspect of saliency in a dynamic setting. Departing from the majority of works, in which the dynamic cue merely extends static saliency, we place central importance on temporality. We formulate both intra- and inter-trajectory saliency to measure relationships within and between trajectories, respectively. Our inter-trajectory formulation is also the first among computational saliency works to look beyond the immediate neighborhood in space and time, exploiting the perceptual organization rule of common fate (temporal synchrony) to make a group of trajectories stand out from the rest. At the technical level, our superpixel trajectory representation captures the detailed dynamics of superpixels as they progress through time, allowing us to measure changes such as sudden movement or onset better than other representations can. Experimental results show that our method achieves state-of-the-art performance both quantitatively and qualitatively.
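To make the two formulations concrete, the following is a minimal sketch, not the paper's actual equations: the trajectory representation, the function names, and the median-motion proxy for common fate are all illustrative assumptions. Each trajectory is taken as a NumPy array of (x, y) positions over T frames; intra-trajectory saliency flags sudden motion changes (onset, abrupt turns) within one trajectory, while inter-trajectory saliency scores how strongly a trajectory's velocity profile departs from the dominant motion, so that a synchronously moving group stands out from the rest.

```python
import numpy as np

def velocities(traj):
    # Frame-to-frame displacement of one trajectory: (T, 2) -> (T-1, 2).
    return np.diff(traj, axis=0)

def intra_trajectory_saliency(traj):
    # Sudden movement or onset within a single trajectory, scored as the
    # peak acceleration magnitude along it.
    accel = np.diff(velocities(traj), axis=0)        # (T-2, 2)
    return np.linalg.norm(accel, axis=1).max()

def inter_trajectory_saliency(trajs):
    # Common-fate proxy (our assumption, not the paper's formulation):
    # trajectories whose velocity profiles deviate from the dominant
    # (median) motion stand out together from the rest.
    V = np.stack([velocities(t) for t in trajs])     # (N, T-1, 2), equal T assumed
    dominant = np.median(V, axis=0)                  # (T-1, 2)
    dev = np.linalg.norm(V - dominant, axis=2)       # (N, T-1)
    return dev.mean(axis=1)                          # one score per trajectory

# Toy example: 7 near-static background trajectories and 2 moving in synchrony.
rng = np.random.default_rng(0)
T = 30
still = [rng.normal(scale=0.1, size=(T, 2)) for _ in range(7)]
moving = [np.cumsum(np.ones((T, 2)), axis=0) + rng.normal(scale=0.1, size=(T, 2))
          for _ in range(2)]
print(inter_trajectory_saliency(still + moving).round(2))
# The last two scores are markedly higher: the synchronized pair pops out.
```

Note that this sketch only looks at motion similarity over a shared window; the paper's formulation additionally works on superpixel trajectories and looks beyond the immediate spatio-temporal neighborhood, as described above.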
Acknowledgement
This work was partially supported by the Singapore PSF grant 1321202075 and the NUS AcRF grant R-263-000-A21-112.
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Luo, Y., Cheong, L.F., Cabibihan, J.J. (2015). Modeling the Temporality of Saliency. In: Cremers, D., Reid, I., Saito, H., Yang, M.H. (eds) Computer Vision -- ACCV 2014. Lecture Notes in Computer Science, vol. 9005. Springer, Cham. https://doi.org/10.1007/978-3-319-16811-1_14
Print ISBN: 978-3-319-16810-4
Online ISBN: 978-3-319-16811-1