Abstract
The understanding and analysis of video content are fundamentally important for numerous applications, including video summarization, retrieval, navigation, and editing. An important step in this process is detecting salient (i.e., important and interesting) objects in video segments. Unlike existing approaches, the proposed method combines saliency measurement with spatial and temporal coherence, an integration inspired by focused attention in human vision. The spatial coherence of low-level visual grouping cues (e.g., appearance and motion) aids per-frame object-background separation, while the temporal coherence of object properties (e.g., shape and appearance) ensures consistent object localization over time, making the method robust to unexpected environmental changes and camera vibration. We develop an efficient optimization strategy based on coarse-to-fine multi-scale dynamic programming, and evaluate the method on a challenging dataset that is freely available with this paper. Experiments show the effectiveness and complementarity of the two types of coherence, and demonstrate that together they significantly improve the performance of salient object detection in videos.
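The temporally coherent selection described above can be illustrated with a minimal sketch. This is not the paper's actual formulation; the function names, the additive scoring, and the toy coherence term are all illustrative assumptions. It shows the basic dynamic-programming idea only: choose one candidate object region per frame so that the total saliency score plus a frame-to-frame coherence bonus is maximized.

```python
def select_salient_track(saliency, coherence):
    """Viterbi-style DP over per-frame candidates (illustrative sketch).

    saliency[t][i] -- saliency score of candidate i in frame t.
    coherence(i, j) -- similarity bonus for picking candidate i in one
                       frame and candidate j in the next.
    Returns the index of the chosen candidate in each frame.
    """
    T = len(saliency)
    n = len(saliency[0])
    # score[t][j]: best accumulated score of any track ending at candidate j
    score = [list(saliency[0])] + [[0.0] * n for _ in range(T - 1)]
    back = [[0] * n for _ in range(T)]
    for t in range(1, T):
        for j in range(n):
            best_i = max(range(n),
                         key=lambda i: score[t - 1][i] + coherence(i, j))
            back[t][j] = best_i
            score[t][j] = (score[t - 1][best_i]
                           + coherence(best_i, j) + saliency[t][j])
    # Backtrack from the best final candidate to recover the full track.
    path = [max(range(n), key=lambda j: score[T - 1][j])]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

With a strong coherence term, the track resists a one-frame saliency spike on a different candidate: for `saliency = [[1, 0], [0, 1], [1, 0]]` and a coherence bonus of 1.0 for keeping the same candidate, the DP selects candidate 0 in every frame rather than jumping to candidate 1 in the middle frame. The paper's coarse-to-fine multi-scale strategy would additionally restrict each level's search to candidates near the coarser level's solution, which this sketch omits.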
This article is published with open access at Springerlink.com
Cite this article
Wu, Y., Zheng, N., Yuan, Z. et al. Detection of salient objects with focused attention based on spatial and temporal coherence. Chin. Sci. Bull. 56, 1055–1062 (2011). https://doi.org/10.1007/s11434-010-4387-1