Probabilistic Multi-Task Learning for Visual Saliency Estimation in Video

Li, Jia; Tian, Yonghong; Huang, Tiejun; Gao, Wen

doi:10.1007/s11263-010-0354-6

Published: 27 May 2010

Volume 90, pages 150–165 (2010)
International Journal of Computer Vision

Jia Li¹,², Yonghong Tian³, Tiejun Huang³ & Wen Gao³

1088 Accesses
110 Citations
1 Altmetric
Abstract

In this paper, we present a probabilistic multi-task learning approach for visual saliency estimation in video. We model saliency estimation by considering stimulus-driven and task-related factors simultaneously in a probabilistic framework. In this framework, a stimulus-driven component simulates the low-level processes of the human visual system using multi-scale wavelet decomposition and unbiased feature competition, while a task-related component simulates the high-level processes that bias the competition among the input features. Unlike existing approaches, we propose a multi-task learning algorithm that learns the task-related "stimulus-saliency" mapping functions for each scene. The algorithm also learns various fusion strategies, which integrate the stimulus-driven and task-related components to obtain the visual saliency. Extensive experiments were carried out on two public eye-fixation datasets and one regional saliency dataset. The results show that our approach substantially outperforms eight state-of-the-art approaches.
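
As a rough illustration of the fusion step described in the abstract, the sketch below combines a stimulus-driven map and a task-related map into one normalized saliency map. It is not the authors' algorithm: the convex weighting, the function names `normalize` and `fuse_saliency`, and the fixed weight `alpha` are all assumptions made for illustration. The paper learns its fusion strategies per scene rather than fixing a single weight.

```python
from typing import List

Map = List[List[float]]  # a 2-D map of non-negative saliency scores


def normalize(m: Map) -> Map:
    """Scale a non-negative map so its entries sum to 1."""
    total = sum(sum(row) for row in m)
    if total <= 0:
        # Degenerate input: fall back to a uniform distribution.
        n = sum(len(row) for row in m)
        return [[1.0 / n for _ in row] for row in m]
    return [[v / total for v in row] for row in m]


def fuse_saliency(stimulus: Map, task: Map, alpha: float = 0.5) -> Map:
    """Hypothetical fusion: convex combination of two normalized maps.

    alpha weights the stimulus-driven map; (1 - alpha) weights the
    task-related map. This fixed rule is an assumption, not the
    learned fusion strategy from the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if [len(r) for r in stimulus] != [len(r) for r in task]:
        raise ValueError("maps must have the same shape")
    s, t = normalize(stimulus), normalize(task)
    # A convex combination of two distributions is itself a distribution.
    return [
        [alpha * sv + (1.0 - alpha) * tv for sv, tv in zip(sr, tr)]
        for sr, tr in zip(s, t)
    ]


if __name__ == "__main__":
    stimulus_map = [[1.0, 1.0], [1.0, 1.0]]  # flat bottom-up response
    task_map = [[0.0, 0.0], [0.0, 4.0]]      # task favors one location
    for row in fuse_saliency(stimulus_map, task_map, alpha=0.5):
        print(row)
```

With a flat stimulus-driven map, the fused result is pulled toward the location the task-related map favors. Lowering `alpha` strengthens that top-down bias.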

Author information

Authors and Affiliations

Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
Jia Li
Graduate University of Chinese Academy of Sciences, Beijing, 100049, China
Jia Li
National Engineering Laboratory for Video Technology, Peking University, Beijing, 100871, China
Yonghong Tian, Tiejun Huang & Wen Gao

Corresponding author

Correspondence to Yonghong Tian.

About this article

Cite this article

Li, J., Tian, Y., Huang, T. et al. Probabilistic Multi-Task Learning for Visual Saliency Estimation in Video. Int J Comput Vis 90, 150–165 (2010). https://doi.org/10.1007/s11263-010-0354-6

Received: 17 October 2009
Accepted: 07 May 2010
Published: 27 May 2010
Issue Date: November 2010
DOI: https://doi.org/10.1007/s11263-010-0354-6
