Probabilistic Multi-Task Learning for Visual Saliency Estimation in Video

International Journal of Computer Vision

Abstract

In this paper, we present a probabilistic multi-task learning approach for visual saliency estimation in video. In our approach, visual saliency estimation is modeled by simultaneously considering stimulus-driven and task-related factors in a probabilistic framework. Within this framework, a stimulus-driven component simulates the low-level processes of the human visual system using multi-scale wavelet decomposition and unbiased feature competition, while a task-related component simulates the high-level processes that bias the competition among input features. In contrast to existing approaches, we propose a multi-task learning algorithm that learns a task-related "stimulus-saliency" mapping function for each scene. The algorithm also learns various fusion strategies, which are used to integrate the stimulus-driven and task-related components into the final visual saliency. Extensive experiments were carried out on two public eye-fixation datasets and one regional saliency dataset. Experimental results show that our approach significantly outperforms eight state-of-the-art approaches.



Author information

Corresponding author

Correspondence to Yonghong Tian.

About this article

Cite this article

Li, J., Tian, Y., Huang, T. et al. Probabilistic Multi-Task Learning for Visual Saliency Estimation in Video. Int J Comput Vis 90, 150–165 (2010). https://doi.org/10.1007/s11263-010-0354-6
