Probabilistic Multi-Task Learning for Visual Saliency Estimation in Video

Abstract

In this paper, we present a probabilistic multi-task learning approach to visual saliency estimation in video. In our approach, visual saliency estimation is modeled by simultaneously considering stimulus-driven and task-related factors within a probabilistic framework. In this framework, a stimulus-driven component simulates the low-level processes of the human visual system using multi-scale wavelet decomposition and unbiased feature competition, while a task-related component simulates the high-level processes that bias the competition among the input features. Unlike existing approaches, we propose a multi-task learning algorithm that learns a task-related "stimulus-saliency" mapping function for each scene. The algorithm also learns various fusion strategies, which are used to integrate the stimulus-driven and task-related components to obtain the visual saliency. Extensive experiments were carried out on two public eye-fixation datasets and one regional saliency dataset. Experimental results show that our approach substantially outperforms eight state-of-the-art approaches.
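The two-component idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the toy feature channels, the learned per-channel weights, and the scalar convex-combination fusion stand in for the paper's actual wavelet features, per-scene mapping functions, and learned fusion strategies.

```python
# Illustrative sketch of stimulus-driven vs. task-related saliency and their
# fusion. All names, weights, and the fusion rule are assumptions, not the
# paper's exact formulation.

def bottom_up_saliency(features):
    # Stimulus-driven component: unbiased competition, i.e. every feature
    # channel contributes equally at each spatial location.
    n = len(features)
    return [sum(ch[i] for ch in features) / n for i in range(len(features[0]))]

def top_down_saliency(features, weights):
    # Task-related component: a learned "stimulus-saliency" mapping, here
    # reduced to per-channel weights that bias the same feature competition.
    return [sum(w * ch[i] for w, ch in zip(weights, features))
            for i in range(len(features[0]))]

def fuse(s_bu, s_td, alpha):
    # One simple fusion strategy (convex combination); the paper learns the
    # fusion per scene, which this single scalar alpha only gestures at.
    return [alpha * td + (1 - alpha) * bu for bu, td in zip(s_bu, s_td)]

# Toy example: two feature channels over four spatial locations.
features = [[0.2, 0.8, 0.1, 0.4],   # e.g. a contrast channel (assumed)
            [0.1, 0.9, 0.3, 0.2]]   # e.g. a motion channel (assumed)
weights = [0.3, 0.7]                # hypothetical learned task-related bias

s_bu = bottom_up_saliency(features)
s_td = top_down_saliency(features, weights)
s = fuse(s_bu, s_td, alpha=0.5)
```

In this toy setting, location 1 is salient in both components, so it dominates the fused map regardless of alpha; the learned weights matter precisely where the two components disagree.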


Author information

Corresponding author

Correspondence to Yonghong Tian.

About this article

Cite this article

Li, J., Tian, Y., Huang, T. et al. Probabilistic Multi-Task Learning for Visual Saliency Estimation in Video. Int J Comput Vis 90, 150–165 (2010). https://doi.org/10.1007/s11263-010-0354-6

Keywords

  • Visual saliency
  • Probabilistic framework
  • Visual search tasks
  • Multi-task learning