International Journal of Computer Vision, Volume 90, Issue 2, pp 150–165

Probabilistic Multi-Task Learning for Visual Saliency Estimation in Video


Abstract

In this paper, we present a probabilistic multi-task learning approach for visual saliency estimation in video. In our approach, the problem of visual saliency estimation is modeled by simultaneously considering stimulus-driven and task-related factors within a probabilistic framework. In this framework, a stimulus-driven component simulates the low-level processes of the human visual system using multi-scale wavelet decomposition and unbiased feature competition, while a task-related component simulates the high-level processes that bias the competition of the input features. Unlike existing approaches, we propose a multi-task learning algorithm that learns a task-related “stimulus-saliency” mapping function for each scene. The algorithm also learns various fusion strategies, which are used to integrate the stimulus-driven and task-related components to obtain the final visual saliency. Extensive experiments were carried out on two public eye-fixation datasets and one regional saliency dataset. Experimental results show that our approach remarkably outperforms eight state-of-the-art approaches.
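The abstract describes the framework only at a high level. Purely as a rough sketch of how the two components might combine, the Python fragment below computes a wavelet-energy bottom-up map, applies a linear per-scene mapping as the task-related component, and fuses the two with a convex weight; every function name, the linear form of the mapping, and the fixed fusion weight are assumptions for illustration, not details taken from the paper.

```python
# Illustrative sketch only -- not the authors' implementation. Assumed for
# illustration: a wavelet-energy bottom-up map, a linear per-scene
# "stimulus-saliency" mapping, and a fixed convex fusion weight.
import numpy as np
import pywt  # PyWavelets, for the multi-scale wavelet decomposition

def stimulus_driven_map(gray, levels=3):
    """Bottom-up saliency: summed detail-coefficient energy across scales.
    Assumes a power-of-two image size so each band upsamples evenly."""
    coeffs = pywt.wavedec2(gray.astype(float), "haar", level=levels)
    sal = np.zeros_like(gray, dtype=float)
    for detail in coeffs[1:]:          # skip the coarse approximation band
        for band in detail:            # horizontal, vertical, diagonal
            e = np.abs(band)
            # upsample the band's energy back to full resolution
            ry = gray.shape[0] // e.shape[0]
            rx = gray.shape[1] // e.shape[1]
            sal += np.kron(e, np.ones((ry, rx)))
    return sal / (sal.max() + 1e-8)

def task_related_map(features, w):
    """Top-down saliency: a learned per-scene linear mapping from a
    d-dimensional feature vector at each pixel to a saliency value.
    features: (H, W, d) array; w: (d,) weights learned offline."""
    return features @ w

def fuse(bottom_up, top_down, alpha=0.5):
    """Convex combination of the two components. In the paper the fusion
    strategy is learned per scene; here alpha is a fixed placeholder."""
    norm = lambda m: (m - m.min()) / (np.ptp(m) + 1e-8)
    return alpha * norm(bottom_up) + (1.0 - alpha) * norm(top_down)
```

In the learned setting the paper describes, the fixed alpha would be replaced by scene-specific fusion weights estimated alongside the mapping weights w.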

Keywords

Visual saliency · Probabilistic framework · Visual search tasks · Multi-task learning



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  2. Graduate University of Chinese Academy of Sciences, Beijing, China
  3. National Engineering Laboratory for Video Technology, Peking University, Beijing, China
