International Journal of Computer Vision, Volume 107, Issue 3, pp 239–253

Visual Saliency with Statistical Priors

Abstract

Visual saliency is a useful cue for locating conspicuous image content. To estimate saliency, many approaches detect unique or rare visual stimuli. However, such bottom-up solutions are often insufficient, since they ignore prior knowledge, which typically imposes a biased selectivity on the input stimuli. To address this problem, this paper presents a novel approach that estimates image saliency by learning prior knowledge. In our approach, the influences of the visual stimuli and the prior knowledge are jointly incorporated into a Bayesian framework: the bottom-up saliency pops out the visual subsets that are probably salient, while the prior knowledge recovers wrongly suppressed targets and inhibits improperly popped-out distractors. In contrast to existing approaches, the priors used here, a foreground prior and a correlation prior, are learned statistically from 9.6 million images in an unsupervised manner. Experimental results on two public benchmarks show that these statistical priors effectively modulate the bottom-up saliency, yielding clear improvements over 10 state-of-the-art methods.
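
The abstract does not give the exact formulation, but the Bayesian modulation it describes can be illustrated with a minimal sketch: posterior saliency is taken proportional to the product of a bottom-up (stimulus-driven) map and a learned prior map, so a low prior inhibits popped-out distractors while a high prior recovers suppressed targets. The function name bayesian_saliency, the pointwise-product fusion, and the toy values below are illustrative assumptions, not the paper's actual model.

    import numpy as np

    def bayesian_saliency(bottom_up: np.ndarray, prior: np.ndarray) -> np.ndarray:
        """Fuse a bottom-up saliency map with a statistical prior (sketch).

        Posterior saliency is taken proportional to the product of the
        stimulus-driven evidence and the prior probability of being
        foreground; this is an assumed simplification, not the paper's
        full Bayesian framework.
        """
        posterior = bottom_up * prior                 # pointwise Bayes-style fusion
        return posterior / (posterior.max() + 1e-12)  # rescale to [0, 1]

    # Toy example: the strong bottom-up response at (0, 0) is a distractor
    # with a low prior and is inhibited; the weak response at (1, 0) is a
    # target with a high prior and becomes the most salient location.
    bottom_up = np.array([[0.9, 0.2],
                          [0.3, 0.8]])   # stimulus-driven saliency
    prior     = np.array([[0.1, 0.9],
                          [0.9, 0.2]])   # learned foreground prior
    print(bayesian_saliency(bottom_up, prior))
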

Keywords

Visual saliency · Prior knowledge · Image statistics · Bayesian framework

Acknowledgments

This work was supported in part by grants from the Chinese National Natural Science Foundation under contracts No. 61370113 and No. 61035001, and by the Supervisor Award Funding for Excellent Doctoral Dissertation of Beijing (No. 20128000103). This research was also partially supported by the Singapore National Research Foundation under its IDM Futures Funding Initiative, administered by the Interactive & Digital Media Programme Office, Media Development Authority.

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. National Engineering Laboratory for Video Technology, School of EE & CS, Peking University, Beijing, China
