Measuring Visual Surprise Jointly from Intrinsic and Extrinsic Contexts for Image Saliency Estimation

Abstract

Detecting conspicuous image content is a challenging task in computer vision. Most existing approaches estimate saliency using only cues from the input image. However, such “intrinsic” cues are often insufficient to distinguish targets from distractors that share common visual attributes. To address this problem, we present an approach that estimates image saliency by measuring the joint visual surprise from intrinsic and extrinsic contexts. In this approach, a hierarchical context model is first built on a database of 31.2 million images, where a Gaussian mixture model (GMM) is trained for each leaf node to encode the prior knowledge of “what is where” in a specific scene. For a test image that shares a similar spatial layout with such a scene, the pre-trained GMM serves as an extrinsic context model for measuring the “surprise” of each image patch. Since human attention may quickly shift between surprising locations, we adopt a Markov chain to model this surprise-driven attention-shifting process and infer the salient patches that best capture human attention. Experiments show that our approach outperforms 19 state-of-the-art methods in fixation prediction.
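
To make the two core mechanisms concrete, the sketch below shows (1) patch surprise scored as negative log-likelihood under a scene-specific GMM trained on joint appearance-and-location vectors, and (2) a surprise-driven Markov chain over patches whose stationary distribution acts as the saliency map. This is a minimal sketch, not the authors' released implementation: the feature representation, the Gaussian proximity kernel in the transition matrix, and all parameter values (n_components, sigma, n_iter) are illustrative assumptions.

```python
# Minimal sketch (assumed, not the authors' released code) of GMM-based
# surprise scoring and surprise-driven attention shifting.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_extrinsic_context(patch_vectors, n_components=8):
    """Fit a GMM on pooled [appearance features, x, y] vectors from images
    of one scene category, encoding the prior on "what is where"."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full", random_state=0)
    gmm.fit(patch_vectors)
    return gmm

def patch_surprise(gmm, patch_vectors):
    """Surprise of each patch = negative log-likelihood under the extrinsic
    context model; patches that violate the scene prior score high."""
    return -gmm.score_samples(patch_vectors)

def saliency_from_attention_shift(surprise, positions, sigma=0.2, n_iter=200):
    """Stationary distribution of a Markov chain over patches: attention
    preferentially jumps to surprising patches, weighted by proximity."""
    s = np.clip(surprise, 1e-6, None)          # keep transition weights positive
    d2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
    trans = np.exp(-d2 / (2.0 * sigma ** 2)) * s[None, :]
    trans /= trans.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    pi = np.full(len(s), 1.0 / len(s))
    for _ in range(n_iter):                    # power iteration to convergence
        pi = pi @ trans
    return pi / pi.sum()
```

Because every transition weight is strictly positive, the chain is ergodic and the power iteration converges to a unique stationary distribution; patches that repeatedly attract attention shifts accumulate stationary mass and are marked salient.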

Notes

  1. No winner-take-all is used in IT.

  2. The bottom-up saliency maps predicted by IT are used as the input for SP.

  3. Our implementation of Wang et al. (2011), as well as its experimental data and results on MIT1003, can be found at http://mlg.idm.pku.edu.cn/mlg/download/code/wang11.zip.

  4. SNR, PMT, JUD and MIR are trained on 903 images and tested on the other 100 images.

References

  1. Borji, A. (2012). Boosting bottom-up and top-down visual features for saliency estimation. In IEEE conference on computer vision and pattern recognition (pp. 438–445).

  2. Bouman, C. A. (1997). Cluster: An unsupervised algorithm for modeling Gaussian mixtures. http://www.ece.purdue.edu/bouman.

  3. Bruce, N. D., & Tsotsos, J. K. (2005). Saliency based on information maximization. In Advances in neural information processing systems, Vancouver, BC, Canada (pp. 155–162).

  4. Cerf, M., Harel, J., Einhauser, W., & Koch, C. (2009). Predicting human gaze using low-level saliency combined with face detection. In Advances in neural information processing systems, Vancouver, BC, Canada.

  5. Cheng, M. M., Zhang, G. X., Mitra, N., Huang, X., & Hu, S. M. (2011). Global contrast based salient region detection. In IEEE conference on computer vision and pattern recognition (pp. 409–416).

  6. Chun, M. M., & Jiang, Y. (1998). Contextual cueing: Implicit learning and memory of visual context guides spatial attention. Cognitive Psychology, 36(1), 28–71.

  7. Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.

  8. Desingh, K., Krishna, K. M., Rajan, D., & Jawahar, C. (2013). Depth really matters: Improving visual salient region detection with depth. In British machine vision conference.

  9. Erdem, E., & Erdem, A. (2013). Visual saliency estimation by nonlinearly integrating features using region covariances. Journal of Vision, 13(4), 11.

  10. Fu, H., Cao, X., & Tu, Z. (2013). Cluster-based co-saliency detection. IEEE Transactions on Image Processing, 22(10), 3766–3778.

  11. Gao, D., Han, S., & Vasconcelos, N. (2009). Discriminant saliency, the detection of suspicious coincidences, and applications to visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(6), 989–1005.

  12. Goferman, S., Zelnik-Manor, L., & Tal, A. (2010). Context-aware saliency detection. In IEEE conference on computer vision and pattern recognition (pp. 2376–2383).

  13. Gopalakrishnan, V., Hu, Y., & Rajan, D. (2009). Random walks on graphs to model saliency in images. In IEEE conference on computer vision and pattern recognition (pp. 1698–1705).

  14. Harel, J., Koch, C., & Perona, P. (2007). Graph-based visual saliency. In Advances in neural information processing systems (pp. 545–552).

  15. Hou, W., Gao, X., Tao, D., & Li, X. (2013). Visual saliency detection using information divergence. Pattern Recognition, 46(10), 2658–2669.

  16. Hou, X., & Zhang, L. (2007). Saliency detection: A spectral residual approach. In IEEE conference on computer vision and pattern recognition (pp. 1–8).

  17. Hou, X., & Zhang, L. (2009). Dynamic visual attention: Searching for coding length increments. In Advances in neural information processing systems (pp. 681–688).

  18. Hou, X., Harel, J., & Koch, C. (2012). Image signature: Highlighting sparse salient regions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(1), 194–201.

  19. Hu, Y., Rajan, D., & Chia, L. T. (2005). Adaptive local context suppression of multiple cues for salient visual attention detection. In IEEE international conference on multimedia and expo.

  20. Itti, L., & Baldi, P. (2005). A principled approach to detecting surprising events in video. In IEEE conference on computer vision and pattern recognition (Vol. 1, pp. 631–637).

  21. Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254–1259.

  22. Itti, L., Rees, G., & Tsotsos, J. (2005). Neurobiology of attention (1st ed.). San Diego, Amsterdam: Elsevier Press.

  23. Judd, T., Ehinger, K., Durand, F., & Torralba, A. (2009). Learning to predict where humans look. In IEEE international conference on computer vision (pp. 2106–2113).

  24. Kanwisher, N., & Wojciulik, E. (2000). Visual attention: Insights from brain imaging. Nature Reviews Neuroscience, 1(2), 91–100.

  25. Kunar, M. A., Flusberg, S. J., & Wolfe, J. M. (2006). Contextual cuing by global features. Perception & Psychophysics, 68(7), 1204–1216.

  26. Li, J., & Gao, W. (2014). Visual saliency computation: A machine learning perspective (1st ed.). Switzerland: Springer.

  27. Li, J., Tian, Y., Huang, T., & Gao, W. (2010). Probabilistic multi-task learning for visual saliency estimation in video. International Journal of Computer Vision, 90(2), 150–165.

  28. Li, J., Tian, Y., Huang, T., & Gao, W. (2011). Multi-task rank learning for visual saliency estimation. IEEE Transactions on Circuits and Systems for Video Technology, 21(5), 623–636.

  29. Li, J., Xu, D., & Gao, W. (2012). Removing label ambiguity in learning-based visual saliency estimation. IEEE Transactions on Image Processing, 21(4), 1513–1525.

  30. Li, J., Levine, M., An, X., Xu, X., & He, H. (2013). Visual saliency based on scale-space analysis in the frequency domain. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(4), 996–1010.

  31. Li, J., Tian, Y., & Huang, T. (2014). Visual saliency with statistical priors. International Journal of Computer Vision, 107(3), 239–253.

  32. Liu, T., Sun, J., Zheng, N. N., Tang, X., & Shum, H. Y. (2007). Learning to detect a salient object. In IEEE conference on computer vision and pattern recognition (pp. 1–8).

  33. Lu, S., Tan, C., & Lim, J. (2013). Robust and efficient saliency modeling from image co-occurrence histograms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(1), 195–201.

  34. Marat, S., Ho Phuoc, T., Granjon, L., Guyader, N., Pellerin, D., & Guérin-Dugué, A. (2009). Modelling spatio-temporal saliency to predict gaze direction for short videos. International Journal of Computer Vision, 82(3), 231–243.

  35. Marchesotti, L., Cifarelli, C., & Csurka, G. (2009). A framework for visual saliency detection with applications to image thumbnailing. In IEEE international conference on computer vision (pp. 2232–2239).

  36. Margolin, R., Tal, A., & Zelnik-Manor, L. (2013). What makes a patch distinct? In IEEE conference on computer vision and pattern recognition (pp. 1139–1146).

  37. Navalpakkam, V., & Itti, L. (2007). Search goal tunes visual features optimally. Neuron, 53, 605–617.

  38. Niu, Y., Geng, Y., Li, X., & Liu, F. (2012). Leveraging stereopsis for saliency analysis. In IEEE conference on computer vision and pattern recognition (pp. 454–461).

  39. Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.

  40. Russell, A. F., Mihalaş, S., von der Heydt, R., Niebur, E., & Etienne-Cummings, R. (2014). A model of proto-object based saliency. Vision Research, 94, 1–15.

  41. Siva, P., Russell, C., Xiang, T., & Agapito, L. (2013). Looking beyond the image: Unsupervised learning for object saliency and detection. In IEEE conference on computer vision and pattern recognition (pp. 3238–3245).

  42. Sun, X., Yao, H., & Ji, R. (2012). What are we looking for: Towards statistical modeling of saccadic eye movements and visual saliency. In IEEE conference on computer vision and pattern recognition (pp. 1552–1559).

  43. Torralba, A. (2005). Contextual influences on saliency. In L. Itti, G. Rees, & J. Tsotsos (Eds.), Neurobiology of Attention (1st ed., pp. 586–592). Amsterdam: Elsevier Press.

  44. Vig, E., Dorr, M., Martinetz, T., & Barth, E. (2012). Intrinsic dimensionality predicts the saliency of natural dynamic scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(6), 1080–1091.

  45. Vikram, T. N., Tscherepanow, M., & Wrede, B. (2012). A saliency map based on sampling an image into random rectangular regions of interest. Pattern Recognition, 45, 3114–3124.

  46. Wang, M., Konrad, J., Ishwar, P., Jing, K., & Rowley, H. (2011). Image saliency: From intrinsic to extrinsic context. In IEEE conference on computer vision and pattern recognition (pp. 417–424).

  47. Wang, P., Wang, J., Zeng, G., Feng, J., Zha, H., & Li, S. (2012). Salient object detection for searched web images via global saliency. In IEEE conference on computer vision and pattern recognition (pp. 3194–3201).

  48. Wang, W., Wang, Y., Huang, Q., & Gao, W. (2010). Measuring visual saliency by site entropy rate. In IEEE conference on computer vision and pattern recognition (pp. 2368–2375).

  49. Wei, S., Xu, D., Li, X., & Zhao, Y. (2013). Joint optimization toward effective and efficient image search. IEEE Transactions on Cybernetics, 43(6), 2216–2227.

  50. Xu, J., Jiang, M., Wang, S., Kankanhalli, M. S., & Zhao, Q. (2014). Predicting human gaze beyond pixels. Journal of Vision, 14(1), 1–20.

  51. Zhang, J., & Sclaroff, S. (2013). Saliency detection: A Boolean map approach. In IEEE international conference on computer vision (pp. 153–160).

  52. Zhang, L., Tong, M. H., Marks, T. K., Shan, H., & Cottrell, G. W. (2008). SUN: A Bayesian framework for saliency using natural statistics. Journal of Vision, 8(7), 32.

  53. Zhao, Q., & Koch, C. (2011). Learning a saliency map using fixated locations in natural scenes. Journal of Vision, 11(3), 9.

  54. Zhao, Q. (2012). Learning visual saliency by combining feature maps in a nonlinear manner using AdaBoost. Journal of Vision, 12(6), 22.

  55. Zhu, G., Wang, Q., Yuan, Y., & Yan, P. (2013). Learning saliency by MRF and differential threshold. IEEE Transactions on Cybernetics, 43(6), 2032–2043.

Acknowledgments

The authors would like to thank the anonymous reviewers for their helpful comments on the paper. This work was supported in part by grants from the National Basic Research Program of China under Contract 2015CB351806, the National Natural Science Foundation of China (61370113, 61532003, 61390515 and 61421062), and the Fundamental Research Funds for the Central Universities.

Author information

Correspondence to Jia Li or Yonghong Tian.

Additional information

Communicated by Y. Sato.

Cite this article

Li, J., Tian, Y., Chen, X. et al. Measuring Visual Surprise Jointly from Intrinsic and Extrinsic Contexts for Image Saliency Estimation. Int J Comput Vis 120, 44–60 (2016). https://doi.org/10.1007/s11263-016-0892-7

Keywords

  • Image saliency
  • Visual surprise
  • Intrinsic context
  • Extrinsic context
  • Gaussian mixture model
  • Markov chain