
Measuring Visual Surprise Jointly from Intrinsic and Extrinsic Contexts for Image Saliency Estimation

International Journal of Computer Vision

Abstract

Detecting conspicuous image content is a challenging task in computer vision. Most existing approaches estimate saliency using only cues from the input image. However, such “intrinsic” cues are often insufficient to distinguish targets from distractors that share common visual attributes. To address this problem, we present an approach that estimates image saliency by measuring the joint visual surprise from intrinsic and extrinsic contexts. In this approach, a hierarchical context model is first built on a database of 31.2 million images, where a Gaussian mixture model (GMM) is trained for each leaf node to encode prior knowledge of “what is where” in a specific scene. For a test image whose spatial layout resembles that of a scene in the hierarchy, the pre-trained GMM serves as an extrinsic context model for measuring the “surprise” of an image patch. Since human attention may quickly shift between surprising locations, we adopt a Markov chain to model this surprise-driven attention-shifting process and thereby infer the salient patches that best capture human attention. Experiments show that our approach outperforms 19 state-of-the-art methods in fixation prediction.
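To make the two core ideas concrete, below is a minimal sketch of (1) scoring the “surprise” of each image patch as its negative log-likelihood under a scene-specific GMM, and (2) turning patch-level surprise into saliency via the stationary distribution of a Markov chain over patches. This is an illustration under stated assumptions, not the authors’ released implementation: the function names, the proximity-weighted transition design, and the toy feature dimensions are all hypothetical.

```python
# Hypothetical sketch: GMM-based extrinsic surprise plus a surprise-driven
# Markov chain, loosely following the pipeline described in the abstract.
import numpy as np
from sklearn.mixture import GaussianMixture

def patch_surprise(patch_features, scene_gmm):
    """Surprise of each patch = negative log-likelihood under the
    pre-trained extrinsic context model (one GMM per scene type).
    patch_features: (n_patches, dim) descriptors; in the paper's spirit
    these would combine appearance with normalized (x, y) position so
    the GMM encodes "what is where"."""
    return -scene_gmm.score_samples(patch_features)

def surprise_driven_saliency(surprise, positions, sigma=0.2):
    """Saliency as the stationary distribution of a Markov chain whose
    transitions favor nearby, highly surprising patches."""
    s = surprise - surprise.min() + 1e-6                 # keep weights positive
    d2 = np.sum((positions[:, None, :] - positions[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2)) * s[None, :]    # shift from i to j
    P = W / W.sum(axis=1, keepdims=True)                 # row-stochastic matrix
    vals, vecs = np.linalg.eig(P.T)                      # stationary: pi P = pi
    pi = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
    return pi / pi.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Offline stage (toy data): fit the extrinsic context GMM for one scene.
    train = rng.normal(size=(1000, 4))
    gmm = GaussianMixture(n_components=5, random_state=0).fit(train)
    # Online stage: score a test image's patches and infer saliency.
    feats = rng.normal(size=(64, 4))                     # descriptors of 64 patches
    pos = rng.uniform(size=(64, 2))                      # normalized patch centers
    saliency = surprise_driven_saliency(patch_surprise(feats, gmm), pos)
    print(saliency.round(3))
```

In this toy design, a patch that the scene prior deems unlikely accumulates probability mass in the chain’s stationary distribution, which plays the role of the saliency map.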


Notes

  1. No winner-take-all is used in IT.

  2. The bottom-up saliency maps predicted by IT are used as the input for SP.

  3. Our implementation of (Wang et al. 2011), as well as its experimental data and results on MIT1003, can be found at http://mlg.idm.pku.edu.cn/mlg/download/code/wang11.zip.

  4. SNR, PMT, JUD and MIR are trained on 903 images and tested on the other 100 images.

References

  • Borji, A. (2012). Boosting bottom-up and top-down visual features for saliency estimation. In IEEE conference on computer vision and pattern recognition (pp. 438–445).

  • Bouman, C. A. (1997). Cluster: An unsupervised algorithm for modeling Gaussian mixtures. http://www.ece.purdue.edu/bouman.

  • Bruce, N. D., & Tsotsos, J. K. (2005). Saliency based on information maximization. In Advances in neural information processing systems, Vancouver, BC, Canada (pp. 155–162).

  • Cerf, M., Harel, J., Einhauser, W., & Koch, C. (2009). Predicting human gaze using low-level saliency combined with face detection. In Advances in neural information processing systems, Vancouver, BC, Canada.

  • Cheng, M. M., Zhang, G. X., Mitra, N., Huang, X., & Hu, S. M. (2011). Global contrast based salient region detection. In IEEE conference on computer vision and pattern recognition (pp. 409–416).

  • Chun, M. M., & Jiang, Y. (1998). Contextual cueing: Implicit learning and memory of visual context guides spatial attention. Cognitive Psychology, 36(1), 28–71.

  • Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.

  • Desingh, K., Krishna, K. M., Rajan, D., & Jawahar, C. (2013). Depth really matters: Improving visual salient region detection with depth. In British machine vision conference.

  • Erdem, E., & Erdem, A. (2013). Visual saliency estimation by nonlinearly integrating features using region covariances. Journal of Vision, 13(4), 11.

  • Fu, H., Cao, X., & Tu, Z. (2013). Cluster-based co-saliency detection. IEEE Transactions on Image Processing, 22(10), 3766–3778.

  • Gao, D., Han, S., & Vasconcelos, N. (2009). Discriminant saliency, the detection of suspicious coincidences, and applications to visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(6), 989–1005.

  • Goferman, S., Zelnik-Manor, L., & Tal, A. (2010). Context-aware saliency detection. In IEEE conference on computer vision and pattern recognition (pp. 2376–2383).

  • Gopalakrishnan, V., Hu, Y., & Rajan, D. (2009). Random walks on graphs to model saliency in images. In IEEE conference on computer vision and pattern recognition (pp. 1698–1705).

  • Harel, J., Koch, C., & Perona, P. (2007). Graph-based visual saliency. In Advances in neural information processing systems (pp. 545–552).

  • Hou, W., Gao, X., Tao, D., & Li, X. (2013). Visual saliency detection using information divergence. Pattern Recognition, 46(10), 2658–2669.

  • Hou, X., & Zhang, L. (2007). Saliency detection: A spectral residual approach. In IEEE conference on computer vision and pattern recognition (pp. 1–8).

  • Hou, X., & Zhang, L. (2009). Dynamic visual attention: Searching for coding length increments. In Advances in neural information processing systems (pp. 681–688).

  • Hou, X., Harel, J., & Koch, C. (2012). Image signature: Highlighting sparse salient regions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(1), 194–201.

  • Hu, Y., Rajan, D., & Chia, L. T. (2005). Adaptive local context suppression of multiple cues for salient visual attention detection. In IEEE international conference on multimedia and expo.

  • Itti, L., & Baldi, P. (2005). A principled approach to detecting surprising events in video. In IEEE conference on computer vision and pattern recognition (Vol. 1, pp. 631–637).

  • Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254–1259.

  • Itti, L., Rees, G., & Tsotsos, J. (2005). Neurobiology of attention (1st ed.). San Diego, Amsterdam: Elsevier Press.

  • Judd, T., Ehinger, K., Durand, F., & Torralba, A. (2009). Learning to predict where humans look. In IEEE international conference on computer vision (pp. 2106–2113).

  • Kanwisher, N., & Wojciulik, E. (2000). Visual attention: Insights from brain imaging. Nature Reviews Neuroscience, 1(2), 91–100.

  • Kunar, M. A., Flusberg, S. J., & Wolfe, J. M. (2006). Contextual cuing by global features. Perception & Psychophysics, 68(7), 1204–1216.

  • Li, J., & Gao, W. (2014). Visual saliency computation: A machine learning perspective (1st ed.). Switzerland: Springer.

  • Li, J., Tian, Y., Huang, T., & Gao, W. (2010). Probabilistic multi-task learning for visual saliency estimation in video. International Journal of Computer Vision, 90(2), 150–165.

  • Li, J., Tian, Y., Huang, T., & Gao, W. (2011). Multi-task rank learning for visual saliency estimation. IEEE Transactions on Circuits and Systems for Video Technology, 21(5), 623–636.

  • Li, J., Xu, D., & Gao, W. (2012). Removing label ambiguity in learning-based visual saliency estimation. IEEE Transactions on Image Processing, 21(4), 1513–1525.

  • Li, J., Levine, M., An, X., Xu, X., & He, H. (2013). Visual saliency based on scale-space analysis in the frequency domain. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(4), 996–1010.

  • Li, J., Tian, Y., & Huang, T. (2014). Visual saliency with statistical priors. International Journal of Computer Vision, 107(3), 239–253.

  • Liu, T., Sun, J., Zheng, N. N., Tang, X., & Shum, H. Y. (2007). Learning to detect a salient object. In IEEE conference on computer vision and pattern recognition (pp. 1–8).

  • Lu, S., Tan, C., & Lim, J. (2013). Robust and efficient saliency modeling from image co-occurrence histograms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(1), 195–201.

  • Marat, S., Ho Phuoc, T., Granjon, L., Guyader, N., Pellerin, D., & Guérin-Dugué, A. (2009). Modelling spatio-temporal saliency to predict gaze direction for short videos. International Journal of Computer Vision, 82(3), 231–243.

  • Marchesotti, L., Cifarelli, C., & Csurka, G. (2009). A framework for visual saliency detection with applications to image thumbnailing. In IEEE international conference on computer vision (pp. 2232–2239).

  • Margolin, R., Tal, A., & Zelnik-Manor, L. (2013). What makes a patch distinct? In IEEE conference on computer vision and pattern recognition (pp. 1139–1146).

  • Navalpakkam, V., & Itti, L. (2007). Search goal tunes visual features optimally. Neuron, 53, 605–617.

  • Niu, Y., Geng, Y., Li, X., & Liu, F. (2012). Leveraging stereopsis for saliency analysis. In IEEE conference on computer vision and pattern recognition (pp. 454–461).

  • Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.

  • Russell, A. F., Mihalaş, S., von der Heydt, R., Niebur, E., & Etienne-Cummings, R. (2014). A model of proto-object based saliency. Vision Research, 94, 1–15.

  • Siva, P., Russell, C., Xiang, T., & Agapito, L. (2013). Looking beyond the image: Unsupervised learning for object saliency and detection. In IEEE conference on computer vision and pattern recognition (pp. 3238–3245).

  • Sun, X., Yao, H., & Ji, R. (2012). What are we looking for: Towards statistical modeling of saccadic eye movements and visual saliency. In IEEE conference on computer vision and pattern recognition (pp. 1552–1559).

  • Torralba, A. (2005). Contextual influences on saliency. In L. Itti, G. Rees, & J. Tsotsos (Eds.), Neurobiology of Attention (1st ed., pp. 586–592). Amsterdam: Elsevier Press.

  • Vig, E., Dorr, M., Martinetz, T., & Barth, E. (2012). Intrinsic dimensionality predicts the saliency of natural dynamic scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(6), 1080–1091.

  • Vikram, T. N., Tscherepanow, M., & Wrede, B. (2012). A saliency map based on sampling an image into random rectangular regions of interest. Pattern Recognition, 45, 3114–3124.

  • Wang, M., Konrad, J., Ishwar, P., Jing, K., & Rowley, H. (2011). Image saliency: From intrinsic to extrinsic context. In IEEE conference on computer vision and pattern recognition (pp. 417–424).

  • Wang, P., Wang, J., Zeng, G., Feng, J., Zha, H., & Li, S. (2012). Salient object detection for searched web images via global saliency. In IEEE conference on computer vision and pattern recognition (pp. 3194–3201).

  • Wang, W., Wang, Y., Huang, Q., & Gao, W. (2010). Measuring visual saliency by site entropy rate. In IEEE conference on computer vision and pattern recognition (pp. 2368–2375).

  • Wei, S., Xu, D., Li, X., & Zhao, Y. (2013). Joint optimization toward effective and efficient image search. IEEE Transactions on Cybernetics, 43(6), 2216–2227.

  • Xu, J., Jiang, M., Wang, S., Kankanhalli, M. S., & Zhao, Q. (2014). Predicting human gaze beyond pixels. Journal of Vision, 14(1), 1–20.

  • Zhang, J., & Sclaroff, S. (2013). Saliency detection: A Boolean map approach. In IEEE international conference on computer vision (pp. 153–160).

  • Zhang, L., Tong, M. H., Marks, T. K., Shan, H., & Cottrell, G. W. (2008). SUN: A Bayesian framework for saliency using natural statistics. Journal of Vision, 8(7), 32.

  • Zhao, Q., & Koch, C. (2011). Learning a saliency map using fixated locations in natural scenes. Journal of Vision, 11(3), 9.

  • Zhao, Q. (2012). Learning visual saliency by combining feature maps in a nonlinear manner using AdaBoost. Journal of Vision, 12(6), 22.

  • Zhu, G., Wang, Q., Yuan, Y., & Yan, P. (2013). Learning saliency by MRF and differential threshold. IEEE Transactions on Cybernetics, 43(6), 2032–2043.

Acknowledgments

The authors would like to thank the anonymous reviewers for their helpful comments on the paper. This work was supported in part by grants from the National Basic Research Program of China under Contract 2015CB351806, the National Natural Science Foundation of China (61370113, 61532003, 61390515 and 61421062), and the Fundamental Research Funds for the Central Universities.

Author information

Correspondence to Jia Li or Yonghong Tian.

Additional information

Communicated by Y. Sato.


Cite this article

Li, J., Tian, Y., Chen, X. et al. Measuring Visual Surprise Jointly from Intrinsic and Extrinsic Contexts for Image Saliency Estimation. Int J Comput Vis 120, 44–60 (2016). https://doi.org/10.1007/s11263-016-0892-7

