Abstract
Detecting conspicuous image content is a challenging task in computer vision. Most existing approaches estimate saliency using only cues from the input image itself. However, such "intrinsic" cues are often insufficient to distinguish targets from distractors that share common visual attributes. To address this problem, we present an approach that estimates image saliency by measuring the joint visual surprise from intrinsic and extrinsic contexts. In this approach, a hierarchical context model is first built on a database of 31.2 million images, where a Gaussian mixture model (GMM) is trained for each leaf node to encode prior knowledge of "what is where" in a specific scene. For a test image that shares a similar spatial layout with a scene, the pre-trained GMM serves as an extrinsic context model to measure the "surprise" of an image patch. Since human attention may shift quickly between surprising locations, we adopt a Markov chain to model a surprise-driven attention-shifting process and thereby infer the salient patches that best capture human attention. Experiments show that our approach outperforms 19 state-of-the-art methods in fixation prediction.
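The core idea of the extrinsic context model can be sketched as follows: a GMM over patch descriptors (position plus appearance) encodes "what is where" in a scene, and the surprise of a patch is its negative log-likelihood under that prior. The sketch below is a minimal illustration with hand-set, hypothetical mixture parameters (the paper learns them with EM from a large image database); the two components and the (x, y, intensity) descriptor are assumptions for demonstration only.

```python
import numpy as np

# Hypothetical extrinsic context model for one scene cluster: a 2-component
# GMM over (x, y, intensity) patch descriptors encoding "what is where".
# Parameters are illustrative, not learned from the paper's database.
weights = np.array([0.6, 0.4])
means = np.array([[0.5, 0.8, 0.2],    # e.g. dark ground region, lower half
                  [0.5, 0.2, 0.7]])   # e.g. bright sky region, upper half
variances = np.array([[0.02, 0.02, 0.05],
                      [0.02, 0.02, 0.05]])  # diagonal covariances

def log_gauss(x, mu, var):
    """Log-density of a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def surprise(patch):
    """Surprise of a patch = negative log-likelihood under the GMM prior."""
    patch = np.asarray(patch, dtype=float)
    comp = np.array([np.log(w) + log_gauss(patch, m, v)
                     for w, m, v in zip(weights, means, variances)])
    return -np.logaddexp.reduce(comp)  # stable -log sum_k w_k N(x; mu_k, var_k)

expected = surprise([0.5, 0.8, 0.2])  # matches the learned layout prior
odd = surprise([0.5, 0.8, 0.9])       # bright patch where dark is expected
```

A patch that violates the layout prior (here, a bright patch in a region the mixture expects to be dark) receives a higher surprise score; the paper then feeds such patch-level surprise into a Markov chain that models attention shifting between surprising locations.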
Notes
No winner-take-all is used in IT.
The bottom-up saliency maps predicted by IT are used as the input for SP.
Our implementation of (Wang et al. 2011), as well as its experimental data and results on MIT1003, can be found at http://mlg.idm.pku.edu.cn/mlg/download/code/wang11.zip.
SNR, PMT, JUD and MIR are trained on 903 images and tested on the other 100 images.
References
Borji, A. (2012). Boosting bottom-up and top-down visual features for saliency estimation. In IEEE conference on computer vision and pattern recognition (pp. 438–445).
Bouman, C. A. (1997). Cluster: An unsupervised algorithm for modeling Gaussian mixtures. http://www.ece.purdue.edu/bouman.
Bruce, N. D., & Tsotsos, J. K. (2005). Saliency based on information maximization. In Advances in neural information processing systems, Vancouver, BC, Canada (pp. 155–162).
Cerf, M., Harel, J., Einhauser, W., & Koch, C. (2009). Predicting human gaze using low-level saliency combined with face detection. In Advances in neural information processing systems, Vancouver, BC, Canada.
Cheng, M. M., Zhang, G. X., Mitra, N., Huang, X., & Hu, S. M. (2011). Global contrast based salient region detection. In IEEE conference on computer vision and pattern recognition (pp. 409–416).
Chun, M. M., & Jiang, Y. (1998). Contextual cueing: Implicit learning and memory of visual context guides spatial attention. Cognitive Psychology, 36(1), 28–71.
Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.
Desingh, K., Krishna, K. M., Rajan, D., & Jawahar, C. (2013). Depth really matters: Improving visual salient region detection with depth. In British machine vision conference.
Erdem, E., & Erdem, A. (2013). Visual saliency estimation by nonlinearly integrating features using region covariances. Journal of Vision, 13(4), 11.
Fu, H., Cao, X., & Tu, Z. (2013). Cluster-based co-saliency detection. IEEE Transactions on Image Processing, 22(10), 3766–3778.
Gao, D., Han, S., & Vasconcelos, N. (2009). Discriminant saliency, the detection of suspicious coincidences, and applications to visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(6), 989–1005.
Goferman, S., Zelnik-Manor, L., & Tal, A. (2010). Context-aware saliency detection. In IEEE conference on computer vision and pattern recognition (pp. 2376–2383).
Gopalakrishnan, V., Hu, Y., & Rajan, D. (2009). Random walks on graphs to model saliency in images. In IEEE conference on computer vision and pattern recognition (pp. 1698–1705).
Harel, J., Koch, C., & Perona, P. (2007) Graph-based visual saliency. In Advances in neural information processing systems (pp. 545–552).
Hou, W., Gao, X., Tao, D., & Li, X. (2013). Visual saliency detection using information divergence. Pattern Recognition, 46(10), 2658–2669.
Hou, X., & Zhang, L. (2007). Saliency detection: A spectral residual approach. In IEEE conference on computer vision and pattern recognition (pp. 1–8).
Hou, X., & Zhang, L. (2009). Dynamic visual attention: Searching for coding length increments. In Advances in neural information processing systems (pp. 681–688).
Hou, X., Harel, J., & Koch, C. (2012). Image signature: Highlighting sparse salient regions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(1), 194–201.
Hu, Y., Rajan, D., & Chia, L. T. (2005). Adaptive local context suppression of multiple cues for salient visual attention detection. In IEEE international conference on multimedia and expo.
Itti, L., & Baldi, P. (2005). A principled approach to detecting surprising events in video. In IEEE conference on computer vision and pattern recognition (Vol. 1, pp. 631–637).
Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254–1259.
Itti, L., Rees, G., & Tsotsos, J. (2005). Neurobiology of attention (1st ed.). San Diego, Amsterdam: Elsevier Press.
Judd, T., Ehinger, K., Durand, F., & Torralba, A. (2009). Learning to predict where humans look. In IEEE international conference on computer vision (pp. 2106–2113).
Kanwisher, N., & Wojciulik, E. (2000). Visual attention: Insights from brain imaging. Nature Reviews Neuroscience, 1(2), 91–100.
Kunar, M. A., Flusberg, S. J., & Wolfe, J. M. (2006). Contextual cuing by global features. Journal Perception and Psychophysics, 68(7), 1204–1216.
Li, J., & Gao, W. (2014). Visual saliency computation: A machine learning perspective (1st ed.). Switzerland: Springer.
Li, J., Tian, Y., Huang, T., & Gao, W. (2010). Probabilistic multi-task learning for visual saliency estimation in video. International Journal of Computer Vision, 90(2), 150–165.
Li, J., Tian, Y., Huang, T., & Gao, W. (2011). Multi-task rank learning for visual saliency estimation. IEEE Transactions on Circuits and Systems for Video Technology, 21(5), 623–636.
Li, J., Xu, D., & Gao, W. (2012). Removing label ambiguity in learning-based visual saliency estimation. IEEE Transactions on Image Processing, 21(4), 1513–1525.
Li, J., Levine, M., An, X., Xu, X., & He, H. (2013). Visual saliency based on scale-space analysis in the frequency domain. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(4), 996–1010.
Li, J., Tian, Y., & Huang, T. (2014). Visual saliency with statistical priors. International Journal of Computer Vision, 107(3), 239–253.
Liu, T., Sun, J., Zheng, N. N., Tang, X., & Shum, H. Y. (2007). Learning to detect a salient object. In IEEE conference on computer vision and pattern recognition (pp. 1–8).
Lu, S., Tan, C., & Lim, J. (2013). Robust and efficient saliency modeling from image co-occurrence histograms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(1), 195–201.
Marat, S., Ho Phuoc, T., Granjon, L., Guyader, N., Pellerin, D., & Guérin-Dugué, A. (2009). Modelling spatio-temporal saliency to predict gaze direction for short videos. International Journal of Computer Vision, 82(3), 231–243.
Marchesotti, L., Cifarelli, C., & Csurka, G. (2009). A framework for visual saliency detection with applications to image thumbnailing. In IEEE international conference on computer vision (pp. 2232–2239).
Margolin, R., Tal, A., & Zelnik-Manor, L. (2013). What makes a patch distinct? In IEEE conference on computer vision and pattern recognition (pp. 1139–1146).
Navalpakkam, V., & Itti, L. (2007). Search goal tunes visual features optimally. Neuron, 53, 605–617.
Niu, Y., Geng, Y., Li, X., & Liu, F. (2012). Leveraging stereopsis for saliency analysis. In IEEE conference on computer vision and pattern recognition (pp. 454–461).
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.
Russell, A. F., Mihalaş, S., von der Heydt, R., Niebur, E., & Etienne-Cummings, R. (2014). A model of proto-object based saliency. Vision Research, 94, 1–15.
Siva, P., Russell, C., Xiang, T., & Agapito, L. (2013). Looking beyond the image: Unsupervised learning for object saliency and detection. In IEEE conference on computer vision and pattern recognition (pp. 3238–3245).
Sun, X., Yao, H., & Ji, R. (2012). What are we looking for: Towards statistical modeling of saccadic eye movements and visual saliency. In IEEE conference on computer vision and pattern recognition (pp. 1552–1559).
Torralba, A. (2005). Contextual influences on saliency. In L. Itti, G. Rees, & J. Tsotsos (Eds.), Neurobiology of Attention (1st ed., pp. 586–592). Amsterdam: Elsevier Press.
Vig, E., Dorr, M., Martinetz, T., & Barth, E. (2012). Intrinsic dimensionality predicts the saliency of natural dynamic scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(6), 1080–1091.
Vikram, T. N., Tscherepanow, M., & Wrede, B. (2012). A saliency map based on sampling an image into random rectangular regions of interest. Pattern Recognition, 45, 3114–3124.
Wang, M., Konrad, J., Ishwar, P., Jing, K., & Rowley, H. (2011). Image saliency: From intrinsic to extrinsic context. In IEEE conference on computer vision and pattern recognition (pp. 417–424).
Wang, P., Wang, J., Zeng, G., Feng, J., Zha, H., & Li, S. (2012). Salient object detection for searched web images via global saliency. In IEEE conference on computer vision and pattern recognition (pp. 3194–3201).
Wang, W., Wang, Y., Huang, Q., & Gao, W. (2010). Measuring visual saliency by site entropy rate. In IEEE conference on computer vision and pattern recognition (pp. 2368–2375).
Wei, S., Xu, D., Li, X., & Zhao, Y. (2013). Joint optimization toward effective and efficient image search. IEEE Transactions on Cybernetics, 43(6), 2216–2227.
Xu, J., Jiang, M., Wang, S., Kankanhalli, M. S., & Zhao, Q. (2014). Predicting human gaze beyond pixels. Journal of Vision, 14(1), 1–20.
Zhang, J., & Sclaroff, S. (2013). Saliency detection: A boolean map approach. In IEEE international conference on computer vision (pp. 153–160).
Zhang, L., Tong, M. H., Marks, T. K., Shan, H., & Cottrell, G. W. (2008). SUN: A Bayesian framework for saliency using natural statistics. Journal of Vision, 8(7), 32.
Zhao, Q., & Koch, C. (2011). Learning a saliency map using fixated locations in natural scenes. Journal of Vision, 11(3), 9.
Zhao, Q., & Koch, C. (2012). Learning visual saliency by combining feature maps in a nonlinear manner using AdaBoost. Journal of Vision, 12(6), 22.
Zhu, G., Wang, Q., Yuan, Y., & Yan, P. (2013). Learning saliency by MRF and differential threshold. IEEE Transactions on Cybernetics, 43(6), 2032–2043.
Acknowledgments
The authors would like to thank the anonymous reviewers for their helpful comments on the paper. This work was supported in part by grants from the National Basic Research Program of China under Contract 2015CB351806, the National Natural Science Foundation of China (61370113, 61532003, 61390515 and 61421062), and the Fundamental Research Funds for the Central Universities.
Additional information
Communicated by Y. Sato.
Cite this article
Li, J., Tian, Y., Chen, X. et al. Measuring Visual Surprise Jointly from Intrinsic and Extrinsic Contexts for Image Saliency Estimation. Int J Comput Vis 120, 44–60 (2016). https://doi.org/10.1007/s11263-016-0892-7