Abstract
Detecting conspicuous image content is a challenging task in computer vision. Most existing approaches estimate saliency using only cues from the input image itself. However, such "intrinsic" cues are often insufficient to distinguish targets from distractors that share common visual attributes. To address this problem, we present an approach that estimates image saliency by measuring the joint visual surprise from intrinsic and extrinsic contexts. In this approach, a hierarchical context model is first built on a database of 31.2 million images, where a Gaussian mixture model (GMM) is trained for each leaf node to encode prior knowledge of "what is where" in a specific scene. For a test image that shares a similar spatial layout with a scene, the pre-trained GMM serves as an extrinsic context model to measure the "surprise" of an image patch. Since human attention may shift quickly between surprising locations, we adopt a Markov chain to model a surprise-driven attention-shifting process and thereby infer the salient patches that best capture human attention. Experiments show that our approach outperforms 19 state-of-the-art methods in fixation prediction.
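The core idea of the extrinsic context model can be sketched as follows: a GMM over patch descriptors (position plus appearance) encodes "what is where" in a scene, and the surprise of a patch is its negative log-likelihood under that prior. The sketch below is a minimal illustration with hand-set, hypothetical mixture parameters (the paper learns them with EM from a large image database); the two components and the (x, y, intensity) descriptor are assumptions for demonstration only.

```python
import numpy as np

# Hypothetical extrinsic context model for one scene cluster: a 2-component
# GMM over (x, y, intensity) patch descriptors encoding "what is where".
# Parameters are illustrative, not learned from the paper's database.
weights = np.array([0.6, 0.4])
means = np.array([[0.5, 0.8, 0.2],    # e.g. dark ground region, lower half
                  [0.5, 0.2, 0.7]])   # e.g. bright sky region, upper half
variances = np.array([[0.02, 0.02, 0.05],
                      [0.02, 0.02, 0.05]])  # diagonal covariances

def log_gauss(x, mu, var):
    """Log-density of a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def surprise(patch):
    """Surprise of a patch = negative log-likelihood under the GMM prior."""
    patch = np.asarray(patch, dtype=float)
    comp = np.array([np.log(w) + log_gauss(patch, m, v)
                     for w, m, v in zip(weights, means, variances)])
    return -np.logaddexp.reduce(comp)  # stable -log sum_k w_k N(x; mu_k, var_k)

expected = surprise([0.5, 0.8, 0.2])  # matches the learned layout prior
odd = surprise([0.5, 0.8, 0.9])       # bright patch where dark is expected
```

A patch that violates the layout prior (here, a bright patch in a region the mixture expects to be dark) receives a higher surprise score; the paper then feeds such patch-level surprise into a Markov chain that models attention shifting between surprising locations.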
Notes
No winner-take-all is used in IT.
The bottom-up saliency maps predicted by IT are used as the input for SP.
Our implementation of (Wang et al. 2011), as well as its experimental data and results on MIT1003, can be found at http://mlg.idm.pku.edu.cn/mlg/download/code/wang11.zip.
SNR, PMT, JUD and MIR are trained on 903 images and tested on the other 100 images.
References
Borji, A. (2012). Boosting bottom-up and top-down visual features for saliency estimation. In IEEE conference on computer vision and pattern recognition (pp. 438–445).
Bouman, C. A. (1997). Cluster: An unsupervised algorithm for modeling Gaussian mixtures. http://www.ece.purdue.edu/bouman.
Bruce, N. D., & Tsotsos, J. K. (2005). Saliency based on information maximization. In Advances in neural information processing systems, Vancouver, BC, Canada (pp. 155–162).
Cerf, M., Harel, J., Einhauser, W., & Koch, C. (2009). Predicting human gaze using low-level saliency combined with face detection. In Advances in neural information processing systems, Vancouver, BC, Canada.
Cheng, M. M., Zhang, G. X., Mitra, N., Huang, X., & Hu, S. M. (2011). Global contrast based salient region detection. In IEEE conference on computer vision and pattern recognition (pp. 409–416).
Chun, M. M., & Jiang, Y. (1998). Contextual cueing: Implicit learning and memory of visual context guides spatial attention. Cognitive Psychology, 36(1), 28–71.
Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.
Desingh, K., Krishna, K. M., Rajan, D., & Jawahar, C. (2013). Depth really matters: Improving visual salient region detection with depth. In British machine vision conference.
Erdem, E., & Erdem, A. (2013). Visual saliency estimation by nonlinearly integrating features using region covariances. Journal of Vision, 13(4), 11.
Fu, H., Cao, X., & Tu, Z. (2013). Cluster-based co-saliency detection. IEEE Transactions on Image Processing, 22(10), 3766–3778.
Gao, D., Han, S., & Vasconcelos, N. (2009). Discriminant saliency, the detection of suspicious coincidences, and applications to visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(6), 989–1005.
Goferman, S., Zelnik-Manor, L., & Tal, A. (2010). Context-aware saliency detection. In IEEE conference on computer vision and pattern recognition (pp. 2376–2383).
Gopalakrishnan, V., Hu, Y., & Rajan, D. (2009). Random walks on graphs to model saliency in images. In IEEE conference on computer vision and pattern recognition (pp. 1698–1705).
Harel, J., Koch, C., & Perona, P. (2007) Graph-based visual saliency. In Advances in neural information processing systems (pp. 545–552).
Hou, W., Gao, X., Tao, D., & Li, X. (2013). Visual saliency detection using information divergence. Pattern Recognition, 46(10), 2658–2669.
Hou, X., & Zhang, L. (2007). Saliency detection: A spectral residual approach. In IEEE conference on computer vision and pattern recognition (pp. 1–8).
Hou, X., & Zhang, L. (2009). Dynamic visual attention: Searching for coding length increments. In Advances in neural information processing systems (pp. 681–688).
Hou, X., Harel, J., & Koch, C. (2012). Image signature: Highlighting sparse salient regions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(1), 194–201.
Hu, Y., Rajan, D., & Chia, L. T. (2005). Adaptive local context suppression of multiple cues for salient visual attention detection. In IEEE international conference on multimedia and expo.
Itti, L., & Baldi, P. (2005). A principled approach to detecting surprising events in video. In IEEE conference on computer vision and pattern recognition (Vol. 1, pp. 631–637).
Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254–1259.
Itti, L., Rees, G., & Tsotsos, J. (2005). Neurobiology of attention (1st ed.). San Diego, Amsterdam: Elsevier Press.
Judd, T., Ehinger, K., Durand, F., & Torralba, A. (2009). Learning to predict where humans look. In IEEE international conference on computer vision (pp. 2106–2113).
Kanwisher, N., & Wojciulik, E. (2000). Visual attention: Insights from brain imaging. Nature Reviews Neuroscience, 1(2), 91–100.
Kunar, M. A., Flusberg, S. J., & Wolfe, J. M. (2006). Contextual cuing by global features. Journal Perception and Psychophysics, 68(7), 1204–1216.
Li, J., & Gao, W. (2014). Visual saliency computation: A machine learning perspective (1st ed.). Switzerland: Springer.
Li, J., Tian, Y., Huang, T., & Gao, W. (2010). Probabilistic multi-task learning for visual saliency estimation in video. International Journal of Computer Vision, 90(2), 150–165.
Li, J., Tian, Y., Huang, T., & Gao, W. (2011). Multi-task rank learning for visual saliency estimation. IEEE Transactions on Circuits and Systems for Video Technology, 21(5), 623–636.
Li, J., Xu, D., & Gao, W. (2012). Removing label ambiguity in learning-based visual saliency estimation. IEEE Transactions on Image Processing, 21(4), 1513–1525.
Li, J., Levine, M., An, X., Xu, X., & He, H. (2013). Visual saliency based on scale-space analysis in the frequency domain. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(4), 996–1010.
Li, J., Tian, Y., & Huang, T. (2014). Visual saliency with statistical priors. International Journal of Computer Vision, 107(3), 239–253.
Liu, T., Sun, J., Zheng, N. N., Tang, X., & Shum, H. Y. (2007). Learning to detect a salient object. In IEEE conference on computer vision and pattern recognition (pp. 1–8).
Lu, S., Tan, C., & Lim, J. (2013). Robust and efficient saliency modeling from image co-occurrence histograms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(1), 195–201.
Marat, S., Ho Phuoc, T., Granjon, L., Guyader, N., Pellerin, D., & Guérin-Dugué, A. (2009). Modelling spatio-temporal saliency to predict gaze direction for short videos. International Journal of Computer Vision, 82(3), 231–243.
Marchesotti, L., Cifarelli, C., & Csurka, G. (2009). A framework for visual saliency detection with applications to image thumbnailing. In IEEE international conference on computer vision (pp. 2232–2239).
Margolin, R., Tal, A., & Zelnik-Manor, L. (2013). What makes a patch distinct? In IEEE conference on computer vision and pattern recognition (pp. 1139–1146).
Navalpakkam, V., & Itti, L. (2007). Search goal tunes visual features optimally. Neuron, 53, 605–617.
Niu, Y., Geng, Y., Li, X., & Liu, F. (2012). Leveraging stereopsis for saliency analysis. In IEEE conference on computer vision and pattern recognition (pp. 454–461).
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.
Russell, A. F., Mihalaş, S., von der Heydt, R., Niebur, E., & Etienne-Cummings, R. (2014). A model of proto-object based saliency. Vision Research, 94, 1–15.
Siva, P., Russell, C., Xiang, T., & Agapito, L. (2013). Looking beyond the image: Unsupervised learning for object saliency and detection. In IEEE conference on computer vision and pattern recognition (pp. 3238–3245).
Sun, X., Yao, H., & Ji, R. (2012). What are we looking for: Towards statistical modeling of saccadic eye movements and visual saliency. In IEEE conference on computer vision and pattern recognition (pp. 1552–1559).
Torralba, A. (2005). Contextual influences on saliency. In L. Itti, G. Rees, & J. Tsotsos (Eds.), Neurobiology of Attention (1st ed., pp. 586–592). Amsterdam: Elsevier Press.
Vig, E., Dorr, M., Martinetz, T., & Barth, E. (2012). Intrinsic dimensionality predicts the saliency of natural dynamic scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(6), 1080–1091.
Vikram, T. N., Tscherepanow, M., & Wrede, B. (2012). A saliency map based on sampling an image into random rectangular regions of interest. Pattern Recognition, 45, 3114–3124.
Wang, M., Konrad, J., Ishwar, P., Jing, K., & Rowley, H. (2011). Image saliency: From intrinsic to extrinsic context. In IEEE conference on computer vision and pattern recognition (pp. 417–424).
Wang, P., Wang, J., Zeng, G., Feng, J., Zha, H., & Li, S. (2012). Salient object detection for searched web images via global saliency. In IEEE conference on computer vision and pattern recognition (pp. 3194–3201).
Wang, W., Wang, Y., Huang, Q., & Gao, W. (2010). Measuring visual saliency by site entropy rate. In IEEE conference on computer vision and pattern recognition (pp. 2368–2375).
Wei, S., Xu, D., Li, X., & Zhao, Y. (2013). Joint optimization toward effective and efficient image search. IEEE Transactions on Cybernetics, 43(6), 2216–2227.
Xu, J., Jiang, M., Wang, S., Kankanhalli, M. S., & Zhao, Q. (2014). Predicting human gaze beyond pixels. Journal of Vision, 14(1), 1–20.
Zhang, J., & Sclaroff, S. (2013). Saliency detection: A boolean map approach. In IEEE international conference on computer vision (pp. 153–160).
Zhang, L., Tong, M. H., Marks, T. K., Shan, H., & Cottrell, G. W. (2008). SUN: A Bayesian framework for saliency using natural statistics. Journal of Vision, 8(7), 32.
Zhao, Q., & Koch, C. (2011). Learning a saliency map using fixated locations in natural scenes. Journal of Vision, 11(3), 9.
Zhao, Q., & Koch, C. (2012). Learning visual saliency by combining feature maps in a nonlinear manner using AdaBoost. Journal of Vision, 12(6), 22.
Zhu, G., Wang, Q., Yuan, Y., & Yan, P. (2013). Learning saliency by MRF and differential threshold. IEEE Transactions on Cybernetics, 43(6), 2032–2043.
Acknowledgments
The authors would like to thank the anonymous reviewers for their helpful comments on the paper. This work was supported in part by grants from the National Basic Research Program of China under Contract 2015CB351806, the National Natural Science Foundation of China (61370113, 61532003, 61390515 and 61421062), and the Fundamental Research Funds for the Central Universities.
Additional information
Communicated by Y. Sato.
Cite this article
Li, J., Tian, Y., Chen, X. et al. Measuring Visual Surprise Jointly from Intrinsic and Extrinsic Contexts for Image Saliency Estimation. Int J Comput Vis 120, 44–60 (2016). https://doi.org/10.1007/s11263-016-0892-7