
International Journal of Computer Vision, Volume 120, Issue 1, pp 44–60

Measuring Visual Surprise Jointly from Intrinsic and Extrinsic Contexts for Image Saliency Estimation


Abstract

Detecting conspicuous image content is a challenging task in computer vision. Most existing approaches estimate saliency using only cues from the input image. However, such “intrinsic” cues are often insufficient to distinguish targets from distractors that share common visual attributes. To address this problem, we present an approach that estimates image saliency by measuring the joint visual surprise from intrinsic and extrinsic contexts. In this approach, a hierarchical context model is first built on a database of 31.2 million images, where a Gaussian mixture model (GMM) is trained for each leaf node to encode prior knowledge about “what is where” in a specific scene. For a test image that shares a similar spatial layout with such a scene, the pre-trained GMM can serve as an extrinsic context model to measure the “surprise” of an image patch. Since human attention may quickly shift between surprising locations, we adopt a Markov chain to model a surprise-driven attention-shifting process and infer the salient patches that best capture human attention. Experiments show that our approach outperforms 19 state-of-the-art methods in fixation prediction.
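To make the two core ideas concrete, below is a minimal sketch, not the authors' implementation: patch feature extraction, the hierarchical context tree, and all parameter values here are assumptions. It measures patch surprise as negative log-likelihood under a pre-trained GMM, then turns surprise into saliency via the stationary distribution of an attention-shifting Markov chain.

```python
# Hedged sketch of GMM-based surprise and a surprise-driven Markov chain.
# Real patch descriptors and the hierarchical scene context are assumed.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_context_gmm(patch_features, n_components=8):
    # Fit a GMM on joint (appearance, location) descriptors of patches drawn
    # from images with a similar spatial layout -- the extrinsic context.
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          random_state=0)
    gmm.fit(patch_features)
    return gmm

def patch_surprise(gmm, patch_features):
    # Surprise = negative log-likelihood under the context model: patches
    # the prior cannot explain well are the surprising ones.
    return -gmm.score_samples(patch_features)

def saliency_by_attention_shift(surprise, positions, sigma=0.2, n_iter=100):
    # Stationary distribution of a Markov chain in which attention shifts
    # toward surprising patches, discounted by spatial distance.
    w = surprise - surprise.min() + 1e-9               # nonnegative weights
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    P = np.exp(-d**2 / (2 * sigma**2)) * w[None, :]    # i -> j transition mass
    P /= P.sum(axis=1, keepdims=True)                  # row-stochastic
    pi = np.full(len(surprise), 1.0 / len(surprise))
    for _ in range(n_iter):                            # power iteration
        pi = pi @ P
    return pi                                          # per-patch saliency

# Toy usage with random stand-ins for real patch descriptors.
rng = np.random.default_rng(0)
context = rng.normal(size=(500, 16))   # patches from layout-similar images
gmm = train_context_gmm(context)
test = rng.normal(size=(64, 16))       # patches of one test image
pos = rng.uniform(size=(64, 2))        # normalized patch centers
saliency = saliency_by_attention_shift(patch_surprise(gmm, test), pos)
```

In this sketch the transition probability from one patch to another grows with the target patch's surprise and decays with spatial distance, so the chain spends most of its time on surprising, mutually reinforcing locations; the stationary distribution then serves as the per-patch saliency score.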

Keywords

Image saliency · Visual surprise · Intrinsic context · Extrinsic context · Gaussian mixture model · Markov chain


Acknowledgments

The authors would like to thank the anonymous reviewers for their helpful comments on the paper. This work was supported in part by grants from the National Basic Research Program of China under Contract 2015CB351806, the National Natural Science Foundation of China (61370113, 61532003, 61390515 and 61421062), and the Fundamental Research Funds for the Central Universities.


Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing, China
  2. International Research Institute for Multidisciplinary Science, Beihang University, Beijing, China
  3. School of EE & CS, Peking University, Beijing, China
  4. Cooperative Medianet Innovation Center, Beijing, China
