Machine Vision and Applications

Volume 25, Issue 7, pp 1671–1683

Image visual attention computation and application via the learning of object attributes

  • Junwei Han
  • Dongyang Wang
  • Ling Shao
  • Xiaoliang Qian
  • Gong Cheng
  • Jungong Han
Special Issue Paper


Visual attention aims at selecting a salient subset from the visual input for further processing while ignoring redundant data. The dominant view for the computation of visual attention is based on the assumption that bottom-up visual saliency such as local contrast and interest points drives the allocation of attention in scene viewing. However, we advocate in this paper that the deployment of attention is primarily and directly guided by objects and thus propose a novel framework to explore image visual attention via the learning of object attributes from eye-tracking data. We mainly aim to solve three problems: (1) the pixel-level visual attention computation (the saliency map); (2) the image-level visual attention computation; (3) the application of the computation model in image categorization. We first adopt the algorithm of object bank to acquire the responses to a number of object detectors at each location in an image and thus form a feature descriptor to indicate the occurrences of various objects at a pixel or in an image. Next, we integrate the inference of interesting objects from fixations in eye-tracking data with the competition among surrounding objects to solve the first problem. We further propose a computational model to solve the second problem and estimate the interestingness of each image via the mapping between object attributes and the inter-observer visual congruency obtained from eye-tracking data. Finally, we apply the proposed pixel-level visual attention model to the image categorization task. Comprehensive evaluations on publicly available benchmarks and comparisons with state-of-the-art methods demonstrate the effectiveness of the proposed models.
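The pixel-level model the abstract outlines — scoring each location by its object-detector responses, with detector weights learned from eye-tracking fixations — can be sketched as a simple logistic regression. This is a hypothetical illustration, not the paper's actual formulation: the feature names, training loop, and two-detector toy data are all assumptions made for clarity.

```python
# Hypothetical sketch: each pixel carries a vector of object-detector
# responses (an object-bank-style descriptor); weights over detectors
# are learned from fixated vs. non-fixated pixels, then used to score
# the saliency of new pixels. Pure-Python stand-in for the real model.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def learn_weights(descriptors, fixated, epochs=200, lr=0.5):
    """Logistic regression via SGD: which object responses predict fixations?"""
    n_feat = len(descriptors[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        for x, y in zip(descriptors, fixated):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = y - p                      # gradient of the log-loss
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def saliency(descriptor, w, b):
    """Per-pixel saliency in [0, 1] from the learned detector weights."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, descriptor)) + b)

# Toy data: feature 0 = a "face" detector response, feature 1 = "sky".
# Fixations (label 1) coincide with strong face responses.
train_x = [[1.0, 0.1], [0.9, 0.2], [0.1, 0.9], [0.0, 1.0]]
train_y = [1, 1, 0, 0]
w, b = learn_weights(train_x, train_y)
assert saliency([0.95, 0.10], w, b) > saliency([0.05, 0.95], w, b)
```

Applying `saliency` over every pixel descriptor would yield a saliency map; the paper additionally models competition among surrounding objects, which this sketch omits.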


Keywords: Visual attention · Eye tracking · Object bank · Image categorization



This work was partially supported by the National Natural Science Foundation of China under Grants 61005018 and 91120005, by NPU-FFR-JC20104, and by the Program for New Century Excellent Talents in University under Grant NCET-10-0079.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Junwei Han (1)
  • Dongyang Wang (1)
  • Ling Shao (2)
  • Xiaoliang Qian (1)
  • Gong Cheng (1)
  • Jungong Han (3)

  1. School of Automation, Northwestern Polytechnical University, Xi'an, China
  2. University of Sheffield, Sheffield, UK
  3. Civolution Technology, Eindhoven, The Netherlands