Science China Information Sciences, Volume 54, Issue 12, pp 2461–2470

Salient region detection and segmentation for general object recognition and image understanding

  • TieJun Huang
  • YongHong Tian (corresponding author)
  • Jia Li
  • HaoNan Yu
Research Papers Special Focus


General object recognition and image understanding are regarded as an ambitious goal for computer vision and multimedia retrieval. Despite the great effort devoted to them over the last two decades, they remain an open problem. In this paper, we propose a selective attention-driven model for general image understanding, named GORIUM (general object recognition and image understanding model). The key idea of our model is to discover recurring visual objects in a large image set, in an unsupervised manner, by selective attention modeling and pairwise matching of local invariant features. To this end, it is formulated as a four-layer bottom-up model comprising salient region detection, object segmentation, automatic object discovery, and visual dictionary construction. By exploiting multi-task learning methods to model visual saliency with bottom-up and top-down factors simultaneously, the lowest layer can effectively detect salient objects in an image. The second layer exploits a simple yet effective learning approach to generate two complementary maps from several raw saliency maps, which can then be used to segment the salient objects precisely from a complex scene. For the third layer, we have implemented an unsupervised approach that automatically discovers general objects from a large image set by pairwise matching with local invariant features. Finally, visual dictionary construction can be implemented with many of the state-of-the-art algorithms and tools now available.
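
To make the second layer more concrete, the following is a minimal sketch of how two complementary maps might be derived from several raw saliency maps and turned into segmentation seeds. It is not the learning approach used in the paper: the pixelwise minimum and maximum, the function names (`normalize`, `complementary_maps`, `trimap`) and the thresholds `fg_thresh` and `bg_thresh` are all illustrative assumptions.

```python
import numpy as np


def normalize(m):
    """Rescale a raw saliency map to [0, 1]; a constant map becomes all zeros."""
    m = np.asarray(m, dtype=float)
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)


def complementary_maps(raw_maps):
    """Fuse raw saliency maps into a conservative map and a liberal map.

    Assumption: pixelwise min/max stand in for the learned fusion,
    which the abstract does not specify.
    """
    stack = np.stack([normalize(m) for m in raw_maps])
    conservative = stack.min(axis=0)  # high only where every map agrees
    liberal = stack.max(axis=0)       # high wherever any map responds
    return conservative, liberal


def trimap(conservative, liberal, fg_thresh=0.6, bg_thresh=0.2):
    """Label pixels 1 (foreground seed), 0 (background seed) or -1 (undecided).

    The threshold values are hypothetical defaults chosen for illustration.
    """
    labels = np.full(conservative.shape, -1, dtype=int)
    labels[liberal < bg_thresh] = 0        # no map finds the pixel salient
    labels[conservative >= fg_thresh] = 1  # every map finds it salient
    return labels
```

In this sketch the conservative map favours precision and the liberal map favours recall. The pixels left undecided between the two are the ones a subsequent segmentation step would have to resolve.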


object recognition, image understanding, visual saliency, salient object segmentation, visual dictionary



Supplementary material

Supplementary material, approximately 11.5 MB.



Copyright information

© Science China Press and Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • TieJun Huang (1)
  • YongHong Tian (1, corresponding author)
  • Jia Li (2)
  • HaoNan Yu (1)
  1. National Engineering Laboratory for Video Technology, School of Electrical Engineering and Computer Science, Peking University, Beijing, China
  2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
