The Visual Computer, Volume 30, Issue 4, pp 443–453

SalientShape: group saliency in image collections

  • Ming-Ming Cheng
  • Niloy J. Mitra
  • Xiaolei Huang
  • Shi-Min Hu
Original Article


Efficiently identifying salient objects in large image collections is essential for many applications, including image retrieval, surveillance, image annotation, and object recognition. We propose a simple, fast, and effective algorithm for locating and segmenting salient objects by analysing image collections. As a key novelty, we introduce group saliency to achieve superior unsupervised salient object segmentation by extracting salient objects (in collections of pre-filtered images) that maximize between-image similarities and within-image distinctness. To evaluate our method, we construct a large benchmark dataset consisting of 15K images across multiple categories, with 6000+ pixel-accurate ground truth annotations for salient object regions where applicable. In all our tests, group saliency consistently outperforms state-of-the-art single-image saliency algorithms, resulting in both higher precision and better recall. Our algorithm successfully handles image collections that are an order of magnitude larger than any existing benchmark dataset and that consist of diverse and heterogeneous images from various internet sources.
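The evaluation above is stated in terms of precision and recall against pixel-accurate ground truth. As a minimal sketch of how such pixel-level scores are commonly computed for a binary segmentation mask, the following may help; the function name `precision_recall`, the list-of-lists mask layout, and the zero-denominator convention are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Tuple

# Binary mask: 1 marks a salient-object pixel, 0 marks background.
# This layout is an assumption made for illustration only.
Mask = List[List[int]]


def precision_recall(predicted: Mask, ground_truth: Mask) -> Tuple[float, float]:
    """Pixel-level precision and recall of a predicted mask against ground truth."""
    true_pos = false_pos = false_neg = 0
    for pred_row, gt_row in zip(predicted, ground_truth):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                true_pos += 1      # pixel correctly labelled salient
            elif p and not g:
                false_pos += 1     # background pixel labelled salient
            elif g and not p:
                false_neg += 1     # salient pixel that was missed
    # Assumed convention: an empty denominator yields 0.0.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Toy 3x3 example: the prediction covers two of the three object pixels
# and spills over onto two background pixels.
predicted = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]
ground_truth = [[0, 1, 0],
                [0, 1, 0],
                [0, 1, 0]]
p, r = precision_recall(predicted, ground_truth)
print(f"precision={p:.3f} recall={r:.3f}")
```

In this toy case half of the predicted pixels are correct and two thirds of the object is recovered. A method that improves both numbers at once, as the abstract claims for group saliency, segments more of the object while including less background.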


Saliency detection · Group saliency · Object of interest segmentation · Image retrieval



We would like to thank the anonymous reviewers for their constructive comments. This research was supported by the 973 Program (2011CB302205), the 863 Program (2009AA01Z327), the Key Project of S&T (2011ZX01042-001-002), and NSFC (U0735001). Ming-Ming Cheng was funded by a Google Ph.D. fellowship, an IBM Ph.D. fellowship, and a New Ph.D. Researcher Award (Ministry of Edu., CN).



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Ming-Ming Cheng (1)
  • Niloy J. Mitra (2)
  • Xiaolei Huang (3)
  • Shi-Min Hu (1)

  1. TNList, Tsinghua University, Beijing, China
  2. University College London, London, UK
  3. Lehigh University, Bethlehem, USA
