International Journal of Computer Vision, Volume 120, Issue 2, pp 215–232

Detection of Co-salient Objects by Looking Deep and Wide

  • Dingwen Zhang
  • Junwei Han
  • Chao Li
  • Jingdong Wang
  • Xuelong Li
Article

Abstract

In this paper, we propose a unified co-salient object detection framework built on two novel insights: (1) looking deep, where transferring higher-level representations through a convolutional neural network with additional adaptive layers can better reflect the semantic properties of the co-salient objects; and (2) looking wide, where taking advantage of visually similar neighbors from other image groups can effectively suppress the influence of common background regions. Both the wide and the deep information are explored for the object proposal windows extracted from each image. Window-level co-saliency scores are calculated by integrating intra-image contrast, intra-group consistency, and inter-group separability via a principled Bayesian formulation, and are then converted to superpixel-level co-saliency maps through a foreground region agreement strategy. Comprehensive experiments on two existing datasets and one newly established dataset demonstrate the consistent performance gain of the proposed approach.
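The two-stage scoring described in the abstract can be illustrated with a small sketch. This is not the authors' formulation: the abstract does not give the Bayesian model or the foreground region agreement rule, so the code below assumes a toy Bayes rule in which intra-image contrast acts as the prior and the two group-level cues act as likelihoods, and it stands in for region agreement with a plain average over the windows covering each superpixel. The function names, cue values, and window memberships are all hypothetical.

```python
def window_posterior(contrast, consistency, separability, eps=1e-9):
    """Toy Bayes rule (an assumption, not the paper's model).

    contrast is treated as the prior probability that a window is
    co-salient; consistency and separability are treated as independent
    likelihood terms. All three inputs are expected to lie in [0, 1].
    """
    fg = contrast * consistency * separability
    bg = (1.0 - contrast) * (1.0 - consistency) * (1.0 - separability)
    return fg / (fg + bg + eps)


def superpixel_map(window_scores, window_members, num_superpixels):
    """Average the scores of all windows that cover each superpixel.

    A simple stand-in (assumption) for the foreground region agreement
    strategy. window_members[i] is the set of superpixel ids inside
    window i; uncovered superpixels get a score of 0.
    """
    totals = [0.0] * num_superpixels
    counts = [0] * num_superpixels
    for score, members in zip(window_scores, window_members):
        for sp in members:
            totals[sp] += score
            counts[sp] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


# Hypothetical cues per window: (contrast, consistency, separability).
cues = [(0.9, 0.8, 0.7), (0.2, 0.3, 0.4), (0.5, 0.5, 0.5)]
# Hypothetical superpixel ids covered by each window.
members = [{0, 1}, {2, 3}, {1, 2}]

scores = [window_posterior(*c) for c in cues]
sal = superpixel_map(scores, members, 4)
```

In this toy setup, a window that scores highly on all three cues receives a posterior near 1, an uninformative window (all cues at 0.5) stays near 0.5, and superpixels shared by several windows receive the mean of those windows' scores.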

Keywords

Co-saliency detection; Domain adaptive convolutional neural network; Bayesian framework

Notes

Acknowledgments

This work was supported in part by the National Science Foundation of China under Grants 61522207 and 61473231, the Doctorate Foundation, and the Excellent Doctorate Foundation of Northwestern Polytechnical University.

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Dingwen Zhang (1)
  • Junwei Han (1)
  • Chao Li (1)
  • Jingdong Wang (2)
  • Xuelong Li (3)

  1. School of Automation, Northwestern Polytechnical University, Xi’an, China
  2. Microsoft Research Asia, Beijing, China
  3. Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an, China
