Learning to Detect Instance-Level Salient Objects Using Complementary Image Labels

International Journal of Computer Vision

Abstract

Existing salient instance detection (SID) methods typically learn from pixel-level annotated datasets. In this paper, we present the first weakly-supervised approach to the SID problem. Although weak supervision has been considered in general saliency detection, it mainly relies on class labels for object localization. However, it is non-trivial to learn instance-aware saliency information from class labels alone, as salient instances with high semantic affinities may not be easily separated by the labels. Because subitizing information gives an immediate judgement of the number of salient items, it is naturally related to detecting salient instances and may help separate instances of the same class while grouping different parts of the same instance. Inspired by this observation, we propose to use class and subitizing labels as weak supervision for the SID problem. We propose a novel weakly-supervised network with three branches: a Saliency Detection Branch leveraging class consistency information to locate candidate objects; a Boundary Detection Branch exploiting class discrepancy information to delineate object boundaries; and a Centroid Detection Branch using subitizing information to detect salient instance centroids. This complementary information is then fused to produce a salient instance map. To facilitate the learning process, we further propose a progressive training scheme that reduces label noise, and the corresponding noise learned by the model, by alternating progressive salient instance prediction with model refreshing. Our extensive evaluations show that the proposed method performs favorably against carefully designed baseline methods adapted from related tasks.
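
As a rough illustration of how the three kinds of cues described in the abstract could complement one another, the sketch below combines a saliency map, a boundary map, and a list of centroids into an instance map. It is not the fusion procedure of the paper, which the abstract does not specify. The function name `fuse_to_instance_map`, the two thresholds, and the nearest-centroid assignment rule are all assumptions made for this example.

```python
import numpy as np

def fuse_to_instance_map(saliency, boundary, centroids,
                         sal_thresh=0.5, bnd_thresh=0.5):
    """Toy fusion (an assumption, not the paper's method).

    saliency, boundary: float arrays in [0, 1] with shape (H, W).
    centroids: list of (row, col) tuples, one per predicted instance.
    Returns an int array of shape (H, W): 0 is background and
    k is the instance belonging to the k-th centroid (1-based).
    """
    # Keep pixels that are salient and do not lie on a predicted boundary.
    mask = (saliency >= sal_thresh) & (boundary < bnd_thresh)
    instance_map = np.zeros(saliency.shape, dtype=np.int64)
    if len(centroids) == 0 or not mask.any():
        return instance_map

    rows, cols = np.nonzero(mask)
    pixels = np.stack([rows, cols], axis=1).astype(np.float64)   # (N, 2)
    centers = np.asarray(centroids, dtype=np.float64)            # (K, 2)

    # Euclidean distance from every kept pixel to every centroid: (N, K).
    dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)

    # Assign each kept pixel to its nearest centroid (labels start at 1).
    instance_map[rows, cols] = dists.argmin(axis=1) + 1
    return instance_map

# Hypothetical inputs: one salient blob split by a vertical boundary.
saliency = np.zeros((4, 8))
saliency[1:3, 1:7] = 1.0
boundary = np.zeros((4, 8))
boundary[:, 4] = 1.0
centroids = [(1, 2), (1, 6)]

instance_map = fuse_to_instance_map(saliency, boundary, centroids)
print(instance_map)
```

In this toy input the saliency map alone yields a single connected region. The boundary column and the two centroids are what separate it: salient pixels to the left of the boundary receive label 1, those to the right receive label 2, and the boundary column itself stays background.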

Notes

  1. At the time of writing, the code for MSRNet (Li et al. 2017) is still not available. Following Fan et al. (2019), we directly copy the numbers reported in Li et al. (2017) for a quantitative comparison.

References

  • Achanta, R., Hemami, S., Estrada, F., & Süsstrunk, S. (2009). Frequency-tuned salient region detection. In CVPR.

  • Ahn, J., Cho, S., & Kwak, S. (2019). Weakly supervised learning of instance segmentation with inter-pixel relations. In CVPR.

  • Ahn, J., & Kwak, S. (2018). Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation. In CVPR.

  • Alexe, B., Deselaers, T., & Ferrari, V. (2012). Measuring the objectness of image windows. In IEEE TPAMI.

  • Arbeláez, P., Pont-Tuset, J., Barron, J., Marques, F., & Malik, J. (2014). Multiscale combinatorial grouping. In CVPR.

  • Chen, C., Sun, X., Hua, Y., Dong, J., & Xv, H. (2020). Learning deep relations to promote saliency detection. In AAAI.

  • Chen, L. C., Papandreou, G., Schroff, F., & Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv:1706.05587.

  • Cheng, M. M., Mitra, N. J., Huang, X., Torr, P. H., & Hu, S. M. (2014). Global contrast based salient region detection. In IEEE TPAMI.

  • Cholakkal, H., Sun, G., Khan, F. S., & Shao, L. (2019). Object counting and instance segmentation with image-level supervision. In CVPR.

  • Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The pascal visual object classes (voc) challenge. In IJCV.

  • Fan, R., Cheng, M. M., Hou, Q., Mu, T. J., Wang, J., & Hu, S. M. (2019). S4net: Single stage salient-instance segmentation. In CVPR.

  • Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., & Lu, H. (2019). Dual attention network for scene segmentation. In CVPR.

  • Gao, S. H., Tan, Y. Q., Cheng, M. M., Lu, C., Chen, Y., & Yan, S. (2020). Highly efficient salient object detection with 100k parameters. In ECCV.

  • Hariharan, B., Arbeláez, P., Girshick, R., & Malik, J. (2014). Simultaneous detection and segmentation. In ECCV.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR.

  • He, S., Jiao, J., Zhang, X., Han, G., & Lau, R. W. (2017). Delving into salient object subitizing and detection. In ICCV.

  • Hou, Q., Cheng, M. M., Hu, X., Borji, A., Tu, Z., & Torr, P. H. (2017). Deeply supervised salient object detection with short connections. In CVPR.

  • Hu, M., Han, H., Shan, S., & Chen, X. (2019). Weakly supervised image classification through noise regularization. In CVPR.

  • Jiang, H., Wang, J., Yuan, Z., Wu, Y., Zheng, N., & Li, S. (2013). Salient object detection: A discriminative regional feature integration approach. In CVPR.

  • Canny, J. (1986). A computational approach to edge detection. In IEEE TPAMI.

  • Karpathy, A., & Fei-Fei, L. (2015). Deep visual-semantic alignments for generating image descriptions. In CVPR.

  • Krähenbühl, P., & Koltun, V. (2011). Efficient inference in fully connected crfs with gaussian edge potentials. In NeurIPS.

  • Laine, S., & Aila, T. (2016). Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242.

  • Laradji, I. H., Vazquez, D., & Schmidt, M. (2019). Where are the masks: Instance segmentation with image-level supervision. In BMVC.

  • Li, Y., Hou, X., Koch, C., Rehg, J. M., & Yuille, A. L. (2014). The secrets of salient object segmentation. In CVPR.

  • Li, G., Xie, Y., Lin, L., & Yu, Y. (2017). Instance-level salient object segmentation. In CVPR.

  • Li, X., Yang, F., Cheng, H., Liu, W., & Shen, D. (2018). Contour knowledge transfer for salient object detection. In ECCV.

  • Lin, T. Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017). Feature pyramid networks for object detection. In CVPR.

  • Liu, N., Han, J., & Yang, M. H. (2018). Picanet: Learning pixel-wise contextual attention for saliency detection. In CVPR.

  • Liu, T., Yuan, Z., Sun, J., Wang, J., Zheng, N., Tang, X., & Shum, H. Y. (2007). Learning to detect a salient object. In CVPR.

  • Liu, Y., Wang, P., Cao, Y., Liang, Z., & Lau, R. W. (2021). Weakly-supervised salient object detection with saliency bounding boxes. In IEEE TIP.

  • Lu, Z., Fu, Z., Xiang, T., Han, P., Wang, L., & Gao, X. (2016). Learning from weak and noisy labels for semantic segmentation. In IEEE TPAMI.

  • Luo, Z., Mishra, A., Achkar, A., Eichel, J., Li, S., & Jodoin, P. M. (2017). Non-local deep features for salient object detection. In CVPR.

  • Neven, D., Brabandere, B. D., Proesmans, M., & Gool, L. V. (2019). Instance segmentation by jointly optimizing spatial embeddings and clustering bandwidth. In CVPR.

  • Pang, Y., Zhao, X., Zhang, L., & Lu, H. (2020). Multi-scale interactive network for salient object detection. In CVPR.

  • Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. (2019). Pytorch: An imperative style, high-performance deep learning library. In NeurIPS.

  • Perazzi, F., Krähenbühl, P., Pritch, Y., & Hornung, A. (2012). Saliency filters: contrast based filtering for salient region detection. In CVPR.

  • Pinheiro, P. O., Collobert, R., & Dollár, P. (2015). Learning to segment object candidates. In NeurIPS.

  • Shi, J., Yan, Q., Xu, L., & Jia, J. (2015). Hierarchical image saliency detection on extended cssd. In IEEE TPAMI.

  • Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556.

  • Siris, A., Jiao, J., Tam, G. K., Xie, X., & Lau, R. W. (2020). Inferring attention shift ranks of objects for image saliency. In CVPR.

  • Su, J., Li, J., Zhang, Y., Xia, C., & Tian, Y. (2019). Selectivity or invariance: Boundary-aware salient object detection. In ICCV.

  • Tan, X., Xu, K., Cao, Y., Zhang, Y., Ma, L., & Lau, R. W. (2021). Night-time scene parsing with a large real dataset. In IEEE TIP.

  • Tarvainen, A., & Valpola, H. (2017). Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In NeurIPS.

  • Tian, X., Xu, K., Yang, X., Yin, B., & Lau, R. W. (2020). Weakly-supervised salient instance detection. In BMVC.

  • Wang, B., Chen, Q., Zhou, M., Zhang, Z., Jin, X., & Gai, K. (2020). Progressive feature polishing network for salient object detection. In AAAI.

  • Wang, L., Lu, H., Wang, Y., Feng, M., Wang, D., Yin, B., & Ruan, X. (2017). Learning to detect salient objects with image-level supervision. In CVPR.

  • Wang, W., & Shen, J. (2017). Deep cropping via attention box prediction and aesthetics assessment. In ICCV.

  • Wang, W., Shen, J., & Porikli, F. (2015). Saliency-aware geodesic video object segmentation. In CVPR.

  • Wang, W., Zhao, S., Shen, J., Hoi, S. C., & Borji, A. (2019). Salient object detection with pyramid attention and salient edges. In CVPR.

  • Wang, T., Zhang, L., Wang, S., Lu, H., Yang, G., Ruan, X., & Borji, A. (2018). Detect globally, refine locally: A novel approach to saliency detection. In CVPR.

  • Wei, J., Wang, S., & Huang, Q. (2020). F\(^3\)net: Fusion, feedback and focus for salient object detection. In AAAI.

  • Wei, J., Wang, S., Wu, Z., Su, C., Huang, Q., & Tian, Q. (2020). Label decoupling framework for salient object detection. In CVPR.

  • Woo, S., Park, J., Lee, J. Y., & Kweon, I. S. (2018). Cbam: Convolutional block attention module. In ECCV.

  • Wu, Z., Su, L., & Huang, Q. (2019). Stacked cross refinement network for edge-aware salient object detection. In ICCV.

  • Xu, Y., Xu, D., Hong, X., Ouyang, W., Ji, R., Xu, M., & Zhao, G. (2019). Structured modeling of joint deep feature and prediction refinement for salient object detection. In ICCV.

  • Yang, J., Price, B., Cohen, S., Lee, H., & Yang, M. H. (2016). Object contour detection with a fully convolutional encoder-decoder network. In CVPR.

  • Yang, X., Xu, K., Chen, S., He, S., Yin, B. Y., & Lau, R. (2018). Active matting. In NeurIPS.

  • Yang, C., Zhang, L., Lu, H., Ruan, X., & Yang, M. H. (2013). Saliency detection via graph-based manifold ranking. In CVPR.

  • Zeng, Y., Zhuge, Y., Lu, H., Zhang, L., Qian, M., & Yu, Y. (2019). Multi-source weak supervision for saliency detection. In CVPR.

  • Zhang, L., Dai, J., Lu, H., He, Y., & Wang, G. (2018). A bi-directional message passing model for salient object detection. In CVPR.

  • Zhang, D., Han, J., & Zhang, Y. (2017). Supervision by fusion: Towards unsupervised learning of deep salient object detector. In ICCV.

  • Zhang, J., Sclaroff, S., Lin, Z., Shen, X., Price, B., & Mech, R. (2016). Unconstrained salient object detection via proposal subset optimization. In CVPR.

  • Zhang, J., Xie, J., & Barnes, N. (2020). Learning noise-aware encoder-decoder from noisy labels by alternating back-propagation for saliency detection. In ECCV.

  • Zhang, J., Yu, X., Li, A., Song, P., Liu, B., & Dai, Y. (2020). Weakly-supervised salient object detection via scribble annotations. In CVPR.

  • Zhang, J., Zhang, T., Dai, Y., Harandi, M., & Hartley, R. (2018). Deep unsupervised saliency detection: A multiple noisy labeling perspective. In CVPR.

  • Zhao, J. X., Liu, J., Fan, D. P., Cao, Y., Yang, J., & Cheng, M. M. (2019). Egnet: Edge guidance network for salient object detection. In ICCV.

  • Zhao, T., & Wu, X. (2019). Pyramid feature attention network for saliency detection. In CVPR.

  • Zhao, X., Pang, Y., Zhang, L., Lu, H., & Zhang, L. (2020). Suppress and balance: A simple gated network for salient object detection. In ECCV.

  • Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2016). Learning deep features for discriminative localization. In CVPR.

  • Zhou, H., Xie, X., Lai, J. H., Chen, Z., & Yang, L. (2020). Interactive two-stream decoder for accurate and fast saliency detection. In CVPR.

  • Zhou, Y., Zhu, Y., Ye, Q., Qiu, Q., & Jiao, J. (2018). Weakly supervised instance segmentation using class peak response. In CVPR.

  • Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., & Jiao, J. (2019). Learning instance activation maps for weakly supervised instance segmentation. In CVPR, pp. 3116–3125.

  • Zhuge, Y., Yang, G., Zhang, P., & Lu, H. (2018). Boundary-guided feature aggregation network for salient object detection. In IEEE Signal Processing Letters.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 61632006, 61972067, and the Innovation Technology Funding of Dalian (Project No. 2018J11CY010, 2020JJ26GX036); a General Research Fund from RGC of Hong Kong (RGC Ref.: 11205620); and a Strategic Research Grant from City University of Hong Kong (Ref.: 7005674).

Author information

Corresponding author

Correspondence to Xin Tian.

Additional information

Communicated by Jingdong Wang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Xin Tian and Ke Xu are joint first authors.

Rynson Lau leads this project.

About this article

Cite this article

Tian, X., Xu, K., Yang, X. et al. Learning to Detect Instance-Level Salient Objects Using Complementary Image Labels. Int J Comput Vis 130, 729–746 (2022). https://doi.org/10.1007/s11263-021-01553-w

  • DOI: https://doi.org/10.1007/s11263-021-01553-w
