Abstract
Existing salient instance detection (SID) methods typically learn from pixel-level annotated datasets. In this paper, we present the first weakly-supervised approach to the SID problem. Although weak supervision has been considered in general saliency detection, it is mainly based on using class labels for object localization. However, it is non-trivial to use only class labels to learn instance-aware saliency information, as salient instances with high semantic affinities may not be easily separated by the labels. As the subitizing information provides an instant judgement on the number of salient items, it is naturally related to detecting salient instances and may help separate instances of the same class while grouping different parts of the same instance. Inspired by this observation, we propose to use class and subitizing labels as weak supervision for the SID problem. We propose a novel weakly-supervised network with three branches: a Saliency Detection Branch leveraging class consistency information to locate candidate objects; a Boundary Detection Branch exploiting class discrepancy information to delineate object boundaries; and a Centroid Detection Branch using subitizing information to detect salient instance centroids. This complementary information is then fused to produce a salient instance map. To facilitate the learning process, we further propose a progressive training scheme to reduce label noise and the corresponding noise learned by the model, via reciprocating the model with progressive salient instance prediction and model refreshing. Our extensive evaluations show that the proposed method performs favorably against carefully designed baseline methods adapted from related tasks.
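To make the fusion step concrete, the following is a minimal sketch of how a saliency mask, a boundary map, and a set of centroids could be combined into an instance map. It is an illustrative stand-in, not the fusion procedure of the paper: the function name `fuse_instance_map`, the binary grid inputs, and the breadth-first region growing are all assumptions chosen for simplicity. Each centroid seeds one instance, and regions grow only through salient, non-boundary pixels.

```python
from collections import deque


def fuse_instance_map(saliency, boundary, centroids):
    """Toy fusion (hypothetical helper, not the paper's method).

    saliency, boundary: 2D lists of 0/1 values with the same shape.
    centroids: list of (row, col) seeds, one per salient instance.
    Returns a 2D list of instance ids: 0 for background, boundary, or
    unreached pixels; k for pixels grown from the k-th centroid (1-based).
    """
    h, w = len(saliency), len(saliency[0])
    labels = [[0] * w for _ in range(h)]
    queue = deque()

    # Seed one region per centroid that lies on a salient, non-boundary pixel.
    for k, (r, c) in enumerate(centroids, start=1):
        if saliency[r][c] and not boundary[r][c]:
            labels[r][c] = k
            queue.append((r, c))

    # Multi-source breadth-first growth over 4-connected neighbours.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < h and 0 <= nc < w):
                continue
            if labels[nr][nc] or not saliency[nr][nc] or boundary[nr][nc]:
                continue
            labels[nr][nc] = labels[r][c]
            queue.append((nr, nc))
    return labels


# Two touching objects of the same class, separated by a boundary column.
saliency = [[0, 1, 1, 1, 1, 1, 0],
            [0, 1, 1, 1, 1, 1, 0]]
boundary = [[0, 0, 0, 1, 0, 0, 0],
            [0, 0, 0, 1, 0, 0, 0]]
centroids = [(0, 1), (1, 5)]  # two seeds, matching a subitizing count of 2
instance_map = fuse_instance_map(saliency, boundary, centroids)
```

In this toy case the saliency mask alone would merge the two objects into a single region; the boundary column and the two centroids are what split it into two instances, which mirrors the complementary roles the three branches are described as playing.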
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grants 61632006, 61972067, and the Innovation Technology Funding of Dalian (Project No. 2018J11CY010, 2020JJ26GX036); a General Research Fund from RGC of Hong Kong (RGC Ref.: 11205620); and a Strategic Research Grant from City University of Hong Kong (Ref.: 7005674).
Additional information
Communicated by Jingdong Wang.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Xin Tian and Ke Xu are joint first authors.
Rynson Lau leads this project.
About this article
Cite this article
Tian, X., Xu, K., Yang, X. et al. Learning to Detect Instance-Level Salient Objects Using Complementary Image Labels. Int J Comput Vis 130, 729–746 (2022). https://doi.org/10.1007/s11263-021-01553-w
DOI: https://doi.org/10.1007/s11263-021-01553-w