Skip to main content

Guided residual network for RGB-D salient object detection with efficient depth feature learning


RGB-D salient object detection aims at identifying the most attractive parts from a RGB image and its corresponding depth image, which has been widely applied in many computer vision tasks. However, there are still two challenges: (1) how to quickly and effectively integrate the cross-modal features from the RGB-D data; and (2) how to mitigate the negative impact from the low-quality depth map. The previous methods mostly employ a two-stream architecture which adopts two backbone network to process RGB-D data and ignore the quality of depth map. In this paper, we propose a guided residual network to address these two issues. On the one hand, we design a simpler and efficient depth branch only using one convolutional layer and three residual modules to extract depth features instead of employing a pre-trained backbone to handle the depth data, and fuse RGB features and depth features in a multi-scale manner for refinement with top-down guidance. On the other hand, we add adaptive weight to depth maps to control the fusion between them, which mitigates the negative influence of unreliable depth map. Experimental results compared with 13 state-of-the-art methods on 7 datasets demonstrate the validity of the proposed approach both quantitatively and qualitatively, especially in efficiency (102 FPS) and compactness (64.2 MB).

This is a preview of subscription content, access via your institution.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7


  1. 1.

    Zhao, R., Ouyang, W., Wang, X.: Unsupervised salience learning for person re-identification. In: CVPR, pp. 3586–3593 (2013)

  2. 2.

    Zhang, F., Bo, D., Zhang, L.: Saliency-guided unsupervised feature learning for scene classification. IEEE TGRS 53(4), 2175–2184 (2014)

    Google Scholar 

  3. 3.

    Hong, S., You, T., Kwak, S., Han, B.: Online tracking by learning discriminative saliency map with convolutional neural network. In: ICML, pp. 597–606 (2015)

  4. 4.

    Wang, W., Shen, J., Yang, R., Porikli, F.: Saliency-aware video object segmentation. IEEE TPAMI 40(1), 20–33 (2017)

    Article  Google Scholar 

  5. 5.

    Su, J., Li, J., Zhang, Y., Xia, C., Tian, Y.: Selectivity or Invariance: Boundary-aware Salient Object Detection. In: ICCV, pp. 3799-3808 (2019)

  6. 6.

    Zhao, T., Wu, X.: Pyramid Feature Attention Network for Saliency detection. In CVPR, pp. 3085-3094 (2019)

  7. 7.

    Zeng, Y., Zhuge, Y., Lu, H., Zhang, L., Qian, M., Yu, Y.: Multi-source weak supervision for saliency detection. In: CVPR, pp. 6074–6083 (2009)

  8. 8.

    Liu, J.-Ji., Hou, Q., Cheng, M.-M., Feng, J., Jiang, J.: A Simple Pooling-Based Design for Real-Time Salient Object Detection. In: CVPR, pp. 3917–3926 (2019)

  9. 9.

    Zhao, J.-X., Liu, J.-J., Fan, D.-P., Cao, Y., Yang, J.-F., Cheng, M.-M.: EGNet: Edge Guidance Network for Salient Object Detection. In: ICCV, pp. 8779–8788 (2019)

  10. 10.

    Niu, Y., Geng, Y., Li, X., Liu, F.: Leveraging stereopsis for saliency analysis. In: CVPR, pp. 454–461 (2012)

  11. 11.

    Zhu, C., Cai, X., Huang, K., Li, T.H., Li, G.: Pdnet: Prior-model guided depth-enhanced network for salient object detection. In: ICME, pp. 199–204. IEEE (2019)

  12. 12.

    Piao, Y., Rong, Z., Zhang, M., Ren, W., Lu, H.: A2dele: Adaptive and Attentive Depth Distiller for Efficient RGB-D Salient Object Detection. In: CVPR, pp. 9060-9069 (2020)

  13. 13.

    Chen, Z., Huang, Q.: Depth Potentiality-Aware Gated Attention Network for RGB-D Salient Object Detection. In: CVPR (2020)

  14. 14.

    Li, C., Cong, R., Kwong, S., Hou, J., Fu, H., Zhu, G., Zhang, D., Huang, Q.: ASIF-Net: attention steered interweave fusion network for RGB-D salient object detection. IEEE Trans. Cybern 51(1), 88–100 (2021)

  15. 15.

    Fu, K., Fan, D.-P., Ji, G.-P., Zhao, Q.: JL-DCF: Joint learning and densely-cooperative fusion framework for RGB-D salient object detection. In: CVPR, pp. 3052–3062 (2020)

  16. 16.

    Zhang, J., Fan, D.-P., Dai, Y., Anwar, S.: UC-Net: Uncertainty inspired RGB-D saliency detection via conditional variational autoencoders. In: CVPR, pp. 8582–8591 (2020)

  17. 17.

    Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)

  18. 18.

    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)

  19. 19.

    Wang, Z., Simoncelli, E.P., Bovik, A.C.: Multi-scale structural similarity for image quality assessment. In: the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, 2003, vol. 2, pp. 1398–1402. Ieee (2003)

  20. 20.

    de Boer, P.-T., Kroese, D.P., Mannor, S., Rubinstein, R.Y.: A tutorial on the cross-entropy method. Ann. OR 134(1), 19–67 (2005)

    MathSciNet  Article  Google Scholar 

  21. 21.

    Mattyus, G., Luo, W., Urtasun, R.: Deep-roadmapper: Extracting road topology from aerial images

  22. 22.

    Ju, R., Ge, L., Geng, W., Ren, T., Wu, G.: Depth saliency based on anisotropic center-surround difference. In: ICIP, pp. 1115–1119 (2014)

  23. 23.

    Peng, H., Li, B., Xiong, W., Hu, W., Ji, R.: RGBD salient object detection: A benchmark and algorithms. In: ECCV, pp. 92–109 (2014)

  24. 24.

    Li, N., Ye, J., Ji, Y., Ling, H., Yu, J.: Saliency detection on light field. In: CVPR, pp. 2806–2813 (2014)

  25. 25.

    Zhu, C., Li, G.: A three-pathway psychobiological framework of salient object detection using stereoscopic technology. In: ICCV, pp. 3008–3014 (2017)

  26. 26.

    Piao, Y., Ji, W., Li, J., Zhang, M., Lu, H.: Depth-induced multi-scale recurrent attention network for saliency detection. In: ICCV (2019)

  27. 27.

    Fan, D.-P., Lin, Z., Zhang, Z., Zhu, M., Cheng, M.-M.: Rethinking RGB-D salient object detection: Models, datasets, and large-scale benchmarks. IEEE TNNLS (2020)

  28. 28.

    Borji, A., Cheng, M.-M., Jiang, H., Li, J.: Salient object detection: A benchmark. IEEE TIP 24(12), 5706–5722 (2015)

    MathSciNet  MATH  Google Scholar 

  29. 29.

    Fan, D.-P., Cheng, M.-M., Liu, Y., Li, T., Borji, A.: Structure-measure: A New Way to Evaluate Foreground Maps. In: ICCV, pp. 4548–4557 (2017)

  30. 30.

    Zhao, J.-X., Cao, Y., Fan, D.-P., Cheng, M.-M., Li, X.-Y., Zhang, L.: Contrast prior and fluid pyramid integration for rgbd salient object detection. In: CVPR, pp. 3927–3936 (2019)

  31. 31.

    Fan, D.-P., Gong, C., Cao, Y., Ren, B., Cheng, M., Borji, A.: Enhanced-alignment measure for binary foreground map evaluation. In: IJCAI, pp. 698–704 (2018)

  32. 32.

    Liangqiong, Q., He, S., Zhang, J., Tian, J., Tang, Y., Yang, Q.: Rgbd salient object detection via deep fusion. IEEE TIP 26(5), 2274–2285 (2017)

    MathSciNet  MATH  Google Scholar 

  33. 33.

    Han, J., Chen, H., Liu, N., Chenggang Y, Xuelong L.: CNNs-based RGB-D saliency detection via cross-view transfer and multiview fusion. IEEE TCYB, pp. 3171–3183 (2018)

  34. 34.

    Chen, H., Li, Y., Dan, S.: Multi-modal fusion network with multi-scale multi-path and cross-modal interactions for RGB-D salient object detection. PR 86, 376–385 (2019)

    Google Scholar 

  35. 35.

    Chen, H., Li, Y.: Three-stream Attention-aware Network for RGB-D Salient Object Detection. IEEE TIP, pp. 2825–2835 (2019)

  36. 36.

    Chen, H., Li, Y.: Progressively complementarity-aware fusion network for RGB-D Salient Object Detection. In: IEEE CVPR, pp. 3051–3060 (2018)

  37. 37.

    Zhao, J.-X., Cao, Y., Fan, D.-P., Cheng, M.-M., Li, X.-Y., Zhang, L.: Contrast Prior and Fluid Pyramid Integration for RGBD Salient Object Detection. In: IEEE CVPR (2019)

  38. 38.

    Li, G., Liu, Z., Ling, H.: ICNet: information conversion Nnetwork for RGB-D based salient object detection. IEEE Trans. Image Process. 29, 4873–4884 (2020)

  39. 39.

    Zhang, M., Ren, W., Piao, Y., Rong, Z., Lu, H.: Select, Supplement and Focus for RGB-D Saliency Detection. In: CVPR, pp. 3472–3481 (2020)

  40. 40.

    Liu, N., Zhang, N., Han, J.: Learning Selective Self-Mutual Attention for RGB-D Saliency Detection. In: CVPR, pp. 13756–13765 (2020)

  41. 41.

    Hou, Q., Cheng, M.-M., Hu, X., Borji, A., Tu, Z., Torr, P.H.S.: Deeply supervised salient object detection with short connections. IEEE TPAMI 41(4), 815–828 (2019)

    Article  Google Scholar 

  42. 42.

    Chen, H., Li, Y.: Progressively complementarity-aware fusion network for rgb-d salient object detection. In: CVPR, pp. 3051–3060 (2018)

  43. 43.

    Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: CVPR (2017)

  44. 44.

    Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Object detectors emerge in deep scene cnns. In: ICLR (2015)

  45. 45.

    Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In: ECCV (2018)

  46. 46.

    Liu, S., Huang, D., Wang, Yu.: Receptive Field Block Net for Accurate and Fast Object Detection. In: ECCV, pp. 385–400 (2018)

  47. 47.

    Diederik, Kingma, P., Ba, J.: Adam: A method for stochastic optimization. In: ICLR (2015)

  48. 48.

    Liu, Z., Duan, Q., Shi, S., et al.: Multi-level progressive parallel attention guided salient object detection for RGB-D images. Vis Comput (2020).

Download references


This work was supported by the Natural Science Foundation of China (No. 61802336), Jiangsu Province 7th Projects for Summit Talents in Six Main Industries, Electronic Information Industry (DZXX-149, No.110).

Author information



Corresponding author

Correspondence to Shuhan Chen.

Ethics declarations

Conflict of interest

We declare that we have no financial and personal relationships with other people or organizations that can inappropriately influence our work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Wang, J., Chen, S., Lv, X. et al. Guided residual network for RGB-D salient object detection with efficient depth feature learning. Vis Comput (2021).

Download citation


  • RGB-D salient object detection
  • Guided residual network
  • Efficient depth feature learning
  • Adaptive depth weight