RGB-D Salient Object Detection with Cross-Modality Modulation and Selection

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12353)


We present an effective method to progressively integrate and refine cross-modality complementarities for RGB-D salient object detection (SOD). The proposed network addresses two main challenges: 1) how to effectively integrate the complementary information from an RGB image and its corresponding depth map, and 2) how to adaptively select the more saliency-related features. First, we propose a cross-modality feature modulation (cmFM) module that enhances feature representations by taking the depth features as a prior, thereby modeling the complementary relations of RGB-D data. Second, we propose an adaptive feature selection (AFS) module that selects saliency-related features and suppresses inferior ones. The AFS module fuses multi-modality spatial features while taking into account both the self-modality and the cross-modality interdependencies of channel features. Third, we employ a saliency-guided position-edge attention (sg-PEA) module to encourage the network to focus on saliency-related regions. Together, these modules form the cmMS block, which facilitates the refinement of saliency features in a coarse-to-fine fashion. Coupled with bottom-up inference, the refined saliency features enable accurate and edge-preserving SOD. Extensive experiments demonstrate that our network outperforms state-of-the-art saliency detectors on six popular RGB-D SOD benchmarks.
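To make the roles of the three modules more concrete, here is a minimal pure-Python sketch under loudly stated assumptions; it is not the authors' implementation. The function names, the toy "list of channels" tensors, and all weights are hypothetical. cmFM is modelled as a FiLM/SFT-style affine modulation in which depth features predict a per-position scale (gamma) and shift (beta) for the RGB features; AFS is modelled as squeeze-and-excitation channel gating over the concatenated RGB and depth channels (so each gate sees both modalities), followed by summing the two modalities; sg-PEA is reduced to a plain saliency-map reweighting that omits the position and edge attention branches. The paper's actual layers may differ.

```python
import math

# Toy feature maps: a list of C channels, each a flat list of H*W values.
# All weights are hypothetical placeholders, not trained parameters.


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(vec, weights, bias):
    """Dense layer (acts like a 1x1 conv when applied at one position)."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def cm_feature_modulation(rgb, depth, w_gamma, b_gamma, w_beta, b_beta):
    """Hypothetical cmFM sketch: the depth vector at each position predicts
    a scale (gamma) and shift (beta) applied to the RGB features there."""
    channels, positions = len(rgb), len(rgb[0])
    out = [[0.0] * positions for _ in range(channels)]
    for i in range(positions):
        d_i = [depth[c][i] for c in range(len(depth))]  # depth prior at i
        gamma = linear(d_i, w_gamma, b_gamma)
        beta = linear(d_i, w_beta, b_beta)
        for c in range(channels):
            out[c][i] = gamma[c] * rgb[c][i] + beta[c]
    return out


def adaptive_feature_selection(rgb, depth, w_se, b_se):
    """Hypothetical AFS sketch: squeeze-and-excitation over the 2C stacked
    RGB and depth channels, then the gated modalities are summed."""
    stacked = rgb + depth                              # 2C channels
    pooled = [sum(ch) / len(ch) for ch in stacked]     # global average pool
    gates = [sigmoid(z) for z in linear(pooled, w_se, b_se)]
    gated = [[g * x for x in ch] for g, ch in zip(gates, stacked)]
    c = len(rgb)
    fused = [[a + b for a, b in zip(gated[k], gated[c + k])]
             for k in range(c)]
    return fused, gates


def saliency_guided_attention(features, coarse_saliency):
    """Simplified stand-in for sg-PEA: amplify positions that a coarse
    saliency map (values in [0, 1]) marks as salient, with a residual."""
    return [[x * (1.0 + s) for x, s in zip(ch, coarse_saliency)]
            for ch in features]


if __name__ == "__main__":
    rgb = [[1.0, 2.0], [3.0, 4.0]]
    depth = [[0.5, 1.0], [0.0, 2.0]]
    eye = [[1.0, 0.0], [0.0, 1.0]]
    zero = [[0.0, 0.0], [0.0, 0.0]]
    # gamma = depth + 1, beta = 0
    print(cm_feature_modulation(rgb, depth, eye, [1.0, 1.0], zero, [0.0, 0.0]))
    # [[1.5, 4.0], [3.0, 12.0]]
```

With all-zero excitation weights every AFS gate equals sigmoid(0) = 0.5, so the fusion reduces to averaging the two modalities; trained weights would instead let strong channels from either modality dominate, which is the "select and suppress" behaviour the abstract describes.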



This research was supported by the SenseTime-NTU Collaboration Project, Singapore MOE AcRF Tier 1 (2018-T1-002-056), and NTU NAP; in part by the Fundamental Research Funds for the Central Universities under Grant 2019RC039; and in part by the China Postdoctoral Science Foundation under Grant 2019M660438.

Supplementary material

Supplementary material 1 (PDF, 2.2 MB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Nanyang Technological University, Singapore, Singapore
  2. Beijing Jiaotong University, Beijing, China
  3. Dalian University of Technology, Dalian, China
  4. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
