Cross-Modal Weighting Network for RGB-D Salient Object Detection

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12362)


Depth maps contain geometric cues that can assist Salient Object Detection (SOD). In this paper, we propose a novel Cross-Modal Weighting (CMW) strategy that encourages comprehensive interactions between the RGB and depth channels for RGB-D SOD. Specifically, we develop three RGB-depth interaction modules, named CMW-L, CMW-M and CMW-H, to handle low-, middle- and high-level cross-modal information fusion, respectively. These modules use Depth-to-RGB Weighting (DW) and RGB-to-RGB Weighting (RW) to enable rich cross-modal and cross-scale interactions among the feature layers generated by different network blocks. To train the proposed Cross-Modal Weighting Network (CMWNet) effectively, we design a composite loss function that aggregates the errors between intermediate predictions and the ground truth over different scales. With all these components working together, CMWNet fuses information from the RGB and depth channels while exploiting both object localization and fine details across scales. Thorough evaluations demonstrate that CMWNet consistently outperforms 15 state-of-the-art RGB-D SOD methods on seven popular benchmarks.
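The abstract does not say how DW, RW or the composite loss are computed, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes that "weighting" means a sigmoid gate derived from one feature map that rescales another feature map elementwise; that DW derives the gate from depth features of the same block; that RW derives it from deeper (coarser) RGB features, upsampled to the finer resolution; and that the composite loss is a sum of binary cross-entropy (BCE) terms, with the ground truth average-pooled to each prediction's resolution. All function names (`conv1x1`, `depth_to_rgb_weighting`, `rgb_to_rgb_weighting`, `composite_loss`, etc.) are hypothetical, and random matrices stand in for learned 1x1 convolutions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(feat, weight):
    """Channel-mixing 1x1 convolution without bias.
    feat: (C_in, H, W), weight: (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", weight, feat)

def depth_to_rgb_weighting(rgb, depth, w_d):
    """Hypothetical DW: a gate in (0, 1) computed from depth features
    rescales RGB features of the same shape."""
    gate = sigmoid(conv1x1(depth, w_d))
    return rgb * gate

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map along H and W."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def rgb_to_rgb_weighting(rgb_fine, rgb_coarse, w_r, factor):
    """Hypothetical RW: coarser (deeper) RGB features are upsampled and
    turned into a gate that reweights finer RGB features (cross-scale)."""
    gate = sigmoid(conv1x1(upsample_nearest(rgb_coarse, factor), w_r))
    return rgb_fine * gate

def avg_pool(mask, factor):
    """Downsample an (H, W) map by averaging non-overlapping factor x factor blocks."""
    h, w = mask.shape
    return mask.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped for numerical safety."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def composite_loss(preds, gt):
    """Sum of BCE terms between each intermediate prediction and the
    ground truth resized (by average pooling) to that prediction's size."""
    total = 0.0
    for p in preds:
        factor = gt.shape[0] // p.shape[0]
        total += bce(p, avg_pool(gt, factor))
    return total

# Toy usage with random stand-in features and weights.
rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
rgb = rng.standard_normal((C, H, W))
depth = rng.standard_normal((C, H, W))
w_d = rng.standard_normal((C, C))
fused = depth_to_rgb_weighting(rgb, depth, w_d)

coarse = rng.standard_normal((C, H // 2, W // 2))
w_r = rng.standard_normal((C, C))
refined = rgb_to_rgb_weighting(fused, coarse, w_r, factor=2)

gt = (rng.random((H, W)) > 0.5).astype(float)
preds = [sigmoid(rng.standard_normal((s, s))) for s in (8, 4, 2)]
loss = composite_loss(preds, gt)
```

Gating with a sigmoid keeps every weight in (0, 1), so each module can only attenuate or pass features rather than amplify them; this is one common design for attention-style fusion and is an assumption here. Downsampling the ground truth (rather than upsampling each prediction) is likewise an illustrative choice; the abstract only states that errors are summarized over different scales.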


Keywords: RGB-D salient object detection; Cross-Modal Weighting; Depth-to-RGB weighting; RGB-to-RGB weighting



This work was supported by the National Natural Science Foundation of China under Grant 61771301. Linwei Ye and Yang Wang were supported by NSERC.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Shanghai University, Shanghai, China
  2. University of Manitoba, Winnipeg, Canada
  3. Stony Brook University, Stony Brook, USA
  4. Huawei Technologies Canada, Markham, Canada
