
Depth Enhanced Cross-Modal Cascaded Network for RGB-D Salient Object Detection

Published in Neural Processing Letters.

Abstract

The depth modality can provide supplementary features for RGB images, which substantially improves the performance of salient object detection (SOD). However, depth images are disturbed by external factors during acquisition, resulting in low quality. Moreover, there are differences between the RGB and depth modalities, so simply fusing the two cannot fully complement the RGB modality with depth information. To enhance the quality of the depth image and integrate cross-modal information effectively, we propose a depth enhanced cross-modal cascaded network (DCCNet) for RGB-D SOD. The cascaded network comprises a depth cascaded branch, an RGB cascaded branch, and a cross-modal fusion strategy. In the depth cascaded branch, we design a depth preprocessing algorithm to enhance the quality of the depth image, and during depth feature extraction we adopt four cascaded cross-modal guided modules to guide the RGB feature extraction process. In the RGB cascaded branch, we design five cascaded residual adaptive selection modules to output the RGB features extracted at each stage. In the cross-modal fusion strategy, cross-modal channel-wise refinement is adopted to fuse the top-level features of the two modal branches. Finally, a multiscale loss is adopted to optimize network training. Experimental results on six common RGB-D SOD datasets show that the performance of the proposed DCCNet is comparable to that of state-of-the-art RGB-D SOD methods.
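The multiscale loss mentioned above is a common deep-supervision pattern in SOD: a loss term is computed for the side output at each decoder scale and the terms are summed. The abstract does not give the exact formulation used in DCCNet, so the sketch below is a minimal, hypothetical NumPy version using per-scale binary cross-entropy with optional scale weights; the function names and the choice of BCE are assumptions, not the paper's definition.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy averaged over all pixels of one saliency map."""
    pred = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def multiscale_loss(side_outputs, targets, weights=None):
    """Sum of (optionally weighted) BCE terms over side outputs at several scales.

    side_outputs / targets: lists of equally shaped arrays, one pair per scale
    (targets are assumed to be pre-resized to match each side output).
    """
    if weights is None:
        weights = [1.0] * len(side_outputs)
    return sum(w * bce(p, t)
               for w, p, t in zip(weights, side_outputs, targets))
```

For example, with side outputs at full, half, and quarter resolution, one might weight the full-resolution term highest (`weights=[1.0, 0.5, 0.25]`) so coarse scales regularize training without dominating it.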




Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 61802111) and the Science and Technology Foundation of Henan Province of China (No. 212102210156).

Author information


Corresponding author

Correspondence to Jun Wang.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhao, Z., Huang, Z., Chai, X. et al. Depth Enhanced Cross-Modal Cascaded Network for RGB-D Salient Object Detection. Neural Process Lett 55, 361–384 (2023). https://doi.org/10.1007/s11063-022-10886-7

