Cross-modal refined adjacent-guided network for RGB-D salient object detection

Multimedia Tools and Applications

Abstract

RGB and depth modalities can be exploited jointly to recognize the most eye-catching objects in a scene, which has made RGB-D salient object detection (RGB-D SOD) a popular research direction. In recent years, many RGB-D SOD algorithms have been proposed and have achieved outstanding performance. However, most approaches adopt the common pyramid structure to integrate multi-scale cues but ignore the complementarity of features across layers. In addition, fully utilizing RGB and depth information for cross-modal interaction remains challenging. To address these shortcomings, we propose CRA-Net (Cross-modal Refined Adjacent-guided Network), which uses the high-level semantic information in the deep layers to guide the local detail features in the shallow layers and thereby improve detection accuracy. Specifically, a multiplier refinement module (MRM) is proposed to carry out the information interaction between the two modalities, in which a five-layer refinement mechanism is adopted to enhance the cross-modal fusion representations. Moreover, to suppress the interference of non-salient factors in low-level backgrounds, we design an adjacent-guided aggregation module (AAM). The multi-level features are fed in groups into two AAMs with identical structures, which use an adjacent-layer guidance strategy to guide the aggregation of multi-scale features from deep to shallow. Extensive experiments show that CRA-Net is competitive on four common evaluation metrics across four popular datasets.
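The abstract does not specify how the MRM or AAM are built, so the following is only a generic sketch of the two ideas it names: multiplicative interaction between RGB and depth features, and guidance of a shallow feature by the adjacent deeper one. The function names, the residual form `r + r * d`, the sigmoid gate, and the nearest-neighbour upsampling are all assumptions made for illustration, not the authors' design. Feature maps are represented as single-channel 2D lists to keep the example dependency-free.

```python
import math


def sigmoid(x):
    """Logistic function used here as a soft gate (an assumed choice)."""
    return 1.0 / (1.0 + math.exp(-x))


def multiplicative_fusion(rgb, depth):
    """Hypothetical cross-modal fusion: each RGB value is kept and
    additionally scaled by the depth value at the same position."""
    return [[r + r * d for r, d in zip(rgb_row, depth_row)]
            for rgb_row, depth_row in zip(rgb, depth)]


def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a 2D map."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def adjacent_guided(shallow, deep):
    """Hypothetical deep-to-shallow guidance: the deeper map is upsampled
    to the shallow resolution and used as a gate on the shallow map."""
    guide = upsample2x(deep)
    return [[s * sigmoid(g) for s, g in zip(s_row, g_row)]
            for s_row, g_row in zip(shallow, guide)]


# Toy usage: fuse a deep RGB/depth pair, then let it gate a shallow map
# that has twice the spatial resolution.
deep_fused = multiplicative_fusion([[1.0, 2.0]], [[0.0, 1.0]])
shallow = [[2.0, 2.0, 2.0, 2.0],
           [2.0, 2.0, 2.0, 2.0]]
refined = adjacent_guided(shallow, deep_fused)
```

In this toy form, positions where the deeper fused feature is large pass more of the shallow detail through, which is one simple way to realise "semantic guidance" of low-level features; a real network would use learned convolutions and multi-channel tensors.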

Data Availability

The NJU2K, NLPR, SSD, and SIP datasets used in the study are publicly available. The NJU2K dataset can be downloaded from http://mcg.nju.edu.cn/resource.html. The NLPR dataset can be downloaded from https://sites.google.com/site/rgbdsaliency/dataset. The SSD dataset can be downloaded from https://pan.baidu.com/s/1zNL9-KSQwGILdAAfStMXWQ. The SIP dataset can be downloaded from https://pan.baidu.com/s/14VjtMBn0_bQDRB0gMPznoA. All data, models, or code generated or used during the current study are available from the corresponding author upon reasonable request.

Acknowledgements

This work was supported by the AnHui Province Key Laboratory of Infrared and Low-Temperature Plasma under Grant No. IRKL2022KF07.

Author information

Corresponding author

Correspondence to Hongbo Bi.

Ethics declarations

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bi, H., Zhang, J., Wu, R. et al. Cross-modal refined adjacent-guided network for RGB-D salient object detection. Multimed Tools Appl 82, 37453–37478 (2023). https://doi.org/10.1007/s11042-023-14421-1

  • DOI: https://doi.org/10.1007/s11042-023-14421-1
