Cross-modal refined adjacent-guided network for RGB-D salient object detection

Bi, Hongbo; Zhang, Jiayuan; Wu, Ranwan; Tong, Yuyu; Jin, Wei

doi:10.1007/s11042-023-14421-1


Published: 22 March 2023

Volume 82, pages 37453–37478 (2023)

Multimedia Tools and Applications

Hongbo Bi ORCID: orcid.org/0000-0003-2442-330X¹,
Jiayuan Zhang¹,
Ranwan Wu¹,
Yuyu Tong¹ &
Wei Jin²


Abstract

RGB and depth modalities can be exploited together to recognize the most eye-catching objects in diverse scenes, and RGB-D salient object detection (RGB-D SOD) has therefore become a popular research direction. In recent years in particular, numerous novel RGB-D SOD algorithms have been proposed and have achieved outstanding performance. However, most approaches adopt the common pyramid structure to integrate multi-scale cues but ignore the complementarity of features across layers. Moreover, fully exploiting RGB and depth information through cross-modal interaction remains challenging. To address these shortcomings, we propose CRA-Net (Cross-modal Refined Adjacent-guided Network), which uses the high-level semantic information contained in the high layers to guide the local details in the low layers, thereby improving detection accuracy. Specifically, a multiplier refinement module (MRM) is proposed to carry out thorough information interaction between the two modalities, in which a five-layer refinement mechanism is adopted to enhance the cross-modal fusion representations. Furthermore, to suppress interference from non-salient factors in low-level backgrounds, we design an adjacent-guided aggregation module (AAM). The multi-level features are fed in groups into two AAMs with identical structures, which use an adjacent-layer guidance strategy to aggregate multi-scale features from deep to shallow. Extensive experiments show that CRA-Net is competitive on four common evaluation metrics across four popular datasets.
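The abstract describes the two modules only at a high level, so the following is a minimal, hypothetical sketch in plain Python of the general idea, not the authors' MRM or AAM. It assumes single-channel feature maps stored as lists of rows, an element-wise product as the "multiplier" interaction between RGB and depth, nearest-neighbour upsampling, and sigmoid gating as the adjacent-layer guidance. The names `multiplier_fusion`, `upsample2x` and `adjacent_guided_aggregation` are invented for illustration, and the sketch omits learned convolutions, the five-layer refinement mechanism, and the split of features into two AAM groups.

```python
import math
from typing import List

# Hypothetical simplification: a single-channel H x W feature map as rows of floats.
FeatureMap = List[List[float]]


def multiplier_fusion(rgb: FeatureMap, depth: FeatureMap) -> FeatureMap:
    """Assumed cross-modal interaction (not the paper's MRM): the element-wise
    product emphasises positions where both modalities respond, and it is added
    back to each stream before the two refined streams are summed."""
    fused = []
    for r_row, d_row in zip(rgb, depth):
        row = []
        for r, d in zip(r_row, d_row):
            common = r * d                            # cross-modal agreement
            row.append((r + common) + (d + common))   # refine each stream, then merge
        fused.append(row)
    return fused


def upsample2x(x: FeatureMap) -> FeatureMap:
    """Nearest-neighbour 2x upsampling so a deeper map matches its shallower neighbour."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]   # repeat each column twice
        out.append(wide)
        out.append(list(wide))                      # repeat each row twice
    return out


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def adjacent_guided_aggregation(levels: List[FeatureMap]) -> FeatureMap:
    """Deep-to-shallow aggregation in which each level is guided only by the
    result from its adjacent deeper level (hypothetical sigmoid gating).
    `levels` is ordered shallow -> deep; each level is half the size of the previous."""
    guide = levels[-1]                              # start from the deepest level
    for low in reversed(levels[:-1]):
        up = upsample2x(guide)
        # The upsampled deeper map gates the shallower one, then is added back.
        guide = [
            [l * sigmoid(g) + g for l, g in zip(l_row, g_row)]
            for l_row, g_row in zip(low, up)
        ]
    return guide


if __name__ == "__main__":
    sizes = [16, 8, 4, 2, 1]  # five backbone levels, shallow -> deep (assumed)
    rgb_levels = [[[0.5] * s for _ in range(s)] for s in sizes]
    depth_levels = [[[0.2] * s for _ in range(s)] for s in sizes]
    fused_levels = [multiplier_fusion(r, d) for r, d in zip(rgb_levels, depth_levels)]
    saliency_logits = adjacent_guided_aggregation(fused_levels)
    print(len(saliency_logits), len(saliency_logits[0]))  # 16 16
```

Sigmoid gating is used here only because it is a common way to let a coarse, semantically strong map suppress background responses in a finer one; the actual guidance operation in CRA-Net may differ.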


Data Availability

The NJU2K, NLPR, SSD, and SIP datasets used in this study are publicly available. The NJU2K dataset can be downloaded from http://mcg.nju.edu.cn/resource.html. The NLPR dataset can be downloaded from https://sites.google.com/site/rgbdsaliency/dataset. The SSD dataset can be downloaded from https://pan.baidu.com/s/1zNL9-KSQwGILdAAfStMXWQ. The SIP dataset can be downloaded from https://pan.baidu.com/s/14VjtMBn0_bQDRB0gMPznoA. All data, models, and code generated or used during the current study are available from the corresponding author upon reasonable request.


Acknowledgements

This work was supported by the Anhui Province Key Laboratory of Infrared and Low-Temperature Plasma under No. IRKL2022KF07.

Author information

Authors and Affiliations

School of Electrical Engineering and Information, Northeast Petroleum University, Daqing, 163000, Heilongjiang Province, China
Hongbo Bi, Jiayuan Zhang, Ranwan Wu & Yuyu Tong
College of Electronic Countermeasures, National University of Defense Technology, Hefei, 230037, Anhui Province, China
Wei Jin

Corresponding author

Correspondence to Hongbo Bi.

Ethics declarations

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Bi, H., Zhang, J., Wu, R. et al. Cross-modal refined adjacent-guided network for RGB-D salient object detection. Multimed Tools Appl 82, 37453–37478 (2023). https://doi.org/10.1007/s11042-023-14421-1


Received: 13 June 2022
Revised: 28 September 2022
Accepted: 21 January 2023
Published: 22 March 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s11042-023-14421-1

Keywords
