
Lightweight Multi-modal Representation Learning for RGB Salient Object Detection

Published in: Cognitive Computation

Abstract

The task of salient object detection (SOD) often faces challenges such as complex backgrounds and low appearance contrast. Depth information, which reflects the geometric shape of an object’s surface, can complement visible information and has received increasing interest in SOD. However, depth sensors suffer from limited operating conditions and range (e.g., 4–5 m at most in indoor scenes), and their imaging quality is usually low. We design a lightweight network that infers depth features while reducing computational complexity: it needs only a few parameters to effectively capture depth-specific features by fusing high-level features from the RGB modality. Since both RGB features and inferred depth features may contain noise, we design a fusion network, comprising a self-attention-based feature interaction module and a foreground-background enhancement module, to achieve an adaptive fusion of RGB and depth features. In addition, we introduce a multi-scale fusion module with different dilated convolutions to leverage useful local and global context clues. Experimental results on five benchmark datasets show that our approach significantly outperforms state-of-the-art RGBD SOD methods and performs comparably against state-of-the-art RGB SOD methods, demonstrating that our multi-modal representation learning method can deal with the imaging limitations of single-modality data for RGB salient object detection.
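The multi-scale fusion module mentioned above can be illustrated with a minimal PyTorch sketch. This is our own simplified rendering, not the authors' released code; the class name, channel sizes, and dilation rates are illustrative assumptions. The core idea is standard: parallel 3×3 convolutions with different dilation rates enlarge the receptive field at constant spatial resolution, so local and global context can be aggregated by concatenation and a 1×1 merge.

```python
# Hedged sketch of a multi-scale fusion module with dilated convolutions.
# All names and hyperparameters here are illustrative assumptions, not the
# paper's actual implementation.
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    """Fuse features at multiple receptive fields via parallel dilated convs."""

    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4, 8)):
        super().__init__()
        # One 3x3 branch per dilation rate; padding=dilation preserves the
        # spatial size while the effective receptive field grows with d.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        # A 1x1 convolution merges the concatenated multi-scale responses.
        self.merge = nn.Conv2d(out_ch * len(dilations), out_ch, 1)

    def forward(self, x):
        return self.merge(torch.cat([b(x) for b in self.branches], dim=1))

# Usage: a 64-channel feature map passes through unchanged in spatial size.
fused = MultiScaleFusion(64, 32)(torch.randn(1, 64, 44, 44))
print(tuple(fused.shape))  # (1, 32, 44, 44)
```

Because `padding` equals `dilation` for a 3×3 kernel, every branch keeps the input's height and width, which is what allows the branches to be concatenated channel-wise before merging.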


Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.


Acknowledgements

This work is partly supported by the National Natural Science Foundation of China (No. 62006002, 62106006), the Natural Science Foundation of Anhui Province (No. 1908085QF264, 2208085J18, 2208085QF192), the Natural Science Foundation of Anhui Higher Education Institution (No. 2022AH040014), and the University Synergy Innovation Program of Anhui Province (No. GXXT-2022-033, GXXT-2021-038).

Author information


Corresponding author

Correspondence to Chenglong Li.

Ethics declarations

Ethics Approval and Consent to Participate

This article does not contain any studies with animals performed by any of the authors.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Xiao, Y., Huang, Y., Li, C. et al. Lightweight Multi-modal Representation Learning for RGB Salient Object Detection. Cogn Comput 15, 1868–1883 (2023). https://doi.org/10.1007/s12559-023-10148-1

