SRFCNM: Spatiotemporal recurrent fully convolutional network model for salient object detection

Multimedia Tools and Applications

Abstract

Video saliency detection is widely used because it distinguishes the significant regions of interest in a scene. Its applications include video segmentation, abnormal activity detection, and video summarization. This paper develops a novel technique for video saliency detection, the Spatiotemporal Recurrent Fully Convolutional Network Model (SRFCNM). The model uses recurrent convolutional layers to represent the spatial and temporal features of superpixels for element uniqueness. It is trained in two phases: we first pre-train the model on segmented datasets and then fine-tune it for saliency detection, which allows the network to learn salient objects more accurately. Integrating saliency maps with recurrent convolutional layers and spatiotemporal characteristics yields a robust representation of salient objects that captures the relevant features. SRFCNM is extensively evaluated on three challenging datasets (SegTrackV2, FBMS, and DAVIS) and compared with existing deep learning and convolutional neural network algorithms. The results show that SRFCNM considerably outperforms state-of-the-art saliency models on all three public datasets in terms of accuracy, recall, and mean absolute error (MAE). With hand-crafted color features, the proposed model produces the lowest MAE values among the compared models: 3.2%, 3.5%, and 7.5% for SegTrackV2, DAVIS, and FBMS, respectively.
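For readers unfamiliar with the reported metrics, the sketch below shows how MAE and recall are conventionally computed for a single predicted saliency map against a binary ground-truth mask. It is an illustration under assumptions, not the paper's evaluation code: the function names, the 0.5 binarization threshold, and the toy 2x2 maps are hypothetical, and the exact protocol used for SRFCNM (thresholding, averaging over frames) is not stated in this abstract.

```python
from typing import Sequence, Tuple

# A saliency map is modelled here as rows of values in [0, 1] (assumption).
Map = Sequence[Sequence[float]]


def mean_absolute_error(saliency: Map, ground_truth: Map) -> float:
    """Average per-pixel absolute difference between two same-sized maps."""
    if len(saliency) != len(ground_truth):
        raise ValueError("maps must have the same number of rows")
    total, count = 0.0, 0
    for s_row, g_row in zip(saliency, ground_truth):
        if len(s_row) != len(g_row):
            raise ValueError("rows must have the same length")
        for s, g in zip(s_row, g_row):
            total += abs(s - g)
            count += 1
    if count == 0:
        raise ValueError("maps must not be empty")
    return total / count


def precision_recall(
    saliency: Map, ground_truth: Map, threshold: float = 0.5
) -> Tuple[float, float]:
    """Binarize the prediction at `threshold` (hypothetical default) and
    compare it with the ground-truth mask."""
    tp = fp = fn = 0
    for s_row, g_row in zip(saliency, ground_truth):
        for s, g in zip(s_row, g_row):
            predicted = s >= threshold
            actual = g >= 0.5
            if predicted and actual:
                tp += 1
            elif predicted and not actual:
                fp += 1
            elif actual and not predicted:
                fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy example (made-up values, not data from the paper).
prediction = [[0.9, 0.2], [0.1, 0.8]]
mask = [[1.0, 0.0], [0.0, 1.0]]
mae = mean_absolute_error(prediction, mask)
precision, recall = precision_recall(prediction, mask)
print(f"MAE = {mae:.1%}, precision = {precision:.2f}, recall = {recall:.2f}")
```

On a benchmark, these per-frame values would typically be averaged over all frames of a dataset; an MAE of 3.2% then corresponds to an average per-pixel error of about 0.032 on a [0, 1] scale.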

Data availability

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

Funding

None.

Author information

Corresponding author

Correspondence to Ishita Arora.

Ethics declarations

Ethical approval

This research does not contain any studies with human participants or animals performed by any authors.

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Arora, I., Gangadharappa, M. SRFCNM: Spatiotemporal recurrent fully convolutional network model for salient object detection. Multimed Tools Appl 83, 38009–38036 (2024). https://doi.org/10.1007/s11042-023-17009-x

  • DOI: https://doi.org/10.1007/s11042-023-17009-x
