Abstract
Video saliency detection has recently been widely used because of its ability to distinguish significant regions of interest. It has several applications, such as video segmentation, abnormal activity detection, and video summarization. This paper develops a novel technique for video saliency detection, the Spatiotemporal Recurrent Fully Convolutional Network Model (SRFCNM). The model uses recurrent convolutional layers to represent the spatial and temporal features of superpixels for element uniqueness. It is trained in two phases: we first pre-train the model on segmented datasets and then fine-tune it for saliency detection, which allows the network to learn salient objects more accurately. Integrating saliency maps with recurrent convolutional layers and spatiotemporal characteristics yields a robust representation of salient objects that captures the relevant features. SRFCNM is extensively evaluated on the challenging SegTrackV2, FBMS, and DAVIS datasets and compared with existing deep learning and convolutional neural network algorithms. The results demonstrate that SRFCNM considerably outperforms state-of-the-art saliency models on the three public datasets in terms of accuracy, recall, and mean absolute error (MAE). With hand-crafted color features, the proposed SRFCNM model produces the lowest MAE values among the compared models: 3.2%, 3.5%, and 7.5% for the SegTrackV2, DAVIS, and FBMS datasets, respectively.
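For readers unfamiliar with the MAE metric quoted above, the following is a minimal sketch of how it is commonly computed in the salient object detection literature: the average per-pixel absolute difference between a predicted saliency map and a binary ground-truth mask, both scaled to [0, 1]. The function name `mean_absolute_error` and the toy 2×2 maps are illustrative assumptions; the abstract does not specify the authors' exact evaluation protocol.

```python
def mean_absolute_error(saliency, ground_truth):
    """Average absolute per-pixel difference between two equally sized maps.

    Both arguments are lists of rows with values in [0, 1]
    (hypothetical input format chosen for this sketch).
    """
    if len(saliency) != len(ground_truth):
        raise ValueError("maps must have the same height")
    total = 0.0
    count = 0
    for row_s, row_g in zip(saliency, ground_truth):
        if len(row_s) != len(row_g):
            raise ValueError("maps must have the same width")
        for s, g in zip(row_s, row_g):
            total += abs(s - g)
            count += 1
    if count == 0:
        raise ValueError("maps must not be empty")
    return total / count


# Toy example: a 2x2 predicted map against a binary mask.
pred = [[0.9, 0.1],
        [0.2, 0.8]]
mask = [[1, 0],
        [0, 1]]
mae = mean_absolute_error(pred, mask)
print(f"MAE = {mae:.2%}")
```

On this toy input the MAE is approximately 0.15. A dataset-level figure such as the percentages reported above would typically be the mean of this value over all evaluated frames.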
Data availability
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
Funding
None.
Ethics declarations
Ethical approval
This research does not contain any studies with human participants or animals performed by any authors.
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Arora, I., Gangadharappa, M. SRFCNM: Spatiotemporal recurrent fully convolutional network model for salient object detection. Multimed Tools Appl 83, 38009–38036 (2024). https://doi.org/10.1007/s11042-023-17009-x
DOI: https://doi.org/10.1007/s11042-023-17009-x