Abstract
Although temporal information matters for the quality of video saliency detection, current network frameworks still suffer from problems such as poor spatiotemporal coherence and poor edge continuity. To address these problems, this paper proposes a fully convolutional neural network that integrates temporal difference and pixel gradient to fine-tune the edges of salient targets. Because the features of neighboring frames are highly correlated owing to their proximity in location, a co-attention mechanism is used to apply pixel-wise weights to the saliency probability map after feature extraction with multi-scale pooling, so that attention is paid to both the edges and the central regions of images. The changes in the pixel gradients of the original images are used to recursively improve the continuity of target edges and the details of central areas. In addition, residual networks are used to integrate information between modules, ensuring stable connections between the backbone network and the modules as well as the propagation of pixel gradient changes. Furthermore, a self-adjustment strategy for loss functions is presented to address overfitting in experiments. The proposed method has been tested on three publicly available datasets, and comparisons with six other state-of-the-art methods demonstrate its effectiveness.
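To make the two low-level cues named in the abstract concrete, the following minimal NumPy sketch computes a temporal difference between neighboring frames and a pixel gradient magnitude for a single frame. It is not the network proposed in the paper. The function names, the use of central finite differences, and the element-wise product used to combine the cues are all assumptions made for illustration.

```python
import numpy as np


def temporal_difference(prev_frame, next_frame):
    """Absolute per-pixel intensity change between two neighboring frames."""
    prev = prev_frame.astype(np.float64)
    nxt = next_frame.astype(np.float64)
    return np.abs(nxt - prev)


def pixel_gradient_magnitude(frame):
    """Gradient magnitude of a grayscale frame via finite differences."""
    # np.gradient returns derivatives along axis 0 (rows), then axis 1 (columns).
    gy, gx = np.gradient(frame.astype(np.float64))
    return np.hypot(gx, gy)


def edge_motion_cue(prev_frame, next_frame, eps=1e-8):
    """Hypothetical combined cue: large where a pixel both changes over time
    and lies on a spatial edge of the later frame. The product is an
    illustrative choice, not the fusion used in the paper."""
    cue = temporal_difference(prev_frame, next_frame) * pixel_gradient_magnitude(next_frame)
    return cue / (cue.max() + eps)  # scale to [0, 1); all zeros if nothing moved


# Toy example: a 3x3 bright square shifts one pixel to the right.
prev_frame = np.zeros((8, 8))
prev_frame[2:5, 2:5] = 1.0
next_frame = np.zeros((8, 8))
next_frame[2:5, 3:6] = 1.0
cue = edge_motion_cue(prev_frame, next_frame)
```

In this toy case the cue is nonzero only along the leading and trailing vertical edges of the square, which is the kind of boundary evidence an edge-refinement stage could use.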
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (NSFC) (61976123, 61601427, 61876098); the Taishan Young Scholars Program of Shandong Province; and the Key Development Program for Basic Research of Shandong Province (ZR2020ZD44).
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Lu, X., Jian, M., Wang, R. et al. Video saliency detection via combining temporal difference and pixel gradient. Multimed Tools Appl 83, 37589–37602 (2024). https://doi.org/10.1007/s11042-023-17128-5
DOI: https://doi.org/10.1007/s11042-023-17128-5