STA-Net: spatial-temporal attention network for video salient object detection

Abstract

This paper presents a systematic study of the role of spatial and temporal attention mechanisms in the video salient object detection (VSOD) task. We present a two-stage spatial-temporal attention network, named STA-Net, which makes two major contributions. In the first stage, we devise a Multi-Scale-Spatial-Attention (MSSA) module that reduces the computational cost spent on non-salient regions while exploiting multi-scale saliency information. This sliced attention method offers an efficient way to exploit the high-level features of the network with an enlarged receptive field. In the second stage, we propose a Pyramid-Saliency-Shift-Aware (PSSA) module, which emphasizes dynamic object information: the frame-to-frame shift of the saliency response offers a valid cue for confirming the salient object and capturing temporal information. This temporal detection module encourages precise salient region detection. Extensive experiments show that the proposed STA-Net is effective for the video salient object detection task and achieves compelling performance compared with the state of the art.
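
Only the abstract is available on this page, so as a rough illustration of the two stages it describes, here is a minimal PyTorch sketch of what an MSSA-style gated multi-scale spatial attention module and a PSSA-style saliency-shift cue might look like. Every name, shape, and design choice below (the dilation rates, the sigmoid gating, the frame-difference shift cue, the omission of the pyramid levels) is an assumption inferred from the abstract, not the authors' implementation.

```python
# Minimal, illustrative PyTorch sketch -- NOT the authors' code.
# All names, channel sizes, dilation rates, and the frame-difference
# shift cue are assumptions based only on the abstract.
import torch
import torch.nn as nn


class MultiScaleSpatialAttention(nn.Module):
    """MSSA-style idea: compute single-channel attention logits at several
    dilation rates (enlarged receptive field), fuse them into one spatial
    attention map, and gate the high-level features so later computation
    concentrates on likely-salient regions."""

    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, 1, kernel_size=3, padding=d, dilation=d)
             for d in dilations]
        )
        self.fuse = nn.Conv2d(len(dilations), 1, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # One single-channel attention logit per scale, then fuse and gate.
        logits = torch.cat([b(feats) for b in self.branches], dim=1)
        attn = torch.sigmoid(self.fuse(logits))  # (B, 1, H, W)
        return feats * attn                      # suppress non-salient regions


class SaliencyShiftCue(nn.Module):
    """PSSA-style idea (pyramid levels omitted for brevity): treat the
    frame-to-frame change of the attended features as a shift cue that
    confirms a moving salient object, then refine the current features
    with that temporal evidence."""

    def __init__(self, channels: int):
        super().__init__()
        self.refine = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, feat_t: torch.Tensor, feat_prev: torch.Tensor) -> torch.Tensor:
        shift = torch.abs(feat_t - feat_prev)    # crude motion/shift evidence
        return self.refine(torch.cat([feat_t, shift], dim=1))


if __name__ == "__main__":
    mssa = MultiScaleSpatialAttention(channels=64)
    pssa = SaliencyShiftCue(channels=64)
    feat_prev, feat_t = torch.randn(2, 1, 64, 56, 56).unbind(0)  # two fake frames
    out = pssa(mssa(feat_t), mssa(feat_prev))
    print(out.shape)  # torch.Size([1, 64, 56, 56])
```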




Author information

Corresponding author

Correspondence to Hong-Bo Bi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bi, HB., Lu, D., Zhu, HH. et al. STA-Net: spatial-temporal attention network for video salient object detection. Appl Intell 51, 3450–3459 (2021). https://doi.org/10.1007/s10489-020-01961-4
