Multimedia Tools and Applications, Volume 78, Issue 21, pp 30793–30807

Self-attention recurrent network for saliency detection

  • Fengdong Sun
  • Wenhui Li
  • Yuanyuan Guan


Feature maps at different layers of a deep neural network generally carry different semantics. Existing methods often ignore these differences, which may lead to sub-optimal results. In this paper, we propose a novel end-to-end deep saliency network that effectively exploits multi-scale feature maps according to their characteristics. Shallow layers generally contain more local information, whereas deep layers have an advantage in capturing global semantics. By exploiting the different semantics of feature maps at different layers, our network generates detailed saliency maps. On the one hand, the local information of shallow layers is enhanced by a recurrent structure that shares convolution kernels across time steps. On the other hand, the global information of deep layers is exploited by a self-attention module, which generates attention weights for salient objects and backgrounds and thereby improves performance. Experimental results on four widely used datasets demonstrate that our method outperforms existing algorithms.
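The abstract names two components without giving their equations. The following is a minimal, illustrative sketch in pure Python under stated assumptions, not the authors' implementation. The recurrent structure is assumed to follow the recurrent convolutional layer (RCL) of Liang and Hu (2015), x(0) = ReLU(w_ff * u) and x(t) = ReLU(w_ff * u + w_rec * x(t-1)), with the same kernels reused at every time step. The self-attention module is assumed to follow the SAGAN formulation of Zhang et al. (2018), with attention weights beta[j][i] = softmax_i(f_i · g_j) and a learnable residual scale gamma. All function names, kernel values, the single-channel convolution, and the simplified single value projection are hypothetical choices made for illustration; the paper's actual layer sizes, normalization, and fusion scheme are not stated in this abstract.

```python
import math

# --- Recurrent convolution with shared kernels (assumed RCL-style, simplified) ---

def conv3x3(img, kernel):
    """Same-size 2D cross-correlation with zero padding, single channel."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        s += kernel[di + 1][dj + 1] * img[y][x]
            out[i][j] = s
    return out


def relu_add(a, b):
    """Element-wise ReLU(a + b) for two equally sized 2D lists."""
    return [[max(0.0, p + q) for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def recurrent_conv(u, w_ff, w_rec, steps=3):
    """x(0) = ReLU(w_ff * u); x(t) = ReLU(w_ff * u + w_rec * x(t-1)).
    The same w_ff and w_rec are reused at every time step (shared kernels)."""
    ff = conv3x3(u, w_ff)  # feed-forward term is identical at every step
    x = [[max(0.0, v) for v in row] for row in ff]
    for _ in range(steps):
        x = relu_add(ff, conv3x3(x, w_rec))
    return x


# --- Self-attention over spatial positions (assumed SAGAN-style, simplified) ---

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs):
    mx = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - mx) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def self_attention(feats, w_f, w_g, w_h, gamma):
    """feats: N positions, each a C-dim vector. For each position j:
    beta[j][i] = softmax_i(f_i . g_j), y_j = gamma * sum_i beta[j][i] * h_i + x_j."""
    f = [matvec(w_f, x) for x in feats]  # key-like projections
    g = [matvec(w_g, x) for x in feats]  # query-like projections
    h = [matvec(w_h, x) for x in feats]  # value projections, kept C-dim for the residual
    out, attn = [], []
    for j, x in enumerate(feats):
        beta = softmax([sum(a * b for a, b in zip(fi, g[j])) for fi in f])
        o = [sum(beta[i] * h[i][c] for i in range(len(feats))) for c in range(len(x))]
        out.append([gamma * oc + xc for oc, xc in zip(o, x)])
        attn.append(beta)
    return out, attn


# Usage with hypothetical toy inputs
u = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # hypothetical feed-forward kernel
smooth = [[0.1] * 3 for _ in range(3)]        # hypothetical recurrent kernel
x = recurrent_conv(u, identity, smooth, steps=1)

feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 positions, 2 channels
eye = [[1.0, 0.0], [0.0, 1.0]]                # hypothetical projection weights
y, attn = self_attention(feats, eye, eye, eye, gamma=0.5)
```

Because w_ff and w_rec are reused at every step, additional time steps enlarge the effective receptive field without adding parameters. Setting gamma = 0 turns the attention block into an identity map; SAGAN initializes gamma at zero so that the network gradually learns how much non-local evidence to mix in.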


Keywords: Saliency detection · Recurrent convolutional layer · Self-attention module



This work was supported by the Science and Technology Development Plan of Jilin Province under Grant 20170204020GX and by the National Natural Science Foundation of China under Grant U1564211.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. College of Computer Science and Technology, Jilin University, Changchun, China
