Multimedia Tools and Applications, Volume 78, Issue 21, pp 30793–30807

Self-attention recurrent network for saliency detection

  • Fengdong Sun
  • Wenhui Li
  • Yuanyuan Guan
Article

Abstract

Feature maps in deep neural networks generally contain different semantics. Existing methods often ignore these characteristics, which may lead to sub-optimal results. In this paper, we propose a novel end-to-end deep saliency network that effectively utilizes multi-scale feature maps according to their characteristics. Shallow layers generally contain more local information, whereas deep layers have advantages in global semantics. Our network therefore generates detailed saliency maps by exploiting the different semantics of feature maps in different layers. On the one hand, the local information of shallow layers is enhanced by a recurrent structure that shares convolution kernels across time steps. On the other hand, the global information of deep layers is exploited by a self-attention module, which generates attention weights for salient objects and backgrounds and thereby achieves better performance. Experimental results on four widely used datasets demonstrate that our method has performance advantages over existing algorithms.
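
The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the network described in the article: it works on 1D signals instead of image feature maps, and the function names (`conv1d_same`, `recurrent_conv`, `self_attention`), the ReLU activation, the zero padding, the identity query/key/value projections, and the scaled dot-product scoring are all assumptions made for brevity. It shows only that a recurrent convolutional layer reuses the same kernels at every time step, and that self-attention produces one normalized weight per position.

```python
import math


def conv1d_same(signal, kernel):
    """Zero-padded 1D cross-correlation; output has the same length as the input.

    Assumes an odd-length kernel so that it can be centered on each position.
    """
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:  # positions outside the signal count as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def recurrent_conv(signal, feedforward_kernel, recurrent_kernel, steps):
    """Hypothetical recurrent convolutional layer.

    The same two kernels are reused at every time step; only the state changes.
    """
    drive = conv1d_same(signal, feedforward_kernel)  # computed once
    state = [max(0.0, v) for v in drive]             # ReLU, state at step 0
    for _ in range(steps):
        feedback = conv1d_same(state, recurrent_kernel)  # shared kernel
        state = [max(0.0, d + f) for d, f in zip(drive, feedback)]
    return state


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Self-attention over positions, each given as a list of channel values.

    Queries, keys and values are the features themselves (an assumption).
    Returns the attended features and the attention weight matrix.
    """
    scale = math.sqrt(len(features[0]))
    attended, weights = [], []
    for query in features:
        scores = [sum(a * b for a, b in zip(query, key)) / scale
                  for key in features]
        w = softmax(scores)  # one weight per position, summing to 1
        weights.append(w)
        attended.append([
            sum(w[j] * features[j][c] for j in range(len(features)))
            for c in range(len(query))
        ])
    return attended, weights


if __name__ == "__main__":
    refined = recurrent_conv([0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.5, 0.0], steps=2)
    print(refined)
    _, attn = self_attention([[1.0, 0.0], [0.0, 1.0]])
    print(attn)
```

Increasing `steps` lets each position accumulate context from a wider neighborhood without adding parameters, which is the property that kernel sharing across time steps provides. The attention weights, in contrast, relate every position to every other position in a single step.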

Keywords

Saliency detection; Recurrent convolutional layer; Self-attention module

Notes

Acknowledgments

This work was supported by the Science and Technology Development Plan of Jilin Province under Grant 20170204020GX and by the National Science Foundation of China under Grant U1564211.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. College of Computer Science and Technology, Jilin University, Changchun, China