Video Saliency Detection Using Deep Convolutional Neural Networks

  • Xiaofei Zhou
  • Zhi Liu
  • Chen Gong
  • Gongyang Li
  • Mengke Huang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11257)


Numerous deep learning based efforts have been devoted to image saliency detection, so it is natural to construct a video saliency model on the basis of these image saliency models. Moreover, because the number of training videos is limited, existing video saliency models are typically trained on large-scale synthetic video data. In this paper, we construct a video saliency model on top of an existing image saliency model and train it on the limited real video data. Concretely, our video saliency model consists of three steps: feature extraction, feature aggregation, and spatial refinement. First, the concatenation of the current frame and its optical flow image is fed into the feature extraction network, yielding feature maps. Then, a tensor consisting of the generated feature maps and the original information, i.e., the current frame and the optical flow image, is passed to the aggregation network, in which the original information provides complementary cues for aggregation. Finally, to obtain a high-quality saliency map with well-defined boundaries, the output of the aggregation network and the current frame are used to perform spatial refinement, yielding the final saliency map for the current frame. Extensive qualitative and quantitative experiments on two challenging video datasets show that the proposed model consistently outperforms state-of-the-art saliency models for detecting salient objects in videos.
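The three-stage data flow described above can be sketched as follows. This is only an illustrative sketch of the tensor shapes at each stage, not the paper's trained model: the function names (`extract_features`, `aggregate`, `refine`) are hypothetical, and the learned networks are replaced with trivial placeholder computations.

```python
import numpy as np

def extract_features(frame, flow):
    # Stage 1: the frame and its optical flow image are concatenated along
    # the channel axis and fed to a feature extraction network (here a
    # trivial placeholder producing 16 "feature maps").
    x = np.concatenate([frame, flow], axis=-1)              # H x W x 6
    return x.mean(axis=-1, keepdims=True).repeat(16, axis=-1)  # H x W x 16

def aggregate(features, frame, flow):
    # Stage 2: the feature maps are stacked with the original frame and
    # flow image, which provide complementary information for aggregation.
    tensor = np.concatenate([features, frame, flow], axis=-1)  # H x W x 22
    return tensor.mean(axis=-1)                                # H x W

def refine(coarse, frame):
    # Stage 3: spatial refinement combines the coarse saliency map with the
    # current frame; here a placeholder that rescales the result to [0, 1].
    s = coarse * frame.mean(axis=-1)
    return (s - s.min()) / (s.max() - s.min() + 1e-8)

np.random.seed(0)
H, W = 64, 64
frame = np.random.rand(H, W, 3)   # current RGB frame
flow = np.random.rand(H, W, 3)    # optical flow rendered as a 3-channel image
saliency = refine(aggregate(extract_features(frame, flow), frame, flow), frame)
print(saliency.shape)  # (64, 64): one saliency value per pixel
```

The point of the sketch is the shape bookkeeping: motion enters once at the input (frame + flow concatenation) and again at aggregation, and refinement operates purely spatially on the current frame.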


Keywords: Video saliency · Convolutional neural networks · Feature aggregation



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Xiaofei Zhou (1, 2, 3)
  • Zhi Liu (2, 3)
  • Chen Gong (4)
  • Gongyang Li (2, 3)
  • Mengke Huang (2, 3)
  1. Institute of Information and Control, Hangzhou Dianzi University, Hangzhou, China
  2. Shanghai Institute for Advanced Communication and Data Science, Shanghai University, Shanghai, China
  3. School of Communication and Information Engineering, Shanghai University, Shanghai, China
  4. Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
