TENet: Triple Excitation Network for Video Salient Object Detection

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12350)


In this paper, we propose a simple yet effective approach, named Triple Excitation Network, to reinforce the training of video salient object detection (VSOD) from three aspects: spatial, temporal, and online excitations. These excitation mechanisms follow the spirit of curriculum learning. They aim to reduce learning ambiguities at the beginning of training by selectively exciting feature activations using ground truth. We then gradually reduce the weight of the ground-truth excitations by a curriculum rate and replace them with a curriculum complementary map for better and faster convergence. In particular, the spatial excitation strengthens feature activations for clear object boundaries, while the temporal excitation imposes motion to emphasize spatio-temporal salient regions. Together, the spatial and temporal excitations combat the saliency shifting problem and the conflict between spatial and temporal features in VSOD. Furthermore, our semi-curriculum learning design enables the first online refinement strategy for VSOD, which excites and boosts saliency responses during testing without re-training. The proposed triple excitations can easily be plugged into different VSOD methods. Extensive experiments demonstrate the effectiveness of all three excitation methods and show that the proposed method outperforms state-of-the-art image and video salient object detection methods.
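The abstract does not give the exact excitation formulas. The NumPy sketch below only illustrates the general idea under loudly stated assumptions. Every function name here is hypothetical. The curriculum rate is assumed to decay the ground-truth weight linearly. The excitation is assumed to take the residual form F * (1 + E). The complementary map is assumed to be the model's own prediction. The temporal map is assumed to be normalized optical-flow magnitude. `toy_predict` is a stand-in saliency head, not the network's decoder. The online step simply re-excites features with the current prediction at test time, with no weight updates.

```python
import numpy as np


def curriculum_alpha(epoch: int, rate: float) -> float:
    """Weight of the ground-truth excitation at a given epoch.

    Assumption: linear decay by `rate` per epoch, clipped at 0.
    """
    return max(0.0, 1.0 - rate * epoch)


def excitation_map(gt: np.ndarray, complementary: np.ndarray, alpha: float) -> np.ndarray:
    """Blend the ground-truth mask with a complementary map, both in [0, 1].

    alpha near 1: mostly ground truth (early training).
    alpha near 0: mostly the complementary map, assumed here to be the
    model's own saliency prediction.
    """
    return alpha * gt + (1.0 - alpha) * complementary


def motion_map(flow: np.ndarray) -> np.ndarray:
    """Hypothetical temporal excitation: optical-flow magnitude scaled to [0, 1].

    flow has shape (2, H, W) holding (u, v) displacements.
    """
    mag = np.sqrt(flow[0] ** 2 + flow[1] ** 2)
    peak = mag.max()
    return mag / peak if peak > 0 else np.zeros_like(mag)


def excite(features: np.ndarray, emap: np.ndarray) -> np.ndarray:
    """Assumed residual excitation F * (1 + E).

    features: (C, H, W); emap: (H, W), broadcast over channels.
    """
    return features * (1.0 + emap[None, :, :])


def toy_predict(features: np.ndarray) -> np.ndarray:
    """Stand-in saliency head: sigmoid of the channel-wise mean (illustrative only)."""
    return 1.0 / (1.0 + np.exp(-features.mean(axis=0)))


def online_refine(features: np.ndarray, steps: int = 2) -> np.ndarray:
    """Test-time refinement without retraining.

    Repeatedly excite the features with the current prediction, then predict again.
    """
    pred = toy_predict(features)
    for _ in range(steps):
        pred = toy_predict(excite(features, pred))
    return pred


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((4, 8, 8))
    gt = np.zeros((8, 8))
    gt[2:6, 2:6] = 1.0
    pred = toy_predict(feats)
    for epoch in (0, 5, 10):
        a = curriculum_alpha(epoch, rate=0.1)
        excited = excite(feats, excitation_map(gt, pred, a))
        print(epoch, a, excited.shape)
    print(online_refine(feats).shape)
```

Blending rather than switching abruptly keeps the training signal continuous as the ground-truth guidance fades. This is one plausible reading of "replace it by a curriculum complementary map". It is not a claim about the paper's exact schedule.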



This project is supported by the National Natural Science Foundation of China (No. 61472145, No. 61972162, and No. 61702194), the Special Fund of Science and Technology Research and Development of Applications From Guangdong Province (SF-STRDA-GD) (No. 2016B010127003), the Guangzhou Key Industrial Technology Research fund (No. 201802010036), the Guangdong Natural Science Foundation (No. 2017A030312008), and the CCF-Tencent Open Research fund (CCF-Tencent RAGR20190112).


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
  2. Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
  3. Department of Computer Science and Technology, Dalian University of Technology, Dalian, China
