IoT-based 3D convolution for video salient object detection

  • Shizhou Dong
  • Zhifan Gao
  • Sandeep Pirbhulal
  • Gui-Bin Bian
  • Heye Zhang
  • Wanqing Wu
  • Shuo Li
Intelligent Biomedical Data Analysis and Processing

Abstract

Video salient object detection (SOD) is the first step for devices in the Internet of Things (IoT) to understand their surroundings. Video SOD requires the objects’ motion information across contiguous video frames as well as the spatial contrast information within a single video frame. Many IoT devices (e.g., surveillance cameras and smartphones) have low hardware configurations, so their computing power is not sufficient to support the computationally expensive motion estimation of existing SOD methods. To model the objects’ motion information efficiently for SOD, we propose an end-to-end video SOD algorithm with an efficient representation of that motion information. The algorithm contains two major parts: a 3D convolution-based X-shape structure that directly and efficiently represents the motion information in successive video frames, and 2D densely connected convolutional networks (DenseNet) with a pyramid structure that extract the rich spatial contrast information in a single video frame. Our method keeps the number of parameters as small as that of a 2D convolutional neural network, and it represents spatiotemporal information uniformly, which allows it to be trained end-to-end. We evaluate the proposed method on four benchmark datasets. The results show that it achieves state-of-the-art performance compared with five other methods.
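To illustrate why a 3D convolution can represent motion directly, the following is a minimal sketch in plain Python, not the authors' X-shape network. It assumes a hypothetical temporal-difference kernel of size 3×1×1 and toy 4×4 clips; the names conv3d_valid, make_clip, and TEMPORAL_DIFF are illustrative and do not come from the paper. A kernel that spans several frames responds when an object changes position and stays silent when the scene is static.

```python
# Illustrative only: a naive "valid" 3D convolution over a (T, H, W) clip.
# The kernel and the toy clips are assumptions, not taken from the paper.

def conv3d_valid(clip, kernel):
    """Slide a (kt, kh, kw) kernel over a (T, H, W) clip with no padding."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += kernel[dt][dy][dx] * clip[t + dt][y + dy][x + dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def make_clip(columns, size=4):
    """Build a clip with one bright pixel per frame, in row 1 at columns[t]."""
    clip = []
    for col in columns:
        frame = [[0.0] * size for _ in range(size)]
        frame[1][col] = 1.0
        clip.append(frame)
    return clip


def total_abs(volume):
    """Sum of absolute responses over the whole output volume."""
    return sum(abs(v) for plane in volume for row in plane for v in row)


# Hypothetical temporal-difference kernel: (last frame) minus (first frame),
# with a 1x1 spatial footprint.
TEMPORAL_DIFF = [[[-1.0]], [[0.0]], [[1.0]]]

static_clip = make_clip([1, 1, 1])  # the object stays in place
moving_clip = make_clip([0, 1, 2])  # the object moves one pixel right per frame

static_response = conv3d_valid(static_clip, TEMPORAL_DIFF)
moving_response = conv3d_valid(moving_clip, TEMPORAL_DIFF)

print(total_abs(static_response))  # 0.0: no motion, no response
print(total_abs(moving_response))  # 2.0: responses at the old and new positions
```

In a trained network the kernel weights are learned rather than fixed, but the same mechanism applies: because the kernel extends along the time axis, motion cues can be picked up inside the convolution itself, without a separate motion-estimation stage.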

Keywords

Internet of Things, Salient object detection, Video processing, Deep learning

Mathematics Subject Classification

68T45 68T10 68T05 

Notes

Funding

This study was funded by the Youth Innovation Promotion Association of the Chinese Academy of Sciences (Grant No. 218165) and the Shenzhen Key Laboratory of Neuropsychiatric Modulation (CN) (Grant No. JCYJ20170307165309009).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest related to this work.

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  2. Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences Shenzhen, Shenzhen, China
  3. Western University, London, Canada
  4. State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  5. Sun Yat-Sen University, Guangzhou, China
