Abstract
Video salient object detection (SOD) is the first step for devices in the Internet of Things (IoT) to understand their surroundings. Video SOD requires both the objects’ motion information across contiguous video frames and the spatial contrast information within a single video frame. Many IoT devices (e.g., surveillance cameras and smartphones) have low hardware configurations, so their computing power cannot support the expensive motion estimation of existing SOD methods. To model the objects’ motion information efficiently for SOD, we propose an end-to-end video SOD algorithm built on an efficient representation of that motion information. The algorithm has two major parts: a 3D convolution-based X-shape structure that directly and efficiently represents the motion information in successive video frames, and 2D densely connected convolutional networks (DenseNet) with a pyramid structure that extract the rich spatial contrast information in a single video frame. Our method keeps the number of parameters as small as that of a 2D convolutional neural network and represents spatiotemporal information uniformly, which allows it to be trained end-to-end. We evaluate the proposed method on four benchmark datasets. The results show that it achieves state-of-the-art performance compared with five other methods.
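To illustrate why a 3D convolution can represent motion across successive frames directly, the following is a minimal sketch in plain Python. It is not the X-shape structure described in the abstract: the kernel `temporal_diff` is a hand-set, hypothetical temporal-difference filter (a trained network would learn its kernels), and the helper names `conv3d_valid`, `make_clip`, and `energy` are introduced here only for illustration. The sketch shows that a kernel spanning two frames responds to a moving pixel but not to a static one.

```python
def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D cross-correlation over (time, height, width) nested lists."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Sum over the temporal and both spatial kernel axes.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def make_clip(positions, size=5):
    """Build one frame per entry, with a single bright pixel in the middle row."""
    clip = []
    for px in positions:
        frame = [[0.0] * size for _ in range(size)]
        frame[size // 2][px] = 1.0
        clip.append(frame)
    return clip


def energy(volume):
    """Total absolute response of a (time, height, width) volume."""
    return sum(abs(v) for plane in volume for row in plane for v in row)


# Hypothetical hand-set kernel: frame t+1 minus frame t, with 1x1 spatial extent.
temporal_diff = [[[-1.0]], [[1.0]]]

static_clip = make_clip([2, 2, 2])  # the pixel stays in column 2
moving_clip = make_clip([1, 2, 3])  # the pixel moves one column per frame

static_energy = energy(conv3d_valid(static_clip, temporal_diff))
moving_energy = energy(conv3d_valid(moving_clip, temporal_diff))
print(static_energy, moving_energy)  # 0.0 4.0
```

Because the kernel has a temporal extent, motion cues come out of the same convolution that processes spatial content, with no separate motion-estimation stage. This is the general property of 3D convolution that the abstract relies on, shown here under the simplifying assumptions stated above.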
Funding
This study was funded by the Youth Innovation Promotion Association of the Chinese Academy of Sciences (Grant No. 218165) and the Shenzhen Key Laboratory of Neuropsychiatric Modulation (CN) (Grant No. JCYJ20170307165309009).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest regarding this work.
About this article
Cite this article
Dong, S., Gao, Z., Pirbhulal, S. et al. IoT-based 3D convolution for video salient object detection. Neural Comput & Applic 32, 735–746 (2020). https://doi.org/10.1007/s00521-018-03971-3
DOI: https://doi.org/10.1007/s00521-018-03971-3