Abstract
In this paper we present a novel moment-based skeleton detection method for representing human objects in RGB-D videos with animated 3D skeletons. An object often consists of several parts, each of which can be concisely represented with a skeleton. However, detecting the skeletons of individual objects in an image remains a challenge, since it requires both an effective part detector and a part-merging algorithm to group parts into objects. We propose a fully unsupervised learning framework to detect the skeletons of human objects in an RGB-D video. The skeleton modeling algorithm uses a pipeline architecture consisting of a series of cascaded operations: symmetry patch detection, linear-time search of symmetry patch pairs, part and symmetry detection, symmetry graph partitioning, and object segmentation. The properties of the geometric moment-based functions used to embed symmetry features into the centers of symmetry patches are also investigated in detail. Compared with state-of-the-art deep learning approaches for skeleton detection, the proposed approach does not require tedious human labeling of training images to locate the skeleton pixels and their associated scale information. Although our algorithm can detect parts and objects simultaneously, a pre-trained convolutional neural network (CNN) can be used to locate the human object in each frame of the input RGB-D video in order to support real-time applications. This greatly reduces the complexity of detecting the skeleton structure of individual human objects with the proposed method. Using the segmented human object skeleton model, a video surveillance application is constructed to verify the effectiveness of the approach. Experimental results on publicly available datasets show that the proposed method performs well in terms of detection and recognition.
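As a rough illustration of what geometric moments capture on an image patch, the sketch below computes the textbook raw moments m_pq = Σ x^p y^q I(x, y), the patch centroid, and the principal-axis orientation from the second-order central moments. It is a generic, minimal example and not the embedding functions studied in the paper; the names `raw_moment` and `patch_axis` and the list-of-rows patch layout are assumptions made only for this illustration.

```python
import math


def raw_moment(patch, p, q):
    """Geometric moment m_pq = sum over pixels of x^p * y^q * I(x, y).

    `patch` is a list of rows; x is the column index, y the row index.
    """
    return sum(
        (x ** p) * (y ** q) * value
        for y, row in enumerate(patch)
        for x, value in enumerate(row)
    )


def central_moment(patch, p, q, cx, cy):
    """Central moment mu_pq, taken about the centroid (cx, cy)."""
    return sum(
        ((x - cx) ** p) * ((y - cy) ** q) * value
        for y, row in enumerate(patch)
        for x, value in enumerate(row)
    )


def patch_axis(patch):
    """Return (cx, cy, theta): centroid and principal-axis angle in radians.

    theta = 0.5 * atan2(2 * mu11, mu20 - mu02) is the standard
    moment-based orientation estimate (hypothetical helper, not the
    paper's symmetry feature).
    """
    m00 = raw_moment(patch, 0, 0)
    if m00 == 0:
        raise ValueError("patch has zero total intensity")
    cx = raw_moment(patch, 1, 0) / m00
    cy = raw_moment(patch, 0, 1) / m00
    mu20 = central_moment(patch, 2, 0, cx, cy)
    mu02 = central_moment(patch, 0, 2, cx, cy)
    mu11 = central_moment(patch, 1, 1, cx, cy)
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    return cx, cy, theta


if __name__ == "__main__":
    # A horizontal bar: the principal axis lies along the x direction.
    bar = [
        [0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1],
        [0, 0, 0, 0, 0],
    ]
    print(patch_axis(bar))
```

For an elongated, roughly symmetric part such as a limb, the centroid and principal axis give a coarse local estimate of where a skeleton segment might run. The actual patch detection and pairing steps of the pipeline are not reproduced here.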
Acknowledgements
This work was supported in part by the Ministry of Science and Technology, Taiwan, under Grant Numbers MOST 105-2221-E-019-034-MY2, 107-2634-F-019-001, and 106-2632-E-130-002.
Cite this article
Cheng, SC., Hsiao, KF., Yang, CK. et al. A novel unsupervised 3D skeleton detection in RGB-D images for video surveillance. Multimed Tools Appl 79, 15829–15857 (2020). https://doi.org/10.1007/s11042-018-6292-y
DOI: https://doi.org/10.1007/s11042-018-6292-y