A novel unsupervised 3D skeleton detection in RGB-D images for video surveillance

Cheng, Shyi-Chyi; Hsiao, Kuei-Fang; Yang, Chen-Kuei; Hsiao, Po-Fu; Yu, Wan-Hsuan

doi:10.1007/s11042-018-6292-y


Published: 01 July 2018

Multimedia Tools and Applications, Volume 79, pages 15829–15857 (2020)

Shyi-Chyi Cheng¹, Kuei-Fang Hsiao² (ORCID: orcid.org/0000-0002-8342-6909), Chen-Kuei Yang², Po-Fu Hsiao¹ & Wan-Hsuan Yu¹


Abstract

In this paper, we present a novel moment-based skeleton detection method for representing human objects in RGB-D videos with animated 3D skeletons. An object often consists of several parts, each of which can be concisely represented by a skeleton. However, detecting the skeletons of individual objects in an image remains a challenge, since it requires both an effective part detector and a part-merging algorithm that groups parts into objects. The proposed framework for detecting the skeletons of human objects in an RGB-D video is fully unsupervised. Its skeleton modeling algorithm uses a pipeline architecture consisting of a series of cascaded operations: symmetry patch detection, linear-time search of symmetry patch pairs, part and symmetry detection, symmetry graph partitioning, and object segmentation. We also investigate in detail the properties of geometric moment-based functions for embedding symmetry features into the centers of symmetry patches. Compared with state-of-the-art deep learning approaches to skeleton detection, the proposed approach does not require tedious manual labeling of training images to locate skeleton pixels and their associated scale information. Although our algorithm can detect parts and objects simultaneously, a pre-trained convolutional neural network (CNN) can be used to locate the human object in each frame of the input RGB-D video in order to support real-time applications. This greatly reduces the complexity of detecting the skeleton structure of individual human objects with the proposed method. Using the segmented human object skeleton model, we construct a video surveillance application to verify the effectiveness of the approach. Experimental results on publicly available datasets show that the proposed method performs well in terms of detection and recognition.
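The abstract refers to geometric moment-based functions for embedding symmetry features into the centers of symmetry patches but does not state the formulas. As a generic illustration only (a minimal sketch, not the authors' method), the code below computes the standard raw geometric moments m_pq = Σ x^p y^q I(x, y) of a patch, its centroid (m10/m00, m01/m00), and the principal-axis angle θ = ½·atan2(2μ11, μ20 − μ02) from the central moments. For a mirror-symmetric patch, the symmetry axis passes through the centroid and coincides with one of the two principal axes, so θ yields a candidate axis. The helper names `raw_moment`, `centroid_and_axis`, and `reflection_score`, and the overlap-based scoring rule, are hypothetical and assumed for illustration.

```python
import numpy as np


def raw_moment(img, p, q):
    """Raw geometric moment m_pq = sum over pixels of x^p * y^q * I(x, y)."""
    ys, xs = np.indices(img.shape)  # row (y) and column (x) coordinate grids
    return float(np.sum((xs ** p) * (ys ** q) * img))


def centroid_and_axis(img):
    """Return (cx, cy, theta): centroid and principal-axis angle in radians.

    theta is measured from the +x (column) axis toward +y (row, pointing down).
    """
    m00 = raw_moment(img, 0, 0)
    if m00 == 0:
        raise ValueError("empty patch")
    cx = raw_moment(img, 1, 0) / m00
    cy = raw_moment(img, 0, 1) / m00
    ys, xs = np.indices(img.shape)
    dx, dy = xs - cx, ys - cy
    # Second-order central moments (translation invariant).
    mu20 = float(np.sum(dx * dx * img))
    mu02 = float(np.sum(dy * dy * img))
    mu11 = float(np.sum(dx * dy * img))
    theta = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
    return cx, cy, theta


def reflection_score(img, cx, cy, theta):
    """Hypothetical symmetry check: fraction of patch mass whose mirror image
    about the line through (cx, cy) at angle theta lands on a nonzero pixel.
    1.0 means the patch is mirror-symmetric about that line (at pixel accuracy).
    """
    ys, xs = np.nonzero(img)
    w = img[ys, xs]
    # Reflection across a line at angle theta: [[cos2t, sin2t], [sin2t, -cos2t]].
    c, s = np.cos(2.0 * theta), np.sin(2.0 * theta)
    dx, dy = xs - cx, ys - cy
    rx = np.rint(cx + c * dx + s * dy).astype(int)
    ry = np.rint(cy + s * dx - c * dy).astype(int)
    height, width = img.shape
    inside = (rx >= 0) & (rx < width) & (ry >= 0) & (ry < height)
    hit = np.zeros_like(w, dtype=bool)
    hit[inside] = img[ry[inside], rx[inside]] > 0
    return float(np.sum(w[hit]) / np.sum(w))


if __name__ == "__main__":
    # A T-shaped patch that is symmetric about a vertical line.
    tee = np.zeros((8, 7))
    tee[1, 1:6] = 1
    tee[2:7, 3] = 1
    cx, cy, theta = centroid_and_axis(tee)
    print(cx, cy, theta, reflection_score(tee, cx, cy, theta))
```

Because the symmetry axis may be either principal axis, a practical detector built on this idea would score both θ and θ + π/2 and keep the better one; in an RGB-D setting, `img` could be a binary mask or a depth-weighted patch.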


Similar content being viewed by others

SSD: Single Shot MultiBox Detector

Object detection using YOLO: challenges, architectural successors, datasets and applications

Article 08 August 2022

YOLO-based Object Detection Models: A Review and its Applications

Article 14 March 2024


Acknowledgements

This work was supported in part by the Ministry of Science and Technology, Taiwan, under grant numbers MOST 105-2221-E-019-034-MY2, 107-2634-F-019-001, and 106-2632-E-130-002.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Taiwan Ocean University, 2 Pei-Ning Road, Keelung, 202, Taiwan
Shyi-Chyi Cheng, Po-Fu Hsiao & Wan-Hsuan Yu
Department of Information Management, Ming Chuan University, 5 De-Ming Road, Gui-Shan, Taoyuan, 333, Taiwan
Kuei-Fang Hsiao & Chen-Kuei Yang


Corresponding author

Correspondence to Kuei-Fang Hsiao.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Cheng, SC., Hsiao, KF., Yang, CK. et al. A novel unsupervised 3D skeleton detection in RGB-D images for video surveillance. Multimed Tools Appl 79, 15829–15857 (2020). https://doi.org/10.1007/s11042-018-6292-y


Received: 14 April 2018
Revised: 22 May 2018
Accepted: 18 June 2018
Published: 01 July 2018
Issue Date: June 2020
DOI: https://doi.org/10.1007/s11042-018-6292-y
