A novel unsupervised 3D skeleton detection in RGB-D images for video surveillance

Multimedia Tools and Applications

Abstract

In this paper we present a novel moment-based skeleton detection method for representing human objects in RGB-D videos with animated 3D skeletons. An object often consists of several parts, each of which can be concisely represented by a skeleton. However, detecting the skeletons of individual objects in an image remains a challenge, since it requires both an effective part detector and a part-merging algorithm to group parts into objects. We present a novel, fully unsupervised learning framework to detect the skeletons of human objects in an RGB-D video. The skeleton modeling algorithm uses a pipeline architecture consisting of a series of cascaded operations: symmetry patch detection, linear-time search of symmetry patch pairs, part and symmetry detection, symmetry graph partitioning, and object segmentation. The properties of the geometric moment-based functions used to embed symmetry features into the centers of symmetry patches are also investigated in detail. In contrast to state-of-the-art deep learning approaches for skeleton detection, the proposed approach does not require tedious human labeling of training images to locate the skeleton pixels and their associated scale information. Although our algorithm can detect parts and objects simultaneously, a pre-trained convolutional neural network (CNN) can be used to locate the human object in each frame of the input RGB-D video in order to support real-time applications. This greatly reduces the complexity of detecting the skeleton structure of individual human objects with the proposed method. Using the segmented human object skeleton model, a video surveillance application is constructed to verify the effectiveness of the approach. Experimental results on publicly available datasets show that the proposed method performs well in terms of detection and recognition.
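The abstract does not give the moment functions themselves. As a generic illustration of how geometric moments can summarize a patch by a center and an axis, the sketch below computes the centroid and principal-axis orientation of a small 2D patch from its second-order central moments. This is a textbook construction and not necessarily the formulation used in the paper; the names `central_moments` and `principal_axis` are hypothetical and chosen only for this example.

```python
import math


def central_moments(patch):
    """Return (m00, cx, cy, mu20, mu02, mu11) for a 2D list of non-negative weights.

    Illustrative helper only; not taken from the paper.
    """
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(patch):
        for x, w in enumerate(row):
            m00 += w
            m10 += x * w
            m01 += y * w
    if m00 == 0:
        raise ValueError("patch has zero total weight")
    cx, cy = m10 / m00, m01 / m00

    mu20 = mu02 = mu11 = 0.0
    for y, row in enumerate(patch):
        for x, w in enumerate(row):
            dx, dy = x - cx, y - cy
            mu20 += dx * dx * w
            mu02 += dy * dy * w
            mu11 += dx * dy * w
    return m00, cx, cy, mu20, mu02, mu11


def principal_axis(patch):
    """Return ((cx, cy), theta): centroid and major-axis angle in radians.

    The angle is measured from the x axis in image coordinates (y grows downward).
    """
    _, cx, cy, mu20, mu02, mu11 = central_moments(patch)
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return (cx, cy), theta


if __name__ == "__main__":
    horizontal_bar = [
        [0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1],
        [0, 0, 0, 0, 0],
    ]
    center, theta = principal_axis(horizontal_bar)
    print(center, math.degrees(theta))
```

For an elongated, roughly symmetric part, the major axis found this way runs along the part, which is the intuition behind attaching symmetry information to a patch center. How the paper defines and combines its moment-based functions is described in the full text, not in the abstract.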

Acknowledgements

This work was supported in part by the Ministry of Science and Technology, Taiwan, under Grant Numbers MOST 105-2221-E-019-034-MY2, 107-2634-F-019-001, and 106-2632-E-130-002.

Author information

Corresponding author

Correspondence to Kuei-Fang Hsiao.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Cheng, SC., Hsiao, KF., Yang, CK. et al. A novel unsupervised 3D skeleton detection in RGB-D images for video surveillance. Multimed Tools Appl 79, 15829–15857 (2020). https://doi.org/10.1007/s11042-018-6292-y

  • DOI: https://doi.org/10.1007/s11042-018-6292-y
