Action recognition from point cloud patches using discrete orthogonal moments

Cheng, Huaining; Chung, Soon M.

doi:10.1007/s11042-017-4711-0

Published: 27 April 2017

Multimedia Tools and Applications, volume 77, pages 8213–8236 (2018)

Abstract

3D sensors such as standoff Light Detection and Ranging (LIDAR) generate partial 3D point clouds that resemble patches of irregularly shaped, coarse groups of points. 3D modeling of this type of data for human action recognition has rarely been studied. Although 2D-based depth image analysis is an option, its effectiveness on this type of low-resolution data has not been well established. This paper investigates a new multi-scale 3D shape descriptor, based on the discrete orthogonal Tchebichef moments, for the characterization of 3D action pose shapes made of low-resolution point cloud patches. Our shape descriptor consists of low-order 3D Tchebichef moments computed with respect to a new point cloud voxelization scheme that normalizes translation, scale, and resolution. The action recognition is built on the Naïve Bayes classifier using temporal statistics of a ‘bag of pose shapes’. For performance evaluation, a synthetic LIDAR pose shape baseline was developed with 62 human subjects performing three actions: digging, jogging, and throwing. Our action classification experiments demonstrated that the 3D Tchebichef moment representation of point clouds achieves excellent action and viewing direction predictions, with consistent results across a large range of scale and viewing angle variations.
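The descriptor idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the grid size of 16, the centroid-and-radius normalization, and the binary occupancy grid are all assumptions made here for illustration, since the abstract does not specify the voxelization scheme. The orthonormal basis is obtained by QR-factorizing a Vandermonde matrix, which under a uniform weight on the grid should coincide with the normalized discrete Tchebichef polynomials; the paper may instead use a recurrence.

```python
import numpy as np


def tchebichef_basis(n, order):
    """Orthonormal discrete polynomials of degree 0..order on an n-point grid.

    Assumption: QR of a Vandermonde matrix is used instead of a recurrence.
    Column k of the result is a degree-k polynomial sampled on the grid.
    """
    x = np.linspace(-1.0, 1.0, n)  # rescaled grid, for better conditioning
    vander = np.vander(x, order + 1, increasing=True)
    q, r = np.linalg.qr(vander)
    # Flip signs so that every polynomial has a positive leading coefficient.
    return q * np.sign(np.diag(r))


def voxelize(points, n=16):
    """Hypothetical normalization of a point cloud onto an n x n x n grid.

    Translation: subtract the centroid.
    Scale: divide by the largest absolute coordinate.
    Resolution: store binary occupancy, so point density is ignored.
    """
    pts = np.asarray(points, dtype=float)
    pts = pts - pts.mean(axis=0)
    radius = np.abs(pts).max()
    if radius == 0.0:
        radius = 1.0  # degenerate cloud: all points coincide
    # Map [-1, 1] to the nearest voxel index in [0, n - 1].
    idx = np.floor((pts / radius + 1.0) * 0.5 * (n - 1) + 0.5).astype(int)
    grid = np.zeros((n, n, n))
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid


def tchebichef_moments(grid, order=3):
    """Low-order 3D moments: project the grid onto the separable basis."""
    t = tchebichef_basis(grid.shape[0], order)
    return np.einsum("xp,yq,zr,xyz->pqr", t, t, t, grid)
```

Under these assumptions, `tchebichef_moments(voxelize(points))` returns an `(order + 1)`-cubed array that does not change when the input cloud is shifted or uniformly rescaled. Descriptors of this kind, computed per frame, could then be quantized into a 'bag of pose shapes' and passed to a Naïve Bayes classifier as the abstract describes.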

Acknowledgements

The authors would like to thank Isiah Davenport, Max Grattan, and Jeanne Smith for their indispensable help in the creation of the biofidelic pose shape baseline.

Author information

Authors and Affiliations

711th Human Performance Wing, Air Force Research Laboratory, Wright-Patterson AFB, Dayton, OH, 45433, USA
Huaining Cheng
Department of Computer Science and Engineering, Wright State University, Dayton, OH, 45435, USA
Soon M. Chung

Corresponding author

Correspondence to Soon M. Chung.

About this article

Cite this article

Cheng, H., Chung, S.M. Action recognition from point cloud patches using discrete orthogonal moments. Multimed Tools Appl 77, 8213–8236 (2018). https://doi.org/10.1007/s11042-017-4711-0

Received: 09 September 2016
Revised: 14 February 2017
Accepted: 12 April 2017
Published: 27 April 2017
Issue Date: April 2018
DOI: https://doi.org/10.1007/s11042-017-4711-0
