Multimedia Tools and Applications

Volume 77, Issue 7, pp 8213–8236

Action recognition from point cloud patches using discrete orthogonal moments

  • Huaining Cheng
  • Soon M. Chung


3D sensors such as standoff Light Detection and Ranging (LIDAR) generate partial 3D point clouds that resemble patches of irregularly shaped, coarse groups of points. 3D modeling of this type of data for human action recognition has rarely been studied. Although 2D depth-image analysis is an option, its effectiveness on this type of low-resolution data has not been well established. This paper investigates a new multi-scale 3D shape descriptor, based on the discrete orthogonal Tchebichef moments, for characterizing 3D action pose shapes made of low-resolution point cloud patches. Our shape descriptor consists of low-order 3D Tchebichef moments computed with respect to a new point cloud voxelization scheme that normalizes translation, scale, and resolution. Action recognition is built on a Naïve Bayes classifier using temporal statistics of a 'bag of pose shapes'. For performance evaluation, a synthetic LIDAR pose shape baseline was developed with 62 human subjects performing three actions: digging, jogging, and throwing. Our action classification experiments demonstrate that the 3D Tchebichef moment representation of point clouds achieves excellent action and viewing-direction predictions, with strong consistency across a large range of scale and viewing angle variations.
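The core computation described above (voxelize a point cloud, then project the occupancy grid onto discrete Tchebichef polynomials to obtain low-order moments) can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the cubic N×N×N grid, the function names, and the numerical row normalization are ours. The polynomials follow the standard scaled three-term recurrence for discrete Tchebichef polynomials (Mukundan et al.), with norms computed numerically rather than in closed form.

```python
import numpy as np

def tchebichef_basis(order, N):
    """Scaled discrete Tchebichef polynomials t_n(x), n = 0..order,
    sampled on the integer grid x = 0..N-1, returned as rows that are
    orthonormal over the grid."""
    x = np.arange(N, dtype=float)
    T = np.zeros((order + 1, N))
    T[0] = 1.0
    if order >= 1:
        T[1] = (2 * x + 1 - N) / N
    for n in range(2, order + 1):
        # Scaled three-term recurrence for discrete Tchebichef polynomials.
        T[n] = ((2 * n - 1) * T[1] * T[n - 1]
                - (n - 1) * (1 - ((n - 1) / N) ** 2) * T[n - 2]) / n
    # Normalize each row numerically so the basis is orthonormal on the grid.
    T /= np.linalg.norm(T, axis=1, keepdims=True)
    return T

def tchebichef_moments_3d(voxels, order):
    """3D Tchebichef moments T_pqr of a cubic voxel occupancy grid,
    for all orders p, q, r in 0..order."""
    N = voxels.shape[0]  # assumes a cubic N x N x N grid
    T = tchebichef_basis(order, N)
    # Separable tensor contraction along the x, y, and z axes.
    return np.einsum('px,qy,rz,xyz->pqr', T, T, T, voxels)
```

Because the basis is separable, the 3D moments reduce to three 1D projections, which is what makes low-order moment descriptors cheap to compute even as the voxel resolution grows.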


Keywords: LIDAR · Point cloud · Action recognition · Discrete orthogonal moment · Tchebichef moment



Acknowledgements

The authors would like to thank Isiah Davenport, Max Grattan, and Jeanne Smith for their indispensable help in the creation of the biofidelic pose shape baseline.



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. 711th Human Performance Wing, Air Force Research Laboratory, Wright-Patterson AFB, Dayton, USA
  2. Department of Computer Science and Engineering, Wright State University, Dayton, USA
