A framework for interpreting, modeling and recognizing human body gestures through 3D eigenpostures

Marcon, Marco; Paracchini, Marco Brando Mario; Tubaro, Stefano

doi:10.1007/s13042-018-0801-1

Original Article
Published: 21 March 2018

Volume 10, pages 1205–1226, (2019)

International Journal of Machine Learning and Cybernetics

Marco Marcon ORCID: orcid.org/0000-0001-6557-2120¹,
Marco Brando Mario Paracchini¹ &
Stefano Tubaro¹

322 Accesses
5 Citations

Abstract

In this article we propose a novel system for recognizing human gestures through the acquisition and processing of volumetric data sequences. The sequences are acquired with two different approaches: a multi-camera set-up and a multi-Kinect\(^\mathrm{TM}\) set-up. Recognition based on a volumetric representation does not require any skeleton fitting or limb tracking; the system instead relies on robust features extracted directly from the available 3D data. The volumetric shape descriptors are invariant with respect to viewpoint and body size, and they are designed to provide a unique signature for each posture. Hidden Markov Models (HMMs), trained on different gestures, are then used to identify a set of key postures and to classify their sequences over a set of possible actions. The paper also presents a method for identifying the number of hidden states of the HMMs that describe the gestures. Despite its conceptual and implementation simplicity, the number of states estimated with this method turns out to match that given by classical approaches such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC). The same approach is also applied to define the Gaussian mixture for the hidden-state observations, with good results. Extensive tests were performed on a database that we acquired, made of ten different actions, each performed by five different actors in five different ways (different speed and orientation), and on another public database, achieving a 96% correct recognition rate.
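
The classification step and the classical model-selection criteria mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each frame has already been quantized into a discrete key-posture code, whereas the abstract describes Gaussian-mixture observations. The gesture names, probabilities, and function names are all hypothetical. Each gesture gets its own HMM, and a sequence is assigned to the model with the highest forward-algorithm log-likelihood. AIC and BIC are given in their standard forms, \(2k - 2\ln L\) and \(k\ln n - 2\ln L\).

```python
import math


def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, start, trans, emit):
    """Log P(obs | model) for a discrete-emission HMM (forward algorithm).

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability of observing posture code k in state i
    """
    n = len(start)
    alpha = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][symbol])
            for j in range(n)
        ]
    return log_sum_exp(alpha)


def classify(obs, models):
    """Name of the gesture model that gives obs the highest log-likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


def aic(log_likelihood, num_params):
    """Akaike Information Criterion (lower is better)."""
    return 2 * num_params - 2 * log_likelihood


def bic(log_likelihood, num_params, num_samples):
    """Bayesian Information Criterion (lower is better)."""
    return num_params * math.log(num_samples) - 2 * log_likelihood


# Hypothetical two-state, left-to-right models over posture codes {0, 1}.
left_to_right = [[0.7, 0.3], [0.0, 1.0]]
models = {
    "raise_arm": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),
    "lower_arm": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),
}

print(classify([0, 0, 1, 1], models))  # raise_arm
```

To compare candidate state counts with these criteria, one would train one HMM per candidate count, evaluate `aic` or `bic` on the training log-likelihood, and keep the count with the lowest score. The parameter count `num_params` depends on the chosen emission model and is left to the caller here.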

Notes

http://www.marcon.net \(\rightarrow\) projects \(\rightarrow\) volumetric gesture recognition.
http://www.r-project.org.
http://cran.r-project.org/web/packages/R.matlab.

Author information

Authors and Affiliations

Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci, 32, 20133, Milan, Italy
Marco Marcon, Marco Brando Mario Paracchini & Stefano Tubaro

Corresponding author

Correspondence to Marco Marcon.

About this article

Cite this article

Marcon, M., Paracchini, M.B.M. & Tubaro, S. A framework for interpreting, modeling and recognizing human body gestures through 3D eigenpostures. Int. J. Mach. Learn. & Cyber. 10, 1205–1226 (2019). https://doi.org/10.1007/s13042-018-0801-1

Received: 03 April 2017
Accepted: 26 February 2018
Published: 21 March 2018
Issue Date: 01 May 2019
DOI: https://doi.org/10.1007/s13042-018-0801-1
