Full Body Gesture Recognition for Human-Machine Interaction in Intelligent Spaces

  • David Casillas-Perez
  • Javier Macias-Guarasa
  • Marta Marron-Romera
  • David Fuentes-Jimenez
  • Alvaro Fernandez-Rincon
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9656)


This paper proposes a full body gesture recognition system for use in an intelligent space, allowing users to control their environment. We describe a successful adaptation of the traditional strategy used in designing spoken language recognition systems to the new domain of full body gesture recognition. The experimental evaluation was performed on a realistic task in which users control different elements of the environment through gesture sequences. The results were obtained under a rigorous experimental procedure, comparing different feature extraction strategies. The average recognition rates achieved are around 97% at the gestural sentence level and over 98% at the gesture level, experimentally validating the proposal.


Keywords: Full body gesture recognition · Intelligent spaces · Human-machine interaction · Spoken language recognition strategies



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • David Casillas-Perez 1,2
  • Javier Macias-Guarasa 1,2
  • Marta Marron-Romera 1,2
  • David Fuentes-Jimenez 1,2
  • Alvaro Fernandez-Rincon 1,2

  1. Universidad de Alcalá, Madrid, Spain
  2. Escuela Politécnica Superior - Campus externo, Universidad de Alcalá – GEINTRA Research Group, Alcala de Henares, Spain
