Machine Vision and Applications

Volume 25, Issue 4, pp 943–954

A natural and synthetic corpus for benchmarking of hand gesture recognition systems

  • Javier Molina
  • José A. Pajuelo
  • Marcos Escudero-Viñolo
  • Jesús Bescós
  • José M. Martínez
Original Paper


Abstract

The use of hand gestures offers an alternative to common human–computer interfaces (keyboard, mouse, gamepad, voice, etc.), providing a more intuitive way of navigating menus and multimedia applications. This paper presents a dataset for the evaluation of hand gesture recognition approaches in human–computer interaction scenarios. It includes natural data and synthetic data from several state-of-the-art dictionaries. The dataset covers single-pose and multiple-pose gestures, as well as gestures defined by pose and motion or by motion alone. Data types include static pose videos and gesture execution videos, performed by eleven users and recorded with a time-of-flight camera, and synthetically generated gesture images. A novel collection of critical factors involved in the creation of a hand gesture dataset is proposed: capture technology, temporal coherence, nature of gestures, representativeness, pose issues and scalability. Special attention is given to scalability: a simple method is proposed for the synthetic generation of depth images of gestures, which makes it possible to extend a dataset with new dictionaries and gestures without recruiting new users, and which provides more flexibility in the selection of the point of view. The method is validated on the presented dataset. Finally, a separability study of the pose-based gestures of a dictionary is performed. The resulting corpus, which exceeds existing state-of-the-art datasets in representativeness and scalability, provides a significant evaluation scenario for different kinds of hand gesture recognition solutions.
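The abstract only summarises the synthetic generation of depth images. As an illustration of the general idea (rendering a depth map of a 3D hand model from a chosen viewpoint), here is a minimal sketch in NumPy. It assumes an orthographic camera and a point-cloud hand model; both are illustrative assumptions, not the authors' actual method.

```python
import numpy as np

def render_depth(points, size=(64, 64), background=0.0):
    """Orthographic z-buffer rendering of a 3D point cloud.

    points: (N, 3) array of camera-space (x, y, z); x and y are
    assumed normalised to [0, 1], z is depth along the optical axis.
    Returns a depth image keeping, per pixel, the point closest to
    the camera (smallest z); empty pixels get `background`.
    """
    h, w = size
    depth = np.full((h, w), np.inf)
    cols = np.clip((points[:, 0] * (w - 1)).astype(int), 0, w - 1)
    rows = np.clip((points[:, 1] * (h - 1)).astype(int), 0, h - 1)
    # np.minimum.at resolves pixel collisions: nearest surface wins.
    np.minimum.at(depth, (rows, cols), points[:, 2])
    depth[np.isinf(depth)] = background
    return depth

# Toy "hand": random points standing in for a sampled hand mesh.
# Rotating the points before rendering changes the point of view.
rng = np.random.default_rng(0)
pts = rng.random((500, 3))
img = render_depth(pts)
```

Because the viewpoint is just a rigid transform applied to the model before rendering, new camera positions cost nothing compared with recording new users.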


Keywords: Hand gesture dataset · Hand gesture recognition · Pose-based · Motion-based · Human–computer interaction
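The separability study mentioned in the abstract is likewise not detailed here. One common way to quantify pairwise separability of pose descriptors is the two-class Fisher criterion; the sketch below uses that criterion with made-up scalar descriptor values, purely as an illustration (the function name and numbers are hypothetical, not taken from the paper).

```python
import numpy as np

def fisher_separability(a, b):
    """Two-class Fisher criterion for 1-D descriptors:
    squared mean gap over summed variances. Larger values
    mean the two gesture classes are easier to separate."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var() + 1e-12)

# Toy descriptor values (e.g. a scalar shape feature) for two poses.
open_hand = np.array([0.90, 1.00, 1.10, 0.95])
fist      = np.array([0.20, 0.25, 0.30, 0.22])
score = fisher_separability(open_hand, fist)
```

Computing this score for every pair of poses in a dictionary gives a separability matrix, which flags pose pairs that a recognition system is likely to confuse.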



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Javier Molina (1)
  • José A. Pajuelo (1)
  • Marcos Escudero-Viñolo (1)
  • Jesús Bescós (1)
  • José M. Martínez (1)

  1. Video Processing and Understanding Lab, Laboratorio C-111, Escuela Politécnica Superior, Universidad Autónoma de Madrid, Madrid, Spain
