Machine Vision and Applications, Volume 25, Issue 8, pp 1929–1951

The ChaLearn gesture dataset (CGD 2011)

  • Isabelle Guyon
  • Vassilis Athitsos
  • Pat Jangyodsuk
  • Hugo Jair Escalante
Special Issue Paper


This paper describes the data used in the ChaLearn gesture challenges that took place in 2011/2012, whose results were discussed at the CVPR 2012 and ICPR 2012 conferences. The task can be described as: user-dependent, small vocabulary, fixed camera, one-shot learning. The data include 54,000 hand and arm gestures recorded with an RGB-D Kinect™ camera. The data are organized into batches of 100 gestures pertaining to a small gesture vocabulary of 8–12 gestures, recorded by the same user. Short continuous sequences of 1–5 randomly selected gestures are recorded. We provide man-made annotations (temporal segmentation into individual gestures, alignment of RGB and depth images, and body part location) and a library of functions to preprocess and automatically annotate data. We also provide a subset of batches in which the user's horizontal position is randomly shifted or scaled. We report on the results of the challenge and distribute sample code to facilitate developing new solutions. The data, the data collection software, and the gesture vocabularies are available for download, and we have set up a forum for researchers working on these data.
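The batch organization described above (one user per batch, a vocabulary of 8–12 gestures, sequences of 1–5 gestures, paired RGB and depth recordings) can be sketched as a simple data structure. The following is an illustrative sketch only: the class names, file-naming pattern, and the `one_shot_split` helper are assumptions for exposition, not the dataset's actual API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class GestureSequence:
    rgb_path: str        # path to the RGB video (naming is hypothetical)
    depth_path: str      # path to the aligned depth video
    labels: List[int]    # 1-5 gesture IDs drawn from the batch vocabulary

@dataclass
class GestureBatch:
    vocabulary_size: int                     # 8-12 gestures, one user per batch
    sequences: List[GestureSequence] = field(default_factory=list)

    def total_gestures(self) -> int:
        # a full batch contains 100 gestures spread over its sequences
        return sum(len(s.labels) for s in self.sequences)

def one_shot_split(batch: GestureBatch) -> Tuple[List[GestureSequence],
                                                 List[GestureSequence]]:
    """Illustrative one-shot-learning split: treat the first single-gesture
    sequence seen for each label as the sole training example; everything
    else is test data."""
    seen, train, test = set(), [], []
    for seq in batch.sequences:
        if len(seq.labels) == 1 and seq.labels[0] not in seen:
            seen.add(seq.labels[0])
            train.append(seq)
        else:
            test.append(seq)
    return train, test
```

Under this sketch, a recognizer receives exactly one labeled example per vocabulary item and must label the remaining continuous sequences, which is what makes the task one-shot.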


Keywords: Computer vision · Gesture recognition · Sign language recognition · RGB-D cameras · Kinect · Dataset · Challenge · Machine learning · Transfer learning · One-shot learning

Mathematics Subject Classification (2000)

65D19 · 68T10 · 97K80



This challenge was organized by ChaLearn, whose directors are gratefully acknowledged. The submission website was hosted by Kaggle, and we thank Ben Hamner for his wonderful support. Our sponsors include Microsoft (Kinect for Xbox 360) and Texas Instruments, who donated prizes. We are very grateful to Alex Kipman and Laura Massey at Microsoft and to Branislav Kisacanin at Texas Instruments, who made this possible. We also thank the committee members and participants of the CVPR 2011, CVPR 2012, and ICPR 2012 gesture recognition workshops, the judges of the demonstration competitions hosted in conjunction with CVPR 2012 and ICPR 2012, and the Pascal2 reviewers, who made valuable suggestions. We are particularly grateful to Richard Bowden, Philippe Dreuw, Ivan Laptev, Jitendra Malik, Greg Mori, and Christian Vogler, who provided us with useful guidance in the design of the dataset.



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Isabelle Guyon (1)
  • Vassilis Athitsos (2)
  • Pat Jangyodsuk (2)
  • Hugo Jair Escalante (3)
  1. ChaLearn, Berkeley, USA
  2. University of Texas at Arlington, Arlington, USA
  3. INAOE, Puebla, Mexico
